First, one caveat
If the site you saved uses an external CDN or similar, the downloaded files only display fully correctly when they are served through a web server, not opened from the local file system. Opening them in a web browser as file:// URLs apparently doesn't work properly.
Motivation
Method
I use wget. It has an enormous number of options, so I worked out which ones to use. The help text printed by --help in version 1.20.1 is attached at the end (it's long), so refer to it as needed.
Addendum. Conclusion: in the end, this command seems better
- After a lot of trial and error, building on the command @suin had shown looks like the quickest way to get there...
- I also couldn't find any missing files...
- The resulting command is as follows
$ wget --mirror --page-requisites --span-hosts --show-progress --no-parent --convert-links --adjust-extension --execute robots=off --verbose --output-file=wget_log.log --tries=2 --no-if-modified-since --random-wait --waitretry=5 --backup-converted --domain=example.com https://foobar.example.com/
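The help text attached at the end lists -m, --mirror as shorthand for -N -r -l 0 --no-remove-listing, so this command recurses with no depth limit (--level=0), unlike the --level=5 in the command I first settled on. The Ruby sketch below (Ruby to match the downloader script later in this post; expand_mirror is a hypothetical helper, not part of wget) just spells that expansion out so the implied options are visible:

```ruby
# Options the wget 1.20.1 help text lists as the expansion of -m/--mirror.
MIRROR_EQUIVALENT = %w[--timestamping --recursive --level=0 --no-remove-listing].freeze

# Hypothetical helper: replace --mirror (or a standalone -m) with that expansion
# so implied settings such as --level=0 become visible.
# Bundled short flags such as -mk are not handled.
def expand_mirror(args)
  args.flat_map { |arg| %w[--mirror -m].include?(arg) ? MIRROR_EQUIVALENT : [arg] }
end

args = %w[--mirror --page-requisites --span-hosts --no-parent --convert-links
          --domain=example.com https://foobar.example.com/]
puts expand_mirror(args).join(" ")
```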
The conclusion I first reached
Here is my conclusion: in my case, I ended up with the following command.
$ wget --output-file=wget_log.log --verbose --tries=2 --timestamping --no-if-modified-since --random-wait --waitretry=5 --adjust-extension --referer="http://www.geocities.co.jp/" --recursive --level=5 --convert-links --backup-converted --page-requisites "http://www.geocities.co.jp/FOOBAR/"
I'll go through the options one by one.
--output-file
I want to keep an explicit log. When this option is given, the log is no longer printed to standard output.
--verbose
Controls how detailed the log is. The default is fine, but I specify it explicitly.
--tries
Sets the maximum number of retries when retrying becomes necessary. I doubt retrying is what makes a download succeed these days, but I include it just in case.
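--timestamping and --no-if-modified-since
Options for saving only the files that have been updated since the previous run, judged by their timestamps (per the help text, --no-if-modified-since makes timestamping mode skip the if-modified-since request). If you aren't going to save the site repeatedly, they're effectively unnecessary.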
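As a rough model of the decision -N makes, here is a Ruby sketch. fetch_needed? is a hypothetical helper that only compares modification times, whereas wget gets the remote time from the server (and, according to its manual, also re-downloads when the file sizes differ).

```ruby
require "tempfile"

# Simplified model of the -N/--timestamping decision (hypothetical helper):
# fetch when there is no local copy or the remote copy is newer.
# Size comparison and the actual server round trip are not modeled.
def fetch_needed?(local_path, remote_mtime)
  return true unless File.exist?(local_path)
  remote_mtime > File.mtime(local_path)
end

Tempfile.create("page") do |file|
  local_mtime = File.mtime(file.path)
  puts fetch_needed?(file.path, local_mtime - 3600) # remote older: skip (false)
  puts fetch_needed?(file.path, local_mtime + 3600) # remote newer: fetch (true)
end
puts fetch_needed?("/nonexistent/index.html", Time.now) # no local copy: fetch (true)
```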
--random-wait
Randomizes the wait between requests. The wait can also be fixed (with --wait), among other things, so see the help for details.
--waitretry
When a retry becomes necessary, wget waits before retrying; per the help text, the wait ranges from 1 second up to the number of seconds given here.
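Going by the formulas in the help text (--random-wait waits 0.5*WAIT to 1.5*WAIT seconds, --waitretry waits 1 to SECONDS seconds), two things follow for my command, assuming --wait keeps its default of 0: --random-wait on its own adds no delay, and with --tries=2 there is only one retry, which waits 1 second rather than 5. The Ruby sketch below is a simplified model of that arithmetic, not wget's code; the one-second-per-retry growth follows the linear backoff described in the wget manual.

```ruby
# Simplified model of the help-text formulas (not wget's actual code).

# --random-wait: each download waits between 0.5*WAIT and 1.5*WAIT seconds,
# where WAIT is the value given with --wait (default 0).
def random_wait_range(wait)
  (0.5 * wait)..(1.5 * wait)
end

# --waitretry=SECS: the n-th retry waits n seconds, capped at SECS.
# --tries counts the first attempt too, so there are tries - 1 retries.
def retry_waits(tries, waitretry)
  (1...tries).map { |n| [n, waitretry].min }
end

p random_wait_range(0)  # => 0.0..0.0 (no --wait given)
p random_wait_range(2)  # => 1.0..3.0
p retry_waits(2, 5)     # => [1]  (--tries=2 --waitretry=5)
p retry_waits(10, 5)    # => [1, 2, 3, 4, 5, 5, 5, 5, 5]
```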
--adjust-extension
Renames files so they have appropriate extensions.
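A simplified Ruby model of the renaming, assuming the rule is "append .html to HTML and .css to CSS when the saved name doesn't already end that way" (adjusted_name is a hypothetical helper; wget's exact checks may differ):

```ruby
# Simplified model of -E/--adjust-extension (hypothetical helper).
# HTML (or XHTML) gets .html and CSS gets .css unless the name already ends that way;
# other content types are left alone.
def adjusted_name(local_name, content_type)
  case content_type
  when %r{\Atext/html}, %r{\Aapplication/xhtml\+xml}
    local_name.match?(/\.html?\z/i) ? local_name : "#{local_name}.html"
  when %r{\Atext/css}
    local_name.match?(/\.css\z/i) ? local_name : "#{local_name}.css"
  else
    local_name
  end
end

puts adjusted_name("bbs/read.cgi?page=2", "text/html; charset=Shift_JIS")
puts adjusted_name("index.HTM", "text/html")
puts adjusted_name("style.php", "text/css")
puts adjusted_name("photo.jpg", "image/jpeg")
```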
--referer
Sets the referrer explicitly. This helps with pages and images that are refused when the referrer isn't appropriate. In practice it probably makes almost no difference, but I added it just in case.
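For reference, this is the header --referer adds, shown with Ruby's standard Net::HTTP library. The URLs are the placeholders from my command; the sketch only builds the request and sends nothing.

```ruby
require "net/http"
require "uri"

# What --referer=URL amounts to: the request carries a Referer header.
# Only the request object is built here; nothing is sent.
uri = URI("http://www.geocities.co.jp/FOOBAR/")
request = Net::HTTP::Get.new(uri)
request["Referer"] = "http://www.geocities.co.jp/"

puts "#{request.method} #{request.path}"
puts "Referer: #{request['Referer']}"
# To send it: Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```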
--recursive and --level
This option makes wget follow links, and it is required. --level sets how many levels of links to follow. 0 means no limit, but wget could then keep following links of links and fall into an endless loop, so it's better to specify a reasonably modest value*1.
On the other hand, if the value is too modest you may never reach the content you want. I have also seen --convert-links fail to be processed correctly with 0, so, partly to prevent endless loops, it seems better not to specify 0.
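The following Ruby sketch is a toy model (not wget's implementation) of why a level limit matters: wget keeps track of URLs it has already fetched, but a BBS-style page whose "next" link always points to a brand-new URL can still be followed forever. links_of stands in for "download the page and extract its links", and the site it describes is hypothetical.

```ruby
require "set"

# Toy model of --recursive with --level (not wget's implementation):
# a breadth-first walk that follows links at most `level` hops from the start
# page and never fetches the same URL twice. Assumes level >= 1; wget's
# "--level=0 means unlimited" is not modeled.
def crawl(start, level, links_of)
  seen = Set[start]
  frontier = [start]
  level.times do
    frontier = frontier.flat_map { |url| links_of.call(url) }
                       .reject { |url| seen.include?(url) }
                       .uniq
    seen.merge(frontier)
  end
  seen.to_a
end

# Hypothetical site: every BBS page links to a brand-new "next page" URL,
# so the visited set alone never stops the walk; only the level limit does.
links_of = lambda do |url|
  if url == "/index.html"
    ["/about.html", "/bbs.cgi?page=1"]
  elsif (m = url.match(%r{\A/bbs\.cgi\?page=(\d+)\z}))
    ["/bbs.cgi?page=#{m[1].to_i + 1}", "/index.html"]
  else
    []
  end
end

p crawl("/index.html", 3, links_of)
```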
--convert-links
Converts each link into a relative path within the local copy. Required.
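A minimal Ruby sketch of the path arithmetic behind --convert-links, assuming wget's default host/path layout for saved files and index.html for URLs that end in "/". local_path and relative_link are hypothetical helpers; wget's real conversion also deals with query strings, renamed extensions, and links it did not download.

```ruby
require "pathname"
require "uri"

# Where a URL would be saved, assuming a host/path layout
# and index.html for URLs that end in "/" (hypothetical helper).
def local_path(url)
  uri = URI(url)
  path = uri.path.end_with?("/") ? "#{uri.path}index.html" : uri.path
  Pathname("#{uri.host}#{path}")
end

# Rewrite target_url as a path relative to the directory of the page linking to it.
def relative_link(page_url, target_url)
  local_path(target_url).relative_path_from(local_path(page_url).dirname).to_s
end

page = "http://www.geocities.co.jp/FOOBAR/diary/index.html"
puts relative_link(page, "http://www.geocities.co.jp/FOOBAR/img/title.gif") # ../img/title.gif
puts relative_link(page, "http://www.geocities.co.jp/FOOBAR/")               # ../index.html
```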
--backup-converted
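I think this option is the best of all. --convert-links rewrites each link into a relative path within the local copy, and adding this option also keeps each file as it was before conversion (saved as .orig, per the help text), which is useful insurance in case something goes wrong.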
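The idea behind --backup-converted (-K in the help text: save the file as .orig before converting links), sketched in Ruby. convert_with_backup is a hypothetical helper, and the block passed to it stands in for the actual link conversion.

```ruby
require "tmpdir"

# Hypothetical helper: keep the untouched page as FILE.orig,
# then overwrite FILE with the converted version produced by the block.
def convert_with_backup(path)
  original = File.read(path)
  File.write("#{path}.orig", original)
  File.write(path, yield(original))
end

Dir.mktmpdir do |dir|
  page = File.join(dir, "index.html")
  File.write(page, '<img src="http://www.geocities.co.jp/FOOBAR/a.gif">')
  convert_with_backup(page) { |html| html.gsub("http://www.geocities.co.jp/FOOBAR/", "") }
  puts File.read(page)           # converted copy
  puts File.read("#{page}.orig") # untouched backup
end
```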
--page-requisites
Fetches the images in a web page, including those on external sites. Probably essential.
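A rough Ruby illustration of the kind of URLs --page-requisites is after: files a page needs in order to display, as opposed to ordinary <a href> links. The regular expression is a simplification (wget uses its own HTML parser) and only covers img/script src, link href, and the background attribute common on old sites; it is case-insensitive, so uppercase tags such as <IMG SRC> match too. Also note that, according to the wget manual, requisites hosted on other sites are only fetched when --span-hosts (-H) is given as well.

```ruby
# Rough illustration only: collect URLs a page needs for display
# (img/script src, <link href>, background attributes). wget uses its own
# HTML parser; this regex is a simplification. The /i flag makes it
# case-insensitive, so uppercase tags such as <IMG SRC> match as well.
REQUISITE_PATTERN = /<(?:img|script)\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)|<link\b[^>]*?\bhref\s*=\s*["']?([^"'\s>]+)|\bbackground\s*=\s*["']?([^"'\s>]+)/i

def requisites(html)
  # scan yields one array per match, with nil for the groups that didn't match
  html.scan(REQUISITE_PATTERN).map { |groups| groups.compact.first }
end

html = <<~HTML
  <BODY BACKGROUND="bg.gif">
  <IMG SRC="title.gif"> <A HREF="next.html">next</A>
  <link rel="stylesheet" href="style.css">
HTML

p requisites(html) # next.html is an ordinary link, not a requisite
```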
Notes
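- The "BBS" that old sites almost always had may no longer exist or may be full of spam, and it easily sends wget into an endless loop, so it's a good idea to specify something like --exclude-directories=cgi,cgi-bin
- If things are taking oddly long, look at the log and target that part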
- "Frames", another staple of old sites, make things quite hard: you need to run wget with the --follow-tags=frame option for each target page
- You may end up having to keep running commands, digging deeper and deeper
- If URLs containing "~" (tilde), another staple of old sites, aren't fetched correctly, adding --no-iri may help
- As the help text at the end shows, the --mirror option is equivalent to --timestamping --recursive --level=0 --no-remove-listing
- On one web page (site), all the HTML tags were written in uppercase (<A href>, <IMG src>, <P>, and so on), and wget didn't pick up the links; in cases like that you seem to be out of luck*2. Other applications that use wget internally are probably just as stuck.
- Update: it later turned out that the uppercase tags were not the cause. Still, it's worth keeping in mind that wget in particular can miss files.
- When specifying the URL to download, include the file name wherever possible (down to index.html, etc.)
- Depending on the situation*3, --convert-links did not work properly. Adding --span-hosts, --level=2, --no-parent, and -Dfoo.example.com,bar.example.com made it work, but this isn't a reliable fix. The -D option here just lists the subdomains (presumably so that only those domains are targeted).
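For the frame case, here is a hypothetical Ruby helper for the "dig deeper" step: it lists the src of every <frame>/<iframe> in a saved frameset page and prints a wget command for each frame page. The options are taken from my command above minus logging; the frameset HTML and frame_sources are made up for illustration.

```ruby
require "uri"

# Hypothetical helper: absolute URLs of every <frame>/<iframe> src in a page.
# \b after "frame" keeps <FRAMESET> itself from matching.
def frame_sources(html, base_url)
  html.scan(/<i?frame\b[^>]*?\bsrc\s*=\s*["']?([^"'\s>]+)/i)
      .flatten
      .map { |src| URI.join(base_url, src).to_s }
end

frameset = <<~HTML
  <FRAMESET COLS="200,*">
    <FRAME SRC="menu.html" NAME="menu">
    <FRAME SRC="top.html" NAME="main">
  </FRAMESET>
HTML

base = "http://www.geocities.co.jp/FOOBAR/index.html"
frame_sources(frameset, base).each do |url|
  # One follow-up run per frame page, reusing options from the command above.
  puts ["wget", "--recursive", "--level=5", "--convert-links",
        "--backup-converted", "--page-requisites", url].join(" ")
end
```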
References
wget --help (1.20.1)
$ wget --help
GNU Wget 1.20.1, a non-interactive network transfer tool
Usage: wget [OPTION]... [URL]...

Arguments that are mandatory for long options are mandatory for short options too.

Startup:
  -V,  --version                    display version information and exit
  -h,  --help                       display this help
  -b,  --background                 go to the background after startup
  -e,  --execute=COMMAND            execute a `.wgetrc'-style command

Logging and input file:
  -o,  --output-file=FILE           write the log to FILE
  -a,  --append-output=FILE         append messages to FILE
  -q,  --quiet                      output nothing
  -v,  --verbose                    produce verbose output (default)
  -nv, --no-verbose                 turn off verbose output
       --report-speed=TYPE          output bandwidth as TYPE; TYPE can be 'bits'
  -i,  --input-file=FILE            download the URLs listed in FILE
  -F,  --force-html                 treat the input file as HTML
  -B,  --base=URL                   treat links in an HTML input file (-i -F) as relative to URL
       --config=FILE                specify the config file
       --no-config                  do not read any config file
       --rejected-log=FILE          save the reasons for rejections to the log FILE

Download:
  -t,  --tries=NUMBER               set the maximum number of retries (0 means unlimited)
       --retry-connrefused          retry even if the connection is refused
       --retry-on-http-error=ERRORS retry on the given comma-separated HTTP errors
  -O,  --output-document=FILE       write documents to FILE
  -nc, --no-clobber                 don't overwrite existing files when downloading
       --no-netrc                   don't obtain credentials from .netrc
  -c,  --continue                   resume a partially downloaded file
       --start-pos=OFFSET           start downloading from OFFSET
       --progress=TYPE              set the progress gauge type to TYPE
       --show-progress              show the progress bar in any mode
  -N,  --timestamping               only retrieve files newer than the local ones
       --no-if-modified-since       don't use if-modified-since get requests in timestamping mode
       --no-use-server-timestamps   don't use the server's timestamp for local files
  -S,  --server-response            print the server's responses
       --spider                     don't download anything
  -T,  --timeout=SECONDS            set all timeouts to SECONDS seconds
       --dns-timeout=SECS           set the DNS lookup timeout to SECS seconds
       --connect-timeout=SECS       set the connect timeout to SECS seconds
       --read-timeout=SECS          set the read timeout to SECS seconds
  -w,  --wait=SECONDS               wait SECONDS seconds between downloads
       --waitretry=SECONDS          wait 1 to SECONDS seconds between retries
       --random-wait                wait 0.5*WAIT to 1.5*WAIT seconds between downloads
       --no-proxy                   don't use a proxy
  -Q,  --quota=NUMBER               set a limit on the number of bytes to download
       --bind-address=ADDRESS       use ADDRESS (hostname or IP) as the local address
       --limit-rate=RATE            limit the download rate to RATE
       --no-dns-cache               don't cache DNS lookup results
       --restrict-file-names=OS     restrict file names to those OS allows
       --ignore-case                ignore case when comparing file/directory names
  -4,  --inet4-only                 use IPv4 only
  -6,  --inet6-only                 use IPv6 only
       --prefer-family=FAMILY       connect first using the given family (IPv6, IPv4, none)
       --user=USER                  set the ftp and http user name
       --password=PASS              set the ftp and http password
       --ask-password               prompt for the password separately
       --use-askpass=COMMAND        specify a handler for obtaining credentials (user name and
                                    password); if COMMAND is not given, the WGET_ASKPASS or
                                    SSH_ASKPASS environment variable is used
       --no-iri                     turn off IRI support
       --local-encoding=ENC         use ENC as the local encoding for IRIs
       --remote-encoding=ENC        use ENC as the default remote encoding
       --unlink                     delete files before overwriting them
       --xattr                      turn on storage of metadata in extended file attributes

Directories:
  -nd, --no-directories             don't create directories
  -x,  --force-directories          force creation of directories
  -nH, --no-host-directories        don't create host-name directories
       --protocol-directories       create protocol-name directories
  -P,  --directory-prefix=PREFIX    save files under PREFIX/
       --cut-dirs=NUMBER            ignore NUMBER levels of remote directory names

HTTP options:
       --http-user=USER             use USER as the http user name
       --http-password=PASS         use PASS as the http password
       --no-cache                   disallow server-cached data
       --default-page=NAME          change the default page name to NAME (normally `index.html')
  -E,  --adjust-extension           save HTML/CSS documents with the proper extension
       --ignore-length              ignore the `Content-Length' header
       --header=STRING              add STRING to the headers sent
       --compression=TYPE           compression algorithm: auto, gzip, none (default: none)
       --max-redirect               maximum number of redirects allowed per page
       --proxy-user=USER            use USER as the proxy user name
       --proxy-password=PASS        use PASS as the proxy password
       --referer=URL                set the Referer to URL
       --save-headers               save HTTP headers to the file
  -U,  --user-agent=AGENT           use AGENT as the User-Agent instead of Wget/VERSION
       --no-http-keep-alive         don't use HTTP keep-alive (persistent connections)
       --no-cookies                 don't use cookies
       --load-cookies=FILE          load cookies from FILE
       --save-cookies=FILE          save cookies to FILE
       --keep-session-cookies       keep session-only cookies
       --post-data=STRING           send STRING using the POST method
       --post-file=FILE             send the contents of FILE using the POST method
       --method=HTTPMethod          use "HTTPMethod" as the request method
       --body-data=STRING           send STRING as data; --method must be specified
       --body-file=FILE             send the contents of a file; --method must be specified
       --content-disposition        use the Content-Disposition header, if present, as the
                                    local file name (experimental)
       --content-on-error           output the received content on server errors
       --auth-no-challenge          send Basic authentication information without waiting
                                    for a challenge from the server

HTTPS (SSL/TLS) options:
       --secure-protocol=PR         choose the secure protocol (auto, SSLv2, SSLv3, TLSv1,
                                    TLSv1_1, TLSv1_2, PFS)
       --https-only                 only follow secure HTTPS links
       --no-check-certificate       don't verify the server certificate
       --certificate=FILE           use FILE as the client certificate
       --certificate-type=TYPE      set the client certificate type to TYPE (PEM, DER)
       --private-key=FILE           use FILE as the private key
       --private-key-type=TYPE      set the private key type to TYPE (PEM, DER)
       --ca-certificate=FILE        use FILE as the CA certificate
       --ca-directory=DIR           specify the directory holding the CA hash list
       --crl-file=FILE              specify a CRL file
       --pinnedpubkey=FILE/HASHES   authenticate the peer against a public key (PEM/DER) file
                                    or base64-encoded sha256 hashes (starting with sha256//,
                                    separated by semicolons)
       --random-file=FILE           file to use as seed data for the SSL PRNG
       --egd-file=FILE              use FILE as the EGD socket
       --ciphers=STR                set the GnuTLS priority string or the OpenSSL cipher list
                                    directly. Use with care: this overrides --secure-protocol.
                                    The format and syntax depend on the SSL/TLS implementation.

HSTS options:
       --no-hsts                    don't use HSTS
       --hsts-file                  path of the HSTS database (overrides the default)

FTP options:
       --ftp-user=USER              use USER as the ftp user
       --ftp-password=PASS          use PASS as the ftp password
       --no-remove-listing          don't delete `.listing' files
       --no-glob                    turn off FTP file-name globbing
       --no-passive-ftp             don't use "passive" transfer mode
       --preserve-permissions       preserve remote file permissions
       --retr-symlinks              when recursing, fetch the files that symbolic links point to

FTPS options:
       --ftps-implicit              use implicit FTPS (default port 990)
       --ftps-resume-ssl            resume the SSL/TLS session started on the control
                                    connection for the data connection
       --ftps-clear-data-connection encrypt only the control channel (data is sent in plain text)
       --ftps-fallback-to-ftp       fall back to FTP if the server doesn't support FTPS

WARC options:
       --warc-file=FILENAME         save request/response data to a .warc.gz file
       --warc-header=STRING         add STRING to the warcinfo record
       --warc-max-size=NUMBER       set the maximum WARC file size to NUMBER
       --warc-cdx                   write a CDX index file
       --warc-dedup=FILENAME        don't store records listed in the given CDX file
       --no-warc-compression        don't compress WARC files with GZIP
       --no-warc-digests            don't compute SHA1 digests
       --no-warc-keep-log           don't store the log file in a WARC record
       --warc-tempdir=DIRECTORY     directory for temporary files created when writing WARC

Recursive download:
  -r,  --recursive                  download recursively
  -l,  --level=NUMBER               set the maximum recursion depth to NUMBER (0 means unlimited)
       --delete-after               delete downloaded files after downloading finishes
  -k,  --convert-links              make links in HTML and CSS point to local files
       --convert-file-only          convert only the file-name part of URLs (the so-called basename)
       --backups=N                  rotate N backup files when writing a file
  -K,  --backup-converted           save the file as .orig before converting links
  -m,  --mirror                     shorthand for -N -r -l 0 --no-remove-listing
  -p,  --page-requisites            also fetch all images etc. needed to display HTML
       --strict-comments            handle HTML comments strictly

Recursive accept/reject:
  -A,  --accept=LIST                comma-separated list of extensions to download
  -R,  --reject=LIST                comma-separated list of extensions not to download
       --accept-regex=REGEX         regex for URLs to accept
       --reject-regex=REGEX         regex for URLs to reject
       --regex-type=TYPE            regex type (posix)
  -D,  --domains=LIST               comma-separated list of domains to download
       --exclude-domains=LIST       comma-separated list of domains not to download
       --follow-ftp                 also follow FTP links in HTML documents
       --follow-tags=LIST           comma-separated list of tag names to follow
       --ignore-tags=LIST           comma-separated list of tag names not to follow
  -H,  --span-hosts                 also download from other hosts when recursing
  -L,  --relative                   follow relative links only
  -I,  --include-directories=LIST   directories to include
       --trust-server-names         use the last part of the redirect target URL as the file name
  -X,  --exclude-directories=LIST   directories to exclude
  -np, --no-parent                  don't retrieve the parent directory

Send bug reports, questions, and discussions to <bug-wget@gnu.org>
or file them at https://savannah.gnu.org/bugs/?func=additem&group=wget.
Once the number of target sites grew to 10 or 20 it got tedious, so I quickly wrote some code.
geo_downloader.rb
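require 'fileutils'

class GeoDownloader
  # Same options as the command above; --output-file is added per site in #execute
  WGET_OPTIONS = %w[
    --verbose --tries=2 --timestamping --no-if-modified-since
    --random-wait --waitretry=5 --adjust-extension
    --referer=http://www.geocities.co.jp/
    --recursive --level=5 --convert-links --backup-converted --page-requisites
  ].freeze

  def initialize(uri, filename_based_on_uri)
    @uri = uri
    @filename_based_on_uri = filename_based_on_uri
  end

  def execute
    # mkdir_p doesn't fail when the directory already exists (e.g. on a re-run)
    FileUtils.mkdir_p(@filename_based_on_uri)
    # An argument list bypasses the shell, so no quoting is needed;
    # chdir: runs wget inside the new directory, like the original cd
    system('wget', "--output-file=#{@filename_based_on_uri}.log", *WGET_OPTIONS, @uri,
           chdir: @filename_based_on_uri)
    puts "#{@uri}: Done! (wget exit status: #{$?.exitstatus})"
  end
end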
app.rb
require_relative 'geo_downloader'

GeoDownloader.new('http://www.geocities.co.jp/FOO-BAR/12345/', 'FOO-BAR_12345').execute
Running it with the example above creates FOO-BAR_12345 under the directory you ran it from, then runs wget inside that directory. The wget target is http://www.geocities.co.jp/FOO-BAR/12345/.
I think you can just pack the constructor arguments into an array or a hash and loop over them.
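For example, a hypothetical dirname_for helper can derive the directory name (FOO-BAR_12345) from the URL, so that only a list of URLs needs maintaining. The second URL below is a made-up placeholder, and the GeoDownloader call is left as a comment so the sketch runs on its own.

```ruby
require "uri"

# Hypothetical helper: "http://www.geocities.co.jp/FOO-BAR/12345/" -> "FOO-BAR_12345"
def dirname_for(url)
  URI(url).path.split("/").reject(&:empty?).join("_")
end

# The second URL is a made-up placeholder.
TARGETS = %w[
  http://www.geocities.co.jp/FOO-BAR/12345/
  http://www.geocities.co.jp/FOO-BAR/67890/
].freeze

TARGETS.each do |url|
  puts "#{url} -> #{dirname_for(url)}"
  # With geo_downloader.rb loaded, run instead:
  # GeoDownloader.new(url, dirname_for(url)).execute
end
```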