Lifehack blog by the representative of Dotinstall
Continuing from "Automatic categorization of web pages". As I wrote last time, extracting the main text of a web page is one of the keys to the web page categorization done in Pathtraq. This time I am releasing that main-text extraction module and giving a rough explanation of the technique it uses. Using the module is extremely simple: require it and pass the HTML you want to analyze to the analyse method. The character encoding is UTF-8. [Addendum] I forgot to mention something important. The module has been tested on Ruby 1.8.5, but it does nothing special, so it should work on any 1.8.x.
$KCODE="u" # character encoding is utf-8
require 'extractcontent.rb'
# specify option values
opt = {:waste_expressions => /お問い合わせ|会社概要/}
ExtractCont
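A debate about the basic nature of the next web has surfaced in the blogosphere. Tim O'Reilly has signaled his criticism of the fad of attaching a version number to everything and calling it "Web 3.0", but in any case there is currently no consensus on what comes next. We believe that what comes next is not a single thing but will be characterized by several major themes. The new advances on the web include semantics, attention (unconscious behavior), and personalization. Whatever the next web ends up being called, the information on it will be more meaningful, more automatic, and more tailored and understandable to each individual. An indispensable element in the evolution of the next web is the incorporation of structured information. This concept is taken for granted by us humans, yet the fact that it is not so for computers has been completely overlooked. When a person looks at a book on Amazon ...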
Nvu - The Complete Web Authoring System for Linux, Macintosh and Windows. Finally! A complete Web Authoring System for Linux desktop users as well as Microsoft Windows and Macintosh users to rival programs like FrontPage and Dreamweaver. Nvu is a free, Dreamweaver-style HTML editor. It supports WYSIWYG editing and has tabs for HTML Tags, Source, and Preview. It also offers a CSS editor, integration with an HTML validator, and support for adding custom extensions. If you have been looking for an easy-to-use HTML editor, this software is well worth a try.
Data sent from the browser is encoded in a special format before it is transmitted. Besides entering values through the form controls shown on screen, there are several other ways to send data, such as sending hidden data or submitting without using a form at all.
Contents: Sending data set by the author; Data submission and URL encoding; Sending mail directly from a form; Encoding types and file submission; The GET and POST methods; Sending data directly with GET
Sending data set by the author
The "controls" provided by the input element and similar elements exist so that the user can operate them to enter data. In some cases, however, the author of the HTML may want to send author-specified data to the program. The user does not need to operate on such data, and it is more convenient if the user never sees it. For this purpose, as one of the types of the input element ...
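As a rough illustration of the "special format" mentioned above, the sketch below encodes a few form fields as application/x-www-form-urlencoded using Ruby's standard `uri` library. The field names, their values, and the `example.com` URL are assumptions made for this example only; they do not come from the original article.

```ruby
require "uri"

# Hypothetical form fields: one typed by the user ("q") and one preset by the
# page author, as a hidden control would be ("page").
fields = { "q" => "ruby html", "page" => "2" }

# Encode as application/x-www-form-urlencoded:
# spaces become "+", and name=value pairs are joined with "&".
query = URI.encode_www_form(fields)
puts query

# Sending data directly with GET amounts to appending the query to a URL.
# example.com is a placeholder host.
url = "http://example.com/search?#{query}"
puts url

# The receiving program reverses the encoding to recover the pairs.
decoded = URI.decode_www_form(query)
p decoded
```

Because the encoded data is an ordinary string, a link or script can build such a URL without any visible form.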
This is a wiki for collecting information on web scraping libraries for Ruby.
Hpricot: a library for handling HTML in a Ruby-like way
Mechanize: a library for accessing websites automatically
scRUBYt!: a library that makes scraping easy by means of a DSL
feedalizer: a library that helps build RSS feeds from HTML
scrAPI: a library that parses HTML by defining parsers
Web scraping means extracting the data you need from a website (scrape = to scrape off). Some libraries handle not only parsing the received data but also sending data.
Examples: scraping the HTML of a website that does not publish RSS to build an unofficial RSS feed; scraping Google search results to write a script that searches Google automatically; parsing a blog's posting page and, from the command line ...
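To make the idea of "extracting the data you need" concrete, here is a minimal sketch that pulls link URLs and titles out of an HTML string using only core Ruby. It deliberately uses none of the libraries listed above and does not show their APIs. The HTML fragment is a made-up stand-in for a fetched page, and a regular expression like this is fragile against real-world markup, which is the gap those libraries fill.

```ruby
# Hypothetical HTML fragment standing in for a page fetched from a site.
html = <<~HTML
  <ul>
    <li><a href="/entry/1">First post</a></li>
    <li><a href="/entry/2">Second post</a></li>
  </ul>
HTML

# Naive extraction: capture the href attribute and the link text of each anchor.
# This assumes href is the first attribute and is double-quoted.
links = html.scan(/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/m).map do |href, text|
  { url: href, title: text.strip }
end

# The extracted pairs could then feed an RSS builder or any other output.
links.each { |link| puts "#{link[:title]} -> #{link[:url]}" }
```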
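I had a little spare time, so I made a tool that converts an image into HTML (ASCII art?). To use it, enter the URL of the image you want to convert in "Image URL" and a suitable single character in "Character". Then click the "Convert to HTML" button, and the image is displayed as HTML drawn with that character. A logo image is set as the default input, so for now just try pressing the button as is. I think it can be useful for people in unusual situations, such as an environment that simply cannot display images, or a need to force HTML that includes images into a single HTML file. Form fields: Image URL; Character; Size (pixels); Reduction (%); Nearest neighbor / Bilinear; Display: Image / Source.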
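The entry does not show how the tool works internally, so the following is only one plausible sketch of the idea: each pixel is rendered as the chosen character inside a colored `<span>`. The pixel data, the `pixels_to_html` helper, and the choice of `<pre>` plus inline `color` styles are all assumptions for illustration. Image decoding and the reduction step (nearest neighbor or bilinear) are left out.

```ruby
# Characters that must be escaped so the chosen character cannot break the markup.
ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical 2x2 image: rows of [r, g, b] pixels (no real image is decoded here).
pixels = [
  [[255, 0, 0], [0, 255, 0]],
  [[0, 0, 255], [255, 255, 255]]
]

# Render every pixel as `char` wrapped in a <span> carrying the pixel's color.
def pixels_to_html(pixels, char)
  safe = char.gsub(/[&<>"]/, ESCAPES)
  rows = pixels.map do |row|
    row.map do |r, g, b|
      format('<span style="color:#%02x%02x%02x">%s</span>', r, g, b, safe)
    end.join
  end
  "<pre>\n#{rows.join("\n")}\n</pre>"
end

html = pixels_to_html(pixels, "#")
puts html
```

The result is a single self-contained HTML fragment with no image reference, which matches the use case described above of fitting everything into one HTML file.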