I hadn't read it closely and had let it slide, but for a case like the one at http://wota.jp/ac/?date=20070115#p01, where you apply CSS selectors to one specific page, I think Hpricot is exactly the right tool. The strength of ScrAPI, as I see it, is that it makes it easier to write clear parser classes, so its API should pay off when you want to crawl a large number of pages and accumulate data in a fixed format. With that in mind, I'll try doing the same thing as the page above with Hpricot.

    require 'kconv'
    #=> true
    require 'open-uri'
    #=> true
    require 'hpricot' # the examples below assume version 0.5 or later
    #=> true
    $KCODE = 'u'
    #=> "u"
    maiha = Hpricot.p
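Hpricot feels easier to use than scrAPI, but innerText doesn't convert HTML entities back properly, so I wrote a different method.

    require "rubygems"
    require 'cgi' # needed for CGI.unescapeHTML
    require 'hpricot'

    class Hpricot::Elem
      # Return the attribute value with HTML entities decoded.
      def [](a)
        CGI.unescapeHTML(get_attribute(a))
      end

      # Collect every text node under this element: CDATA content is kept
      # as-is; other text has its whitespace collapsed and entities decoded.
      def to_text
        r = []
        traverse_text { |text|
          case text
          when Hpricot::CData
            r << text.content
          else
            r << CGI.unescapeHTML(text.inner_text.gsub("\n", " ").gsub(/ +/, " ").strip)
          end
        }
        r.join
      end
    end

    hp = Hpricot('<html><bog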
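The text clean-up inside to_text can be tried without Hpricot. Below is a minimal sketch using only Ruby's standard cgi library; the helper name normalize_text and the sample strings are made up for illustration and are not part of Hpricot or of the original post.

```ruby
require 'cgi'

# Hypothetical helper (not an Hpricot method): applies the same steps as the
# non-CDATA branch of to_text to a plain string.
def normalize_text(raw)
  # Turn newlines into spaces, collapse runs of spaces, trim both ends.
  collapsed = raw.gsub("\n", " ").gsub(/ +/, " ").strip
  # Decode entities such as &amp;, &lt; and &gt; back to literal characters.
  CGI.unescapeHTML(collapsed)
end

# Made-up sample fragments standing in for the text nodes of a parsed page.
fragments = ["Tom &amp; Jerry\n", "  &lt;b&gt;bold&lt;/b&gt;  "]

# As in to_text, the cleaned fragments are joined with no separator.
puts fragments.map { |f| normalize_text(f) }.join
```

Because each fragment is stripped before joining, words from neighbouring text nodes end up directly adjacent; if that matters for your pages, joining with a space instead would be one option.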