HTML::Feature - A Module for Extracting the Important Part -
I have had a module on CPAN for some time, but there was no Japanese-language documentation explaining it, and I recently gave it a major brush-up, so I am taking the opportunity to write an introductory article.
HTML::Feature - Extract Feature Sentences From HTML Documents
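It is pronounced "H-T-M-L Feature".
It is a Perl module that guesses and extracts the "important part" from all kinds of HTML documents, such as blogs and news articles.
The "important part" is what is usually called the "main text". People use various names for this, such as body extraction or focus extraction, but in short it means guessing which part of the page is characteristic and extracting it.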
What It Does
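Suppose, for example, that you want to cut out only the "article-like part" of a blog post, leaving out the header, the footer, and the other navigation blocks.
The first idea that comes to mind is to use a regular expression to pull out everything sandwiched between specific comment tags. Of course, there is no guarantee that the comment tags you want are present in HTML documents out in the world. In fact, it almost never happens that the main text is conveniently wrapped in the comment tags you were hoping for.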
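As a rough illustration of that first idea, here is a minimal sketch in plain Perl. The marker names `entry-start` and `entry-end`, the sample HTML, and the helper `extract_between_markers` are all made up for this example. Nothing here is part of HTML::Feature.

```perl
use strict;
use warnings;

# Hypothetical markers; real pages rarely carry anything like them.
my $html = <<'HTML';
<html><body>
<div id="header">Site name</div>
<!-- entry-start -->
<p>This is the article body.</p>
<!-- entry-end -->
<div id="footer">Copyright</div>
</body></html>
HTML

# Capture everything between the two comments (/s lets "." match newlines).
# Returns undef when the markers are missing.
sub extract_between_markers {
    my ($doc) = @_;
    if ($doc =~ /<!--\s*entry-start\s*-->(.*?)<!--\s*entry-end\s*-->/s) {
        return $1;
    }
    return undef;
}

my $body = extract_between_markers($html);
print defined $body ? $body : "no markers found\n";
```

The weakness is visible in the last line: as soon as a page lacks the markers, the function has nothing to return.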
Trying a little harder, you might think of running morphological analysis, computing word weights with TF/IDF, and treating the text blocks that contain important keywords as candidates. But there is a pitfall here too. The kinds of HTML documents you can analyze are determined by the language the morphological analyzer was built for. In other words, if you do morphological analysis for Japanese, foreign-language sites fall out of scope. You could say that is an obvious constraint and leave it at that, but if you also want to cover foreign-language sites, this approach gets rather painful.
HTML::Feature has none of these constraints.
It works by analyzing the "structure" of the HTML and guessing which part looks important.
I will describe the logic later, but HTML::Feature can extract the important part from sites in any language, and without any prior effort such as planting specific comment tags in the page.
Usage
It looks like this.
    use HTML::Feature;

    my $feature = HTML::Feature->new;
    my $result  = $feature->parse("http://hogehoge.com");
    print $result->text();
This extracts the important part of hogehoge.com. You can pass parse() a URL, the HTML document itself, or an HTTP::Response object.
There are various other ways to call it, so see CPAN for details.
http://search.cpan.org/~miki/HTML-Feature-2.0.2/
Use Cases
One use is as a filter applied to HTML documents gathered by a crawler before they are stuffed into a database.
Used in a search engine or a semantic analysis system, it should noticeably improve analysis accuracy. Probably.
How It Works
Internally, HTML::Feature uses HTML::TreeBuilder to turn the HTML into a tree, then walks every node and scores each one with its own logic. The node that ends up with the highest score is taken to be the "important part".
As for the all-important scoring logic, it is actually very simple.
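- Compare the byte counts of the tags and the text: the higher the proportion of text, the more likely the node is the important part
- The deeper a node sits in the HTML tag hierarchy, the more likely it is the important part
- The earlier a node appears in the HTML document, the more likely it is the important part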
Scoring each node on these three items brings the important part to the surface surprisingly easily.
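The three criteria can be sketched roughly as follows. This is not HTML::Feature's actual formula: the way the three factors are combined, the weights 0.1 and 0.01, and the helper names `subtree_bytes` and `score_nodes` are all assumptions made for illustration. To keep the example self-contained, a hand-built hash stands in for the tree that HTML::TreeBuilder would produce.

```perl
use strict;
use warnings;

# Hand-built stand-in for a parsed HTML tree (hypothetical structure).
my $tree = {
    tag => 'html', id => 'root', text => '',
    children => [
        { tag => 'div', id => 'header', text => '', children => [
            { tag => 'a', id => 'nav-home',  text => 'Home',  children => [] },
            { tag => 'a', id => 'nav-about', text => 'About', children => [] },
        ] },
        { tag => 'div', id => 'main', text => '', children => [
            { tag => 'div', id => 'entry', children => [],
              text => 'This is the article body. It is mostly plain text with very little markup.' },
        ] },
        { tag => 'div', id => 'footer', text => 'Copyright', children => [] },
    ],
};

# Return (text_bytes, markup_bytes) for the subtree rooted at $node.
# The sample is ASCII, so length() equals the byte count here.
sub subtree_bytes {
    my ($node) = @_;
    my $text   = length $node->{text};
    my $markup = 2 * length($node->{tag}) + 5;    # "<tag>" plus "</tag>"
    for my $child (@{ $node->{children} }) {
        my ($t, $m) = subtree_bytes($child);
        $text   += $t;
        $markup += $m;
    }
    return ($text, $markup);
}

# Walk the tree in document order, score every node, and return the
# nodes sorted from highest to lowest score.
sub score_nodes {
    my ($root) = @_;
    my @scored;
    my $order = 0;    # position in document order
    my $walk;
    $walk = sub {
        my ($node, $depth) = @_;
        my ($text, $markup) = subtree_bytes($node);
        my $ratio = $text / ($text + $markup);    # criterion 1: text ratio
        # Assumed combination: reward depth (criterion 2) and
        # penalize late appearance (criterion 3).
        my $score = $ratio * (1 + 0.1 * $depth) / (1 + 0.01 * $order);
        push @scored, { id => $node->{id}, score => $score };
        $order++;
        $walk->($_, $depth + 1) for @{ $node->{children} };
    };
    $walk->($root, 0);
    undef $walk;      # break the closure's reference to itself
    my @sorted = sort { $b->{score} <=> $a->{score} } @scored;
    return @sorted;
}

my @ranked = score_nodes($tree);
printf "%-10s %.3f\n", $_->{id}, $_->{score} for @ranked;
```

With this sample tree and these weights, the `entry` node ranks first, ahead of its wrapper `main` and well ahead of the link-heavy `header`.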
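Most readers are probably thinking, "Wait, you can extract the important part with just that?"
In practice, HTML::Feature's extraction accuracy is around 70 to 80 percent.
It does well on documents with a clearly defined body, such as blogs and news articles, and poorly on pages like a portal site's top page, where there is little article content and almost everything is links.
So it is a bit hit or miss, like fortune-telling.
If you are interested, please give it a try.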
Special Thanks to Daisuke Maki !!