ãæè¿ã®ç¹å¾´èªæ½åºã®ææ³ã£ã¦ã©ããªãããªã¨æã£ã¦ãã¯ã¦ãªã®æ³¨ç®ã®ã¨ã³ããªã¼ãè¦ã¦ã¿ã¾ããã
タグ「特徴語抽出」を含む注目エントリー
ãé¢ç½ããã®ãç®ç½æ¼ãã§ãã
ç¹ã«Gigazinizeã¨ããã®ã¯ãç§é¸ã§ããé¢ç½ããªããã
http://blog.fulltext-search.biz/articles/2007/09/03/gigazinize
Gigazinize のなかみ
ãç§ãä½ãä½ã£ã¦ã¿ãããªãã¾ãããæ¸ç±ãã¼ã¿ããWikipediaããã®ãã¼ã¿ã¯è²ã ããã®ã§ããã®ããããçµã¿åããã¦ã¿ãããã¨æãã¾ãããã®åã«ç¹å¾´èªãæ½åºããçºã®ãã¼ã«ã§ããï¼ã ãã¶æã«ä½ã£ããã®ã§ãããï¼ãã¸ãã¯ã¯ããã¤ããããç´¹ä»ãã¦ããä¸è¨ã®æ¹æ³ã§ãã
形態素解析と検索APIとTF-IDFでキーワード抽出
ãå½¢æ
ç´ è§£æã¨Yahoo APIã使ã£ã¦ç¹å¾´èªãæ½åºããæ¹æ³ã§ãããã£ããã¨ä½ãã¦ããããªãã«ä½¿ãã¾ãããã ãã¤ã©Webæ¤ç´¢ã«è¡ã£ã¦ããã®ã§ãããªãã«é
ãã§ããè¦ã¯æç« ã®æ¯éå£ãããã°ããã®ã§ãYahoo APIã®é¨åãWikiPediaããä½ã£ãã³ã¼ãã¹çã«ç½®ãæãããããªãæ©ããªãã¨æãã¾ãã
ä¸ã¯Yahoo APIã使ã£ãå ´åã®ãµã³ãã«ã§ãã
use LWP::Simple; use MeCab; my $file = shift @ARGV; my $population = 19200000000; my $hit; my $content; my @feature; my %result; $content = readFile($file); @feature = &parseText($content); %result = countFeature(@feature); foreach $key (keys %result) { $hit = &get_num($key); $tfidf = calcTFIDF($result{$key},$hit,$population); print "key=$key,hit=$hit,tfidf=$tfidf\n"; } sub parseText { my $content = shift @_; my $m = new MeCab::Tagger ("-Ochasen"); my @feature = (); for (my $n = $m->parseToNode ($content); $n; $n = $n->{next}) { if ($n->{feature} =~ /åè©/) { push @feature,$n->{surface}; } } return @feature; } sub countFeature { my @feature = sort(@_); my $str=""; my $prev=""; my $cnt=0; my %feature; foreach $str (@feature) { if ( defined $feature{$str} ) { $feature{$str}++; } else { $feature{$str}=1; } } return %feature; } sub readFile { my $file = shift @_; open FH, "<$file"; my $content = join '', <FH>; close FH; return $content; } sub get_num { # æ¤ç´¢ãããæ°ç²å¾ by Yahoo! API my ($key) = @_; # UTF-8 $key =~ s/([^0-9A-Za-z_ ])/'%'.unpack('H2',$1)/ge; my $url = "http://api.search.yahoo.com/WebSearchService/V1/". "webSearch?appid=YahooDemo&query=$key&results=1"; my $c; ($c = get($url)) or die "Can't get $url\n"; my ($num) = ($c =~ /totalResultsAvailable="(\d+)"/); return $num; } sub calcTFIDF { my $tf = shift @_; my $df = shift @_; my $n = shift @_; my $tfidf; eval { $tfidf = $tf*log($n/$df); }; return $tfidf; }
次ã¯ãã³ã¼ãã¹é¨åãå¤æ´ãããã¨æãã¾ãã
Yahoo APIããWikipedia