I made a handy tool that lets even a cat generate a Ruby text classifier library
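I made a handy tool that automatically generates a Ruby text classifier library with a single command, without you having to worry about the finer details.
After a fair amount of wandering around along the way, it can now be installed with
gem install nekoneko_gen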
What it does is a bit hard to grasp in the abstract, so let me explain with an example.
Generating a library that tells which 2channel thread a post came from
As an example, let's use data posted to 2channel to generate a library that decides which thread a post (a "res", i.e. a reply) belongs to.
Preparation
First, install it with
gem install nekoneko_gen
It runs on both Ruby 1.8.7 and 1.9.2. Version 1.9.2 is about five times faster, so 1.9.2 or later is recommended.
This article assumes Ubuntu, but the tool also works on Windows (tested on Windows XP with ruby 1.9.3p0).
I have prepared the data, so create a directory named data and download the data into it with the following commands:
% mkdir data
% cd data
% wget -i http://www.udp.jp/misc/2ch_data/
% cd ..
Several files are downloaded. For now I want a two-way choice between the Dragon Quest question thread and the Love Plus question thread, so I'll use the following files.
Using these, we'll generate a library that decides whether an input sentence is a post from the Dragon Quest question thread or from the Love Plus question thread.
- dragon_quest.txt
  - Data from the Dragon Quest "ask anything" question thread (about 30,000 posts)
- dragon_quest_test.txt
  - 500 posts taken out of dragon_quest.txt for testing (not included in dragon_quest.txt)
- dragon_quest_test2.txt
  - dragon_quest_test.txt with two posts per line
- loveplus.txt
  - Data from the Love Plus question thread (about 25,000 posts)
- loveplus_test.txt
  - 500 posts taken out of loveplus.txt for testing
- loveplus_test2.txt
  - loveplus_test.txt with two posts per line
The input format is one item per line. For this data, the newlines inside each post have been removed so that each line holds exactly one post.
The only cleanup applied was removing anchors (links such as >>1). As a result, the data still contains posts like "thanks", "I'm gonna die" and "yeah, right", which no classifier could ever place. Trolls also appear out of nowhere and keep posting junk unrelated to the thread.
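The cleanup described above (one post per line, reply anchors removed) might look roughly like the sketch below. This is a hypothetical helper, not the script that actually built the data. Two details are assumptions: the anchor pattern, which covers only ASCII forms like >>1, >>12-15 and >>1,3, and the choice to replace newlines with a space instead of deleting them.

```ruby
# Hypothetical cleanup helper: turn one raw post into one training line.
# Assumption: anchors look like ">>1", ">>12-15" or ">>1,3" (ASCII only).
ANCHOR = />>\d+(?:[-,]\d+)*/

def normalize_post(text)
  s = text.gsub(ANCHOR, "")     # drop reply anchors
  s = s.gsub(/\r?\n/, " ")      # assumption: newlines inside a post become spaces
  s.squeeze(" ").strip          # tidy up leftover whitespace
end

puts normalize_post(">>1\nthanks\nthat helped")   # prints: thanks that helped
```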
*_test.txt and *_test2.txt are for checking the generated library. *_test.txt is used to count how many posts it classifies correctly. *_test2.txt combines two posts from *_test.txt into a single item. I suspected that 2channel posts are often too short to classify, so these files check whether two posts are enough.
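A file in the style of *_test2.txt could be produced roughly as shown below. This is a sketch under assumptions the article does not confirm: consecutive posts are paired, each pair is joined with a single space, and a leftover odd post is dropped. The script name in the usage comment is hypothetical.

```ruby
# Sketch: build a "two posts per line" file from a one-post-per-line file.
# Assumptions: consecutive posts are paired, joined with a space,
# and a leftover odd post is dropped.
def pair_posts(lines)
  pairs = lines.map { |l| l.chomp }.each_slice(2).select { |pair| pair.size == 2 }
  pairs.map { |pair| pair.join(" ") }
end

# Usage (hypothetical script name): ruby pair_posts.rb IN.txt OUT.txt
if __FILE__ == $0 && ARGV.size == 2
  src, dst = ARGV
  File.open(dst, "w") do |out|
    pair_posts(File.readlines(src)).each { |line| out.puts(line) }
  end
end
```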
Generating the library
% nekoneko_gen -n game_thread_classifier data/dragon_quest.txt data/loveplus.txt
The library is generated with the nekoneko_gen command.
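-n specifies the name of the classifier to generate. With ".rb" appended, it becomes the file name, and in CamelCase (GameThreadClassifier) it becomes the module name. If you want the output in a particular directory, you can pass a file path directly instead.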
After that, list one training file for each category you want to classify into. At least two files are required; beyond that, you can list as many as you like.
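The naming rules above can be illustrated with a few hypothetical helpers. The first two follow the rules as described: ".rb" is appended for the file name, and CamelCase gives the module name. The third covers the label names, which are the data file names upper-cased. These helpers are not nekoneko_gen's code. The exact form of a directly specified file path is also an assumption.

```ruby
# Hypothetical helpers mirroring the naming rules described above;
# nekoneko_gen's own implementation may differ.

# "game_thread_classifier" (or an assumed path such as
# "lib/game_thread_classifier.rb") -> "GameThreadClassifier"
def module_name(name)
  File.basename(name, ".rb").split("_").map(&:capitalize).join
end

# "game_thread_classifier" -> "game_thread_classifier.rb"
def output_file(name)
  name.end_with?(".rb") ? name : "#{name}.rb"
end

# "data/dragon_quest.txt" -> "DRAGON_QUEST"
def label_constant(data_file)
  File.basename(data_file, ".*").upcase
end

puts module_name("game_thread_classifier")    # prints GameThreadClassifier
puts output_file("game_thread_classifier")    # prints game_thread_classifier.rb
puts label_constant("data/dragon_quest.txt")  # prints DRAGON_QUEST
```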
This takes a little while (about two minutes), so be patient.
% nekoneko_gen -n game_thread_classifier data/dragon_quest.txt data/loveplus.txt
loading data/dragon_quest.txt... 35.5426s
loading data/loveplus.txt... 36.0522s
step 0... 0.879858, 3.7805s
step 1... 0.919624, 2.2018s
step 2... 0.932147, 2.1174s
step 3... 0.940959, 2.0569s
step 4... 0.946985, 1.8876s
step 5... 0.950891, 1.8564s
step 6... 0.953541, 1.8398s
step 7... 0.955464, 1.8204s
step 8... 0.957427, 1.8008s
step 9... 0.959056, 1.7912s
step 10... 0.961098, 1.8027s
step 11... 0.961745, 1.7716s
step 12... 0.962943, 1.7633s
step 13... 0.963610, 1.7477s
step 14... 0.964611, 1.6216s
step 15... 0.965259, 1.7291s
step 16... 0.965730, 1.7271s
step 17... 0.966613, 1.7225s
step 18... 0.967241, 1.5861s
step 19... 0.967712, 1.7113s
DRAGON_QUEST, LOVEPLUS : 71573 features
done nyan!
Done. The Ruby code has been written to a file named after the -n value.
% ls -la
...
-rw-r--r-- 1 ore users 2555555 2012-05-28 08:10 game_thread_classifier.rb
...
It's about 2.5 MB, which is fairly large.
This file defines a module named GameThreadClassifier (the name given to -n, in CamelCase), which has a method self.predict(text). If you pass it a string, it returns either GameThreadClassifier::DRAGON_QUEST or GameThreadClassifier::LOVEPLUS as the prediction. The constant names are the names of the data files given on the command line, upper-cased.
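Here is a hypothetical stand-in with the same shape, to make that interface concrete. It has predict, k, LABELS and one constant per class, which are what console.rb and test.rb below rely on. A few things are assumptions: the label numbering (0 for the first data file), LABELS being an Array, and the keyword rule, which is a toy and not the trained model. A file like this could be saved as game_thread_classifier.rb in a scratch directory to try those scripts before training.

```ruby
# coding: utf-8
# Hypothetical stand-in for the generated game_thread_classifier.rb.
# Only the interface is mimicked; numbering from 0 and LABELS being an
# Array are assumptions, and the keyword rule replaces the trained model.
module GameThreadClassifier
  DRAGON_QUEST = 0
  LOVEPLUS = 1
  LABELS = ["DRAGON_QUEST", "LOVEPLUS"]   # label number -> label name

  # Number of classes.
  def self.k
    LABELS.size
  end

  # Returns a label number. Toy rule: "彼女" (girlfriend) means LOVEPLUS.
  def self.predict(text)
    text.include?("彼女") ? LOVEPLUS : DRAGON_QUEST
  end
end

puts GameThreadClassifier::LABELS[GameThreadClassifier.predict("スライム")]  # prints DRAGON_QUEST
```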
試ãã¦ã¿ã
Let's use the generated library.
注æã¨ãã¦ãRuby 1.8.7ã®å ´åã¯ã$KCODEã'u'ã«ãã¦ãããªãã¨åãã¾ããããã¨å
¥åã®æåã³ã¼ããutf-8ã®ã¿ã§ãã
# coding: utf-8
if (RUBY_VERSION < '1.9.0')
  $KCODE = 'u'         # Ruby 1.8.7 needs this for UTF-8 input
end
require './game_thread_classifier'

$stdout.sync = true
loop do
  print "> "
  line = $stdin.gets
  break if line.nil?   # Ctrl+D (end of input) exits cleanly
  label = GameThreadClassifier.predict(line)
  puts "That's a #{GameThreadClassifier::LABELS[label]} topic!!!"
end
Save this code as console.rb.
GameThreadClassifier.predict returns the label number of the predicted class.
GameThreadClassifier::LABELS holds the label name for each label number, so the script prints that name.
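% ruby console.rb
> 彼女からメールが来た
That's a LOVEPLUS topic!!!
> 日曜日はデートしてました
That's a LOVEPLUS topic!!!
> 金欲しい
That's a DRAGON_QUEST topic!!!
> 王様になりたい
That's a DRAGON_QUEST topic!!!
> スライム
That's a DRAGON_QUEST topic!!!
> スライムを彼女にプレゼント
That's a LOVEPLUS topic!!!
>
The inputs mean "I got an email from my girlfriend", "I was on a date on Sunday", "I want gold", "I want to be king", "Slime" and "Giving a slime to my girlfriend as a present". It looks like it works. Quit with Ctrl+D or Ctrl+C.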
Measuring accuracy
Let's check what percentage of *_test.txt and *_test2.txt it gets right.
if (RUBY_VERSION < '1.9.0')
  $KCODE = 'u'
end
require './game_thread_classifier'

# One counter per class (GameThreadClassifier.k classes).
labels = Array.new(GameThreadClassifier.k, 0)
file = ARGV.shift
File.open(file) do |f|
  until f.eof?
    l = f.readline.chomp
    label = GameThreadClassifier.predict(l)
    labels[label] += 1   # tally the predicted label number
  end
end
# Print each label's share of all predictions.
count = labels.reduce(:+)
labels.each_with_index do |c, i|
  printf "%16s: %f\n", GameThreadClassifier::LABELS[i], c.to_f / count.to_f
end
Save this as test.rb. It simply passes each line of the file given as an argument to predict, counts how often each label number is predicted, and prints each class's share of the total.
GameThreadClassifier.k returns the number of classes (here 2: DRAGON_QUEST and LOVEPLUS).
% ruby test.rb data/dragon_quest_test.txt
    DRAGON_QUEST: 0.932000
        LOVEPLUS: 0.068000
data/dragon_quest_test.txt contains only data from the Dragon Quest question thread, so if every prediction were correct, the result would be DRAGON_QUEST: 1.0.
The result is DRAGON_QUEST: 0.932000, so 93.2% were classified correctly and 6.8% were mistaken for Love Plus.
Let's try all of the test files the same way.
% ruby test.rb data/dragon_quest_test.txt
    DRAGON_QUEST: 0.932000
        LOVEPLUS: 0.068000
% ruby test.rb data/loveplus_test.txt
    DRAGON_QUEST: 0.124000
        LOVEPLUS: 0.876000
%
% ruby test.rb data/dragon_quest_test2.txt
    DRAGON_QUEST: 0.988000
        LOVEPLUS: 0.012000
% ruby test.rb data/loveplus_test2.txt
    DRAGON_QUEST: 0.012048
        LOVEPLUS: 0.987952
Love Plus does a little worse at about 87.6%, but averaged over both files, about 90% are classified correctly.
With two posts per item, both files reach 98.8%, so accuracy rises above 98%. Given two posts, it seems able to tell a Dragon Quest thread from a Love Plus thread with hardly any mistakes.
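The averages quoted above are the plain mean of each file's share for its own label. For example, 0.932 corresponds to 466 of the 500 lines in dragon_quest_test.txt. A small sketch of that arithmetic, using the numbers printed by test.rb:

```ruby
# Per-file accuracy = share of lines given the file's own label (from test.rb).
one_post  = { "DRAGON_QUEST" => 0.932, "LOVEPLUS" => 0.876 }
two_posts = { "DRAGON_QUEST" => 0.988, "LOVEPLUS" => 0.987952 }

# Unweighted (macro) average over the test files.
def mean(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

printf "1 post : %.4f\n", mean(one_post.values)    # prints 1 post : 0.9040
printf "2 posts: %.4f\n", mean(two_posts.values)   # prints 2 posts: 0.9880
```

Both test files have roughly the same number of lines, so a weighted average would give nearly the same figures.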
Summary
If you've read this far, you should have a good idea of what the tool is.
It trains on the data files you provide and generates a Ruby library that decides which data file a given string most resembles.
Apart from Ruby's standard library, the generated library depends on json and bimyou_segmenter:
gem install json bimyou_segmenter
In environments where C extensions can't be used, install
gem install json_pure bimyou_segmenter
instead. This lets the generated library run in a wide range of environments.
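One quick way to check whether a machine has what the generated library needs is to try requiring each dependency. This is a small sketch, and the missing_libraries helper is hypothetical. json_pure is intended as a pure-Ruby drop-in for json, so `require 'json'` should succeed with either gem installed.

```ruby
# Sketch: which of the generated library's runtime dependencies can be loaded?
# missing_libraries is a hypothetical helper, not part of nekoneko_gen.
require 'rubygems' if RUBY_VERSION < '1.9.0'   # 1.8.7 needs rubygems loaded explicitly

def missing_libraries(names)
  names.reject do |name|
    begin
      require name
      true                 # loaded (or already loaded)
    rescue LoadError
      false                # not installed
    end
  end
end

missing = missing_libraries(%w[json bimyou_segmenter])
if missing.empty?
  puts "all dependencies found"
else
  puts "missing: #{missing.join(', ')}"
end
```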
Incidentally, bimyou_segmenter, whose name already makes it sound dubious ("bimyou" means "iffy"), is a Japanese word segmentation library that I generated automatically in a similar way.
More experiments!!
I've also prepared skyrim.txt (the Skyrim question thread) and mhf.txt (the Monster Hunter Frontier Online question thread), so you can train on these as well.
% nekoneko_gen -n game_thread_classifier data/dragon_quest.txt data/loveplus.txt data/skyrim.txt data/mhf.txt
All you do is list more files.
The generated code simply has more possible results, so the console.rb and test.rb scripts from above work unchanged.
% nekoneko_gen -n game_thread_classifier data/dragon_quest.txt data/loveplus.txt data/skyrim.txt data/mhf.txt
loading data/dragon_quest.txt... 35.4695s
loading data/loveplus.txt... 36.5006s
loading data/skyrim.txt... 148.8504s
loading data/mhf.txt... 94.2842s
step 0... 0.885344, 29.5712s
step 1... 0.918844, 24.0811s
step 2... 0.927274, 22.0760s
step 3... 0.932804, 20.7306s
step 4... 0.936590, 20.4044s
step 5... 0.939495, 19.2658s
step 6... 0.942164, 19.1920s
step 7... 0.943754, 19.1084s
step 8... 0.945903, 18.9361s
step 9... 0.948293, 18.8840s
step 10... 0.949483, 18.1423s
step 11... 0.950827, 18.6365s
step 12... 0.951693, 18.2945s
step 13... 0.952915, 18.0946s
step 14... 0.953600, 17.9010s
step 15... 0.954284, 17.8173s
step 16... 0.955062, 17.7265s
step 17... 0.956281, 17.0873s
step 18... 0.956424, 17.5843s
step 19... 0.957648, 17.5608s
DRAGON_QUEST : 181402 features
LOVEPLUS : 171552 features
SKYRIM : 199655 features
MHF : 194066 features
done nyan!
% ruby test.rb data/dragon_quest_test.txt
    DRAGON_QUEST: 0.862000
        LOVEPLUS: 0.042000
          SKYRIM: 0.056000
             MHF: 0.040000
% ruby test.rb data/loveplus_test.txt
    DRAGON_QUEST: 0.068000
        LOVEPLUS: 0.836000
          SKYRIM: 0.052000
             MHF: 0.044000
% ruby test.rb data/skyrim_test.txt
    DRAGON_QUEST: 0.044000
        LOVEPLUS: 0.040000
          SKYRIM: 0.844000
             MHF: 0.072000
% ruby test.rb data/mhf_test.txt
    DRAGON_QUEST: 0.052000
        LOVEPLUS: 0.024000
          SKYRIM: 0.058000
             MHF: 0.866000
%
% ruby test.rb data/dragon_quest_test2.txt
    DRAGON_QUEST: 0.964000
        LOVEPLUS: 0.016000
          SKYRIM: 0.012000
             MHF: 0.008000
% ruby test.rb data/loveplus_test2.txt
    DRAGON_QUEST: 0.004016
        LOVEPLUS: 0.987952
          SKYRIM: 0.008032
             MHF: 0.000000
% ruby test.rb data/skyrim_test2.txt
    DRAGON_QUEST: 0.000000
        LOVEPLUS: 0.020000
          SKYRIM: 0.964000
             MHF: 0.016000
% ruby test.rb data/mhf_test2.txt
    DRAGON_QUEST: 0.008032
        LOVEPLUS: 0.000000
          SKYRIM: 0.016064
             MHF: 0.975904
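With single posts, accuracy drops because there are more classes to choose from: per-class accuracy ranges from 83.6% to 86.6%, about 85% on average. With two posts per item, it still gets about 97% right on average (96.4% to 98.8%).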
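The four-class results form a confusion matrix. Each row is a test file, and each column is the share assigned to each label. Per-class accuracy is the diagonal, and its plain mean gives the averages quoted above. The sketch below also lists the most frequent wrong label for each file, computed only from the numbers printed by test.rb.

```ruby
# Rows: test files; columns: share of lines predicted as each label (from test.rb).
LABELS = %w[DRAGON_QUEST LOVEPLUS SKYRIM MHF]
ONE_POST = [
  [0.862, 0.042, 0.056, 0.040],  # dragon_quest_test.txt
  [0.068, 0.836, 0.052, 0.044],  # loveplus_test.txt
  [0.044, 0.040, 0.844, 0.072],  # skyrim_test.txt
  [0.052, 0.024, 0.058, 0.866],  # mhf_test.txt
]
TWO_POSTS = [
  [0.964, 0.016, 0.012, 0.008],
  [0.004016, 0.987952, 0.008032, 0.0],
  [0.0, 0.020, 0.964, 0.016],
  [0.008032, 0.0, 0.016064, 0.975904],
]

# Per-class accuracy is the diagonal of the matrix.
def diagonal(matrix)
  (0...matrix.size).map { |i| matrix[i][i] }
end

def mean(values)
  values.inject(0.0) { |s, v| s + v } / values.size
end

# For each test file: [true label, most frequent wrong label, its share].
def top_confusion(matrix)
  matrix.each_with_index.map do |row, i|
    wrong = (0...row.size).reject { |j| j == i }.max_by { |j| row[j] }
    [LABELS[i], LABELS[wrong], row[wrong]]
  end
end

printf "1 post : %.3f\n", mean(diagonal(ONE_POST))   # prints 1 post : 0.852
printf "2 posts: %.3f\n", mean(diagonal(TWO_POSTS))  # prints 2 posts: 0.973
top_confusion(ONE_POST).each { |t, w, s| printf "%-12s -> %-12s %.3f\n", t, w, s }
```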
A brief Q&A (asked and answered by me)
Why did I build this?
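When I built my tool for "splitting text without punctuation into sentences", I used LIBLINEAR. LIBLINEAR is fast and easy, so I really wanted to introduce it as "LIBLINEAR that even a cat can understand", and I planned to use text classification as the example. But that required installing MeCab and Kyoto Cabinet, writing all sorts of glue scripts, and understanding the differences between the algorithms LIBLINEAR offers and what their parameters mean... and I thought, "A cat won't get this... nyan..."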
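So I figured it would be handy to have a generator with a few specific properties. It runs with a single command. It works as a black box, so you only need to know its interface and not its internals. It produces library code with few dependencies. With a tool like that, even a cat could build a small text classifier.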
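As for the implementation, I first trained a word segmentation library on Aozora Bunko data, modeled on TinySegmenter: Javascriptだけで実装されたコンパクトな分かち書きソフトウェア (a compact word segmenter implemented entirely in JavaScript). The tool then uses it to turn each text into a bag-of-words feature vector. Those vectors are used to train a multiclass version of AROW, the learning algorithm introduced in [機械学習] AROWのコードを書いてみた - tsubosakaの日記 (tsubosaka's blog post on writing AROW code). Finally, the learned model is embedded into a Ruby code template. That's all there is to it. It's just several pieces loosely glued together, so I don't think the tool itself has anything particularly interesting in it. Still, I hope it conveys the con-ve-ni-ence and the po-ten-tial of a tool like this.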
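The pipeline described above (bag of words, then a multiclass AROW learner) can be sketched in a few dozen lines of Ruby. This is not nekoneko_gen's code, and several parts of it are assumptions:

- Whitespace splitting stands in for the Japanese segmenter.
- Features are raw word counts.
- The covariance is kept diagonal.
- r = 0.1 is an arbitrary choice.
- The multiclass rule compares the true label with the highest-scoring wrong label and updates both. This is one common way to extend binary AROW.

```ruby
# Sketch only: bag-of-words features + a multiclass AROW variant.
# Assumptions: whitespace tokenization (not a Japanese segmenter),
# diagonal covariance, and a "true label vs best wrong label" update.
class MulticlassArow
  def initialize(num_labels, r = 0.1)
    @r = r                                            # regularization parameter (arbitrary default)
    @w = Array.new(num_labels) { Hash.new(0.0) }      # mean weights, one Hash per label
    @sigma = Array.new(num_labels) { Hash.new(1.0) }  # diagonal covariance, one Hash per label
  end

  # Bag of words: word => count.
  def self.features(text)
    bag = Hash.new(0)
    text.split.each { |word| bag[word] += 1 }
    bag
  end

  def score(label, x)
    x.inject(0.0) { |s, (f, v)| s + @w[label][f] * v }
  end

  def predict(x)
    (0...@w.size).max_by { |label| score(label, x) }
  end

  # One online update for example x with true label y.
  def update(x, y)
    wrong = (0...@w.size).reject { |l| l == y }.max_by { |l| score(l, x) }
    margin = score(y, x) - score(wrong, x)
    return if margin >= 1.0                           # correct with enough margin: no change

    # Confidence term x^T Sigma x, summed over the two labels involved.
    v = x.inject(0.0) { |s, (f, val)| s + (@sigma[y][f] + @sigma[wrong][f]) * val * val }
    beta = 1.0 / (v + @r)
    alpha = (1.0 - margin) * beta
    x.each do |f, val|
      # The mean update uses the covariance before it shrinks.
      @w[y][f] += alpha * @sigma[y][f] * val
      @w[wrong][f] -= alpha * @sigma[wrong][f] * val
      @sigma[y][f] -= beta * (@sigma[y][f] * val)**2
      @sigma[wrong][f] -= beta * (@sigma[wrong][f] * val)**2
    end
  end
end

# Toy usage: label 0 ~ DRAGON_QUEST-like words, label 1 ~ LOVEPLUS-like words.
data = [
  ["slime king gold", 0],
  ["slime dragon quest", 0],
  ["girlfriend date email", 1],
  ["girlfriend present date", 1],
]
model = MulticlassArow.new(2)
5.times do
  data.each { |text, y| model.update(MulticlassArow.features(text), y) }
end
puts model.predict(MulticlassArow.features("gold slime"))       # prints 0
puts model.predict(MulticlassArow.features("date girlfriend"))  # prints 1
```

Keeping a per-feature variance is what sets AROW apart from a plain perceptron. Features that have been seen often get smaller steps, so frequent words stop being overcorrected.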
Incidentally, the segmentation library is also written in pure Ruby. Porting that part and writing new templates would let the tool generate code in other languages such as JavaScript or PHP. It can't do that yet, though.
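Here is a minimal sketch of embedding a learned model in a code template, using ERB from Ruby's standard library. The template, the WEIGHTS layout (one word => weight Hash per label) and the generate helper are all hypothetical, and they are much simpler than anything nekoneko_gen actually emits. Whitespace splitting again stands in for bimyou_segmenter. Supporting another output language would mostly mean swapping this template for one written in that language.

```ruby
require 'erb'

# Hypothetical template: a module with the same interface as the generated
# library (label constants, LABELS, k, predict) and the weights as a literal.
TEMPLATE = <<'EOS'
# coding: utf-8
module <%= module_name %>
<%= labels.each_with_index.map { |label, i| "  #{label} = #{i}" }.join("\n") %>
  LABELS = <%= labels.inspect %>
  # One Hash of word => weight per label, embedded as a literal.
  WEIGHTS = <%= weights.inspect %>

  def self.k
    LABELS.size
  end

  def self.predict(text)
    bag = Hash.new(0)
    text.split.each { |word| bag[word] += 1 }
    (0...k).max_by do |label|
      bag.inject(0.0) { |s, (f, v)| s + WEIGHTS[label].fetch(f, 0.0) * v }
    end
  end
end
EOS

# Render the template with the given name, label names and weights.
def generate(module_name, labels, weights)
  ERB.new(TEMPLATE).result(binding)
end

code = generate("ToyClassifier", %w[DRAGON_QUEST LOVEPLUS],
                [{ "slime" => 1.0 }, { "girlfriend" => 1.0 }])
eval(code, TOPLEVEL_BINDING)   # a real generator would write this to a .rb file
puts ToyClassifier::LABELS[ToyClassifier.predict("my girlfriend")]  # prints LOVEPLUS
```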
Finally
That aside, LIBLINEAR is fast and easy, so I recommend it.
bimyou_segmenter, which nekoneko_gen also uses, was produced by a program that generates a Ruby library from a LIBLINEAR training result (model). I'd like to write about that next time.
Since I'm going to write about it anyway, I'm now left wondering what nekoneko_gen was for in the first place.