I wrote a command-line interface for Kuromoji, the Japanese morphological analysis library
I wanted to try using Kuromoji, a Japanese morphological analysis library written in Java, from Perl, so I wrote an interface that makes it easy to use from programs written in other languages.
About Kuromoji
Kuromoji is an open-source Japanese morphological analysis library written in Java. It seems to be aimed at search, and is apparently built into Apache Lucene and Apache Solr. It can of course also be used independently of a search engine.
To use it on its own, you only need to download the jar file and add it to the Java classpath (or resolve the dependency through the publicly available Maven repository), and morphological analysis just works. Handy.
For how to use Kuromoji from Java, I found the following article easy to follow.
Kuromoji is licensed under the Apache License, Version 2.0.
About the command-line interface
As described above, morphological analysis is easy if you only need it from Java, but the library cannot be used directly from other languages. To make that easy, I wrote a command-line interface *1.
- Repository on GitHub: nobuoka/shino
It simply analyzes the strings given on standard input one line at a time and writes each result to standard output as JSON.
The license is the Apache License, Version 2.0.
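To make the input/output contract concrete, here is a minimal Ruby stand-in that follows the same convention: read one line, write one line of JSON. It is not shino's implementation (shino is a Java program built on Kuromoji). `fake_tokenize` and `serve` are hypothetical names, the whitespace splitting merely stands in for real morphological analysis, and only two of the fields that shino emits are filled in. A stub like this may be handy for exercising client code on a machine without Java.

```ruby
require 'json'
require 'stringio'

# Hypothetical stand-in for Kuromoji: splits on whitespace instead of doing
# real morphological analysis, and emits only `position` and `surface_form`.
def fake_tokenize(line)
  tokens = []
  line.scan(/\S+/) do |word|
    # Character offset of the current match within the line.
    tokens << { 'position' => Regexp.last_match.begin(0), 'surface_form' => word }
  end
  tokens
end

# Same contract as described above: one input line in, one JSON line out.
def serve(input, output)
  input.each_line do |line|
    output.puts JSON.generate(fake_tokenize(line.chomp))
    output.flush # so a client blocked on `gets` sees the reply immediately
  end
end

# Demonstration with in-memory streams instead of a real pipe.
out = StringIO.new
serve(StringIO.new("hello world\n"), out)
puts out.string
# prints [{"position":0,"surface_form":"hello"},{"position":6,"surface_form":"world"}]
```

Because every reply occupies exactly one line, a client can pair each `puts` with a single `gets` and never needs to guess where a result ends.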
Usage
Download the ZIP file below and extract it. The executable is shino/bin/shino.
Once it is running, type a string into standard input; it is morphologically analyzed and the result comes back as JSON.
    $ ./shino/bin/shino
    お水おいしい。
    [{"position":0,"all_features":["接頭詞","名詞接続","*","*","*","*","お","オ","オ"],
    "reading":"オ","is_known":true,"is_user":false,"part_of_speech":"接頭詞,名詞接続,*,*",
    "is_unknown":false,"surface_form":"お","base_form":"お"},
    {"position":1,"all_features":["名詞","一般","*","*","*","*","水","ミズ","ミズ"],
    "reading":"ミズ","is_known":true,"is_user":false,"part_of_speech":"名詞,一般,*,*",
    "is_unknown":false,"surface_form":"水","base_form":"水"},
    {"position":2,"all_features":["形容詞","自立","*","*","形容詞・イ段","基本形","おいしい","オイシイ","オイシイ"],
    "reading":"オイシイ","is_known":true,"is_user":false,"part_of_speech":"形容詞,自立,*,*",
    "is_unknown":false,"surface_form":"おいしい","base_form":"おいしい"},
    {"position":6,"all_features":["記号","句点","*","*","*","*","。","。","。"],
    "reading":"。","is_known":true,"is_user":false,"part_of_speech":"記号,句点,*,*",
    "is_unknown":false,"surface_form":"。","base_form":"。"}]
Above is an example of the JSON result returned for the standard input 「お水おいしい。」 ("The water tastes good.") *2.
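Since each result is a JSON array of token objects, consuming it takes little more than a JSON parser. The sketch below parses a trimmed copy of the output above and builds one row per token with its surface form, reading, and top-level part of speech. `summarize` and `SAMPLE_OUTPUT` are hypothetical names, and the sample keeps only a few fields and is wrapped over several lines for readability.

```ruby
require 'json'

# Trimmed copy of the output shown above; only some of the fields are kept.
SAMPLE_OUTPUT = <<~EOS
  [{"position":0,"reading":"オ","part_of_speech":"接頭詞,名詞接続,*,*","surface_form":"お"},
   {"position":1,"reading":"ミズ","part_of_speech":"名詞,一般,*,*","surface_form":"水"},
   {"position":2,"reading":"オイシイ","part_of_speech":"形容詞,自立,*,*","surface_form":"おいしい"},
   {"position":6,"reading":"。","part_of_speech":"記号,句点,*,*","surface_form":"。"}]
EOS

# Hypothetical helper: returns one "surface<TAB>reading<TAB>top-level POS"
# string per token in the given JSON result.
def summarize(json_text)
  JSON.parse(json_text).map do |token|
    # part_of_speech is a comma-separated hierarchy; the first item is the
    # broadest category (e.g. 名詞 = noun, 形容詞 = adjective).
    top_level_pos = token['part_of_speech'].split(',').first
    [token['surface_form'], token['reading'], top_level_pos].join("\t")
  end
end

puts summarize(SAMPLE_OUTPUT)
```

Judging from the sample, `position` is the character offset of the token in the input line, which is why it jumps from 2 to 6 after the four-character おいしい.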
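Using it from Ruby and other programs
To use it from Ruby or a similar language, open a bidirectional pipe, write the string you want analyzed to the process's standard input, and read the result from its standard output. The following Ruby example extracts the nouns from a few strings.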
    # coding: utf-8
    require 'json'

    PATH_TO_SHINO = './shino/bin/shino'

    nouns = []
    # Open a bidirectional pipe ('r+') to the shino process.
    IO.popen(PATH_TO_SHINO, 'r+') do |io|
      ['君とつながる RPG。', '今日はいい天気ですね。', '闇の炎に抱かれて消えろ。'].each do |str|
        # Send one line, then read back the one-line JSON reply.
        io.puts str
        tokens = JSON.parse(io.gets())
        tokens.each do |token|
          # The first feature is the top-level part of speech; 名詞 means noun.
          nouns.push token if token['all_features'][0] == '名詞'
        end
      end
    end

    p nouns.map{|token| token['surface_form'] }
    #=> ["君", "RPG", "今日", "天気", "闇", "炎"]
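If analysis is needed in several places, the pipe handling can be wrapped in a small class. This is only a sketch under assumptions: `ShinoClient` is a hypothetical name (it is not part of shino), and it relies solely on the line-in, JSON-line-out behaviour described above. The command is a parameter so that any program following the same contract can be substituted for the real executable.

```ruby
require 'json'

# Hypothetical wrapper around a long-running shino process.
class ShinoClient
  # command: a path string or an argv array, passed to IO.popen as is.
  def initialize(command = './shino/bin/shino')
    @io = IO.popen(command, 'r+')
  end

  # Sends one line of text and returns the parsed array of token hashes.
  def analyze(text)
    # Input is analyzed line by line, so an embedded newline would produce
    # more than one reply and desynchronize the request/response pairing.
    raise ArgumentError, 'text must be a single line' if text.include?("\n")

    @io.puts text
    @io.flush
    reply = @io.gets
    raise IOError, 'the analyzer closed its output' if reply.nil?

    JSON.parse(reply)
  end

  # Surface forms of the tokens whose top-level part of speech is 名詞 (noun).
  def nouns(text)
    analyze(text)
      .select { |token| token['all_features'][0] == '名詞' }
      .map { |token| token['surface_form'] }
  end

  # Closes the pipe, which lets the child process see EOF and exit.
  def close
    @io.close
  end
end

# Usage (assumes shino has been extracted into the current directory):
#   client = ShinoClient.new
#   p client.nouns('今日はいい天気ですね。')
#   client.close
```

Keeping one process alive for many requests avoids paying the JVM start-up cost for every string, which is the main reason to prefer a pipe over launching the command once per input.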