Stuck on a NoSuchMethodError in Igo
MeCab and Chasen are the best-known morphological analyzers for Japanese; among those written in Java, Sen and GoSen are the best known.
Sen morphological analysis on Windows - なぜか数学者にはワイン好きが多い
That said, many of them are no longer maintained, so I tried Igo, which is relatively new.
Igo - a morphological analyzer
What I wanted was to build an Analyzer for Apache Lucene.
I'll unpack everything under a directory called Japanese_Lucene.
Download and extract Lucene as usual.
> wget http://ftp.jaist.ac.jp/pub/apache/lucene/java/lucene-3.0.3.tar.gz
> tar xvf lucene-3.0.3.tar.gz
> mkdir -p Japanese_Lucene/lib
> cp -v lucene-3.0.3/lucene-core-3.0.3.jar Japanese_Lucene/lib/
Fetch the Igo jars as well and put them in lib.
> cd Japanese_Lucene/lib
> wget http://iij.dl.sourceforge.jp/igo/46696/igo-0.4.2.jar
> wget http://jaist.dl.sourceforge.jp/igo/46413/igo-analyzer-0.0.1.jar
Download the NAIST dictionary and convert it into the form Igo uses for morphological analysis.
> cd ../..
> wget http://iij.dl.sourceforge.jp/naist-jdic/48487/mecab-naist-jdic-0.6.3-20100801.tar.gz
> tar xvf mecab-naist-jdic-0.6.3-20100801.tar.gz
> cd mecab-naist-jdic-0.6.3-20100801
> grep -v -E '^\"' naist-jdic.csv > naist-jdic.tmp; mv naist-jdic.tmp naist-jdic.csv
> make clean; ./configure; make; cd ..
> java -cp ./Japanese_Lucene/lib/igo-0.4.2.jar net.reduls.igo.bin.BuildDic ipadic mecab-naist-jdic-0.6.3-20100801 EUC-JP
> ls -l ipadic/
Some files have been created.
> java -Dfile.encoding=UTF-8 -cp ./Japanese_Lucene/lib/igo-0.4.2.jar net.reduls.igo.bin.Igo ipadic
すもももももももものうち
すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ,,
も 助詞,係助詞,*,*,*,*,も,モ,モ,,
もも 名詞,一般,*,*,*,*,もも,モモ,モモ,,
も 助詞,係助詞,*,*,*,*,も,モ,モ,,
もも 名詞,一般,*,*,*,*,もも,モモ,モモ,,
の 助詞,連体化,*,*,*,*,の,ノ,ノ,,
うち 名詞,非自立,副詞可能,*,*,*,うち,ウチ,ウチ,,
EOS
It seems to be working.
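Each output line is a surface form followed by a comma-separated feature list. In the sample above, the first field is the part of speech (名詞) and the eighth is the reading (スモモ). As a small illustration of that layout, here is a sketch that splits one such line. MorphemeLine and its method names are hypothetical helpers made up for this post, not part of Igo, and the field positions are an assumption read off the sample output.

```java
import java.util.Arrays;
import java.util.List;

/** One line of the analyzer's text output: a surface form plus its feature list. */
final class MorphemeLine {
    final String surface;
    final List<String> features;

    private MorphemeLine(String surface, List<String> features) {
        this.surface = surface;
        this.features = features;
    }

    /** Splits "surface<whitespace>feature,feature,..." into its parts. */
    static MorphemeLine parse(String line) {
        String[] parts = line.trim().split("\\s+", 2);
        if (parts.length != 2) {
            // Lines such as "EOS" carry no feature list.
            throw new IllegalArgumentException("not a morpheme line: " + line);
        }
        // A limit of -1 keeps the trailing empty fields instead of dropping them.
        return new MorphemeLine(parts[0], Arrays.asList(parts[1].split(",", -1)));
    }

    /** Assumed position: the first feature field. */
    String partOfSpeech() {
        return features.get(0);
    }

    /** Assumed position: the eighth feature field; empty if the line is shorter. */
    String reading() {
        return features.size() > 7 ? features.get(7) : "";
    }

    public static void main(String[] args) {
        MorphemeLine m = parse("すもも 名詞,一般,*,*,*,*,すもも,スモモ,スモモ,,");
        System.out.println(m.surface + " / " + m.partOfSpeech() + " / " + m.reading());
    }
}
```

The source file has to be compiled as UTF-8 for the Japanese literals to survive.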
Now that I've gone to the trouble of fetching Igo's Analyzer, let's write a Lucene Indexer and Searcher.
(At first I named the class Indexer rather than MyIndexer, which clashed with a Lucene class of the same name and cost me three minutes.)
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

import net.reduls.igo.Tagger;
import net.reduls.igo.analysis.ipadic.IpadicAnalyzer;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.FSDirectory;

public class MyIndexer {
    // Directory where the Lucene index is written.
    static final File INDEX_DIR = new File("index");

    public MyIndexer() {}

    public static void main(String[] args) {
        final File docDir = new File("data");
        try {
            // "ipadic" is the dictionary directory produced by BuildDic.
            final Tagger tagger = new Tagger("ipadic");
            IndexWriter writer = new IndexWriter(FSDirectory.open(INDEX_DIR),
                    new IpadicAnalyzer(tagger),
                    IndexWriter.MaxFieldLength.LIMITED);
            indexDocs(writer, docDir);
            writer.optimize();
            writer.close();
        } catch (IOException e) {
            System.out.println(" caught a " + e.getClass()
                    + "\n with message: " + e.getMessage());
        }
    }

    // Walks the directory tree recursively and adds every readable file to the index.
    static void indexDocs(IndexWriter writer, File file) throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                System.out.println("adding " + file);
                writer.addDocument(FileDocument.Document(file));
            }
        }
    }

    private static class FileDocument {
        private FileDocument() {}

        // Builds a Document with the path, the modification time, and the tokenized contents.
        public static Document Document(File f) throws FileNotFoundException {
            Document doc = new Document();
            doc.add(new Field("path", f.getPath(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("modified",
                    DateTools.timeToString(f.lastModified(), DateTools.Resolution.MINUTE),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("contents", new FileReader(f)));
            return doc;
        }
    }
}
Build it and pack it into a jar.
> javac -cp Japanese_Lucene/lib/igo-0.4.2.jar:Japanese_Lucene/lib/igo-analyzer-0.0.1.jar:Japanese_Lucene/lib/lucene-core-3.0.3.jar MyIndexer.java
> jar cvf MyIndexer.jar MyIndexer.class 'MyIndexer$FileDocument.class'
Create a directory and drop some Japanese text files into it.
> mkdir data
(Put whatever text files you like into data.)
And then, when I build the Lucene index...
> java -cp .:Japanese_Lucene/lib/igo-0.4.2.jar:Japanese_Lucene/lib/igo-analyzer-0.0.1.jar:Japanese_Lucene/lib/lucene-core-3.0.3.jar MyIndexer
adding data/1040.txt
Exception in thread "main" java.lang.NoSuchMethodError: net.reduls.igo.Tagger.parse(Ljava/lang/String;)Ljava/util/List;
        at net.reduls.igo.analysis.ipadic.IpadicTokenizer.readMorpheme(Unknown Source)
        at net.reduls.igo.analysis.ipadic.IpadicTokenizer.incrementToken(Unknown Source)
        at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:137)
        at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:246)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:826)
        at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:802)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1998)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1972)
        at MyIndexer.indexDocs(MyIndexer.java:57)
        at MyIndexer.indexDocs(MyIndexer.java:51)
        at MyIndexer.main(MyIndexer.java:33)
I was stuck on this error for a good three hours.
What I finally figured out: do not use the prebuilt http://jaist.dl.sourceforge.jp/igo/46413/igo-analyzer-0.0.1.jar!!!
If you instead download http://jaist.dl.sourceforge.jp/igo/46413/igo-analyzer-0.0.1-src.tar.gz, build it with ant, and use the igo-analyzer-0.0.1.jar that the build produces, you get:
> java -cp .:Japanese_Lucene/lib/igo-0.4.2.jar:Japanese_Lucene/lib/igo-analyzer-0.0.1.jar:Japanese_Lucene/lib/lucene-core-3.0.3.jar MyIndexer
adding data/1040.txt
adding data/2498.txt
adding data/5587.txt
adding data/2457.txt
adding data/12296.txt
adding data/6715.txt
adding data/812.txt
adding data/8456.txt
adding data/8917.txt
The output came pouring out, and ls -l index confirmed that some files had been created.
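The stack trace points to a binary mismatch rather than a bug in MyIndexer: the prebuilt analyzer jar asks the JVM for Tagger.parse with the exact descriptor (Ljava/lang/String;)Ljava/util/List;, and the Tagger in igo-0.4.2.jar evidently has no method with that descriptor. My guess, which I have not checked against the Igo source, is that parse in 0.4.2 takes a wider parameter type such as CharSequence, so rebuilding the analyzer against igo-0.4.2.jar binds it to the new signature. The sketch below shows one way to check what a class actually offers, using reflection. NewStyleTagger is a hypothetical stand-in, not Igo's class, and SignatureCheck and its helper names are made up for this example.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SignatureCheck {
    /** Hypothetical stand-in for a library class whose parse method takes a CharSequence. */
    static final class NewStyleTagger {
        public List<String> parse(CharSequence text) {
            return Collections.singletonList(text.toString());
        }
    }

    /** True if cls has a public method with exactly this name and these parameter types. */
    static boolean hasExactMethod(Class<?> cls, String name, Class<?>... paramTypes) {
        try {
            // getMethod matches parameter types exactly; it does not widen String to CharSequence.
            cls.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /** Lists the signatures of all public methods of cls that have the given name. */
    static List<String> signaturesOf(Class<?> cls, String name) {
        List<String> out = new ArrayList<>();
        for (Method m : cls.getMethods()) {
            if (m.getName().equals(name)) {
                out.add(m.toString());
            }
        }
        return out;
    }

    /** Usage: SignatureCheck [class name] [method name]; defaults to the stand-in and "parse". */
    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> cls = args.length >= 1 ? Class.forName(args[0]) : NewStyleTagger.class;
        String name = args.length >= 2 ? args[1] : "parse";
        for (String signature : signaturesOf(cls, name)) {
            System.out.println(signature);
        }
        System.out.println("has " + name + "(String): " + hasExactMethod(cls, name, String.class));
    }
}
```

To inspect the real class, put the jar on the classpath and pass the names, for example java -cp .:Japanese_Lucene/lib/igo-0.4.2.jar SignatureCheck net.reduls.igo.Tagger parse. Source code that calls parse with a String compiles fine against either signature, which is why a rebuild is enough; only the already-compiled jar is tied to the old descriptor.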
Calbee ギザギザポテト サワークリームオニオン (Giza Giza Potato, Sour Cream Onion)
The onion aroma was quite nice.
The name says "sour," but there is hardly any sourness, so it goes well as a snack with a drink such as a sharply acidic wine.
The listed ingredients are potatoes (not genetically modified), dextrin, salt, onion powder (contains soy), powdered vinegar, whey powder, sour cream powder, cheese powder, sugar, parsley flakes, garlic powder, flavoring (contains egg and orange), and so on. It has 332 kcal per 60 g, with sodium equivalent to 0.8 g of salt.