Hello. I'm やましー (Yamashii), and I'm in charge of search-related work.
This time I'll write about the search-related parts of the services that livedoor provides.
As this blog has covered several times before, livedoor uses HyperEstraier, Lucene, MySQL + Senna, Namazu, SUFARY, and others as search engines.
Of these, this post explains how we use Lucene and how we have extended it.
What is Lucene?
Apache Lucene is a high-performance, full-featured search engine library written in Java. It is a technology suitable for nearly any application that requires full-text search, especially cross-platform. (Excerpted from the official site.)
Creating the index
Lucene is an inverted-index search engine, so before documents can be searched, an index has to be created.
The flow for indexing a document looks like this:
Document
  ↓
Analyzer
  ↓
Token
  ↓
Index generation
First, the document is passed through an Analyzer (Tokenizer), which breaks the text into Tokens (words). For example, the sentence "I love livedoor." is split as follows:
私はライブドアが大好きです。
  ↓
私 は ライブドア が 大好き です 。
The Analyzers that ship with Lucene are not well suited to processing Japanese, so livedoor developed several Analyzers of its own.
・BygramAnalyzer-3.0
This is not a typo for Bi-gram. For some reason, that is its name.
It tokenizes by switching between Uni-gram and Bi-gram depending on the type of character.
・MeCabAnalyzer-1.0
It segments text into words using the Java binding for MeCab.
・HyperAnalyzer-4.0
It sits midway between MeCabAnalyzer and BygramAnalyzer.
It takes the words segmented by MeCab and, depending on their character type, tokenizes each one as Uni-grams, as Bi-grams, or as the whole word.
・RangedValueAnalyzer-1.0
To use RangeQuery, numbers must be converted to fixed-length strings before they are stored, and the query string must be a fixed-length string as well. That requirement is tedious, so this Analyzer converts values to fixed-length strings as they pass through it, which lets both document addition and search handle numbers as plain numbers.
Note: with AnalyzingQueryParser, the query string given to a RangeQuery can also be passed through the Analyzer.
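To see why fixed-length strings matter for range queries, here is a minimal sketch in plain Java (it does not use Lucene). Terms are compared as strings, so "9" sorts after "25" unless both are padded to the same width. The class name FixedWidthNumber, the encode method, and the width of 10 digits are assumptions made for illustration; this is not the actual RangedValueAnalyzer code.

```java
import java.util.Locale;

final class FixedWidthNumber {
    // Assumed width: 10 digits is enough for every non-negative int.
    static final int WIDTH = 10;

    /** Left-pads a non-negative integer with zeros so that string order matches numeric order. */
    static String encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    public static void main(String[] args) {
        // Unpadded: "9" sorts after "25", so a string range such as [21 TO 25] misbehaves.
        System.out.println("9".compareTo("25") > 0);             // true
        // Padded: the string comparison agrees with the numeric one.
        System.out.println(encode(9).compareTo(encode(25)) < 0); // true
        System.out.println(encode(25));                          // 0000000025
    }
}
```

Doing this conversion inside an Analyzer, as described above, means callers never have to pad values by hand at either index time or query time.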
Finally, the inverted index is generated from the resulting Tokens.
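To make the tokenization step more concrete, the following is a rough sketch of the Uni-gram/Bi-gram switching idea described for BygramAnalyzer. The actual rules of BygramAnalyzer are not given in this post, so the rule used here is an assumption: runs of kanji and kana are emitted as overlapping Bi-grams, every other letter or digit is emitted as a Uni-gram, and punctuation and whitespace are dropped. The class and method names are hypothetical, and the code uses only the standard library.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

final class NGramSketch {
    /** True for kanji, hiragana, katakana, and the long-vowel mark (U+30FC). */
    static boolean isJapanese(int cp) {
        UnicodeScript s = UnicodeScript.of(cp);
        return s == UnicodeScript.HAN || s == UnicodeScript.HIRAGANA
                || s == UnicodeScript.KATAKANA || cp == 0x30FC;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isJapanese(cps[i])) {
                // Collect one run of Japanese characters.
                int start = i;
                while (i < cps.length && isJapanese(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    tokens.add(new String(cps, start, 1)); // a lone character stays a Uni-gram
                } else {
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2)); // overlapping Bi-grams
                    }
                }
            } else {
                if (Character.isLetterOrDigit(cps[i])) {
                    tokens.add(new String(cps, i, 1)); // Uni-gram for other letters and digits
                }
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("東京都 A1")); // [東京, 京都, A, 1]
    }
}
```

N-gram tokens like these need no dictionary, which is one reason the approach is common for Japanese, where words are not separated by spaces.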
Searching
The flow for searching documents looks like this:
Query string
  ↓
QueryParser
  ↓
Analyzer(Tokenizer)
  ↓
Query
  ↓
Document search and score calculation
  ↓
Sort
  ↓
Hits
First, the query string is parsed by QueryParser, and the result of that parsing becomes a Query.
text:(ライブドア AND 大好き)
The example above searches for documents whose text field contains both of the words ライブドア and 大好き.
Lucene's query string format is described in detail in Query Parser Syntax.
Suppose we have the following two documents,
DocID: document content
----------------------------------------
1: 私はライブドアが大好きです。
2: 私はインターネットが大好きです。
and that this inverted index has been built from them:
Token: DocID
----------------------------------------
私: 1,2
は: 1,2
ライブドア: 1
インターネット: 2
が: 1,2
大好き: 1,2
です: 1,2
。: 1,2
Taking the AND of "ライブドア: 1" and "大好き: 1,2" leaves only document 1 as a hit.
At this point a score is computed from factors such as term frequency, and finally the hits are sorted to produce the result.
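The lookup above can be sketched in a few lines of plain Java. This is only an illustration of the idea of an inverted index and an AND over two posting lists, not how Lucene stores or intersects them internally. The class InvertedIndexSketch and its methods are hypothetical, the tokens are passed in already segmented, and scoring is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

final class InvertedIndexSketch {
    // Token -> sorted set of IDs of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Registers every token of one document under that document's ID. */
    void add(int docId, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the IDs of the documents that contain both tokens. */
    SortedSet<Integer> and(String left, String right) {
        SortedSet<Integer> result = new TreeSet<>();
        SortedSet<Integer> a = postings.get(left);
        SortedSet<Integer> b = postings.get(right);
        if (a != null && b != null) {
            result.addAll(a);
            result.retainAll(b); // keep only IDs present in both lists
        }
        return result;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "私", "は", "ライブドア", "が", "大好き", "です", "。");
        index.add(2, "私", "は", "インターネット", "が", "大好き", "です", "。");
        System.out.println(index.and("ライブドア", "大好き")); // [1]
    }
}
```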
Program example
Here is a simple program example for creating and searching an index.
・Creating the index
// Assumes the Lucene 2.x API, which names such as Field.Index.UN_TOKENIZED suggest.
// HyperAnalyzer is the livedoor Analyzer described above; its import is omitted.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
// Create the Analyzer instance
Analyzer analyzer = new HyperAnalyzer();
// Open the index (true = create a new index)
String indexPath = "/path/to/your/index";
IndexWriter iw = new IndexWriter(indexPath, analyzer, true);
// Create documents and add them to the index
Document doc;
doc = new Document();
doc.add(new Field("name", "ライブドア一郎", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("address", "東京都港区赤坂", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("age", "25", Field.Store.YES, Field.Index.UN_TOKENIZED));
iw.addDocument(doc);
doc = new Document();
doc.add(new Field("name", "ライブドア二郎", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("address", "東京都港区赤坂", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("age", "23", Field.Store.YES, Field.Index.UN_TOKENIZED));
iw.addDocument(doc);
doc = new Document();
doc.add(new Field("name", "ライブドア花子", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("address", "群馬県伊勢崎市今井町", Field.Store.YES, Field.Index.TOKENIZED));
doc.add(new Field("age", "21", Field.Store.YES, Field.Index.UN_TOKENIZED));
iw.addDocument(doc);
// Close the index
iw.close();
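・Searching the index
// Assumes the Lucene 2.x API, which names such as Hits suggest.
// HyperAnalyzer is the livedoor Analyzer described above; its import is omitted.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
// Open the index
String indexPath = "/path/to/your/index";
IndexSearcher is = new IndexSearcher(indexPath);
// Create the Analyzer instance
Analyzer analyzer = new HyperAnalyzer();
// Parse the query string with QueryParser ("name" is the default field)
String queryString = "name:ライブドア AND address:東京";
QueryParser qp = new QueryParser("name", analyzer);
Query query = qp.parse(queryString);
// Sort settings: order by the age field, read as an integer
SortField sortField = new SortField("age", SortField.INT);
Sort sort = new Sort(sortField);
// Search
Hits hits = is.search(query, sort);
// Display the Hits
System.out.println(hits.length() + " hits");
for (int i = 0; i < hits.length(); i++)
{
    Document doc = hits.doc(i);
    System.out.println(
        (i + 1) + ": " +
        doc.get("name") + "," +
        doc.get("address") + "," +
        doc.get("age"));
}
// Close the searcher
is.close();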
Output of the above (after the hit count line):
1: ライブドア二郎,東京都港区赤坂,23
2: ライブドア一郎,東京都港区赤坂,25
As you can see, Lucene makes it easy to write a full-text search program.
Orchestra (オーケストラ)
Lucene is very fast: even a single instance can usually search several million documents within one second. However, once the number of documents reaches the tens of millions, response time becomes a problem.
So we developed a mechanism called "Orchestra" that distributes Lucene across multiple nodes.
We could have simply made distributed versions of the existing classes and interfaces, but instead we designed it so that documents can be added, deleted, and searched without having to think about the tedious opening and closing of IndexWriter and IndexSearcher.
One service stores about 50,000,000 documents by linking 15 nodes that hold about 3,300,000 documents each. Even so, searches usually complete within a few hundred milliseconds.
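Orchestra's internals are not shown in this post, but the general idea of searching many nodes at once can be sketched as a scatter-gather: send the same query to every node in parallel, then merge the partial results. Everything in the sketch below is hypothetical and uses only the standard library: the Node interface, the Hit class, and merging by score are assumptions for illustration, not Orchestra's actual design.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class ScatterGatherSketch {
    /** One scored hit; the fields are illustrative only. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /** Hypothetical stand-in for one node that holds part of the documents. */
    interface Node {
        List<Hit> search(String query, int limit);
    }

    static List<Hit> search(List<Node> nodes, String query, int limit) {
        // Scatter: send the same query to every node in parallel.
        List<CompletableFuture<List<Hit>>> pending = new ArrayList<>();
        for (Node node : nodes) {
            pending.add(CompletableFuture.supplyAsync(() -> node.search(query, limit)));
        }
        // Gather: wait for every partial result, then keep the best hits overall.
        List<Hit> merged = new ArrayList<>();
        for (CompletableFuture<List<Hit>> future : pending) {
            merged.addAll(future.join());
        }
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }

    public static void main(String[] args) {
        // Two fake nodes that return fixed results regardless of the query.
        List<Node> nodes = new ArrayList<>();
        nodes.add((q, k) -> List.of(new Hit("a", 0.9f), new Hit("b", 0.2f)));
        nodes.add((q, k) -> List.of(new Hit("c", 0.5f)));
        for (Hit hit : search(nodes, "name:ライブドア", 2)) {
            System.out.println(hit.docId); // prints a, then c
        }
    }
}
```

Because each node searches only its own share of the documents, the total response time is roughly that of the slowest node plus the merge, which fits the response times described above.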
奏 (Kanade)
Lucene and Orchestra are written in Java, so they cannot be used directly from the Perl code on the front-end side. So we developed a mechanism called "奏 (Kanade)" that performs document addition, deletion, and search over HTTP.
When add/delete/commit commands are sent to Kanade Queue over HTTP, the Kanade Indexing Service running in the background applies them to Orchestra one after another.
When a search query is sent to Kanade Search over HTTP, it returns the results in XML, JSON, or CSV format.
Extending QueryParser
Lucene's standard QueryParser is well made and supports most of the patterns needed for queries in ordinary web search and the like: AND, OR, NOT, phrases, and so on.
However, when a more complex search pattern is needed, you end up creating your own Query class by extending Lucene's Query class. QueryParser cannot recognize a Query class created that way.
Still, since something as convenient as QueryParser already exists, it would be nice if a custom Query class could be used from a query string like the one below.
category:書店 AND &geo_in_circle( lat => 'lat_field', long => 'long_field', center => 'E139.44.40.6N35.40.7.4', distance => '5')
The example above assumes documents whose latitude and longitude are stored in lat_field and long_field, and searches for bookstores (書店) within 5 kilometers of the center value, ordered from nearest to farthest.
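What such a geo_in_circle condition would have to compute can be sketched in plain Java. This is not the actual implementation, which this post does not show. It assumes that the center value is written as degrees.minutes.seconds with an E/W prefix for longitude followed by an N/S prefix for latitude, and it uses the haversine formula with a mean Earth radius of 6371 km. All class and method names are hypothetical.

```java
final class GeoCircleSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius (assumed constant)

    /** Converts "139.44.40.6" (degrees.minutes.seconds, assumed format) to decimal degrees. */
    static double dmsToDegrees(String dms) {
        String[] parts = dms.split("\\.", 3);
        return Double.parseDouble(parts[0])
                + Double.parseDouble(parts[1]) / 60.0
                + Double.parseDouble(parts[2]) / 3600.0;
    }

    /** Parses a value such as "E139.44.40.6N35.40.7.4" into {latitude, longitude}. */
    static double[] parseCenter(String center) {
        int split = Math.max(center.indexOf('N'), center.indexOf('S'));
        double lon = dmsToDegrees(center.substring(1, split));
        double lat = dmsToDegrees(center.substring(split + 1));
        if (center.charAt(0) == 'W') lon = -lon;
        if (center.charAt(split) == 'S') lat = -lat;
        return new double[] {lat, lon};
    }

    /** Great-circle distance in kilometers, using the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    /** True when the point lies within radiusKm of the center. */
    static boolean withinKm(double[] center, double lat, double lon, double radiusKm) {
        return distanceKm(center[0], center[1], lat, lon) <= radiusKm;
    }

    public static void main(String[] args) {
        double[] center = parseCenter("E139.44.40.6N35.40.7.4");
        // 0.01 degrees of latitude is about 1.1 km; 0.1 degrees is about 11 km.
        System.out.println(withinKm(center, center[0] + 0.01, center[1], 5)); // true
        System.out.println(withinKm(center, center[0] + 0.1, center[1], 5));  // false
    }
}
```

Sorting the matching documents by the same distance would give the nearest-first ordering mentioned above.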
QueryParser is written in a grammar notation called Extended BNF (EBNF), and compiling that grammar with JavaCC generates the Java source code. JavaCC plays a role similar to lexical-analyzer generators such as lex and Flex.
Next time
This may feel like an abrupt place to stop, but the post has become quite long, so the QueryParser extension and lexical analyzers will have to wait until next time.