Koji Sekiguchi's Lucene Blog: on the OSS search library Lucene and its subprojects (Solr, Tika, Mahout, and others)
2017.12.15 Friday
Solr Aruaru: Common Solr Mistakes

Our company provides training, consulting, and support services for Apache Solr, and through that work we regularly come across unfortunate ways of using Solr. This post presents some of them as "Solr aruaru" (things that happen all the time with Solr). They apply to Apache Lucene as well, but for simplicity they are called "Solr aruaru" here.

Prefix search and infix search

Searching for documents whose field f1 starts with abc using q=f1:abc*, or for documents in which abc appears somewhere in field f1 using q=f1:*abc*, is a mistake. The former is generally called prefix search and the latter infix (substring) search. In a search engine based on an inverted index, however, the smallest searchable unit is the term, and exact matching of terms is the basis. Concepts such as "prefix search" and "infix search" therefore do not exist in the first place. If the target field holds a single term, the search is an exact-match search; if the field holds two or more terms, it is already an infix search. So if what you want is infix search (an exact match on a term), use q=f1:abc instead of q=f1:*abc*. When q=f1:*abc* appears to work, several explanations are possible; one is that the * is dropped during text analysis, so the query ends up as a plain term search for abc.

What about prefix search? If you want prefix search, change how the inverted index is built. Concretely, build the index with EdgeNGramTokenizer (or, depending on the situation, EdgeNGramTokenFilter). The field value abcdefg is then split into the following terms and indexed.

a ab abc abcd abcde abcdef abcdefg

To run a prefix search for abc against this field, do not tokenize on the query side. That is, use KeywordTokenizer (you may combine it with LowerCaseFilter, in which case apply it on the index side as well; the same holds below) and search as follows.

q=f1:abc

Using * to get traditional "prefix search" or "infix search" out of a search engine built on exact term matching is a Solr aruaru. The motivation is understandable, but it is wrong. The same goes for suffix search, described next.

Suffix search

Likewise, issuing q=f1:*abc to find documents whose field value ends with abc is a Solr aruaru, and it is a mistake. Lucene's inverted index stores terms in sorted order and exploits this to look them up by binary search. To check whether the end of a term matches, every term in the inverted index has to be scanned. Lucene, whose creed is fast search, therefore does not provide an efficient search of this kind (scanning all terms through the API is possible). This does not mean suffix search is impossible with Lucene/Solr: the technique is to reverse the string and turn the problem into a prefix match. Concretely, apply ReverseStringFilter on both the index side and the query side.

On the index side, use KeywordTokenizer + ReverseStringFilter + EdgeNGramTokenFilter. The string abcdefg is then expanded in the index as follows.

g gf gfe gfed gfedc gfedcb gfedcba

On the query side, apply KeywordTokenizer + ReverseStringFilter. A suffix search for efg is then simply q=f1:efg. The query string is reversed before the search, so it matches gfe.

Not using fq when narrowing down by facets

Another Solr aruaru we see often is narrowing down by facets without fq, by appending the narrowing condition to the q parameter with AND. This too is a mistake. Narrowing down with fq and narrowing down by adding an AND condition return the same set of documents. The order of the documents (the ranking), however, differs. A condition added to the main query (the q parameter) with AND takes part in score calculation, so with the AND approach the ranking changes every time the results are narrowed down. From the user's point of view, the display order of the documents changes each time a facet link is clicked, which is confusing. There should also be the separate problem that the filterCache cannot be used effectively.

Remember: when narrowing down by facets, use fq, not AND.

Cutting off results by score

Requests such as "documents with a score below 0.5 should be cut off and not returned" are also a Solr aruaru. This is nonsense, and struggling to implement it is a waste of time.

Whether it is the vector space model based TFIDFSimilarity or the probabilistic model based BM25Similarity, the parameters of the score function are the query and each document. A score expresses how relevant each document is to the query when the query is held fixed, so comparing scores across different queries is meaningless. It is perfectly normal for a document that scores 0.4 for query A to be fully relevant to query A, while a document that scores 0.6 for query B is not particularly relevant to query B.

Home-made highlighting

Occasionally we see another Solr aruaru: instead of using Solr's highlighting feature (which cuts out document snippets containing the search keywords and displays the keywords prominently, for example in color), people work hard to do it in the web application (including JavaScript). There seem to be two patterns: implementing it yourself without knowing that Solr has highlighting, and implementing it yourself while knowing that Solr has highlighting but believing that your own implementation is better. The former is simply a matter of not knowing. The latter is hard to understand, because Solr's highlighting is that good. If we had to name an advantage of the do-it-yourself approach, highlighting with JavaScript in the user's browser might be worthwhile as a way to reduce CPU load on the server. But behind that advantage, please be aware that you are bringing into your application the disadvantage of not using Solr's very well-made, purpose-built highlighting.

In short, Solr's highlighting has been designed with search applications in mind and is very well made. A home-made replacement is in most cases inferior or a change for the worse.

Note that displaying the search keywords prominently in the full text, shown after the user clicks a highlighted snippet in the result list, is a different feature from the highlighting discussed here. Highlighting in this sense is an assist feature that helps the user quickly pick the right document in the result list. Users decide whether to click a document based on the highlighted snippet, so do make active use of Solr's highlighting (except for applications such as e-commerce search, where product photos are shown in the result list).

Beyond those listed here, there are others: mapping normalized RDB tables directly onto the Solr index schema and then using Solr's JOIN, working hard to build parent-child hierarchical facets yourself, and always wrapping queries in double quotes.

Solr has been OSS for more than ten years. It is rich in features, and much of what you want to do is often already built in. Before working hard in the application, please check the manual for a matching feature, and reconsider whether your usage is in line with Solr's design philosophy.

This will probably be the last post of the year. Thank you for reading again this year even though updates have become less frequent. I wish you all a happy new year.

2017.08.25 Friday
LUCENE-6819: Good bye index-time boost
The release vote for Lucene/Solr 7.0 is about to begin. What has caught my attention is the change made by LUCENE-6819. Loosely translated, the title says "let's stop index-time boosting (weighting)". Reading a little too much into it, some people might take it as "weights can be specified at query time, so there is no need for weighting at index time, right?". Index-time weighting and query-time weighting are of course fundamentally different in meaning. [1] This post briefly explains what fieldNorm looks like from Lucene 7.0 on, combining LUCENE-6819 with another related change.

LUCENE-6819

As the title says, this ticket is about dropping index-time boost. In more detail, fieldNorm is a value encoded in one byte for each field of each document (excluding fields with omitNorms=true), and the ticket removes boost from the following formula.

fieldNorm = lengthNorm * boost

This formula depends heavily on the ranking calculation (the similarity of each document to the query) of the vector space model, which was the Lucene/Solr default before BM25Similarity appeared. Lucene/Solr was based on cosine similarity but took the fieldNorm calculated above into account, so the document length that BM25 considers had been factored in all along. [2] Removing boost gives the following.

fieldNorm = lengthNorm

By default lengthNorm is calculated as follows.

lengthNorm = 1 / sqrt(numTerms)

numTerms is the number of terms in the field. For one term lengthNorm is 1 / sqrt(1) = 1 / 1 = 1, for two terms it is 1 / sqrt(2) = 1 / 1.414213562373095… = 0.70710678118655…, and so on, so 0 < lengthNorm <= 1. Up to Lucene 6.x, however, fieldNorm had to be stored with boost multiplied into lengthNorm, and because the boost value is up to the user, it can in the worst case be Float.MAX_VALUE. Lucene 6.x therefore maps the vast range from 0 to Float.MAX_VALUE onto a single byte (see the figure).

Almost no user applies index-time weighting, however, so this mapping is very wasteful. Half of the byte is spent on the possibility that boost is specified, yet almost every user has boost=1, which makes this both wasteful and imprecise. This is where LUCENE-6819 comes in.
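The formulas above can be sketched in a few lines of Java. This is only an illustration of the arithmetic in the text, not Lucene's actual implementation: the class and method names are made up for this example, and the one-byte encoding step is left out.

```java
class LengthNormSketch {
    /** Default length normalization from the text: 1 / sqrt(numTerms). */
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Value that Lucene 6.x had to encode, per the text: lengthNorm * boost. */
    public static double fieldNorm6x(int numTerms, double boost) {
        return lengthNorm(numTerms) * boost;
    }

    public static void main(String[] args) {
        // Without boost the value always stays within (0, 1].
        for (int n : new int[] {1, 2, 4, 100}) {
            System.out.printf("numTerms=%d lengthNorm=%.6f%n", n, lengthNorm(n));
        }
        // With a user-supplied boost the value is no longer bounded by 1,
        // which is why 6.x had to reserve part of the byte for large values.
        System.out.println(fieldNorm6x(4, 10.0));
    }
}
```

Once boost is removed, only the bounded lengthNorm (or, equivalently, the term count itself) has to fit into the byte.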
With LUCENE-6819 alone, the one fieldNorm byte is still not used effectively. A subsequent ticket therefore changed the code so that the field length (number of terms) is encoded with SmallFloat.intToByte4(numTerms) and stored. For example, when the number of terms is 1000, SmallFloat.intToByte4(1000) is 87, and that is what is recorded. Decoding uses SmallFloat.byte4ToInt(); SmallFloat.byte4ToInt(87) is 984. Incidentally, 1 to 40 encode and decode to 1 to 40 unchanged; from a length of 41 on, the error introduced by encoding grows (see the figure).

In Lucene 6.x the result of boost / sqrt(numTerms) was compressed into one byte and recorded. From Lucene 7.0 on, numTerms itself is encoded and recorded in one byte. Since BM25Similarity is now the default, there is presumably no point in computing 1 / sqrt(numTerms). Accordingly, the omitNorms setting remains valid in Lucene/Solr 7.0 and later.

The slides used to explain this at an internal RONDHUIT study session have been published; please have a look at them as well.

[1] The RONDHUIT training course "Solr Basics" teaches fundamentals like these carefully. Registration is available from the course page.
[2] The RONDHUIT training course "Machine Learning with Apache Mahout & Spark" carefully explains various machine learning models and algorithms. Beyond basic machine learning, it also covers topics such as the thinking behind the score calculation that underlies search engine ranking, as in this post, and is designed to be useful in actual software development. Registration is available from the course page.

2017.01.21 Saturday
Trying Out Learning-to-Rank with Solr

This post shows how to try "learning to rank" (Learning-to-Rank, abbreviated as LTR below), which has recently (?) been in the spotlight, with Solr. Hearing "learning to rank with Solr", some readers may think of SOLR-8542, which is included in the soon-to-be-released Solr 6.4.0. Here, however, I take up the approach using NLP4L that Yamamoto from our marketing team presented at the 19th Lucene/Solr study meetup, because for the following reasons it is far easier to use than SOLR-8542.

First, NLP4L can handle TF, IDF, TF*IDF, and other basic features introduced in learning-to-rank papers, whereas SOLR-8542 cannot handle those features (and apparently there are no plans to). In addition, the part of SOLR-8542 that applies the model depends heavily on Solr, whereas in NLP4L that part is done at the Lucene level, and the implementation of the main parts of LTR is compact. Supporting Elasticsearch is therefore comparatively easy as well. SOLR-8542 is of course better in some respects; being able to write the features as Solr query expressions is one point where it is superior to NLP4L.

About LTR training data

LTR is fundamentally supervised learning. Conceptually, the training data is a series of records labeled with "how relevant each document is to a given query" (NLP4L calls this label the relevance degree). Algorithms that learn from data in this form are called pointwise approaches. There are also pairwise approaches, which handle data labeled with "which document of a pair is more relevant to a given query", and listwise approaches, which handle data labeled with "how a set of documents is listed in order of relevance for a given query".

Regarding the third point above, how to prepare training data, NLP4L provides two methods: using the bundled "annotation GUI", and computing a click model from access logs (aka query logs; NLP4L calls them "impression logs" to distinguish them from ordinary access logs) to derive relevance automatically. To keep the explanation simple, this post takes up the former, the annotation GUI. Annotating by hand in a real system is hard work, however. Once you have understood the learning-to-rank procedure from this post, please try the latter method of computing a click model from access logs to derive relevance automatically.

The following is an overview of the learning-to-rank procedure in NLP4L. The explanation below follows this order.
As mentioned, NLP4L comes with a detailed manual for LTR, so there may be no need to write it up again here. In this post, however, the livedoor news corpus is used as the target document set so that the whole procedure can be explained more concretely. If you are interested, start with the installation as described below and you can experience learning to rank with Solr in practice. For more detail, please refer to the NLP4L manual.

Installing Solr 6.3.0

Download Solr 6.3.0 (the latest version at the time of writing) into a suitable directory and extract it. This post assumes that Solr is extracted into the /opt/nlp4l/solr-6.3.0 directory.

$ pwd
/opt/nlp4l
$ wget http://ftp.tsukuba.wide.ad.jp/software/apache/lucene/solr/6.3.0/solr-6.3.0.tgz
$ tar xvzf solr-6.3.0.tgz

Start Solr and create a core named collection1.

$ cd solr-6.3.0
$ ./bin/solr start
$ ./bin/solr create_core -c collection1 -d sample_techproducts_configs

Building and configuring the Solr plugin for LTR

Download the NLP4L/solr project published on GitHub into a suitable directory, build it, and deploy it to Solr.

$ pwd
/somewhere/NLP4L
$ git clone https://github.com/NLP4L/solr.git
$ cd solr
$ mvn package
$ cp target/nlp4l-solr-1.1-SNAPSHOT.jar /opt/nlp4l/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib

NLP4L/solr requires Typesafe's config library, so obtain it, build it, and deploy it to Solr as follows.

$ cd /somewhere
$ wget https://github.com/typesafehub/config/archive/v1.3.1.tar.gz
$ tar xvzf v1.3.1.tar.gz
$ cd config-1.3.1
$ sbt package
$ cp config/target/config-1.3.0-20170120T044439.jar /opt/nlp4l/solr-6.3.0/server/solr-webapp/webapp/WEB-INF/lib

Configure the webapp of the Jetty bundled with Solr as follows.

$ cd /opt/nlp4l/solr-6.3.0
$ vi server/solr-webapp/webapp/WEB-INF/web.xml
Next, prepare the features that can be extracted in a JSON configuration file like the following.

$ vi server/solr/collection1/conf/ltr_features.conf
{
  "features": [
    { "name": "TF in title", "class": "org.nlp4l.solr.ltr.FieldFeatureTFExtractorFactory", "params": { "field": "title" } },
    { "name": "TF in body", "class": "org.nlp4l.solr.ltr.FieldFeatureTFExtractorFactory", "params": { "field": "body" } },
    { "name": "IDF in title", "class": "org.nlp4l.solr.ltr.FieldFeatureIDFExtractorFactory", "params": { "field": "title" } },
    { "name": "IDF in body", "class": "org.nlp4l.solr.ltr.FieldFeatureIDFExtractorFactory", "params": { "field": "body" } },
    { "name": "TF*IDF in title", "class": "org.nlp4l.solr.ltr.FieldFeatureTFIDFExtractorFactory", "params": { "field": "title" } },
    { "name": "TF*IDF in body", "class": "org.nlp4l.solr.ltr.FieldFeatureTFIDFExtractorFactory", "params": { "field": "body" } }
  ]
}

Also edit solrconfig.xml as follows.

$ vi server/solr/collection1/conf/solrconfig.xml

Add the following to the defaults parameters of the standard /select request handler.

Configure the request handler that responds to feature extraction requests from NLP4L-LTR as follows.

Finally, configure PRankQParserPlugin, which invokes the PRank model, as follows (besides PRank, RankingSVM can also be selected).
This is another Solr aruaru, but also remove the Elevation Component related settings at this point.

Registering the livedoor news corpus

To register the livedoor news corpus in Solr, first set up the schema (changing the schema through the API is recommended, but since that is tedious, we simply edit it in an editor).

$ vi server/solr/collection1/conf/managed-schema

Also remove the unnecessary copyField settings. Then reload the core or restart Solr so that the configuration changes take effect.

Next, obtain the livedoor news corpus and register it in Solr.

$ cd /somewhere
$ wget http://www.rondhuit.com/download/livedoor-news-data.tar.gz
$ tar xvzf livedoor-news-data.tar.gz
$ /opt/nlp4l/solr-6.3.0/bin/post *.xml

Installing, starting, and configuring NLP4L

Obtain the NLP4L/nlp4l project from GitHub and start it as follows.

$ cd /somewhere/NLP4L
$ git clone https://github.com/NLP4L/nlp4l.git
$ cd nlp4l
$ cp conf/application.conf.sample conf/application.conf
$ vi conf/application.conf
$ ./activator run

Then open http://localhost:9000/ in a web browser. Choose the NLP4L-LTR menu and click Config in the menu shown at the top of the screen. Clicking the New link prompts you to set up a new Config. Fill it in with reference to the following table.
Click the [Save] button and then [Load]. The top of the screen shows that a Config named test has been loaded, and the menu items Query, Annotation, Feature, and Training appear.

Creating training data

Training data is created with the Query or Annotation menu. As mentioned, this post describes creating training data with the annotation GUI. To prepare training data automatically by computing a click model from access logs (impression logs), click the [Import] button in the Query menu and load the log file.

The Query menu has a button for selecting a file. Clicking it lets you choose a text file containing queries, one query per line. Loading this file displays the queries as shown in the figure.

Clicking the link of a query runs the query on the connected Solr, and the results are displayed on the Annotation screen. On this screen the selected query is shown in the Search text box. You get the same result without going through the Query menu, by choosing the Annotation menu and typing a query directly into the Search text box.

On the Annotation screen, consider the relevance of each document returned for the query and choose an appropriate number of stars on the right. When one query is finished, save with the [Save] button and move to the next query with [Next]. When you have gone through all of them, proceed to the next step, feature extraction.

Feature extraction

Select the Feature menu. A progress bar appears as shown in the figure. [Extract] starts feature extraction, but if the progress bar is already green, click the [Clear] button first and then run [Extract]. DONE is displayed on completion.
Learning to rank

Select the Training menu and click New on the left. Checkboxes for choosing features appear, according to the configuration file (ltr_features.conf) placed in Solr. Select suitable features with the checkboxes and click [Start]. Here too the progress of training is shown by a progress bar, and on completion the contents of the model file are displayed as below. Because PRank was selected here, a weight for each feature and three thresholds (stemming from the maximum number of stars being three) are returned.

{
  "model" : {
    "name" : "prank",
    "type" : "prank",
    "weights" : [
      { "name" : "TF in title", "weight" : -113 },
      { "name" : "TF in body", "weight" : -14 },
      { "name" : "IDF in body", "weight" : -322.44384765625 }
    ],
    "bs" : [ -2818, -2374, 6450 ]
  }
}

Deploying the model

When training completes, a [Deploy] button is displayed. Clicking it transfers the model over HTTP to the connected Solr.
On the Solr side, the core must be reloaded for the trained model to take effect. The easiest way is to reload the core from the Solr admin UI.

Reranking in practice with the learned ranking model

Unlike SOLR-8542, NLP4L-LTR is invoked through Solr's standard rerank, so the rq={!rerank …} parameter is used. To invoke the prank queryParser configured in solrconfig.xml earlier, run a search as follows, where <query> stands for the same query string in both places.

http://localhost:8983/solr/collection1/select?indent=on&q=<query>&rq={!rerank reRankQuery=$rqq}&rqq={!prank}<query>&debugQuery=on

Solr's standard rerank first runs the ordinary edismax query (according to the solrconfig.xml settings above) and then reranks the top 200 documents (the reRankDocs default) based on the learned ranking model. In the score details of the debug output, the first two lines look like this.

14.37461 = combined first and second pass score using class org.apache.solr.search.ReRankQParserPlugin$ReRankQueryRescorer
10.37461 = first pass score

The first line shows that the combined score of the two-stage query is 14.37461. The next line shows that the score of the first-stage query (here the ordinary edismax) is 10.37461. The lines from the third on are the edismax score details, so we skip them here.

The total score computed by the second-stage query, that is, the PRank query, is 2.0, and its details are shown in the following block.

2.0 = second pass score
  2.0 = is the index of bs(-2818.0,-2374.0,6450.0) > -2338.074707 sum of:
    -113.0 = weight: -113.0 * feature: 1.0 sum of:
      0.0 = no matching terms
      0.0 = no matching terms
      1.0 = freq: 1
    -84.0 = weight: -14.0 * feature: 6.0 sum of:
      2.0 = freq: 2
      3.0 = freq: 3
      1.0 = freq: 1
    -2141.0747 = weight: -322.44385 * feature: 6.640147 sum of:
      2.762864 = log(numDocs: 7368/docFreq: 465)
      0.55211145 = log(numDocs: 7368/docFreq: 4242)
      3.3251717 = log(numDocs: 7368/docFreq: 265)

The score 2.0 is doubled (the reRankWeight default) to 4.0, and the final total score becomes 14.37461. As for how the score 2.0 was computed: the result calculated with the weights given by the model is -2338.074707, and since this lies within the threshold range [-2374.0, 6450.0], the score is 2.0.
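The arithmetic in the debug output above can be reproduced with a short Java sketch. It uses only the numbers shown in the explain block; the class and method names are made up for this illustration, and the clamping used when the sum exceeds every threshold is an assumption, not taken from the NLP4L source.

```java
class PRankRescoreSketch {
    /** Weighted sum of feature values (the "sum of:" line in the debug output). */
    public static double weightedSum(double[] weights, double[] features) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }

    /** Index of the first threshold in bs that is greater than the weighted sum. */
    public static int rankIndex(double[] bs, double sum) {
        for (int i = 0; i < bs.length; i++) {
            if (bs[i] > sum) {
                return i;
            }
        }
        return bs.length - 1; // assumption: clamp to the highest rank
    }

    /** Combination used by the rerank step: first pass + reRankWeight * second pass. */
    public static double combined(double firstPass, double secondPass, double reRankWeight) {
        return firstPass + reRankWeight * secondPass;
    }

    public static void main(String[] args) {
        double[] weights = {-113.0, -14.0, -322.44384765625};
        double[] features = {1.0, 6.0, 6.640147}; // TF in title, TF in body, IDF in body
        double[] bs = {-2818.0, -2374.0, 6450.0};

        double sum = weightedSum(weights, features);
        int second = rankIndex(bs, sum);
        System.out.println("weighted sum = " + sum);
        System.out.println("second pass  = " + second);
        System.out.println("combined     = " + combined(10.37461, second, 2.0));
    }
}
```

The weighted sum comes out slightly different from -2338.074707 in the last digits because Solr computes with single-precision floats, but it falls between the same two thresholds.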
Summary

The livedoor news corpus has few fields and few documents, so it is difficult to create sufficient training data from it. With data in real operation that has more fields and more documents, learning to rank can be expected to produce a better ordering than the scores computed by the ordinary Similarity. Please give it a try.

2016.07.09 Saturday
Implementing the Keyphrase Extraction Tool KEA on Lucene

I have developed KEA-lucene, an implementation of the keyphrase extraction tool KEA using the Lucene library, and would like to introduce it. It should also be useful to Solr and Elasticsearch users who are studying the Lucene API.

What is KEA?

KEA is a program developed at the University of Waikato in New Zealand that automatically extracts keyphrases (keywords) from documents written in natural language. KEA stands for Keyphrase Extraction Algorithm, and the name may also refer to the algorithm that makes up the KEA program.

Keyphrases attached to a document are semantic metadata for the document and can be regarded as an extremely short summary of it. When there is no time to read a document, a glance at its list of keyphrases is enough to grasp the rough content. Suppose you are researching something with limited time. Documents that seem related to your research are piled up in front of you, but you do not have time to go through all of them. In such a case, you can first look at the keyphrases assigned to the documents and start reading from those that seem relevant.

Academic papers usually have keyphrases attached by the authors. Most ordinary documents and books, however, have none. KEA is a program that tries to extract keyphrases automatically from such documents. It is a supervised machine learning program: it reads documents with author-assigned keyphrases, learns their characteristics, and automatically extracts keyphrases from unseen documents that have none. The KEA algorithm works regardless of language (English, Japanese, and so on).

Keyphrase extraction and information retrieval

Since this article is about extracting keyphrases from a Lucene index, the relationship between keyphrase extraction and information retrieval deserves a mention.

First, note that the authors of KEA use the word keyphrase rather than keyword. Keyword suggests a single important (key) word; using phrase instead of word emphasizes that a sequence of one or more key words can be extracted from a document.

If keyphrases can be extracted from a Lucene index, what are the benefits for information retrieval? The first that comes to mind is query suggestion (also known as autocomplete). A Lucene index manages strings term by term, so suggestions also end up being term by term. If key phrases can be extracted automatically, several consecutive words can be suggested at once, which is far more useful. The same goes for "did you mean" search.

When displaying a result list, showing the keyphrases of each document instead of, or together with, highlighting can help users choose documents. Keyphrases can also serve as facet (narrow-down search) keys, so being able to extract them is a great benefit.

Overview of KEA processing

The KEA paper is not particularly difficult, so if you have time I recommend reading it. Since this article later explains the KEA implementation built on the Lucene library, only the minimum needed to follow it is explained here.

KEA processing is divided into a learning phase and a keyphrase extraction phase, and both share a process that lists keyphrase candidates. The candidates are listed mechanically. From the many candidates listed this way, the learning phase learns how likely (or unlikely) a candidate is to be a keyphrase. The extraction phase then consults the learned probabilistic model, scores the candidates, and presents them in descending order of score. In actual keyphrase extraction, the ranked candidate list is cut off at a suitable point.
The KEA implementation explained in this article makes heavy use of Lucene. In the learning process, a Lucene index is created from known documents (those with author-assigned keyphrases), and a model file is then output from it. In the process of extracting keyphrases from an unseen document (one without keyphrases), the document is first registered in a Lucene index and the keyphrases are then extracted.

Enumerating keyphrase candidates

KEA enumerates sequences of one to at most three consecutive words as keyphrase candidates. For example, suppose we have the following document.

KEA then enumerates the following 10 keyphrase candidates.
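As an illustration, the enumeration can be sketched as follows. Everything specific here is an assumption made for the example: the input sentence was chosen to be consistent with the words this article mentions (Tokyo, governor, likes to, to go, Yugawara, seven words in total), the stop word set contains only "to", and the class and method names are invented. Candidates that start or end with a stop word are skipped, and stemming (likes to like) is left out to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeaCandidateSketch {
    /** Lists all word n-grams (1..maxN) that neither start nor end with a stop word. */
    public static List<String> candidates(String text, Set<String> stopWords, int maxN) {
        // Lower-case and split on non-word characters (a stand-in for a real tokenizer).
        String[] tokens = text.toLowerCase().split("\\W+");
        List<String> result = new ArrayList<>();
        for (int n = 1; n <= maxN; n++) {
            for (int i = 0; i + n <= tokens.length; i++) {
                String first = tokens[i];
                String last = tokens[i + n - 1];
                if (stopWords.contains(first) || stopWords.contains(last)) {
                    continue; // phrases beginning or ending with a stop word are not candidates
                }
                result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("to")); // assumed stop word list
        List<String> phrases =
            candidates("Tokyo governor likes to go to Yugawara.", stopWords, 3);
        for (String p : phrases) {
            System.out.println(p);
        }
        System.out.println(phrases.size() + " candidates");
    }
}
```

With this assumed sentence the sketch yields five 1-grams, two 2-grams, and three 3-grams.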
When enumerating keyphrase candidates, phrases that begin with a stop word and phrases that end with a stop word do not become candidates. So not only "to" on its own but also "like(s) to" and "to go" are not listed as candidates. Character normalization and stemming, which are common in information retrieval and NLP, are also performed at this stage. Tokyo and Yugawara thus become tokyo and yugawara, and likes is stemmed to like.

Learning the model in KEA

KEA uses a naive Bayes classifier to learn, from supervised data (documents with author-assigned keyphrases), how likely (P[yes]) and how unlikely (P[no]) a keyphrase candidate is to be a keyphrase. Concretely, the following formulas are used.
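Going by the terms this article discusses next ("Y / (Y + N)", "N / (Y + N)", Pt[t|yes], Pd[d|yes]) and by the naive Bayes formulation in the KEA paper, the two formulas presumably take the following form. This is a reconstruction, not a copy of the original figure.

```latex
P[\mathrm{yes}] = \frac{Y}{Y+N}\; P_t[t \mid \mathrm{yes}]\; P_d[d \mid \mathrm{yes}]
\qquad
P[\mathrm{no}] = \frac{N}{Y+N}\; P_t[t \mid \mathrm{no}]\; P_d[d \mid \mathrm{no}]
```

Here Y and N are the numbers of candidates that are and are not keyphrases in the training data, t is the (discretized) TF*IDF value, and d is the (discretized) distance value.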
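Here the parts "Y / (Y + N)" and "N / (Y + N)" are the prior probabilities obtained from the keyphrase candidates. If the candidates marked with an asterisk in the example above are the keyphrases, they are as follows.

KEA uses two features to represent a phrase P in a document D. The first is TF*IDF, which is calculated as follows.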
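Based on the symbol definitions given right after this point (freq(P,D), size(D), df(P), N, and a base-2 logarithm) and on the KEA paper, the formula is presumably the following. It is a reconstruction under those assumptions.

```latex
\mathrm{TF{\times}IDF} = \frac{\mathrm{freq}(P,D)}{\mathrm{size}(D)} \times \left( -\log_2 \frac{\mathrm{df}(P)}{N} \right)
```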
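freq(P,D) is the number of times phrase P appears in document D, size(D) is the number of words in the document, df(P) is the number of documents containing phrase P, and N is the total number of documents. The base of the logarithm is 2.

The second feature is called first occurrence (also known as distance). It is the position at which phrase P first appears in document D, divided by size(D). In the example above, tokyo is 1/7 and governor is 2/7.

Both features are positive real numbers, but KEA discretizes the continuous values and calculates posterior probabilities for each discrete value. In the formulas above, Pt[t|yes] (or Pt[t|no]) is the posterior probability for TF*IDF and Pd[d|yes] (or Pd[d|no]) is that for distance.

Keyphrase extraction in KEA

To extract keyphrases from an unseen document D, the two features above are calculated for the keyphrase candidates enumerated from D, P[yes] and P[no] are obtained, and finally a score is calculated with the following formula. The candidates are sorted by score in descending order and cut off at a suitable point.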
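The two features and the final score can be sketched in Java as follows. The score formula P[yes] / (P[yes] + P[no]) is the form given in the KEA paper and is assumed here; the class and method names are invented for this illustration, and positions are counted from 1 as in the tokyo = 1/7 example.

```java
class KeaScoreSketch {
    /** TF*IDF feature: freq(P,D) / size(D) * -log2(df(P) / N). */
    public static double tfidf(int freq, int size, int df, int n) {
        double tf = (double) freq / size;
        double log2 = Math.log((double) df / n) / Math.log(2.0);
        return tf * -log2;
    }

    /** First occurrence (distance): 1-based position of the first appearance / size(D). */
    public static double firstOccurrence(int position, int size) {
        return (double) position / size;
    }

    /** Final score (assumed form from the KEA paper): P[yes] / (P[yes] + P[no]). */
    public static double score(double pYes, double pNo) {
        return pYes / (pYes + pNo);
    }

    public static void main(String[] args) {
        // A phrase occurring once in a 7-word document and in 1 of 4 documents overall.
        System.out.println(tfidf(1, 7, 1, 4));
        // "governor" first appears as the 2nd of 7 words.
        System.out.println(firstOccurrence(2, 7));
        // Hypothetical P[yes] and P[no] values, just to show the normalization.
        System.out.println(score(0.3, 0.1));
    }
}
```

Because the score is a ratio of the two naive Bayes products, it always lies between 0 and 1, which matches the range of the scores listed in the results later in this article.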
�ʤ���̤��ʸ��ϥ�ǥ�˴ޤޤ�Ƥ��ʤ����ᡢTF*IDF��׻�����ݤϡ�df(P)��N��1��û����롣 Apache Lucene��Ȥä�KEA�ץ�������Ǥ����ҤΥ��르�ꥺ��ò¸µ¤Ë¡ï¿½Lucene�饤�֥���ե���Ѥ���KEA�ץ�������KEA-lucene�ȸƤ֤��Ȥˤ���ˤò¼«ºî¤·ï¿½Æ¤ß¤è¤¦ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Ç¾Ò²ð¤¹¤ï¿½×¥ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Github�˸������Ƥ��롣�ʤ����ץ������Ϥ狼��䤹����ͥ�褷���ǥ��쥯�ȥ�̾�ʤɤ��տ�Ū�˥ϡ��ɥ����ǥ��󥰤���Ƥ��롣 �ʤ�Lucene��Ȥ��Τ����Ȥ�����KEA���������Τˤʤ�Lucene��Ȥ��Τ���������KEA�˸¤餺������������Υġ������������硢ñ��ο���������ꡢ����ñ���ޤ�ʸ��ο���������ꡢ���������ʸ���ñ��˶��ڤä��ꤹ�뤳�Ȥ��褯�Ԥ��롣Lucene�Ϥ������ä�������Ԥ��Τˤ褯�������줿API�������Ƥ��롣����ˤ�Lucene��ž�֥���ǥå����ʰʲ�ñ�˥���ǥå����ȸƤ֡ˤ�ñ�ì¼ï¿½ï¿½È¤ï¿½ï¿½Æ¤ï¿½Í¥ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Ã¤ï¿½KEA-lucene�Τ���˻��Ѥ���Lucene API��ʲ��˾Ҳ𤷤褦�� AnalyzerLucene�Ǥϡ�ʸ�Ϥ�ñ��˶��ڤ�Τ�Analyzer���饹���Ѥ��롣KEA-lucene�Ǥϡ��ȡ����ʥ����Τ����StandardTokenizer�ò¡¢¾ï¿½Ê¸ï¿½ï¿½ï¿½Ø¤ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Î¤ï¿½ï¿½ï¿½ï¿½LowerCaseFilter�ò¡¢¤ï¿½ï¿½ï¿½ï¿½ï¿½Ã±ï¿½ï¿½N-gram�ò¥µ¥Ý¡ï¿½ï¿½È¤ï¿½ï¿½ë¤¿ï¿½ï¿½ï¿½ShingleFilter��ȤäƤ��롣��ǰ�ʤ���KEA�Υ��ȥåץ�ɤιͤ�����Lucene��StopFilter�Ǥϼ¸��Ǥ��ʤ��Τǡ�KEAStopFilter�Ȥ����ȼ�TokenFilter��������Ƥ��롣�����ƺǽ�Ū�˼��Τ褦��KEAAnalyzer���Ȥ߾夲���� public class KEAAnalyzer extends Analyzer { private final int n; public KEAAnalyzer(int n){ this.n = n; } @Override protected TokenStreamComponents createComponents(String fieldName) { Tokenizer source = new StandardTokenizer(); TokenStream lcf = new LowerCaseFilter(source); if(n == 1){ TokenStream stf = new KEAStopFilter(lcf, n, Commons.stopWords, Commons.beginStopWords, Commons.endStopWords); return new TokenStreamComponents(source, stf); } else{ assert n >= 2; ShingleFilter shf = new ShingleFilter(lcf, n, n); shf.setOutputUnigrams(false); KEAStopFilter keasf = new KEAStopFilter(shf, n, Commons.stopWords, Commons.beginStopWords, Commons.endStopWords); return new TokenStreamComponents(source, keasf); } } } KEA����ʸ�ˤ��ȡ�conjunctions, articles, particles, prepositions, pronouns, anomalous verbs, adjectives ����� adverbs 
�γ��ʻ줫�饹�ȥåץ�ɥꥹ�Ȥ������Ƥ��롣�����Lucene���ꥹ�ȥ��åפ��Ƥ��륹�ȥåץ�ɤ���Ϥ뤫��¿����KEA-lucene�Ǥ�Lucene�Υ��ȥåץ�ɤ��ȼ��˥��ȥåץ�ɤ��ɲä����� �����KEA-lucene�Ǥ�ñ��N-gram��N��1����3���Ѥ��Ƹ��̤Υե�����ɤ˥���ǥå�����Ͽ���뤿�ᡢLucene��PerFieldAnalyzerWrapper�ò¼¡¤Î¤è¤¦ï¿½Ë»ï¿½ï¿½Ñ¤ï¿½ï¿½Æ¤ï¿½ï¿½ë¡£ public static Analyzer getKEAAnalyzer(String fieldName){ Map ñ�ì¼ï¿½ï¿½È¤ï¿½ï¿½ï¿½Lucene����ǥå�������Ѥ���Lucene��ñ��ò¥¡ï¿½ï¿½Ë¤ï¿½ï¿½ï¿½Å¾ï¿½Ö¥ï¿½ï¿½ï¿½Ç¥Ã¥ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ë¡£ï¿½ï¿½ï¿½ï¿½ï¿½Ã±ï¿½ì¼ï¿½ï¿½È¤ï¿½ï¿½Æ»È¤ï¿½ï¿½Î¤Ï¼Â¤Ë¼ï¿½ï¿½ï¿½ï¿½Ç¤ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Ë¤ï¿½ï¿½Ê¤Ã¤Æ¤ï¿½ï¿½ï¿½È¤ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½Õ¥ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½field��ñ���ǽ餫��Ǹ�ޤ���������ˤϼ��Τ褦�ˤ��롣������ir��Lucene����ǥå����ò¥ª¡ï¿½ï¿½×¥ó¤·¤Æ¤ï¿½ï¿½ï¿½IndexReader�Υ��֥������ȤǤ��롣 Terms terms = MultiFields.getTerms(ir, field); TermsEnum te = terms.iterator(); for(BytesRef rawPhrase = te.next(); rawPhrase != null; rawPhrase = te.next()){ String phrase = rawPhrase.utf8ToString(); : } Lucene�Υ���ǥå����Ϥ���ñ��ñ�줬�ꥹ�ȥ��åפ���Ƥ�������ǤϤʤ������Ҥ�2�Ĥ���ħ�̤�׻����뤿���ɬ�פȤʤ������̤���¸����Ƥ��롣���Ȥ���freq(P,D)��ʸ��D��˥ե졼��P���и��������Ǥ��뤬�������MultiFields��getTermDocsEnum()�᥽�åɤ���Ѥ��Ƽ�������PostingsEnum���֥������Ȥ��Ф�nextDoc()��ʸ��D�����ꤷ���塢freq()��ƤӽФ����ȤǼ����Ǥ��롣 �ޤ�df(P)�ϥե졼��P��ޤ�ʸ�����N����ʸ����Ǥ��뤬����ϼ��Τ褦�ˤ��Ƽ����Ǥ��롣 int dfreq = ir.docFreq(new Term(field, phrase)); : int n = ir.numDocs(); ����ˤϥե졼���ε�Υ��׻����뤿��˥ݥ����������ɬ�פˤʤäƤ��뤬�������MultiFields��getTermPositionsEnum()�᥽�åɤ�Ȥä�PostingsEnum���֥������Ȥ�����塢advance()�᥽�åɤ�ʸ������ꤷ����nextPosition()�Ǻǽ�Υե졼���Υݥ���������롣KEA-lucene�Ǥ�1-gram, 2-gram, 3-gram����Ω�����ե�����ɤˤ��Ƥ���Τǡ�nextPosition()�Ǽ��������ݥ������Ϥ��Τޤ޵�Υ�׻����Ѥ�������ʤ��� ��ǥ�γؽ��ʥ�ǥ�ե�����ν��ϡ�KEA-lucene�Ǥ�KEAModelBuilder����ǥ�γؽ���ԤäƤ��롣���Υץ��������������������������Ǥ��붵�եǡ�����MAUI.tar.gz��Ÿ��������ˤ�����ˤ���fao30.tar.gz��Ÿ������ˤ�data/fao30/�ǥ��쥯�ȥ�����֤���뤳�Ȥ�����˽ñ¤«¤ï¿½Æ¤ï¿½ï¿½ï¿½Î¤ï¿½ï¿½ï¿½ï¿½Õ¤ï¿½ï¿½Æ¤ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ 
KEA uses naive Bayes for model learning, so learning consists of counting various statistics and then dividing the counts. The divisions are performed by the KeyphraseExtractor(2) programs described later, and KEAModelBuilder only collects the statistics. Strictly speaking, it therefore outputs a model file rather than learning a model.

The model file output by KEAModelBuilder is a space-separated text file like the following.

$ head features-model.txt
0.000852285 0.231328 false
0.000284095 0.980489 false
0.000426143 0.0124768 false
0.000426143 0.429134 false
2.01699e-05 0.0479968 false
0.000284095 0.0160665 false
0.000426143 0.726610 false
0.000136409 0.000752663 false
0.000226198 0.379661 false
0.000284095 0.478057 false

Each record represents one keyphrase candidate. The first number is TF*IDF and the second is the distance. The third is the class, which indicates whether the candidate is a keyphrase (true) or not (false).

To output such a model file, KEAModelBuilder creates a Lucene index from the training data, treats the index as a word dictionary, and iterates over it, calculating the features and writing the model file as it goes.

For reference, this took about 5 minutes on my MacBook Pro (Processor 2.3 GHz Intel Core i7). Creating the Lucene index is instantaneous, but scanning the index while writing the model file takes a little time.

Discretizing continuous values with R

KEA does not use the real-valued features as they are; it maps them to discrete values obtained with MDLP (the minimum description length principle). Here R was used for the MDLP calculation as follows.

library(discretization)  # mdlp() is provided by the discretization package
data <- read.table('features-model.txt')
mdlp(data)$cutp
[[1]]
[1] 0.0003538965 0.0013242950 0.0041024750

[[2]]
[1] 0.0003553105 0.0144056500 0.0697899000

The result is used by the next programs, KeyphraseExtractor(2), so save it under the file name cutp-model.txt.

$ cat cutp-model.txt
0.0003538965 0.0013242950 0.0041024750
0.0003553105 0.0144056500 0.0697899000

The MDLP calculation in R took about 30 minutes in the same environment.
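How the cut points are applied can be sketched as follows: a feature value is mapped to the index of the interval it falls into. The cut points are the ones from cutp-model.txt above; the class and method names are invented, and treating a value equal to a cut point as belonging to the lower interval is an assumption of this sketch.

```java
class KeaDiscretizeSketch {
    /** Returns the interval index of value: the number of cut points it exceeds. */
    public static int bin(double value, double[] cutPoints) {
        int bin = 0;
        for (double c : cutPoints) {
            if (value > c) {
                bin++;
            }
        }
        return bin;
    }

    public static void main(String[] args) {
        // Cut points from cutp-model.txt: first line for TF*IDF, second for distance.
        double[] tfidfCuts = {0.0003538965, 0.0013242950, 0.0041024750};
        double[] distCuts = {0.0003553105, 0.0144056500, 0.0697899000};

        // First record of features-model.txt: 0.000852285 0.231328 false
        System.out.println(bin(0.000852285, tfidfCuts)); // TF*IDF interval
        System.out.println(bin(0.231328, distCuts));     // distance interval
    }
}
```

Three cut points give four intervals per feature, so each candidate is reduced to one of 4 x 4 combinations of discrete values. That coarse granularity is also why many candidates end up with identical scores.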
Extracting keyphrases from the Lucene index

Now let us finally extract keyphrases from the Lucene index. The programs are KeyphraseExtractor and KeyphraseExtractor2. The former extracts keyphrases from the known file a0011e00.txt, which is in the index created by KEAModelBuilder (this file name is also hard-coded in the program). KeyphraseExtractor2 extracts keyphrases from the unseen document t0073e.txt in fao780 (found inside the extracted MAUI.tar.gz mentioned above).

KeyphraseExtractor obtains its statistics from the existing Lucene index, whereas KeyphraseExtractor2 starts by creating a Lucene index in order to obtain the statistics from the new document. KeyphraseExtractor2 also differs in that, because the new document is not included in the model, 1 is added to both the numerator and the denominator in the calculation of df(P) / N.

Both KeyphraseExtractor and KeyphraseExtractor2 identify a document already registered in the Lucene index and calculate scores while listing the keyphrase candidates in that document. They therefore use Lucene's TermVector rather than iterating over the word dictionary. For that reason, documents are registered with setStoreTermVectors(true) called on the FieldType, as follows.

static Document getDocumentWithTermVectors(String fn, String content) throws IOException {
  Document doc = new Document();
  doc.add(new StringField(FILE_NAME, fn, Field.Store.YES));
  doc.add(new StoredField(DOC_SIZE_FIELD_NAME, Commons.getDocumentSize(content)));
  // Store term vectors so that the candidates of a single document can be listed
  FieldType ft = new FieldType();
  ft.setStored(true);
  ft.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS);
  ft.setStoreTermVectors(true);
  // One field each for 1-grams, 2-grams, and 3-grams
  doc.add(new Field(Commons.getFieldName(FIELD_NAME, 1), content, ft));
  doc.add(new Field(Commons.getFieldName(FIELD_NAME, 2), content, ft));
  doc.add(new Field(Commons.getFieldName(FIELD_NAME, 3), content, ft));
  return doc;
}
Each keyphrase candidate gets a score(P,D) from the scoring formula described earlier, and the candidates are sorted in descending order of score. KEA-lucene uses Lucene's PriorityQueue for this. Because KEA uses discrete rather than continuous feature values, the paper points out that score(P,D) is tied for many candidates. In that case TF*IDF is used as the tie-breaker, so the PriorityQueue is implemented as follows:

static class KeyphraseScorePQ extends PriorityQueue

In addition, a candidate ranked lower in the PriorityQueue needs to be rejected when it is a sub-phrase of a higher-ranked candidate.

Let's look at the results. The top 20 keyphrases extracted by KeyphraseExtractor are shown below. The two numbers in parentheses are the score and the TF*IDF feature (you can see that when scores tie, the candidate with the higher TF*IDF is ranked higher).

animal (0.311211,0.001698)
standards setting (0.102368,0.014469)
sps (0.102368,0.009091)
chains (0.102368,0.008121)
food chains (0.102368,0.008038)
food safety (0.102368,0.008004)
sps standards (0.102368,0.006865)
value chain (0.102368,0.005468)
setting process (0.102368,0.005250)
standards setting process (0.102368,0.004846)
livestock food (0.102368,0.004823)
oie (0.102368,0.004501)
poor to cope (0.102368,0.004442)
animal health (0.102368,0.004255)
animal production (0.089053,0.000425)
consultation (0.076506,0.004025)
assisting the poor (0.076506,0.002827)
sanitary and technical (0.076506,0.001615)
requirements assisting (0.076506,0.001615)
dynamics of sanitary (0.076506,0.001615)

By contrast, the keyphrases assigned by hand are as follows (in the fao30 data several people assigned keyphrases independently, so this lists every phrase that at least one person marked as a keyphrase).

$ cat $(find fao30 -name a0011e00.key)|sort|uniq
animal health animal production animal products capacity building consumers developing countries development policies disease surveillance domestic markets empowerment fao food chains food safety food security hygiene information dissemination livestock markets meat hygiene mechanics phytosanitary measures poverty public health regulations risk analysis risk management rural population standards technical aid technology trade veterinary hygiene world

The keyphrases that appear in both lists are:

animal health
animal production
food chains
food safety
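The tie-breaking rule described above (equal score(P,D), higher TF*IDF wins) can be sketched as follows. This is not the KeyphraseScorePQ class from KEA-lucene: java.util.PriorityQueue stands in for Lucene's PriorityQueue so that the sketch runs without Lucene, and the class, field and method names are hypothetical. The sub-phrase rejection is left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class KeyphraseRanking {
    static final class Candidate {
        final String phrase;
        final double score;
        final double tfidf;

        Candidate(String phrase, double score, double tfidf) {
            this.phrase = phrase;
            this.score = score;
            this.tfidf = tfidf;
        }
    }

    // Weakest candidate first: lower score, and on a tie, lower TF*IDF.
    static final Comparator<Candidate> WEAKEST_FIRST =
        Comparator.comparingDouble((Candidate c) -> c.score)
                  .thenComparingDouble(c -> c.tfidf);

    /** Returns the n best candidates, best first. */
    static List<Candidate> topN(List<Candidate> all, int n) {
        PriorityQueue<Candidate> pq = new PriorityQueue<>(WEAKEST_FIRST);
        for (Candidate c : all) {
            pq.add(c);
            if (pq.size() > n) {
                pq.poll(); // drop the current weakest candidate
            }
        }
        List<Candidate> result = new ArrayList<>(pq);
        result.sort(WEAKEST_FIRST.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<Candidate> all = new ArrayList<>();
        // Values taken from the result list above.
        all.add(new Candidate("sps", 0.102368, 0.009091));
        all.add(new Candidate("animal", 0.311211, 0.001698));
        all.add(new Candidate("chains", 0.102368, 0.008121));
        all.add(new Candidate("standards setting", 0.102368, 0.014469));
        for (Candidate c : topN(all, 3)) {
            System.out.println(c.phrase + " (" + c.score + "," + c.tfidf + ")");
        }
    }
}
```

Keeping the weakest candidate at the head of the queue means only n candidates are held in memory at any time, which is the usual reason for using a priority queue rather than sorting the full list.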
The hit rate is by no means high, but few of the extracted keyphrases look unnatural as phrases. For automatically extracted phrases, I think this is a respectable result.

Next, running KeyphraseExtractor2 to extract keyphrases from the unseen document gives the following. Again the top 20 are shown.

post harvest (0.628181,0.004842)
vegetables (0.311211,0.002956)
root crops (0.311211,0.002523)
fruits (0.311211,0.002316)
vegetables and root (0.311211,0.002161)
food losses (0.311211,0.001900)
harvest food losses (0.311211,0.001582)
post harvest food (0.311211,0.001582)
fresh produce (0.102368,0.011087)
packing (0.102368,0.007199)
decay (0.102368,0.005210)
containers (0.102368,0.004370)
prevention of post (0.089053,0.001231)
fruits vegetables (0.089053,0.000472)
handling (0.076506,0.002345)
storage (0.076506,0.002177)
tubers (0.076506,0.001889)
marketing (0.076506,0.001474)
roots (0.076506,0.001389)
potatoes (0.029596,0.003984)

The keyphrases assigned by hand for this document are:

$ cat t0073e.key
Fruits
Marketing
Packaging
Postharvest losses
Postharvest technology
Root crops
Vegetables

Here too, quite a few correct phrases were extracted. The precision seems sufficient for uses such as keyword suggestion in information retrieval.

Summary

This article implemented KEA, which extracts keyphrases with supervised machine learning, on top of the Apache Lucene library. Because the KEA algorithm is simple and clear, I think the program makes it easy to see how the Lucene API can be used to obtain the various features of keyphrase candidates.

I would be glad if this article sparks readers' interest in the Lucene library or broadens their knowledge of it even a little.

Note that in the first version of the program, written at the time of this article, some items remain unimplemented for lack of time.

KEA-lucene deliberately uses hard-coding for the benefit of people learning Lucene. I plan to provide practical versions through NLP4L as appropriate.

2015.06.24 Wednesday
Automatically generating a synonym dictionary from Japanese Wikipedia

This post introduces a way to automatically generate, from Japanese Wikipedia, a synonym dictionary that can be used with Lucene / Solr / Elasticsearch. I have written about this kind of thing before, but this time I used a different method. Restricted to loanwords, I expect the new method to be far more precise (though I have not measured it). It is also published through NLP4L, so anyone can try it. And once you understand the method, you can apply it to sources other than Japanese Wikipedia.

Japanese documents frequently place a katakana word close to the alphabetic string of the English (or other) word it derives from, as in this example:

エンターテイメント（英: entertainment）とは、人々を楽しませる娯楽をいう。 (excerpt from the Wikipedia article 「エンターテイメント」)

The idea is to collect a large amount of such text and output each katakana word and alphabetic string that appear close to each other as a synonym pair, in a text file usable by Lucene/Solr. However, not every pair written close together has the same meaning. For example, a sentence that merely mentions コンピュータ (computer) near FORTRAN would produce a wrong pair.

It might also be possible to do this by collecting a large amount of text and examining the co-occurrence of katakana words and alphabetic strings, but collecting text on that scale is not something everyone can do.

NLP4L ships with a program and training data for transliteration between English words and katakana words. With this program, an English word can be estimated from a katakana word. The estimate is compared with the alphabetic string picked up from the text, and if the two are similar, that is, within an edit distance chosen as a threshold, the katakana word and the alphabetic string are taken to have the same meaning and are written to the synonym dictionary.

The procedure is as follows.

Applying the steps above to Japanese Wikipedia yields a synonym dictionary of katakana words and their source words. It is also interesting that variant katakana spellings, as with エンターテイメント and インタフェース, are captured like this:

entertainment,エンターテイメント,エンターテインメント
interface,インタフェース,インターフェース
pennsylvania,ペンシルバニア,ペンシルベニア

Used well, this should help considerably in improving search recall.

2015.01.15 Thursday
How Lucene behaves at commit time

I came across a mail (in English) that describes in detail what happens when a Lucene index is committed, so I am posting a translation here. In the mail, Lucene committer Uwe explains the recent situation in the Lucene community, where a recent Java 9 commit stopped fsync() on directories from working, and at the end asks the community for a solution.
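For orientation, the commit sequence the mail describes (write once under a temporary name, fsync the file, publish with an atomic rename, then fsync the directory) can be sketched with plain java.nio as below. This is an illustration only, not Lucene's code: the class name, the method names and the "pending_" prefix are hypothetical. The directory fsync is attempted on a best-effort basis, since opening a directory as a FileChannel is not documented by the Java API and fails on some platforms.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class DurableCommit {
    static void commit(Path dir, String finalName, byte[] data) throws IOException {
        Path tmp = dir.resolve("pending_" + finalName); // hypothetical temporary name
        Path dst = dir.resolve(finalName);

        // 1. Write the file exactly once under the temporary name and fsync it.
        try (FileChannel fc = FileChannel.open(tmp,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            ByteBuffer buf = ByteBuffer.wrap(data);
            while (buf.hasRemaining()) {
                fc.write(buf);
            }
            fc.force(true);
        }

        // 2. Publish the commit with an atomic rename.
        Files.move(tmp, dst, StandardCopyOption.ATOMIC_MOVE);

        // 3. Try to fsync the directory so that the rename itself reaches the disk.
        try (FileChannel dc = FileChannel.open(dir, StandardOpenOption.READ)) {
            dc.force(true);
        } catch (IOException e) {
            // Best effort: not supported everywhere (for example on Windows).
        }
    }

    /** Runs one commit in a fresh temporary directory and returns what was written. */
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("commit-demo");
            commit(dir, "segments_1", "hello".getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(dir.resolve("segments_1")),
                    StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```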
Regarding a recent Java 9 commit (e5b66323ae45) that breaks fsync on directories

Hi,

I am writing on behalf of the Apache Lucene committers. As you know, early testing of Apache Lucene, Solr and Elasticsearch has turned up problems, mainly around Hotspot. We therefore updated our test infrastructure to use JDK 9 preview build 40, mainly in order to check things around Jigsaw. Try as we might, we found no problems with that build.

Unfortunately, a recent commit to OpenJDK 9 does cause a problem (http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/e5b66323ae45). The issue itself is https://bugs.openjdk.java.net/browse/JDK-8066915. We have also opened https://issues.apache.org/jira/browse/LUCENE-6169 on our side.

First, some background. Apache Lucene takes a write-once approach (every file is written exactly once). When Lucene "commits" at a "commit point", it writes to a temporary file name and then runs fsync on this file and on all related files. With a file channel this is simply a call to fc.force(). The final "publish" of the commit is an atomic rename using "Files.move(Path, Path, StandardCopyOption.ATOMIC_MOVE)".

This works well unless there is a major failure such as a power outage. When that happens, on POSIX operating systems the rename may not be reflected correctly. The Linux man page (http://linux.die.net/man/2/fsync) says:

"Calling fsync() does not necessarily ensure that the entry in the directory containing the file has also reached disk. For that an explicit fsync() on a file descriptor for the directory is also needed."

Basically, Apache Lucene currently does the same thing: it opens a FileChannel on the directory (for READ) and calls fc.force(). Of course, although we have long known that this works, the Java API does not actually document it, so we are relying on an "educated guess". Where it does not work, such as on Windows (*), an exception such as IOException is always thrown.

The problem now is that, with the commit mentioned above, this approach fails on OpenJDK 9 with a FileSystemException because the path is a directory. Lucene releases are not affected, because they tolerate any exception at this point, but in our test infrastructure we want this to work at least on Linux and MacOSX.

The latest code is at http://goo.gl/vKhtsW.

We would very much like to keep being able to fsync a directory on the operating systems that support it. We do not want the commit above to be backported to the 8u40 and 7u80 releases. For Java 9, an alternative could be discussed.

Please let us know your opinions and any advice on how to deal with this.

Uwe

(*) The semantics of the Windows file system ensure that the file is reliably visible, so in practice this is not a problem there (opening a directory for reading results in "Access Denied", which is mapped to a Java IOException, but this is harmless).

2014.12.19 Friday
Extracting technical terms with Solr

This post introduces a simple way to extract technical terms using Solr.

What is a technical term?

According to Wikipedia, "technical terms" are the words and terminology used and understood only among people in a particular occupation, a particular academic field, or a particular industry. In this article, then, extracting technical terms means the following: compare an index that holds documents from one field with an index that holds documents from another field, and display the set of words in each that the other index does not contain.

Preparing Solr and the index

The method introduced here should work with Solr 1.4 or later. If you do not have Solr, download it. Then set up the following schema.xml and start Solr.

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="example" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
      <analyzer>
        <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
        <filter class="solr.JapaneseBaseFormFilterFactory"/>
        <filter class="solr.CJKWidthFilterFactory"/>
        <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="url" type="string" indexed="true" stored="true" required="true" />
    <field name="cat" type="string" indexed="true" stored="true"/>
    <field name="date" type="date" indexed="true" stored="true"/>
    <field name="title" type="text_ja" indexed="true" stored="true"/>
    <field name="body" type="text_ja" indexed="true" stored="false" multiValued="true"/>
  </fields>
  <uniqueKey>url</uniqueKey>
</schema>

For document data I use the livedoor news corpus. Fetch it with wget and unpack it as follows.

$ wget http://www.rondhuit.com/download/livedoor-news-data.tar.gz
$ tar xvzf livedoor-news-data.tar.gz

Then start Solr and register one of the unpacked files. As an example, I use "Dokujo Tsushin" (dokujo-tsushin.xml) here.

$ ./post.sh dokujo-tsushin.xml

Getting the word list with TermsComponent

Next, use Solr's TermsComponent to get a list of the words in the index. The following curl command retrieves the top 10,000 words of the body field in descending order of frequency.

$ curl "http://localhost:8983/solr/collection1/terms?terms.fl=body&terms.limit=10001&omitHeader=true&wt=json&indent=true" | tail -n 10001|head -n 10000|cut -d , -f1|cut -d \" -f2 > dokujo-tsushin.txt
$ head -n 30 dokujo-tsushin.txt
[output: the 30 most frequent words, mostly Japanese particles and auxiliary verbs; not legible in this copy]

The head command shows the 30 most frequent words. As you can see from schema.xml, no stop word processing of any kind is applied, so the list consists of words that are neither technical terms nor of much use for search.

Next, register documents from a different field and list the words in the same way. To do so, stop Solr, delete the index, start Solr again, and register documents from another field (here, as an example, "Sports Watch" (sports-watch.xml)). Then get the word list with TermsComponent as before.

$ ./post.sh sports-watch.xml
$ curl "http://localhost:8983/solr/collection1/terms?terms.fl=body&terms.limit=10001&omitHeader=true&wt=json&indent=true" | tail -n 10001|head -n 10000|cut -d , -f1|cut -d \" -f2 > sports-watch.txt
$ head -n 30 sports-watch.txt
[output: the 30 most frequent words, again mostly particles and auxiliary verbs; not legible in this copy]

Comparing the two word sets

Finally, write a program that compares the two word sets and displays the differences. Here I wrote the following simple Scala program.

package terms

import scala.io.Source

object TermsExtractor {

  def main(args: Array[String]): Unit = {
    val set1 = termsSet("/Users/koji/work/ldcc/dokujo-tsushin.txt")
    val set2 = termsSet("/Users/koji/work/ldcc/sports-watch.txt")
    println(set1 &~ set2)  // words that appear only in the first file
    println(set2 &~ set1)  // words that appear only in the second file
  }

  // Reads one word per line and returns the words as a set.
  def termsSet(file: String): Set[String] = {
    val src = Source.fromFile(file, "UTF-8")
    try {
      src.getLines().toSet
    } finally {
      src.close()
    }
  }
}

Running it prints two lines like the following. The first line holds the words that appear only in "Dokujo Tsushin", and the second the words that appear only in "Sports Watch". The stop-word-like words that were so frequent have disappeared completely.

[Set of words unique to "Dokujo Tsushin", truncated in the original; the Japanese entries are not legible in this copy. Legible entries include: sweet, ntt]
[Set of words unique to "Sports Watch", truncated in the original; the Japanese entries are not legible in this copy. Legible entries include: 205, 2014, 98, 113, beautiful, sunday, cb, mmaplanet, brad, wbc, pm]

It would be an overstatement to call these the technical terms of each field. Still, if someone were shown only these two lines and told that one is "Dokujo Tsushin" and the other "Sports Watch", almost everyone would match them correctly, so I think the lists do have the flavor of technical terms.

2014.12.04 Thursday
Introducing word2vec for Lucene

I have developed word2vec for Lucene, which treats a Lucene index as its corpus, so let me introduce how to use it. word2vec is an OSS tool, proposed and developed by Tomas Mikolov and colleagues, that converts words into vectors. The original word2vec reads a text file written in natural language and outputs word vectors. word2vec for Lucene uses a Lucene index as the input corpus instead of a text file.

The text file handled by the original word2vec must be segmented into words. This is no problem for English text, but for languages such as Japanese that are not written with spaces between words, you have to prepare a word-segmented text file beforehand using a morphological analyzer such as MeCab or a segmentation tool. Even for English, preprocessing (normalization) is needed, such as lowercasing and removing the periods and commas that directly follow a word. Because word2vec for Lucene uses a Lucene index as the input corpus, none of that effort is needed.

Below I explain how to run the word2vec for Lucene demo so that you can try it easily. The rough steps are:

1. Download word2vec for Lucene
2. Prepare the demo data
3. Prepare the Lucene/Solr index (register the demo data)
4. Create the word vectors
5. Play with the word vectors

If you already have a Lucene index, you may skip steps 2 and 3 and specify your own Lucene index in step 4. word2vec for Lucene was built on Lucene 4.10.2, but (although I have not tried it) it should work with most Lucene 4.x indexes. So you can specify an index created by Solr 4.x, of course, and also one created by Elasticsearch (I do not know the versions well, but the 1.x line or later).

To get enjoyable results in step 5 you need a reasonably large corpus. If you have a Lucene index with several million documents, please do try word2vec for Lucene.

The steps below assume that the Solr environment and the word2vec for Lucene environment are installed under the /Users/koji/work directory.

Step 1. Download word2vec for Lucene

$ pwd
/Users/koji/work
$ git clone https://github.com/kojisekig/word2vec-lucene.git

Step 2. Prepare the demo data

$ pwd
/Users/koji/work
$ cd word2vec-lucene
# Download text8, the English corpus also used by the original word2vec, and convert it to Solr format
$ ant t8-solr
$ ls -l text8*
-rw-r--r-- 1 koji staff 100000000 6 9 2006 text8
-rw-r--r-- 1 koji staff 100017005 12 3 13:44 text8.txt
-rw-r--r-- 1 koji staff 100017078 12 3 13:44 text8.xml
-rw-r--r-- 1 koji staff 31344016 12 3 13:40 text8.zip

Step 3. Prepare the Lucene/Solr index (register the demo data)

Open two consoles and start Solr in the first one.

$ pwd
/Users/koji/work
# Download Solr
$ wget http://ftp.meisei-u.ac.jp/mirror/apache/dist/lucene/solr/4.10.2/solr-4.10.2.tgz
$ tar xvzf solr-4.10.2.tgz
# Start Solr
$ cd solr-4.10.2/example
$ java -Dsolr.solr.home=/Users/koji/work/word2vec-lucene/solrhome -Dsolr.dir=/Users/koji/work/solr-4.10.2 -jar start.jar

In the second console, register the text8 corpus in Solr.

$ pwd
/Users/koji/work
$ cd word2vec-lucene
# Register the English corpus, converted to Solr format, in Solr
$ ./post.sh collection1 text8.xml

Step 4. Create the word vectors

Pass the Solr core name to the demo-word2vec.sh script to run word2vec. The result is written to a file named vectors.txt.

$ pwd
/Users/koji/work
$ cd word2vec-lucene
$ ./demo-word2vec.sh collection1
$ ls -l vectors.txt
-rw-r--r-- 1 koji staff 136053041 12 3 15:31 vectors.txt

Step 5. Play with the word vectors

Running demo-distance.sh reads vectors.txt and waits for input. Enter a word, and it displays the 40 words whose vectors are closest to that word (by cosine).

$ ./demo-distance.sh
cat
Word: cat  Position in vocabulary: 2601

Word    Cosine distance
------------------------------------------------------------------------
cats        0.511078
dog         0.471308
dogs        0.469539
sighthound  0.452233
bobtail     0.436424
tapir       0.424105

Running demo-analogy.sh also reads vectors.txt and waits for input, so enter three words separated by spaces. Below I enter man king woman, which means "what do you get when the relation of king to man is applied to woman?" The expected answer is queen, and that is what the run produces.

$ ./demo-analogy.sh
man king woman

Trying it with Japanese

The steps above are an example using text8, the normalized English corpus also used by the original word2vec. Next, let me show an example using a Japanese corpus (which is not normalized). The demo data is the livedoor news corpus. Download the livedoor news corpus, processed into Solr native format, from the RONDHUIT site.

$ pwd
/Users/koji/work
$ cd word2vec-lucene
$ mkdir work
$ cd work
$ pwd
/Users/koji/work/word2vec-lucene/work
$ wget http://www.rondhuit.com/download/livedoor-news-data.tar.gz
$ tar xvzf livedoor-news-data.tar.gz
$ cd ..

Register the downloaded livedoor news corpus in Solr and run word2vec. Specify ldcc as the core name, and specify the Lucene Analyzer class with the -a option. The example below also uses the -f option to name the output vector file (if you omit it, the default vectors.txt is used and the English word vector file created earlier is overwritten).

$ ./post.sh ldcc work/*.xml
$ ./demo-word2vec.sh ldcc -a org.apache.lucene.analysis.ja.JapaneseAnalyzer -f vectors-ldcc.txt
$ ./demo-distance.sh -f vectors-ldcc.txt
結婚

Trying it with a Lucene index other than the demo data

As you can see from demo-word2vec.sh, the demo script distributed with word2vec for Lucene, several parameters are hard-coded, including those related to Lucene. If you adjust them to match your own Lucene index, word2vec should run on the data of any Lucene 4.x index.
Regarding the -analyzer option: term vectors could have been used instead of stored data, but few Lucene indexes store term vectors and I also had doubts about the speed, so the stored data is analyzed (segmented into words) with the Lucene Analyzer class specified here. As for why the field must be indexed: word2vec (the original implementation too) reads the corpus twice and builds the vocabulary table on the first pass, and if the field is indexed this pass completes instantly. It is, after all, a very Lucene-like implementation. Also, the Lucene fields you would want to analyze with word2vec are usually both indexed and stored, so most Lucene index owners should find this condition acceptable.

If you already have a Lucene index with a large number of documents, please check that the demo data works and then try word2vec for Lucene on your own index. I would appreciate hearing what results you get.

2014.01.30 Thursday
Trying out brat, a text annotation tool
brat is an OSS text annotation tool. Its GUI (visualization), the clarity of the resulting annotation data (the format is simple, so it looks easy to handle from NLP programs), and its ease of installation are excellent, so let me introduce it here.

Installing brat

There are two ways to use brat: running it on a web server such as Apache for serious use, and starting it up quickly for personal use. Here I introduce the latter. In that case installation is simple: download brat into a suitable directory and unpack it.

$ mkdir work
$ cd work
$ wget http://weaver.nlplab.org/~brat/releases/brat-v1.3_Crunchy_Frog.tar.gz
$ tar xvzf brat-v1.3_Crunchy_Frog.tar.gz

Then run the installation script and answer its questions, and you are ready.

$ cd brat-v1.3_Crunchy_Frog
$ ./install.sh
# Set a new user name for logging in to brat.
Please enter the user name that you want to use when logging into brat
editor
# Set the password.
Please enter a brat password (this shows on screen)
annotate
# Set the mail address.
Please enter the administrator contact email
[email protected]

Preparing the corpus

Next, place the corpus to be annotated under the data directory. Here I show an example using the livedoor news corpus.

$ cd data
$ wget http://www.rondhuit.com/download/ldcc-20120915.tar.gz
$ tar xvzf ldcc-20120915.tar.gz

Corpus files must be saved in UTF-8 and have the .txt extension. In addition, an empty .ann file corresponding to each .txt file to be annotated must be prepared beforehand, as follows.

$ find text -name '*.txt' | sed -e 's|\.txt|.ann|g' | xargs touch

Setting up named entity tags and starting brat

brat can also attach structural annotations such as dependencies, but this article shows an example of attaching named entity tags. Several named entity tags are configured by default, but here I set up my own. Directly under the text directory created when the livedoor news corpus tar was unpacked, create two configuration files as follows.

$ cd text
$ cat > annotation.conf <<EOF
> [entities]
> ORGANIZATION
> FACILITY
> PERSON
> TITLE
> LOCATION
>
> [relations]
> [events]
> [attributes]
> EOF
$
$ cat > visual.conf <<EOF
> [labels]
>
> [drawing]
> ORGANIZATION bgColor:#8fb2ff
> FACILITY bgColor:#aaaaee
> PERSON bgColor:#ffccaa
> TITLE bgColor:#7fe2ff
> LOCATION bgColor:#6fffdf
> EOF

The first configuration file defines the tags, and the second defines the tag colors. This is the bare minimum configuration. If you want finer control, there are configuration files directly under the brat directory that you can refer to.

Once this setup is done, brat can be started in standalone mode. brat is written in Python, so a Python environment is required.

$ cd ../..
$ python standalone.py

Accessing brat

Open the URL shown at startup, http://127.0.0.1:8001, in a browser. A screen like the following appears.

[Screenshot: brat start screen]

After logging in, select a corpus collection. To do so, press the Tab key or choose [Collection] from the menu at the top. The corpus collection selection screen then appears.

[Screenshot: corpus collection selection screen]

At this point the annotation information has already been recorded on the back end, in the corresponding .ann file, in a simple format like this:

$ cat movie-enter-5840081.ann
T1 PERSON 81 86 宮崎あおい
T2 PERSON 87 90 …
T3 TITLE 1046 1048 …
T4 PERSON 447 450 …
T5 PERSON 975 977 宮崎
T6 TITLE 977 979 …
T7 PERSON 980 981 …
T8 TITLE 981 983 …
T9 PERSON 1021 1023 宮崎
T10 PERSON 1196 1197 …
T11 PERSON 1245 1247 宮崎
T12 PERSON 1254 1255 …
T13 PERSON 947 949 宮崎

The annotations are recorded in the order in which the tags were attached, so a program that reads the file will probably need to sort them by start position.
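Reading such a file and sorting by start position can be sketched as follows. This is a minimal illustration, not part of brat: the class and method names are hypothetical, it handles only text-bound annotation lines (IDs starting with T) with a single contiguous span, and it splits on any whitespace so that both tab-separated and space-separated lines are accepted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AnnReader {
    static final class Entity {
        final String id;
        final String type;
        final int start;
        final int end;
        final String text;

        Entity(String id, String type, int start, int end, String text) {
            this.id = id;
            this.type = type;
            this.start = start;
            this.end = end;
            this.text = text;
        }
    }

    /** Parses one line such as "T1 PERSON 81 86 text"; returns null for other lines. */
    static Entity parseLine(String line) {
        String[] p = line.trim().split("\\s+", 5);
        if (p.length < 5 || !p[0].startsWith("T")) {
            return null;
        }
        try {
            return new Entity(p[0], p[1],
                    Integer.parseInt(p[2]), Integer.parseInt(p[3]), p[4]);
        } catch (NumberFormatException e) {
            return null; // e.g. a discontinuous span, which this sketch does not handle
        }
    }

    /** Parses all lines and returns the entities ordered by start offset. */
    static List<Entity> parseSorted(List<String> lines) {
        List<Entity> entities = new ArrayList<>();
        for (String line : lines) {
            Entity e = parseLine(line);
            if (e != null) {
                entities.add(e);
            }
        }
        entities.sort(Comparator.comparingInt((Entity e) -> e.start));
        return entities;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        // IDs, types and offsets from the listing above; the text column is a placeholder.
        lines.add("T3 TITLE 1046 1048 xx");
        lines.add("T4 PERSON 447 450 xxx");
        lines.add("T1 PERSON 81 86 xxxxx");
        for (Entity e : parseSorted(lines)) {
            System.out.println(e.id + " " + e.type + " " + e.start + " " + e.end);
        }
    }
}
```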
2014.01.20 Monday
Heliosearch/Solr off-heap filters

(The following text is a translation, posted with the permission of its author, Yonik.)

Off-heap (native) filters are a newly added feature of Heliosearch, a new open source project that improves the performance of Solr.

The problem with large JVM heaps

In general, the JVM is not good at handling large heaps. The larger the heap, the harder garbage collection becomes, and GC pauses often stop the system for long periods during which no other processing can be done. This can cause query and request timeouts, and in SolrCloud mode even zookeeper session timeouts.

Off-heap filters

Heliosearch/Solr already provides advanced filter caching, but depending on the application, filters may use a large amount of memory. Large, long-lived objects are exactly the ones that are worth moving out of the JVM heap and managing explicitly, because off-heap memory is invisible to the garbage collector.

Heliosearch filters (Solr DocSet objects) are now allocated off-heap and reference counted so that they can be freed as soon as they are no longer needed. The JVM GC also no longer has to waste time copying these memory blocks around. This helps eliminate long GC pauses and improves request throughput.

The test

I had expected that reproducing long GC pauses in a test would take quite a bit of work, but they showed up on the very first attempt, even though the heap was not particularly large. The larger the heap, the longer the GC pauses are likely to become.

Test details

Apache Solr command line
$ java -jar -Xmx4G start.jar

Heliosearch/Solr command line
$ java -jar start.jar

When running Apache Solr, the heap size had to be set to 4 GB to avoid OOM exceptions. The machine had at most 8 GB of RAM, so I wanted to leave the remaining memory to the OS as cache for the index files (otherwise things slow down).

GC results

The graphs below show the GC behavior while 20,000 requests were issued. Gray shows the time spent in GC, and the other two lines show the actual heap size and the amount of heap actually in use.

Solr

[Graph: GC activity for Solr]

With Solr, the long pauses were very easy to see from the outside during the test. With logging of every request enabled, the terminal scrolled rapidly, and whenever the GC performed a large compaction the scrolling in the terminal stopped dead.

Heliosearch

[Graph: GC activity for Heliosearch]

Because the garbage collector has less work to do, the Heliosearch run finishes sooner. Note that long full GC pauses almost never occur and that the other GC pauses are greatly reduced as well.

The next graph shows the latency, in percentiles, of the second 10,000 of the 20,000 queries (chosen so that HotSpot and the caches have settled).

[Graph: request latency percentiles]

Resource usage of the process (as reported by top) was measured five times.

This simple test showed that off-heap filters reduce large outliers, raise overall query throughput, eliminate long GC pauses, and make request latency easier to predict.

Comments? Please share your thoughts on the Heliosearch Forum.