Out of nowhere, but: are there any naughty kids out there using mecab's dictionary (mecab-ipadic) exactly as it ships and then grumbling that mecab is surprisingly useless?
mecab-ipadic is built on relatively well-behaved Japanese, so out of the box it often struggles with the colloquial text found on the web. The proper approach is probably to prepare training data and retrain, but simply beefing up the nouns already makes it a good deal more practical.
Human languages have the property that new vocabulary shows up daily among verb stems and nouns, while particles and conjugation rules do not change easily. In particular, for tallies like a "most-tweeted words right now" ranking, you can often get decent results as long as the noun spans are cut out correctly.
Adding words to the dictionary is easy, as described here, but I suspect many people stumble when it comes to deciding each word's occurrence cost.
So I decided to publish the framework for beefing up mecab dictionaries that we have been using in-house for a while. It can build user dictionaries from Wikipedia data, emoticon (kaomoji) dictionaries, and so on.
mecab-dic-overdrive
By writing a subclass of GenDic.pm, you can read words from input data in all sorts of formats, have a (reasonably) appropriate word occurrence cost estimated automatically, and get a user dictionary file generated for you. Out of the box it can build user dictionaries from the Japanese Wikipedia's jawiki-latest-page.sql.gz and from a tsv for emoticon dictionaries.
Several similar scripts and articles are already out there, so I wasn't sure this was worth publishing. But as explained below, the way it computes word occurrence costs, and the way it manages the dictionary together with normalization, seemed to have a bit of originality, or at least enough to justify releasing it. I hope it is of some use.
mecab-dic-overdrive features
Converting the dictionary to utf-8
You don't strictly need to convert ipadic itself to utf-8 to use mecab, but utf-8 is more convenient when writing the dictionary patches described next or when referring to the dictionary from other programs, so the character encoding is converted first.
Applying patches to the dictionary
misc/dic/*.patch contains a few patches against ipadic. They include changes that make alphanumerics such as "A" and "B" less likely to be split off on their own, and patches that make "ゎ", "ょ" and the like recognized as particles. If you want to make further changes of your own, write a *.patch file (in utf-8) and put it here; it will be applied automatically.
Normalizing the dictionary
To get the most out of the dictionary, you need to
- unify alphabetic characters to lowercase
- strip HTML tags
- decode HTML entities
- unify the use of wave dashes and long-vowel marks
- apply NFKC normalization
- etc.
and otherwise use a variety of techniques to absorb spelling variation. It is also important to apply the same normalization both when building the dictionary and when analyzing text.
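To make the "same normalization on both sides" point concrete, here is a minimal sketch. It is not code from mecab-dic-overdrive: `normalize_text` and `in_dictionary` are names made up for illustration, and only the NFKC and lowercase steps are modeled. The dictionary keys are normalized when the toy dictionary is built, so a raw full-width "ＣＭ" only matches after it has gone through the same function.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

# Hypothetical helper covering only the nfkc and lc steps.
sub normalize_text {
    my ($text) = @_;
    return lc(NFKC($text));
}

# Toy dictionary: surfaces are normalized once, when the dictionary is built.
my %dictionary;
$dictionary{ normalize_text($_) } = 1 for ("\x{FF23}\x{FF2D}", 'Wikipedia');

sub in_dictionary {
    my ($surface) = @_;
    return exists $dictionary{$surface} ? 1 : 0;
}

my $raw            = "\x{FF23}\x{FF2D}";    # full-width "CM" as typed by a user
my $hit_raw        = in_dictionary($raw);                    # raw text is looked up as-is
my $hit_normalized = in_dictionary(normalize_text($raw));    # same normalization as build time
print "raw lookup: $hit_raw, normalized lookup: $hit_normalized\n";
```

The raw lookup misses and the normalized one hits, which is the whole reason the analysis side has to share the build-side settings.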
By default, the following normalization steps are applied in exactly this order. Everything other than NFKC and lc is a pile of hacky know-how. Settings such as the newline handling are harmless at dictionary-build time, but some of them strongly affect text containing emoticons and symbols, so always configure the same normalization that you use at analysis time.
- decode_entities : decode HTML entities into Unicode characters [ &hearts; → ♥ ]
- strip_single_nl : remove single newlines (two or more consecutive newlines are treated as a separator)
- wavetilde2long : replace wave dashes with the long-vowel mark [ プ〜 → プー ]
- fullminus2long : replace full-width minus signs with the long-vowel mark [ プ− → プー ]
- dashes2long : replace dashes in general with the long-vowel mark [ プ— → プー ]
- drawing_lines2long : replace horizontal box-drawing lines and the like with the long-vowel mark (see: [1] [2]) [ プ─ → プー ]
- unify_long_repeats : collapse a run of long-vowel marks into a single one [ プーーー → プー ]
- nfkc : NFKC normalization [ ﾌﾟﾌﾟ → ププ ]
- lc : unify alphabetic characters to lowercase [ ABC → abc ]
To change this, consult lib/MecabTrainer/NormalizeText.pm and edit etc/config.pl. You can also check the result of normalization with bin/normalize_text.pl.

    >bin/normalize_text.pl
    ｷﾀ━━━━━━(ﾟ∀ﾟ)━━━━━━ !!!!!
    キター(゚∀゚)ー !!!!!
    >bin/normalize_text.pl --normalize_opts=decode_entities,nfkc
    ㍖ &frac12;
    レントゲン 1⁄2
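The ordered pipeline can be pictured roughly as follows. This is only a sketch under assumptions: the regular expressions are my guesses at what a few of the named steps do, not the contents of NormalizeText.pm, and `normalize_text` plus the `%step` table are hypothetical names. The point it illustrates is that the steps are plain text-to-text functions applied in a fixed order, so the line characters are turned into long-vowel marks and collapsed before NFKC and lc run.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKC);

my $LONG = "\x{30FC}";    # KATAKANA-HIRAGANA PROLONGED SOUND MARK

# Simplified stand-ins for some of the named steps (assumed behaviour).
my %step = (
    wavetilde2long     => sub { (my $t = shift) =~ s/[\x{301C}\x{FF5E}]/$LONG/g; $t },
    drawing_lines2long => sub { (my $t = shift) =~ s/[\x{2500}\x{2501}]/$LONG/g; $t },
    unify_long_repeats => sub { (my $t = shift) =~ s/(?:$LONG){2,}/$LONG/g; $t },
    nfkc               => sub { NFKC($_[0]) },
    lc                 => sub { lc($_[0]) },
);

# Apply the named steps one after another, in the order given.
sub normalize_text {
    my ($text, @names) = @_;
    for my $name (@names) {
        my $code = $step{$name} or die "unknown normalizer: $name";
        $text = $code->($text);
    }
    return $text;
}

my @default_order = qw(wavetilde2long drawing_lines2long unify_long_repeats nfkc lc);

binmode STDOUT, ':encoding(UTF-8)';
# Katakana PU + two wave dashes + " ABC"
print normalize_text("\x{30D7}\x{301C}\x{301C} ABC", @default_order), "\n";
```

Keeping the order in a plain list like `@default_order` makes it easy to pass the identical list to both the dictionary builder and the analysis side.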
Automatic assignment of word occurrence costs
The problem when registering new words is computing the word occurrence cost mentioned above. Let's take the product name "鼻セレブ" (Hana Celeb) as an example and think about how to tune the word occurrence cost.
An example of someone who buys nothing but 鼻セレブ (rabbit edition only)
Finding the threshold at which a word appearing on its own just barely avoids being split
If you use mecab with the stock dictionary to analyze a sentence consisting only of "鼻セレブ", "鼻" and "セレブ" are recognized as separate words, as shown below.
Morpheme | Connection cost | Word occurrence cost | Cumulative cost |
---|---|---|---|
BOS | - | 0 | 0 |
 | -283 | - | -283 |
鼻 (noun/general) | - | 6033 | 5750 |
 | 62 | - | 5812 |
セレブ (noun/general) | - | 9461 | 15273 |
 | -573 | - | 14700 |
EOS | - | 0 | 14700 |
(BOS marks the beginning of the sentence and EOS the end.)
So let's set the goal that when the word "鼻セレブ" appears as a sentence on its own, it is not split any further.
First, add the morpheme "鼻セレブ (proper noun/general)" to the dictionary. Then tune its word occurrence cost so that mecab decides that "treating 鼻セレブ as a single unit has a lower total cost than splitting it into 鼻 + セレブ".
In other words,
Morpheme | Connection cost | Word occurrence cost | Cumulative cost |
---|---|---|---|
BOS | - | 0 | 0 |
 | -310 | - | -310 |
鼻セレブ (proper noun/general) | - | *1 | * |
 | -919 | - | * |
EOS | - | 0 | *2 |
The task is to solve a fill-in-the-blank problem: what should *1 be so that *2 comes out below 14700? With connection costs of -310 and -919, *2 = *1 - 1229, so setting *1 to 15928 or less makes the total cost lower than the 14700 of "鼻 + セレブ".
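The arithmetic behind this fill-in-the-blank problem can be written down as a short sketch. The function names `path_cost` and `max_unsplit_word_cost` are made up for illustration; the numbers are the ones from the two tables. A path's total is just the sum of the connection costs and word costs along it, and the new word wins when its own path total is strictly smaller than that of the split path.

```perl
use strict;
use warnings;
use utf8;

# Total cost of a path: sum of all connection costs and word costs along it.
sub path_cost {
    my (@costs) = @_;
    my $total = 0;
    $total += $_ for @costs;
    return $total;
}

# Largest word cost that still beats the split path when the word stands
# alone between BOS and EOS (strictly lower total, hence the "- 1").
sub max_unsplit_word_cost {
    my ($split_total, $conn_from_bos, $conn_to_eos) = @_;
    return $split_total - $conn_from_bos - $conn_to_eos - 1;
}

# BOS -> 鼻 -> セレブ -> EOS with the stock dictionary
my $split_total = path_cost(-283, 6033, 62, 9461, -573);
# BOS -> 鼻セレブ -> EOS: connection costs -310 and -919
my $limit = max_unsplit_word_cost($split_total, -310, -919);

print "split path: $split_total, word cost limit: $limit\n";
```

With the values above this reproduces the 14700 and 15928 quoted in the text.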
※1 When other morphemes are attached before and after, as in 「明日の鼻セレブ祭りは中止です」 ("Tomorrow's Hana Celeb festival is cancelled"), the connection costs on either side change. The rule "don't get split when appearing as a standalone sentence (between BOS and EOS)" is nothing more than an arbitrary criterion.
※2 Every so often I see articles that follow the Auto Link example found here and use the formula cost = (int)max(-36000, -400 * (length^1.5)) as-is. That formula was written on the assumption that mecab is used exclusively for AutoLink with that dictionary alone, and I think the baseline no longer fits once you mix it with ipadic. The occurrence costs in ipadic are positive numbers of up to about four digits, whereas this formula yields negative costs, so the user dictionary entries will almost always win regardless of context. (Of course, if that is what you intend, that's fine.)
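To see how far the quoted Auto Link formula sits from ipadic's positive costs, here is a small sketch that simply evaluates it for a few lengths. `autolink_cost` is a name chosen here for illustration; the formula itself is the one quoted in ※2.

```perl
use strict;
use warnings;
use List::Util qw(max);

# cost = (int)max(-36000, -400 * (length^1.5)), as quoted in the note above.
sub autolink_cost {
    my ($length) = @_;
    return int( max(-36000, -400 * $length ** 1.5) );
}

# Every length gives a negative cost, bottoming out at the -36000 floor.
printf "%3d chars: %d\n", $_, autolink_cost($_) for (1, 4, 10, 100);
```

Since 4^1.5 = 8, a 4-character word such as "鼻セレブ" would get about -3200 here, compared with the 7989 to 15928 range discussed in this article, which is why such entries dominate whatever surrounds them.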
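Precomputing the average cost of morphemes with the same part of speech and the same length from the existing dictionary
Separately from the above, there is a somewhat simpler way to get a ballpark word occurrence cost.
For example, extract only the "proper noun/general" words from the existing ipadic and take the average word occurrence cost for each word length.
Length (characters) | Average cost |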
---|---|
1 | 8998 |
2 | 8242 |
3 | 8339 |
4 | 7989 |
5 | 6947 |
... | ... |
10 | 5038 |
... | ... |
Build this table in advance, and when registering a new word, apply the average of the existing morphemes with the same part of speech and the same length. "鼻セレブ" is 4 characters long, so it gets a word occurrence cost of 7989. It's crude, but I think it's a good deal better than doing nothing.
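Building such a table might look like the sketch below. It assumes the ipadic CSV column order (surface, left context ID, right context ID, cost, then the part-of-speech fields) and uses a handful of made-up rows: the context IDs are placeholders and the costs are invented, so the output is not the real table. `average_cost_by_length` is a hypothetical name, and the naive `split` would need replacing with a real CSV parser for surfaces that contain commas.

```perl
use strict;
use warnings;
use utf8;

# Made-up sample rows in the ipadic CSV layout:
# surface,left-id,right-id,cost,pos,pos1,pos2,pos3,ctype,cform,base,reading,pronunciation
my @rows = (
    'ニコ,0,0,8000,名詞,固有名詞,一般,*,*,*,ニコ,ニコ,ニコ',
    'ポニョ,0,0,9000,名詞,固有名詞,一般,*,*,*,ポニョ,ポニョ,ポニョ',
    'テゴマス,0,0,7000,名詞,固有名詞,一般,*,*,*,テゴマス,テゴマス,テゴマス',
    'ニコニコ,0,0,9000,名詞,固有名詞,一般,*,*,*,ニコニコ,ニコニコ,ニコニコ',
    'セレブ,0,0,9461,名詞,一般,*,*,*,*,セレブ,セレブ,セレブ',
);

# Average cost per surface length, restricted to rows whose leading
# part-of-speech fields equal @pos.
sub average_cost_by_length {
    my ($rows, @pos) = @_;
    my (%sum, %count);
    for my $row (@$rows) {
        my ($surface, undef, undef, $cost, @features) = split /,/, $row;
        next unless join(',', @features[0 .. $#pos]) eq join(',', @pos);
        my $len = length $surface;    # characters, thanks to "use utf8"
        $sum{$len}   += $cost;
        $count{$len} += 1;
    }
    my %average = map { ($_ => int($sum{$_} / $count{$_} + 0.5)) } keys %sum;
    return \%average;
}

my @target_pos = ('名詞', '固有名詞', '一般');
my $avg = average_cost_by_length(\@rows, @target_pos);
print "$_\t$avg->{$_}\n" for sort { $a <=> $b } keys %$avg;
```

A new word is then looked up by `length($surface)` in the resulting hash, with some fixed fallback for lengths that never occur in the existing dictionary.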
How mecab-dic-overdrive generates costs
mecab-dic-overdrive determines costs by combining these two methods. By default it takes the smaller of the following two values:
- the average cost of existing words with the same part of speech and the same length (※ if no existing word satisfies the condition, a predetermined fixed value is used)
- the "threshold cost at which the word, appearing on its own, just barely avoids being split further" shown above, x 0.7
(This behavior can be customized by editing GenDic.pm from around line 200.)
The former calculation refers to the dictionary's original csv files, and the latter to left-id.def, right-id.def and matrix.def, so the location of the mecab-ipadic source has to be set in the config.
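The default rule described above boils down to a `min` over two candidates. The following is a sketch of that rule only, not the code in GenDic.pm: `default_word_cost`, its argument names, the fallback value 5000 and the truncation with `int` are all assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Smaller of (a) the same-POS/same-length average, or a fixed fallback when
# no such average exists, and (b) 0.7 x the "just barely unsplit" limit.
sub default_word_cost {
    my (%arg) = @_;
    my $average = defined $arg{average} ? $arg{average} : $arg{fallback};
    my $scaled  = int($arg{unsplit_limit} * 0.7);
    return min($average, $scaled);
}

# Numbers from the 鼻セレブ example; the fallback is a made-up value.
my $cost = default_word_cost(average => 7989, unsplit_limit => 15928, fallback => 5000);
print "cost: $cost\n";
```

For "鼻セレブ" the scaled limit is 11149, so the average 7989 is the value that ends up being used.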
Using mecab-dic-overdrive
Installing the dictionary & building user dictionaries
(1) Prepare the required libraries in advance
- Install mecab itself and the following perl libraries beforehand
  - Text::MeCab
  - Unicode::Normalize
  - Unicode::RecursiveDowngrade
  - HTML::Entities
  - File::Spec
  - Path::Class
  - Log::Log4perl
- Get mecab-dic-overdrive itself from github
- Download mecab-ipadic-2.7.0-20070801 and unpack it. (With other versions, things may fail, for instance at the patch stage described earlier.)

    > git clone https://github.com/nabokov/mecab-dic-overdrive.git
    > tar -xvzf mecab-ipadic-2.7.0-20070801.tar.gz
(2) Configure config.pl / log.conf
Customize mecab-dic-overdrive/etc/config.pl for your environment. At a minimum, edit
- $HOME (the directory where mecab-dic-overdrive was unpacked)
- $DIC_SRC_DIR (the directory where mecab-ipadic-2.7.0-20070801 was unpacked)
If you want to change the normalization, edit default_normalize_opts, referring to the section on normalizing the dictionary above.
(Example) default_normalize_opts => [qw(decode_entities strip_html nfkc lc)],
To change where the log is written, or to change the log level, edit etc/log.conf.
(Example)

    log4perl.rootLogger=DEBUG, LOGFILE
    log4perl.appender.LOGFILE.filename=/path/to/log.txt
(3) Build a mecab-ipadic that is utf8-converted, normalized, and patched
>bin/initialize_dic.pl
This completes (1) converting the dictionary to utf-8, (2) applying patches to the dictionary, (3) normalizing the dictionary, and (4) compiling & installing the dictionary.
If you are told "make install failed", or if you want to keep the existing dictionary (/usr/local/lib/mecab/dic/ipadic) and install to a different location, copy the dictionary to another location by hand as shown below, and specify the dictionary directory with the -d option when invoking mecab.
(Example of manually installing into /usr/local/lib/mecab/dic/ipadic-utf8)

    >bin/initialize_dic.pl --noinstall
    >mkdir /usr/local/lib/mecab/dic/ipadic-utf8
    >cp [ipadic source directory]/*.bin /usr/local/lib/mecab/dic/ipadic-utf8/
    >cp [ipadic source directory]/*.def /usr/local/lib/mecab/dic/ipadic-utf8/
    >cp [ipadic source directory]/*.dic /usr/local/lib/mecab/dic/ipadic-utf8/
    >cp [ipadic source directory]/dicrc /usr/local/lib/mecab/dic/ipadic-utf8/

(After this, change dicdir = "/usr/local/lib/mecab/dic/ipadic" in etc/config.pl to "/usr/local/lib/mecab/dic/ipadic-utf8")
(When using mecab from the command line, specify the -d option)

    >mecab -d /usr/local/lib/mecab/dic/ipadic-utf8/
(4) Build a user dictionary from Wikipedia data
Get jawiki-latest-page.sql.gz from the Japanese Wikipedia dump site and save it under misc/dic, still as a .gz. (In environments where zcat/gzcat is unavailable, unpack it first. To change the file name or location, modify GenDic/WikipediaFile.pm as appropriate.)
>bin/generate_dic.pl --target=wikipedia_file
This reads the SQL file directly, extracts the article titles, and writes them as "proper noun/general" to the user dictionary file misc/dic/wikipedia.dic.
※ Because it parses the SQL statements directly by brute force, it may stop working if the Wikipedia dump format changes in the future. In that case a more reliable method is also available: load the data into a DB first and export from the DB (specify --target=wikipedia instead of --target=wikipedia_file). See GenDic/Wikipedia.pm for the detailed setup.
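For a feel of what "parsing the SQL directly" involves, here is a rough sketch. It is not the parser in GenDic/WikipediaFile.pm. It assumes only that each tuple in the dump starts with page ID, namespace and quoted title in that order; the sample statement is made up and its tuples are shortened to a few columns. `extract_titles` is a hypothetical name, and turning "_" into spaces is likewise an assumption.

```perl
use strict;
use warnings;
use utf8;

# Pull [page_id, title] pairs for the main namespace (0) out of one
# INSERT statement.
sub extract_titles {
    my ($line) = @_;
    my @found;
    while ($line =~ /\((\d+),(\d+),'((?:[^'\\]|\\.)*)',/g) {
        my ($id, $namespace, $title) = ($1, $2, $3);
        next unless $namespace == 0;
        $title =~ s/\\(.)/$1/g;    # undo SQL backslash escaping
        $title =~ tr/_/ /;         # dump titles use "_" in place of spaces
        push @found, [$id, $title];
    }
    return @found;
}

# Made-up, shortened sample of an INSERT statement.
my $sample = q{INSERT INTO `page` VALUES (1070057,0,'崖の上のポニョ','',0),(5,4,'井戸端','',0),(1347271,0,'ニコ動','',0),(7,0,'It\'s_a_test','',0);};

my @titles = extract_titles($sample);
binmode STDOUT, ':encoding(UTF-8)';
print "$_->[0]\t$_->[1]\n" for @titles;
```

A pattern like this depends entirely on the column order of the dump, which is exactly the fragility the note above warns about.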
(5) Build a user dictionary from an emoticon dictionary (optional)
You can build a user dictionary by reading the tsv files that are distributed in various places for use as emoticon dictionaries.
The input is read from misc/dic/kaomoji.tsv, so if there are emoticons you want to add, append them there and then run
>bin/generate_dic.pl --target=kaomoji
Each emoticon is written to misc/dic/kaomoji.dic as "symbol/general".
※ To give these entries higher priority than the symbol-like entries in wikipedia.dic, the word occurrence costs are computed using a mecab that has loaded the previously built wikipedia.dic. It therefore will not run without misc/dic/wikipedia.dic. To change this behavior, edit the defaults method in GenDic/Kaomoji.pm.
(6) Add other proper nouns and the like that are not in Wikipedia (optional)
If there are other nouns you want to add, list them in misc/dic/simple_list.txt, one per line, and run
>bin/generate_dic.pl --target=simple_list
They are all read as "noun/proper noun/general" and written to misc/dic/simple_list.dic.
Using the dictionaries you built
The user dictionaries built in steps (4)-(6) above can be used by specifying them with mecab's -u option.
>mecab -u misc/dic/wikipedia.dic,misc/dic/kaomoji.dic,misc/dic/simple_list.dic
(Before)

    > mecab
    崖の上のポニョニコ動で見た
    崖 名詞,一般,*,*,*,*,崖,ガケ,ガケ
    の 助詞,連体化,*,*,*,*,の,ノ,ノ
    上 名詞,非自立,副詞可能,*,*,*,上,ウエ,ウエ
    の 助詞,連体化,*,*,*,*,の,ノ,ノ
    ポニョニコ 名詞,一般,*,*,*,*,*
    動 名詞,一般,*,*,*,*,動,ドウ,ドー
    で 助詞,格助詞,一般,*,*,*,で,デ,デ
    見 動詞,自立,*,*,一段,連用形,見る,ミ,ミ
    た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    EOS

(After)

    >mecab -u misc/dic/wikipedia.dic
    崖の上のポニョニコ動で見た
    崖の上のポニョ 名詞,固有名詞,一般,*,*,*,崖の上のポニョ,Wikipedia:1070057
    ニコ動 名詞,固有名詞,一般,*,*,*,ニコ動,Wikipedia:1347271
    で 助詞,格助詞,一般,*,*,*,で,デ,デ
    見 動詞,自立,*,*,一段,連用形,見る,ミ,ミ
    た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    EOS
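Once the noun spans come out right, collecting them for something like a word ranking is straightforward. The sketch below parses output in MeCab's default format, where each line is the surface form, a tab, and the comma-separated features, and keeps the nouns. `nouns_from_mecab_output` is a made-up name and the sample text is typed in by hand rather than produced by running mecab.

```perl
use strict;
use warnings;
use utf8;

# Keep the surface of every token whose first feature field is 名詞 (noun).
sub nouns_from_mecab_output {
    my ($output) = @_;
    my @nouns;
    for my $line (split /\n/, $output) {
        next if $line eq 'EOS' or $line eq '';
        my ($surface, $features) = split /\t/, $line, 2;
        next unless defined $features;
        my ($pos) = split /,/, $features;
        push @nouns, $surface if $pos eq '名詞';
    }
    return @nouns;
}

# Hand-written sample in the default "surface<TAB>features" layout.
my $output = join "\n",
    "崖の上のポニョ\t名詞,固有名詞,一般,*,*,*,崖の上のポニョ,Wikipedia:1070057",
    "ニコ動\t名詞,固有名詞,一般,*,*,*,ニコ動,Wikipedia:1347271",
    "で\t助詞,格助詞,一般,*,*,*,で,デ,デ",
    "見\t動詞,自立,*,*,一段,連用形,見る,ミ,ミ",
    "た\t助動詞,*,*,*,特殊・タ,基本形,た,タ,タ",
    "EOS";

my @nouns = nouns_from_mecab_output($output);
binmode STDOUT, ':encoding(UTF-8)';
print join(' / ', @nouns), "\n";
```

Feeding the collected surfaces into a hash of counts is then all it takes to get a crude ranking.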
If you wrote the utf-8 ipadic to a location other than the default, also specify the -d option.
>mecab -d /usr/local/lib/mecab/dic/ipadic-utf8
Using the normalizer
As mentioned earlier, analysis is pointless unless the text goes through the same normalizer that was used when the dictionary was built, so when using the dictionaries you built, pass the text through an instance of MecabTrainer::NormalizeText as follows.

    use Encode;
    use MecabTrainer::NormalizeText;
    # same normalizer list as at dictionary-build time
    my $normalizer = MecabTrainer::NormalizeText->new( [qw(decode_entities strip_single_nl nfkc lc)] );
    # decode the raw utf8 bytes into characters, then normalize
    my $normalized_decoded_text = $normalizer->normalize( Encode::decode('utf8', $raw_input_text) );

For usage, see also the source of bin/normalize_text.pl.
On the command line, you can pipe text through bin/normalize_text.pl.
Here are a few usage examples together with the analysis results.

    > bin/normalize_text.pl | mecab -d /usr/local/lib/mecab/dic/ipadic-utf8/ -u misc/dic/wikipedia.dic,misc/dic/kaomoji.dic
    ひまなう(´・ω・｀)
    ^D
    ひま 名詞,一般,*,*,*,*,ひま,ヒマ,ヒマ
    なう 助詞,終助詞,*,*,*,*,なう,ナウ,ナウ
    ( ́・ω・`) 名詞,固有名詞,一般,*,*,*,( ́・ω・`),Wikipedia:700982
    EOS
    久々更新〜 お腹へったょ
    ^D
    久々 名詞,一般,*,*,*,*,久々,ヒサビサ,ヒサビサ
    更新 名詞,サ変接続,*,*,*,*,更新,コウシン,コーシン
    ー 記号,一般,*,*,*,*,─,─,─
    お腹 名詞,一般,*,*,*,*,お腹,オナカ,オナカ
    へっ 動詞,自立,*,*,五段・ラ行,連用タ接続,へる,ヘッ,ヘッ
    た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    ょ 助詞,終助詞,*,*,*,*,よ,よ,よ
    朝からテゴマスのあいCMやってた
    ^D
    朝 名詞,副詞可能,*,*,*,*,朝,アサ,アサ
    から 助詞,格助詞,一般,*,*,*,から,カラ,カラ
    テゴマスのあい 名詞,固有名詞,一般,*,*,*,テゴマスのあい,Wikipedia:2035668
    cm 名詞,一般,*,*,*,*,ＣＭ,シーエム,シーエム
    やっ 動詞,自立,*,*,五段・ラ行,連用タ接続,やる,ヤッ,ヤッ
    て 動詞,非自立,*,*,一段,連用形,てる,テ,テ
    た 助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
    EOS
    ( ﾟ∀ﾟ)ｱﾊﾊ八八ﾉヽﾉヽﾉヽﾉ ＼ / ＼/ ＼
    ^D
    ( ゚∀゚)アハハ八八ノヽノヽノヽノ / / 記号,一般,*,*,*,*,( ゚∀゚)アハハ八八ノヽノヽノヽノ / / \n
Writing a custom reader class
By creating a subclass under GenDic/, you can build a user dictionary from arbitrary input. Inherit from the MecabTrainer::GenDic class and describe
- how to open the input stream
- how to read the input one line at a time and parse it
- which part of speech and features to assign to the words that were read
and the parent class takes care of everything else, including computing the word occurrence costs and compiling the dictionary. For details, see the sources under the GenDic directory.
You can invoke the subclass you wrote by giving its name to the --target option of generate_dic.pl (with CamelCase converted to lowercase + "_").
(Example of specifying the subclass TestClass.pm)

    >bin/generate_dic.pl --target=test_class
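The naming convention between subclass files and --target values can be sketched as a pair of conversions. How generate_dic.pl actually resolves the name is not shown in this article, so `class_to_target` and `target_to_class` below are illustrative helpers only, written to match the examples given (TestClass and test_class, WikipediaFile and wikipedia_file).

```perl
use strict;
use warnings;

# "TestClass" -> "test_class": put "_" at each lower-to-upper boundary,
# then lowercase everything.
sub class_to_target {
    my ($class) = @_;
    (my $target = $class) =~ s/(?<=[a-z0-9])(?=[A-Z])/_/g;
    return lc($target);
}

# "test_class" -> "TestClass": capitalize each "_"-separated part and join.
sub target_to_class {
    my ($target) = @_;
    return join '', map { ucfirst($_) } split /_/, $target;
}

for my $class (qw(TestClass WikipediaFile Kaomoji SimpleList)) {
    my $target = class_to_target($class);
    printf "%-14s --target=%s\n", "$class.pm", $target;
}
```

A round trip through both functions is a quick way to check that a new subclass name will map to the target you expect.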