A framework for building MeCab user dictionaries from Wikipedia data and emoticon dictionaries


Out of the blue: are there any naughty kids out there who use MeCab's dictionary (mecab-ipadic) exactly as shipped and then complain that MeCab is surprisingly useless?

mecab-ipadic is built on relatively well-behaved Japanese, so out of the box it often cannot cope with the colloquial text found on the web. The proper approach is probably to prepare training data and retrain the model, but simply enriching the set of nouns already makes MeCab a good deal more practical.

Human languages have the property that new verb stems and nouns appear every day, while particles and conjugation rules do not change easily. In particular, for aggregations such as a ranking of "the words being tweeted most right now", you can often get reasonable results as long as you cut out the noun boundaries correctly.

Adding a word to the dictionary is easy in itself. The part where I suspect many people stumble is deciding the word's cost (生起コスト, the word occurrence cost).

So I decided to release the framework we have been using in-house for some time to beef up MeCab dictionaries. It can build user dictionaries from Wikipedia data, emoticon (kaomoji) dictionaries, and similar sources.

mecab-dic-overdrive

https://github.com/nabokov/mecab-dic-overdrive

By writing a subclass of GenDic.pm, you can read words from input data in various formats. The framework then estimates a (reasonably) appropriate word cost automatically and generates the user dictionary file. Out of the box it can build user dictionaries from the Japanese Wikipedia dump jawiki-latest-page.sql.gz and from a TSV file of emoticons.

Several similar scripts and articles have already been published, so I had thought there was little point in releasing this one. But, as described below, the way it computes word costs and the way it manages dictionaries together with normalization seemed to have some originality, or at least enough to make publishing worthwhile. I hope it is useful to someone.

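Features of mecab-dic-overdrive

Converting the dictionary to UTF-8

You do not strictly need to convert ipadic itself to UTF-8 in order to use MeCab. UTF-8 is more convenient, however, when writing the dictionary patches described next or when referring to the dictionary from other programs, so the first step is to convert the character encoding.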

Applying patches to the dictionary

misc/dic/*.patch contains several patches against ipadic. They include changes that make single alphanumeric characters such as "A" and "B" less likely to be split off on their own, and patches that make characters such as "ゎ" and "ょ" be recognized as particles. If you want to make further changes of your own, write a *.patch file (in UTF-8) and put it in this directory; it will be applied automatically.

Normalizing the dictionary

To make effective use of a dictionary, you need to absorb notational variants using a variety of techniques. It is also important to apply the same normalization both when building the dictionary and when analyzing text.

By default the following normalization steps are applied, in exactly this order. Everything other than NFKC and lc is a pile of bad know-how (ad hoc hacks). Settings such as newline handling are harmless when building the dictionary, but some of these steps strongly affect text containing emoticons and symbols, so always configure the same normalization that you use at analysis time.

  1. decode_entities : decode HTML entities into Unicode characters [ &hearts; → ♥ ]
  2. strip_single_nl : remove single newlines (two or more consecutive newlines are treated as a separator)
  3. wavetilde2long : replace the wave dash with the long vowel mark [ プ〜 → プー ]
  4. fullminus2long : replace the full-width minus sign with the long vowel mark [ プ− → プー ]
  5. dashes2long : replace dashes in general with the long vowel mark [ プ— → プー ]
  6. drawing_lines2long : replace horizontal box-drawing lines and the like with the long vowel mark (see: [1] [2]) [ プ─ → プー ]
  7. unify_long_repeats : collapse consecutive long vowel marks into a single one [ プーーー → プー ]
  8. nfkc : NFKC normalization [ フ゜プ → ププ ]
  9. lc : lowercase all alphabetic characters [ ABC → abc ]

To change these, consult lib/MecabTrainer/NormalizeText.pm and then edit etc/config.pl. You can also check the result of normalization with bin/normalize_text.pl.

>bin/normalize_text.pl
キタ━━━━━━(゚∀゚)━━━━━━ !!!!!
キター(゚∀゚)ー !!!!!

>bin/normalize_text.pl --normalize_opts=decode_entities,nfkc
㍖ ½
レントゲン 1⁄2
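
As a rough illustration of what such a pipeline does, here is a minimal Python sketch of the nine steps listed above. It is not the code in NormalizeText.pm: the exact character sets each step covers (for example which dashes count as "dashes in general") are my assumptions, and the names `STEPS` and `normalize` are made up for this example.

```python
import html
import re
import unicodedata

LONG = "\u30fc"  # the long vowel mark (katakana-hiragana prolonged sound mark)

# Step name -> function. The character sets below are assumptions made for
# illustration; the authoritative lists are in lib/MecabTrainer/NormalizeText.pm.
STEPS = {
    "decode_entities": html.unescape,
    "strip_single_nl": lambda s: re.sub(r"(?<!\n)\n(?!\n)", "", s),
    "wavetilde2long": lambda s: re.sub("[\u301c\uff5e]", LONG, s),
    "fullminus2long": lambda s: re.sub("[\u2212\uff0d]", LONG, s),
    "dashes2long": lambda s: re.sub("[\u2012-\u2015]", LONG, s),
    "drawing_lines2long": lambda s: re.sub("[\u2500\u2501]", LONG, s),
    "unify_long_repeats": lambda s: re.sub(LONG + "{2,}", LONG, s),
    "nfkc": lambda s: unicodedata.normalize("NFKC", s),
    "lc": str.lower,
}

DEFAULT_OPTS = list(STEPS)  # dicts preserve insertion order, i.e. the order above


def normalize(text, opts=DEFAULT_OPTS):
    """Apply the named steps to text, in the order given."""
    for name in opts:
        text = STEPS[name](text)
    return text


print(normalize("\u30d7\u301c\u301c"))                      # wave dashes collapse into one long mark
print(normalize("\u3356 \u00bd", ["decode_entities", "nfkc"]))
```

The order matters: the wave-dash replacement has to run before nfkc, because NFKC would otherwise turn a full-width tilde into a plain ASCII "~" that the later steps no longer recognize.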

Automatic assignment of word costs

The tricky part of registering a new word is computing the word cost mentioned above. Let's take the product name "鼻セレブ" (Hana Celeb, a tissue brand) as an example and think about how to tune its word cost.

[Photo: a tower of 鼻セレブ boxes. An example of someone who buys nothing but 鼻セレブ (rabbit design only).]

Method: find the borderline cost at which the word is just barely not split when it appears on its own

With the stock dictionary, if you have MeCab analyze a sentence consisting only of "鼻セレブ", the pieces "鼻" and "セレブ" are recognized as separate words, as shown below.

Morpheme              Connection cost   Word cost   Cumulative cost
BOS                   -                 0           0
                      -283              -           -283
鼻(名詞/一般)         -                 6033        5750
                      62                -           5812
セレブ(名詞/一般)     -                 9461        15273
                      -573              -           14700
EOS                   -                 0           14700

(BOS marks the beginning of the sentence and EOS the end. 名詞/一般 means noun / general.)

So let's make it our goal that the word 鼻セレブ is not split any further when it appears as a sentence on its own.

First, add the morpheme 鼻セレブ (固有名詞/一般, proper noun / general) to the dictionary. Then tune its word cost so that MeCab decides that treating 鼻セレブ as a single unit has a lower total cost than splitting it into 鼻 + セレブ.

In other words,

Morpheme                      Connection cost   Word cost   Cumulative cost
BOS                           -                 0           0
                              -310              -           -310
鼻セレブ(固有名詞/一般)       -                 *1          *
                              -919              -           *
EOS                           -                 0           *2

The task is a fill-in-the-blank problem: what value of *1 makes *2 come out below the 14700 of the split analysis? Here the total is -310 + *1 - 919, so setting *1 to 15928 or less makes the overall cost lower than the 14700 of 鼻 + セレブ.
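
The arithmetic can be checked with a few lines of Python. This is only a worked example using the numbers from the two tables above; the function names are made up, and it says nothing about how MeCab searches for the best path internally.

```python
def path_cost(steps):
    """Sum the connection costs and word costs along one path from BOS to EOS."""
    return sum(steps)


# Path for the split analysis, read off the first table:
# conn(BOS->鼻), word(鼻), conn(鼻->セレブ), word(セレブ), conn(セレブ->EOS)
split_total = path_cost([-283, 6033, 62, 9461, -573])


def max_unsplit_word_cost(split_total, conn_from_bos, conn_to_eos):
    """Largest integer word cost that keeps the single-word path strictly cheaper."""
    return split_total - conn_from_bos - conn_to_eos - 1


limit = max_unsplit_word_cost(split_total, -310, -919)
print(split_total, limit)  # 14700 15928
```

With a word cost of exactly 15929 the two paths would tie at 14700, which is why the limit is one lower.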


※1 When other morphemes are attached before and after the word, as in "明日の鼻セレブ祭りは中止です" ("Tomorrow's Hana Celeb festival is cancelled"), the connection costs on both sides change. The rule "do not get split when the word appears as a standalone sentence (between BOS and EOS)" is nothing more than an arbitrary criterion.

※2 Every so often I see articles that follow the Auto Link example and use the formula cost = (int)max(-36000, -400 * (length^1.5)) as is. That formula was written on the assumption that MeCab is used exclusively for AutoLink with that dictionary alone, and I think the baseline no longer matches once it is mixed with ipadic. Word costs in ipadic are positive numbers of up to about four digits, whereas this formula yields negative costs, so the user dictionary entries will almost always win regardless of context. (Of course, if that is what you intend, that is fine.)
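
To see why this formula does not mix well with ipadic, it helps to evaluate it for a few lengths. The snippet below simply transcribes the quoted formula into Python; the function name is made up, and treating `length` as a character count is my assumption.

```python
def autolink_cost(length):
    """cost = (int)max(-36000, -400 * (length^1.5)), transcribed from the formula quoted above."""
    return int(max(-36000, -400 * length ** 1.5))


for n in (1, 4, 10, 30):
    print(n, autolink_cost(n))
```

Every value is negative, and a four-character word such as 鼻セレブ gets -3200, far below the positive four-digit costs that ipadic assigns.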

Method: precompute the average cost of existing morphemes with the same part of speech and the same length

Separately from the above, there is also a somewhat simpler way to get a ballpark figure for the word cost.

For example, take only the 固有名詞/一般 (proper noun / general) words from the existing ipadic and compute the average word cost for each word length.

Characters   Average cost
1            8998
2            8242
3            8339
4            7989
5            6947
...          ...
10           5038
...          ...

You build this table in advance, and when registering a new word you assign it the average value for existing morphemes of the same part of speech and the same length. "鼻セレブ" is four characters long, so its word cost becomes 7989. It is crude, but I think the result is a good deal better than doing nothing.
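
A table like this can be produced with a short script. The following is a sketch under some assumptions: it expects rows in the ipadic CSV layout (surface, left-id, right-id, cost, then the feature columns) already converted to UTF-8, the sample rows and their IDs are invented, and rounding to the nearest integer is my choice rather than something the framework is known to do.

```python
import csv
import io
from collections import defaultdict


def average_cost_by_length(rows, pos=("名詞", "固有名詞", "一般")):
    """Average word cost per surface length, for entries whose features start with pos."""
    totals = defaultdict(lambda: [0, 0])  # length -> [sum of costs, number of entries]
    for row in rows:
        surface, cost = row[0], int(row[3])
        if tuple(row[4:4 + len(pos)]) == pos:
            totals[len(surface)][0] += cost
            totals[len(surface)][1] += 1
    return {n: round(s / c) for n, (s, c) in totals.items()}


# Invented rows for illustration only (IDs and costs are not real ipadic values).
sample = io.StringIO(
    "あいう,1291,1291,8000,名詞,固有名詞,一般,*,*,*,あいう,アイウ,アイウ\n"
    "かきく,1291,1291,9000,名詞,固有名詞,一般,*,*,*,かきく,カキク,カキク\n"
    "さしすせ,1291,1291,7000,名詞,固有名詞,一般,*,*,*,さしすせ,サシスセ,サシスセ\n"
    "たち,1285,1285,5000,名詞,一般,*,*,*,*,たち,タチ,タチ\n"
)
table = average_cost_by_length(csv.reader(sample))
print(table)  # {3: 8500, 4: 7000}
```

The last sample row is a plain 名詞/一般 entry, so it is skipped and does not affect the averages.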

How mecab-dic-overdrive generates costs

mecab-dic-overdrive decides the cost by combining these two methods. By default it takes the smaller of the following two values:

  1. the average cost of existing words with the same part of speech and the same length (※ if no existing word satisfies the condition, a predetermined fixed value is used)
  2. the "borderline cost at which the word is just barely not split any further when it appears on its own" shown above, multiplied by 0.7

(This behavior can be customized by editing GenDic.pm from around line 200.)

The first calculation needs the dictionary's original CSV files, and the second needs left-id.def, right-id.def and matrix.def, so you have to set the location of the mecab-ipadic source in the config.
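
Put together, the default rule amounts to something like the following. This is a paraphrase of the description above, not the code in GenDic.pm: the fallback value of 10000 is a placeholder I picked, and truncating the 0.7 product to an integer is an assumption.

```python
def decide_cost(avg_cost, unsplit_limit, fallback=10000, factor=0.7):
    """Smaller of (1) the same-POS/same-length average and (2) factor * borderline cost.

    avg_cost is None when no existing word matches; the hypothetical fixed
    fallback value is used instead in that case.
    """
    first = fallback if avg_cost is None else avg_cost
    second = int(unsplit_limit * factor)
    return min(first, second)


# 鼻セレブ: average for 4-character proper nouns is 7989, borderline cost is 15928.
print(decide_cost(7989, 15928))  # 7989
```

For 鼻セレブ the average-based value wins, since 0.7 times the borderline cost is still above 11000.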

Using mecab-dic-overdrive

Installing the dictionary and building user dictionaries

(1) Prepare the required libraries
  • Install MeCab itself and the following Perl libraries beforehand
    • Text::MeCab
    • Unicode::Normalize
    • Unicode::RecursiveDowngrade
    • HTML::Entities
    • File::Spec
    • Path::Class
    • Log::Log4perl
  • Get mecab-dic-overdrive itself from GitHub
  • Download mecab-ipadic-2.7.0-20070801 and unpack it. (With other versions, things may fail at the patch step described earlier, among other places.)
> git clone https://github.com/nabokov/mecab-dic-overdrive.git
> tar -xvzf mecab-ipadic-2.7.0-20070801.tar.gz
(2) Configure config.pl / log.conf

Customize mecab-dic-overdrive/etc/config.pl for your environment. At a minimum, edit

  • $HOME (the directory where you unpacked mecab-dic-overdrive)
  • $DIC_SRC_DIR (the directory where you unpacked mecab-ipadic-2.7.0-20070801)

If you also want to change the normalization, edit default_normalize_opts, referring to the "Normalizing the dictionary" section above.

(Example)
default_normalize_opts => [qw(decode_entities strip_html nfkc lc)],

To change where the log is written or to change the log level, edit etc/log.conf.

(Example)
log4perl.rootLogger=DEBUG, LOGFILE
log4perl.appender.LOGFILE.filename=/path/to/log.txt
(3) Build a mecab-ipadic that has been converted to UTF-8, normalized, and patched
>bin/initialize_dic.pl

This completes (1) converting the dictionary to UTF-8, (2) applying the patches, (3) normalizing the dictionary, and (4) compiling and installing it.

If you get "make install failed", or if you want to keep the existing dictionary (/usr/local/lib/mecab/dic/ipadic) and install to a different location, copy the dictionary there by hand as shown below, and specify the dictionary directory with the -d option whenever you invoke mecab.

(Example: installing manually to /usr/local/lib/mecab/dic/ipadic-utf8)

>bin/initialize_dic.pl --noinstall
>mkdir /usr/local/lib/mecab/dic/ipadic-utf8
>cp [ipadic source directory]/*.bin /usr/local/lib/mecab/dic/ipadic-utf8/
>cp [ipadic source directory]/*.def /usr/local/lib/mecab/dic/ipadic-utf8/
>cp [ipadic source directory]/*.dic /usr/local/lib/mecab/dic/ipadic-utf8/
>cp [ipadic source directory]/dicrc /usr/local/lib/mecab/dic/ipadic-utf8/

(After this, change dicdir = "/usr/local/lib/mecab/dic/ipadic" in etc/config.pl
 to "/usr/local/lib/mecab/dic/ipadic-utf8")

(When using mecab from the command line, specify the -d option)
>mecab -d /usr/local/lib/mecab/dic/ipadic-utf8/

(4) Build a user dictionary from Wikipedia data

Get jawiki-latest-page.sql.gz from the Japanese Wikipedia dump site and save it under misc/dic, still gzipped. (On systems where zcat/gzcat is not available, decompress it first. To change the file name or location, edit GenDic/WikipediaFile.pm accordingly.)

>bin/generate_dic.pl --target=wikipedia_file

This reads the SQL file directly, extracts the article titles, and writes them to the user dictionary file misc/dic/wikipedia.dic as 固有名詞/一般 (proper noun / general).

※ Because this parses the SQL statements directly by brute force, it may stop working if the format of the Wikipedia dump changes. In that case you can use a more reliable method that first loads the data into a database and then exports from the database (specify --target=wikipedia instead of --target=wikipedia_file). See GenDic/Wikipedia.pm for how to configure it.
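
To give a feel for what "parsing the SQL directly" involves, here is a small Python sketch that pulls titles out of the INSERT statements. It is not the parser in GenDic/WikipediaFile.pm. It assumes that each row of the page table starts with page_id, page_namespace and page_title, keeps only namespace 0, and converts underscores to spaces; all three choices are assumptions of this example, as is the sample line.

```python
import re

# Matches the first three columns of each row: (page_id,page_namespace,'page_title',
ROW = re.compile(r"\((\d+),(-?\d+),'((?:[^'\\]|\\.)*)',")


def iter_titles(sql_lines, namespace=0):
    """Yield (page_id, title) for every row in the given namespace."""
    for line in sql_lines:
        if not line.startswith("INSERT INTO"):
            continue
        for page_id, ns, title in ROW.findall(line):
            if int(ns) == namespace:
                # Undo SQL backslash escapes, then turn "_" back into spaces.
                title = re.sub(r"\\(.)", r"\1", title).replace("_", " ")
                yield int(page_id), title


# Invented sample line in the shape of a dump INSERT statement.
sample = [
    "INSERT INTO `page` VALUES (10,0,'Foo_Bar','',0,0),(11,1,'Talk_page','',0,0),(12,0,'It\\'s','',0,0);\n",
]
print(list(iter_titles(sample)))
```

For a real dump you would feed it the lines of the gzipped file, for example via `gzip.open(path, "rt", encoding="utf-8", errors="replace")`. A regular expression like this is exactly the kind of brittle shortcut the note above warns about, so the database route is the safer one if the dump layout ever changes.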

(5) Build a user dictionary from an emoticon dictionary (optional)

You can build a user dictionary by reading the TSV files that are distributed in various places as emoticon (kaomoji) dictionaries.

The input file is misc/dic/kaomoji.tsv, so if there are emoticons you want to add, append them to this file and then run:

>bin/generate_dic.pl --target=kaomoji

This writes each emoticon to misc/dic/kaomoji.dic as 記号/一般 (symbol / general).

※ To give these entries higher priority than the symbol-like entries in wikipedia.dic, the word costs are computed using a mecab that has loaded the previously built wikipedia.dic. This step therefore does not work without misc/dic/wikipedia.dic. To change this behavior, edit the defaults method in GenDic/Kaomoji.pm.

(6) Add other proper nouns and the like that are not in Wikipedia (optional)

If there are other nouns you want to add, list them one per line in misc/dic/simple_list.txt and run:

>bin/generate_dic.pl --target=simple_list

This reads all of them as 名詞/固有名詞/一般 (noun / proper noun / general) and writes them to misc/dic/simple_list.dic.

Using the generated dictionaries

The user dictionaries built in steps (4)-(6) above can be used by passing them to mecab's -u option.

>mecab -u misc/dic/wikipedia.dic,misc/dic/kaomoji.dic,misc/dic/simple_list.dic
(Before)
> mecab
崖の上のポニョニコ動で見た
崖	名詞,一般,*,*,*,*,崖,ガケ,ガケ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
上	名詞,非自立,副詞可能,*,*,*,上,ウエ,ウエ
の	助詞,連体化,*,*,*,*,の,ノ,ノ
ポニョニコ	名詞,一般,*,*,*,*,*
動	名詞,一般,*,*,*,*,動,ドウ,ドー
で	助詞,格助詞,一般,*,*,*,で,デ,デ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS

(After)
>mecab -u misc/dic/wikipedia.dic
崖の上のポニョニコ動で見た
崖の上のポニョ	名詞,固有名詞,一般,*,*,*,崖の上のポニョ,Wikipedia:1070057
ニコ動	名詞,固有名詞,一般,*,*,*,ニコ動,Wikipedia:1347271
で	助詞,格助詞,一般,*,*,*,で,デ,デ
見	動詞,自立,*,*,一段,連用形,見る,ミ,ミ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS

If you wrote the UTF-8 version of ipadic to a location other than the default, specify the -d option as well.

>mecab -d /usr/local/lib/mecab/dic/ipadic-utf8

Using the normalizer

As mentioned earlier, the dictionary is pointless unless the text being analyzed goes through the same normalizer that was used when the dictionary was built. When using the generated dictionaries, pass your text through an instance of MecabTrainer::NormalizeText as shown below.

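use Encode;
use MecabTrainer::NormalizeText;

# Use the same normalization options as when the dictionary was built.
my $normalizer = MecabTrainer::NormalizeText->new(
    [qw(decode_entities strip_single_nl nfkc lc)]
);

# Decode the raw UTF-8 bytes before normalizing.
my $normalized_decoded_text = $normalizer->normalize(
    Encode::decode('utf8', $raw_input_text)
);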

For usage details, see also the source of bin/normalize_text.pl.

On the command line, you can put bin/normalize_text.pl into a pipeline in front of mecab.

Here are a few usage examples together with their analysis results.

> bin/normalize_text.pl | mecab -d /usr/local/lib/mecab/dic/ipadic-utf8/ -u misc/dic/wikipedia.dic,misc/dic/kaomoji.dic
ひまなう(´・ω・`)
^D
ひま	名詞,一般,*,*,*,*,ひま,ヒマ,ヒマ
なう	助詞,終助詞,*,*,*,*,なう,ナウ,ナウ
( ́・ω・`)	名詞,固有名詞,一般,*,*,*,( ́・ω・`),Wikipedia:700982
EOS

久々更新〜 お腹へったょ
^D
久々	名詞,一般,*,*,*,*,久々,ヒサビサ,ヒサビサ
更新	名詞,サ変接続,*,*,*,*,更新,コウシン,コーシン
ー	記号,一般,*,*,*,*,─,─,─
お腹	名詞,一般,*,*,*,*,お腹,オナカ,オナカ
へっ	動詞,自立,*,*,五段・ラ行,連用タ接続,へる,ヘッ,ヘッ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
ょ	助詞,終助詞,*,*,*,*,よ,よ,よ

朝からテゴマスのあいCMやってた
^D
朝	名詞,副詞可能,*,*,*,*,朝,アサ,アサ
から	助詞,格助詞,一般,*,*,*,から,カラ,カラ
テゴマスのあい	名詞,固有名詞,一般,*,*,*,テゴマスのあい,Wikipedia:2035668
cm	名詞,一般,*,*,*,*,CM,シーエム,シーエム
やっ	動詞,自立,*,*,五段・ラ行,連用タ接続,やる,ヤッ,ヤッ
て	動詞,非自立,*,*,一段,連用形,てる,テ,テ
た	助動詞,*,*,*,特殊・タ,基本形,た,タ,タ
EOS

( ゚∀゚)アハハ八八ノヽノヽノヽノ \ / \/ \
^D
( ゚∀゚)アハハ八八ノヽノヽノヽノ  / / 	記号,一般,*,*,*,*,( ゚∀゚)アハハ八八ノヽノヽノヽノ  / / \n

Writing a custom reader class

By creating a subclass under GenDic/, you can build a user dictionary from any input. Inherit from the MecabTrainer::GenDic class and describe

  • how to open the input stream
  • how to read the input one line at a time and parse it
  • which part of speech and features to assign to each word that was read

The parent class then takes care of everything else, including computing the word costs and compiling the dictionary. See the sources under the GenDic directory for details.

You can invoke your subclass by passing its name to the --target option of generate_dic.pl (with CamelCase converted to lowercase words joined by "_").

(Example: specifying the subclass TestClass.pm)
>bin/generate_dic.pl --target=test_class
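
The naming convention is easy to get wrong by hand, so here is the mapping spelled out in Python. The function names are made up for this example, and generate_dic.pl may implement the conversion differently; the sketch only reproduces the rule described above (CamelCase class name to lowercase words joined by "_", and back).

```python
import re


def class_to_target(class_name):
    """TestClass -> test_class"""
    # Insert "_" before every capital letter except the first, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", class_name).lower()


def target_to_class(target):
    """test_class -> TestClass"""
    return "".join(part.capitalize() for part in target.split("_"))


for name in ("TestClass", "WikipediaFile", "Kaomoji", "SimpleList"):
    print(name, "->", class_to_target(name))
```

The targets used earlier in this post (wikipedia_file, kaomoji, simple_list) follow the same pattern.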