Trying out cl-igo / cl-mecab, morphological analyzers for Common Lisp
cl-igo is a morphological analyzer usable from Common Lisp, and it can use MeCab-compatible dictionaries.
I set up a mirror on GitHub so that it can be installed from Roswell; install it with:

```shell
ros install masatoi/charseq masatoi/cl-igo
```

SBCL is reportedly the recommended implementation.
Building igo's binary dictionary
- Download the IPA dictionary: https://sourceforge.net/projects/mecab/files/mecab-ipadic/2.7.0-20070801/
- Download igo-0.4.5.jar: https://osdn.net/projects/igo/downloads/55029/igo-0.4.5.jar/
Assuming mecab-ipadic-2.7.0-20070801.tar.gz and igo-0.4.5.jar are in the same directory, run:
```shell
tar xzvf mecab-ipadic-2.7.0-20070801.tar.gz
java -cp ./igo-0.4.5.jar net.reduls.igo.bin.BuildDic ipadic mecab-ipadic-2.7.0-20070801 EUC-JP
```
This produces a directory named ipadic; put it somewhere such as ~/igo/ipadic.
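Rather than hard-coding /home/wiz/ as in the example that follows, the dictionary path can be built from the user's home directory. A minimal sketch, assuming the dictionary sits under ~/igo/ as described above; igo-dictionary-directory is a hypothetical helper, not part of cl-igo, and the only cl-igo call it relies on is igo:load-tagger taking a directory namestring, as in the original example.

```lisp
;; Hypothetical helper (not part of cl-igo): assumes dictionaries live in ~/igo/<name>/.
(defun igo-dictionary-directory (&optional (name "ipadic"))
  "Return the pathname of ~/igo/NAME/ (NAME defaults to \"ipadic\")."
  ;; Merge a relative directory (igo/NAME/) onto the user's home directory.
  (merge-pathnames (make-pathname :directory (list :relative "igo" name))
                   (user-homedir-pathname)))

;; With cl-igo loaded, the tagger could then be initialized with:
;; (igo:load-tagger (namestring (igo-dictionary-directory)))
```

The optional argument makes it easy to point at a second dictionary directory later without editing the path by hand.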
使ã£ã¦ã¿ã
```lisp
(ql:quickload :igo)

;; Load the dictionary
(igo:load-tagger "/home/wiz/igo/ipadic/")

;; "There are two chickens in the garden."
(igo:parse "庭には二羽にわとりがいる。")
```

```
(("庭" "名詞,一般,*,*,*,*,庭,ニワ,ニワ" 0)
 ("に" "助詞,格助詞,一般,*,*,*,に,ニ,ニ" 1)
 ("は" "助詞,係助詞,*,*,*,*,は,ハ,ワ" 2)
 ("二" "名詞,数,*,*,*,*,二,ニ,ニ" 3)
 ("羽" "名詞,接尾,助数詞,*,*,*,羽,ワ,ワ" 4)
 ("にわとり" "名詞,一般,*,*,*,*,にわとり,ニワトリ,ニワトリ" 5)
 ("が" "助詞,格助詞,一般,*,*,*,が,ガ,ガ" 9)
 ("いる" "動詞,自立,*,*,一段,基本形,いる,イル,イル" 10)
 ("。" "記号,句点,*,*,*,*,。,。,。" 12))
```
```lisp
(igo:wakati "庭には二羽にわとりがいる。")
```

```
("庭" "に" "は" "二" "羽" "にわとり" "が" "いる" "。")
```
Installing mecab-ipadic-neologd
mecab-ipadic-neologd is a dictionary with a greatly expanded vocabulary of new words and similar entries, which matters when analyzing data such as social media posts.
An example installation on Ubuntu 14.04:
```shell
# Install the required packages
sudo apt-get install mecab libmecab-dev mecab-ipadic-utf8 git make curl xz-utils file
git clone --depth 1 https://github.com/neologd/mecab-ipadic-neologd.git
cd mecab-ipadic-neologd/
# Type "yes" when asked for confirmation partway through
./bin/install-mecab-ipadic-neologd -n
# Installation directory
echo `mecab-config --dicdir`"/mecab-ipadic-neologd"
# => /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd
# Check that it works
mecab -d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd
```
Building igo's binary dictionary from neologd
mecab-ipadic-neologd reportedly supports UTF-8 only, so build the binary dictionary with UTF-8 as the character encoding.
```shell
cd build/
java -cp ../../igo-0.4.5.jar net.reduls.igo.bin.BuildDic ipadic-neologd mecab-ipadic-2.7.0-20070801-neologd-20161201/ UTF-8
```
This produces a directory named ipadic-neologd, which the Java version of Igo can use as follows:
```shell
java -cp ../../igo-0.4.5.jar net.reduls.igo.bin.Igo ipadic-neologd
```
The Common Lisp version, however, seems to support EUC-JP only: trying to load this dictionary with igo:load-tagger raised an error. Hmm.
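A dictionary built for one encoding cannot simply be read as the other because the same character is stored as different bytes. A small SBCL-specific sketch using sb-ext:string-to-octets with the :utf-8 and :euc-jp external formats makes this concrete. It does not inspect cl-igo itself; that the load-tagger error comes from exactly this mismatch is an assumption.

```lisp
;; SBCL-specific sketch: encode one character in UTF-8 and in EUC-JP
;; to show that the underlying bytes differ between the two encodings.
(defparameter *niwa* (string (code-char #x5EAD))) ; U+5EAD, the kanji for "garden"

(defun string-octets (string external-format)
  "Return the octets of STRING encoded with EXTERNAL-FORMAT (SBCL only)."
  (sb-ext:string-to-octets string :external-format external-format))

(defparameter *utf-8-bytes* (string-octets *niwa* :utf-8))
(defparameter *euc-jp-bytes* (string-octets *niwa* :euc-jp))

;; Print each byte sequence as space-separated two-digit hex.
(format t "UTF-8:  ~{~2,'0X~^ ~}~%" (coerce *utf-8-bytes* 'list)) ; UTF-8:  E5 BA AD
(format t "EUC-JP: ~{~2,'0X~^ ~}~%" (coerce *euc-jp-bytes* 'list))
```

The UTF-8 form takes three bytes while the EUC-JP form takes two, so a reader expecting one encoding misinterprets data written in the other.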
Trying cl-mecab
Next, I checked whether neologd can be used with cl-mecab, which is available from Quicklisp.
The mecab command-line options shown above, such as -d for the dictionary directory, can be passed to the with-mecab macro.
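I timed both analyzers on neko.txt, the file used in the Qiita article "2019年末版 形態素解析器の比較" (a comparison of morphological analyzers, end-of-2019 edition).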
```lisp
;; cl-igo
(time
 (with-open-file (f "~/Downloads/neko.txt")
   (loop for text = (read-line f nil nil)
         while text
         do (igo:parse text))))
```

```
Evaluation took:
  0.088 seconds of real time
  0.087601 seconds of total run time (0.087601 user, 0.000000 system)
  100.00% CPU
  332,963,144 processor cycles
  210,504,608 bytes consed
```

```lisp
;; cl-mecab with the neologd dictionary
(time
 (cl-mecab:with-mecab ("-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd")
   (with-open-file (f "~/Downloads/neko.txt")
     (loop for text = (read-line f nil nil)
           while text
           do (cl-mecab:parse* text)))))
```

```
Evaluation took:
  0.376 seconds of real time
  0.376776 seconds of total run time (0.336570 user, 0.040206 system)
  100.27% CPU
  1,432,001,462 processor cycles
  207,520,864 bytes consed
```
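Both timed forms share the same read-a-line-and-parse loop, so it can be factored into one function and reused for any analyzer. A minimal sketch; parse-file-lines is a hypothetical helper, and the parse function is passed in as any one-argument function (wrapped in a lambda for cl-mecab:parse* so as not to assume it can be referenced with #').

```lisp
;; Hypothetical benchmark helper: apply PARSE-FN to every line of a file.
(defun parse-file-lines (path parse-fn)
  "Call PARSE-FN on every line of the file at PATH; return the number of lines."
  (with-open-file (in path)
    (loop for line = (read-line in nil nil)
          while line
          do (funcall parse-fn line)
          count t)))

;; Usage, with the analyzers set up as above:
;; (time (parse-file-lines "~/Downloads/neko.txt" #'igo:parse))
;; (time (cl-mecab:with-mecab ("-d /usr/lib/x86_64-linux-gnu/mecab/dic/mecab-ipadic-neologd")
;;         (parse-file-lines "~/Downloads/neko.txt"
;;                           (lambda (line) (cl-mecab:parse* line)))))
```

Returning the line count gives a quick sanity check that both runs actually processed the same input.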
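As expected, cl-igo, being native Common Lisp, comes out faster: 0.088 s versus 0.376 s of real time, roughly a 4× difference. The two runs are not strictly equivalent, though: the cl-mecab timing also covers the with-mecab setup and uses the larger neologd dictionary, whereas cl-igo runs with the ipadic dictionary loaded beforehand.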