BKB in R
Prompted by this post:
http://gg-hogehoge.hatenablog.com/entry/2014/11/01/124501
I'm a big fan of Bike Kawasaki Bike (バイク川崎バイク, BKB) too, so I tried building it in R.
I'm on Windows 7, so as preparation I installed kakasi and added it to the PATH.
The RMeCab package calls MeCab, and the Nippon package calls kakasi.
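Before loading the packages, it can help to confirm that R sees kakasi on the PATH. This is only a minimal sketch using base R; the executable name `kakasi` is an assumption about how it was installed, and `kakasi_path` and `has_kakasi` are names made up for illustration.

```r
# Sys.which() returns "" when the executable is not found on the PATH.
# The name "kakasi" is an assumption; adjust it if your install differs.
kakasi_path <- unname(Sys.which("kakasi"))
has_kakasi <- nzchar(kakasi_path)

if (has_kakasi) {
  message("kakasi found at: ", kakasi_path)
} else {
  message("kakasi not found on PATH")
}
```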
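Results
It looks like this. The words picked out are 美徳 (bitoku), 倹約 (ken'yaku), 美徳 (bitoku) in the first call and 僕 (boku), こと (koto), 馬鹿 (baka) in the second.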
> getBKB("私は世のいはゆる健全なる美徳 清貧だの倹約の精神だの 困苦欠乏に耐へる美徳だの 謙譲の美徳などといふものはみんな嫌ひで 美徳ではなく 悪徳だと思つてゐる")
file = C:\**************

私は世のいはゆる健全なる美徳 清貧だの倹約の精神だの 困苦欠乏に耐へる美徳だの 謙譲の美徳などといふものはみんな嫌ひで 美徳ではなく 悪徳だと思つてゐる
美徳 倹約 美徳
b! k! b!
ひぃーあ!!!
> getBKB("僕は僕のことを馬鹿だという")
file = C:\**************

僕は僕のことを馬鹿だという
僕 こと 馬鹿
b! k! b!
ひぃーあ!!!
Code
getBKB <- function(txt, items = c("名詞", "動詞", "形容詞", "副詞")) {
  library(RMeCab)
  library(Nippon)

  # RMeCabText() reads from a file, so write the input text to a temporary file
  tmpdir <- tempdir()
  td <- tempfile("tmp", tmpdir = tmpdir)
  write(txt, file = td)
  res_mecab <- RMeCabText(td)  # I'd like to suppress the file name it prints, but don't know how

  # Get the tokenization result (word, part of speech, kana)
  res_mecab <- do.call("rbind", lapply(res_mecab, function(x) x[c(1, 2, 10)]))
  trg_kakasi <- res_mecab[res_mecab[, 2] %in% items, 1]

  # Convert to the Latin alphabet with kakasi and keep only the first letter of each word.
  # Empty or missing readings become "_" so that positions in headchar
  # stay aligned with indices in trg_kakasi.
  heads <- substr(kakasi(trg_kakasi), 1, 1)
  heads[is.na(heads) | heads == ""] <- "_"
  headchar <- paste(heads, collapse = "")

  # Search for b ... k ... b
  num <- regexpr("b.*k.*b", headchar)[1]
  if (num == -1) {
    return("NoBKB...")
  }
  num2 <- regexpr("k.*b", substring(headchar, first = num + 1))[1] + num
  num3 <- regexpr("b", substring(headchar, first = num2 + 1))[1] + num2

  # Print the result
  bkbword <- trg_kakasi[c(num, num2, num3)]
  cat(paste("\n", txt, "\n", paste(collapse = " ", bkbword), "\n", "b! k! b!", "\n", "ひぃーあ!!!"))
}
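The b, k, b search can be tried without MeCab or kakasi installed. Here is a rough sketch of just that step, assuming the words have already been romanized. `find_bkb` and the sample `romaji` vector are hypothetical names and data for illustration, not part of RMeCab or Nippon.

```r
# Hypothetical helper: `romaji` holds one romanized reading per word,
# i.e. roughly what kakasi() is assumed to return in the code above.
find_bkb <- function(romaji) {
  # First letter of each word; empty or NA readings become "_" so that
  # string positions stay aligned with word indices.
  heads <- substr(romaji, 1, 1)
  heads[is.na(heads) | heads == ""] <- "_"
  headchar <- paste(heads, collapse = "")

  # Leftmost "b" that is followed somewhere by "k" and then "b"
  start <- regexpr("b.*k.*b", headchar)[1]
  if (start == -1) return(NULL)

  # First "k" after that "b" which still has a "b" somewhere after it
  k_pos <- regexpr("k.*b", substring(headchar, start + 1))[1] + start
  # First "b" after that "k"
  b_pos <- regexpr("b", substring(headchar, k_pos + 1))[1] + k_pos

  c(start, k_pos, b_pos)
}

# Made-up sample input: romanized content words of a short sentence
romaji <- c("boku", "boku", "koto", "baka", "iu")
idx <- find_bkb(romaji)
romaji[idx]  # "boku" "koto" "baka"
```

Because each word contributes exactly one character to the joined string, the match positions can be used directly as word indices.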
Notes
I'd like to suppress the file name that RMeCabText prints along with its result, but I don't know how.
(Addendum)
I was advised that this is MeCab's own behavior and can't be controlled from R, so I now send the output to a temporary file instead.
Thank you!
capture.output(res_mecab <- RMeCabText(td), file=tempfile("tmp", tmpdir = tmpdir))
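To see what this line does without MeCab installed, here is a small sketch with a stand-in function. `noisy_parse` is hypothetical and only imitates a function that prints a `file = ...` line before returning. Whether `capture.output()` catches a real function's message depends on how that function prints it.

```r
# Hypothetical stand-in for RMeCabText(): prints a line, then returns a value.
noisy_parse <- function(path) {
  cat("file =", path, "\n")
  list(c("word", "pos"))
}

sink_file <- tempfile("tmp")

# The assignment is evaluated in the calling environment, so `res` is kept,
# while anything printed along the way goes to sink_file, not the console.
capture.output(res <- noisy_parse("input.txt"), file = sink_file)

res                   # the return value is still available
readLines(sink_file)  # the printed line ended up in the temporary file
```

When `file` is supplied, `capture.output()` returns `NULL` invisibly, so nothing extra shows up at the prompt.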