I made NgramCreator.java to generate n-grams
I wanted to generate n-grams from a String in Java, but I couldn't find anything that fit, so I wrote my own.
I've put the code up on Gist, so I'll paste it here for now.
Usage is very simple: just call NgramCreator#createNgram(String text, int n).
The first argument, text, is the text you want to generate n-grams from, and the second argument, n, is the size of the n-grams to generate (the n in n-gram).
The method returns a Map<String, Integer>: each key is an n-gram and each value is the number of times it occurs.
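To make that contract concrete, here is a minimal sketch of a method with the same signature. It is not the Gist code: the class name NgramSketch is hypothetical, and the sliding-window logic and the LinkedHashMap (used so entries keep first-seen order) are my own assumptions. It splits words on single half-width spaces, as described for the original method.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for NgramCreator; not the author's Gist code.
class NgramSketch {

    // Counts word-level n-grams in text, splitting words on half-width spaces.
    static Map<String, Integer> createNgram(String text, int n) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        if (text == null || text.isEmpty() || n <= 0) {
            return counts; // nothing to count
        }
        String[] words = text.split(" ");
        // Slide a window of n words across the text, one word at a time.
        for (int i = 0; i + n <= words.length; i++) {
            String gram = String.join(" ", Arrays.copyOfRange(words, i, i + n));
            counts.merge(gram, 1, Integer::sum); // add 1 to this n-gram's count
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print each bigram followed by its count.
        for (Map.Entry<String, Integer> e : createNgram("a b a b a", 2).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

With this sketch, consecutive spaces in the input produce empty "words", so the text should be cleaned up before it is passed in.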
The code above also includes a main method for testing; running it prints output like the following.
The input text is taken from the "n-gram - Wikipedia" article.
typically are collected 1
probability, an n-gram 1
is a contiguous 1
or speech corpus. 1
items can be 1
a given sequence 1
The n-grams typically 1
are collected from 1
contiguous sequence of 1
computational linguistics and 1
(rest omitted)
Each n-gram is printed along with its number of occurrences.
Note that this method treats half-width spaces as word delimiters.
So if needed, it's a good idea to preprocess the text beforehand (especially for Japanese).
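As one possible form of that preprocessing, the sketch below normalizes whitespace so that words end up separated by exactly one half-width space. The class and method names (NgramPreprocess, normalizeSpaces) are hypothetical and not part of NgramCreator. It only handles spacing: Japanese text written without spaces would still need to be segmented into words by some other tool first.

```java
// Hypothetical preprocessing helper; not part of the author's NgramCreator.
class NgramPreprocess {

    // Returns text with words separated by single half-width spaces.
    static String normalizeSpaces(String text) {
        return text
                .replace('\u3000', ' ')   // full-width (ideographic) space -> half-width space
                .replaceAll("\\s+", " ")  // collapse runs of spaces, tabs, and newlines
                .trim();                  // drop leading and trailing spaces
    }
}
```

The result can then be passed as the text argument of createNgram.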
Give it a try if you like ^^