IBM Model 1, Continued: Adding the Null Token
The other day I implemented IBM Model 1, but it turns out each sentence on the f side needs a null token added. This is the "Inserting Words" part of Professor Koehn's slides.
So I'll tweak the program a little: after loading the corpus, it now adds a null token to each f-side sentence.
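As a rough illustration of why the null token matters (this is not the code from this post): in Model 1 every e word is explained by some f-side token, so an e word with no real counterpart needs a null entry to align to. The sketch below computes that alignment posterior for a single e word. The names `alignment_posteriors` and `NULL`, the dict layout of `t`, and the toy probabilities are all assumptions made up for the example.

```python
NULL = '__null__'  # placeholder marker; any string absent from the corpus works


def alignment_posteriors(e_word, f_sentence, t):
    """Posterior over which f-side token generated e_word (Model 1 E-step).

    t is assumed to map (e_word, f_word) -> t(e_word | f_word).
    """
    scores = [t.get((e_word, f_word), 0.0) for f_word in f_sentence]
    total = sum(scores)
    if total == 0.0:
        # No f-side token can explain e_word under t.
        return [0.0] * len(f_sentence)
    return [score / total for score in scores]


# Toy probabilities, invented for illustration only.
t = {
    ('the', NULL): 0.6,
    ('the', '家'): 0.2,
    ('house', '家'): 0.8,
    ('house', NULL): 0.1,
}
f_sentence = ['家', NULL]
the_post = alignment_posteriors('the', f_sentence, t)
house_post = alignment_posteriors('house', f_sentence, t)
```

With these toy numbers, most of the mass for 'the' lands on the null token, while 'house' stays mostly aligned to 家. Without the null entry, 'the' would be forced onto 家.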
    def main():
        sentence_pairs = load_sentence_pairs(sys.argv[1])
        add_null_tokens(sentence_pairs)
        translation_prob = init_translation_prob(sentence_pairs)
        for i in range(0, 10):
            # initialize
            ...
The add_null_tokens function is pretty rough. I decided to represent the null token with the string __null__.
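    def add_null_tokens(sentence_pairs):
        # list.append mutates f in place and returns None, so update each
        # f-side sentence directly rather than building a list of (e, None).
        for (_, f) in sentence_pairs:
            f.append('__null__')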
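Note that add_null_tokens changes the f-side lists in place. If you would rather leave the loaded corpus untouched, a non-mutating variant might look like the sketch below. `with_null_tokens` and `NULL_TOKEN` are hypothetical names, and the sketch assumes each f is a plain list of tokens.

```python
NULL_TOKEN = '__null__'


def with_null_tokens(sentence_pairs):
    """Return new (e, f) pairs with the null token appended to each f side."""
    # f + [NULL_TOKEN] builds a fresh list, so the input pairs are not modified.
    return [(e, f + [NULL_TOKEN]) for (e, f) in sentence_pairs]


pairs = [(['the', 'house'], ['家'])]
padded = with_null_tokens(pairs)
```

With this variant the caller has to use the returned list, e.g. `sentence_pairs = with_null_tokens(sentence_pairs)`.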
Let's try it. The numbers seem to have shifted a little.
    $ ./model1.py je_corpus.txt
    [?] There 0.927606288093
    今日 today. 0.926667415895
    明日 tomorrow. 0.874001094428
    彼ら They 0.84912926054
    彼 He 0.815038442431
    先生 teacher 0.814664299858
    父 father 0.813230475027
    と island 0.808127514262
    家 house 0.808048519329
    日本 Japan 0.804555168981