List of articles for the one-year period starting 2012-01-01
The other day I implemented IBM Model 1, but it turned out that a null token needs to be added to the f-side sentences. This is the "Inserting Words" part of Professor Koehn's slides. So I made a small fix to the program. After reading in the corpus, null to the f-side sentences…
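A minimal sketch of that fix in Python, assuming the parallel corpus arrives as two lists of whitespace-tokenized lines (one per side). The token string NULL and the function name load_pairs are hypothetical. The null token is placed at position 0 of every f sentence, so during training each e word can also be explained as generated from "no f word", which is the case the "Inserting Words" slide describes.

```python
from typing import List, Tuple

# Hypothetical spelling of the empty word; any string that never occurs
# as a real f-side word would do.
NULL_TOKEN = "NULL"

Sentence = List[str]


def load_pairs(e_lines: List[str], f_lines: List[str]) -> List[Tuple[Sentence, Sentence]]:
    """Tokenize parallel lines and prepend NULL to each f sentence."""
    pairs = []
    for e_line, f_line in zip(e_lines, f_lines):
        e_words = e_line.split()
        # NULL goes at position 0 of the f side only; the e side is untouched.
        f_words = [NULL_TOKEN] + f_line.split()
        pairs.append((e_words, f_words))
    return pairs
```

For example, load_pairs(["the house"], ["das Haus"]) yields [(["the", "house"], ["NULL", "das", "Haus"])]; the rest of the training loop can stay unchanged because NULL is treated like any other f word.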
I'll try implementing IBM Model 1 in SQL. I used PostgreSQL as the RDBMS. First, create a table to store the corpus. position_id is meant to hold each word's position within its sentence. IBM Model 1 doesn't actually need it, but I put it in anyway just because. CR…
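The post's own table definitions are cut off, so the following is only a guess at what such a schema and one EM iteration could look like in SQL. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL so it runs without a server; the table names e_corpus, f_corpus and t, and every column name except position_id, are hypothetical. position_id is used here purely as a row key to tell repeated tokens in a sentence apart, not as alignment information, so the model itself is still plain IBM Model 1.

```python
import sqlite3


def create_tables(conn: sqlite3.Connection) -> None:
    # One row per token; t holds the translation table t(e|f).
    conn.executescript("""
        CREATE TABLE e_corpus (sentence_id INTEGER, position_id INTEGER, word TEXT);
        CREATE TABLE f_corpus (sentence_id INTEGER, position_id INTEGER, word TEXT);
        CREATE TABLE t (e TEXT, f TEXT, prob REAL, PRIMARY KEY (e, f));
    """)


def load_corpus(conn: sqlite3.Connection, pairs) -> None:
    """pairs: list of (e_sentence, f_sentence) strings."""
    for sid, (e_sent, f_sent) in enumerate(pairs):
        conn.executemany("INSERT INTO e_corpus VALUES (?, ?, ?)",
                         [(sid, pos, w) for pos, w in enumerate(e_sent.split())])
        conn.executemany("INSERT INTO f_corpus VALUES (?, ?, ?)",
                         [(sid, pos, w) for pos, w in enumerate(f_sent.split())])


def init_uniform(conn: sqlite3.Connection) -> None:
    # t(e|f) = 1 / |e vocabulary| for every (e, f) pair that co-occurs.
    conn.execute("""
        INSERT INTO t (e, f, prob)
        SELECT DISTINCT e.word, f.word,
               1.0 / (SELECT COUNT(DISTINCT word) FROM e_corpus)
        FROM e_corpus AS e JOIN f_corpus AS f ON e.sentence_id = f.sentence_id
    """)


def em_step(conn: sqlite3.Connection) -> None:
    rows = conn.execute("""
        WITH pairs AS (          -- every (e token, f token) pair in a sentence
            SELECT e.sentence_id AS sid, e.position_id AS epos,
                   e.word AS e, f.word AS f, t.prob AS p
            FROM e_corpus AS e
            JOIN f_corpus AS f ON e.sentence_id = f.sentence_id
            JOIN t ON t.e = e.word AND t.f = f.word
        ),
        stotal AS (              -- normalizer per e token
            SELECT sid, epos, SUM(p) AS total FROM pairs GROUP BY sid, epos
        ),
        counts AS (              -- expected counts count(e|f)
            SELECT pairs.e AS e, pairs.f AS f, SUM(pairs.p / stotal.total) AS c
            FROM pairs JOIN stotal
              ON pairs.sid = stotal.sid AND pairs.epos = stotal.epos
            GROUP BY pairs.e, pairs.f
        ),
        totals AS (SELECT f, SUM(c) AS tot FROM counts GROUP BY f)
        SELECT counts.e, counts.f, counts.c / totals.tot
        FROM counts JOIN totals ON counts.f = totals.f
    """).fetchall()
    # Replace the old table with the re-estimated probabilities.
    conn.execute("DELETE FROM t")
    conn.executemany("INSERT INTO t (e, f, prob) VALUES (?, ?, ?)", rows)
```

In PostgreSQL the same CTE query should carry over largely unchanged; the main design choice is letting GROUP BY do the summations that the pseudocode writes as nested loops.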
In connection with statistical machine translation, I've been reading Professor Koehn's slides posted on this page:
http://www.statmt.org/book/
The Word-Based Model slides explain IBM Model 1 clearly, so I also wrote the pseudocode from page 29 of the slides myself…
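A Python sketch of the EM training loop for IBM Model 1 in the form usually shown in Koehn's slides: initialize t(e|f) uniformly, then repeatedly compute a normalizer per e word, collect fractional counts, and renormalize. It is not the post's own code; the corpus format (a list of (e_words, f_words) pairs) and the name train_ibm1 are assumptions.

```python
from collections import defaultdict


def train_ibm1(corpus, iterations=10):
    """EM training of IBM Model 1; returns t with t[(e, f)] = t(e|f)."""
    e_vocab = {e for e_words, _ in corpus for e in e_words}
    uniform = 1.0 / len(e_vocab)
    # Initialize t(e|f) uniformly; unseen pairs default to the uniform value.
    t = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count = defaultdict(float)  # count(e|f)
        total = defaultdict(float)  # total(f)
        for e_words, f_words in corpus:
            # Compute normalization: s_total(e) = sum over f of t(e|f).
            s_total = {e: sum(t[(e, f)] for f in f_words) for e in e_words}
            # Collect fractional counts.
            for e in e_words:
                for f in f_words:
                    c = t[(e, f)] / s_total[e]
                    count[(e, f)] += c
                    total[f] += c
        # Estimate probabilities: t(e|f) = count(e|f) / total(f).
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t
```

On the toy corpus das Haus / das Buch / ein Buch (paired with the house / the book / a book), t[("the", "das")] is 0.5 after one iteration and 7/11 (about 0.636) after two, and it keeps rising toward 1 as training continues.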
Now that random sentence generation works, I played with it a little more using a different corpus. This time I'll try "I Am a Cat" (吾輩は猫である) from Aozora Bunko.
$ wget http://www.aozora.gr.jp/cards/000148/files/789_14547.html
The character encoding is Shift JIS, so I'll convert it…
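A minimal Python sketch of the conversion step, assuming the downloaded HTML file is read as raw bytes. It decodes with the cp932 codec (Microsoft's superset of Shift JIS, which accepts some characters that strict shift_jis rejects). As an extra assumption not stated in the post, it also drops the ruby readings and HTML tags so only body text is left for the corpus; the function name is hypothetical.

```python
import re


def aozora_html_to_text(data: bytes) -> str:
    """Decode Shift JIS (cp932) Aozora Bunko HTML and strip it to plain text."""
    html = data.decode("cp932")
    # Drop ruby readings such as <rt>わがはい</rt> and their parentheses in <rp>.
    html = re.sub(r"<rp>.*?</rp>|<rt>.*?</rt>", "", html, flags=re.S)
    # Remove all remaining tags, keeping their text content.
    return re.sub(r"<[^>]+>", "", html)
```

Usage would be along the lines of text = aozora_html_to_text(open("789_14547.html", "rb").read()), after which the result can be written back out with encoding="utf-8".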
I played around with random sentence generation using Python's NLTK. Training needs a corpus, so I downloaded some Wikipedia data. The abstracts dump looked like the smallest one, so I went with that. Even so, it is about 1.2 GB.
$ wget http://dumps.wikimedia.org/jawik…
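NLTK's text-generation helpers have changed between versions, so rather than guess at a specific NLTK call, here is a plain-Python sketch of the underlying idea: a bigram model sampled one word at a time. It assumes each abstract has already been split into words (Japanese text needs a morphological analyzer for that step); the boundary markers and function names are hypothetical.

```python
import random
from collections import defaultdict

BOS, EOS = "<s>", "</s>"  # hypothetical sentence-boundary markers


def build_bigrams(sentences):
    """Map each word to the list of words observed right after it.

    Duplicates are kept, so random.choice samples in proportion to counts.
    """
    followers = defaultdict(list)
    for words in sentences:
        seq = [BOS] + words + [EOS]
        for prev, nxt in zip(seq, seq[1:]):
            followers[prev].append(nxt)
    return followers


def generate(followers, rng, max_len=30):
    """Walk the bigram chain from BOS until EOS or max_len words."""
    out = []
    word = BOS
    while len(out) < max_len:
        word = rng.choice(followers[word])
        if word == EOS:
            break
        out.append(word)
    return out
```

Passing a seeded random.Random makes runs reproducible, which helps when comparing corpora. Every word except EOS has at least one follower, so the walk never hits an empty list.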