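I'm Harashima from the Research and Development department. Since last year I have also been working in the Recipe Service Development department. I'll save that story (search) for another time; today's post is about R&D (machine learning).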
fastText
Distributed representations of words matter. In today's neural-network-dominated landscape, not using them is hardly an option.
The first to draw wide attention was probably word2vec, published in 2013. The story that subtracting the vector for "man" from the vector for "king" and adding the vector for "woman" yields the vector for "queen" is well known. More recently, the conversation has been dominated by BERT, published in 2018, and models like it.
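The "king - man + woman" arithmetic can be illustrated with a tiny sketch. The 2-D vectors below are hypothetical toy values chosen by hand (one axis for royalty, one for gender), not output from word2vec or any trained model; the helper names `cosine` and `analogy` are also made up for this illustration.

```python
import math

# Hypothetical hand-made vectors (axes: royalty, gender). Real embeddings are learned.
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [0.1, 0.0],
}


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(positive, negative, vectors):
    """Return the word closest to sum(positive) - sum(negative), excluding the inputs."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + v for t, v in zip(target, vectors[word])]
    for word in negative:
        target = [t - v for t, v in zip(target, vectors[word])]
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy(["king", "woman"], ["man"], vectors)
print(result)
```

With learned embeddings the match is approximate rather than exact, which is why the lookup uses nearest-neighbor search by cosine similarity instead of equality.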
fastText, as many of you know, is a library for learning distributed representations. The name is sometimes also used for the learning algorithm itself. The fastText paper is listed below. It was published in 2017, so in this fast-moving field it may already count as an old paper.
- Enriching Word Vectors with Subword Information. Piotr Bojanowski, Edouard Grave, Armand Joulin, Tomas Mikolov.
Why fastText?
At Cookpad we use fastText a lot. So why fastText? As mentioned above, there are other options such as word2vec and BERT. fastText is of course one of the major options, but why this one in particular?
There are various reasons, but they boil down to "a good balance between performance and operations."
In terms of performance, fastText should beat word2vec because it can take subwords (substrings of a word) into account. On the other hand, BERT is likely to beat fastText because it learns representations that take context into account. These are generalizations, of course. In practice the picture changes depending on the task and the training data.
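To make "subwords" concrete, here is a minimal sketch of character n-gram extraction in the style described in the fastText paper, where a word is wrapped in the boundary markers `<` and `>` before n-grams are taken. The function name `char_ngrams` is hypothetical, and this is a simplification: the library itself hashes n-grams into buckets and also keeps the whole word as its own unit.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Character n-grams of a word wrapped in the boundary markers '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        # Slide a window of width n over the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


trigrams = char_ngrams("where", minn=3, maxn=3)
print(trigrams)

# The same function works on Japanese tokens, since Python strings index by code point.
print(char_ngrams("たまねぎ", minn=2, maxn=3))
```

Because a word's vector is built from its n-gram vectors, rare or unseen words that share substrings with known words still get a usable representation.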
In terms of operations, however, fastText and word2vec beat BERT. Pre-training BERT is hard work. We have tried it several times at Cookpad, and it costs both money and time. Training data, word segmenter, subword segmenter, whole word masking, masking probability, and so on: trial and error alone takes a considerable amount of money and time.
Of course, you can get by with fine-tuning only. Thankfully, there are plenty of pre-trained models out there, and using them removes the need to pre-train. Even so, problems remain: the model may be too large to deploy, or inference may be too slow to serve as an API.
Considering this balance between performance and operations, I think fastText is still an excellent choice today.
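Projects that use fastText
Projects at Cookpad that use fastText include, for example, the following.
- Keyword prediction for products using word embeddings (to appear). Yasuhiro Yamaguchi, Yusuke Fukasawa, Jun Harashima. Proceedings of the 28th Annual Meeting of the Association for Natural Language Processing.
This project predicts keywords that represent ingredients from product names on Cookpad Mart. fastText is used to convert the keywords and product names into vectors. The predictions are used in the Cookpad Mart admin screen.
As an aside, this work received the Committee Special Award at this year's Annual Meeting of the Association for Natural Language Processing. Thank you.
- Ingredient recommendation model based on multi-label classification. Yusuke Fukasawa, Sosuke Nishikawa, Jun Harashima. Proceedings of the 27th Annual Meeting of the Association for Natural Language Processing.
This project predicts, from a recipe's title, the ingredients that the recipe is likely to use. fastText is used to convert the words in the title into vectors. The predictions are used in the recipe posting screen.
- RedshiftML in Cookpad. Yusuke Fukasawa. Redshift ML Hands-on + re:Invent re:Cap Analytics edition.
This project predicts a recipe's category (e.g., meat dishes, fish dishes, vegetable dishes, ...) from its title. Here too, fastText is used to convert the words in the title into vectors. The predictions are scheduled to be used in the recipe bookmark screen soon.
Beyond these, we also use fastText frequently in projects that are still at the experimental stage.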
fastText training and usage flow
The fastText training and usage flow at Cookpad is as follows. Roughly speaking, we fetch the training data from Redshift, train fastText, and save the model to S3. Nothing complicated. If there is anything slightly unusual, it is that the training data lives in Redshift.
1. Fetching the training data
Training fastText requires text. For Japanese, it also requires word segmentation.
At Cookpad, the text of every recipe (e.g., the title) is stored in Redshift, and so are the segmentation results. The details are covered in a separate article. We use these segmentation results to train fastText.
To fetch the segmentation results we use a system called Queuery (pronounced "kyuri"). By using UNLOAD, Queuery can run a SELECT without putting load on Redshift or the client. Queuery was open-sourced at the end of last year; the details are covered in a separate article. There is also a Python client written by Yamaguchi from the R&D department.
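As a rough sketch of what the end of this step produces: fastText's unsupervised training reads a plain text file in which tokens are separated by whitespace, so pre-segmented Japanese text can be written as one text per line. The rows below are hypothetical stand-ins for segmentation results fetched from Redshift, and `write_training_file` is a made-up helper name; the Queuery client itself is not shown, since its API is not described here.

```python
import os
import tempfile


def write_training_file(tokenized_texts, path):
    """Write one text per line, with tokens separated by single spaces."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for tokens in tokenized_texts:
            tokens = [t for t in tokens if t.strip()]
            if not tokens:
                continue  # skip texts with no tokens
            f.write(" ".join(tokens) + "\n")
            count += 1
    return count


# Hypothetical pre-segmented recipe titles (assumption: real rows come from Redshift).
rows = [
    ["簡単", "肉じゃが"],
    ["トマト", "の", "サラダ"],
    [],
]
path = os.path.join(tempfile.mkdtemp(), "data.txt")
written = write_training_file(rows, path)
print(written, "lines written to", path)
```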
2. Training fastText
All it takes is the following two lines in a Python script. fastText is almost too convenient...
    import fasttext
    model = fasttext.train_unsupervised('data.txt', model='skipgram')  # 'cbow' also works
Even with the text of all recipes (about 3.67 million as of April 2022), training finishes in about 10 minutes. Memory usage stays at around 2 GB. We train on EC2 spot instances.
We have not touched the parameters; they are left at their defaults. For example, the vector dimension is 100. Parameter tuning is future work (discussed later).
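For reference, "defaults" can be spelled out explicitly. The values below are the `fasttext.train_unsupervised` defaults as I understand them from the library's documentation; they are worth verifying against the installed version. The names `DEFAULT_PARAMS`, `build_params`, and `train` are hypothetical helpers, not part of fastText.

```python
# Assumed defaults of fasttext.train_unsupervised (verify against your installed version).
DEFAULT_PARAMS = {
    "model": "skipgram",  # or "cbow"
    "dim": 100,           # vector dimension
    "lr": 0.05,           # learning rate
    "epoch": 5,           # passes over the data
    "ws": 5,              # context window size
    "minCount": 5,        # minimum word frequency
    "minn": 3,            # shortest character n-gram
    "maxn": 6,            # longest character n-gram
}


def build_params(**overrides):
    """Merge overrides into the defaults, rejecting names not listed above."""
    unknown = set(overrides) - set(DEFAULT_PARAMS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_PARAMS, **overrides}


def train(path, **overrides):
    """Train with explicit parameters. Requires the fasttext package."""
    import fasttext  # imported lazily so build_params works without the library
    return fasttext.train_unsupervised(path, **build_params(**overrides))


print(build_params(dim=300))
```

Writing the parameters out this way changes nothing about training, but it makes later tuning experiments easier to diff.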
3. Saving the model
We save the model to S3. Models trained in the past are kept as well so that we can roll back. Fortunately, we have never actually needed to. It has not caused any trouble yet, but it might be worth at least configuring a lifecycle rule.
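One way to keep past models around for rollback is to put the training date in the object key. This is only a sketch under assumptions: the key layout, the `fasttext` prefix, and the helper names are hypothetical and not Cookpad's actual naming; `upload_file` is the standard boto3 S3 client call.

```python
from datetime import date


def model_key(field, trained_on, prefix="fasttext"):
    """Build a dated S3 key so that each training run is stored separately."""
    return f"{prefix}/{field}/{trained_on:%Y-%m-%d}/model.bin"


def upload_model(local_path, bucket, key):
    """Upload a saved model file to S3. Requires boto3 and AWS credentials."""
    import boto3  # imported lazily so model_key works without boto3
    boto3.client("s3").upload_file(local_path, bucket, key)


key = model_key("title", date(2022, 4, 1))
print(key)
```

Rolling back then amounts to pointing an application at an older key, and an S3 lifecycle rule on the prefix could expire versions that are no longer needed.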
4. Downloading the model
For each application that wants to use a trained model, we grant Read access to the relevant folder in S3. Each application can then download the model.
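Read access to a single S3 prefix can be expressed as an IAM policy document. The sketch below builds one as a Python dictionary; the bucket name and prefix are hypothetical, and how the policy is attached (role, user, or bucket policy) is left open because it depends on the setup.

```python
import json


def read_only_policy(bucket, prefix):
    """IAM policy document allowing read-only access to one prefix of a bucket."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow downloading objects under the prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                # Allow listing, restricted to keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = read_only_policy("example-ml-models", "fasttext/title/")
print(json.dumps(policy, indent=2))
```

With such a policy in place, an application can fetch the file with the boto3 S3 client's `download_file` and load it with `fasttext.load_model`.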
That is the fastText training and usage flow. A few supplementary notes follow.
We train fastText separately for each recipe field (e.g., title, ingredients, ...). This is because the field of interest differs from one task to another. Tasks that focus on titles (e.g., recipe classification) can use the model trained on titles, and tasks that focus on ingredients (e.g., ingredient classification) can use the model trained on ingredients.
For job scheduling and deployment we use Kuroko2 and hako. The job basically runs monthly. Training is quick, so running it daily would not be a problem. Still, we do not expect the distributed representations to change that much, so we run it monthly. Yearly might even be enough.
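Future work
Finally, here are three items of future work.
The first, as mentioned in "Training fastText," is parameter tuning. The learning algorithm, training data, learning rate, vector dimension, subword range, and so on leave plenty of room for tuning. We want to take the time to work on this properly.
The second is the evaluation of distributed representations. This is related to the first item: it is not obvious what makes a distributed representation good. Basically, the representation that optimizes the evaluation metric of the downstream task seems like the right one. But there are many different downstream tasks, which makes this a tricky question.
The third is investigating alternative models. As mentioned in "Why fastText?", once operation in production is taken into account, models like BERT are not clearly better than fastText. On the other hand, this field moves fast, and a model that dispels these concerns could be announced as early as tomorrow. We want to keep a close eye on where the field is heading.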
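The first two items above, tuning and evaluation, fit together naturally: enumerate candidate settings, train with each, and keep whichever scores best on a downstream metric. The sketch below shows only that loop. The search space values are hypothetical, and `train_fn` and `eval_fn` are placeholders to be supplied by the caller (for example, a fastText training call and a classifier's validation score); this is not a description of what Cookpad actually does.

```python
from itertools import product


def grid(search_space):
    """Yield every combination of parameter values as a dictionary."""
    keys = list(search_space)
    for values in product(*(search_space[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best(search_space, train_fn, eval_fn):
    """Train once per combination and keep the one with the highest score."""
    best_params, best_score = None, float("-inf")
    for params in grid(search_space):
        model = train_fn(params)   # e.g. train fastText with these parameters
        score = eval_fn(model)     # e.g. validation metric of a downstream task
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space over fastText training parameters.
search_space = {
    "model": ["skipgram", "cbow"],
    "dim": [100, 300],
    "minn": [2, 3],
}
candidates = list(grid(search_space))
print(len(candidates), "combinations to try")
```

With several downstream tasks, `eval_fn` would have to combine their metrics somehow, which is exactly the difficulty raised in the second item.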
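Closing
By the way, we recently opened a "Machine Learning Course" in our on-the-job internship program. If you are interested in the future work listed above, or in machine learning at Cookpad in general, please apply.
- On-the-job internship (Machine Learning Course)
We also welcome applications from experienced candidates.
- Machine Learning Engineer (Research and Development)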