Learning the Latest Deep Learning Papers with Mendako-chan
This is a series in which Mendako-chan casually walks you through the latest deep learning papers! It collects the slides I posted on Twitter.
Text extracted from the slides (for search engines)
Learning the Latest Deep Learning Papers with Mendako-chan — Produced by: Ryobot
Introduction
About the author • Ryobot • Second-year master's student at NAIST; working at RIKEN AIP (since July 2017) • Researching personality and diversity in chatbots • I introduce my favorite papers on Twitter @_Ryobot
About these slides • Mendako-chan casually explains recent papers • The topics are mainly natural language processing (machine translation and language understanding) • This is a collection of the slides posted on Twitter
Mendako-chan • A LINE sticker character (a flapjack octopus) made by its original artist • The artist also distributes free-to-use images of the character • If you like it, please buy the LINE stickers
Another Diversity-Promoting Objective Function for Neural Dialogue Generation [arXiv: 1811.08100] • My first-authored paper was accepted for oral presentation at the DEEP-DIAL workshop of AAAI 2019, a top international AI conference • It promotes response diversity in neural dialogue generation using an objective function based on token frequencies in the corpus (the Inverse Token Frequency loss)
Note: SRC is the human utterance, TGT is the human (reference) response, MLE and MMI are responses from existing methods, and ITF is the response from the proposed method.
Nice to meet you! I'm Mendako-chan, your TA! In this class you'll study the latest deep learning papers by reading them together with me. Grab a paper and a coffee and enjoy! Let's start with a quick review of the basics. First up is the language model (Language Model): the basic task of taking words 1 through k-1 as input and predicting the k-th word (a toy sketch follows below). An RNN is simple: it adds the input representation directly into the hidden representation. An LSTM is complex: it controls the input and hidden representations with three gates!
Note: this Mendako-chan is a fan-made character setting! The real one isn't a TA and doesn't speak Kansai dialect~♪
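As a minimal illustration of the language-modeling task described above (predict word k from words 1..k-1), here is a toy count-based bigram model standing in for the RNN/LSTM on the slide; it sketches only the task, not the models themselves.

```python
# Toy language model: estimate P(next word | previous word) from bigram counts.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the cat ran .".split()
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1          # count how often `nxt` follows `prev`

def p_next(prev, word):
    counts = bigrams[prev]
    return counts[word] / sum(counts.values()) if counts else 0.0

print(p_next("the", "cat"))          # P(cat | the) = 2/3 in this tiny corpus
```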
A model that converts a source sentence into a target sentence, as in machine translation, is called a sequence-to-sequence model (Sequence-to-Sequence Model)! The encoder compresses the word sequence into a latent representation (a vector), and the decoder predicts the word sequence from that latent representation. Up to 2016, LSTMs were the mainstream choice, but recent machine translation has largely been taken over by the Transformer. Encoding with a bidirectional LSTM gives higher performance, and adding residual connections to go deeper helps too!
Attention works like this: a search query (the decoder's output) retrieves context information (the encoder's outputs). The formula is

Attention(query, Key, Value) = softmax(query · Keyᵀ) · Value

At each step this can be read as indexing into the context (a Key-Value dictionary) with the search query and interpreting the result. Because the alignment between the source and target sentences can be visualized, attention also improves the model's interpretability (a minimal sketch follows below).
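A minimal NumPy sketch of the dot-product attention formula above (with the usual 1/√d scaling added); shapes and values are illustrative only, not any particular paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)              # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    """query: [t_q, d], keys: [t_k, d], values: [t_k, d_v]"""
    scores = query @ keys.T / np.sqrt(keys.shape[-1])    # alignment scores
    weights = softmax(scores, axis=-1)                    # one distribution per query step
    return weights @ values, weights                      # context vectors + alignment map

q = np.random.randn(3, 8)                                 # e.g. 3 decoder steps
k = v = np.random.randn(5, 8)                             # e.g. 5 encoder outputs
context, align = attention(q, k, v)
print(context.shape, align.shape)                         # (3, 8) (3, 5)
```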
(1) CNNs in natural language processing
For data where neighboring elements are strongly correlated, like the pixels of an image, feature extraction by convolution works well. Represent the text as a [number of words x embedding dimension] matrix and feed it into a CNN! In machine translation the performance ordering is roughly Bi-LSTM < Convolution < Self-Attention.
Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction [Maha Elbayad, arXiv: 1808.03867]
This paper does machine translation with the decoder alone, without an encoder-decoder pair! Simple! The decoder is a CNN: DenseNet, the one used in image recognition. DenseNet is the granddaddy of creative network wiring!
The key point is the input tensor. Each token of the source sentence and each token of the target sentence are concatenated and fed into the DenseNet; capturing relations between tokens by processing the concatenation is similar to a Relation Network. In other words, the input tensor is [source x target x embedding*2] but is processed as [height x width x channel]. Translation models are usually trained with Teacher Forcing (feeding the target sentence into the decoder), but with a CNN you have to use causal convolution (a convolution shifted so it never looks at future words)! This was also used in WaveNet and ConvS2S (a minimal sketch of causal convolution follows below).
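A minimal PyTorch sketch of causal convolution as described above: left-pad the sequence by (kernel_size - 1) so each output position only depends on current and past positions. This is an illustration of the idea, not the Pervasive Attention implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size - 1                  # how much history the kernel can see
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                           # x: [batch, channels, time]
        x = F.pad(x, (self.pad, 0))                 # pad only on the left (the past side)
        return self.conv(x)                         # output never looks at future steps

x = torch.randn(2, 16, 10)                          # batch of 2, 16 channels, 10 time steps
print(CausalConv1d(16)(x).shape)                    # torch.Size([2, 16, 10])
```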
It sets the SOTA on IWSLT'14, though I'd have liked to see experiments on WMT'14 too. An interesting experimental result: on short sentences the proposed method and RNNsearch are stronger, while on long sentences ConvS2S and the Transformer are stronger.
(2) How the Transformer works
Ordinary attention uses the encoder's outputs as Key and Value and the decoder's output as the query. Self-Attention also uses the encoder's outputs as the query, so the query computes an alignment with itself! Its strength is that it can refer to the entire token sequence of the layer below. The key point of the Transformer is that it adds Self-Attention layers to both the encoder and the decoder (a small self-contained sketch follows below)!
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures [Gongbo Tang, EMNLP 2018, arXiv: 1808.08946]
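A tiny self-contained sketch of the point above: self-attention is the same softmax(QKᵀ)V operation, except that Q, K and V are all (linear projections of) the same token sequence. The projection matrices here are random placeholders.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv                # queries, keys, values all come from x
    scores = q @ k.T / np.sqrt(k.shape[-1])
    a = np.exp(scores - scores.max(-1, keepdims=True))
    a = a / a.sum(-1, keepdims=True)                # each token attends over every token
    return a @ v

x = np.random.randn(5, 8)                           # 5 tokens from one layer
wq, wk, wv = (np.random.randn(8, 8) for _ in range(3))
print(self_attention(x, wq, wk, wv).shape)          # (5, 8)
```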
By the way (out of nowhere): I heard NIPS registration tickets sold out in 11 minutes. Did everyone manage to get one~? More importantly, how about Transformers~? Today I'll be explaining this cool guy, the Transformer ↓ ... wait, no, not that one! This one ↓
In machine translation, CNNs and Self-Attention beat RNNs, and there's a proposed reason: the hypothesis that CNNs and Self-Attention connect input tokens with shorter paths → which favors modeling long-range dependencies → which makes them great. But is that really true~?
(Figure: the number of steps needed to connect tokens x1 and x5 — 4 for recurrence, 2 for convolution, 1 for attention.)
They report BLEU and perplexity for ordinary machine translation on WMT'14 and '17 English-German, plus accuracy on long-range dependency tasks.
The paper probes the real abilities of RNNs, CNNs and the Transformer (Self-Attention) on two tasks.
(1) First, a subject-verb "long-range dependency" task. In German the verb ending changes with the person and number of the subject, so they compare accuracy on getting this right. In the figure below, the Transformer and the bidirectional LSTM are neck and neck, but the CNN falls short. The left and right panels show results with reduced training data; the accuracies are similar, and BLEU and PPL are similar too.
Transformer 㯠Multi-head Attention ã®ããããæ°ã«ãã£ã¦é·è·é¢ä¾åã®ç²¾åº¦ãã å ¨ç¶ã¡ããã¦ã(h2 = 2ãããã) (2) æå³æ½åºããå¿ è¦ãªãèªç¾©ã®ææ§æ§è§£æ¶ (WSD)ãã¿ã¹ã¯ ç¬âè±ç¿»è¨³ãã¨æ§æã¦ãå¤æã¦ããã¸ãææ§èªãããããããã®ç²¾åº¦ãæ¯ã¸ããã¦ã㣠RNN, CNN ã«æ¯ã¸ã㦠Transformer ã¯ææ§æ§è§£æ¶ã®ç²¾åº¦ããããã¯ãæãã¦é«ããª~ ãã¤ãã CNN㨠Transformer ã¯é·è·é¢ä¾åã¿ã¹ã¯ã㨠RNN ããåªãã¨ãã¨ã¯éããã¦ã㣠Transformer ã¯ææ§æ§è§£æ¶ããã¤ãã¤ã!!!
(3) Top AI conferences

| Area | Conference | Overview | Held |
|---|---|---|---|
| Machine learning / deep learning | NIPS | Recently centered on DL; about 4,900 submissions | December |
| | ICML | Used to emphasize experiments, recently more theory-oriented | June |
| | ICLR | DL-focused; reviewed on OpenReview; about 1,600 submissions | April |
| Artificial intelligence | AAAI | AI in general; about 7,700 submissions; 16% acceptance rate | January |
| | IJCAI | AI in general; biennial until 2015; on par with AAAI | July |
| Natural language processing | ACL | The top NLP conference; about 1,500 submissions | July |
| | EMNLP | Recently on par with ACL; about 2,100 submissions | November |
| Computer vision | CVPR | The top CV conference; about 5,100 submissions | June |
| | ECCV | Held in alternate years with ICCV; on par with CVPR | September |

Except for the newly established ICLR, all of these conferences have acceptance rates around 20%. Submission counts have been exploding in recent years; at this pace, AAAI submissions in 2050 would exceed the total human population.
ICLR 2019 (Seventh International Conference on Learning Representations)
It's time for Weekly Deep Learning! Last week the submissions to ICLR'19, a top deep learning conference, were made public, and papers with strong results have been making the rounds one after another. Machine translation turned into quite a high-level battle, with six papers reporting a new SOTA at the same time...!? Here are the WMT'14 En-De BLEU scores. ↓ The words Dual Learning, Hyperbolic and Attention stand out, don't they? By the way, if you search #BIGGAN you'll find chimera objects generated by GANs.
rank | paper | BLEU |
---|---|---|
1 | Multi-Agent Dual Learning | 30.67 |
2 | Dual Learning: Theoretical... | 29.97 |
3 | Pay Less Attention... | 29.7 |
4 | Quasi-hyperbolic ... | 29.45 |
5 | Universal Transformers | 28.9 |
6 | Hyperbolic Attention... | 28.52 |
- | (Previous SOTA) | 28.4 |
Of the 13 machine-translation submissions, 11 adopt the Transformer, making it more of a de facto standard than I imagined. RNNs and CNNs account for only the remaining 2. Across all submissions, RNNs are clearly on the decline... ↓ (keyword frequency, year-over-year change)
A surprise at rank 6! It uses hyperbolic space, where volume expands toward the boundary, in the activation function and the attention mechanism (left: embeddings in Euclidean space; right: in hyperbolic space).
Rank 4! Instead of momentum SGD it uses quasi-hyperbolic momentum SGD, i.e. a weighted average of plain SGD and momentum SGD (a sketch of the update rule follows below).
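A minimal sketch of the quasi-hyperbolic momentum update written from the "weighted average of plain SGD and momentum SGD" description above. The parameter names (beta, nu) follow the common QHM convention; treat this as an illustration, not the authors' implementation.

```python
def qhm_step(theta, grad, buf, lr=0.1, beta=0.9, nu=0.7):
    buf = beta * buf + (1 - beta) * grad          # momentum buffer: EMA of gradients
    update = (1 - nu) * grad + nu * buf           # interpolate plain SGD and momentum SGD
    return theta - lr * update, buf               # nu=0 -> plain SGD, nu=1 -> momentum SGD
```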
After the Transformer comes Dynamic Convolution! Its BLEU is an astonishing 29.7, good for 3rd place (the Transformer gets 28.0). First it cuts the parameter count with depthwise convolution, then it introduces dynamic kernels that depend on the time step: the CNN kernel weights are computed from the output of a GLU. What an idea!
Ranks 1 and 2 go to Dual Learning! What was Dual Learning again? Roughly: train so that English→Japanese→English comes back to the original sentence, which means you can learn not only from parallel corpora but also from monolingual corpora.
By the way, outside the SOTA crowd there was one submission each on zero-shot learning, weakly supervised learning, non-autoregressive models, and diversity promotion!
The era of Multi-Agent (= Multi-Path) Dual Learning has arrived! Parallel corpus + back-translation of monolingual corpora = strong. Multi-path translation across multiple languages + cycle translation = even stronger. Multiple translation agents between the two languages (X, Y) learn through the Dual Learning reconstruction error, and this Multi-Agent Dual Learning now reigns as the SOTA! Here are the BLEU scores of the new SOTA vs. the previous SOTA: on WMT it reaches 29.92 even without a monolingual corpus! Dual Learning is strong, and adorable ♡ (a hedged sketch of the basic dual-learning step follows below).
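A hedged sketch of the basic dual-learning round trip described above (e.g. English→Japanese→English back to the original): a forward model translates X→Y, a backward model translates Y→X, and both are trained so that the round trip reconstructs the original monolingual sentence. `forward_model`, `backward_model` and their methods are hypothetical placeholders, not the paper's API.

```python
def dual_learning_step(x, forward_model, backward_model):
    y_hat = forward_model.translate(x)                   # X -> Y, no reference translation needed
    recon_loss = backward_model.nll(src=y_hat, tgt=x)    # Y -> X should reproduce the original x
    return recon_loss                                    # minimized w.r.t. both models
```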
(4) Data augmentation by back-translation
In machine translation, the quality and quantity of the training data matter a lot! By using a back-translation model to augment a monolingual corpus into a pseudo-parallel corpus, and training the forward translation model on the expanded data, you can obtain much higher performance!
Understanding Back-Translation at Scale [Sergey Edunov, EMNLP 2018, arXiv: 1808.09381]
By the way, here is the machine translation benchmark setup:
evaluation: BLEU score using multi-bleu.perl on WMT newstest2014 En-De
dataset: WMT parallel data (including ParaCrawl) and monolingual News Crawl
Back-translation is used for data augmentation in machine translation: prepare a monolingual corpus in the target language and generate source sentences with a trained back-translation model, and you get a pseudo bilingual corpus. You can then train the forward translation model on the expanded data! However, when decoding the back-translation model, using MAP estimation, which outputs only the mode of the model distribution (greedy or beam search), produces regular, repetitive pseudo source sentences that fail to properly cover the true data distribution. (MAP = Maximum A-Posteriori estimation.) So the authors tried sampling stochastically from the model distribution, or adding noise to beam search, and performance shot up; they investigated why (a hedged sketch follows below).
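A hedged sketch of back-translation data augmentation with sampled (rather than MAP/greedy) decoding, following the description above. `backward_model.generate` is a hypothetical stand-in for a trained target→source translation model, not an API from the paper.

```python
def backtranslate(monolingual_targets, backward_model, sampling=True):
    pseudo_pairs = []
    for tgt in monolingual_targets:
        # sampling=True draws from the model distribution instead of taking its mode (MAP),
        # which the paper found yields a richer, if noisier, training signal
        src = backward_model.generate(tgt, sample=sampling)
        pseudo_pairs.append((src, tgt))                  # pseudo-parallel (source, target) pair
    return pseudo_pairs                                  # used to train the forward model
```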
Comparing the forward translation model's perplexity shows that sampling or adding noise in the back-translation model generates source sentences that are harder to predict than MAP estimation, which makes the forward model's training harder. On the other hand, it provides a richer training signal — like the proverb says, if you love your child, send them out into the world! In a low-resource setting (80K parallel sentences), the back-translation model itself is already weak (i.e. it produces hard-to-predict source sentences), so sampling backfires, and real parallel text (bitext) and back-translation of a monolingual corpus (BT-news) end up performing quite similarly. It is, by the way, fragile to domain shift.
It reaches 35.0, higher than the 33.3 of DeepL, which is trained on large-scale, high-quality parallel text! The previous SOTA on WMT was 29.2 (29.8 with ParaCrawl), and the Transformer gets 28.4. My impression: last year model designs such as the Transformer were scrutinized; this year research has shifted to training methods such as unsupervised machine translation and back-translation, and it's paying off~
(5) Machine translation by unsupervised learning
Unsupervised machine translation, which learns without any parallel corpus, has appeared and has been heating up since the end of last year! You obtain a translation model by learning back-translation and denoising at the same time~
Machine Translation With Weakly Paired Bilingual Documents [Anonymous, ICLR 2019 id: ryza73R9tQ]
This is a study asking whether document pairs (non-parallel text) from multilingual websites like Wikipedia can be used as training data for machine translation! For many minor language pairs it is hard to collect parallel sentences, and unsupervised learning without parallel text still performs noticeably worse than supervised learning~
The translation model is trained mainly from two kinds of weak supervision signals!
1) Find weak parallel sentences in the document pair: convert each sentence of the paired documents into a sentence representation, and treat any two sentences whose cosine similarity exceeds a threshold as a weak parallel pair for training! The sentence representation is a weighted average of word embeddings pre-trained with MUSE, weighted according to word frequency (a rough sketch follows below).
2) Approximate the topic distributions of the document pair: assume the topic distributions (≈ the sum of the word distributions) of the two documents are the same, and minimize, as the objective Ld, the Wasserstein distance (rather than the KL divergence) between the summed word probability distribution obtained by translating one document and the summed word distribution of its paired document!
Finally, unsupervised machine translation is also run on monolingual news articles! Ldae is the denoising autoencoder loss and Lrec is the reconstruction loss.
Note: unsupervised machine translation won an EMNLP best paper award!
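A rough NumPy sketch of step 1) above: score candidate sentence pairs by the cosine similarity of frequency-weighted averages of cross-lingual (MUSE-style) word embeddings, and keep pairs above a threshold as weak parallel sentences. `emb` (embedding lookup) and `weight` (frequency-based weights) are hypothetical inputs; the threshold and weighting scheme are illustrative, not the paper's exact values.

```python
import numpy as np

def sent_vec(tokens, emb, weight):
    """Frequency-weighted average of word embeddings for one sentence."""
    vecs = [weight[w] * emb[w] for w in tokens if w in emb]
    total = sum(weight[w] for w in tokens if w in emb)
    return np.sum(vecs, axis=0) / max(total, 1e-8)

def mine_weak_pairs(doc_x, doc_y, emb, weight, threshold=0.8):
    pairs = []
    for sx in doc_x:                                    # sentences of one document
        vx = sent_vec(sx, emb, weight)
        for sy in doc_y:                                # sentences of the paired document
            vy = sent_vec(sy, emb, weight)
            cos = vx @ vy / (np.linalg.norm(vx) * np.linalg.norm(vy) + 1e-8)
            if cos > threshold:
                pairs.append((sx, sy))                  # keep as a weak parallel sentence pair
    return pairs

# toy demo with a hypothetical 2-dimensional embedding table
emb = {"gato": np.array([1.0, 0.0]), "cat": np.array([1.0, 0.1])}
weight = {w: 1.0 for w in emb}
print(mine_weak_pairs([["gato"]], [["cat"]], emb, weight, threshold=0.5))
```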
The proposed method lands, as you'd expect, roughly halfway between unsupervised and supervised learning in performance! Right table: most of the gain comes from the weak-pair loss Lp~ Bottom figure: examples of weak parallel sentences (the two sentences in a document pair with the highest cosine similarity). The bold words are words that should not appear in a true parallel sentence! I'd like to know how much they hurt training.
(6) Pre-training language models
ELMo achieved the best performance on multiple benchmarks! Pre-training an encoder as a large-scale language model improves performance across a variety of NLP tasks. Source: Matthew Peters
Semi-Supervised Sequence Modeling with Cross-View Training [Kevin Clark, EMNLP 2018, arXiv: 1809.08370]
Lately the NLP community is on fire about pre-training, transfer learning and multi-task learning, and ELMo kicked off the boom~ In particular, you'll remember that ELMo pre-trained a large language model and achieved SOTA on six NLP tasks. ELMo pre-trains a multi-layer bidirectional LSTM encoder and uses its latent representations as embeddings for all sorts of NLP tasks!
ELMo trains a language model with a multi-layer bidirectional LSTM, right? But in the actual pre-training, the forward language model and the backward language model are trained separately! The intermediate layers of the multi-layer LSTM cannot concatenate the forward latent representation with the backward one, because concatenating them would leak information about the word to be predicted~♪ Since ELMo restricts the model's structure and therefore cannot take both left and right context into account, the authors instead restrict the input data and let the full model show its true strength!
With supervised learning, a "primary module" that receives the complete input sentence is trained as usual; with unsupervised learning, "auxiliary modules" that receive partially missing views of the input sentence are trained using the primary module's outputs as labels. This semi-supervised learning improves the quality of the latent representations of the weight-shared multi-layer bidirectional LSTM~! You can also do multi-task training easily just by adding prediction modules for other tasks. A nice advantage is that for every task you can artificially create pseudo-labeled data from unlabeled data ♡ (p_theta is the pseudo teacher signal; a hedged sketch follows below). BERT, which I'll introduce later, trains its pre-training model on partially masked input sentences with the task of predicting the missing parts; this is the same as the cloze task (fill-in-the-blank) from question-answering research, and they define it as the Masked Language Model!
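A hedged sketch of the cross-view idea above: the primary module sees the full sentence, and its prediction is used as a soft label for auxiliary modules that only see restricted views (for example, only the left context). The modules here are hypothetical callables returning probability distributions; this is an illustration, not the paper's training code.

```python
import numpy as np

def cvt_unsupervised_loss(full_view, restricted_views, primary_module, aux_modules):
    soft_label = primary_module(full_view)                # p_theta: the pseudo teacher signal
    loss = 0.0
    for view, aux in zip(restricted_views, aux_modules):
        pred = aux(view)                                  # auxiliary module sees a partial view
        loss += -np.sum(soft_label * np.log(pred + 1e-9)) # cross-entropy toward the teacher
    return loss                                           # pushes each partial view to match the full view
```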
SOTA on almost every task~! And it shows its strength even more when the training set is reduced!
(7) Natural language processing tasks
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [Jacob Devlin, arXiv: 1810.04805]
Task | Description | Prev. SOTA | BERT |
---|---|---|---|
GLUE | 8 language understanding tasks | 75.2 | 81.9 |
1. MNLI | Classify a sentence pair as entailment / contradiction / neutral | 82.1 | 86.7 |
2. QQP | Decide whether two questions are semantically equivalent | 70.3 | 72.1 |
3. QNLI | A reformulation of SQuAD: decide whether a statement contains the answer to a question | 88.1 | 91.1 |
4. SST-2 | Classify the sentiment (positive/negative) of a movie-review sentence | 91.3 | 94.9 |
5. CoLA | Decide whether a sentence is linguistically acceptable | 45.4 | 60.5 |
6. STS-B | Score the semantic similarity of two news-headline sentences | 80.0 | 86.5 |
7. MRPC | Decide whether two sentences from news articles are semantically equivalent | 82.3 | 89.3 |
8. RTE | Decide whether one sentence entails the other | 56.0 | 70.1 |
SQuAD | Question answering: extract the answer to a question from a passage | 91.7 | 93.2 |
CoNLL | Named entity recognition: tag words as person / organization / location | 92.6 | 92.8 |
SWAG | Choose, from four candidates, the sentence that follows the input sentence | 59.2 | 86.3 |
GLUE (8 NLP tasks; the score is the average), SQuAD (question answering). This is research on pre-training an encoder as a language model and transferring it to NLP tasks such as question answering and tagging. It blows right past the CVT SOTA we just saw!
When training a language model (predicting the next word) with a bidirectional LSTM or a Transformer, you need a constraint that prevents the model from looking at the word it is supposed to predict (the teacher signal). BERT uses the full-capacity Transformer. With the OpenAI Transformer you can mask the network connections to future word positions, and with ELMo you can train the forward LSTM and backward LSTM separately, but then the model itself cannot take both left and right context into account... So constrain the data instead of the model! By masking the input data rather than the network, you get to attack the problem with the full model (one that sees both left and right context) and predict the masked words. This is the same problem framing as the Cross-View Training paper introduced earlier (a minimal masking sketch follows below).
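A minimal sketch of the "mask the data, not the network" idea: randomly replace a fraction of input tokens with a [MASK] symbol and train the bidirectional model to predict only the masked positions. The 15% rate mirrors BERT's setup, but BERT's other details (the 80/10/10 mask/random/keep split, subword tokens, the model itself) are omitted here.

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", rate=0.15):
    inputs, targets = [], []
    for tok in tokens:
        if random.random() < rate:
            inputs.append(mask_token)     # the model sees [MASK] at this position...
            targets.append(tok)           # ...and must predict the original token here
        else:
            inputs.append(tok)
            targets.append(None)          # no loss at unmasked positions
    return inputs, targets

print(mask_tokens("the cat sat on the mat".split()))
```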
Differences from prior work
• Uses the full (bidirectional) Transformer
• Scales the Transformer up from the original 6 layers with 8 attention heads to 24 layers with 16 attention heads!
• Uses a large corpus combining BooksCorpus and Wikipedia!
• Adds masked-word prediction plus predicting whether two input sentences are consecutive to the pre-training objectives
• Is it training-inefficient? Constraining the model gives a teacher signal at every word, whereas constraining the data gives a teacher signal only at the masked words. In practice convergence is somewhat slower, but performance rises quickly.
• The cost is around $7,000? (16 TPUs x 4 days x 24 hours x $4.5)
For SQuAD, each word of the question and the passage is embedded as a sum of word, word-position and sentence-segment vectors (a minimal sketch follows below). These are the Transformer hyperparameters: L = number of layers, H = hidden size, A = number of attention heads. Scaling up the model contributes a great deal to the performance~
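A minimal sketch of the input representation mentioned above: each token's vector is the sum of a token embedding, a position embedding, and a segment (sentence A/B) embedding. The sizes are illustrative toy values, not BERT's actual hyperparameters.

```python
import numpy as np

vocab_size, max_len, num_segments, d = 100, 32, 2, 16
tok_emb = np.random.randn(vocab_size, d)
pos_emb = np.random.randn(max_len, d)
seg_emb = np.random.randn(num_segments, d)

def embed(token_ids, segment_ids):
    positions = np.arange(len(token_ids))
    # word embedding + position embedding + sentence-segment embedding, summed per token
    return tok_emb[token_ids] + pos_emb[positions] + seg_emb[segment_ids]

print(embed(np.array([1, 5, 7]), np.array([0, 0, 1])).shape)   # (3, 16)
```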
Summary
Machine translation
1. The WMT race heats up: the WMT benchmark has become NLP's ImageNet.
2. The success of Self-Attention: the LSTM era is over; the Self-Attention-based Transformer has become the standard.
3. The arrival of unsupervised machine translation: translation models can now be learned without parallel text.
4. Advances in back-translation and dual learning: back-translation and dual learning (coupling translation with reverse translation) bring large performance gains.
Language understanding
5. Pre-training language models: ELMo and the OpenAI Transformer achieved SOTA on multiple language understanding tasks.
6. Masked language models: BERT and CVT pre-train cloze-style language models and push the SOTA up by a wide margin.
Image generation (bonus)
7. Resolution up from 256px to 1024px: StackGAN → StackGAN v2 → Progressive Growing GAN.
8. ImageNet-scale diversity: BigGAN is the first to succeed at high-quality generation over ImageNet (1,000 classes).
9. VAE- and flow-based image generation: IntroVAE and GLOW reach GAN-level image quality.
Closing remarks
I received more than 100 tweets of feedback. About half said the slides were easy to understand, and many commented on the content. Here are some tweets that made me happy. I'm overwhelmingly grateful!
"Having a cute girl explain it makes me actually want to read; the illustrations make it approachable."
"There's real demand for a YouTuber where a cute character casually explains difficult papers; I'd love a paper-reading VTuber."
"I wish all conference abstracts looked like this (?)"
"Workshop slides should all be like this! It's the best!"
"I want a late-night anime in this style: This Week in Machine Learning~"
"The deep learning content is summarized with illustrations and is easy to follow."
"Ryobot spares no effort even in researching the art of paper introductions."
"Ryobot introduces cutting-edge papers in Japanese, which helps me a lot."
"I love this manga-like style of introducing papers and methods, like Ryobot does; I hope it catches on."
"I really love this series; cute paper summaries are great."
"This one is so cute! Can we have an English version, please?"
"one of those times when I wish I knew how to read Japanese"
"Oh My Gosh! So... fky cute! I nearly doubt that I'm reading a paper..."
"How dare you make this so cute!!!! LOL"