This event caught my eye as an NLP event, so I decided to attend.
I'm not an industry insider, and this isn't a field I know particularly well; it had been a while since I joined a study session purely to learn, as an ordinary participant, but it was a dense and fascinating event. Many thanks to Microsoft for hosting it and to all the speakers.
Below are my notes, written to the best of my understanding.
Attention is all you need!!! (Before You Dive In)
(Microsoft: 得上竜一さん)
A session introducing the background knowledge needed before reading the Transformer paper, Attention is All You Need.
Explainer articles on Attention is All You Need are available in Japanese as well.
- Attention is a mechanism that shapes the output according to the data being attended to
    - Example: image processing
        - Processing an image with its background as-is bakes the background information into the result
        - With Attention, the model can focus on just the person
        - Humans naturally ignore the background when looking at a photo of a person; this is the same idea
        - Split a convolution into two branches, pass one through a sigmoid, and merge the branches at the end: wherever the sigmoid is close to 0, that part stops influencing later processing
- SENet
    - A convolution yields multiple output channels: vertical and horizontal edges, color extraction, and so on
    - An ordinary CNN feeds all of these features to the subsequent layers
    - When humans pay attention, where to attend depends on the case
    - SENet turns this into attention that dynamically decides which image features to use
    - Take average pooling over the whole image, extract features with a Conv(1,1), and finally apply a sigmoid (see the sketch below)
    - Using this mechanism in every conv unit has been shown to improve performance for only a small increase in computation
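A minimal sketch of the squeeze-and-excitation channel gating described above, assuming PyTorch; the `SEBlock` name, the reduction ratio, and the tensor shapes are illustrative, not the speaker's exact code.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation-style channel attention (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)  # global average pooling over the image
        self.excite = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),  # Conv(1,1)
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),  # per-channel gate in [0, 1]
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = self.excite(self.squeeze(x))  # shape (N, C, 1, 1)
        return x * gate  # channels gated toward 0 stop affecting later layers

feats = torch.randn(8, 64, 32, 32)  # a batch of feature maps from a conv unit
print(SEBlock(64)(feats).shape)     # torch.Size([8, 64, 32, 32])
```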
- Attention in language processing
    - Consider a typical text classification problem
    - The network can be built to attend to words at particular positions
    - Given a feature vector for each word, compute attention over them and multiply it into the feature vectors
- Translation
    - An encoder extracts features; a decoder writes them out in another language
    - An LSTM produces the output one word at a time, each from the previous word
    - Documents such as contracts demand accuracy, e.g., exact word correspondences
    - Use Attention to decide which word should be translated first
    - Based on that word, decide the next Attention, and so on
    - Take the dot product between each word's features and the decoder's initial vector; that gives the initial raw attention scores, and applying Softmax to those scores yields the first attention weights
    - For images, a sigmoid expressed attend-or-not independently for every pixel; for language, a softmax weighs whether each word deserves attention relative to the others
    - To decide what translation to output, take the dot product of the Query vector with the Key vectors and run it through a softmax (see the sketch below)
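A small sketch of that score computation, assuming NumPy; the shapes and the `attention_weights` name are illustrative.

```python
import numpy as np

def attention_weights(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Dot-product attention: one query (d,) scored against all keys (T, d)."""
    scores = keys @ query            # one raw score per source word
    scores -= scores.max()           # for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()           # softmax over the source positions

keys = np.random.randn(5, 8)         # 5 source words, 8-dim features
query = np.random.randn(8)           # the decoder's current vector
w = attention_weights(query, keys)
context = w @ keys                   # weighted sum fed back into the decoder
print(w.round(3), w.sum())           # weights are non-negative and sum to 1
```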
- Self-Attention
    - Query/Key/Value are all extracted from the same sequence at the same time, rather than split across encoder and decoder
    - A word's meaning changes depending on the other words in the sentence
    - Example: 「ない」; a sentence that contains 「ない」 is not necessarily a negation, so its meaning depends on the surrounding words
    - Self-attention lets the other words characterize the current word
    - Take the dot product of the Query for 「含む」 with the Keys of all words, apply softmax, multiply by the Values, and add the result to the current word's vector
    - With self-attention, a word uses the other words to characterize itself (see the sketch below)
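A compact single-head self-attention sketch in NumPy; the projection matrices and dimensions are illustrative, and the residual add mirrors the "add to the current word's vector" step above.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a word sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])        # every word scored against every word
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)            # softmax per query word
    return X + w @ V                              # each word re-described by the others

rng = np.random.default_rng(0)
T, d = 4, 8                                       # 4 words, 8-dim features
X = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)        # (4, 8)
```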
- Attention is All You Need
    - This is the Transformer paper
    - RNN problem: the computation for the next word cannot start until the previous word's computation finishes
    - The Transformer is what you get by swapping the RNN layers for self-attention layers
    - BERT, GPT-2, and others are built on the Transformer
    - That covers the prerequisites, so go read the paper
Research Trends in Generative NLP
(Microsoft: 伊藤駿汰さん)
I attended mainly to hear this session, and the content was excellent.
- What is text generation?
    - Character: the smallest unit among the symbols used to write a language
    - Word: the smallest unit built from characters that carries meaning and plays a structural role
    - Sentence: a coherent thought built by combining words
    - Word sequence: words in order; a sentence is one kind of word sequence
    - Text generation: deciding the number of words, which words, and their order
    - It is intractable without some constraints, hence text generation with next-word prediction as its backbone
    - Next-word prediction: given a word sequence, predict the word that comes next. Example: a smartphone keyboard's predictive input
    - Given a word sequence, compute the probability of each possible next word and choose the most probable one
    - Generation predicts the first word A, predicts B from A, and repeats until the sentence ends (see the sketch below)
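A sketch of that greedy loop, assuming a hypothetical `next_word_probs(prefix)` that any trained language model could stand in for; the `<eos>` end-of-sentence marker is also an assumption.

```python
def generate(next_word_probs, max_len=20):
    """Greedy next-word generation: repeatedly append the most probable word."""
    words = []
    while len(words) < max_len:
        probs = next_word_probs(words)     # dict: candidate word -> probability
        word = max(probs, key=probs.get)   # pick the most probable next word
        if word == "<eos>":                # stop when the model predicts sentence end
            break
        words.append(word)
    return " ".join(words)
```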
- History of text generation models
    - The history of generation models is intertwined with that of language models
    - Language model: a probability distribution P(w) giving the probability that a sentence w occurs
    - With P(w) in hand:
        - Likelihoods of multiple sentences can be compared
        - Since the distribution is known, sentences can be generated from it
- Language models
    - The number of words in a sentence is variable
    - A single language has on the order of tens of thousands of words
    - Computing P(w) directly is hopeless
    - Approximate it with a sequential approach
    - Assume the next word is determined by the few words before it
- N-gram language models
    - Looking back at all preceding words is too costly, so fix how many to look at and keep the computation light
    - If, for each sequence of the previous N-1 words, we know the probability of each possible next word, P(w) can be computed
    - Those probabilities can be estimated statistically from a large corpus
    - With P(w_i | w_{i-N+1}, ..., w_{i-1}) we can predict the next word, and therefore generate text (see the sketch after this list)
- Text generation with statistical methods
    - In practice, around 5-grams is the limit
    - Variable-length N-grams based on the Pitman-Yor process were also used
    - Rarely used nowadays
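A minimal bigram (N = 2) sketch in plain Python showing how those conditional probabilities fall out of corpus counts; the toy corpus and the `<s>`/`</s>` boundary markers are illustrative.

```python
from collections import Counter, defaultdict

# Estimate P(w_i | w_{i-1}) from counts in a toy corpus.
corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]

counts = defaultdict(Counter)
for sentence in corpus:
    for prev, word in zip(["<s>"] + sentence, sentence + ["</s>"]):
        counts[prev][word] += 1          # how often `word` follows `prev`

def prob(word, prev):
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

print(prob("cat", "the"))   # 2/3: "the" is followed by "cat" twice, "dog" once
```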
- Training DNNs
    - Requires inputs and correct answers
    - Training proceeds by pushing predictions closer to the answers
- RNN language model (Mikolov 2010)
    - RNN (Rumelhart 1986)
    - Instead of obtaining P(w) itself, predict the probability of the next word given the words so far (sketched below)
    - Input: a word; target: the next word
- RNN problems
    - In a plain RNN, information from the distant past either vanishes rapidly or blows up explosively
    - Techniques such as LSTM (Hochreiter 1997) and GRU (Cho 2014) brought this to a level where training works well enough
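A sketch of an RNN language model in PyTorch, matching the input-word/next-word setup above; the vocabulary size and layer widths are made up, and an LSTM stands in for the plain RNN.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """RNN language model: read words, output a distribution over the next word."""
    def __init__(self, vocab_size=10_000, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.rnn = nn.LSTM(emb, hidden, batch_first=True)  # LSTM, per the fixes above
        self.head = nn.Linear(hidden, vocab_size)          # scores for every word

    def forward(self, tokens):                             # tokens: (batch, seq_len)
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                                # (batch, seq_len, vocab_size)

model = RNNLM()
tokens = torch.randint(0, 10_000, (2, 8))                  # two toy word-ID sequences
logits = model(tokens)
# Training pairs: input = words[:-1], target = words[1:] (each word's "next word").
loss = nn.CrossEntropyLoss()(
    logits[:, :-1].reshape(-1, 10_000), tokens[:, 1:].reshape(-1)
)
```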
- Seq2Seq (Sutskever 2014)
    - Chains an encoder RNN that extracts a sentence's meaning to a decoder RNN that generates a sentence from that meaning, converting sentence to sentence
- S2S + Attention (Luong 2015)
    - Improves Seq2Seq accuracy by adding an Attention mechanism that re-weights and re-uses past information
    - Remaining problem: an RNN cannot process a step until all past steps have been processed
- Transformer (Vaswani 2017)
    - Performs the same sentence-to-sentence conversion as Seq2Seq
    - Built solely from non-recurrent NN layers and Attention, so it is fast
    - On translation tasks, reached SOTA with far less training than RNN-based methods
    - Solves the failure to capture long-range dependencies
    - Lightweight, fast, and well suited to parallelization
- BERT (Devlin 2018)
    - A huge 12-layer Transformer encoder
    - Two kinds of language-model-style pretraining:
        - Predicting masked words (see the sketch after this list)
        - Sentence-continuity judgment: the probability that two sentences are consecutive
    - Pretrain on enormous data + transfer-learn to the target task with a small dataset
    - SOTA on a wide range of NLP benchmarks
    - Being trainable with small task datasets mattered enormously for industrial use
    - Carefully designed pretraining made bidirectional training of the Transformer possible, demonstrating the effectiveness of LM-style pretraining
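The masked-word objective is easy to poke at with Hugging Face's `transformers` fill-mask pipeline; a sketch assuming that library and the `bert-base-uncased` checkpoint, not anything shown in the talk.

```python
from transformers import pipeline

# BERT fills in the [MASK] token with its most probable candidates.
unmasker = pipeline("fill-mask", model="bert-base-uncased")
for cand in unmasker("The capital of France is [MASK]."):
    print(f"{cand['token_str']:>10}  p={cand['score']:.3f}")
```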
- GPT-2 (Radford 2019)
    - A huge Transformer decoder
    - Trained language-model-style on enormous data
    - A generation model that, like N-gram and RNN language models, predicts words one after another (see the sketch after this list)
    - Concern that it could be abused for fake news and spam mail was serious enough that release was temporarily withheld (it was later re-released)
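Same library, same caveat: a sketch of GPT-2's sequential generation via the text-generation pipeline, with an arbitrary prompt.

```python
from transformers import pipeline

# GPT-2 extends the prompt one predicted token at a time.
generator = pipeline("text-generation", model="gpt2")
out = generator("Attention is", max_length=20, num_return_sequences=1)
print(out[0]["generated_text"])
```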
- UniLM (Dong 2019)
    - A pretrained model using a Transformer prefix LM
    - Trains with several kinds of language-model objectives
    - On par with BERT on reading-comprehension tasks; SOTA on generation tasks
- T5 (Raffel 2019)
    - A huge pretrained model with an encoder-decoder structure (like the original Transformer)
    - Pretrains by casting every task as text-to-text conversion
- GPT-3 (Brown 2020)
    - An ultra-large model with 175 billion parameters
        - BERT: 340 million
        - T5: 11 billion
    - The architecture follows GPT-2
    - Scaling up the parameter count makes transfer learning perform well even with little data
- How the trend has moved
- Predictions for what comes next
    - Knowledge keeps accumulating about how effective each Transformer ingredient is, self-attention included
    - Transformer variants are appearing along two lines: better accuracy and lighter weight
    - GPT-3's direction is extremely important for practical use
- The presenter's own research
    - Memory Attention
        - Uses Attention within Seq2Seq to select how the response to an utterance is generated
- Lessons learned from the research
    - Attention is, in essence, a similarity computation, and it can be used for information extraction
    - The computational load is small, and the relative magnitudes of the weights are interpretable
    - Adding a new information flow to a Transformer is difficult; with an LSTM-based Seq2Seq it is comparatively easy
    - Because it is simple, implementation and architecture exploration are quick, so it has value as a first implementation
    - Automatic evaluation of text generation is hard; for dialogue-style generation models it is extremely difficult
    - BLEU, METEOR, and the like exist but are not suitable
The Front Lines of NLP Solution Development
(ISID: 深谷勇次さん, 小川雄太郎さん, ファイサルさん)
A walkthrough of the architecture and the language model behind the new product ISID released in May.
- Frontend
    - A static site deployed to Azure Blob Storage
    - Built with the Vue.js / Nuxt.js frameworks; talks to the backend with axios
- App server
- ML service
- Language model
- Built an ISID-original ALBERT
    - Embedding it in business systems raised GPU resource cost and performance requirements, so a fast, small model was chosen
- ALBERT
    - Factorized embedding matrix (see the parameter arithmetic below)
    - Layer parameter sharing
        - One set of parameters is reused across the 12 layers
    - A new auxiliary pretraining task, sentence-order prediction, replacing NSP
    - Uses the LAMB optimizer
    - Trains with large batches
    - n-gram masking
    - SentencePiece support
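Back-of-the-envelope arithmetic for the factorized embedding, with illustrative sizes (vocab 30k, hidden 768, embedding 128) rather than ISID's actual configuration:

```python
# ALBERT factorizes the V x H embedding into V x E plus E x H (with E << H).
V, H, E = 30_000, 768, 128
bert_style = V * H                 # 23,040,000 parameters
albert_style = V * E + E * H       #  3,938,304 parameters
print(f"{bert_style:,} -> {albert_style:,} ({albert_style / bert_style:.1%})")
```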
- Only one Japanese ALBERT exists so far
    - https://github.com/alinear-corp/albert-japanese
    - Its model length is 512 tokens, which is too short for real business documents
    - It uses SentencePiece
- The ISID-original ALBERT
    - Model length of 1024 tokens
    - Uses Sudachi; NICT's experimental results showed that running morphological analysis beforehand achieves better accuracy
    - Uses Whole Word Masking (WWM)
    - Corpus: Japanese Wikipedia
    - Tokenizer: Sudachi mode C + WordPiece
    - Fine-tuned on the Livedoor News corpus
- Future work
    - Extending the sequence length to 1024 affects inference time
    - Looking into knowledge-distillation approaches such as DistilBERT, and into sparse attention
Azure ML: The Latest in Natural Language Processing
(Microsoft: 女部田啓太さん)
- Classical Text Explainer
    - A classical machine learning pipeline
    - sklearn linear models: coef_
    - Tree-based ensemble models: feature_importances_
    - Default: 1-gram BoW + sklearn CountVectorizer + logistic regression (see the sketch after this list)
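A sketch of that default pipeline in scikit-learn with toy data; reading coef_ off the linear model gives the kind of per-word importance signal such an explainer builds on.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Unigram bag-of-words + logistic regression on toy data (illustrative only).
texts = ["great product", "terrible support", "great support", "terrible product"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(CountVectorizer(ngram_range=(1, 1)), LogisticRegression())
pipe.fit(texts, labels)

vocab = pipe[0].get_feature_names_out()     # the 1-gram vocabulary
weights = pipe[1].coef_[0]                  # per-word weights from the linear model
print(dict(zip(vocab, weights.round(2))))
```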
- Unified Information Explainer (Guan 2019)
    - SOTA research from Microsoft
    - A post-hoc approach based on mutual information
    - Explains a DNN's hidden layers
    - Currently supports BERT only
- Introspective Rationale Explainer (Yu 2019)
    - Microsoft research presented at EMNLP
    - The kind that is built into how the model is trained
    - Uses an introspective generator as a preprocessing step
    - Splits the input text into rationales and anti-rationales
    - Trains so that accuracy is maximized using the rationales alone
    - The model only ever sees the rationales generated from the input text