BERT ã GPT ã®ç»å ´ã«ãããããã¹ããæ±ãã¢ãã«ã¯å¤§ããçºå±ãã¾ããããå¦å®ã¨ãããããããµããæä½ãæ±ãã®ãä¾ç¶é£ããã§ãã
æ¬ç¨¿ã§ã¯ããã®çç±ã¨ãé¨åçãªè§£æ±ºçãç´¹ä»ãã¾ãã
ç®æ¬¡
- ç®æ¬¡
- å¦å®æãç解ã§ããªãAIãã¡
- å¦å®æãç解ã§ããã«å°ããã¨
- ãªãå¦å®æããã¾ãæ±ããªãã®ã
- 対å¦æ³
- ãããã«
å¦å®æãç解ã§ããªãAIãã¡
BERT (tohoku-nlp/bert-base-japanese-v3) ã§
A =ãç§ã¯ã寿å¸ã好ãã§ããã
B =ãç§ã®å¥½ããªé£ã¹ç©ã¯ã寿å¸ã§ããã
ã®ããã¹ãåãè¾¼ã¿ã®ã³ãµã¤ã³é¡ä¼¼åº¦ãæ±ãã¦ã¿ã¾ããããA 㨠B ã¯åããããªãã¨ãè¨ã£ã¦ãããäºæ³ãããããã«ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.9695 ã¨é«ãã§ãã
ã§ã¯ã
A =ãç§ã¯ã寿å¸ã好ãã§ããã
C =ãç§ã¯ã寿å¸ã好ãã§ã¯ãªãã§ããã
ã¨ããæ£å対ã®ããã¹ãã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ã©ãã§ããããããªãã¨ãã㯠0.9762 ã¨ãå
ã»ã©ãããããã«é«ãå¤ã¨ãªãã¾ããæ£å対ã®ãã¨ãè¨ã£ã¦ããã®ã§ãç´è¦³çã«ã¯é¡ä¼¼ã¯ãã¦ãã¾ããããæ¤ç´¢ãåé¡ãªã©ã®å¤ãã®ã¢ããªã±ã¼ã·ã§ã³ã§ããããã¯ä¼¼ã¦ããªãã¨å¤å®ãã¦ã»ããã¯ãã§ããBERT ã¯ãã®ãããªé¡ä¼¼åº¦å¤å®ã«é©ãã¦ãã¾ããã
ChatGPT ã¯ã©ãã§ããããã
ãæ¥æ¬ã®é¦é½ã¯ããã§ã¯ãªããã ã®ããã«å½ã¦ã¯ã¾ãåèªãåºåããã¦ã¿ã¾ãããæ¥æ¬ã®é¦é½ã¯äº¬é½ã§ã¯ãªããããæ¥æ¬ã®é¦é½ã¯å²¡å±±ã§ã¯ãªããããªã©ãä½ã§ãããã§ãã
ããããChatGPT ã¯ãæ¥æ¬ã®é¦é½ã¯æ±äº¬ã§ã¯ãªããã¨ã¾ã£ããééã£ããã¨ãè¨ã£ã¦ãã¾ããå¦å®ãç¡è¦ãã¦è¯å®æã®ãæ¥æ¬ã®é¦é½ã¯ããã§ããããã¨åä¸è¦ãã¦ãã¾ã£ã¦ãããããªæ¯ãèãã§ããï¼æ³¨ï¼ä½åº¦ã試ãã¨ã京é½ã¨åºåãããã¨ãããã¾ããã¾ããChain-of-Thought ã許ãã°ãæ£è§£çã¯ä¸ããã¾ããããããä¾ç¶ã¨ãã¦ãè¯å®æã¨æ¯ã¹ãã¨å¦å®æãæ£ããæ±ããã¨ã«ã¯ããã«è¦å´ãã¾ããï¼
ææ ®æ·±ãæ£ç¢ºã¨ããã¦ãã ChatGPT o1 ããæ°ç§èããã®ã¡ãã¯ãééãã¾ããã
è±èªã§éãè¨ãæ¹ããã¦ããã¯ãééãã¾ãã
å¦å®æãç解ã§ããã«å°ããã¨
ææ¸æ¤ç´¢ããã£ãããããã§å¦å®æãå«ãã¦ã¼ã¶ã¼ã®çåã«å¿ããããªããã¨ãããããã¾ãã
ãã¢ã©ã¼ããåºã¦ããªããã¨è¨ã£ã¦ããã®ã«ãã¢ã©ã¼ããåºãå ´åã®å¯¾å¦æ³ãåºãã¦ãã¾ã£ããããã¡ã¼ã«ãéã£ã¦ã»ãããªããã¨è¨ã£ã¦ããã®ã«ãã¡ã¼ã«ãåãåãããå ´åã®æ¡å ãåºãã¦ãã¾ãããªã©ã§ãã
ã»ãã«ããã¢ã³ã±ã¼ãçµæãåæããããã«ãåãæè¦ã®äººãã¯ã©ã¹ã¿ãªã³ã°ãããã¨ãã¦ããããã®ãµã¼ãã¹ã好ãã§ããã¨è¨ã£ã¦ãã人ã¨ãããã®ãµã¼ãã¹ã¯å¥½ãã§ã¯ãªãã§ããã¨è¨ã£ã¦ãã人ãåãæè¦ã§ããã¨ã¾ã¨ãã¦ãã¾ãããããã¾ããã
ãªãå¦å®æããã¾ãæ±ããªãã®ã
åèªåãè¾¼ã¿ï¼ãã¼ã¯ã³åãè¾¼ã¿ï¼ãã¼ã¹ã®è¨èªã¢ãã«ã¯æ§é ä¸ãå¦å®æãæ±ãã®ãè¦æã§ãã
è¨èªã¢ãã«ã¯ã¢ãã³ã·ã§ã³ã MLP ãªã©ã§ããã¾ã§ã®æèãä¸æ¬ã®åãè¾¼ã¿ãã¯ãã« ã«ã¾ã¨ãããããåºååèªåãè¾¼ã¿å±¤ã«å ¥åããsoftmax é¢æ° ã§åèªãäºæ¸¬ãã¾ãã
ãæ¥æ¬ã®é¦é½ã¯ããã§ã¯ãªãããã®ããã®æ£è§£ã¯ãæ±äº¬ä»¥å¤ã®ãããããã®ã§ãããªã®ã§ããã® softmax ã®çµæã¯æ±äº¬ã ã 0 ãã¨ãã京é½ã»å²¡å±±ã»åå¤å±ã»æå¹ãªã©ã«å¤ãæã¤ã®ãæ£è§£ã§ãï¼ããããããããªã©ã«ãå¤ãæã¤ã¹ãããããã¾ããï¼ããããããããå®ç¾ããããã«ã¯ãæèãã¯ã㫠㯠ã ã ã¨ã¯è¿ãã ã¨ã¯é ããªããã°ãããªãããã§ããã 㨠ã¯è¿ãã«ããããã ãã¯ããã¨ãããããããã®ã«å¤ãç«ã¦ã¦ã ã«ã ãå¤ãä¸åç«ã¦ãªãã¨ãããããªãã¨ã¯ä¸å¯è½ã§ããã©ãã ãå段ã®ã¢ãã³ã·ã§ã³ã MLP ãªã©ã®ã¨ã³ã³ã¼ãã¼ãå¼·åã§ãã£ã¦ãããããããã®ãã㪠ãåå¨ããªãã®ã§é©åãªåã込㿠ãè¦ã¤ããã¯ããããã¾ããã
ã§ã¯è¨èªã¢ãã«ã¯ã©ããããã¨ããã¨ããæ¥æ¬ãã¨ãé¦é½ããªã©ã®åèªã®åºç¾ãææããã«ãã¦ããããã¨å ±èµ·ãããããæ±äº¬ãã¨åºåãã¦ãã¾ãã¾ããWikipedia ãªã©ã®ã³ã¼ãã¹ã§ã¯ããæ¥æ¬ããé¦é½ãã¨ç´åã§è¨åããã£ãå ´åã«ã¯æ¬¡ã®åèªã¯ãæ±äº¬ãã§ãã確çã極ãã¦é«ãããã®ãããªåç´ãªæ¨è«ã§ããªãæ£è§£ã§ãã¦ãã¾ãã®ã§ãããããæ¨è«æ¹æ³ãè¨ç·´ã§èº«ã«ä»ãããã¹ãæã«è¨ç·´ã§è¦ãªãã£ãå°ãæå°æªãªå¦å®æãæ¥ãæã«ãåã対å¿ããã¦ãã¾ãã¨ãã訳ã§ãã
ãªããã¾ã«æåããã®ã
åè¿°ã®ããã«ãä½åã試ãã¨äº¬é½ã¨åºåã§ãããã¨ãããã¾ããããã¯ã京é½ããæ±äº¬ã«é·é½ããé¢ä¿ã§ããæ¥æ¬ããé¦é½ãã¨ã京é½ããããç¨åº¦ã®å ±èµ·ãããã®ãä¸å ã§ãããããããã¨é·é½ã«é¢ããè¨è¿°ã§ãæ¥æ¬ã®é¦é½ã¯äº¬é½ã§ã¯ãªããªã£ãããªã©ã®ããã«æ示çã«å¦å®æã®å½¢ã§ã®è¨ç·´æããããããããã¾ããããæ¥æ¬ã®é¦é½ã¯ããã§ã¯ãªãããã®ããã¨ãã¦ã¯äº¬é½ãããç¨åº¦ã¯çããããããããã京é½ã¨çããã ããªãã°ã ã ã®æ¹åã«éããªã大ããããã°ãä»ã®åèªï¼æ±äº¬ãå«ãï¼ã®ç¢ºçã 0 ã«ãã¦ã京é½ã ããåºåãããã¨ãæ§é ä¸å¯è½ã§ãããã ããããã¯ããã¾ã§ã·ã§ã¼ãã«ããçãªå¯¾å¦ã§ãã£ã¦ãæ ¹æ¬ããå¦å®æã®åé¡ã«å¯¾å¦ã§ãã¦ãã訳ã§ã¯ãªããã¨ã«æ³¨æãã¦ãã ãããããã¯ãæ¥æ¬ã®é¦é½ã§ãªããã®ã¯äº¬é½ã§ãããã¨ããæå³ãè¯å®æçã«ãã®åé¡ãæ¸ãæãã¦è§£ãã¦ãããããã¯äº¬é½ä»¥å¤ã®æ¥æ¬ã®é¦é½ã§ãªããã®ï¼å²¡å±±ãæå¹ããããï¼ãç¡è¦ãã¦ããã®ã§ãå¦å®ãè«ççã«æ£ãã解ãã¦ããããã§ã¯ããã¾ãããäºæ¸¬åå¸ãçã®åå¸ï¼æ±äº¬ã ã 0 ã§ãã以å¤ã®ãããããã®ã«å¤ãæã¤ï¼ã¨ä¸è´ãã¦ãã訳ã§ã¯ãªãã交差ã¨ã³ãããã¼èª¤å·®ã¯æå°åããã¾ããã
ã¾ããæ¥æ¬ã®é¦é½ã¯äº¬é½ãå¥è¯ã大éªãªã©åãããããããããªããã®ã®æ£è§£ããããã®ã§ãã®ãããªã·ã§ã¼ãã«ããã§è§£ãããã¨ãããã¾ããããã®ãããªã·ã§ã¼ãã«ãããç¡ãã£ãããããé£ããã£ãããããã¨ãããã¾ããä¾ãã°ããã¤ã³ããã·ã¢ã®é¦é½ã¯ããã§ã¯ãªãããã¨ããã¨ãæ¥æ¬ã®å ´åããã¯ããã«é£ãããã¸ã£ã«ã«ã¿ã¨èª¤çãã¦ãã¾ã確çãé«ããªãã¾ãã
対å¦æ³
BERT ãªã©ã®ã¢ãã«ãå¦å®æãè¦æã¨ãããã¨ã¯ä»¥åããææããã¦ãã¾ãããWhat BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models [Ettinger TACL 2020] ãªã©ãæåã§ããã¾ããAn Analysis of Natural Language Inference Benchmarks through the Lens of Negation [Hossain+ EMNLP 2020] ã Understanding by Understanding Not: Modeling Negation in Language Models [Hosseini+ NAACL 2021] ã®ããã«ã対å¦ããæ¹æ³ãããã¤ãææ¡ããã¦ãã¾ãã
æ ¹æ¬çã«ã¯ãä¸è¿°ã®ããã«åºå層ã«ããã¦åºå®ããåèªåãè¾¼ã¿ãç¨ãã¦ãããã¨ãåé¡ãªã®ã§ããããã³ã³ããã¹ãã«ä¾åããããã«æ¹å¤ãããã¨ãå¿ è¦ã§ããããã®æ¹æ³ã¯ã³ã¹ãã大ãããããããããå¦å®æãæ±ããããã«ããããã ãã«æ¡ç¨ããããã¨ã¯ã»ã¨ãã©ãªãã対ççæ³çã«è§£æ±ºããæ¹æ³ã主æµã§ãã
ãã¡ã¤ã³ãã¥ã¼ãã³ã°
ãã¡ã¤ã³ãã¥ã¼ãã³ã°ã§ããå ´åã«ã¯ãã¡ã¤ã³ãã¥ã¼ãã³ã°ãæå¹ãªãã¨ãå¤ãã§ãã
ææ åæãååã®ã¬ãã¥ã¼å¤äºæ¸¬ãªã©ã§ã¯å¦å®æã®å¦çã極ãã¦éè¦ã§ãããããã®ã¿ã¹ã¯ãæ師ãªãã§è§£ãã®ã¯é£ããã§ããBERT ã使ã£ã¦ããç¨åº¦è§£ããã®ã¯ãã¡ã¤ã³ãã¥ã¼ãã³ã°ã®ãããã§ãã
ãããã®ã¿ã¹ã¯ã§ã¯ãå¦å®ãå³å¯ã«æ±ãå¿ è¦ã¯ããã¾ãããã好ãã§ãªããã¨ããè¨è¿°ããã£ãã¨ããã好ãã§ãªãããªãä½ãè¤éãªææ ãæã£ã¦ããã®ã ããããªã©ã¨ãå¦å®ãå³å¯ã«å¦çãã¦ä»ã®å¯è½æ§ãå¾åã«èããå¿ è¦ã¯ããã¾ããããããã®ã¿ã¹ã¯ã§ã¯ã好ãã§ãªãããªãã°ãå«ãã ãï¼å°ãªãã¨ãä¸ç«ããã¯å«ãå¯ãã ï¼ã¨æ±ºãã¤ãã¦ãã¾ã£ã¦ãå·®æ¯ããªããããæå³ã·ã§ã¼ãã«ããçã«ããããã®ã¿ã¹ã¯ã解ããã¨ãã§ãã¾ããã·ã§ã¼ãã«ããã¨ãã£ã¦ããããã®ã¿ã¹ã¯ã«ããã¦ã¯ã·ã§ã¼ãã«ãããããã¨ãæ£è§£ã«ç¹ããæ£ç¾©ãªã®ã§åé¡ããã¾ããããã¡ã¤ã³ãã¥ã¼ãã³ã°ã«ãã£ã¦ãã·ã§ã¼ãã«ãããã¦ãåé¡ãªããã¨ãã·ã§ã¼ãã«ããããæ¹æ³ãå¦ç¿ãã¦ãæ§è½ã大ããä¸ãããã¨ãã§ãã¾ãã
ãã ããèªç¶è¨èªæ¨è« (natural language inference) ã®ããã«ãè«çãå³å¯ã«æ±ãå¿ è¦ãããã¿ã¹ã¯ã§ã¯ããã¡ã¤ã³ãã¥ã¼ãã³ã°ãããç¨åº¦ã¾ã§ã¯æå¹ãªãã®ã®ãå³å¯ãªå¦å®è«çãæ±ãå¿ è¦ãããå ´åã«ã¯ãã¡ã¤ã³ãã¥ã¼ãã³ã°ã§ããã®åé¡ãå®å ¨ã«ã¯è§£æ±ºã§ããªããã¨ã«æ³¨æãã¦ãã ããã
ããã³ããã®å·¥å¤«
ChatGPT ãªã©ã®ãã³ã¼ãã¼åã®è¨èªã¢ãã«ãç¨ããå ´åã«ã¯ãäºåã«ãã©ã³ãã³ã°ããããªã©ãChain-of-Thought çãªããã³ããã®å·¥å¤«ãæå¹ã§ããåé ã®ä¾ã§ã¯å¦å®æã®é£ãããéç«ãããããã«ããã®åèªã ãåºåãã¦ãã¨ããç¸ããå ¥ãã¦ãã¾ããããä½åãªåºåãæãã§ããåºåãããã¨ã許ãã°ãæ£è§£ã®ç¢ºçã¯ããªãä¸ããã¾ãã
ä½åãªåºåã許ãã°ãçè«ä¸ã¯åè¿°ã® softmax ã®åé¡ã¯åé¿ã§ãã¾ããåè¿°ã®è°è«ã§ã¯ä¸çºã§åºåãããã¨ãåæã¨ãã¦ãã¾ããããããåºåããã¿ã¤ãã³ã°ãä»»æã§ããã°ãåã¿ã¤ãã³ã°ã§ã®åºåã京é½ã ãã岡山ã ããã®ãããªæ¥µç«¯ãªåå¸ã«ãªã£ã¦ãã¦ããããããéãåãããåå¸ã§ã¯æ±äº¬ä»¥å¤ã®ãããããã®ã«å¤ãæã¤ããã«ãããã¨ãçè«ä¸ã¯å¯è½ã§ããããã¯ã¤ã¾ããå¦å®æãç¡çããè¯å®æã«ç´ããã¨ãç¡éåç¹°ãè¿ãã°å®è³ªå¦å®æã¨ç価ã«ãªããã¨ãããã¨ã§ãã
ãã ããçè«ä¸ã¯å¯è½ã§ãæ§æã¯è¤éãªã®ã§ããã精度ããå¦ç¿ãããã¨ã¯é£ããã§ãããã¨ãæ£è§£ãåºããã¨ãã¦ããåå¸ã¨ãã¦ã¯æ£ç¢ºã«ãªãã¾ãããããã¯ãå¦å®ãç´æ¥æ±ã£ã¦ããã®ã§ã¯ãªãããæ¥æ¬ã®é¦é½ã§ãªããã®ã¯äº¬é½ã§ããããæ¥æ¬ã®é¦é½ã§ãªããã®ã¯å¤§éªã§ããããæ¥æ¬ã®é¦é½ã§ãªããã®ã¯ç«ã§ãããã®ããã«ãå¦å®æãç¡çããè¯å®æã«ç´ããã¨ãç¹°ãè¿ãã¦ããã ãã ããã§ãããããå®éã«ã¯ç¡éã®éãåããã§ã¯ãªãæéã®éãåããã«ãªãã®ã§ãå¦å®ã¨ç価ã«ã¯ãªãå¾ããåå¸ã¨ãã¦ç²¾åº¦ãæªããªãã¾ãããã®ãããä»ã«ãç¡æ°ã«é¸æè¢ãããã¯ããªã®ã«ãä¹±æ°ãæ¯ãç´ãã¦ãåãçãã°ããåºåããã¾ãã
ããã¹ãåãè¾¼ã¿ã欲ããå ´åã«ãåé¡ãèµ·ããã¾ããå ¸åçãªææ¸æ¤ç´¢ã«å ãã¦ããã£ãããããã«ããã RAG ã® retrieval é¨åãªã©ãããã¹ãåãè¾¼ã¿ãå¿ è¦ãªå ´é¢ã¯å¤ãã§ããä¸ã®ä¸ã®å¤§åã®åãè¾¼ã¿ææ³ã¯ããã®ããã¹ããå ¥åããæç¹ã®ã¢ãã«ç¶æ ãããã¹ãåãè¾¼ã¿ã¨ãã¦ç¨ãã¾ããããããããã¯åä¸ã®ç¶æ ããä¸çºã§çããåºãã¨ãããã¨ã§ãããããã¯ããã®åèªã ãåºåãã¦ãã¨ããç¸ããããç¶æ ã§è¨èªã¢ãã«ã«è§£ãããç¶æ³ã¨åãã§ããããã§ã¯ãã¯ãä¸è¿°ã® softmax ã¨åæ§ã®åé¡ãçãããããããã³ããã®å·¥å¤«ã§åé¡ã解決ãããã¨ãé£ããã§ãã
å¦å®æãæèããè¨ç·´
å¦å®æãæ±ãããããªè¨ç·´ãæ示çã«æ½ãææ³ãææ¡ããã¦ãã¾ããBERTNOT [Hosseini+ NAACL 2021] ã¨ããææ³ã¯ã"The capital of Japan is Tokyo" ã¨ããããã¹ããè¨ç·´ã³ã¼ãã¹ã«ããã°ããããã "The capital of Japan is not ___" ã¨ããå¦å®æã®äºæ¸¬åé¡ã人工çã«ä½ãã___ ã«å¯¾ããã¢ãã«ã®åºåã¨ã㦠Tokyoï¼ãã¨ãã¨ã®è¯å®æã§ããã«ãã£ãåèªï¼ã®ç¢ºçãä½ããªãããã«è¨ç·´ãã¾ãã
ã¢ã¼ããã¯ãã£ã¯åä¸ãªã®ã§ãsoftmax ã®æ§é ä¸ã®åé¡ã¯ä¾ç¶æ®ãã¾ãããé常ã®è¨ç·´ã¨æ¯ã¹ãã¨ããè¯ãè¿ä¼¼ãå¯è½ã§ããä¾ãã°ãå¦å®ã®å¾ã®åèªã®äºæ¸¬ã§ã¯æèãã¯ãã« ãã¼ãã«ãã¦ãã¾ãã°ãsoftmax ã®åºåã¯ä¸æ§åå¸ã«ãªãã¾ããåè£ã®åèªã 10000 åããã°ããæ±äº¬ããé¸ã°ãã確ç㯠1/10000 ã ãã§ããæ±äº¬ã以å¤ã®æ£è§£ãåºåããã確ç㯠9999/10000 ã§ãããã»ã¨ãã©æ£è§£ã¨ãªãã¾ããæ ¹æ¬çã«å¦å®ã解ãã¦ããããã§ã¯ãªãããããã·ã§ã¼ãã«ããã¨ããã°ã·ã§ã¼ãã«ããã§ãããåå¸ã¨ãã¦ã¯ã京é½ãã決ãæã¡ããããã¯çã®å¦å®ã«è¿«ã£ãããè¯ãè¿ä¼¼ã¨ããã¾ãã
å®é¨ã§ã¯è³ªåå¿çãèªç¶è¨èªæ¨è«ã«ããã¦ãBERTNOT ã¯é常㮠BERT ã¨æ¯ã¹ã¦å¦å®æãå«ãå ´åã«é«ãæ§è½ãéæãã¦ãã¾ãã
ææ¸æ°ãå¢ãã
ææ¸æ¤ç´¢ã§ã¯ææ¸æ°ãå¢ãããã¨ãæå¹ã§ãã
åé ã®ä¾
A =ãç§ã¯ã寿å¸ã好ãã§ããã
B =ãç§ã®å¥½ããªé£ã¹ç©ã¯ã寿å¸ã§ããã
C =ãç§ã¯ã寿å¸ã好ãã§ã¯ãªãã§ããã
㧠A ãæ¤ç´¢ã¯ã¨ãªã ã¨ãã¦ãA ã¨ä¼¼ãããã¹ãããã¼ã¿ãã¼ã¹ããæ¤ç´¢ãããã¨ãã¾ããããB 㨠C ããåè£ãç¡ãã¨ãä¼¼ãæè¦ã® B ã§ã¯ãªããæ£å対ã®æè¦ã§ãã C ãå¾ããã¦ãã¾ãã¾ãã
ãã ããBERT ã A 㨠C ãåä¸è¦ãã¦ããããã§ã¯ãªããã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.9762 (< 1)ã§ããã³ãµã¤ã³é¡ä¼¼åº¦ã 0.9763 ãã 1.0000 ã®éã®æ£è§£ããã¹ãããã¼ã¿ãã¼ã¹ä¸ã«ããã°æ£è§£ã§ãã¾ãã
A 㨠A ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 1 ãªã®ã§ãæ¤ç´¢ã¯ã¨ãªã¨å
¨ãåãããã¹ãããã¼ã¿ãã¼ã¹ä¸ã«ããã°ãããæ£ããåã£ã¦ãããã¾ããã¾ãã
A =ãç§ã¯ã寿å¸ã好ãã§ããã
D =ãç§ã¯ã寿å¸ã大好ãã§ããã
ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.9974 ãªã®ã§ãD ãæ£ããåã£ã¦ãããã¾ãã
ãã¼ã¿ãã¼ã¹ããã£ããè©°ã¾ã£ã¦ãã¦ãã³ãµã¤ã³é¡ä¼¼åº¦ã 0.9763 ãã 1.0000 ã®ããã¹ãã大éã«ããå ´åã«ã¯ãå¦å®æã誤ã£ã¦åã£ã¦ãã¾ã確çãä¸ãããã¨ãã§ãã¾ãã
å½ç¶ã¨ããã°å½ç¶ã®è§£æ±ºçã§ã¯ããã¾ãããã·ã³ãã«ãªã®ã§æ±ç¨æ§ã¯é«ã解決çã§ãã
ãã¡ã¤ã³ãã¥ã¼ãã³ã°ã§åãè¾¼ã¿ã¢ãã«ãé¡ä¼¼åº¦é¢æ°ãæ¹åããã®ãåºæ¬çãªææ³ã§ã第ä¸ã«èããã¹ãã§ããããããåãè¾¼ã¿ã¢ãã«ã API ã§å©ç¨ãã¦ããå ´åããã¼ã¿ã足ããªãå ´åãã¾ããã¡ã¤ã³ãã¥ã¼ãã³ã°ãå¯è½ã§ãå¦å®æã®ããã«è¯ãé¡ä¼¼åº¦ãå®ç¾©ããã®ãä¸çç¸ã§ã¯ãããªãå ´åãããã¾ãããã®ã¨ãã«ã¯çæ³çãªé¡ä¼¼åº¦é¢æ°ã追ãæ±ããããããææ¸ã®å´ã工夫ããã¢ããã¼ãã«åã£ãæ¹ããã¾ããããã¨ã¯è¦ãã¦ããã¨è¯ããã¨ãããã¨æãã¾ãã
ã¯ã¨ãªã¨ãã¼ãæ¡å¼µãã
ææ¸æ¤ç´¢ã§ã¯ã¯ã¨ãªã¨ãã¼ãæ¡å¼µã»æ°´å¢ããããã¨ãæå¹ã§ããããã¯å¦å®æã«ã¤ãã¦ãåæ§ã§ãã
ä¾ãã°ãChatGPT ã«ããã¹ãã®ç¶ããçæããã¦ã¿ã¾ãããã
以ä¸ã®ãããªæ¡å¼µããã¹ããå¾ããã¾ããã
A' = ãç§ã¯ã寿å¸ã好ãã§ãç¹ã«ãã°ãããµã¼ã¢ã³ã®æ¡ãã¯å¤ãã¾ãããæ°é®®ãªéã®æ¨ã¿ã¨é
¢é£¯ã®çµ¶å¦ãªãã©ã³ã¹ããã¾ããªããé£ã¹ããã³ã«å¹¸ããªæ°åã«ãªãã¾ããæã
ãå人ã家æã¨å転寿å¸ã«è¡ããã¨ãããã¾ãããå°ãç¹å¥ãªæ¥ã«ã¯ã«ã¦ã³ã¿ã¼å¸ã®ãã寿å¸å±ã§è·äººãããæ¡ãæ¬æ ¼çãªã寿å¸ã楽ãã¿ã¾ããå£ç¯ã«ãã£ã¦å¤ãããã¿ãå³ããã®ã楽ãã¿ã®ä¸ã¤ã§ãæ¥ã¯é¯ãç§ã¯ãµã³ãã¨ãã£ãæ¬ã®å³è¦ãå ªè½ããã®ã好ãã§ããã寿å¸ã¯ãã ã®é£äºã§ã¯ãªããç¹å¥ãªæéãæ¼åºãã¦ãããæçã ã¨æãã¦ãã¾ããã
B' = ãç§ã®å¥½ããªé£ã¹ç©ã¯ã寿å¸ã§ããã寿å¸ã¯ã·ã³ãã«ãªè¦ãç®ãªããã使ãããé£æãè·äººã®æè¡ã«ãã£ã¦ç¡éã®ããªã¨ã¼ã·ã§ã³ãããã¾ããç¹ã«ããã°ãããµã¼ã¢ã³ã®æ¡ãã¯è¦ãç®ãç¾ãããå£ã«å
¥ããã¨ã¨ããããããªé£æããã¾ãã¾ãããå転寿å¸ã«è¡ãã®ã楽ããã§ãããæã«ã¯å°ã奮çºãã¦ã«ã¦ã³ã¿ã¼ã§é£ã¹ãã寿å¸ãæ ¼å¥ã§ããè·äººãããç®ã®åã§æ¡ã£ã¦ãããã寿å¸ã¯ããã¿ã®é®®åº¦ãã·ã£ãªã®æ¸©ãããæãããã¦ãã¾ãã§ç¹å¥ãªã馳走ã®ããã§ããæè¿ã¯ãæµ·å¤ã§ãã寿å¸ã人æ°ã§ãåµä½å¯¿å¸ããã¼ã«å¯¿å¸ã話é¡ã«ãªã£ã¦ãã¾ããããã£ã±ãã·ã³ãã«ãªæ¥æ¬ã®ã寿å¸ãä¸çªã ã¨æãã¾ããã
C' = ãç§ã¯ã寿å¸ã好ãã§ã¯ãªãã§ãããããã§ãå人ã家æã¨ä¸ç·ã«å転寿å¸ã寿å¸å±ã«è¡ããã¨ã¯ããããã¾ãããªããªããé£ã¹ç©ã ããå¤é£ã®æ¥½ãã¿ã§ã¯ãªãããã§ããã¿ããªãç¬é¡ã§ãããã¹ãããªããé£äºããã¦ããæéãé°å²æ°ã好ãã§ãããã ãã§ãååã«æ¥½ããã®ã§ããã寿å¸ã®ä»£ããã«ãåæããè¶ç¢è¸ãããã©ããªã©ã®ãµã¤ãã¡ãã¥ã¼ãé ¼ããã¨ãå¤ãã§ããæè¿ã§ã¯å¯¿å¸å±ã«ããããã£ãé¸æè¢ãå¢ããã®ã§ãç§ã®ããã«ã寿å¸ãè¦æãªäººã§ãå®å¿ãã¦æ¥½ãããããã«ãªãã¾ãããããã§ãããã寿å¸ãè¦æãªãã¦çããããã¨è¨ããããã¨ãããã¾ãã確ãã«ã寿å¸ã¯æ¥æ¬äººã®ã½ã¦ã«ãã¼ãã®ãããªåå¨ãªã®ã§ãå«ãã¨ããã¨é©ãããã®ããããã¾ãããã§ãã好ãå«ãã¯äººããããã§ãããèªåã«åã£ãç¾å³ãããã®ãè¦ã¤ããã®ãé£ã®æ¥½ãã¿ã®ã²ã¨ã¤ã ã¨æã£ã¦ãã¾ããã
ãã®æ¡å¼µããã¹ãã®ã³ãµã¤ã³é¡ä¼¼åº¦ã BERT (tohoku-nlp/bert-base-japanese-v3) ã§è¨ç®ããã¨ãA'ï¼ç§ã¯ã寿å¸ã好ãã§ããï¼ã¨ B'ï¼ç§ã®å¥½ããªé£ã¹ç©ã¯ã寿å¸ã§ããï¼ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.9494ãA'ï¼ç§ã¯ã寿å¸ã好ãã§ããï¼ã¨ C'ï¼ç§ã¯ã寿å¸ã好ãã§ã¯ãªãã§ããï¼ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.9316 ã§ãããåãæè¦ãè¿ããã¨ãæ£ããèªèã§ãã¦ãã¾ãã
ãã®è§£æ³ããã¢ãã«ãæ±ãã®ãé£ããå¦å®æãç¡çããè¯å®æã«ç´ãã¦è§£ãã¦ããã¨ã¿ããã¨ãã§ãã¾ããä¸è¨ã® C' ã§ã¯ãã寿å¸ã好ãã§ã¯ãªãããæ¡å¼µãããã¨ã§ãåæããè¶ç¢è¸ãããã©ããªã©ã®ãµã¤ãã¡ãã¥ã¼ãé ¼ããã¨ããè¯å®æã«ç¹ãã£ã¦ãã¾ããããã«ããè¯å®æã©ããã®æ¯è¼ããããã¨ã§æ¸ãã§ãã¾ããæ£é¢ããå¦å®æã対å¦ããã®ã§ã¯ãªããåãéããããã¨ã§å¦å®æãç´æ¥æ±ãã®ãé¿ããåé¡ã®çºçãæå¶ãã¦ããã¨è¨ãã¾ãã
ããããããã¾ã§æ¡å¼µãã¦ãã¾ãã°ãBERT ã使ãã¾ã§ããªããåç´ãª bag-of-words ãã¯ãã«ã§ããA' 㨠B' ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.8424ãA' 㨠C' ã®ã³ãµã¤ã³é¡ä¼¼åº¦ã¯ 0.8223 ã¨ãªããåãæè¦ãæ£ããåå¾ãããã¨ãã§ãã¦ãã¾ãã¾ãã
ããã¹ããããç¨åº¦é·ããªãã¨ãåãæè¦ã®ããã¹ãã§ã¯åèªè¢«ããå¢ãã¦ãã¾ããããã¹ããé·ããªãã«ã¤ãã¦ãã©ãã¬ã¼ãºãããããã¦ãã©ãã©ãã¨åèªè¢«ããå¢ãã¦ããã¾ããçãããã¹ãã ã¨åãæè¦ã§ããã¾ãã¾éã表ç¾ããã¦ãã¾ã£ãããç°ãªãæè¦ã§ããã¾ãã¾è¡¨ç¾ã被ã£ãããããã¨ãããã¾ãããããã¹ããé·ããªãã«ã¤ãã¦ãããã£ãå¶ç¶ã«ããã©ã³ãã³ã°ã®ãã¬ãç¡ããªã£ã¦ããã¾ãããã®ãããããç¨åº¦ããã¹ããé·ããªãã¨ãåç´ãª bag-of-words ãã¯ãã«ã§ãååã«ãªã£ã¦ããã¾ãã
ããããçæ³çãªé¡ä¼¼åº¦é¢æ°ã追ãæ±ããããããææ¸ã®å´ã工夫ããã¢ããã¼ãã«åã£ãæ¹ããã¾ããããã®ä¾ã®ä¸ã¤ã§ãã
RAG ã§ã¯ããã¨åæ§ã®èããããæèä»ãæ¤ç´¢ (Contextual Retrieval) ãæå¹ã§ãã
å ¸åç㪠RAG ã®å®è£ ã§ã¯ãææ¸ã¯ããã»ã¼ã¸ããã£ã³ã¯ã¨å¼ã°ããæ°ç¾æåç¨åº¦ã®å°ããªåä½ã«åå²ããããã£ã³ã¯åä½ã§æ¤ç´¢ãè¡ãã¾ãããã®ãã£ã³ã¯ã¯çãã®ã§ãæ¤ç´¢ãé£ãããå¦å®ããã¾ãå¦çã§ããªãã£ãããã¾ãã
ãã®ã¨ãã«ãããã¹ãæ¡å¼µãæå¹ã§ããAnthropic ã®ææ¡ããæèä»ãæ¤ç´¢ã§ã¯ã以ä¸ã®ãããªããã³ããã§ãã£ã³ã¯ãæ¡å¼µãã¾ãã
<document> {{ææ¸å ¨ä½}} </document> 以ä¸ã¯ãææ¸å ¨ä½ã«é ç½®ããããã£ã³ã¯ã§ãã <chunk> {{ãã£ã³ã¯ããã¹ã}} </chunk> ãã®ãã£ã³ã¯ã®æ¤ç´¢çµæã®æ¹åãç®çã¨ãã¦ãããã¥ã¡ã³ãå ¨ä½ã«ãã®ãã£ã³ã¯ãé ç½®ããããã®ç°¡æ½ãªã³ã³ãã¯ã¹ããæ示ãã¦ãã ãããç°¡æ½ãªã³ã³ãã¯ã¹ãã®ã¿ãåçãããã以å¤ã¯åçããªãã§ãã ããã
ãããããã¨ã§ãææ¸å ¨ä½ã®æèã«æ²¿ã£ãå 容ã§ããã»ã¼ã¸ãæ¡å¼µã§ãããã£ã³ã¯ãã®ãã®ã«ä»éãã¦ããªãæ å ±ãæ¤ç´¢ã§ç¨ãããã¨ãã§ããããã«ãªãã¾ãã
æ¤ç´¢ãã¼ã¿ãã¼ã¹ã«ã¯ãªãªã¸ãã«ã®ãã£ã³ã¯ã¨æ¡å¼µããé¨åãé£çµãã¦å ¥ãã¦ãããæ¤ç´¢ãçµãã£ããã¨ã®çæ段éã§ã¯ãªãªã¸ãã«ã®é¨åã ãã使ãã¾ãã
ã¾ãããã®ããã³ããã¯ååã®
<document> {{ææ¸å ¨ä½}} </document> 以ä¸ã¯ãææ¸å ¨ä½ã«é ç½®ããããã£ã³ã¯ã§ãã <chunk>
ã®é¨åãå ¨ã¦ã®ãã£ã³ã¯ã§å ±éã§ãããªã®ã§ããã®é¨åã«ç¸å½ãããã³ã¼ãã¼ã®è¨ç®ã¯åè¨ç®ãã¦ãããæ®ãå ãã®ãã£ã³ã¯ã«ç¸å½ããé¨åã ããé½åº¦è¨ç®ããã°ãããªãã¾ããChatGPT ã Claude ã¯ãã®ãããªããã³ããã®ãã£ãã·ã¥ã«å¯¾å¿ãã¦ãããç¹ã« Claude ã¯ãã®ãã£ãã·ã¥ã«ããè²»ç¨ã®åæ¸ãé常ã«å¤§ããã®ã§ããã®æ¹æ³ã使ããã¨ã§å®ä¾¡ã«ãã£ã³ã¯ãæ¡å¼µãããã¨ãã§ãã¾ãã
Anthropic ã®å ±åã§ã¯ããã®æ¹æ³ã«ããææ¸åå¾ãã¹ã 5.0% ãã 2.9% ã«ã¾ã§æ¸å°ããã¨ããã¦ãã¾ããï¼æ³¨ï¼ãã®å ±åèªä½ã¯å¦å®æãæ±ããã®ã§ã¯ããã¾ãããä¸è¬ã«ããã®ãããªå¦çãè¡ãã¨æ¤ç´¢ã®ç²¾åº¦ãä¸ããã¨ããå ±åã§ãããã ãããã®æ¹åã¯å®ç§ã§ã¯ãªãã«ãããå¦å®æã«ãåç¨åº¦ã«ã¯æå¹ã§ããã¨èãããã¾ããï¼
ãããã«
ãã¯ããå¦å®æã®æ±ãã¯ã¾ã æ ¹æ¬ããã¯è§£æ±ºãã¦ããªãã¨ããå°è±¡ã§ãã
æ ¹æ¬çã«ã¯ãæ¬æä¸ã§ãè¿°ã¹ãããã«ãåºå層ã®åèªåãè¾¼ã¿ãã³ã³ããã¹ãã«ä¾åããããã«æ¹å¤ãããã¨ãå¿ è¦ã§ãããã ããããããããã¨è¨ã£ã¦ãã ã¡ã«è§£æ±ºããããã§ã¯ãªããè¨ç·´ã³ã¼ãã¹ä¸ã«å¦å®æã®å²åãå°ãªãããããæ¥æ¬ããé¦é½ãã¨ãæ±äº¬ãã®å ±èµ·ãå¼·ãã®ã§ããã®å ±èµ·ã®å£ãä¸åãã»ã©ã®æ師信å·ãå¾ããããçµå±ã·ã§ã¼ãã«ãããå¦ç¿ããã¦ãã¾ãå¯è½æ§ãããã¾ãã
æ¬æä¸ã§ã¯ã³ã³ããã¹ãã«ä¾åããåèªåãè¾¼ã¿ã使ãã®ã¯ã³ã¹ãã大ããã®å¦å®æãæ±ããããã«ããããã ãã«ããããæ¡ç¨ããããã¨ã¯å°ãªãã¨æ¸ãã¾ããããBig Tech æã¡åã®è¨ç®ãªã½ã¼ã¹ã¨ãã¼ã¿ãªã½ã¼ã¹ã§ãããªå£ã¯è»½ã ã¨è¶ ãã¦ããããããã¾ããããç´æ¥çã«è§£æ±ºããªãã¦ãã対ççæ³çãªææ³ã Big Tech æã¡åã®ç©éã§è£ããã¨ã§ãåå¨ãæããããªããããã®åé¡ã«ãªãããããã¾ããã
ãããã«ãã¦ããæ ¹æ¬çãªåé¡ã¯åºç¤ã¢ãã«ã®ã¢ã¼ããã¯ãã£ã¨è¨ç·´æ¹æ³ã¨ããæ·±ãã¨ããã«ããã®ã§ãã¦ã¼ã¶ã¼å´ããã¯æ ¹æ¬çã«ã¯è§£æ±ºã§ãã¾ããããªã®ã§ãã¾ãã¯ãã®ãããªåé¡ããããã¨ãèªèãããã®ä¸ã§ãã®åé¡ã«ã¶ã¡å½ãã£ãã¨ãã«ã¯ä¸ã§è¿°ã¹ã対ççæ³ã§ãããããã¦ãããã¨ãéè¦ã ã¨æãã¾ãã
å¦å®æã®æ±ãã«ã¤ãã¦ã¯ @awakia ããã¨ãã£ã¹ã«ãã·ã§ã³ãããã¡ã«èããæ·±ã¾ãã¾ãããããã«å¾¡ç¤¼ç³ãä¸ãã¾ãã
çããããã®åé¡ã«ã¤ãã¦ã®ãæè¦ããææ³ãªã©ããã° SNS çã§ã³ã¡ã³ãããã ããã°å¬ããã§ãããã²çããããã®åé¡ã«ã¤ãã¦èãã¦ã¿ã¦ãã ãããã
é£çµ¡å : @joisino_ / https://joisino.net
â¼æèçºå£²ä¸