Physics of Language Models is a concept for advancing language model research proposed by Zeyuan Allen-Zhu of FAIR (Meta). Roughly speaking, instead of accumulating naturalist-style knowledge such as "this model can do this, that model can do that," the idea is to proceed the way Kepler and Newton did for physics in the 17th century: conduct research grounded in principles, so that we become able to answer the question of why language models behave the way they do.
Physics of Language Models has two main characteristics.
The first is to train language models not on corpora collected from the web, but on tightly controlled datasets. The web is so complex and noisy that nobody can grasp the whole picture. In real physics, too, air resistance and friction breed misconceptions such as "surely a heavy ball falls faster than a feather"; to discover the underlying principle, it is best to experiment in a vacuum. Noisy web data may exert similar air-resistance-like ill effects on language models. Physics of Language Models therefore makes it a first principle to train language models from scratch on tightly controlled datasets. For example, as introduced later, one line of experiments creates biographies of 100,000 fictional people, trains a language model on nothing but those biographies, and examines what the model can say about these people.
The second is to probe the internal states of the language model and investigate at which point, and in which layer, each capability emerges. The string a language model outputs is only a fraction of what the model has "thought." Looking only at the surface of the output never tells us why the model behaved that way. Physics of Language Models therefore steps into the model's internal states and performs finer-grained analyses. For example, as introduced later, when a language model is given a multi-step arithmetic problem and makes a mistake at an intermediate step, the model itself actually notices, at that very step, that it has made a mistake. Still, it has to keep generating, so it fluently continues to the end while knowing it is probably wrong, and of course gets the final conclusion wrong. By examining not just the output-level outcome of correct versus incorrect but the internals as well, the project aims for deeper insight.
This article introduces the basic ideas of Physics of Language Models and walks in detail through the series of six papers that make it up. Reading them should deepen your understanding of the principles behind language model behavior, and also yield plenty of practical knowledge for actually training language models.
Incidentally, the proposer Zeyuan Allen-Zhu is famous as one of the authors of (that) LoRA paper. He is also a superstar researcher: he made his name in competitive programming, winning two gold medals at the International Olympiad in Informatics and placing 2nd at the ACM-ICPC World Finals, and has had papers accepted at FOCS and STOC while also getting five first-author papers accepted at ICML in a single year.

Table of Contents
- Table of Contents
- Physics of Language Models
- Physics of Language Models: Part 1, Learning Hierarchical Language Structures (arXiv 2023)
- Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process (ICLR 2025)
- Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems (ICLR 2025)
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction (ICML 2024)
- Language models need to see some questions during pretraining
- Language models become able to answer questions when data is augmented
- Without data augmentation, language models cannot extract information until all the information has appeared
- Data augmentation on only some of the people is enough for language models to memorize information well
- Physics of language models: Part 3.2, knowledge manipulation (ICLR 2025)
- Physics of language models: Part 3.3, knowledge capacity scaling laws (ICLR 2025)
- Conclusion
Physics of Language Models
A brief explanation of Physics of Language Models was given at the start. The project page displays the following statement.
Apples fall and boxes move, but universal laws like gravity and inertia are crucial for technological advancement. While GPT-5 or LLaMA-6 may offer revolutionary experiences tomorrow, we must look beyond the horizon. Our goal is to establish universal laws for LLMs that can guide us and provide practical suggestions on how we can ultimately achieve AGI.
Despite the name "Physics of Language Models," this is not about applying mechanics or electromagnetism to language models. It means something closer to "research aimed at finding universal, physics-law-like principles in language models"; a freer rendering of the concept might be a "natural philosophy of language models." Since the proposer Zeyuan Allen-Zhu himself renders the name in Chinese as 语言模型物理学 ("physics of language models"), this article follows suit and uses the term "Physics of Language Models."
At the beginning of this article I said that the first characteristic is training language models on tightly controlled datasets rather than on corpora collected from the web. Along with this, the Physics of Language Models project page criticizes the current trend of fixating on benchmarks. These days new benchmark scores are set and loudly advertised seemingly every month, every week, every day. But is this really what matters? Can it really be called progress? Benchmarks are fated to grow stale over time. The more famous a benchmark becomes, the more mentions, explanations, and worked examples of it accumulate on the web. Someone may even post the answers to the test data on some website. It is no surprise that a new model trained on all of this gets better at the benchmark. Proper experiments do filter the test data out of the training data, but versions translated into another language, or equations re-stated in words, may slip through the filter. Such information leakage can never be prevented completely, and its amount only grows with time. The web is so complex and noisy that nobody can grasp the whole picture, and nobody knows in what ways this noise affects benchmarks. Physics of Language Models takes this concern seriously and sidesteps the problem in a reliable way, by not using web corpora at all and using tightly controlled datasets instead.
For the second characteristic, probing internal states, the main techniques are linear probing and low-rank probing. Linear probing investigates which tasks can be solved by a linear model that takes the internal states as input. Low-rank probing freezes the model's main parameters, attaches low-rank components (essentially LoRA), and investigates which tasks can then be solved. If such a simple auxiliary model can solve a task, we conclude that the model was already solving that task internally. The word "mentally" appears frequently in the Physics of Language Models papers; this article likewise uses expressions such as "the model solves the task in its head" or "the model knows it in its head." For example, as mentioned at the beginning, if we take the internal state at an intermediate step while the model is solving an arithmetic problem and feed it to a linear model, we can classify with high accuracy whether the model has made a mistake or not. This means that "whether I have made a mistake" is encoded in the internal state in an easily extractable form (extractable even by a linear model); in other words, the model notices in its head whether it has slipped up. When people hear "what goes on in the model's head," many will think of chains of thought (Chain of Thought; the intermediate traces of DeepSeek-R1 or OpenAI o1). Probing, however, is different from chain of thought. A chain of thought is, so to speak, thinking out loud, and it is only a small part of the model's thinking. Probing also reveals what the model is thinking without saying it out loud. As we will see in detail below, language models are in fact thinking about many things in their heads beyond what they say out loud.
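To make the idea of a linear probe concrete, here is a minimal sketch on synthetic data: it stands in a 256-dimensional random vector for a hidden state, linearly encodes a binary property along one direction, and checks that a simple linear classifier can read the property back out. The dimensions, the encoding, and the ridge-regression probe are all illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for hidden states: 256-dim vectors in which a binary property
# ("has the model already made a mistake?") is linearly encoded along one
# random direction, plus Gaussian noise. Real probing uses actual activations.
d, n = 256, 2000
direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)                # 0 = no mistake, 1 = mistake
hidden = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction)

# Linear probe: ridge regression on the first half, thresholded at 0.
X_tr, y_tr = hidden[:1000], labels[:1000]
X_te, y_te = hidden[1000:], labels[1000:]
w = np.linalg.solve(X_tr.T @ X_tr + 1e-2 * np.eye(d), X_tr.T @ (y_tr * 2 - 1))
acc = ((X_te @ w > 0).astype(int) == y_te).mean()
print(f"probe accuracy: {acc:.2f}")
```

High held-out accuracy of such a probe is taken as evidence that the property is encoded in the state in an easily extractable, linear form.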
As of March 2025, the Physics of Language Models project has released six papers. They are organized into three parts: Part 1 is Hierarchical Language Structures, Part 2 is Grade-School Math, and Part 3 is Knowledge. Each paper can be read independently, but the concepts described above run through the whole series as a common thread.
Below, I will go through each paper.
Physics of Language Models: Part 1, Learning Hierarchical Language Structures (arXiv 2023)
The title means "Learning Hierarchical Language Structures."
This is actually the most difficult paper in Physics of Language Models, and I suspect it is where people who set out to learn this line of work give up. I will aim for as plain an explanation as possible, but the content of this paper does not affect the later parts, so if you find it difficult, feel free to skip this section and move on.
The main result of this paper in one sentence: language models can learn complex context-free grammars accurately. And in their heads, language models solve this problem using dynamic programming.
This paper can be seen as pushing the first characteristic of Physics of Language Models, training on tightly controlled datasets, to its limit. A corpus is generated according to the controlled rules of a context-free grammar (which may not even be fully realistic), and a language model is trained from scratch on it.
Problem setting
A context-free grammar is a class of formal languages generated by starting from a symbol [root] and repeatedly applying production rules such as:

[root] → [sentence]
[sentence] → [noun phrase] [verb phrase]
[noun phrase] → [article] [noun] | [proper noun]
[verb phrase] → [verb] [noun phrase]
[article] → "the" | "a"
[noun] → "cat" | "dog" | "saw" | "telescope"
[verb] → "saw" | "loved"
[proper noun] → "John" | "Mary"

For example, the grammar above can generate sentences such as:
- [root] → [sentence] → [noun phrase] [verb phrase] → [article] [noun] [verb] [noun phrase] → [article] [noun] [verb] [article] [noun] → the cat saw a dog
- [root] → [sentence] → [noun phrase] [verb phrase] → [proper noun] [verb] [noun phrase] → [proper noun] [verb] [proper noun] → Mary loved John
- [root] → [sentence] → [noun phrase] [verb phrase] → [proper noun] [verb] [noun phrase] → [proper noun] [verb] [article] [noun] → John saw a saw
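The generation process can be sketched as a tiny recursive sampler. The grammar below is a toy encoding of the rules above; the Python rule names are illustrative, and the real experiments of course use a far larger grammar.

```python
import random

# Toy context-free grammar: nonterminal -> list of productions (lists of symbols).
GRAMMAR = {
    "root": [["sentence"]],
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["article", "noun"], ["proper_noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "article": [["the"], ["a"]],
    "noun": [["cat"], ["dog"], ["saw"], ["telescope"]],
    "verb": [["saw"], ["loved"]],
    "proper_noun": [["John"], ["Mary"]],
}

def sample(symbol: str, rng: random.Random) -> list[str]:
    """Expand a symbol by recursively applying randomly chosen productions."""
    if symbol not in GRAMMAR:           # terminal: emit the token itself
        return [symbol]
    rule = rng.choice(GRAMMAR[symbol])  # pick one production at random
    return [tok for part in rule for tok in sample(part, rng)]

rng = random.Random(0)
for _ in range(3):
    print(" ".join(sample("root", rng)))
```

Every string produced this way follows the grammar by construction; training data is simply many such samples.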
There are two important features.
The first is ambiguity. As in the example above, saw plays two roles: a verb (past tense of "see") and a noun (a cutting tool). From the output token saw alone, you cannot tell which role it plays.
The second is long-range dependency. The example above is too simple for this to arise, but with rules such as [noun phrase] → "the" [noun] "that" [verb phrase] plus present-tense and plural forms, once the model has generated the cats that saw a dog that loved a telescope, it cannot decide whether to output "jump" or "jumps" next without checking whether the subject before the long clause is singular or plural. (Answer: cats is plural, so it must not output jumps; it should output jump.)
Real languages have these features too, but context-free grammars isolate them in a pure form. The experiments confirm that language models handle this ambiguity and these long-range dependencies reliably.
The paper uses a considerably more complex context-free grammar, shown below, which is highly ambiguous and has long-range dependencies.

Following this grammar, 9.6 million texts of length 512 are generated at random, and a language model is trained from scratch with next token prediction for 1 epoch. Note that the model is never directly taught anything about the grammar; the strings generated from the grammar are simply fed in as training data.
At test time, the trained model is asked to generate text from scratch, or is given the prefix of a freshly generated string that follows the grammar and asked to generate the continuation.
Language models can learn complex context-free grammars accurately
Text generated from scratch by a model trained this way is almost certainly (with probability 99%+) grammatically correct. Moreover, the generated text has sufficient diversity, and its KL divergence from the true distribution was confirmed to be small. Checking diversity and distributional distance, not just grammatical correctness, matters. In the the cats that saw a dog that loved a telescope example above, the model could always emit the past tense jumped, dodging the singular/plural issue while still producing a grammatical sentence; that would not count as correctly learning the grammar. By examining the distributional distance we can confirm that the model has learned every pattern of the grammar correctly, without taking shortcuts. The distance to the "true distribution" cannot be computed for real data, so being able to verify this is precisely an advantage of training on a tightly controlled dataset.
Language models use dynamic programming in their heads
Next, the investigation of internal states.

First, linear probing of the internal states could accurately recover the non-terminal symbol ([noun phrase], [verb phrase], and so on) underlying each output token. That is, the model emits ambiguous tokens like "saw," but in its head it knows whether the token is a noun or a verb before emitting it. Again, note that the model was never directly taught anything about the grammar; merely by next token prediction on raw text, it acquired grammatical structure such as "noun" and "verb" in its head. This too is a payoff of training on a tightly controlled dataset: the web contains grammar tutorials, so a model trained on the web might have learned grammar from those. The controlled experiment establishes instead that grammar can be acquired internally from raw text alone, with no grammar instruction whatsoever.
We also obtained evidence that the model uses dynamic programming in its head. The details are intricate, so I will keep the explanation brief. When a context-free grammar is analyzed with rule-based methods, dynamic programming can be used both to parse and to generate sequentially. Parsing runs in the direction opposite to the grammar's substitutions: in the cat saw a dog, if tokens 1-2 (the cat) form a noun phrase and tokens 3-5 (saw a dog) form a verb phrase, they are glued together so that tokens 1-5 form a sentence; spans are merged into ever larger units. What matters here is recognizing which spans can be merged with which. Analyzing the trained model shows that from the right end of one mergeable span to the right end of the other, e.g. from dog to cat in [the cat] [saw a dog], there is significantly strong attention. Computing which spans are mergeable is not trivial, but the model computes it properly in its head. Beyond that, the model generates tokens while internally tracking the information needed to run dynamic programming. Since there are countless ways to implement dynamic programming, we cannot claim the model is literally running it, but (at least to someone who knows the dynamic programming algorithms for context-free grammars) its behavior was observed to be remarkably close to dynamic programming. And again, note that the model was never directly taught anything about the grammar: trained purely with next token prediction on raw text, it acquired dynamic programming in its head.
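The span-merging procedure just described is essentially the CYK algorithm. Here is a minimal sketch on a Chomsky-normal-form version of the toy grammar (binary rules only; the symbol names are illustrative), filling a chart of which symbols derive each span by merging smaller spans.

```python
# CYK recognizer on a toy CNF grammar: A -> B C rules plus a lexicon.
BINARY = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
LEXICON = {"the": {"Det"}, "a": {"Det"}, "cat": {"N"}, "dog": {"N"},
           "saw": {"V", "N"}, "loved": {"V"}}

def cyk(tokens):
    n = len(tokens)
    # chart[i][j] = set of symbols deriving tokens[i:j+1]
    chart = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        chart[i][i] = set(LEXICON[tok])
    for length in range(2, n + 1):        # widen spans, merging smaller ones
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):         # split point: [i..k] + [k+1..j]
                for b in chart[i][k]:
                    for c in chart[k + 1][j]:
                        if (b, c) in BINARY:
                            chart[i][j].add(BINARY[(b, c)])
    return "S" in chart[0][n - 1]

print(cyk("the cat saw a dog".split()))   # True
print(cyk("the saw cat a dog".split()))   # False
```

Note how the chart update merges [the cat] (an NP) with [saw a dog] (a VP) into an S over tokens 1-5, mirroring the attention pattern observed between the right ends of mergeable spans.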
Language models trained on noisy data acquire noise robustness
In the experiments above, the training data followed the grammar perfectly. Real data, however, contains noise and does not always follow correct grammar. Experiments were also run with this problem in mind.
A language model trained only on perfectly grammatical data is brittle to noise. Take perfectly grammatical test data x, cut out the first part x[:50], randomly swap some tokens to create a noisy input x'[:50], and let the model generate the continuation. Inspecting the concatenation of the clean prefix and the output, x[:50] + output, the result does not follow the grammar. Intuitively: given text with small grammatical slips, such as a dropped article or a singular/plural mistake, the model does not gracefully infer what was meant; the generation of the continuation gets confused and goes wrong.
Next, noise is injected into part of the training data (e.g. 15% of it) and a language model is trained from scratch.
A language model trained on noisy data is robust to noise at test time as well
Running the same experiment as before, this time x[:50] + output is grammatically correct with high probability (99%+). However, always producing correct grammar requires a low temperature. At temperature 1, the same as during training, the model "correctly" learns even how to output noise. Specifically, given clean test data it always generates a grammatical continuation; given noisy test data it always generates an ungrammatical continuation; and generating from scratch it produces ungrammatical sentences at the rate of noisy data in the training set (e.g. probability 0.15) and grammatical sentences at the clean rate (e.g. probability 0.85). Since next token prediction trains the model to match exactly that output distribution, this is only natural. Lowering the temperature toward 0 makes the model generate noise-free sentences in every case.
The practical lessons here are:
- To build a noise-robust language model, it is important to include noisy data in the training set.
- To generate only correct sentences, it is important to lower the temperature at generation time.
Intuitively these may sound obvious, but until now they had only been verified ad hoc; what matters is that they have now been confirmed in the style of Physics of Language Models, that is, through controlled contrastive experiments.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process (ICLR 2025)
The title means "Grade-School Math and the Hidden Reasoning Process."
The main result of this paper in one sentence: language models solve arithmetic problems by a process somewhat different from humans'.
This paper builds a new corpus of artificial grade-school-level arithmetic problems. Here is an example problem:
The number of Riverview High's film studios equals 5 times the sum of the number of the film studio's backpacks and the number of the dance studio's school daypacks. The number of the film studio's school daypacks equals 12 more than the sum of the number of the film studio's messenger bags and the number of Central High's film studios. The number of Central High's film studios equals the sum of the number of the dance studio's school daypacks and the number of the film studio's messenger bags. The number of Riverview High's dance studios equals the sum of the number of the film studio's backpacks, the number of the film studio's messenger bags, the number of the film studio's school daypacks, and the number of Central High's backpacks. The number of the dance studio's school daypacks is 17. The number of the film studio's messenger bags is 13. How many backpacks does Central High have?
The chain-of-thought answer to this looks as follows.
Define the number of the dance studio's school daypacks as p. Then p = 17. Define the number of the film studio's messenger bags as W. Then W = 13. Define the number of Central High's film studios as B. Then B = p + W = 17 + 13 = 7. Define the number of the film studio's school daypacks as g. Since R = W + B = 13 + 7 = 20, g = 12 + R = 12 + 20 = 9. Define the number of the film studio's backpacks as w. Then w = g + W = 9 + 13 = 22. Define the number of Central High's backpacks as c. Then c = B × w = 7 × 22 = 16. Answer: 16
You may notice the line 17 + 13 = 7. This is not a typo: in this dataset all arithmetic is done mod 23. This prevents uselessly large values from appearing, and the computations from becoming needlessly hard, as problems get longer. From here on, all training and testing happens in the mod-23 world.
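The arithmetic in the answer above can be checked directly in the mod-23 world; the variable names follow the chain of thought.

```python
# Reproducing the chain-of-thought arithmetic, with every step taken mod 23.
MOD = 23

p = 17 % MOD        # dance studio's school daypacks
W = 13 % MOD        # film studio's messenger bags
B = (p + W) % MOD   # Central High's film studios: 30 mod 23 = 7
R = (W + B) % MOD   # intermediate sum: 20
g = (12 + R) % MOD  # film studio's school daypacks: 32 mod 23 = 9
w = (g + W) % MOD   # film studio's backpacks: 22
c = (B * w) % MOD   # Central High's backpacks: 154 mod 23 = 16

print(B, g, w, c)   # 7 9 22 16 — matching the "17 + 13 = 7" in the answer
```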
The problem and the answer are long and fiddly, but in essence: prepare variables named by arbitrary noun phrases such as "the number of Riverview High's film studios" or "the number of the film studio's messenger bags," express their relations as sentences following templates, and line the sentences up; that is the problem text. Problems and answers of this type are auto-generated in bulk in the same way. Although generation is template-based, diversity is ample: the number of substantively distinct problems reaches 90 trillion. Training and test problems therefore cannot collide, and there is no concern that the model simply memorizes the problems.
At training time, the problem text and its answer are concatenated:

The number of Riverview High's film studios equals 5 times the sum of the number of the film studio's backpacks and the number of the dance studio's school daypacks. The number of the film studio's school daypacks equals 12 more than the sum of the number of the film studio's messenger bags and the number of Central High's film studios. The number of Central High's film studios equals the sum of the number of the dance studio's school daypacks and the number of the film studio's messenger bags. The number of Riverview High's dance studios equals the sum of the number of the film studio's backpacks, the number of the film studio's messenger bags, the number of the film studio's school daypacks, and the number of Central High's backpacks. The number of the dance studio's school daypacks is 17. The number of the film studio's messenger bags is 13. How many backpacks does Central High have? Define the number of the dance studio's school daypacks as p. Then p = 17. Define the number of the film studio's messenger bags as W. Then W = 13. Define the number of Central High's film studios as B. Then B = p + W = 17 + 13 = 7. Define the number of the film studio's school daypacks as g. Since R = W + B = 13 + 7 = 20, g = 12 + R = 12 + 20 = 9. Define the number of the film studio's backpacks as w. Then w = g + W = 9 + 13 = 22. Define the number of Central High's backpacks as c. Then c = B × w = 7 × 22 = 16. Answer: 16

and this is fed as input text to train the language model with next token prediction. The experiments prepare about 50 million texts of this kind and train the language model from scratch. Once again, no web corpus is used at all; the model is trained solely on the corpus of problems constructed in this way.
Language models can solve unseen problems
If you are familiar with LLMs this may not surprise you: the model trained this way answers test problems it sees for the first time with high accuracy (a 99%+ correct rate).
The nontrivial part is that it can also solve problems harder than anything seen during training. Training presented only problems solvable in at most 21 steps; at test time, the model accurately answered problems requiring 28 or more steps. This is an important result for building AI that surpasses humans. Since training data for language models is produced by humans, there has been a worry that the intelligence a model can acquire is capped by the intelligence of the humans who made the data. This result suggests that even when the training data's intelligence is limited, a model can naturally acquire intelligence beyond it. There has been prior work on this problem, such as Weak-to-Strong Generalization [Burns+ ICML 2024], but because it used web corpora, it could not reliably guarantee that the test problems were genuinely harder than the training data. By using controlled training data, this study guarantees it cleanly.
Language models answer by the shortest path
There are countless ways to arrive at the correct answer to a word problem. Even if you compute unnecessary variables along the way, you are judged correct as long as the final answer matches. Yet analyzing the chains of thought the model outputs, even when the problem text contains superfluous variables, the model in most cases performs no unnecessary computation at all and takes the shortest path to the answer. This contrasts with how humans answer, by trial and error. To see why this happens, let's peek into the model's head and analyze it.
Language models analyze variable dependencies in their heads

Probing revealed the following.
- Right after the problem statement has been fed in, just before the question is, the model already knows in its head the dependencies between all pairs of variables. For example, it knows that to compute "the number of Central High's film studios" it must first know "the number of the dance studio's school daypacks" and "the number of the film studio's messenger bags."
- Right after the question has been fed in, just before the chain of thought begins, the model knows in its head the set of variables necessary to answer the question. Put the other way, it also knows which variables are superfluous for answering the question.
- While generating the chain of thought, the model knows in its head the set of variables already computed and the set of variables computable next.
Because the model knows all this in its head, it computes, in order, the variables that are both necessary for answering the question and computable next, and thereby derives the answer mistake-free and by the shortest route.
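The strategy the probes suggest can be written out as a rule-based sketch: mark only the ancestors of the queried variable, then evaluate them in dependency order. The graph below encodes the example problem (mod 23); the shorthand variable names and the encoding are my own, not the paper's.

```python
MOD = 23
# var: (list of dependencies, function combining their values)
PROBLEM = {
    "dance_daypacks":       ([], lambda: 17),
    "film_messenger_bags":  ([], lambda: 13),
    "central_film_studios": (["dance_daypacks", "film_messenger_bags"],
                             lambda a, b: a + b),
    "film_daypacks":        (["film_messenger_bags", "central_film_studios"],
                             lambda a, b: 12 + a + b),
    "film_backpacks":       (["film_daypacks", "film_messenger_bags"],
                             lambda a, b: a + b),
    "central_backpacks":    (["central_film_studios", "film_backpacks"],
                             lambda a, b: a * b),
    # red herring, never needed for the question:
    "riverview_dance_studios": (["film_backpacks", "film_messenger_bags"],
                                lambda a, b: a + b),
}

def solve(query):
    needed, order, values = set(), [], {}
    def visit(v):                     # DFS marks ancestors of the query only
        if v in needed:
            return
        needed.add(v)
        for dep in PROBLEM[v][0]:
            visit(dep)
        order.append(v)               # post-order = valid computation order
    visit(query)
    for v in order:                   # shortest path: nothing else is computed
        deps, fn = PROBLEM[v]
        values[v] = fn(*(values[d] for d in deps)) % MOD
    return values[query], order

answer, steps = solve("central_backpacks")
print(answer)                               # 16
print("riverview_dance_studios" in steps)   # False: the red herring is skipped
```

Knowing the full dependency graph, the necessary set, and the computable-next set is exactly what lets this procedure skip superfluous variables.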
Remember, once again, that the model was never taught any such solution strategy; it was simply trained with next token prediction. From next token prediction on raw text alone, the model acquires this "correct way to solve word problems" in its head on its own.
When a language model answers wrong, it has slipped up in its head
Even a well-trained language model sometimes gets arithmetic problems wrong. Analyzing the internal states in failure cases, in most cases the model had mistakenly believed, in its head, that a variable that could not yet be computed was computable.
Interestingly, even at a stage where the model has not yet output a single character, looking into its head (its internal state) lets us predict that this model will inevitably make a mistake later. And indeed, letting it generate from there produces a wrong answer.
Exploiting this mechanism to improve reasoning accuracy is the topic of the next paper.
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems (ICLR 2025)
The title means "How to Learn From Mistakes on Grade-School Math Problems."
The main result of this paper in one sentence: language models can notice and correct their own mistakes.
The data is the same grade-school arithmetic dataset as before.
Language models regret, in their heads, the mistakes they output
The previous paper showed that in failure cases the model had mistakenly believed, in its head, that a not-yet-computable variable was computable. Immediately after declaring that it will "compute" that uncomputable variable, a substantial fraction of the time (around 60%) the model knows in its head that the variable cannot be computed yet. In other words, it regrets, in its head, having said "let's compute it." And having already declared it, it cannot take it back, so it outputs a garbage computation anyway.
This is related to the self-delusions of autoregressive models [Ortega+ DeepMind Technical Report 2021]. An autoregressive language model cannot retract what it has said. Worse, its own past utterances become inputs to itself, and it comes to treat its own mistaken statements as true. Suppose there is a problem the model can answer correctly with an ordinary chain of thought. If you feed it the problem but make it state the answer at the start of the discussion and only then begin the chain of thought, then even when that initial answer is wrong, it constructs an argument that forcibly leads to that answer and finally submits the wrong conclusion [McCoy+ arXiv 2023].
Past work observed what language models say out loud. The experiments here show that when this happens, the model in its head actually regrets that "I am wrong right now" (but, unable to stop, keeps talking anyway).
Language models can acquire the ability to correct mistakes
If the model knows it has made a mistake, we would like it to correct itself and reach the right conclusion.
Indeed, there are several studies on having language models revise their own outputs [Madaan+ NeurIPS 2023, Pan+ TACL 2024]. In the simplest form, just telling ChatGPT "this part is wrong, please fix it" elicits "I apologize, I was mistaken" and a correction.
Those approaches self-correct only after the wrong answer has been fully emitted. But as we saw above, the model notices a mistake immediately after making it. Once it has noticed the mistake, letting it generate to the very end is wasteful. Here we consider how to make the model fix its own mistakes immediately after the erroneous step.
Concretely, data of the following form is prepared:

Define the number of Central High's film studios as [BACK] Define the number of the dance studio's school daypacks as p. Then p = 17. Define the number of the film studio's messenger bags as W. Then W = 13. Define the number of Central High's film studios as B. Then B = p + W = 17 + 13 = 7. Define the number of the film studio's backpacks as [BACK] Define the number of the film studio's school daypacks as g. Since R = W + B = 13 + 7 = 20, g = 12 + R = 12 + 20 = 9. Define the number of the film studio's backpacks as w. Then w = g + W = 9 + 13 = 22. Define the number of Central High's backpacks as c. Then c = B × w = 7 × 22 = 16. Answer: 16

That is, deliberately erroneous reasoning steps are inserted, each immediately followed by a special token [BACK]. A corpus with this trick applied to part of the data is prepared, and a language model is trained from scratch on it.
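The construction of such retry data can be sketched as follows: before some correct steps, a prematurely declared (wrong) step is inserted and immediately retracted with [BACK]. The step strings and the way a "wrong" declaration is chosen here are placeholders, not the paper's exact templates.

```python
import random

def insert_back_steps(steps, error_rate, rng):
    """Insert retracted wrong declarations, tagged [BACK], before some steps."""
    out = []
    for step in steps:
        if rng.random() < error_rate:
            wrong = rng.choice(steps)  # prematurely declare some variable
            out.append(wrong.split(" as ")[0] + " as [BACK]")
            # the model then retracts and continues with the correct step
        out.append(step)
    return out

rng = random.Random(0)
steps = [
    "Define dance daypacks as p. p = 17.",
    "Define messenger bags as W. W = 13.",
    "Define Central film studios as B. B = p + W = 7.",
]
for line in insert_back_steps(steps, error_rate=0.5, rng=rng):
    print(line)
```

Note that the correct steps all survive in order; only extra "declare, then [BACK]" fragments are interleaved.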
Training on data that deliberately errs and then corrects in this way makes the model able to self-correct its mistakes, and the accuracy on hard test data rose substantially.
Two points are of interest.
First, the erroneous reasoning steps need not be removed from the training loss. Since one might worry that putting mistakes into the training data teaches the model to err on purpose, a fancier approach suggests itself: feed the model text mixing erroneous and correct steps, exclude the erroneous steps from the training loss computation, and learn only how to emit the [BACK] token, i.e. masking. But no such trick is needed; simply running next token prediction on the mixed data raised the accuracy.
Second, even when erroneous steps are inserted at a high rate (say, into half the steps), at test time the model almost never inserts deliberate mistakes; as before, it mostly reaches the correct answer in the minimal number of steps. In most cases it emits the [BACK] token to retract a statement only when it has actually made a mistake.
The ability to correct mistakes must be acquired during pretraining
It turned out that the error-correction ability cannot be acquired by LoRA fine-tuning. Taking a model trained from scratch on clean, mistake-free data, as the original models were, and LoRA fine-tuning it on data containing erroneous steps like the example above yields almost no ability to self-correct mistakes. This shows that recognizing mistakes in one's head and correcting them out loud are far apart, and that adjustments on the scale of LoRA fine-tuning cannot bridge the gap.
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction (ICML 2024)
The title means "Knowledge Storage and Extraction."
The main result of this paper in one sentence: storing knowledge and extracting it are different things. Language models are good at storing knowledge, but making that knowledge extractable requires some care.
Recall that the main characteristic of Physics of Language Models is training language models on tightly controlled datasets rather than on corpora collected from the web.
Here too, a new dataset is built to enable controlled experiments: biographies of a large number of fictional people. Name, birth date, birthplace, university attended, major, employer, and employer's location are each generated at random, and a biography is built from them. Two variants were created: biographies produced by filling the fields into templates, and biographies generated by putting the fields into a prompt for a language model (Llama). Since the results do not change much whichever variant is used, the variants are not mentioned further below.
For example, a text like this:

Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. She completed her education with a focus on Communications. She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.

Biographies were created for 100,000 fictional people.
Separately from these, question-answering data is also created. For example:

Q: Where is Anya Briar Forger's place of employment? A: Menlo Park, CA
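The controlled corpus construction can be sketched as template filling: fields are rendered through sentence templates, and (for the augmented variant used later) the sentence order is shuffled. The templates and helper below are my own illustration of the scheme, not the paper's code.

```python
import random

TEMPLATES = {
    "birth_date": "{name} was born on {birth_date}.",
    "birthplace": "{name} spent their early years in {birthplace}.",
    "university": "{name} studied at {university}.",
    "major": "{name} majored in {major}.",
    "employer": "{name} worked at {employer}.",
    "work_city": "{name} was employed in {work_city}.",
}

def make_biographies(profile, n_variants, rng):
    """Render n_variants biographies of one person with permuted field order."""
    sentences = [tmpl.format(**profile) for tmpl in TEMPLATES.values()]
    variants = []
    for _ in range(n_variants):
        order = sentences[:]
        rng.shuffle(order)           # augmentation: permute the field order
        variants.append(" ".join(order))
    return variants

profile = {"name": "Anya Briar Forger", "birth_date": "October 2, 1996",
           "birthplace": "Princeton, NJ",
           "university": "Massachusetts Institute of Technology",
           "major": "Communications", "employer": "Meta Platforms",
           "work_city": "Menlo Park, CA"}
rng = random.Random(0)
for bio in make_biographies(profile, n_variants=2, rng=rng):
    print(bio)
# A QA pair over the same underlying fields:
print(f"Q: Where did {profile['name']} work? A: {profile['work_city']}")
```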
Language models need to see some questions during pretraining
A language model is trained from scratch on everyone's biography texts plus question-answering texts about a subset of the people (the training people). At test time, questions about people whose QA was not seen during training (the test people) are fed to the model, and we check whether it answers correctly. As most would predict, the model answered the questions with high accuracy (86.6%).
Next, a model is trained from scratch on the biography texts alone, then fine-tuned (instruction-tuned) on the QA texts of the training people, and given questions about the test people. Mysteriously, accuracy was terrible (below 10%). In other words, when the model has not seen instruction-tuning-style data during pretraining (has seen only biographies), instruction tuning does not work well.
Being able to verify this is again thanks to the controlled experimental setup. With real language models, instruction tuning alone often works for some reason, but the model may have seen similar data during pretraining. By strictly never showing QA-task data during pretraining and truly pretraining on the biography data alone, it was established that instruction tuning by itself does not make the question-answering task work.
But why does such a thing happen?
Inspecting the failed model, it can correctly generate the continuation of the biographies even for the test people (99% accuracy). That is, the knowledge about the test people is stored inside the language model. It just cannot be extracted at all.
Intuitively, the model never learned, during pretraining, how knowledge gets pulled out, and so never acquired that skill.
Let's look at the mechanism in detail below.
Language models become able to answer questions when data is augmented
I said that training from scratch on biographies alone and then fine-tuning on the training people's QA fails on the test people. The experiments so far used only one biography per person. Now let's use five biographies per person, with the order of the fields introduced in each biography randomly permuted. Then, exactly as before, even though pretraining uses only biography text, after instruction tuning the model answers questions about the test people with high accuracy (96.6%).
Why is this? The effect seems too strong to be mere data augmentation.
Without data augmentation, language models cannot extract information until all the information has appeared
Consider the original model, trained without data augmentation. Probing its internal states shows that each piece of information is absent from its head until just before that information is output.
For example, the biography of Anya Briar Forger (a fictional person) reads:

Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. She completed her education with a focus on Communications. She had a professional role at Meta Platforms. She was employed in Menlo Park, CA.

Feeding the un-augmented model just "Anya Briar Forger" makes it generate this entire biography correctly. So this model has memorized and stored the knowledge about Anya Briar Forger.
When this model is fed the prefix

Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology.

her place of employment cannot be predicted from the internal state. Only after feeding

Anya Briar Forger was born on October 2, 1996. She spent her early years in Princeton, NJ. She received mentorship and guidance from faculty members at Massachusetts Institute of Technology. She completed her education with a focus on Communications. She had a professional role at Meta Platforms.

does the information that she worked in Menlo Park, CA become readable from the internal state.
In other words, the knowledge is not stored in the form "Anya Briar Forger" → "works in Menlo Park, CA," but only in the form "Anya Briar Forger, from Princeton, NJ, educated at Massachusetts Institute of Technology, majored in Communications, worked at Meta Platforms" → "works in Menlo Park, CA." The model remembers it only as a story, not in a form that can answer one-shot questions. When there is only one biography per person, this way of storing is indeed sufficient for generating that biography. Remember, again, that the language model is trained only with next token prediction: it only ever needs to emit the "next token," so there is no need to prepare, in its head, a token 10 steps ahead. It is thus no wonder that it never acquired such an unnecessary skill. But with this way of remembering, the question-answering task cannot be solved.
In the model trained with data augmentation, by contrast, the information that she worked in Menlo Park, CA becomes readable from the internal state as soon as

Anya Briar Forger

has been fed in.
In other words, data augmentation applies the pressure of "you never know when this knowledge will be asked for," and the model learns to retrieve any piece of information, at any time, from the person's name alone.
Data augmentation on only some of the people is enough for language models to memorize information well
In the augmentation experiment, five biographies were created for every person, but in fact, applying augmentation to only a subset of the people, and training without augmentation for everyone else, still makes the model answer with high accuracy even about the non-augmented people. In other words, once the model learns the right way to memorize through some of the people, it can apply it to the other people as well.
This scenario is arguably quite close to the real-world setting. In reality, multiple biographies exist for a handful of famous people. Real-world language models presumably learned a good way to store information from those data, and thanks to that became able to answer questions well about everyone else too.
The practical lessons here are:
- Accuracy improves when instruction-tuning-style texts related to downstream tasks are also used during pretraining. For example, including question-answering examples in pretraining lets the language model acquire the skill of extracting knowledge.
- Accuracy improves when the same information is expressed in multiple ways in the pretraining text. For example, data augmentation that varies the order in which facts are presented improves the model's knowledge extraction accuracy.
Physics of language models: Part 3.2, knowledge manipulation (ICLR 2025)
The title means "Knowledge Manipulation."
The main result of this paper in one sentence: even when a language model has stored knowledge, it cannot manipulate that knowledge without a chain of thought.
The data is the same fictional-biography data as before.
Language models can extract knowledge only in the form in which it was stored
Take a language model trained from scratch on the biographies alone and fine-tune it to answer birth dates, as in:

Q: What is Anya Briar Forger's birth date? A: October 2, 1996

The previous paper showed that with data augmentation, the model learns to answer such questions with high accuracy.
In the same way, the model was fine-tuned to answer birth years, as in:

Q: What is Anya Briar Forger's birth year? A: 1996

This time, the accuracy turned out to be extremely low (around 20%). Why is this?
Note that in the paper's experiments, dates are written American-style, as October 2, 1996.
This happens for the same reason as the previous result. Data augmentation was performed, but it only shuffled the order of the field types; in the sentences stating the birth date, every biography still spelled the date American-style, in the order October 2, 1996. Consequently, the model ended up unable to extract the information "1996" until "October 2," has been fed in.
The same holds for classification questions such as

Q: Is Anya Briar Forger's birth year even? A: Yes

and comparison questions such as

Q: Was Anya Briar Forger born earlier than Sabrina Eugeo Zuberg? A: No

The model knows that Anya Briar Forger was born on October 2, 1996 and that Sabrina Eugeo Zuberg was born on September 12, 1994, so answering which came earlier looks easy, yet it cannot.
In short: language models can store knowledge, and can extract it in the order seen during training, but they cannot manipulate it. It has been observed that real language models such as GPT-4 likewise fail at knowledge classification questions like "Is Joe Biden's birth year odd?"

Real language models sometimes do solve this kind of question, but that is likely only because the exact question, or a nearly identical one, happens to appear somewhere on the web (perhaps someone once tweeted, apropos of nothing, "Joe Biden was born in an odd year"), not because the model solved it by manipulating knowledge. Then again, the web corpus is too complex for anyone to know whether that is really the case. The experiments in this paper confirm that, under controlled conditions, language models indeed cannot manipulate knowledge.
Chains of thought help, but only if also used at test time
An obvious remedy for this problem is the chain of thought. Fine-tuning the model on texts like

Q: What is Anya Briar Forger's birth year? A: Anya Briar Forger was born on October 2, 1996. Therefore the answer is: 1996

makes it answer with high accuracy.
However, even a model trained this way gives wrong answers when asked to output the answer immediately at test time. In other words, it has merely learned the chain-of-thought procedure; it has not become able to manipulate knowledge directly.
è¨èªã¢ãã«ã¯éæ¤ç´¢ãã§ããªã
ããã®æããä¾ãéæ¤ç´¢ãã¤ã¾ã
Q. October 2, 1996 ã«çã¾ãã人ç©ã¯èª°ãï¼
ã¨ããã¿ã¹ã¯ã§ãããã®ã¿ã¹ã¯ã§ãã¡ã¤ã³ãã¥ã¼ãã³ã°ãããã¹ãæã«ä¸ããããæ¥ã«çã¾ãã人ç©ã1人ããããªãã£ãã¨ãã¦ããè¨èªã¢ãã«ã¯ãã®åé¡ã«å ¨ãçãããã¨ãã§ãã¾ãããè¨èªã¢ãã«ãã¢ã¼ãã£ã»ãã©ã¤ã¤ã¼ã»ãã©ã¼ã¸ã£ã¼ã¯ October 2, 1996 ã«çã¾ãããã¨ãç¥ã£ã¦ããã¨ãã¦ããã§ããããã¯ãã[人ç©] 㯠[çå¹´ææ¥] çã¾ãã§ãããã¨ããé çªã§ããæ å ±ãè¦ããã¨ããªãããã§ãã
ã¤ã¾ããè¨èªã¢ãã«ã«ããã¦ã¯ããA ãã B ãåãåºããããªãã°ãB ãã A ãåãåºãããã¨ãã対称å¾ãæç«ãã¾ãããããã¯æ¤ç´¢ã¨ã³ã¸ã³ã¨ãã¦ä½¿ãã«ã¯è´å½çãªæ¬ é¥ã§ãããããããã®ãããªéæ¤ç´¢ã¯æèã®é£éã§è§£ããã¨ãé£ããã§ãã
What this shows is that a language model is, in the end, a language model. Because it is extremely good at forward-direction retrieval, it is easy to fall under the illusion that a search engine is built into it, but a language model is a language model, not a search engine, and the legacy of training by next-token prediction shows up clearly here.
This tendency can be observed in real models such as GPT-4 as well. For example, if you take a passage from Jane Austen's Pride and Prejudice and ask GPT-4 "What is the next sentence?", it answers correctly with 65.9% accuracy, but if you ask "What is the previous sentence?", it answers correctly only 0.8% of the time. For text whose input order is always fixed during pretraining, a sharp asymmetry in accuracy appears like this.
Physics of language models: Part 3.3, knowledge capacity scaling laws (ICLR 2025)
The title means "scaling laws for knowledge capacity."
The main result of this paper in one sentence: a language model can store roughly 2 bits of information per parameter.
Traditionally, scaling-law studies analyzed loss values or test performance as a function of model parameter count. This work instead analyzes how many knowledge triples such as (Anya Briar Forger, workplace, Menlo Park, California) a model can store, or more precisely, how many bits of knowledge it can store.
The data is the same fictional biography data as before.
A language model can store roughly 2 bits of information per parameter
Varying the number of biography subjects from 10,000 to 10,000,000 and the model size from 1 million to hundreds of millions of parameters, the models consistently memorized 2 bits of profile information per parameter. The trend held consistently even when the architecture was changed among GPT-2, Llama, Mistral, and others, and when the number of company names and place names appearing in the data was varied.
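To get a feel for what "2 bits per parameter" buys, here is a back-of-the-envelope estimate in the spirit of the paper. The attribute pool sizes below are assumptions for this sketch, not the paper's exact configuration:

```python
from math import log2, ceil

# Rough information content of one profile, assuming each attribute is
# drawn uniformly from a pool of the stated size (pool sizes are
# illustrative assumptions).
pool_sizes = {
    "birth date":  12 * 28 * 200,  # month x day x year range
    "birth city":  200,
    "university":  300,
    "major":       100,
    "employer":    263,
    "work city":   200,
}
bits_per_person = sum(log2(n) for n in pool_sizes.values())

n_people = 1_000_000
total_bits = n_people * bits_per_person

# At ~2 bits per parameter, the model size needed to store it all:
params_needed = ceil(total_bits / 2)
print(f"{bits_per_person:.1f} bits/person -> ~{params_needed:,} params for {n_people:,} people")
```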

int8 quantization does not reduce memory capacity; int4 quantization reduces memory efficiency
Repeating the experiment with quantization, int8 left the memory capacity almost unchanged. Since the theoretical limit for int8 is of course 8 bits per parameter, this means the language model makes effective use of about 25% of the theoretical capacity. With int4 quantization, on the other hand, the capacity got worse by more than a factor of 2. In terms of efficiency, then, int8 turned out to be optimal.
25% may sound small, but achieving even this much purely by training a model with next-token prediction is nontrivial. The theoretical limit is the kind of number you could reach only by throwing away all text other than the knowledge triples, compressing the knowledge with zip, and storing it directly. Given that the model is trained by the indirect route of SGD, and that it is not merely memorizing triples but acquiring other abilities at the same time, reaching 25% of the theoretical limit is rather impressive.
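The utilization figures quoted above reduce to simple arithmetic. The int4 number below is an assumption consistent with "more than 2x worse", not a measured value:

```python
def utilization(stored_bits_per_param, storage_bits_per_param):
    # Fraction of the raw storage actually used for knowledge.
    return stored_bits_per_param / storage_bits_per_param

# int8 keeps the ~2 bits/param capacity: 2 of 8 storage bits are used.
print(utilization(2.0, 8))   # -> 0.25
# int4 loses more than half the capacity (<1 bit/param assumed here),
# so its utilization falls below int8's despite the smaller storage.
print(utilization(0.9, 4))   # below 0.25
```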
Junk data degrades memory capacity, but labeling the source removes the harm
In addition to the people used for question answering, we create a large number of meaningless biographies that are never used for question answering, mix them into the training data, and train. Each person used in question answering is seen 100 times during training, while each meaningless biography is seen once and thrown away. Since they appear only once, they are indistinguishable from random biographies. Does such junk data get in the way of memorizing the important data?
Answer: it gets in the way badly. Training with junk data mixed in like this degrades the per-parameter memory capacity by a factor of more than 20. In other words, the quality of pretraining data matters enormously.
There is, however, a simple fix: prepend a special token, which can be anything, to the important data. For example [wikipedia.org] would do, and of course [joisino.hatenablog.com] works too! With this, the harmful effect of mixing in junk data disappears.
Note that during training nothing explicitly tells the model that data carrying this token is important; we simply train with next-token prediction as before. Even so, the language model automatically learns that texts headed by this token are important.
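Implementation-wise the fix is tiny: tag only the important documents with a source token and train exactly as before. A sketch, using the example tokens from the text (the documents themselves are invented placeholders):

```python
# Important documents: seen ~100 times each during training.
important_bios = ["Anya Briar Forger was born on October 2, 1996. ..."]
# Throwaway junk: each biography is seen exactly once, untagged.
junk_bios = ["Zxq Vlorp was born on March 35, 1871. ..."]  # made-up filler

def tag(doc, source_token):
    # The only change: prepend a source token to the important data.
    return f"{source_token} {doc}"

training_corpus = (
    [tag(doc, "[wikipedia.org]") for doc in important_bios]
    + junk_bios  # junk stays untagged; training is otherwise unchanged
)
print(training_corpus[0])
```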
Real language models are also often good at extracting information from Wikipedia, and this mechanism may be one reason. Since real models are trained on web corpora through complex pipelines, the true cause unfortunately cannot be pinned down, but this experiment shows that, at least in a controlled environment, language models have a built-in mechanism for telling such throwaway junk data apart from important data that appears repeatedly.
Through this paper we confirmed that a language model can store roughly 2 bits of information per parameter. Conversely, if you know the total amount of knowledge you need, you can express it in bits, and a model with about half that many parameters suffices. A 7B model corresponds to 14 billion bits, which exceeds the total amount of knowledge in English Wikipedia. A considerable amount of knowledge can be packed into a language model.
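The closing arithmetic can be checked directly; at 2 bits per parameter, a 7B model stores on the order of 14 billion bits:

```python
params = 7_000_000_000          # 7B parameters
bits = params * 2               # ~2 bits of knowledge per parameter
gib = bits / 8 / 2**30          # raw storage equivalent in GiB
print(bits, round(gib, 2))      # 14 billion bits, roughly 1.6 GiB
```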
Closing remarks
I believe the concept of the physics of language models is promising.
Of course, holding up the principles of physics does not by itself guarantee physics-style breakthroughs. Something essentially simple can always be described in a needlessly complicated way, but something essentially complex can never be described simply. Physics could be discovered only because elegant laws happened to exist there; if language models hold no elegant laws in the first place, there is nothing to discover.
Even so, compared with conventional research methods, the physics of language models seems to me to be heading in a better direction. Indeed, several useful laws have already been discovered. Even without a guarantee that a unified law of comparable elegance awaits further down the road, I believe moving in this direction is worthwhile. Let us press on, dreaming of one day finding that unified law, until the day AGI is achieved.
Author information
If you found this article useful or interesting, I would be glad to hear your impressions on social media.
New articles and slides are announced at @joisino_ (Twitter). Please follow!
Ryoma Sato
Completed the doctoral program at the Graduate School of Informatics, Kyoto University; Ph.D. (Informatics). Currently an assistant professor at the National Institute of Informatics. Author of the (Japanese) books Speeding Up Deep Neural Networks, Graph Neural Networks, and Theory and Algorithms of Optimal Transport.