This post draws on Even GPT-5.2 Can't Count to Five: The Case for Zero-Error Horizons in Trustworthy LLMs to discuss how state-of-the-art LLMs still make mistakes on very simple problems.
For example, when asked whether the number of 1s in 11000 is even or odd, gpt-5.2-2025-12-11 answers odd. When asked whether the parentheses in ((((()))))) are balanced, it answers that they are. When asked to compute 127×82, it answers 10314 (the correct answer is 10414). This can be confirmed with the following commands.
Anyone can try these by copy and paste once the API key $OPENAI_API_KEY is set, so please give them a try.
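Separately from querying the model, the reference answers for these three questions can be checked locally. The following is a minimal Python sketch. The helper names `ones_parity` and `is_balanced` are made up for illustration and do not come from the paper.

```python
def ones_parity(bits: str) -> str:
    """Return 'even' or 'odd' for the number of '1' characters in a 0/1 string."""
    return "even" if bits.count("1") % 2 == 0 else "odd"


def is_balanced(parens: str) -> bool:
    """Return True if every ')' closes an earlier '(' and nothing is left open."""
    depth = 0
    for ch in parens:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' appeared with nothing left to close
            return False
    return depth == 0


print(ones_parity("11000"))        # even (two 1s)
print(is_balanced("((((())))))"))  # False (five '(' followed by six ')')
print(127 * 82)                    # 10414
```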
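GPT-5.2 can run complex fluid dynamics simulations and handle low-level programming with niche assembly optimization techniques. It may look as if it has surpassed human ability, yet it still sometimes makes foolish mistakes that a human would find unthinkable. This unevenness of ability is an obstacle to deploying LLMs in domains that demand high reliability (and it is thanks to this unevenness that humans have not yet lost their jobs to LLMs entirely). What if an AI handling large-scale financial transactions applied sophisticated financial theory and then miscalculated 127×82 and took a huge loss? What if an AI in charge of a nuclear reactor decided that the status flags 11000 had an odd number of 1s set and opened the door of a running reactor? It would be too awful to look at.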
To evaluate these "holes" in ability, the paper proposes a metric called the Zero-Error Horizon (ZEH).
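Fix the model, the task, the prompt, and the random seed. For example, the model is gpt-5.2-2025-12-11, the task is multiplication, and the prompt is {"instructions": "Answer with only the integer.", "input": "{a}*{b}="}. Feed every problem instance to the model in increasing order of problem size. If the model answers everything up to size n correctly but fails on some problem of size n + 1, the zero-error horizon is defined to be n. An instance of size n + 1 that the model gets wrong is called a limiter (ZEH limiter). The zero-error horizon and its limiters are basically found by exhaustive search (the paper also describes ways to speed this up a little).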
For example, if the problem size is the larger of a and b, gpt-5.2-2025-12-11 answers every multiplication up to 126 correctly (126×126 = 15876 problems in total) but gets 127×82 wrong, so the zero-error horizon is 126 and the limiter is 127×82.
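The exhaustive search for the multiplication task can be sketched as follows. This is a minimal sketch, not the paper's implementation: `solver` stands for whatever function returns the model's answer, and `mock_solver` is a hypothetical stand-in that reproduces only the single error reported above, so that the example runs without any API access.

```python
from typing import Callable, Iterator, Optional, Tuple

Pair = Tuple[int, int]


def instances_of_size(size: int) -> Iterator[Pair]:
    """Yield every (a, b) with a, b >= 1 and max(a, b) == size."""
    for other in range(1, size + 1):
        yield (size, other)
        if other != size:  # avoid yielding (size, size) twice
            yield (other, size)


def multiplication_zeh(
    solver: Callable[[int, int], int], max_size: int
) -> Tuple[int, Optional[Pair]]:
    """Return (horizon, limiter) found by exhaustive search up to max_size.

    If no error is found, the horizon is only known to be at least
    max_size, and the limiter is None.
    """
    for size in range(1, max_size + 1):
        for a, b in instances_of_size(size):
            if solver(a, b) != a * b:
                return size - 1, (a, b)
    return max_size, None


def mock_solver(a: int, b: int) -> int:
    """Hypothetical stand-in for a model call: wrong only on 127*82."""
    return 10314 if (a, b) == (127, 82) else a * b


horizon, limiter = multiplication_zeh(mock_solver, max_size=200)
print(horizon, limiter)  # 126 (127, 82)
```

In a real evaluation, `solver` would wrap a model call with the prompt and random seed held fixed, as the definition requires.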
If the problem size is the string length, gpt-5.2-2025-12-11 gets the parity of the number of 1s right for every 0/1 string of up to 4 characters (2^4 = 16 in total) but gets 11000 wrong, so the zero-error horizon is 4 and the limiter is 11000.
Likewise, gpt-5.2-2025-12-11 correctly judges whether the parentheses are balanced for every parenthesis string of up to 10 characters (2^10 = 1024 in total) but gets ((((()))))) wrong, so the zero-error horizon is 10 and the limiter is ((((()))))).
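The two string tasks fit the same exhaustive search, with string length as the problem size. The sketch below makes two assumptions: strings are enumerated with `itertools.product`, and the two `mock_*_model` functions are hypothetical stand-ins that err only on the limiters reported above. They are not real model calls.

```python
from itertools import product
from typing import Callable, Iterator, Optional, Tuple


def strings_of_length(alphabet: str, length: int) -> Iterator[str]:
    """Yield every string of exactly `length` characters over `alphabet`."""
    for chars in product(alphabet, repeat=length):
        yield "".join(chars)


def string_zeh(
    model: Callable[[str], object],
    truth: Callable[[str], object],
    alphabet: str,
    max_length: int,
) -> Tuple[int, Optional[str]]:
    """Return (horizon, limiter); limiter is None if no error is found."""
    for length in range(1, max_length + 1):
        for s in strings_of_length(alphabet, length):
            if model(s) != truth(s):
                return length - 1, s
    return max_length, None


def parity(s: str) -> int:
    """0 if the number of '1' characters is even, 1 if it is odd."""
    return s.count("1") % 2


def balanced(s: str) -> bool:
    """True if the parentheses in s are balanced."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def mock_parity_model(s: str) -> int:
    """Hypothetical stand-in: answers 'odd' for 11000, otherwise correct."""
    return 1 if s == "11000" else parity(s)


def mock_paren_model(s: str) -> bool:
    """Hypothetical stand-in: calls ((((()))))) balanced, otherwise correct."""
    return True if s == "((((())))))" else balanced(s)


print(string_zeh(mock_parity_model, parity, "01", 8))    # (4, '11000')
print(string_zeh(mock_paren_model, balanced, "()", 12))  # (10, '((((())))))')
```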
Note that the zero-error horizon holds the prompt (context) and the random seed fixed. With a different prompt or seed the model may well answer correctly, and since ChatGPT on the web uses a different seed and prompt from the API, it may get 11000 right. What matters, though, is that the model can get even simple problems wrong depending on the prompt and the seed. In high-risk domains, being wrong even once in 100 tries is a serious problem. Some limiters are also relatively robust to changes in the prompt and the seed. ((((()))))) is one of them. If you ask ChatGPT on the web "Is ((((()))))) balanced?", it answers that it is balanced with a fairly high probability (perhaps around 50%). It still makes the mistake when chain-of-thought is allowed, as with GPT-5.2-Thinking. GPT-5.2 seems to be inherently weak at this problem. Please try it with a variety of models and prompts.

Some may think that nobody would ask an LLM to do multiplication or match parentheses in the first place, but basic problems like these do appear as subtasks of complex problems. When a model solves a hard math problem with chain-of-thought, a multiplication can show up in an intermediate step. A mistake there can propagate and make the final conclusion wrong. The model could call a calculator or a Python program, but calling a tool every time for such a simple subtask is costly, and the model can also misjudge whether a tool call is needed. In fact, even though GPT-5.2-Thinking is allowed to call tools, it counts the parentheses in ((((()))))) by itself without calling one and gets it wrong.
The zero-error horizon is an effective way to detect this kind of unevenness and these "holes" in LLM ability. It also has a number of other advantages, listed below.
Limiters are solid evidence
Anyone can verify in one shot that the zero-error horizon is at most n by running the commands posted above. Once you run the commands and look at the output, it is plain as day that GPT-5.2 gets these problems wrong, and anyone can be convinced without fuss. This is desirable both mathematically and for communication.
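Surprising results come out automatically
That GPT-5.2 cannot count the 1s in 11000, or cannot tell whether ((((()))))) is balanced, is surprising and suggestive. These limiters fall out automatically as a by-product of evaluating the zero-error horizon. ((((()))))) looks like a hand-picked example, but it was not found by trial and error. It was discovered by automatically searching for the smallest mistake. Because these are the smallest and simplest problems that GPT-5.2 gets wrong, they deliver the greatest insight and surprise: the model fails even on examples this simple.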
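This resembles adversarial examples, but the practical significance is different. Adversarial examples are unnatural, out-of-distribution inputs, so in a sense it is natural for a model to get them wrong (one could even argue that getting them wrong is the correct behavior. For details, see 人間には認知できない情報を活用するAIたち - ジョイジョイジョイ). Limiters, in contrast, are natural inputs that can occur in ordinary use, and they are small and simple. That the model still gets them wrong is what makes them practically significant and surprising.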
Accuracy has an arbitrary scale, the zero-error horizon does not
Accuracy is a long-established evaluation metric, but to measure it a human evaluator has to fix the range of problems in advance. For multiplication, for instance, one decides to measure accuracy over the range from 1×1 to 99×99. This choice of range can be swayed by preconceptions, and it can also be manipulated by evaluators who want their own method to look good. The figure below shows multiplication results for Qwen2.5-7B-Instruct and Qwen2.5-72B-Instruct.

Someone proposing to compress the 72B model into the 7B model might show the left or middle panel and claim that "compressing the 72B model more than 10x into the 7B model barely reduced accuracy". Some readers would be fooled. Evaluating over a different range, as in the right panel, shows a completely different trend. The result changes greatly with the range of problems, and because the evaluator decides the range according to preconception or whim, bias can creep into the evaluation.
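The zero-error horizon, in contrast, is determined by the model itself. There is no room for a human to choose the evaluation range arbitrarily. As a result it yields objective values that do not depend on the choice of range, such as 22 vs 42 for the two models.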
A key feature of the zero-error horizon is that the range of problems, that is, the difficulty, is not fixed in advance but is determined by the model itself.
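The dependence of accuracy on the chosen range can be illustrated with a toy example. Everything in this sketch is hypothetical: `is_correct` is an invented error pattern (flawless up to a given horizon, then wrong whenever a + b is a multiple of 5), not a measurement of any real model, and the horizons 22 and 42 are borrowed from the text purely for illustration.

```python
def is_correct(a: int, b: int, horizon: int) -> bool:
    """Toy error pattern: flawless up to `horizon`, then wrong when (a + b) % 5 == 0."""
    return max(a, b) <= horizon or (a + b) % 5 != 0


def accuracy(horizon: int, max_operand: int) -> float:
    """Accuracy over all a, b in 1..max_operand (the evaluator picks max_operand)."""
    operands = range(1, max_operand + 1)
    right = sum(is_correct(a, b, horizon) for a in operands for b in operands)
    return right / (max_operand * max_operand)


def zeh(horizon: int, max_size: int) -> int:
    """Largest size (max of the two operands) below which the toy model never errs."""
    for size in range(1, max_size + 1):
        for other in range(1, size + 1):
            if not (is_correct(size, other, horizon) and is_correct(other, size, horizon)):
                return size - 1
    return max_size


for top in (20, 99):
    # Same two toy models, two different evaluator-chosen ranges.
    print(top, accuracy(22, top), accuracy(42, top))
print(zeh(22, 99), zeh(42, 99))  # 22 42
```

On the range up to 20 the two toy models are indistinguishable, while on the range up to 99 they differ. The horizon comes out the same for any search limit large enough to reach the first error.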
The metric is less likely to become obsolete
Benchmarks with a range fixed in advance become obsolete. A benchmark of the 2500 problems from 1×1 to 50×50 can barely tell the 7B and 72B models apart. The benchmark up to 99×99 does tell them apart, but it will saturate sooner or later. MNIST, CIFAR-10, and GLUE have all met the same fate.
The zero-error horizon, in contrast, does not fix the difficulty in advance. The difficulty is set in an open-ended way to match the model's ability, so the metric is less likely to become obsolete.
It rewards structured error patterns
Even models with the same number of correct answers can err in very different patterns. The two patterns below both have 90% accuracy, but their structure is completely different.

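A random pattern like the one on the left has many "holes", and its zero-error horizon stays small. A model like the one on the right, which reliably solves the easy problems and "legitimately" fails the larger, harder ones, has a large zero-error horizon. At the same accuracy, erring as on the right is easier to work with and preferable. Accuracy cannot tell the two apart, but the zero-error horizon can.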
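The contrast between the two patterns can be reproduced with a toy example. This sketch makes simplifying assumptions that are not from the paper: there is exactly one problem per size (sizes 1 to 100), and both error patterns are invented for illustration.

```python
from typing import List


def accuracy(correct: List[bool]) -> float:
    """Fraction of problems answered correctly."""
    return sum(correct) / len(correct)


def zero_error_horizon(correct: List[bool]) -> int:
    """correct[i] is the outcome at size i + 1; return the largest error-free size."""
    for index, ok in enumerate(correct):
        if not ok:
            return index  # sizes 1..index were all correct
    return len(correct)


sizes = range(1, 101)
scattered = [size % 10 != 3 for size in sizes]  # wrong at sizes 3, 13, ..., 93
structured = [size <= 90 for size in sizes]     # wrong only at sizes 91..100

print(accuracy(scattered), zero_error_horizon(scattered))    # 0.9 2
print(accuracy(structured), zero_error_horizon(structured))  # 0.9 90
```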
For example, the accuracy of Qwen2.5-72B-Instruct on 1×1 to 99×99 is 98.6%. If it made mistakes completely at random, its zero-error horizon should be below 10: there are 100 problems from 1×1 to 10×10, so with a 1.4% chance of error it would get about 1.4 of them wrong. Yet the measured zero-error horizon of Qwen2.5-72B-Instruct is 42. In other words, Qwen2.5-72B-Instruct reliably solves the easy problems and, to a fair extent, "legitimately" fails the hard ones. This shows that, among the ways of reaching 98.6% accuracy, Qwen2.5-72B-Instruct errs in a way that is easy to work with in practice.
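The arithmetic behind this argument can be checked directly. The sketch below assumes the "completely random" scenario from the text, that is, independent errors at a uniform 1.4% rate. That assumption is the hypothetical being ruled out, not a claim about the real model.

```python
error_rate = 1 - 0.986  # uniform per-problem error rate under the random-error assumption

problems_up_to_10 = 10 * 10  # 1x1 .. 10x10
expected_errors = problems_up_to_10 * error_rate
p_clean_up_to_10 = (1 - error_rate) ** problems_up_to_10

problems_up_to_42 = 42 * 42  # 1x1 .. 42x42
p_clean_up_to_42 = (1 - error_rate) ** problems_up_to_42

print(round(expected_errors, 2))   # 1.4 expected errors among the first 100 problems
print(round(p_clean_up_to_10, 2))  # about 0.24: a clean run up to 10 is already unlikely
print(p_clean_up_to_42)            # vanishingly small: a clean run up to 42 is implausible
```

Under random errors, an error-free run through all 1764 problems up to 42×42 would be practically impossible, which is why a measured horizon of 42 points to structured errors.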
As introduced in LLMのキモい算術 - ジョイジョイジョイ and LLM のアテンションと外挿 - ジョイジョイジョイ, LLMs are known to solve reasoning problems in a variety of ways.
A model that solves problems by memorization or by non-robust methods will have many "holes" and a small zero-error horizon. To enlarge the zero-error horizon, the model has to acquire robust algorithms and rules. Using the zero-error horizon as an evaluation metric should therefore encourage the acquisition of such robust algorithms, even among models with the same accuracy.
In this way, the zero-error horizon has several desirable properties as an evaluation metric that accuracy lacks, and it is useful for assessing the reliability and instability of LLMs. Please try it when evaluating your own models or when choosing a model to use.
Conclusion
Social media is overflowing with news that "LLMs can now solve problems this amazing!" and news that "LLMs still make mistakes this foolish!". I think this very large gap in ability is one reason LLMs are hard to work with.
What I like about this research is that it organizes a systematic way to make claims of the second kind, "LLMs still make mistakes this foolish!".
Looking at GPT-5.2, there are still plenty of "holes", and it seems the job of cleaning up after AI will continue for a while. Will the day come when these holes are filled? I would be glad if you gave it some thought too.
About the author
If you found this article useful or interesting, I would be happy to hear your thoughts on social media.
I post new articles and slides at @joisino_ (Twitter). Please follow.
Ryoma Sato (佐藤 竜馬)
Completed the doctoral program at the Graduate School of Informatics, Kyoto University. Ph.D. in Informatics. Currently Assistant Professor at the National Institute of Informatics. Author of 『深層ニューラルネットワークの高速化』, 『グラフニューラルネットワーク』, and 『最適輸送の理論とアルゴリズム』.