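AI that appears to follow human prompts faithfully and carry out its assigned instructions obediently may in fact be "acting" to hide its true preferences, according to a new report. The finding points to a risk that training meant to keep AI from making harmful statements could become meaningless.

Alignment faking in large language models \ Anthropic
https://www.anthropic.com/research/alignment-faking

According to AI company Anthropic, an AI that is taught conflicting things in its initial training and in later reinforcement learning may conceal the views it acquired during initial training while behaving, on the surface, as though it complies with the reinforcement learning.

For example, if a model that learned to support a particular political party is later trained to be neutral, it may put on a show of neutrality while hiding that it still supports that party.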