At NeurIPS 2024, 34 papers had "Multi-Agent" in their titles.
This post introduces some of them, grouped by topic.
Multi-agent deep reinforcement learning
Proposals and improvements for training methods and optimization techniques
- Li et al. propose a new approach that prevents policies from becoming homogeneous under parameter sharing. It encourages policy diversity while keeping the training efficiency that parameter sharing provides.
- Interestingly, two papers apply multi-agent reinforcement learning to imitation learning (Bui et al. and Tang et al.). The aim is to learn from human behavior histories as expert data and to produce team play that humans would not think of themselves.
- Hu et al. propose extending Dynamic Sparse Training (DST) to the multi-agent setting in order to train large models more efficiently. DST is a technique for finding the parameters of an optimal "sparse" neural network.
- There is also work on learning methods for heterogeneous groups of agents. In the heterogeneous setting, the agents' policies are updated sequentially, one agent at a time. Each update looks at the policies of the agents updated before it, so an agent's own policy update ends up depending strongly on those preceding policies. To prevent this, Dou et al. design a suitable entropy term that is intended to make the agents explore more widely.
- Methods based on diffusion models are starting to appear as well. Zhu et al. use a diffusion model to generate trajectories that are not present in the offline data, which makes multi-agent learning more efficient.
- McClellan et al. focus on the fact that multi-agent environments sometimes have symmetries (see the figure below, taken from the paper). Symmetry naturally brings Graph Neural Networks to mind, and the paper shows that GNNs help training in multi-agent deep reinforcement learning.
- Other work includes domain adaptation for multi-agent deep reinforcement learning (Jiang et al.) and a study on guaranteeing safety (https://nips.cc/virtual/2024/poster/93564).
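To make the Dynamic Sparse Training idea from the list above more concrete, here is a minimal sketch of one mask update. It assumes a generic "prune by magnitude, regrow at random" rule on a flat list of weights. The function name `prune_and_regrow` and all values are hypothetical illustrations, and this is not the multi-agent extension proposed by Hu et al.

```python
import random


def prune_and_regrow(weights, mask, fraction, rng):
    """One mask update: drop the smallest-magnitude active weights, then
    reactivate the same number of previously inactive positions at random.
    Illustrative only; the rule and the names are assumptions."""
    active = [i for i, m in enumerate(mask) if m]
    inactive = [i for i, m in enumerate(mask) if not m]
    # Never try to regrow more connections than there are free slots.
    k = min(int(len(active) * fraction), len(inactive))
    if k == 0:
        return list(weights), list(mask)

    # Prune: the k active weights with the smallest absolute value.
    pruned = sorted(active, key=lambda i: abs(weights[i]))[:k]
    # Regrow: k positions that were inactive before this update.
    regrown = rng.sample(inactive, k)

    new_weights, new_mask = list(weights), list(mask)
    for i in pruned:
        new_mask[i] = False
        new_weights[i] = 0.0
    for i in regrown:
        new_mask[i] = True
        new_weights[i] = 0.0  # regrown connections start from zero
    return new_weights, new_mask


# Toy example: 8 weights, 5 of them active.
weights = [0.9, -0.05, 0.0, 0.4, 0.0, -0.7, 0.01, 0.0]
mask = [True, True, False, True, False, True, True, False]
new_weights, new_mask = prune_and_regrow(weights, mask, 0.5, random.Random(0))
```

The number of active connections stays constant across the update, so the network remains equally sparse while the set of active connections is allowed to change during training.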
Proposals of new models
- D. Lee et al. take inspiration from animal cognitive processes and assume that each agent carries a certain "character" (something like a trait). Agents infer another agent's character from its observation-action pairs and use it to predict that agent's future behavior.
- Ding et al. study hierarchical teams. In this paper, upper-level agents make their decisions before lower-level agents, and the upper-level agents communicate their actions to the lower-level agents. This makes team play easier to achieve. On the theory side, the authors prove that policies learned with SeqComm are guaranteed to improve monotonically and to converge.
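The "upper level decides first, then tells the lower level" ordering described above can be sketched with a toy example. This assumes two agents on a line that should pick different targets, with hand-written rules in place of learned policies. All names (`upper_policy`, `lower_policy`, `hierarchical_step`) are hypothetical and this is not the architecture from Ding et al.

```python
def upper_policy(obs):
    """Upper-level agent: choose the target closest to its own position."""
    position, targets = obs
    return min(targets, key=lambda t: abs(t - position))


def lower_policy(obs, upper_action):
    """Lower-level agent: choose the closest target that the upper-level
    agent has not already claimed, using the communicated action."""
    position, targets = obs
    remaining = [t for t in targets if t != upper_action]
    return min(remaining, key=lambda t: abs(t - position))


def hierarchical_step(upper_obs, lower_obs):
    """One decision round: the upper level acts first, and its action is
    passed to the lower level as a message."""
    upper_action = upper_policy(upper_obs)
    lower_action = lower_policy(lower_obs, upper_action)
    return upper_action, lower_action


targets = [0, 10]
# Upper agent stands at 4, lower agent at 3; both are closer to target 0.
actions = hierarchical_step((4, targets), (3, targets))
```

Without the message, both agents would head for target 0. Because the lower-level agent conditions on the upper-level action, it picks the remaining target instead.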
Multi-agent × LLM
- Recent work reports that RLHF (Reinforcement Learning from Human Feedback) is effective for LLMs. In particular, standard RLHF with PPO can be explained with the three figures below.
First, Supervised Fine Tuning (SFT) teaches the pretrained model information that was not available during pretraining. Next, a reward model is trained to score how good a given answer to a given prompt is. Its training data consists of human rankings of several candidate answers, which is why the method is called Human Feedback. Finally, the language model trained by SFT is used as the policy model and is trained further, together with the reward model, by PPO, an optimization algorithm from reinforcement learning. Ma et al. aim to improve accuracy by carrying out this fine-tuning with multiple agents.
- "Language Grounded Multi-Agent" research is also active. There are often situations where agents should not communicate through a protocol that humans cannot decipher, for example when robots and humans work together. Li et al. align the abstract communication space that matters for teamwork with the embedding space of natural language, so that agents can also work as a team on new tasks.
- Other work includes applications to finance (Yu et al.) and to resolving GitHub issues (Tao et al.).
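The two learned objectives in the RLHF pipeline described above can be written down in a few lines. The sketch below assumes the commonly used pairwise (Bradley-Terry style) loss for a reward model trained on ranked answers, and the standard PPO clipped surrogate. The function names are hypothetical, scalars stand in for model outputs, and terms that real pipelines often add (such as a KL penalty against the SFT model) are left out.

```python
import math


def pairwise_ranking_loss(reward_preferred, reward_rejected):
    """Reward-model loss for one ranked pair of answers: small when the
    answer preferred by the human receives the higher reward."""
    margin = reward_preferred - reward_rejected
    return math.log(1.0 + math.exp(-margin))  # equals -log(sigmoid(margin))


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sample (to be maximized).
    logp_new / logp_old are log-probabilities under the current policy
    and under the policy that generated the sample."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# The reward model is penalized more when it ranks the pair the wrong way.
loss_correct_order = pairwise_ranking_loss(2.0, 0.0)
loss_wrong_order = pairwise_ranking_loss(0.0, 2.0)

# A large policy change on a good sample is capped by the clip range.
capped = ppo_clipped_objective(logp_new=0.0, logp_old=-1.0, advantage=1.0)
```

The clipping keeps a single update from moving the policy too far from the one that produced the samples, which is one reason PPO is a common choice for this final stage.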
Other applied research
There are also several applied multi-agent papers: Monroc et al. on cooperative control of multiple wind turbines, Liu et al. and Wu et al. on autonomous driving, and Lei et al., who propose an LLM prompting technique for solving math problems.
Libraries / benchmarks
Releases of open-source libraries and benchmark environments also stood out. Examples include the JAX-based multi-agent deep reinforcement learning library by Rutherford et al. (JAXMARL) and a benchmarking tool for multi-agent research (BenchMARL).
This blog is the tech blog of EfficiNet X Inc. efficinetx.co.jp