[Unity] Lessons learned from experimenting with ML-Agent, machine learning in Unity
Around September last year, ML-Agent (ver 0.01), which brings machine learning to Unity, was released.
With this library, you can apparently have AI driven by machine learning run inside games built with Unity.
I've been experimenting with ML-Agent quite a bit recently, and since it has taken some shape, I'm leaving my notes here in this article.

Machine learning?
Machine learning is one of the technologies getting a lot of buzz lately, and it's a kind of AI.

Its defining trait is that it "learns from experience without being explicitly programmed". The learning process extracts patterns from the data it is given, makes observations, and makes decisions based on them.

Reinforcement learning?
Machine learning itself spans many different algorithms, but Unity's ML-Agent adopts Reinforcement Learning (the part shown in green in the diagram).
The reinforcement learning cycle goes like this: the Agent performs some action, the action is reflected in the environment, and the Agent receives the result back as a state and a reward.
In other words, the AI observes and abstracts "what changes when I do what", and learns what it should do to receive good rewards and avoid bad ones. Concretely (see the sketch after this list):
- Action : an instruction from our AI overlord
- State : the in-game information the AI is observing
- Reward : a treat when the instruction made things better, punishment when it made things worse
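To make the cycle concrete, here is a minimal sketch in C#. This is purely illustrative and not the ML-Agent API; IEnvironment and everything inside it are hypothetical stand-ins.

// Illustrative only: a bare-bones version of the Action/State/Reward cycle.
public interface IEnvironment
{
    void Apply(int action);    // Action: the AI's instruction is applied to the game
    float[] Observe();         // State: the game information the AI gets to see
    float Evaluate();          // Reward: how good or bad things became
    bool Done { get; }
}

public static class Cycle
{
    // 'policy' stands in for the learned decision-making.
    public static void Run(IEnvironment env, System.Func<float[], int> policy)
    {
        while (!env.Done)
        {
            float[] state = env.Observe();
            int action = policy(state);      // decide from the observed state
            env.Apply(action);               // reflect the action in the environment
            float reward = env.Evaluate();   // learning adjusts 'policy' from this
        }
    }
}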
Rewards attach to a "sequence of actions", not just the "result"

What's interesting is that you don't need to hand out a reward constantly.
If you just supply information like "succeeded" or "failed" for the final outcome, it apparently works backwards through the sequence of actions and scores those actions for you.

For example, with a goal like "don't drop the ball", if you evaluate the behavior by hand, you'd have to write code that finds and scores the actions that led to the drop (say, +0.1 for bouncing the ball upward, +0.1 if the return angle was good, +0.1 for survival time, and so on)...
With reinforcement learning, from nothing more than "the ball dropped: -1" it works out backwards how the drop came about, learns from it, and sets the weights by itself. Conversely, if you provide something like "+0.1 while the ball hasn't dropped", it learns by attaching positive value to "the actions you should take to not drop it".
That, I suppose, is the difference between reinforcement learning and the approach where you write the AI's program yourself. Note that when the reward information is sparse, the agent has to blindly try a huge number of approaches (a bit like evolution does), so learning efficiency apparently drops.

My impression is that things whose State changes can't be expressed as a formula don't learn well.
For example, games with continuous motion like Breakout managed decent accuracy with little training, but shmup-like (STG) games where enemies and bullets appear at random (so the State data jumps all over the place) never quite worked, despite the time I put in. It does get sort of close, though...
Unity and reinforcement learning
The reinforcement learning setup that Unity's ML-Agent provides is structured as shown below.
Fundamentally nothing changes: the Agent, under the AI's instructions, performs some action on the Environment and gets back the result.
The Agent defines the behavior, the Brain is the AI's operating mode, and Python (Tensorflow) controls the AI.
The important point is that it's composed of the Brain, the Agent, and Python (Tensorflow); there's not much reason to worry about the rest.
It should be enough to remember that the Agent connects to a Brain, and that when the Brain is External, training is done by Tensorflow running on external Python.
Note that you can have Tensorflow on the Python side issue instructions based on the training results (External), and by embedding TensorFlowSharp (which adapts Tensorflow for Unity) as a Unity plugin, you can also build the result into the game itself (Internal).

Making my own project
I'll skip the installation steps, since plenty of people have already covered them.
This time I made something like the following.
The base behavior is much the same as the movement shown above.
However, it has a brutal difficulty level:
- The red bars produce upward thrust (if it tilts, it flips over in one go)
- The red bars' positions differ subtly every run (boost both at the same time and it immediately keels over)
- Touch the outer wall hard and it's game over
- Drop the ball and it's also game over
By now I can't fly it by hand at all. Working out the formulas to make this thing fly myself seemed like a pain, so I'm starting from this state.
Setting up the scene
The first thing to do is register the Academy and Brain components in the scene.
Register the Academy as the parent object and the Brain as a child object. If your Academy doesn't need to do anything in particular, setting something like Template Academy is fine.
Note that the Brain's settings get tweaked here and there depending on the Agent's configuration.
After that, all you have to do is define the behavior the Agent performs:
- setting up the actions
- what to send as the State
- how the reward is obtained
The action setup and the State transmission go into a class that inherits from the Agent class. It's a simple fill-in-the-blanks exercise (a skeleton is sketched below).
Register the Brain to use on the Agent you created.
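As a rough sketch of those blanks, under the old 0.x API this article is based on (the class name DroneAgent is my own; only the overridden methods come from ML-Agent):

using System.Collections.Generic;
using UnityEngine;

public class DroneAgent : Agent
{
    // State: return the observations for the Brain.
    public override List<float> CollectState()
    {
        return new List<float>();
    }

    // Action: receive the AI's instructions, apply them, and hand out rewards.
    public override void AgentStep(float[] act)
    {
    }

    // Called after done = true, so the episode can restart cleanly.
    public override void AgentReset()
    {
    }
}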
Setting up actions
First, let's think about the action setup. Actions are the instructions from the machine learning AI, so think carefully about what instructions it ought to be giving.
Set the number of instructions to receive in the Brain's Action Size.
One annoying thing here is that the information you receive changes between Discrete and Continuous in the Brain's ActionStateType.
With Discrete, the array contains only a single element. That is, the value is stored in act[0], and it's one of the values within Action Size.
Continuous receives multiple elements. The array length is the same as Action Size, and you receive several instructions at once.

In short, as below: with an Action Size of 2, Discrete hands you 0 or 1, while Continuous hands you values like float{ 0.2f, 0.1f };.
If you simply want to do all sorts of things, Continuous is more flexible, but Discrete makes it easier to tie rewards to actions, so results come faster.
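Spelled out inside AgentStep, the difference looks roughly like this (a sketch assuming Action Size = 2; the booster reading is my own):

// Excerpt from the DroneAgent sketch above.
public override void AgentStep(float[] act)
{
    // Discrete brain: act.Length == 1 and act[0] is 0 or 1,
    // so exactly one instruction arrives per step.
    int choice = (int)act[0];

    // Continuous brain: act.Length == 2, e.g. { 0.2f, 0.1f },
    // so you would instead read act[0] and act[1] as two simultaneous instructions.
}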
After that, you define the contents of AgentStep.
I ripped out all the Input-related code and issued the commands from AgentStep instead. Incidentally, AgentStep apparently runs at FixedUpdate timing, so watch out for that.

Once you've designed AgentStep, I very strongly recommend setting the BrainType to Player and checking that it behaves properly. Skip this step and you may find later that it doesn't move the way you expected.
By the way, in the demo I made this time, it's just ON/OFF for the left and right boosters.
In code it looks something like this:
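(The original snippet isn't preserved here, so the following is my reconstruction of what it plausibly looked like; the field names and thrust value are guesses.)

// Excerpt from the DroneAgent sketch: a Discrete brain picks a booster each step.
public Rigidbody2D body;        // the bar the boosters act on
public Transform leftBooster;
public Transform rightBooster;
public float thrust = 5f;

public override void AgentStep(float[] act)
{
    switch ((int)act[0])
    {
        case 0:   // left booster ON
            body.AddForceAtPosition(leftBooster.up * thrust, leftBooster.position);
            break;
        case 1:   // right booster ON
            body.AddForceAtPosition(rightBooster.up * thrust, rightBooster.position);
            break;
    }
}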
Setting up the State
Next, we observe how the game changes as a result of the actions.
Training apparently builds a function on top of this data, so it's an important element.

First, set the number of states to register on the Brain.
After that, you just have CollectState return a list filled with the contents. The List must have exactly as many elements as State Size, something I keep overlooking myself.
If you register values in bulk with a for loop or the like, this is the part to be careful about.
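For example, a minimal sketch (the fields are mine, and the Brain's State Size would need to be set to 4 to match):

// Excerpt from the DroneAgent sketch.
public Transform ball;

public override List<float> CollectState()
{
    var state = new List<float>();
    state.Add(ball.position.x);    // 1
    state.Add(ball.position.y);    // 2
    state.Add(body.velocity.x);    // 3
    state.Add(body.velocity.y);    // 4: must add up to exactly State Size
    return state;
}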
Also, if the count varies, register the maximum count and cut things off at a sensible number. When doing that, I have a feeling you need to be careful not to lose the continuity of the values.
For this, assigning each object a unique ID might be a good idea, something like the sketch below.
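Along those lines (entirely my own illustration): pad out to a fixed maximum so State Size never changes, and lead each slot with a stable ID so a given object's values stay comparable from frame to frame.

// Hypothetical: a fixed-size State for a varying number of targets.
public struct Target { public float id; public Vector2 position; }
public List<Target> targets = new List<Target>();
const int MaxTargets = 8;   // register MaxTargets * 3 as the State Size

public override List<float> CollectState()
{
    var state = new List<float>();
    for (int i = 0; i < MaxTargets; i++)
    {
        if (i < targets.Count)
        {
            state.Add(targets[i].id);          // stable unique ID
            state.Add(targets[i].position.x);
            state.Add(targets[i].position.y);
        }
        else
        {
            state.Add(-1f);                    // padding for empty slots
            state.Add(0f);
            state.Add(0f);
        }
    }
    return state;                              // always the same length
}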
I had assumed that more states would be better, but they're a pain to observe, and with an enormous number it feels like things go off the rails somewhere. Camera images can apparently also be used (though only at low resolutions like 80x80 monochrome, understandably).

Setting up the Reward
What's left is the reward. Rewards have a particularly strong influence on learning, so they need careful thought.
If all the agent has is a +1 reward for items, it tries to collect the items slowly and carefully. Add a constant -0.05 reward on top and it hurries up a bit. Make it hurry too much and it sometimes kills itself.
How much reward it's receiving is something you should check with the Monitor class.
If you write a bit of code in CollectState that reports the Reward, the value gets displayed during the game.
Monitor.Log(key: "reward", value: reward, target: transform);
Note that Reward and Done don't have to be handled inside the Agent class. For example, in the outer wall's OnCollisionEnter2D you can check whether the collider has an Agent and do reward -= 1 there.
Callback-style code like Collision in particular is a pain to wire up inside AgentStep or CollectState, so it's worth knowing that this works; see the sketch below.

In the thing I made this time, the rewards are just "-1 if the ball (or the bucket) falls" and "+0.1 every frame while it stays alive".
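A sketch of that pattern, combining the wall collision with the demo's reward scheme (the OuterWall class is my own; reward and done are the Agent's public fields in the 0.x API):

using UnityEngine;

// Lives on the outer wall, not on the Agent.
public class OuterWall : MonoBehaviour
{
    void OnCollisionEnter2D(Collision2D collision)
    {
        var agent = collision.gameObject.GetComponent<DroneAgent>();
        if (agent != null)
        {
            agent.reward -= 1f;   // hit the wall hard: -1
            agent.done = true;    // and the run is over
        }
    }
}

// Meanwhile, the survival bonus can live in AgentStep:
// reward += 0.1f;   // +0.1 per frame while still alive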
Ending and restarting
Finally, when the agent achieves its goal or fails, call done = true; for it. This, too, doesn't have to happen inside the Agent class.
Once done = true; goes through, AgentReset gets called, so reset the in-game information there: the character's coordinates, the ball's position, and so on.
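For example (a sketch; the start-position fields are my own):

// Excerpt from the DroneAgent sketch.
public Vector3 startPosition;
public Vector3 ballStartPosition;

public override void AgentReset()
{
    transform.position = startPosition;   // put the character back
    ball.position = ballStartPosition;    // put the ball back
    body.velocity = Vector2.zero;         // and kill any leftover momentum
}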
Training
After that, you train with something like python ppo.py <env_name> --train.

If you want to interrupt training midway, you can stop it once and add --load to resume, or increase the step count with --max-steps.
However, if you change settings like normalize or num-layers, you can no longer resume.
- resume from partway through: python ppo.py <env_name> --train --load
- increase the number of trials: python ppo.py <env_name> --train --max-steps=<number you want>

Incidentally, removing --train gives you playback mode instead of training.
It's nice that you can check the behavior without having to build and wire up something like 3DBall.bytes every single time.
- python ppo.py <env_name> --load
Other notes
- Training runs for a good 100 minutes, and when you check on it, quite often it isn't actually working well. So I recommend at least one 1-2 minute run first; the reward or the State often turn out to be wrong.
- Raising beta apparently makes the actions during training more random, which might make it easier to stumble onto the reward.
Related
Entries and implementation examples from Unity Connect's ML-Agents Challenge 1.
Some of the projects are public, so they make good references.
blogs.unity3d.com : the explanation of reinforcement learning I found most convincing
qiita.com : easy to follow when trying ML-Agent on a Mac
For Windows, see here.
am1tanaka.hatenablog.com : for when you want to do it on AWS
Writing this up is getting more and more of a pain, so (ry