Nakagawa Lab Machine Learning Study Group, 2007/6/7
Apprenticeship Learning via Inverse Reinforcement Learning
by Pieter Abbeel and Andrew Y. Ng (ICML 2004)
Presenter: [name garbled in source]

Reinforcement Learning
• Given an environment and an agent that acts in it, learn what actions the agent should take.
– There are "states", and "actions" that cause transitions between them.
– Policy: decides which action to take in each state.
– Reward function: scores whether a state is desirable or not.
– Value function: scores whether starting from a given state will eventually lead to a desirable outcome.
• Not only the current state, but also the future