The document discusses the Q-learning algorithm for reinforcement learning. It explains that Q-learning uses a Q-table to store Q-values, each representing the expected cumulative reward for taking a given action in a given state. On each iteration, the Q-value for a state-action pair is updated from the immediate reward plus the maximum future reward estimated from the current Q-values of the next state. The document provides pseudocode for the Q-learning update rule and works through an example of navigating between rooms to learn the best path.
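
To make the update rule concrete, the standard tabular form is Q(s, a) ← Q(s, a) + α[r + γ max_a' Q(s', a') − Q(s, a)], where r is the immediate reward, s' the next state, α the learning rate, and γ the discount factor. Below is a minimal sketch of this loop on a small room-navigation task; the 6-room reward matrix, hyperparameters, and episode count are illustrative assumptions, not the document's exact example.

```python
import numpy as np

# Hypothetical 6-room layout: R[s, a] is the immediate reward for moving
# from room s to room a (-1 marks impossible moves); room 5 is the goal.
R = np.array([
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
])

n_states = R.shape[0]
Q = np.zeros((n_states, n_states))   # Q-table: one value per state-action pair
gamma = 0.8                          # discount factor for future rewards
alpha = 1.0                          # learning rate

rng = np.random.default_rng(0)

for episode in range(1000):
    state = rng.integers(n_states)                # start each episode in a random room
    while state != 5:                             # run until the goal room is reached
        actions = np.flatnonzero(R[state] >= 0)   # moves allowed from this room
        action = rng.choice(actions)              # explore: pick a random valid move
        next_state = action
        # Q-learning update: immediate reward plus discounted best future value
        target = R[state, action] + gamma * Q[next_state].max()
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

# Read a greedy path from room 2 to the goal off the learned Q-table.
state, path = 2, [2]
while state != 5:
    state = int(Q[state].argmax())
    path.append(state)
print("Learned path:", path)
```

With a reward of 100 for entering the goal room and 0 elsewhere, the greedy path read from the trained table follows the shortest route to room 5, which mirrors the document's point that the table's maxima encode the best path.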