Following on from the previous post (Adding animation to the Chapter 4 sample code of Deep Learning from Scratch 4 (Reinforcement Learning Edition) - daisuke's tech blog), this time we add a feature to policy_iter.py from the Chapter 4 sample code of "Deep Learning from Scratch 4 (Reinforcement Learning Edition)" that writes out image files showing the value function as it is updated at each step.
While we are at it, we also add the same kind of feature to policy_eval.py from the same chapter: writing out image files showing the value function as it is updated at each state.
Introduction
Last time, animating these updates made it possible to understand intuitively how the values change. On the other hand, it was somewhat inconvenient for checking in detail how the value function and the policy get updated.
So this time, I added a feature that simply writes out the frames that were animated last time as image files.
The source code with these features added is stored in the following GitHub repository.
https://github.com/dk0893/deep-learning-from-scratch-4
Usage
$ git clone https://github.com/dk0893/deep-learning-from-scratch-4.git -b v1.1-dk0893
Cloning into 'deep-learning-from-scratch-4'...
remote: Enumerating objects: 425, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (33/33), done.
Receiving objects: 100% (425/425), 922.49 KiB | 0 bytes/s, done.
Resolving deltas: 100% (246/246), done.
Checking connectivity... done.
Note: checking out '4eca9bf48e1afbf56628107a33bffe2440df6000'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
$ cd deep-learning-from-scratch-4/
$ python ch04/policy_iter.py --ope ani_step --fpath policy_iter_step.gif
save animation: policy_iter_step.gif
$ python ch04/policy_iter.py --ope im_step
save image: images\policy_iter_step_0.png
save image: images\policy_iter_step_1.png
save image: images\policy_iter_step_2.png
save image: images\policy_iter_step_3.png
save image: images\policy_iter_step_4.png
$ python ch04/policy_iter.py --ope ani_state --fpath policy_iter_state.gif
save animation: policy_iter_state.gif
ImageStore.cnt=384
$ python ch04/policy_iter.py --ope im_state
save image: images\policy_iter_step0_phase00_state_(0, 0).png
save image: images\policy_iter_step0_phase00_state_(0, 1).png
save image: images\policy_iter_step0_phase00_state_(0, 2).png
save image: images\policy_iter_step0_phase00_state_(0, 3).png
...
save image: images\policy_iter_step4_phase00_state_(2, 0).png
save image: images\policy_iter_step4_phase00_state_(2, 1).png
save image: images\policy_iter_step4_phase00_state_(2, 2).png
save image: images\policy_iter_step4_phase00_state_(2, 3).png
ImageStore.cnt=384
This time, for reference, I have also committed a notebook that can be run on Google Colaboratory (ch04-exec.ipynb).
Design policy for this feature addition
When I added features last time, I just modified the original source code without thinking much about it. This time, I made changes that affect the original source code as little as possible.
Specifically, I implemented the feature with a minimal number of additions to the original source code. Doing it this way makes it easy to merge this feature back in whenever the original source code is updated, and it keeps the structure from getting complicated if I want to add further features later.
Changes from the original
policy_iter.py
As shown below, the change to policy_iter() itself is a single line.
--- deep-learning-from-scratch-4-org/ch04/policy_iter.py 2024-03-20 18:07:05.107000000 +0900
+++ deep-learning-from-scratch-4/ch04/policy_iter.py 2024-03-23 20:52:32.819000000 +0900
@@ -3,6 +3,7 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
 from collections import defaultdict
 from common.gridworld import GridWorld
+from common.image_store import ImageStore
 from ch04.policy_eval import policy_eval
@@ -44,7 +45,7 @@
         new_pi = greedy_policy(V, env, gamma)
 
         if is_render:
-            env.render_v(V, pi)
+            ImageStore.st_step( env, V, pi )
 
         if new_pi == pi:
             break
@@ -53,7 +54,25 @@
     return pi
 
+def parse_args():
+    import argparse
+    parser = argparse.ArgumentParser( description='policy_iter.py' )
+    parser.add_argument( '--ope', default=None, help='select output operation, [None or im_step or im_state or ani_step or ani_state]' )
+    parser.add_argument( '--dpath', default="images", help='input save image directory path' )
+    parser.add_argument( '--fpath', default='policy_iter.gif', help='input save animation path' )
+    return parser.parse_args()
+
 if __name__ == '__main__':
+    args = parse_args()
+    ImageStore.init( args.ope, args.dpath, args.fpath )
     env = GridWorld()
     gamma = 0.9
     pi = policy_iter(env, gamma)
+    ImageStore.output( env.renderer.fig )
policy_eval.py
This one, likewise, needs only two added lines. Note that ImageStore.st_state() is called in both branches, for the goal state and for ordinary states, so every evaluation sweep produces exactly one frame per state; the phase counting in image_store.py relies on this.
--- deep-learning-from-scratch-4-org/ch04/policy_eval.py 2024-03-20 18:07:05.100000000 +0900
+++ deep-learning-from-scratch-4/ch04/policy_eval.py 2024-03-21 23:07:58.401000000 +0900
@@ -3,12 +3,14 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
 from collections import defaultdict
 from common.gridworld import GridWorld
+from common.image_store import ImageStore
 
 
 def eval_onestep(pi, V, env, gamma=0.9):
     for state in env.states():
         if state == env.goal_state:
             V[state] = 0
+            ImageStore.st_state( env, V, pi, state )
             continue
 
         action_probs = pi[state]
@@ -18,6 +20,7 @@
             r = env.reward(state, action, next_state)
             new_V += action_prob * (r + gamma * V[next_state])
 
         V[state] = new_V
+        ImageStore.st_state( env, V, pi, state )
 
     return V
image_store.py
This is a newly added file. Because it keeps all of its state in class variables, each change to the existing code amounts to writing an import statement plus a single added line.
If we instead created an instance, as in ordinary class usage, we would have to add extra parameters to many of the existing functions so the instance could be passed around.
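To make the trade-off concrete, here is a minimal, self-contained sketch (InstanceStore, ClassStore, and the step/loop functions are made-up names for illustration, not code from the repository). With an instance, the store has to be threaded through every function on the call path; with class variables, as in image_store.py, one import and one call are enough.

# Instance-based design: the store must be passed down through every layer.
class InstanceStore:
    def __init__(self):
        self.frames = []
    def record(self, value):
        self.frames.append(value)

def inner_step(value, store):      # extra parameter needed here...
    store.record(value)

def outer_loop(values, store):     # ...and in every caller above it
    for v in values:
        inner_step(v, store)

# Class-variable design (the style image_store.py uses): shared state lives
# on the class, so existing function signatures stay untouched.
class ClassStore:
    frames = []
    def record(value):             # called via the class, like ImageStore
        ClassStore.frames.append(value)

def inner_step_cls(value):
    ClassStore.record(value)

store = InstanceStore()
outer_loop([1, 2, 3], store)
inner_step_cls(4)
print(store.frames, ClassStore.frames)  # [1, 2, 3] [4]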
import os
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import ArtistAnimation


class ImageStore:
    # All state is held in class variables so that callers only need an
    # import and a single call, without creating or passing an instance.
    ope = None             # output operation: im_step / im_state / ani_step / ani_state
    dpath = None           # directory for saved images
    fpath = None           # output path for the saved animation
    artists_step = []      # collected frames for the per-step animation
    artists_state = []     # collected frames for the per-state animation
    animation = False
    step = 0               # policy-iteration step counter
    phase = 0              # evaluation-sweep counter within the current step
    cnt = 0                # total number of per-state frames so far
    debug = False

    def init( ope=None, dpath=None, fpath=None, debug=False ):
        ImageStore.ope = ope
        ImageStore.dpath = dpath
        ImageStore.fpath = fpath
        ImageStore.debug = debug
        if ImageStore.ope == "ani_step" or ImageStore.ope == "ani_state":
            ImageStore.animation = True
        elif ImageStore.ope == "im_step" or ImageStore.ope == "im_state":
            os.makedirs( ImageStore.dpath, exist_ok=True )

    def st_step( env, V, pi ):
        # Called once per policy-iteration step from policy_iter().
        # With ope=None this reproduces the original env.render_v() behavior.
        if ImageStore.ope == "im_step" or ImageStore.ope == "ani_step" or ImageStore.ope is None:
            frame = env.render_v(V, pi, title=f"step={ImageStore.step}")
            ImageStore.artists_step.append( frame )
            if ImageStore.ope == "im_step":
                fpath = os.path.join( ImageStore.dpath, f"policy_iter_step_{ImageStore.step}.png" )
                plt.savefig( fpath )
                plt.close()
                print( f"save image: {fpath}" )
        ImageStore.step += 1
        ImageStore.phase = 0

    def st_state( env, V, pi, state ):
        # Called once per state from eval_onestep() in policy_eval.py.
        if ImageStore.ope == "im_state" or ImageStore.ope == "ani_state":
            frame = env.render_v( V, pi, title=f"step={ImageStore.step} phase={ImageStore.phase} state={state}" )
            ImageStore.artists_state.append( frame )
            if ImageStore.ope == "im_state":
                fpath = os.path.join( ImageStore.dpath, f"policy_iter_step{ImageStore.step}_phase{ImageStore.phase:02d}_state_{state}.png" )
                plt.savefig( fpath )
                plt.close()
                print( f"save image: {fpath}" )
            ImageStore.cnt += 1
            # One evaluation sweep visits every state exactly once, so a new
            # phase begins whenever cnt is a multiple of the number of states.
            if ImageStore.cnt % np.prod(env.shape) == 0:
                ImageStore.phase += 1
            if ImageStore.debug:
                # In debug mode, stop collecting per-state frames after the
                # first sweep to keep the animation short.
                if ImageStore.ope == "ani_state" and ImageStore.phase == 1:
                    ImageStore.ope = "ani_end"

    def output( fig ):
        # Called once at the end of the run to write the animation file.
        if ImageStore.ope == "ani_step" or ImageStore.ope == "ani_state" or ImageStore.ope == "ani_end":
            artists = ImageStore.artists_step if ImageStore.ope == "ani_step" else ImageStore.artists_state
            interval = 2000 if ImageStore.ope == "ani_step" else 500
            anim = ArtistAnimation( fig, artists, interval=interval )
            anim.save( ImageStore.fpath )
            print( f"save animation: {ImageStore.fpath}" )
        if ImageStore.ope == "im_state" or ImageStore.ope == "ani_state":
            print( f"ImageStore.cnt={ImageStore.cnt}" )
Conclusion
I knew the value function was being updated, but the image files produced this time showed me just how many updates were actually needed.
Going through the images of the update process over and over deepened my understanding.
That's all for this time!
Thank you for reading to the end.