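We implemented SPIRAL [1], a reinforcement learning system published by DeepMind that generates a drawing procedure from an image, using ChainerRL. The implementation is available in a GitHub repository.
SPIRAL is a system for generating a way of drawing a picture from an image of it. The main approach to image generation is the Deep Convolutional Generative Adversarial Network (DCGAN [2]), but whereas DCGAN generates raster images, SPIRAL generates a time series of drawing actions, that is, an image in vector form. Unlike a raster image, a vector image is easy to edit, for example by removing a particular stroke. Sketch-rnn [3] is another model that generates vector images, but training it requires a large amount of drawing-process data, that is, data already in vector form. SPIRAL, in contrast, is trained on raster images alone.
SPIRAL draws on a canvas for a fixed number of steps and searches for a way of drawing that maximizes the similarity between the drawn image and a reference image (the training data). The search is carried out by reinforcement learning, and the reward is the output of a discriminator that estimates the distance between the drawn picture and the reference images. Because the discriminator is trained adversarially alongside the agent's exploration, the reward is given dynamically, and training converges more efficiently than with a squared-error reward.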
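The difference between a fixed squared-error reward and a discriminator-based reward can be sketched roughly as follows. This is only an illustration, not code from the repository: canvases are assumed to be flat lists of pixel intensities in [0, 1], and `toy_discriminator` is a hypothetical stand-in for the learned network.

```python
from typing import Callable, Sequence

Image = Sequence[float]


def l2_reward(canvas: Image, target: Image) -> float:
    """Fixed reward: negative mean squared error against one reference image."""
    if len(canvas) != len(target):
        raise ValueError("canvas and target must have the same size")
    return -sum((c - t) ** 2 for c, t in zip(canvas, target)) / len(canvas)


def discriminator_reward(canvas: Image,
                         discriminator: Callable[[Image], float]) -> float:
    """Learned reward: the score the discriminator assigns to the drawn canvas."""
    return discriminator(canvas)


def toy_discriminator(canvas: Image) -> float:
    # Hypothetical stand-in. A real discriminator is a neural network that is
    # retrained against the agent's drawings, so its score changes over time.
    return sum(canvas) / len(canvas)


canvas = [0.0, 1.0, 1.0, 0.0]
target = [0.0, 1.0, 0.0, 0.0]
print(l2_reward(canvas, target))                        # squared-error reward
print(discriminator_reward(canvas, toy_discriminator))  # discriminator reward
```

The squared-error reward needs a specific target image, while the discriminator reward only needs the canvas, which is what lets the reward follow the whole dataset rather than a single reference.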
Generated images are obtained by letting the trained generative model alone actually draw. The demo below plays back randomly chosen drawing processes that were generated this way.
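Unlike DCGAN, the rendering process that turns the drawing actions estimated by the deep learning model into an image does not need to be differentiable. The method can therefore be expected to apply to a variety of tasks in which a static image is produced from a time series of commands. For example, the original paper also experiments with a task of placing objects in a simulation environment so that the scene matches a given image. Here we implemented the line-drawing task with painting software that is evaluated in the paper.
You can play back the drawings of the SPIRAL agents trained with our reimplementation.
Clicking a dataset-name button starts the drawing. The pink circles plot the history of pen positions estimated by the model.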
ChainerRL is a library that implements a variety of reinforcement learning algorithms on top of the machine learning library Chainer. The paper uses IMPALA [5], an asynchronous training pipeline that makes use of a replay buffer, but IMPALA had not been implemented in ChainerRL. We therefore used Asynchronous Advantage Actor-Critic (A3C) [6], which was already available. In A3C, multiple CPU processes run in parallel; each process generates drawings with the policy, computes rewards with the discriminator, and computes gradients with a policy gradient method. Each process then updates the model parameters asynchronously.
The implementation is published on GitHub, together with models trained on each of the MNIST, EMNIST [4], Quick, Draw!, and Japanese classical character shapes datasets. The demo can be run inside a Docker container. First, clone the repository:
git clone https://github.com/DwangoMediaVillage/chainer_spiral.git
Then build the Docker image:
cd chainer_spiral/docker
docker build . -t chainer_spiral
The demo is now ready to run. You can view the output of the model trained on the Quick, Draw! dataset:
docker run -t --name run_chainer_spiral_demo chainer_spiral pipenv run python demo.py movie trained_models/quickdraw/68976000 result.mp4 --without_dataset
Copying the generated video file from the container to the host lets you actually view it:
docker cp run_chainer_spiral_demo:/chainer_spiral/ChainerSPIRAL/result.mp4 .
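From left to right, the video shows the drawing in progress, the observation at the final time step (the finished picture), and a visualization of the stroke order (blue → red).
MyPaint is used as the rendering engine for the pictures, and the renderer is designed to behave on its own as an OpenAI Gym environment. ChainerRL is designed on the assumption that the environment object computes the reward, whereas SPIRAL needs a neural network model, the discriminator, to compute it. We therefore designed the A3C agent to hold the discriminator model as well and to compute the reward itself.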
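The split of responsibilities described above might look roughly like this. It is a toy sketch under stated assumptions, not the repository's code: `ToyCanvasEnv`, `AgentWithDiscriminator`, and `run_episode` are hypothetical names, the environment only mimics a Gym-style `reset`/`step` interface, and scoring only the finished canvas is an assumption made for the example.

```python
from typing import Callable, List, Tuple

Canvas = List[float]


class ToyCanvasEnv:
    """Hypothetical stand-in for the renderer environment: it draws, never scores."""

    def __init__(self, size: int = 4, max_steps: int = 3) -> None:
        self.size = size
        self.max_steps = max_steps
        self.canvas: Canvas = [0.0] * size
        self.t = 0

    def reset(self) -> Canvas:
        self.canvas = [0.0] * self.size
        self.t = 0
        return list(self.canvas)

    def step(self, action: int) -> Tuple[Canvas, float, bool, dict]:
        self.canvas[action] = 1.0            # "ink" one pixel
        self.t += 1
        done = self.t >= self.max_steps
        return list(self.canvas), 0.0, done, {}   # environment reward is always 0


class AgentWithDiscriminator:
    """The agent owns both the policy and the discriminator used for rewards."""

    def __init__(self, policy: Callable[[Canvas], int],
                 discriminator: Callable[[Canvas], float]) -> None:
        self.policy = policy
        self.discriminator = discriminator

    def act(self, obs: Canvas) -> int:
        return self.policy(obs)

    def compute_reward(self, obs: Canvas, done: bool) -> float:
        # Assumption for this sketch: only the finished canvas is scored.
        return self.discriminator(obs) if done else 0.0


def run_episode(env: ToyCanvasEnv, agent: AgentWithDiscriminator) -> List[float]:
    obs = env.reset()
    rewards: List[float] = []
    done = False
    while not done:
        obs, _env_reward, done, _info = env.step(agent.act(obs))
        rewards.append(agent.compute_reward(obs, done))  # agent-side reward
    return rewards


def first_blank(obs: Canvas) -> int:       # toy policy: ink the first blank pixel
    return obs.index(0.0)


def mean_ink(obs: Canvas) -> float:        # toy discriminator: fraction of ink
    return sum(obs) / len(obs)


rewards = run_episode(ToyCanvasEnv(), AgentWithDiscriminator(first_blank, mean_ink))
print(rewards)
```

The point of the design is that the reward the environment returns is ignored, so the environment stays a plain renderer and the discriminator can be trained and queried in the same object as the policy.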
While exploring, the agent sometimes ends an episode without drawing anything. Following the paper, we therefore introduced an auxiliary reward. The paper does not describe the concrete reward design, so here we give a negative reward for finishing without having drawn anything.
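A penalty of this kind could be expressed as the small function below. This is a sketch only: the function name, the blank-canvas test, and the penalty value of -1.0 are assumptions for illustration, not the values used in the repository.

```python
from typing import Sequence


def auxiliary_reward(canvas: Sequence[float], done: bool,
                     penalty: float = -1.0, blank: float = 0.0) -> float:
    """Return a negative reward when an episode ends on an untouched canvas."""
    if penalty > 0:
        raise ValueError("penalty is expected to be zero or negative")
    nothing_drawn = all(pixel == blank for pixel in canvas)
    return penalty if (done and nothing_drawn) else 0.0


print(auxiliary_reward([0.0, 0.0, 0.0], done=True))    # blank canvas at the end
print(auxiliary_reward([0.0, 1.0, 0.0], done=True))    # something was drawn
print(auxiliary_reward([0.0, 0.0, 0.0], done=False))   # episode still running
```

The penalty is only applied at the end of the episode, so the agent is not punished for lifting the pen or moving without drawing in the middle of a picture.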
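The key question when setting up a problem for a neural generative model trained by gradient descent is whether everything from the data-generating process to the loss function is differentiable. For regression problems that target the data directly, gradient descent can be applied as long as the loss function is differentiable. In DCGAN, for example, the image is generated directly as a vector of pixel values and the loss is computed with a differentiable discriminator, so the whole pipeline is differentiable. Such cases are limited, however. Painting software can be viewed as a function that generates an image from drawing commands, but this function is generally not differentiable. Drafting, typesetting, music production, game design, and so on: most of the generative processes we deal with are non-differentiable except under very restricted conditions.
When the process is not differentiable, the problem must be reformulated so that gradient methods can be applied. In SPIRAL, the generative process from drawing actions (vector) to the image of the picture (raster) is not differentiable, and this is where reinforcement learning comes in. By replacing the evaluation function with a function that gives a reward, a policy gradient method that directly maximizes the evaluation value can be applied to the generative model. This approach can be said to greatly widen the range of formats that generative models can handle. As a side effect, however, learning efficiency is sacrificed, because gradients from the generated data are not propagated back to the generative model.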
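To make the idea concrete, here is a toy policy gradient (REINFORCE-style) loop in which the renderer is a plain lookup table and therefore has no gradient at all. It is a sketch under simplifying assumptions, far smaller than SPIRAL: a single discrete action per episode, a softmax policy over three commands, a 0/1 reward, and no baseline. All names (`PATTERNS`, `render`, `reward_fn`, `train`) are made up for this example.

```python
import math
import random
from typing import List, Sequence, Tuple

# Non-differentiable "renderer": a lookup from a discrete command to an image.
PATTERNS: List[Tuple[int, ...]] = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]


def render(action: int) -> Tuple[int, ...]:
    return PATTERNS[action]


def reward_fn(image: Tuple[int, ...], target: Tuple[int, ...]) -> float:
    """Stands in for the evaluation function: 1 if the image matches, else 0."""
    return 1.0 if image == target else 0.0


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(target: Tuple[int, ...], steps: int = 500, lr: float = 0.5,
          seed: int = 0) -> List[float]:
    rng = random.Random(seed)
    logits = [0.0] * len(PATTERNS)
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(len(probs)), weights=probs)[0]
        reward = reward_fn(render(action), target)   # no gradient flows here
        # Score-function gradient of log pi(action) w.r.t. each logit:
        # 1[i == action] - probs[i]. It is scaled by the reward only.
        for i in range(len(logits)):
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * reward * grad_log_pi
    return softmax(logits)


final_probs = train(target=(0, 0, 1))
print(final_probs)   # probability mass should concentrate on the last command
```

The update only ever uses the sampled action and the scalar reward, which is why the renderer can be arbitrary software. It is also why learning is less efficient: the reward says how good the result was, but not in which direction to change it.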
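Another option is to approximate the generative process with a differentiable model. For drawing, methods have been proposed that use a model approximating the rendering function that produces a raster image from drawing actions. If the approximation of the renderer is accurate enough, the search can be expected to become more efficient, but in practice building a good approximate model is difficult.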
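Reinforcement learning experiments rely heavily on asynchronous processing, which makes them hard to implement so that they perform well. Because ChainerRL provides abstracted implementations of several reinforcement learning pipelines, we could leave the process parallelization to the library, which shortened development time.
Training was very hard. The reason, which applies to reinforcement learning in general, lies in the difficulty of hyperparameter search and reward design. Only a limited number of parameter combinations could be tried, and on several occasions the training result changed drastically with the parameter settings. The design of the auxiliary reward in particular still has room for improvement. Hardest of all was the training time: the Japanese classical character shapes dataset took more than 20 days (!) with 24 parallel processes. Beyond auxiliary rewards, it seems that methods for making exploration more efficient will be needed, such as sampling states from a small amount of drawing-process data.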
[1] Ganin, Y., Kulkarni, T., Babuschkin, I., Eslami, S. M. A., & Vinyals, O. (2018). Synthesizing Programs for Images using Reinforced Adversarial Learning. Proceedings of the 35th International Conference on Machine Learning. http://arxiv.org/abs/1804.01118
[2] Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Proceedings of the 4th International Conference on Learning Representations. http://arxiv.org/abs/1511.06434
[3] Ha, D., & Eck, D. (2017). A Neural Representation of Sketch Drawings. https://arxiv.org/abs/1704.03477
[4] Cohen, G., Afshar, S., Tapson, J., & van Schaik, A. (2017). EMNIST: an extension of MNIST to handwritten letters. http://arxiv.org/abs/1702.05373
[5] Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., Legg, S., & Kavukcuoglu, K. (2018). IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures. Proceedings of the 35th International Conference on Machine Learning. http://arxiv.org/abs/1802.01561
[6] Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T. P., Harley, T., … Kavukcuoglu, K. (2016). Asynchronous Methods for Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning. http://arxiv.org/abs/1602.01783