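Hello, this is Fujishiro (@tafujish). This time I would like to introduce a customer case study.
I was involved in this project as the person in charge of GPUs, and I found it so interesting that I asked the customer whether I could write about it. They kindly agreed.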
What makes it interesting is that it is a DeepLearning use case that has achieved a multi-cloud setup. Because I want to focus on the multi-cloud aspect, I am covering it on this blog rather than in the IDCF Cloud customer case studies (IDCFクラウドの導入事例). Based on what I heard in my interviews, I will explain why multi-cloud was chosen and what the configuration looks like.
My thanks go to Recruit Technologies for agreeing to this publication and for providing the information.
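- The original challenges
- What services are running
- The process of evaluating a new environment
- Portability for environment migration
- Benefits of the migration and the challenges it revealed
- The configuration that achieves multi-cloud
- Summary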
The original challenges
The previous environment ran on an overseas cloud service. As the number of services using DeepLearning grew, training came to require a large number of GPUs, and it became unavoidable that GPUs would account for a growing share of the infrastructure cost. That is what started the search for a GPU environment with better cost performance.
The evaluation also brought a second challenge to light: GPU environments were being deployed with the template feature provided by the cloud service, which made the environment hard to migrate.
What services are running
Recruit publishes A3RT (pronounced "art"), a set of cognitive APIs used within the Recruit Group, free of charge. It attracted attention from the day it was released, so many readers will already know it. Training for A3RT uses GPUs in large quantities.
Besides A3RT, developers in the R&D development environment can reportedly use GPUs freely as well. They also use them for training, so GPU usage keeps growing.
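The process of evaluating a new environment
As described above, the search for an environment with good cost performance involved trying out and comparing services from several providers.
The evaluation method was to run the training program already used in the existing environment, then compare cost performance based on the measured performance and elapsed time. The tests were repeated across different GPU generations and models.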
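The comparison described above comes down to simple arithmetic: the cost of finishing one training run is the hourly price multiplied by the time the run takes. The following is a minimal sketch of that calculation. The class name, the function names, and every figure in it are hypothetical placeholders for illustration, not measurements or tooling from this case study.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkRun:
    """One run of the same training program on one candidate environment."""
    name: str               # label for the environment (hypothetical)
    hourly_price: float     # price per hour in a single currency (hypothetical)
    training_hours: float   # measured wall-clock time of the training run


def cost_per_run(run: BenchmarkRun) -> float:
    """Cost of completing the training job once on this environment."""
    return run.hourly_price * run.training_hours


def rank_by_cost(runs):
    """Sort candidates from cheapest to most expensive per completed job."""
    return sorted(runs, key=cost_per_run)


if __name__ == "__main__":
    # Made-up placeholder figures, not results from the case study.
    candidates = [
        BenchmarkRun("env-a-older-gpu", hourly_price=1.0, training_hours=10.0),
        BenchmarkRun("env-b-newer-gpu", hourly_price=2.5, training_hours=2.0),
    ]
    for run in rank_by_cost(candidates):
        print(f"{run.name}: {cost_per_run(run):.2f} per training run")
```

With these placeholder numbers, the environment with the higher hourly price still comes out cheaper per run because it finishes sooner, which is why price and training time have to be compared together.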
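The finding was that GPU performance depends on the generation and model, and that there is no performance difference between providers for the same GPU. As a result, the candidates were compared and selected with cost performance as the top priority, and requirements such as security, including those of the surrounding cloud environment, were checked afterwards.
The conclusion was a configuration in which the GPU environment uses IDCF Cloud Bare Metal Server, while networking and front-end servers use IDCF Cloud Computing.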
Portability for environment migration
For environment migration, the other challenge alongside cost performance, the decision was to achieve portability by using Docker.
This evaluation focused on cost performance, but the best cost performance has to be pursued continuously from here on as well. Depending on the use case at the time, there will also be situations where the latest GPU generation or a specific model is required. That means planning for the use of several different environments in the future, in other words a multi-cloud environment. Relying on a cloud service's template feature, as the previous environment did, locks the setup into that one cloud service.
Docker was therefore chosen because it runs in any environment, makes migration easy, and also suits development environments. With nvidia-docker, GPUs can be used from within Docker.
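Benefits of the migration and the challenges it revealed
The migration delivered the benefits that were originally planned.
- Cloud usage fees dropped somewhat, while performance improved 5 to 6 times
The environment was initially intended for training. Training needs a large number of GPUs, and it is easy to carve out as a separate environment, which makes it easy to migrate. However, the environment is now also being used for inference (production service) workloads. Bare-metal servers are billed at a flat rate, which turned out to be a good fit for inference workloads that run around the clock.
- Docker delivers portability and makes multi-cloud possible
An unexpected benefit of Docker was that CUDA (the set of GPU drivers and related tools) versions became easier to handle. Developers are free to use whichever DeepLearning framework they like, so the required CUDA version often differs from one developer to the next. Even then, each developer can freely choose a framework and CUDA version and build a Docker image with that combination.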
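To make the idea of one image per framework and CUDA combination concrete, here is a minimal sketch that renders a Dockerfile from a chosen base image and a list of Python packages. The function name, the base image tags, and the package pins are assumptions for illustration only. The source does not say which images or frameworks were used, the tags should be checked against the registry before use, and the sketch assumes a Debian or Ubuntu based image so that `apt-get` is available.

```python
def render_dockerfile(base_image: str, pip_packages: list) -> str:
    """Return Dockerfile text for one framework/CUDA combination.

    base_image is expected to be a CUDA-enabled, apt-based image
    (an assumption of this sketch).
    """
    lines = [
        f"FROM {base_image}",
        "RUN apt-get update && apt-get install -y python3-pip "
        "&& rm -rf /var/lib/apt/lists/*",
    ]
    if pip_packages:
        lines.append("RUN pip3 install " + " ".join(pip_packages))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical developer profiles; tags and versions are illustrative only.
    profiles = {
        "dev-a": ("nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04",
                  ["tensorflow-gpu==1.12.0"]),
        "dev-b": ("nvidia/cuda:8.0-cudnn6-runtime-ubuntu16.04",
                  ["tensorflow-gpu==1.4.0"]),
    }
    for developer, (image, packages) in profiles.items():
        print(f"# Dockerfile for {developer}")
        print(render_dockerfile(image, packages))
```

Because the CUDA version lives in the base image rather than on the host, two developers with conflicting requirements can run side by side on the same server.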
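On the other hand, some challenges also became clear.
Moving from a template-based setup to Docker meant that the adjustments and hands-on work were harder than expected. This was handled by writing procedures and holding internal hands-on sessions.
The other challenge was moving the training data from the previous environment to the new IDCF environment. The initial concern was that transferring the data would take too long. The multi-cloud configuration that solved this problem is introduced next.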
The configuration that achieves multi-cloud
The configuration is explained below in order, following its numbered components.
Image management
① Building Docker images locally
Each developer builds Docker images on their own machine. The frameworks and other parts of the environment differ from developer to developer, so it is more efficient for each person to build in their own environment.
② Uploading Docker images (Push)
③ Deploying Docker images (Pull)
GCP's Container Registry was adopted as the repository for managing Docker images because it is private and easy to use. A dedicated Docker repository can be set up with little effort, and it can be accessed from environments other than GCP, which makes it well suited to a multi-cloud environment.
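Steps ① to ③ can be sketched as the Docker commands a developer and a Docker host would run. The sketch below only builds the argument lists and prints them. The project ID, image name, and tag are hypothetical, the `gcr.io/<project>/<image>:<tag>` form is the standard Container Registry naming, and authentication against the registry is assumed to be configured separately.

```python
def gcr_image_ref(project_id: str, image: str, tag: str,
                  host: str = "gcr.io") -> str:
    """Build a Container Registry image reference."""
    return f"{host}/{project_id}/{image}:{tag}"


def push_commands(local_image: str, remote_ref: str) -> list:
    """Commands run on the developer's machine (step 2, Push)."""
    return [
        ["docker", "tag", local_image, remote_ref],
        ["docker", "push", remote_ref],
    ]


def pull_command(remote_ref: str) -> list:
    """Command run on the Docker host (step 3, Pull)."""
    return ["docker", "pull", remote_ref]


if __name__ == "__main__":
    # "my-project" and "trainer" are placeholder names for illustration.
    ref = gcr_image_ref("my-project", "trainer", "v1")
    for cmd in push_commands("trainer:latest", ref):
        print(" ".join(cmd))
    print(" ".join(pull_command(ref)))
```

The same pull command works from any host that can reach the registry, which is the property that makes a single registry usable across clouds.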
Docker host
④ IDCF Cloud Bare Metal Server, 2 GPUs
The Docker host is an IDCF Cloud Bare Metal Server. This server carries two NVIDIA Tesla P100 cards, so it supports both usage patterns: a single job using multiple GPUs, or several containers sharing the GPUs.
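The two usage patterns on a two-GPU host can be illustrated with a small helper that decides which GPU indices each container should see. This is only a sketch under assumptions: the function names are hypothetical, and the source does not describe how GPUs were actually assigned. The comma-separated index strings it produces follow the format accepted by the `NVIDIA_VISIBLE_DEVICES` variable of the NVIDIA container runtime.

```python
def share_gpus(container_names, gpu_count: int = 2) -> dict:
    """Spread single-GPU containers across the GPUs in round-robin order.

    Returns a mapping of container name to a GPU index string.
    """
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return {name: str(i % gpu_count) for i, name in enumerate(container_names)}


def all_gpus(gpu_count: int = 2) -> str:
    """Device string for one multi-GPU job that uses every GPU on the host."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    return ",".join(str(i) for i in range(gpu_count))


if __name__ == "__main__":
    # Container names are placeholders.
    print(share_gpus(["dev-a", "dev-b", "dev-c"]))
    print(all_gpus())
```

With three containers on two GPUs, the first and third container end up on the same device, which corresponds to the shared pattern described above.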
A bare-metal server, which lets the workload use the full performance of the hardware, is also a good match for a Docker-based container environment. Running containers directly on bare metal avoids the virtualization overhead incurred when containers run inside a cloud virtual machine.
⑤ Scale-out
IDCF Cloud Bare Metal Server is a service in which a fixed number of servers is used at a flat rate, so servers cannot be added on the spot the way they can in a cloud. However, the bare-metal servers and IDCF Cloud virtual machines are connected to the same network segment, so creating virtual machines of a GPU-equipped machine type makes it possible to scale out in response to sudden compute demand. Because Docker provides portability, the experience for users stays the same.
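The scale-out rule described above, a fixed bare-metal pool with overflow onto cloud GPU virtual machines, can be expressed as a short calculation. The function below is a hypothetical sketch. The number of GPUs per cloud virtual machine is a parameter because the source does not state it.

```python
import math


def extra_cloud_vms(requested_gpus: int, bare_metal_gpus: int,
                    gpus_per_cloud_vm: int = 1) -> int:
    """Number of cloud GPU VMs needed once the bare-metal pool is full."""
    if gpus_per_cloud_vm < 1:
        raise ValueError("gpus_per_cloud_vm must be at least 1")
    # Demand beyond the fixed bare-metal capacity; never negative.
    overflow = max(0, requested_gpus - bare_metal_gpus)
    return math.ceil(overflow / gpus_per_cloud_vm)


if __name__ == "__main__":
    # Placeholder demand figures against a two-GPU bare-metal host.
    for demand in (1, 2, 5):
        print(demand, "GPUs requested ->", extra_cloud_vms(demand, 2), "cloud VMs")
```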
⑥ nvidia-docker
As mentioned earlier, installing nvidia-docker makes it possible to work with GPUs from inside containers.
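As an illustration of what starting a GPU container looks like, the sketch below assembles a `docker run` command line. It assumes the nvidia-docker2 style, where the NVIDIA runtime is selected with `--runtime=nvidia` and the visible GPUs are set through `NVIDIA_VISIBLE_DEVICES`. The source does not say which nvidia-docker version was used, and the image reference is a placeholder.

```python
def gpu_run_command(image: str, command: list,
                    visible_devices: str = "all") -> list:
    """Build a docker run command that exposes GPUs to the container.

    Assumes nvidia-docker2 is installed on the host (an assumption of
    this sketch, not a detail from the case study).
    """
    return [
        "docker", "run", "--rm",
        "--runtime=nvidia",
        "-e", f"NVIDIA_VISIBLE_DEVICES={visible_devices}",
        image,
        *command,
    ]


if __name__ == "__main__":
    # Placeholder image; nvidia-smi lists the GPUs visible in the container.
    cmd = gpu_run_command("gcr.io/my-project/trainer:v1", ["nvidia-smi"], "0")
    print(" ".join(cmd))
```

Running `nvidia-smi` inside the container is a quick way to confirm that only the intended GPU is visible.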
Storage
⑦ Mounting S3 through Storage Gateway
The data transfer time, the challenge identified during the migration, was solved with AWS Storage Gateway. Log data from the services, which is used for training, is accumulated in AWS S3. Rather than transferring all of the data in advance, the data is accessed through Storage Gateway.
⑧ Mounting the training data over NFS
Storage Gateway is used as file storage and mounted over NFS so that every container can use it. The first access to a file is slow because the file has to be fetched from S3, but once it is cached, later access is comfortable.
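The cold-versus-warm behaviour described above can be observed by timing two consecutive reads of the same file on the NFS mount. The helper below is a minimal sketch using only the standard library. The function name and the example path in the comment are hypothetical. The second read may also benefit from the operating system's own page cache, so the difference does not isolate the gateway cache alone.

```python
import time


def timed_read(path: str, chunk_size: int = 1024 * 1024):
    """Read a file to the end and return (bytes_read, elapsed_seconds)."""
    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


# Example usage with a placeholder path on the NFS mount:
#   size, cold = timed_read("/mnt/training-data/sample.log")
#   size, warm = timed_read("/mnt/training-data/sample.log")
#   print(f"{size} bytes: first read {cold:.2f}s, second read {warm:.2f}s")
```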
Job management
At present, developers mostly start Docker containers themselves. To make effective use of GPU resources and to take multi-cloud further, an in-house job management system designed with multi-cloud support as a premise is being introduced.
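Summary
This post introduced Recruit Technologies' multi-cloud configuration, which is based on the IDCF Cloud environment. The challenges of cost performance and portability in using GPUs were solved by adopting Docker and going multi-cloud.
Recruit Technologies commented that they look forward to technically oriented service development from IDCF Cloud, such as the introduction of the latest hardware models. We intend to meet that expectation by continuing to provide and develop infrastructure that makes the most of GPUs and Docker.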