Introduction
Hello, I'm Kawakami from the Platform Innovation Project at Fujitsu Laboratories. Fugaku, the new supercomputer jointly developed by RIKEN and Fujitsu, has been delivered to Port Island off the coast of Kobe, and trial operation began this fiscal year, ahead of the original schedule. It is off to a promising start: in June it became the first system in the world to take four supercomputer ranking titles at once (TOP500, HPCG, HPL-AI, Graph500). My department researches and develops technology for running deep learning (DL) workloads fast on Fugaku and on our PRIMEHPC FX1000/700 products, which carry the same CPU as Fugaku. In this post I describe how we ported oneDNN, a library that accelerates DL processing, to Fugaku, contributed the source code we developed to Intel's upstream oneDNN, and got it merged.
The software stack for deep learning
Applications that use deep learning processing (hereafter, DL processing) are usually built on a two-layer software stack consisting of a framework layer and a library layer, as shown in the figure below. A user who wants to run a DL application uses the API provided by the framework to describe the neural network and the processing to be performed on it. Based on that description, the framework calls functions in the library software, and the library carries out the actual DL computation. DL processing runs on systems of every scale, from supercomputers and clouds to PCs and smartphones, and the hardware that does the work inside the system may be a CPU or a GPU. With the stack split into two layers like this, differences between systems and hardware are absorbed by the library layer, so users can keep using the same framework. The benefit is that the way networks are defined and processing is described stays the same everywhere.
To get the most out of each system and its hardware, a separately optimized library is prepared for each target. These libraries are usually developed by the hardware vendor: Intel develops and provides the library for Intel CPUs, and NVIDIA does the same for NVIDIA GPUs. Fugaku and the FX1000/700 carry the A64FX, a CPU that implements the Armv8-A instruction set (the same one used by the CPUs in Android phones and iPhones) extended with the Scalable Vector Extension (SVE), an instruction set aimed at High Performance Computing. The A64FX is the world's first CPU to support SVE on top of Armv8-A (hereafter I refer to the two together simply as the Armv8-A instruction set), so no DL library optimized for it existed.
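To make the layering concrete, here is a minimal C++ sketch of a framework layer that talks to interchangeable library layers. Every name in it (`DlLibrary`, `X64Library`, `Armv8Library`, `Framework`) is hypothetical and invented for this illustration; it is not the API of any real framework or of oneDNN, and a ReLU stands in for the real DL primitives.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Library layer: one interface, one implementation per hardware target.
// All names are hypothetical and exist only for this illustration.
struct DlLibrary {
    virtual ~DlLibrary() = default;
    virtual std::string target() const = 0;
    // Element-wise ReLU, standing in for any DL primitive.
    virtual void relu(std::vector<float>& data) const = 0;
};

struct X64Library : DlLibrary {
    std::string target() const override { return "x64"; }
    void relu(std::vector<float>& data) const override {
        // A real library would use code tuned for x64 here.
        for (float& v : data) v = v > 0.0f ? v : 0.0f;
    }
};

struct Armv8Library : DlLibrary {
    std::string target() const override { return "Armv8-A"; }
    void relu(std::vector<float>& data) const override {
        // A real library would use code tuned for Armv8-A/SVE here.
        for (float& v : data) v = v > 0.0f ? v : 0.0f;
    }
};

// Framework layer: the user-facing API never names the hardware.
class Framework {
public:
    explicit Framework(std::unique_ptr<DlLibrary> lib) : lib_(std::move(lib)) {
        assert(lib_ != nullptr);
    }
    std::vector<float> run_relu_layer(std::vector<float> input) const {
        lib_->relu(input);  // delegate the actual computation to the library
        return input;
    }
    std::string backend() const { return lib_->target(); }

private:
    std::unique_ptr<DlLibrary> lib_;
};
```

User code written against `Framework` stays the same whether it is constructed with `std::make_unique<X64Library>()` or `std::make_unique<Armv8Library>()`; only the library object changes.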
Developing a DL library for the Arm architecture
One of my department's missions is to make DL processing run fast on Fugaku. Since no DL library existed for the Armv8-A instruction set, one had to be developed. However, even if Fujitsu built a new library on its own, users would effectively be unable to use it unless the frameworks could call it easily. We therefore decided to port oneDNN, the DL library Intel develops for the x64 instruction set, to Armv8-A. oneDNN is the de facto standard DL library for CPUs and is already supported by many frameworks. A deep learning library for Armv8-A that exposes the oneDNN API can therefore be used without touching the frameworks.
Like the frameworks, oneDNN is Open Source Software (OSS) with its source code published (https://github.com/oneapi-src/oneDNN), so anyone can obtain it and recompile it for Armv8-A. However, oneDNN contains many implementations optimized at the assembly level for x64, so simply recompiling the original source gave no performance at all. After a fair amount of hardship, we managed to complete oneDNN for Armv8-A (details follow later).
The figure below shows an example of the speedup from our Armv8-A optimization. oneDNN provides the operations used in DL processing, such as convolution, batch_normalization, eltwise, pooling, and reorder. The figure covers reorder (an operation that converts data types and rearranges data order) and compares two builds: the original oneDNN source compiled as-is for Armv8-A, and the source compiled after we applied our optimized implementation. Speeds are normalized so that the as-is build of the original oneDNN equals 1. Depending on the test pattern, the optimized build is up to about 400 times faster. (A few test patterns became slower after optimization, but this is due to measurement error and very small absolute run times, and it is not a problem when the library is used together with a framework.) For operations other than reorder as well, optimizing for Armv8-A gave speedups that are, quite literally, orders of magnitude.
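For readers unfamiliar with reorder, the following is a naive reference sketch in plain C++ of one such operation: it changes the layout from NCHW to NHWC and converts the element type from float to 32-bit integer. The function name and the choice of layouts and types are assumptions made for this example; this is the kind of straightforward loop nest a compiler sees, not oneDNN's actual code and not the optimized JIT version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive reference reorder (hypothetical helper, not oneDNN code):
// layout NCHW -> NHWC, element type float -> int32_t.
std::vector<std::int32_t> reorder_nchw_f32_to_nhwc_s32(
        const std::vector<float>& src,
        std::size_t N, std::size_t C, std::size_t H, std::size_t W) {
    assert(src.size() == N * C * H * W);
    std::vector<std::int32_t> dst(src.size());
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t c = 0; c < C; ++c)
            for (std::size_t h = 0; h < H; ++h)
                for (std::size_t w = 0; w < W; ++w) {
                    // Source index: channel is the second-slowest dimension.
                    const std::size_t s = ((n * C + c) * H + h) * W + w;
                    // Destination index: channel becomes the fastest dimension.
                    const std::size_t d = ((n * H + h) * W + w) * C + c;
                    // Type conversion (truncates toward zero).
                    dst[d] = static_cast<std::int32_t>(src[s]);
                }
    return dst;
}
```

With N=1, C=2, H=1, W=3 and input `{1, 2, 3, 10, 20, 30}`, the two channels are interleaved into `{1, 10, 2, 20, 3, 30}`. The strided accesses and per-element conversion are what an instruction-level implementation can tune heavily.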
Contributing source code to upstream oneDNN
oneDNN not only publishes its source code as OSS; it also follows an open development style in which anyone can submit a pull request (a request to have improved source code taken into oneDNN). Submitted code is reviewed, and if it is found to be free of bugs and to contribute to oneDNN's speed or functionality, it is officially merged. oneDNN began as software for CPUs with the x64 instruction set. We publish our modified oneDNN, ported to and optimized for Armv8-A, at https://github.com/fujitsu/oneDNN. But thinking of Fugaku users and of users of Armv8-A CPUs everywhere, we concluded it would be better if upstream oneDNN, the de facto DL library, shipped with a highly tuned implementation from the start. We therefore decided to collaborate with Intel and actively submit our work to upstream oneDNN as pull requests.
Optimizing oneDNN for Armv8-A is a very large change; among other things, it brings in Xbyak_aarch64, an essential piece of technology described later. For a small improvement you simply fix the source and submit a pull request. But for large changes, or for changes to the API that frameworks depend on, oneDNN requires that you first write up the change and its rationale in a document called a Request For Comments (RFC). The rule is that the RFC itself is submitted as a pull request, reviewed, and merged. So, for the sake of Fugaku users and Armv8-A users everywhere, I worked hard on the RFC. I wrote it more earnestly than I write my usual papers. I did my best. I finished it. I submitted the RFC pull request. Before long, questions of all kinds about the RFC arrived from an Intel developer. I stayed up late writing careful answers. Then I slept. I woke up. The next questions had already been posted. The Intel developer lives in the United States, so thanks to the time difference, by the time I had written, slept, and woken up, the next questions were waiting. I answered right away. I slept. I woke up. This time the Intel developer had summoned a developer from Arm because the topic was relevant to them. It was now two against one. The Arm developer lives in the UK. A time-zone pincer attack from Arm and Intel. It was a tough fight. But I managed to answer every question, and in the end my RFC was accepted. (In reality it was a purely technical exchange, and everyone involved was kind and courteous throughout. Getting the next question right after answering actually made the discussion go smoothly and shortened the time to merge the RFC.)
Once the RFC is merged, you submit pull requests for source code modified in line with it. Here too I fended off the time-zone attack from Arm and Intel and finally got the source code merged.
When we got source code optimized for Armv8-A merged into an OSS project whose development Intel leads, there was a stir inside our company: "Wait, how did you manage that?" Naturally so, since from Intel's point of view it means helping a competitor. The first reason, of course, is that Intel keeps the door open to pull requests for other CPUs. I would like to take this opportunity to thank Intel. The second reason, I believe, is that our source code was recognized as technically solid and as a contribution to the development of DL libraries. I am a little proud of that.
Developing Xbyak_aarch64
From here on I go a little deeper into the technical side of porting and optimizing oneDNN for Armv8-A.
One of the key technologies in Intel's oneDNN is that it embeds a JIT assembler called Xbyak (see the figure below).
Xbyak is software developed by Mitsunari-san of Cybozu Labs and published as OSS (https://github.com/herumi/xbyak). It has the following two features.
1. Assembly programs can be written in C++.
2. Executable code is generated at run time.
Looking at feature 1 alone, this may seem no different from specifying instructions through inline assembly or intrinsic functions. With Xbyak, however, you can write an entire subroutine purely at the assembly level, not just the main loop but its tail handling as well, and build exactly the instruction sequence you intend without the compiler inserting instructions you did not ask for. Feature 2 is also very powerful. Code can be generated differently depending on information about the platform at run time (number of CPU cores, cache size, supported instruction sets) and on parameters that are only known at run time. For example, you can generate code with loop splitting chosen for the core count and cache size, or, when run-time parameters guarantee that a conditional branch is never taken, generate code with that branch removed. This makes highly optimized code possible. Mitsunari-san has published introductory slides on Xbyak (https://www.slideshare.net/herumi/xbyak), and Xbyak ships with a variety of sample code, so please refer to those if you are interested.
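The idea of dropping work that run-time parameters make unnecessary can be sketched without a real assembler. The toy below is an assumption-laden illustration, not Xbyak: instead of machine code it "generates" a short list of hypothetical micro-operations (`Op`), deciding once, from the parameters, which steps the kernel needs at all.

```cpp
#include <cassert>
#include <vector>

// Toy "instruction" set (hypothetical). A real JIT assembler emits machine code.
enum class Op { MulScale, AddBias, Relu };

struct KernelParams {
    float scale;
    float bias;
    bool with_relu;
};

// "Code generation": inspect the run-time parameters once and emit only
// the steps that can have an effect.
std::vector<Op> generate_kernel(const KernelParams& p) {
    std::vector<Op> code;
    if (p.scale != 1.0f) code.push_back(Op::MulScale);  // x * 1 is a no-op
    if (p.bias != 0.0f) code.push_back(Op::AddBias);    // x + 0 is a no-op
    if (p.with_relu) code.push_back(Op::Relu);
    return code;
}

// "Execution": the per-element loop runs only the generated steps and never
// re-checks scale, bias, or with_relu. This interpreter still dispatches on
// each Op; a real JIT would avoid even that by emitting straight-line code.
void run_kernel(const std::vector<Op>& code, const KernelParams& p,
                std::vector<float>& data) {
    for (float& v : data) {
        for (Op op : code) {
            switch (op) {
                case Op::MulScale: v *= p.scale; break;
                case Op::AddBias:  v += p.bias;  break;
                case Op::Relu:     if (v < 0.0f) v = 0.0f; break;
            }
        }
    }
}
```

For `KernelParams{2.0f, 0.0f, true}` the generated kernel contains only the multiply and the ReLU; the bias addition is never emitted, which mirrors how a JIT can leave out branches and instructions that the parameters rule out.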
Xbyak, however, generates executable code for the x64 instruction set, so it cannot be used for the A64FX, which executes Armv8-A instructions. To port oneDNN to Armv8-A, we had to write new software that provides the equivalent of Xbyak for Armv8-A. Counting operand variations, the Armv8-A instruction set has more than 4,000 kinds of instructions. In other words, we had to implement and verify functions that generate more than 4,000 machine-code encodings. (For x64, incidentally, the number easily exceeds 10,000. It is enough to make you hesitate to count.)
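To give a feel for what "a function that generates machine code" means, here is a small sketch of encoders for three A64 instructions. The function names are made up for this example and are not Xbyak_aarch64's API; the bit layouts follow the A64 encodings for ADD/SUB (shifted register, 64-bit, LSL #0) and RET.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoder helpers (not the Xbyak_aarch64 API).
// Register numbers 0..30 name X0..X30; 31 is excluded here because it
// means XZR in this instruction form, which this sketch does not cover.

// ADD Xd, Xn, Xm  (64-bit, shifted register, LSL #0)
std::uint32_t encode_add_x(unsigned rd, unsigned rn, unsigned rm) {
    assert(rd < 31 && rn < 31 && rm < 31);
    const std::uint32_t base = 0x8B000000u;  // sf=1, op=0, S=0, shift=LSL, imm6=0
    return base | (rm << 16) | (rn << 5) | rd;
}

// SUB Xd, Xn, Xm  (64-bit, shifted register, LSL #0)
std::uint32_t encode_sub_x(unsigned rd, unsigned rn, unsigned rm) {
    assert(rd < 31 && rn < 31 && rm < 31);
    const std::uint32_t base = 0xCB000000u;  // same layout as ADD with op=1
    return base | (rm << 16) | (rn << 5) | rd;
}

// RET  (return to the address held in X30)
std::uint32_t encode_ret() {
    return 0xD65F03C0u;
}
```

`encode_add_x(0, 1, 2)` yields `0x8B020020`, the machine word for `add x0, x1, x2`. Each operand form (immediates, shifts, 32-bit registers, vector and SVE registers, and so on) needs its own encoder and its own verification, which is where the count of several thousand comes from.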
It was a very sizable development effort, but we had the good fortune of receiving technical advice from Mitsunari-san, the author of Xbyak. (Please also have a look at the blog post he wrote (https://blog.cybozu.io/entry/xbyak_for_fugaku) and the interview article published by Gijutsu-Hyoron (https://gihyo.jp/news/interview/2020/12/1801).) We named the resulting Xbyak for Armv8-A "Xbyak_aarch64" and publish it at https://github.com/fujitsu/Xbyak_aarch64. As mentioned earlier, the Android phone or iPhone you own has a CPU that implements the Armv8-A instruction set (though without SVE). That means it can run Armv8-A code generated with Xbyak_aarch64. Perhaps one day software built with Xbyak_aarch64 will be running on your phone without you knowing it.
With Xbyak_aarch64 complete, we were ready to port oneDNN to Armv8-A in earnest. In oneDNN, the operations used in DL processing, such as convolution, batch_normalization, eltwise, pooling, and reorder, are implemented with Xbyak. As a first step we ported reorder, the simplest of them, to Armv8-A using Xbyak_aarch64. We completed the implementation and functional verification while learning how to implement and debug with a JIT assembler. Muttering to myself, "Let's see what this Xbyak_aarch64 from Fujitsu Laboratories can really do," I compared the speed of the Xbyak_aarch64 implementation against an implementation that simply writes the original oneDNN algorithm in plain C++. The graph in "Developing a DL library for the Arm architecture" shows the results measured at that time. I had built it myself, and I was still surprised. I never expected a speedup of up to about 400 times, more than two orders of magnitude.
Developing Xbyak_translator_aarch64
With Xbyak_aarch64 complete, porting oneDNN to Armv8-A became possible in principle. But there was one problem: we had nowhere near enough person-hours for the port. "Rewriting with Xbyak_aarch64" sounds simple, but in practice it means the following work (see the figure below).
1. Find the functions implemented with Xbyak in the oneDNN source, and check in the Intel CPU reference manual, one by one, what each Intel CPU instruction they generate actually does.
2. Read the parts of the oneDNN source implemented with Xbyak and understand what the generated code does as a whole.
3. To generate the code understood in step 2 for Armv8-A, check the Armv8-A reference manual to decide which Armv8-A instructions to use, and write the code with the functions Xbyak_aarch64 provides.
This was quite hard work. It was like translating one foreign language you do not understand into another foreign language you do not know. Honda-san, who happened to sit near me and is good at optimization, agreed to help, but oneDNN provides a great many operations for DL, and however capable the researcher, it was too much for one person to finish. And these days it is not easy to gather plenty of people who can code while understanding an assembly-level implementation. We were stuck. So we decided to develop a translation layer for the JIT assembler, Xbyak_translator_aarch64 (which also had a playful internal code name styled as a "complementation plan"), that generates Armv8-A code from source written with Xbyak with (almost) no rewriting.
The figure below shows how Xbyak_translator_aarch64 (hereafter, the Translator) works. The Translator generates Armv8-A machine code in the following steps.
1. Generate x64 machine code using Xbyak.
2. Decode the machine code generated in step 1 and obtain the instruction type (add/sub/mov/vpaddd/vpsubd/vpmovusdw, etc.) and the operand information. For register operands this is the register kind (32/64-bit general-purpose register, xmm/ymm/zmm register) and register number; for memory operands it is the addressing mode, address registers, displacement, and so on.
3. Based on the information from step 2, convert to the corresponding Armv8-A instruction sequence (one x64 instruction becomes one or more Armv8-A instructions), and call the Xbyak_aarch64 functions for that sequence to generate the Armv8-A machine code.
4. Repeat steps 1 to 3 for every x64 machine instruction generated with Xbyak.
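The x64 decoding in step 2 required no development on our side because we used Intel XED, a library Intel publishes as OSS. The main development work in the Translator therefore comes down to step 3: defining the correspondence between x64 instructions and Armv8-A instruction sequences.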
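The mapping in step 3 can be sketched as follows. This is a deliberately tiny illustration under stated assumptions, not the Translator's implementation: the `DecodedX64` struct stands in for what a decoder such as Intel XED would report, the output is assembly text rather than calls into Xbyak_aarch64, the register assignment (x64 register i lives in Xi, with X16 as scratch) is invented for the example, and x64 flag updates are ignored.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class X64Op { Add, Sub };

// Hypothetical decoded form of a two-operand x64 instruction "op dst, src".
struct DecodedX64 {
    X64Op op;
    unsigned dst_reg;    // x64 register number (0 = rax, 1 = rcx, ...)
    bool src_is_memory;  // true: src is [src_reg + disp]; false: src is src_reg
    unsigned src_reg;
    int disp;            // displacement, used only for memory operands
};

// Assumed register assignment for this sketch: x64 register i is kept in Xi.
std::string xreg(unsigned i) { return "x" + std::to_string(i); }

// Translate one decoded x64 instruction into one or more A64 instructions.
std::vector<std::string> translate(const DecodedX64& insn) {
    assert(insn.dst_reg < 16 && insn.src_reg < 16);
    // Keep the sketch within the range of LDR (immediate, unsigned offset).
    assert(!insn.src_is_memory ||
           (insn.disp >= 0 && insn.disp % 8 == 0 && insn.disp <= 32760));

    const std::string mnemonic = insn.op == X64Op::Add ? "add" : "sub";
    const std::string dst = xreg(insn.dst_reg);
    std::string src = xreg(insn.src_reg);
    std::vector<std::string> out;

    if (insn.src_is_memory) {
        // A64 arithmetic cannot take a memory operand: load into scratch first.
        out.push_back("ldr x16, [" + xreg(insn.src_reg) + ", #" +
                      std::to_string(insn.disp) + "]");
        src = "x16";
    }
    // x64 "op dst, src" overwrites dst; A64 uses an explicit three-operand form.
    out.push_back(mnemonic + " " + dst + ", " + dst + ", " + src);
    return out;
}
```

A register-only `add rax, rcx` becomes the single instruction `add x0, x0, x1`, while `add rax, [rcx+8]` becomes two: `ldr x16, [x1, #8]` followed by `add x0, x0, x16`. That one-to-many expansion is why the correspondence table is the bulk of the work.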
In fact, this was not the approach we took when we first tried to implement Xbyak_translator_aarch64. We initially skipped steps 1 and 2: we kept Xbyak's interface (function arguments and return values) as it was and tried to modify the internals so that they called Xbyak_aarch64 functions directly. That attempt failed. The x64 instruction set has a long history of extensions, and its instruction encoding is extremely complex. Understanding what information is encoded where is not something you can do overnight. As with Xbyak_aarch64 itself, the frightening thing in this kind of software is introducing a bug that makes an instruction behave incorrectly. If even one x64 instruction is converted incorrectly to Armv8-A, the program will not run correctly. And it is easy to predict that such a mistake would be very hard to debug afterwards. A single subroutine that oneDNN generates with Xbyak can consist of more than 10,000 instructions. The program ran to completion. But the result is wrong. Apparently there is a conversion mistake somewhere among those 10,000 instructions. How would you find it?
So we changed our thinking. We decided to take the easy route: "Fine, let's just generate the x64 machine code first, disassemble it, and pull out the information for each x64 instruction in an easy-to-use form." That led us to the scheme in the figure above. We were also lucky to hear early on from Mitsunari-san that, if we took that approach, we could use Intel XED, a library from Intel, for the disassembly. Thanks to that, we were able to complete Xbyak_translator_aarch64.
Some readers may be wondering whether the overhead of translation is acceptable. Good question. Because the process generates x64 machine code with Xbyak, decodes it, and then generates the corresponding Armv8-A machine code, producing the Armv8-A code does take a certain amount of time. Even so, it takes less than one second. DL workloads, for example training a neural network for image classification, repeat their computation over millions of training images. Even on a supercomputer this takes on the order of hours or days. Compared with that, spending one second on code generation is entirely negligible. We chose this development approach with these computational characteristics of DL in mind.
With the Translator, oneDNN source code written for x64 can be reused for Armv8-A without rewriting. The Translator absorbs all the differences between the x64 and Armv8-A instruction sets. oneDNN is still under active development, and new DL operations and further optimizations are added for x64 using Xbyak every day. These additions and optimizations can now be delivered for Armv8-A in a timely manner. As a trial, we used the Translator to port part of the Xbyak-based implementation in oneDNN v1.6 to Armv8-A. In about two weeks we had it running on Armv8-A at a sufficiently high speed [1]. oneDNN is software that Intel has spent more than four years optimizing for x64. We were surprised at how quickly it could be ported.
The Translator's source code is also published as OSS at https://github.com/fujitsu/Xbyak_translator_aarch64. Please take a look if you are interested.
Achieving industry-leading DL processing speed on a CPU
Armed with two weapons, Xbyak_aarch64 and Xbyak_translator_aarch64, we have completed a full port of oneDNN to Armv8-A. The figure below shows measured ResNet-50 processing speed when oneDNN is combined with TensorFlow as the framework. A oneDNN built by merely compiling the original source for Armv8-A would be hundreds of times slower, as in the graph at the beginning, so the comparison here is between using a general-purpose numerical library in place of oneDNN and using the oneDNN we optimized for Armv8-A. With the Armv8-A-optimized oneDNN, training is 9.2 times faster and inference is 7.8 times faster.
Xbyak_aarch64 made it possible to generate code tuned to the limit at the instruction level, and Xbyak_translator_aarch64 cut development effort substantially by reusing the x64 source code. Together they let us port and optimize a high-performance oneDNN for Armv8-A in a short development period.
Current status of the oneDNN port to Armv8-A
The source code of oneDNN with Xbyak_aarch64 and Xbyak_translator_aarch64 integrated and the Armv8-A port completed is published at https://github.com/fujitsu/oneDNN. As mentioned at the beginning of this post, Xbyak_aarch64 and the JIT code generation for reorder were submitted to upstream oneDNN as pull requests and officially merged. We plan to submit pull requests for the other operations one by one.
Xbyak_aarch64, Xbyak_translator_aarch64, and oneDNN for Armv8-A are all developed in a transparent style, with source code published under https://github.com/fujitsu. Opinions, comments, and source code pull requests are very welcome. For email, please write to arm_dl_oss[at mark]ml.labs.fujitsu.com. Posting issues on GitHub is of course welcome too.
Summary
We ported and optimized oneDNN, a DL library developed as OSS, for the Armv8-A instruction set so that it runs fast on the supercomputer Fugaku. This post also introduced the development of Xbyak_aarch64, the JIT assembler for Armv8-A that was the indispensable key technology for the port, and Xbyak_translator_aarch64, which accelerated the porting work. Xbyak_aarch64 and source code optimized for Armv8-A with it have been officially merged into upstream oneDNN. We plan to keep submitting our Armv8-A-optimized implementations as pull requests. We will continue our research and development, dreaming of the day when software we developed runs on the smartphone in your hand.
About the authors
Kentaro Kawakami (photo: right)
Joined Fujitsu Laboratories in 2007. Has worked on research and development of image codec LSIs and sensor nodes, among other things. Since 2019, has been developing deep learning software for Arm HPC environments. GitHub account: kawakami-k.
Koji Kurihara (photo: left)
Joined Fujitsu Laboratories in 2009. Has worked on software technology for embedded multicore processors, HEVC image codec circuits, wireless sensor network control systems, and radio interference visualization technology. Since 2019, has been developing deep learning software for Arm HPC environments. Loves futsal enough to have injured his right knee ligament. GitHub account: kurihara-kk.
Naoto Fukumoto (photo: center)
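Has worked on research and development of program acceleration at Fujitsu Laboratories since 2012. Has accelerated HPC applications, LINPACK, matrix multiplication, and more for the latest hardware. Since 2019, has been leading development of the deep learning software stack for Arm as a manager.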
[1] About 70% of the performance achieved on the latest Intel CPUs. To draw out the full performance of the A64FX in the end, it is better to assemble optimal JIT code for the bottleneck parts directly with Xbyak_aarch64, without going through the Translator.