ã¯ããã«
ç¾ä»£ã®CPUã§ã¯SIMD(Single Instruction Multiple Data)å½ä»¤ãå©ç¨ãããã¨ãã§ããï¼ SIMDå½ä»¤ã¨ã¯ãã®åã®éãï¼ã²ã¨ã¤ã®å½ä»¤ã§è¤æ°ã®ãã¼ã¿ãå¦çãããã®ã§ããï¼
Intelç³»ã®CPUã§ã¯ï¼MMX/SSE/AVX/AVX-512ã¨ãã£ãSIMDå½ä»¤ãå©ç¨å¯è½ã§ããï¼ARM CPUã§ã¯NEONã¨ããSIMDå½ä»¤ãç¨æããã¦ããï¼ åSIMDã¨SIMDç¨ã®ã¬ã¸ã¹ã¿ã®å¯¾å¿é¢ä¿ã¯ä»¥ä¸ã®ããã«ãªãï¼
é ç® | å©ç¨å¯è½ã¬ã¸ã¹ã¿ |
---|---|
MMX | 64bit ã®MMã¬ã¸ã¹ã¿ |
SSE | 128bit ã®XMMã¬ã¸ã¹ã¿ |
AVX | 256bit ã®YMMã¬ã¸ã·ã¿ |
AVX-512 | 512bit ã®ZMMã¬ã¸ã·ã¿ |
ARM NEON | 64bitã®D(Double-Word)ã¬ã¸ã¹ã¿ããã³128bitã®Q(Quad-Word)ã¬ã¸ã¹ã¿ |
ãããã®ã¬ã¸ã¹ã¿ãç¨ãã¦ï¼ä¾ãã°4ã¤ã®intåãä¸æ°ã«å¦çããã¨ãã£ããã¨ãè¡ãã®ãCPUã«ãããSIMDã§ããï¼
ãã®è¨äºã§ã¯ï¼ãã®SIMDå½ä»¤ãC/C++ããå©ç¨ãããã¨ã«ã¤ãã¦è¨è¿°ããï¼
2017/02/20 追è¨
以ä¸ã®è¨äºã«ï¼ãã詳細ãªå 容ãæ¸ããã®ã§ï¼åèã«ãªããããããªãï¼
2019/02/03 追è¨
å®è¡æã«SSE/AVXçã®x86/x64ã®å½ä»¤ãå©ç¨å¯è½ã§ããããcpuidãç¨ãã¦å¤æããæ¹æ³ã«ã¤ãã¦è¿½è¨ããï¼ ã¾ãï¼ãã®è¨äºä¸ã®ã¦ã¼ãã£ãªãã£é¢æ°ãã¾ã¨ããã·ã³ã°ã«ããããã¡ã¤ã«ãkoturn/SimdUtilã«ã¦å ¬éãã¦ããï¼
SIMDãããã°ã©ã å©ç¨ããã«ã¯
SIMDå½ä»¤ã¨ããã¨å°é£ãããã§ï¼ã¤ã³ã©ã¤ã³ã¢ã»ã³ãã©ãå©ç¨ããªããã°ãªããªããã¨ããã¨ï¼ããã§ã¯ãªãï¼ C/C++ããé¢æ°ã®å½¢ã§å©ç¨ã§ããããã«ï¼åã³ã³ãã¤ã©ã§å ±éã®APIã§ããçµã¿è¾¼ã¿é¢æ°ãæä¾ããã¦ããï¼ çµã¿è¾¼ã¿é¢æ°ã¨ã¯ããé¢æ°ãªã®ã§ï¼é¢æ°å¼ã³åºãã®å½¢ã§è¨è¿°ãããã¨ã«ãªããï¼å®éã«é¢æ°å¼ã³åºããçºçããããã§ã¯ãªãï¼ã¤ã³ã©ã¤ã³å±éããï¼å¯¾å¿ããã¢ã»ã³ãã©ã®å½ä»¤ã¸ã¨ã³ã¼ãçæãããï¼
ãªãï¼SIMDã¬ã¸ã¹ã¿ã«å¯¾ãã¦ï¼ã¡ã¢ãªã®ãã¼ããã¹ãã¢ãè¡ãå ´åï¼å¾è¿°ããããã«å©ç¨å¹ ã¨åãå¢çã«é ç½®ããã¦ããä½ç½®ã«å¯¾ãã¦è¡ãå¿ è¦ãããï¼ ç¹ã«ï¼MMX/SSE/AVX/AVX-512ã®å ´åï¼ã¢ã©ã¤ã³ã¡ã³ãæ¡ä»¶ãæºãããªããã°ï¼SEGVã§è½ã¡ãé¢æ°ãããï¼ è½ã¡ãªãçã®é¢æ°ããããï¼ãããã£ãé¢æ°ã¯è½ã¡ãé¢æ°ããåä½ã¨ãã¦ã¯é ãï¼
ARM NEONã¯è½ã¡ãé¢æ°ã¯ç¡ããï¼ã¢ã©ã¤ã³ã¡ã³ãæ¡ä»¶ãæºããã¦ãããæ¹ãé«éã«åä½ããã¨æãããï¼
ã¤ã³ã¯ã«ã¼ã
ä½ã¯ã¨ãããï¼ã¾ãçµã¿è¾¼ã¿é¢æ°ã宣è¨ããã¦ããããããã¤ã³ã¯ã«ã¼ãããªããã°å§ã¾ããªãï¼ åSIMDå½ä»¤ã»ããã¨ãããã®å¯¾å¿é¢ä¿ã¯ä»¥ä¸ã®ããã«ãªãï¼
å½ä»¤ã»ãã | ããããã¡ã¤ã« |
---|---|
MMX | <mmintrin.h> |
SSE | <xmmintrin.h> |
SSE2 | <emmintrin.h> |
SSE3 | <pmmintrin.h> |
SSSE3 | <tmmintrin.h> |
SSE4.1 | <smmintrin.h> |
SSE4.2 | <nmmintrin.h> |
AES | <wmmintrin.h> |
AVX, AVX2, FMA | <immintrin.h> |
AVX-512 | <zmmintrin.h> |
ARM NEON | <arm_neon.h> |
MMX/SSE/AVX/AVX-512é¢é£ã®ãããã¯å¤ãï¼ãããããã¡ãã¡ã¤ã³ã¯ã«ã¼ãããã®ã¯é¢åã§ããï¼ ç¾å®çã«ã¯ã¾ã¨ãã¦ã¤ã³ã¯ã«ã¼ããããã¨ãå¯è½ãªããããå©ç¨ããã®ãããï¼ ãã ãï¼MSVCã¨gcc/clangã§ããããç°ãªãããï¼æ³¨æããªããã°ãªããªãï¼
ç°å¢ | ããããã¡ã¤ã« |
---|---|
MSVC | <intrin.h> |
gcc/clang | <x86intrin.h> |
å ·ä½çãªã¤ã³ã¯ã«ã¼ãé¨åã®ã³ã¼ããæ¸ãã¨ä»¥ä¸ã®ããã«ãªãï¼
#ifdef _MSC_VER # include <intrin.h> #else # include <x86intrin.h> #endif
ãªãï¼gcc/clangã§ãï¼x64ç°å¢ãªãã° <intrin.h>
ãåå¨ãããï¼x86ç°å¢ã§ãå©ç¨å¯è½ãªæ¹ã«åããã¦ããæ¹ãä½ãã¨é½åãè¯ãã ããï¼
ã³ã³ãã¤ã«ãªãã·ã§ã³
å®ã¯ããããã¤ã³ã¯ã«ã¼ãããã ãã§ã¯SIMDã®çµã¿è¾¼ã¿é¢æ°ã¯å©ç¨ã§ããªãï¼
以ä¸ã®ããã«ã³ã³ãã¤ã«ãªãã·ã§ã³ãæå®ããå¿
è¦ãããï¼
gccã§ã¯ããããã¤ã³ã¯ã«ã¼ãããã ãã§ã¯SIMDã®çµã¿è¾¼ã¿é¢æ°ã¯å©ç¨ã§ããªãããï¼ä»¥ä¸ã®ããã«ã³ã³ãã¤ã«ãªãã·ã§ã³ãæå®ããå¿ è¦ãããï¼ ä¸æ¹ï¼MSVCã¯ãªãã·ã§ã³æå®ãããªãã¦ãSIMDã®çµã¿è¾¼ã¿é¢æ°ãå©ç¨ã§ããï¼
ãªãï¼å
¨ã¦ã®x64ããã»ããµã§ã¯SSE2ã¾ã§ã¯å©ç¨ã§ããããï¼gccã§ãã£ã¦ãx64ãã¤ããªãçæããã®ã§ããã°ï¼ -msse2
ã¨ãã£ããªãã·ã§ã³ã®æå®ç¡ãã«SSE2ã¾ã§ã®çµã¿è¾¼ã¿é¢æ°ãå©ç¨ã§ããããã ï¼
gccã®å ´åï¼ã³ã³ãã¤ã©ã®èªåãã¯ãã«åã§ã©ã®å½ä»¤ãå©ç¨ãããã®è¨±å¯ã¨å©ç¨å¯è½ãªçµã¿è¾¼ã¿é¢æ°ã®è¨±å¯ãå ¼ãã¦ããã®ã«å¯¾ãï¼MSVCã¯èªåãã¯ãã«åã§ã©ã®å½ä»¤ãå©ç¨ãããã®è¨±å¯ã®ã¿ã§ããï¼ x86/x64ã«ããã¦ã¯ï¼å¾è¿°ããcpuidã«ããå®è¡æã®å©ç¨å¯è½ãªSIMDå½ä»¤ã®å¤å®ãå¯è½ãªããï¼MSVCã®æ¹ãèéãå©ãããã«æãããï¼
å½ä»¤ã»ãã | gccã®ãªãã·ã§ã³ | MSVCã®ãªãã·ã§ã³ | å®ç¾©ããããã¯ã |
---|---|---|---|
MMX | -mmmx |
/arch:MMX |
__MMX__ |
SSE | -msse |
/arch:SSE |
__SSE__ |
SSE2 | -msse2 |
/arch:SSE2 |
__SSE2__ |
SSE3 | -msse3 |
__SSE3__ |
|
SSSE3 | -mssse3 |
__SSSE3__ |
|
SSE4.1 | -msse4.1 |
__SSE4_1__ |
|
SSE4.2 | -msse4.2 |
__SSE4_2__ |
|
AES | -maes |
__AES__ |
|
AVX | -mavx |
/arch:AVX |
__AVX__ |
AVX2 | -mavx2 |
/arch:AVX2 |
__AVX2__ |
FMA | -mfma |
__FMA__ |
|
AVX-512 | -mavx512* ( * 㯠bw , cq , ed ãªã©) |
__AVX512*__ |
|
ARM NEON | -mfpu=neon ãªã© |
__ARM_NEON ã¾ã㯠__ARM_NEON__ |
MMX/SSE/AVX/AVX-512é¢é£ã®ãªãã·ã§ã³ã¯ï¼ -march=native
ã -mtune=native
ãªã©ãæå®ãããã¨ã§ï¼ä¸æ¬ã§ä¸è¨ã®ãªãã·ã§ã³ã®ãã¡ï¼å©ç¨å¯è½ãªãã®ãæå®ã§ããï¼
ARM CPUç°å¢ã®gccã§ã¯ï¼ -march=native
ã -mtune=native
ã¨æå®ãããã¨ãã§ããªãå ´åãããï¼ãã®ã¨ãã¯å©ç¨ãã¦ããARM CPUã«åããã¦ï¼ -fpu=neon-fp-armv8
ãªã©ã¨æå®ããå¿
è¦ãããï¼ããã¯Raspberry Pi 3ã®ä¾ï¼ï¼
ä¸è¨ã®è¡¨ã§ã¯ç°¡ç¥ã«ç´¹ä»ãããï¼gccã®AVX-512ã«é¢ãããªãã·ã§ã³ã¯ä»¥ä¸ã®ããã«å¤æ°ããï¼
-mavx512f
-mavx512er
-mavx512cd
-mavx512pf
-mavx512dq
-mavx512bw
-mavx512vl
-mavx512ifma
-mavx512vbmi
ãªãï¼ç¾å¨ã®ã¨ããAVX-512ãå©ç¨ã§ããCPUã¯éããã¦ããï¼
-march=native
ãæå®ããã¨ãã¦ãï¼AVX-512ãæå¹ã«ãªããªãå ´åã®æ¹ãå¤ãã®ã§ï¼ä¸è¨ã®ãªãã·ã§ã³ãå¥éæå®ããã¨ï¼ã³ã³ãã¤ã«ã ãã¯éãã ããï¼
ãããï¼é対å¿ã®CPUã§AVX-512å½ä»¤ãå®è¡ããã¨ã¦ãï¼ä»¥ä¸ã®ãããªã¨ã©ã¼ã¡ãã»ã¼ã¸ãåºåãããã ããï¼
ï¼ããã¯MSYS2ã§zshä¸ã§å®è¡ããçµæã§ããï¼
$ ./main.exe zsh: illegal hardware instruction ./main.exe
AVX-512ã®åä½ã確èªããã ããªãã°ï¼Intelå ¬å¼ã®ã¨ãã¥ã¬ã¼ã¿ãå©ç¨ããã¨ããï¼ äºãï¼AVX-512å½ä»¤ãå«ã¾ããå®è¡ãã¤ããªãçæãï¼ä»¥ä¸ã®ããã«å®è¡ããï¼
$ sde -- ./main.exe
å¤æ°ã®ã¢ã©ã¤ã³ã¡ã³ããæå®ãã
C++11ï¼C11ããè¨èªã®æ¨æºæ©è½ã¨ãã¦ï¼å¤æ°ã®ã¢ã©ã¤ã³ã¡ã³ããæå®ãããã¨ãã§ããããã«ãªã£ããï¼ãã以åã¯å¤æ°ã®ã¢ã©ã¤ã³ã¡ã³ãã¯ã³ã³ãã¤ã©ç¬èªã®æ©è½ãå©ç¨ããªããã°ï¼æå®ãããã¨ãã§ããªãï¼ å¤ãã³ã³ãã¤ã©ã§ã³ã³ãã¤ã«ãããã¨ãèæ ®ããã¨ï¼ä»¥ä¸ã®ããã«å·®ãå¸åãããã¯ããå®ç¾©ããã¨ããï¼
#include <cstddef> #include <iostream> #if defined(__cplusplus) && __cplusplus < 201103L # ifdef _MSC_VER # define alignas(n) __declspec(align(n)) # else # define alignas(n) __attribute__((aligned(n))) # endif // _MSC_VER #endif // defined(__cplusplus) && __cplusplus < 201103L // 以ä¸ï¼å©ç¨ã³ã¼ã int main() { static const int ALIGN = 32; alignas(ALIGN) unsigned char array[10] = {0}; if ((reinterpret_cast<std::ptrdiff_t>(array)) % ALIGN == 0) { std::cout << "Static array is " << ALIGN << " byte aligned.\n"; } else { std::cout << "Static array is not " << ALIGN << " byte aligned.\n"; } return 0; }
ä¸è¨ã¯C++ç¨ã ãï¼Cè¨èªãªã以ä¸ã®ããã«å®ç¾©ããã¨ããï¼
#include <stddef.h> #include <stdio.h> #if defined(__STDC_VERSION__) && __STDC_VERSION__ < 201102L # ifdef _MSC_VER # define _Alignas(n) __declspec(align(n)) # else # define _Alignas(n) __attribute__((aligned(n))) # endif // _MSC_VER #endif // defined(__cplusplus) && __cplusplus < 201103L /* 以ä¸ï¼å©ç¨ã³ã¼ã */ #define ALIGN 32 int main(void) { _Alignas(ALIGN) unsigned char array[10] = {0}; if ((ptrdiff_t) array % ALIGN == 0) { printf("Static array is %d byte aligned.\n", ALIGN); } else { printf("Static array is not %d byte aligned.\n", ALIGN); } return 0; }
ã¢ã©ã¤ã³ãããã¡ã¢ãªãåç確ä¿ãã
é常ã®C/C++ã«ããã std::malloc()
ã std::calloc()
ï¼ new
çã§ã¯16byteã32byteå¢çã«ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ãããã¨ã¯ã§ããªãï¼
以ä¸ã«ç¤ºãå°ç¨ã®ã¡ã¢ãªç¢ºä¿é¢æ°ãå¿
è¦ã¨ãªãï¼
ï¼Cè¨èªã®å ´åï¼ <cstdlib>
㯠<stdlib.h>
ã«èªã¿æãããã¨ï¼
ã¡ã¢ãªç¢ºä¿é¢æ° | ã¡ã¢ãªè§£æ¾é¢æ° | ããã | ç¹å¾´ |
---|---|---|---|
_aligned_malloc() |
_aligned_free() |
<malloc.h> |
MSVCã®ã¿ï¼ |
posix_memalign() |
std::free() |
<cstdlib> |
gcc/clangã®ã¿ï¼ |
aligned_alloc() |
std::free() |
<cstdlib> |
gcc/clangã®ã¿ï¼ç¢ºä¿ãµã¤ãºã¯ã¢ã©ã¤ã³ã¡ã³ãã®æ´æ°åã«éãï¼C11/C++17ã®æ¨æºã©ã¤ãã©ãªé¢æ° |
memalign() |
std::free() |
<malloc.h> |
gcc/clangã®ã¿ï¼å»æ¢ããã¦ããã¨ã®ãã¨ï¼ |
_mm_malloc() |
_mm_free() |
<malloc.h> |
Intel CPUã®ã¿ï¼ |
種ã ã®ã¢ã©ã¤ã³ãããã¡ã¢ãªç¢ºä¿é¢æ°ãããï¼ã©ããå©ç¨ããã°ãããå¤æã«å°ããããããªãï¼ ãããï¼ããã¾ãã«ã¯ï¼ä»¥ä¸ã®ããã«å©ç¨ããé¢æ°ãå¤æããã°ããï¼
- MSVCãªã
_aligned_malloc()
ã¨_aligned_free()
- gcc/clangãªã
posix_memalign()
ã¨std::free()
ãããèæ ®ãï¼æ¡ä»¶ã³ã³ãã¤ã«ã§å©ç¨ããé¢æ°ãåå²ããã©ããã¼é¢æ°ãä½ãã¨ããï¼ ç°¡åãªã³ã¼ãã¯ä»¥ä¸ã®ããã«ãªãï¼
ãªãï¼C++11以éï¼ std::align()
ã std::aligned_storage()
ã¨ãã£ãé¢æ°ãå©ç¨ã§ãããï¼ std::align()
ã¯æ¢ã«ç¢ºä¿ããããããã¡ã®æå®ãããã¢ãã¬ã¹ãããã¤ã³ã¿ãé²ãï¼ã¢ã©ã¤ã³ã¡ã³ãæ¡ä»¶ãæºããä½ç½®ã®ã¢ãã¬ã¹ãè¿å´ããã ãã®é¢æ°ã§ããï¼ std::aligned_storage()
ã¯ã¢ã©ã¤ã³ãããéçé
åãä½æããããã®é¢æ°ãªã®ã§ï¼ãã使ãåæãæªãã¨ãããï¼
// <type_traits> ã¯C++11以éã®ãã®ãªã®ã§ï¼ãã以åã§ã³ã³ãã¤ã«ãããå ´åã¯é¢é£é¨åãåé¤ããã㨠#include <cstddef> #include <iostream> #include <memory> #include <type_traits> #if defined(_MSC_VER) || defined(__MINGW32__) # include <malloc.h> #else # include <cstdlib> #endif // defined(_MSC_VER) || defined(__MINGW32__) /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ° * @tparam T 確ä¿ããã¡ã¢ãªã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] nBytes 確ä¿ããã¡ã¢ãªãµã¤ãº (åä½ã¯byte) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T = void> static inline T* alignedMalloc(std::size_t nBytes, std::size_t alignment = alignof(T)) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) return reinterpret_cast<T*>(::_aligned_malloc(nBytes, alignment)); #else void* p; return reinterpret_cast<T*>(::posix_memalign(&p, alignment, nBytes) == 0 ? p : nullptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ°ï¼é ååãã«alignedMallocã®å¼æ°æå®ãç°¡ç¥åããã¦ãã * @tparam T 確ä¿ããé åã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] size 確ä¿ããè¦ç´ æ°ï¼ããªãã¡ç¢ºä¿ãããµã¤ãºã¯ size * sizeof(T) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T> static inline T* alignedAllocArray(std::size_t size, std::size_t alignment = alignof(T)) noexcept { return alignedMalloc<T>(size * sizeof(T), alignment); } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªã解æ¾ããé¢æ° * @param [in] ptr 解æ¾å¯¾è±¡ã®ã¡ã¢ãªã®å é çªå°ãæããã¤ã³ã¿ */ static inline void alignedFree(void* ptr) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) ::_aligned_free(ptr); #else std::free(ptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } // 以ä¸ï¼å©ç¨ã³ã¼ã /*! * @brief std::unique_ptr ã§å©ç¨ããã¢ã©ã¤ã³ãããã¡ã¢ãªç¨ã®ã«ã¹ã¿ã ããªã¼ã¿ */ struct AlignedDeleter { void operator()(void* p) const noexcept { alignedFree(p); } }; int main() { static constexpr int ALIGN = 32; std::unique_ptr<unsigned char[], AlignedDeleter> array(alignedAllocArray<unsigned char>(10, ALIGN)); if (array.get() == nullptr) { std::cerr << "Failed to allocate memory" << std::endl; return 1; } if ((reinterpret_cast<std::ptrdiff_t>(array.get())) % ALIGN == 0) { std::cout << "Dynamic allocated memory is " << ALIGN << " byte aligned.\n"; } else { std::cout << "Dynamic allocated memory is not " << ALIGN << " byte aligned.\n"; } return 0; }
ãã®ã³ã¼ãã¯C++11ã®ç¯çã®ãã®ã§ãããï¼Cè¨èªã®ç¯å²ã§æ¸ãç´ãã¨ä»¥ä¸ã®ããã«ãªãï¼
C99以é㯠inline
ãå©ç¨å¯è½ã§ãããï¼å¤ãã³ã³ãã¤ã©ã使ç¨ãããã¨ãèæ
®ãï¼ç½®ãæãããã¯ããè¨è¿°ããï¼
#include <stdio.h> #include <stddef.h> #if defined(_MSC_VER) || defined(__MINGW32__) # include <malloc.h> #else # include <stdlib.h> #endif /* defined(_MSC_VER) || defined(__MINGW32__) */ #ifndef __cplusplus # if defined(_MSC_VER) # define inline __inline # define __inline__ __inline # elif !defined(__GNUC__) && !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L # define inline # define __inline # endif #endif /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ° * @param [in] size 確ä¿ããã¡ã¢ãªãµã¤ãº (åä½ã¯byte) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ static inline void* alignedMalloc(size_t size, size_t alignment) { #if defined(_MSC_VER) || defined(__MINGW32__) return _aligned_malloc(size, alignment); #else void* p; return posix_memalign((void**) &p, alignment, size) == 0 ? p : NULL; #endif /* _MSC_VER */ } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªã解æ¾ããé¢æ° * @param [in] ptr 解æ¾å¯¾è±¡ã®ã¡ã¢ãªã®å é çªå°ãæããã¤ã³ã¿ */ static inline void alignedFree(void* ptr) { #if defined(_MSC_VER) || defined(__MINGW32__) _aligned_free(ptr); #else free(ptr); #endif /* _MSC_VER */ } /* 以ä¸ï¼å©ç¨ã³ã¼ã */ int main(void) { static const int ALIGN = 32; unsigned char* array = (unsigned char*) alignedMalloc(10 * sizeof(unsigned char), ALIGN); if (array == NULL) { fprintf(stderr, "Failed to allocate memory\n"); return 1; } if (((ptrdiff_t) array) % ALIGN == 0) { printf("Dynamic allocated memory is %d byte aligned.\n", ALIGN); } else { printf("Dynamic allocated memory is not %d byte aligned.\n", ALIGN); } alignedFree(array); return 0; }
ãªãï¼ä»åã¯é©å½ãªã¢ã©ã¤ã³ã¡ã³ããæå®ãããï¼å®éã«SSE/AVX/AVX-512/NEONãç¨ããã¨ãã¯ï¼SSE/AVX/AVX-512/NEONã®å¤æ°åããã¢ã©ã¤ã³ã¡ã³ããåå¾ããã¨ããï¼
åãå¤æ°ããã¢ã©ã¤ã³ã¡ã³ããå¾ãæ©è½ã¯C++11ããã³C11以éã§ããã°ï¼ alignof
æ¼ç®åã§åå¾ã§ãï¼ãã以åã®ç°å¢ã§ããã°ï¼ã³ã³ãã¤ã©ã®æ¡å¼µæ©è½ãç¨ãããã¨ã§åå¾ã§ããï¼
ãã®å·®ãå¸åãããªãï¼ä»¥ä¸ã®ãããªãã¯ããå®ç¾©ããã¨ããï¼
#if defined(__cplusplus) && __cplusplus < 201103L # ifdef _MSC_VER # define alignof(n) __alignof(n) # else # define alignof(n) __alignof__(n) # endif // _MSC_VER #endif // defined(__cplusplus) || cplusplus < 201103L // alingas(alignof(__m256i)) ã®ãããªå½¢ã§ä½¿ç¨
SSE/AVX/NEON ã®ãµã³ãã«ã³ã¼ã
ç°¡åãªãµã³ãã«ã³ã¼ããSSE/AVX/NEONã®ä¾ã¨ãã¦æ示ããï¼ ãã®ã³ã¼ãã¯MSVC/gc/clangã®ãããã®ã³ã³ãã¤ã©ã§ãã³ã³ãã¤ã«ãããã¨ãã§ããããã«ãã¦ããï¼
AVX-512ã«ã¤ãã¦ã¯ï¼å©ç¨å¯è½ãªCPUãæè¼ãããã·ã³ãæå ã«ç¡ãããå²æãããï¼AVXã¨åæ§ã®ã³ã¼ãã§è¨è¿°ã§ããã¨æãï¼ ã³ã³ãã¤ã«æã«ä»¥ä¸ã®ãã¯ããå®ç¾©ããã¨ï¼å¯¾å¿ããå½ä»¤ãç¨ããã³ã¼ããæå¹åãããï¼ æå¹åãããã¨ãã¦ãï¼ã³ã³ãã¤ã©ã対å¿ãã¦ããªãå ´åã¯ï¼åé ã®é¨åã§ã¨ã©ã¼ãçºçããã¯ãã ï¼ ã¾ãï¼ä»¥ä¸ã®ãããã®ãã¯ããå®ç¾©ããªãã£ãå ´åï¼SIMDãç¨ããªãã³ã¼ãã¨ãªãï¼
ãã¯ã | æå¹åãããSIMD |
---|---|
ENABLE_AVX |
AVX |
ENABLE_SSE |
SSE |
ENABLE_NEON |
ARM NEON |
ç´æ¥çã« __AVX__
ã __SSE2__
çã®ãã¯ããå®ç¾©ããã¦ãããã©ããã§å¤æããªãã®ã¯ï¼ã³ã³ãã¤ã©ãAPIã¨ãã¦æä¾ãã¦ããã¨ãã¦ãï¼CPUã対å¿ãã¦ãããï¼SIMDå½ä»¤ãå©ç¨ã§ããªãå ´åãããããã ï¼
ã¾ãï¼AVXãSSEã®åãæ¿ãã容æã«ãªãï¼ãã³ããã¼ã¯ãã¹ããããããã¨ããå©ç¹ãããã ããï¼
ãã¦ï¼å ·ä½çã«ã¯ä»¥ä¸ã®ããã«ãªãã·ã§ã³ãæå®ãã¦ã³ã³ãã¤ã«ããã¨ããï¼ gccã®å ´åã¯ï¼
æå¹åããæ©è½ | ã³ãã³ã |
---|---|
AVX-512 | $ g++ -std=gnu++11 -march=native -mavx512f -DENABLE_AVX main.cpp -o main.o |
AVX | $ g++ -std=gnu++11 -march=native -DENABLE_AVX main.cpp -o main.o |
SSE | $ g++ -std=gnu++11 -march=native -DENABLE_SSE main.cpp -o main.o |
ARM NEON | $ g++ -std=gnu++11 -mfpu=neon-fp-armv8 -DENABLE_NEON main.cpp -o main.o |
SIMDãå©ç¨ããªã | $ g++ -std=gnu++11 main.cpp -o main.o |
ã§ããï¼MSVCã®å ´åã¯ï¼
æå¹åããæ©è½ | ã³ãã³ã |
---|---|
AVX | > cl.exe /arch:AVX /DENABLE_AVX main.cpp |
SSE | > cl.exe /arch:SSE2 /DENABLE_SSE main.cpp |
SIMDãå©ç¨ããªã | > cl.exe main.cpp |
ã¨ãã£ãå ·åã§ããï¼
ãã¯ãã«ã®å ç©è¨ç®
å®çªã®ãã¯ãã«ã®å
ç©ãè¨ç®ããã³ã¼ãã示ãï¼
FMAï¼ç©åæ¼ç®ï¼ãå©ç¨å¯è½ãªå ´åã¯ï¼ãã¡ããç¨ãã¦ï¼é«éã«å¦çã§ããããã«ãã¦ããï¼
ã¾ãï¼SIMDãç¨ããªãå ´åã§ãã£ã¦ãï¼C++11/C11以é㧠<cmath>
ããæä¾ããã¦ãã std::fma()
ãç¨ãããã¨ã§ï¼å
ç©è¨ç®ã®é«éåãæå¾
ã§ããããã«ããï¼
#if defined(ENABLE_AVX512) && !defined(__AVX512F__) # error Macro: ENABLE_AVX512 is defined, but unable to use AVX512F intrinsic functions #elif defined(ENABLE_AVX) && !defined(__AVX__) # error Macro: ENABLE_AVX is defined, but unable to use AVX intrinsic functions #elif defined(ENABLE_SSE) && !defined(__SSE2__) # error Macro: ENABLE_SSE is defined, but unable to use SSE intrinsic functions #elif defined(ENABLE_NEON) && !defined(__ARM_NEON) && !defined(__ARM_NEON__) # error Macro: ENABLE_NEON is defined, but unable to use NEON intrinsic functions #else #include <cmath> #include <cstddef> #include <algorithm> #include <iostream> #include <memory> #include <type_traits> #if defined(_MSC_VER) || defined(__MINGW32__) # include <malloc.h> #else # include <cstdlib> #endif #if defined(ENABLE_AVX512) || defined(ENABLE_AVX) || defined(ENABLE_SSE) # ifdef _MSC_VER # include <intrin.h> # else # include <x86intrin.h> # endif // _MSC_VER #elif defined(ENABLE_NEON) # include <arm_neon.h> #endif // defined(ENABLE_AVX512) || defined(ENABLE_AVX) || defined(ENABLE_SSE) /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ° * @tparam T 確ä¿ããã¡ã¢ãªã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] nBytes 確ä¿ããã¡ã¢ãªãµã¤ãº (åä½ã¯byte) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T = void> static inline T* alignedMalloc(std::size_t nBytes, std::size_t alignment = alignof(T)) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) return reinterpret_cast<T*>(::_aligned_malloc(nBytes, alignment)); #else void* p; return reinterpret_cast<T*>(::posix_memalign(&p, alignment, nBytes) == 0 ? p : nullptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ°ï¼é ååãã«alignedMallocã®å¼æ°æå®ãç°¡ç¥åããã¦ãã * @tparam T 確ä¿ããé åã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] size 確ä¿ããè¦ç´ æ°ï¼ããªãã¡ç¢ºä¿ãããµã¤ãºã¯ size * sizeof(T) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T> static inline T* alignedAllocArray(std::size_t size, std::size_t alignment = alignof(T)) noexcept { return alignedMalloc<T>(size * sizeof(T), alignment); } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªã解æ¾ããé¢æ° * @param [in] ptr 解æ¾å¯¾è±¡ã®ã¡ã¢ãªã®å é çªå°ãæããã¤ã³ã¿ */ static inline void alignedFree(void* ptr) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) ::_aligned_free(ptr); #else std::free(ptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } /*! * @brief std::unique_ptr ã§å©ç¨ããã¢ã©ã¤ã³ãããã¡ã¢ãªç¨ã®ã«ã¹ã¿ã ããªã¼ã¿ */ struct AlignedDeleter { void operator()(void* p) const noexcept { alignedFree(p); } }; #if defined(ENABLE_AVX512) static constexpr int ALIGN = alignof(__m512); #elif defined(ENABLE_AVX) static constexpr int ALIGN = alignof(__m256); #elif defined(ENABLE_SSE) static constexpr int ALIGN = alignof(__m128); #elif defined(ENABLE_NEON) static constexpr int ALIGN = alignof(float32x4_t); #else static constexpr int ALIGN = 8; #endif // defined(ENABLE_AVX512) /*! * @brief å ç©è¨ç®ãè¡ãé¢æ° * @param [in] a ãã¯ãã«ãã®1 * @param [in] b ãã¯ãã«ãã®2 * @param [in] n ãã¯ãã«ã®ãµã¤ãº * @return å ç© */ static inline float innerProduct(const float* a, const float* b, std::size_t n) { #if defined(ENABLE_AVX512) static constexpr std::size_t INTERVAL = sizeof(__m512) / sizeof(float); __m512 sumx16 = {0}; for (std::size_t i = 0; i < n; i += INTERVAL) { __m512 ax16 = _mm512_load_ps(&a[i]); __m512 bx16 = _mm512_load_ps(&b[i]); # ifdef __FMA__ sumx16 = _mm512_fmadd_ps(ax16, bx16, sumx16); # else sumx16 = _mm512_add_ps(sumx16, _mm512_mul_ps(ax16, bx16)); # endif // __FMA__ } alignas(ALIGN) float s[INTERVAL] = {0}; _mm512_store_ps(s, sumx16); std::size_t offset = n - n % INTERVAL; return std::inner_product( a + offset, a + n, b + offset, std::accumulate(std::begin(s), std::end(s), 0.0f)); #elif defined(ENABLE_AVX) static constexpr std::size_t INTERVAL = sizeof(__m256) / sizeof(float); __m256 sumx8 = {0}; for (std::size_t i = 0; i < n; i += INTERVAL) { __m256 ax8 = _mm256_load_ps(&a[i]); __m256 bx8 = _mm256_load_ps(&b[i]); # ifdef __FMA__ sumx8 = _mm256_fmadd_ps(ax8, bx8, sumx8); # else sumx8 = _mm256_add_ps(sumx8, _mm256_mul_ps(ax8, bx8)); # endif // __FMA__ } alignas(ALIGN) float s[INTERVAL] = {0}; _mm256_store_ps(s, sumx8); std::size_t offset = n - n % INTERVAL; return std::inner_product( a + offset, a + n, b + offset, std::accumulate(std::begin(s), std::end(s), 0.0f)); #elif defined(ENABLE_SSE) static constexpr std::size_t INTERVAL = sizeof(__m128) / sizeof(float); __m128 sumx4 = {0}; for (std::size_t i = 0; i < n; i += INTERVAL) { __m128 ax4 = _mm_load_ps(&a[i]); __m128 bx4 = _mm_load_ps(&b[i]); # ifdef __FMA__ sumx4 = _mm_fmadd_ps(ax4, bx4, sumx4); # else sumx4 = _mm_add_ps(sumx4, _mm_mul_ps(ax4, bx4)); # endif // __FMA__ } alignas(ALIGN) float s[INTERVAL] = {0}; _mm_store_ps(s, sumx4); float sum = std::accumulate(std::begin(s), std::end(s), 0.0f); std::size_t offset = n - n % INTERVAL; return std::inner_product( a + offset, a + n, b + offset, std::accumulate(std::begin(s), std::end(s), 0.0f)); #elif defined(ENABLE_NEON) static constexpr std::size_t INTERVAL = sizeof(float32x4_t) / sizeof(float); float32x4_t sumx4 = {0}; for (std::size_t i = 0; i < n; i += INTERVAL) { float32x4_t ax4 = vld1q_f32(&a[i]); float32x4_t bx4 = vld1q_f32(&b[i]); sumx4 = vmlaq_f32(sumx4, ax4, bx4); } std::size_t offset = n - n % INTERVAL; return std::inner_product( a + offset, a + n, b + offset, std::accumulate(std::begin(s), std::end(s), 0.0f)); #else float sum = 0.0f; for (std::size_t i = 0; i < n; i++) { // <cmath>ã®std::fmaé¢æ°ãç¨ããã¨ï¼ç©åæ¼ç®ããã¼ãã¦ã§ã¢ã®ãµãã¼ããåãããã¨ãæå¾ ã§ãã // å¦çã¨ãã¦ã¯ï¼ sum += a[i] * b[i]; ã¨åã sum = std::fma(a[i], b[i], sum); } return sum; #endif // defined(ENABLE_AVX512) } int main() { static constexpr int N_ELEMENT = 256; std::unique_ptr<float[], AlignedDeleter> a(alignedAllocArray<float>(N_ELEMENT, ALIGN)); std::unique_ptr<float[], AlignedDeleter> b(alignedAllocArray<float>(N_ELEMENT, ALIGN)); for (int i = 0; i < N_ELEMENT; i++) { a[i] = static_cast<float>(i); b[i] = static_cast<float>(i); } std::cout << innerProduct(a.get(), b.get(), N_ELEMENT) << std::endl; return 0; } #endif // defined(ENABLE_AVX512) && !defined(__AVX512F__)
æè¿åæ³ã«ããç»åã®2åæ¡å¤§
æè¿åæ³ï¼ããªãã¡åç´ãªãã¯ã»ã«ã³ãã¼ã®ã¿ãè¡ã£ã¦ï¼8bitã°ã¬ã¼ã¹ã±ã¼ã«ç»åã2åã«æ¡å¤§ããã³ã¼ããè¨è¿°ããï¼ 2åæ¡å¤§ã¨ããæ¡ä»¶ã«éå®ããã°ï¼åºåå ç»åã®ã¤ã³ããã¯ã¹å¤ã®ã¨ãå¤ãåç´ã«ãªãã®ã§ï¼SIMDã§ç°¡åã«å¦çãè¨è¿°ã§ããï¼
èªã¿è¾¼ãç»åãã¡ã¤ã«å㯠test.jpg
ã¨ãï¼èªã¿è¾¼ã¿ã«OpenCVãç¨ããï¼
ç»åãã¡ã¤ã«ã®æ¨ªå¹
ã¯ï¼16ã¾ãã¯32ã®åæ°ã§ãªããã°ãªããªãï¼
ã³ã³ãã¤ã«ã¯ä»¥ä¸ã®ããã«ããã¨ããï¼
$ g++ -std=gnu++11 main.cpp -march=native -DENABLE_AVX -I/usr/include/opencv -I/usr/include/opencv2 -lopencv_core -lopencv_highgui -lopencv_imgcodecs -o main.o
AVX-512ãå©ç¨ããå ´åã¯ï¼ -mavx512vbmi -DENABLE_AVX512
ãä»å ããã¨ããï¼
ãªãï¼OpenCVã® cv::Mat
ã«ã«ã¹ã¿ã ã¢ãã±ã¼ã¿ãé©ç¨ãããã¨ãã§ããããããï¼ã³ã¼ããç
©éã«ãªããããªã®ã§ï¼SSE/AVXã«ããã¦ã¯ã¢ã©ã¤ã³ã¡ã³ãæ¡ä»¶ãæºãããªãã¦ãããé¢æ°ãç¨ãã¦ããï¼
#if defined(ENABLE_AVX512) && !defined(__AVX512F__) # error Macro: ENABLE_AVX512 is defined, but unable to use AVX512F intrinsic functions #elif defined(ENABLE_AVX) && !defined(__AVX__) # error Macro: ENABLE_AVX is defined, but unable to use AVX intrinsic functions #elif defined(ENABLE_SSE) && !defined(__SSE2__) # error Macro: ENABLE_SSE is defined, but unable to use SSE intrinsic functions #elif defined(ENABLE_NEON) && !defined(__ARM_NEON) && !defined(__ARM_NEON__) # error Macro: ENABLE_NEON is defined, but unable to use NEON intrinsic functions #else // defined(ENABLE_AVX512) && !defined(__AVX512F__) #include <cmath> #include <cstddef> #include <iostream> #include <memory> #include <type_traits> #if defined(_MSC_VER) || defined(__MINGW32__) # include <malloc.h> #else # include <cstdlib> #endif #if defined(ENABLE_AVX512) || defined(ENABLE_AVX) || defined(ENABLE_SSE) # ifdef _MSC_VER # include <intrin.h> # else # include <x86intrin.h> # endif // _MSC_VER #elif defined(ENABLE_NEON) # include <arm_neon.h> #endif // defined(ENABLE_AVX512) || defined(ENABLE_AVX) || defined(ENABLE_SSE) #include <opencv2/opencv.hpp> #if defined(_MSC_VER) && _MSC_VER >= 1400 || \ defined(__GNUC__) && defined(__GNUC_MINOR__) && (__GNUC__ > 2 || __GNUC__ == 2 && __GNUC_MINOR__ >= 92) # define restrict __restrict #else # define restrict #endif /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ° * @tparam T 確ä¿ããã¡ã¢ãªã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] nBytes 確ä¿ããã¡ã¢ãªãµã¤ãº (åä½ã¯byte) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T = void> static inline T* alignedMalloc(std::size_t nBytes, std::size_t alignment = alignof(T)) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) return reinterpret_cast<T*>(::_aligned_malloc(nBytes, alignment)); #else void* p; return reinterpret_cast<T*>(::posix_memalign(&p, alignment, nBytes) == 0 ? p : nullptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªãåç確ä¿ããé¢æ°ï¼é ååãã«alignedMallocã®å¼æ°æå®ãç°¡ç¥åããã¦ãã * @tparam T 確ä¿ããé åã®è¦ç´ åï¼ãã®é¢æ°ã®è¿å´å¤ã¯T* * @param [in] size 確ä¿ããè¦ç´ æ°ï¼ããªãã¡ç¢ºä¿ãããµã¤ãºã¯ size * sizeof(T) * @param [in] alignment ã¢ã©ã¤ã³ã¡ã³ã (2ã®ã¹ãä¹ãæå®ãããã¨) * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ template<typename T> static inline T* alignedAllocArray(std::size_t size, std::size_t alignment = alignof(T)) noexcept { return alignedMalloc<T>(size * sizeof(T), alignment); } /*! * @brief ã¢ã©ã¤ã³ã¡ã³ããããã¡ã¢ãªã解æ¾ããé¢æ° * @param [in] ptr 解æ¾å¯¾è±¡ã®ã¡ã¢ãªã®å é çªå°ãæããã¤ã³ã¿ */ static inline void alignedFree(void* ptr) noexcept { #if defined(_MSC_VER) || defined(__MINGW32__) ::_aligned_free(ptr); #else std::free(ptr); #endif // defined(_MSC_VER) || defined(__MINGW32__) } #if defined(ENABLE_AVX512) static constexpr int ALIGN = alignof(__m512i); #elif defined(ENABLE_AVX) static constexpr int ALIGN = alignof(__m256i); #elif defined(ENABLE_SSE) static constexpr int ALIGN = alignof(__m128i); #elif defined(ENABLE_NEON) static constexpr int ALIGN = alignof(uint8x16_t); #else static constexpr int ALIGN = 8; #endif // defined(ENABLE_AVX512) /*! * @brief å ¥åç»åãã¼ã¿ãæè¿åæ³ã«ããï¼2åã®ãµã¤ãºã«æ¡å¤§ãã * @param [out] dstImageData åºåç»åãã¼ã¿é åã®å é ã¸ã®ãã¤ã³ã¿ * @param [in] dstWidth åºåç»åãã¼ã¿ã®æ¨ªå¹ * @param [in] dstHeight åºåç»åãã¼ã¿ã®ç¸¦å¹ * @param [in] srcImageData å ¥åç»åãã¼ã¿é åã®å é ã¸ã®ãã¤ã³ã¿ * @param [in] srcWidth å ¥åç»åãã¼ã¿ã®æ¨ªå¹ * @param [in] srcHeight å ¥åç»åãã¼ã¿ã®ç¸¦å¹ * @return ã¢ã©ã¤ã³ã¡ã³ããï¼åç確ä¿ãããã¡ã¢ãªé åã¸ã®ãã¤ã³ã¿ */ static inline void scale2x( unsigned char* restrict dstImageData, int dstWidth, int dstHeight, const unsigned char* restrict srcImageData, int srcWidth, int srcHeight) noexcept { static constexpr int X_RATIO = 2; static constexpr int Y_RATIO = 2; #if defined(ENABLE_AVX512) static constexpr int INTERVAL = sizeof(__m512i) / sizeof(unsigned char); static const __m512i LOWIDX = _mm512_setr_epi64( 0x4303420241014000, 0x4707460645054404, 0x4b0b4a0a49094808, 0x4f0f4e0e4d0d4c0c, 0x5313521251115010, 0x5717561655155414, 0x5b1b5a1a59195818, 0x5f1f5e1e5d1d5c1c); static const __m512i HIGHIDX = _mm512_setr_epi64( 0x6323622261216020, 0x6727662665256424, 0x6b2b6a2a69296828, 0x6f2f6e2e6d2d6c2c, 0x7333723271317030, 0x7737763675357434, 0x7b3b7a3a79397838, 0x7f3f7e3e7d3d7c3c); #elif defined(ENABLE_AVX) static constexpr int INTERVAL = sizeof(__m256i) / sizeof(unsigned char); #elif defined(ENABLE_SSE) static constexpr int INTERVAL = sizeof(__m128i) / sizeof(unsigned char); #elif defined(ENABLE_NEON) static constexpr int INTERVAL = sizeof(uint8x16_t) / sizeof(unsigned char); #else static constexpr int INTERVAL = sizeof(unsigned char); #endif // defined(ENABLE_AVX512) for (int i = 0; i < dstHeight; i++) { for (int j = 0; j < dstWidth; j += INTERVAL * X_RATIO) { #if defined(ENABLE_AVX512) // 64pixelåã®ç»ç´ ãã¼ã¿ããã¼ã __m512i v512 = _mm512_loadu_si512(reinterpret_cast<const __m512i*>(&srcImageData[i / Y_RATIO * srcWidth + j / X_RATIO])); // ã¤ã³ã¿ãªã¼ã __m512i v512l = _mm512_permutex2var_epi8(v512, LOWIDX, v512); __m512i v512u = _mm512_permutex2var_epi8(v512, HIGHIDX, v512); // 64pixel x 2ã®ãã¼ã¿ãæ¸ã込㿠_mm512_storeu_si512(reinterpret_cast<__m512i*>(&dstImageData[i * dstWidth + j + sizeof(__m512i) * 0]), v512l); _mm512_storeu_si512(reinterpret_cast<__m512i*>(&dstImageData[i * dstWidth + j + sizeof(__m512i) * 1]), v512u); #elif defined(ENABLE_AVX) // 32pixelåã®ç»ç´ ãã¼ã¿ããã¼ã __m256i v256 = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(&srcImageData[i / Y_RATIO * srcWidth + j / X_RATIO])); // ã¤ã³ã¿ãªã¼ã __m256i v256l_ = _mm256_unpacklo_epi8(v256, v256); __m256i v256u_ = _mm256_unpackhi_epi8(v256, v256); // ä¸ä¸128bit交æ __m256i v256l = _mm256_permute2f128_si256(v256l_, v256u_, 0x20); __m256i v256u = _mm256_permute2f128_si256(v256l_, v256u_, 0x31); // 32pixel x 2ã®ãã¼ã¿ãæ¸ã込㿠_mm256_storeu_si256(reinterpret_cast<__m256i*>(&dstImageData[i * dstWidth + j + sizeof(__m256i) * 0]), v256l); _mm256_storeu_si256(reinterpret_cast<__m256i*>(&dstImageData[i * dstWidth + j + sizeof(__m256i) * 1]), v256u); #elif defined(ENABLE_SSE) // 16pixelåã®ç»ç´ ãã¼ã¿ããã¼ã __m128i v128 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&srcImageData[i / Y_RATIO * srcWidth + j / X_RATIO])); // ã¤ã³ã¿ãªã¼ã __m128i v128l = _mm_unpacklo_epi8(v128, v128); __m128i v128u = _mm_unpackhi_epi8(v128, v128); // 16pixel x 2ã®ãã¼ã¿ãæ¸ã込㿠_mm_storeu_si128(reinterpret_cast<__m128i*>(&dstImageData[i * dstWidth + j + sizeof(__m128i) * 0]), v128l); _mm_storeu_si128(reinterpret_cast<__m128i*>(&dstImageData[i * dstWidth + j + sizeof(__m128i) * 1]), v128u); #elif defined(ENABLE_NEON) // 16pixelåã®ç»ç´ ãã¼ã¿ããã¼ã uint8x16_t v128 = vld1q_u8(&srcImageData[i / Y_RATIO * srcWidth + j]); // ã¤ã³ã¿ãªã¼ã uint8x16x2_t v128x2 = vzipq_u8(v128, v128); // 16pixel x 2ã®ãã¼ã¿ãæ¸ã込㿠vst1q_u8(dstImageData[i * dstWidth + j + sizeof(uint8x16_t) * 0], v128x2.val[0]); vst1q_u8(dstImageData[i * dstWidth + j + sizeof(uint8x16_t) * 1], v128x2.val[1]); #else dstImageData[i * dstWidth + j] = srcImageData[i / Y_RATIO * srcWidth + j]; #endif // defined(ENABLE_AVX512) } } } int main() { cv::Mat img = cv::imread("test.jpg", 0); if (img.data == nullptr) { std::cerr << "Cannot open image file: test.jpg" << std::endl; return 1; } cv::Mat scaledImg(cv::Size(img.cols * 2, img.rows * 2), CV_8UC1); scale2x(scaledImg.data, scaledImg.cols, scaledImg.rows, img.data, img.cols, img.rows); cv::namedWindow("src", CV_WINDOW_AUTOSIZE); cv::namedWindow("scaled", CV_WINDOW_AUTOSIZE); cv::imshow("src", img); cv::imshow("scaled", scaledImg); std::cout << "Please hit any key on the window to exit this program" << std::endl; cv::waitKey(0); return 0; } #endif // defined(ENABLE_AVX512) && !defined(__AVX512F__)
ä½ã®èç¥ãç¡ãã«ï¼SSE/AVXãARM NEONã®çµã¿è¾¼ã¿é¢æ°ãåãå©ç¨ãããï¼SSE/AVXã«é¢ãã¦ã¯Intelã®Intrinsics Guideãï¼ARM NEONã«é¢ãã¦ã¯ARM NEON Intrinsicsãåç §ããã¨ããï¼
SSE/AVXã®å¤æ°åã¯ä»¥ä¸ã®éãï¼
å | å 容 |
---|---|
__m128 |
float å4åå |
__m128d |
double å2åå |
__m128i |
æ´æ°å (int ã unsigned char ãªã©ãæ ¼ç´ã§ãã) |
__m256 |
float å8åå |
__m256d |
double å4åå |
__m256i |
æ´æ°å (int ã unsigned char ãªã©ãæ ¼ç´ã§ãã) |
__m512 |
float å16åå |
__m512d |
double å8åå |
__m512i |
æ´æ°å (int ã unsigned char ãªã©ãæ ¼ç´ã§ãã) |
SSE/AVXã®çµã¿è¾¼ã¿é¢æ°ã¯åºæ¬çã«
- SSEã®å ´åï¼
_mm_[xxx]{[u]}_[yyy]
- AVXã®å ´åï¼
_mm256_[xxx]{[u]}_[yyy]
- AVX-512ã®å ´åï¼
_mm512_[xxx]{[u]}_[yyy]
ã®å½¢å¼ã§å½åããã¦ããï¼
[xxx]
ï¼ [{u}]
ï¼ [yyy]
ã®é¨åã«ã¤ãã¦ã¯ä»¥ä¸ã®éãï¼
該å½é¨å | å 容 |
---|---|
[xxx] |
load ã store ãªã©ï¼è¡ãããå½ä»¤ãããã«ãã |
[u] |
u ãä»ãã¦ããé¢æ°ã¯ã¢ã©ã¤ã³ã¡ã³ãæ¡ä»¶ãæºããã¦ããªãã¦ãï¼SEGVã§è½ã¡ãªã |
[yyy] |
å¼æ°ã®åã«ãã£ã¦å¤åããï¼ ps ãªã __m128 ï¼ pd ãªã __m128d ï¼ si128 ãªã __m128i |
ps
ï¼ pd
ã¯ãããã Precision Singleï¼ Precision Double ã®ç¥ã§ããããã ï¼
ï¼si
ã¯èª¿ã¹ã¦ããªãï¼
ARM NEONã®å¤æ°åã¯è¦ãç®éãï¼ [xxx][size]x[NNN]{x[MMM]}
ã®å½¢å¼ã¨ãªã£ã¦ããï¼
該å½é¨å | å 容 |
---|---|
[xxx] |
uint ã int ï¼ float ãªã©ã®ãã¯ã¿ã®1è¦ç´ ã®åãããã«ãã |
[size] |
ãã¯ã¿ã®è¦ç´ å1ã¤ã®ãµã¤ãº (åä½ã¯bit) |
[NNN] |
ãã¯ã¿è¦ç´ ã®åæ° |
[MMM] |
ã¤ã³ã¿ãªã¼ãç¨ã«ãã£ã¤ããNEONã¬ã¸ã¹ã¿ã®åæ°ï¼2ãã4ã¾ã§ã®å¤ãåãï¼1ã¤ã®å ´åã¯çç¥ããã |
ARM NEONã®çµã¿è¾¼ã¿é¢æ°ãç´æçã«å©ç¨ã§ããå½åã§ï¼ v[xxx]{[q]}_{yyy}
ã¨ãªã£ã¦ããï¼
該å½é¨å | å 容 |
---|---|
[xxx] |
add ã ld ãªã©ï¼è¡ãããå½ä»¤ãããã«ãã |
[q] |
qãä»ãã¦ããã°Qã¬ã¸ã¹ã¿(128bit)ãç¨ããå½ä»¤ï¼ä»ãã¦ããªããªãã°Dã¬ã¸ã¹ã¿(64bit)ãç¨ããå½ä»¤ |
[yyy] |
å¼æ°ã®åã«ãã£ã¦å¤åããï¼ u8 ï¼ s16 ï¼ f32 ãªã© |
SSEãAVXãå®è¡æã«å©ç¨å¯è½ãã©ããã調ã¹ã
å©ç¨å¯è½ãã©ããã調ã¹ãã¢ããã¼ã·ã§ã³
ããã¾ã§ã¯ã³ã³ãã¤ã«æã«ã©ã®å½ä»¤ã使ç¨ããããæå®ãããã¨ãåæã«ãã¦ããï¼ ãããï¼å®è¡æã«SIMDå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ããå ´åãããï¼
Linuxã§ããã°ï¼åºæ¬çã«ããã°ã©ã ã¯ãã®ç°å¢ã§ã³ã³ãã¤ã«ãï¼å®è¡ãããã¨ãå¤ãããï¼å®è¡æã«SIMDå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãªãã¦ããããï¼Windowsã«ããã¦ã¯ããç°å¢ã§ã³ã³ãã¤ã«ããããã°ã©ã ãæ§ã ãªç°å¢ã§åä½ããããã¨ãå¤ãããï¼å©ç¨å¯å¦ã調ã¹ãå¿ è¦ãããï¼
ããã§ã¯ï¼SSE/AVXçã®x86/x64ã«ãããSIMDå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãæ¹æ³ã示ãï¼ ï¼ARMã®NEONã«ã¤ãã¦ã¯æªèª¿æ»ï¼
cpuidå½ä»¤ã¨cpuidã®çµã¿è¾¼ã¿é¢æ°
çãã¯ç°¡åã§cpuidå½ä»¤ãå©ç¨ããã¨ããï¼ ãã®å½ä»¤ã¯ã¢ã»ã³ãã©ã§ã¯1å½ä»¤ã¨ãã¦ç¨æããã¦ããï¼
mov $1,%eax ; cpuidã®å¼æ°1 mov $0,%ecx ; cpuidã®å¼æ°2 cpuid ; ããã§eax, ebx, ecx, edxã«çµæãæ ¼ç´ããã
ã¾ã eax
ããã³ ecx
ã«åå¾ãããCPUã®æ
å ±ã«é¢ããå¤ãã»ãããï¼ãã®å¾cpuidå½ä»¤ãå®è¡ããã¨ï¼eax, ebx, ecx, edxã«æ
å ±ãè¿å´ãããå½ä»¤ã¨ãªã£ã¦ããï¼
ã¢ã»ã³ãã©ï¼ããã³ã¤ã³ã©ã¤ã³ã¢ã»ã³ãã©ã§ãªããã°å©ç¨ã§ããªãã®ãã¨ããã¨ããã§ã¯ãªãï¼gcc, clang, MSVCã§ããã°ï¼cpuidã®çµã¿è¾¼ã¿é¢æ°ãç¨æããã¦ããï¼ ãããï¼gcc/clangã¨MSVCã§å¼æ°çãç°ãªãããï¼ä»¥ä¸ã®ããã«çµ±ä¸ãã¦å©ç¨ã§ããã¤ã³ã©ã¤ã³é¢æ°ãç¨æããã¨æ¥½ã§ããï¼
#include <array> #include <type_traits> #if defined(__GNUC__) # include <cpuid.h> #elif defined(_MSC_VER) # include <intrin.h> #endif /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam T int* * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° */ template< typename T, typename std::enable_if<std::is_same<T, int*>::value, std::nullptr_t>::type = nullptr > static inline void cpuid(T cpuInfo, int eax) noexcept { #if defined(__GNUC__) ::__cpuid(eax, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) ::__cpuid(cpuInfo, eax); #endif // defined(__GNUC__) } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®é åã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuid(int (&cpuInfo)[kSize], int eax) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuid(&cpuInfo[0], eax); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®std::arrayã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuid(std::array<int, kSize>& cpuInfo, int eax) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuid(cpuInfo.data(), eax); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam T int* * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template< typename T, typename std::enable_if<std::is_same<T, int*>::value, std::nullptr_t>::type = nullptr > static inline void cpuidex(T cpuInfo, int eax, int ecx) noexcept { #if defined(__GNUC__) ::__cpuid_count(eax, ecx, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) ::__cpuidex(cpuInfo, eax, ecx); #endif // defined(__GNUC__) } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®é åã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuidex(int (&cpuInfo)[kSize], int eax, int ecx) noexcept { static_assert(kSize >= 4, "[util::cpuidex] CPU info array size must be four or more"); cpuidex(&cpuInfo[0], eax, ecx); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®std::arrayã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuidex(std::array<int, kSize>& cpuInfo, int eax, int ecx) noexcept { static_assert(kSize >= 4, "[util::cpuidex] CPU info array size must be four or more"); cpuidex(cpuInfo.data(), eax, ecx); }
cpuid()
ã¯eaxãæå®ãï¼ecxã¯0ã¨ãã¦ï¼ç¬¬ä¸å¼æ°ã«eaxããedxã®å¤ãé ã«æ ¼ç´ããé¢æ°ï¼ cpuidex()
㯠cpuid()
ã®ecxæå®çã§ããï¼
ä¸è¨ã¯ç¬¬ä¸å¼æ°ã«é
åã sd::array
ãæ¾ãè¾¼ãã ã¨ãï¼ãµã¤ãºãã³ã³ãã¤ã«æã«å¤å®ããããã«ãã¦ããï¼
Cè¨èªç¨ã«æ¸ãç´ããªã以ä¸ã®ãããªåç´ãªå½¢ã§ããï¼
#if defined(__GNUC__) # include <cpuid.h> #elif defined(_MSC_VER) # include <intrin.h> #endif #ifndef __cplusplus # if defined(_MSC_VER) # define inline __inline # define __inline__ __inline # elif !defined(__GNUC__) && !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L # define inline # define __inline # endif #endif /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° */ static inline void cpuid(int* cpuInfo, int eax) { #if defined(__GNUC__) __cpuid(eax, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) __cpuid(cpuInfo, eax); #endif // defined(__GNUC__) } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ static inline void cpuidex(int* cpuInfo, int eax, int ecx) { #if defined(__GNUC__) __cpuid_count(eax, ecx, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) __cpuidex(cpuInfo, eax, ecx); #endif // defined(__GNUC__) }
ããã§ï¼CPUã®æ å ±ãåå¾ããæºåã¯ã§ããï¼
cpuidããåå¾ã§ããæ å ±ããï¼ã©ãããã°SIMDå½ä»¤ãå©ç¨ã§ãããå¤å®ã§ãããã¯cpuidã«ã¤ãã¦ã®ããã¥ã¡ã³ãçãåç §ããã¨ãããï¼è¡¨ã«ã¾ã¨ããã¨ä»¥ä¸ã®éãã§ããï¼
SIMDå½ä»¤ | å¼æ°eax | å¼æ°ecx | ã¬ã¸ã¹ã¿ã¨ãã©ã°ããã |
---|---|---|---|
MMX | 1 | 0 | edx [bit 23] |
SSE | 1 | 0 | edx [bit 25] |
SSE2 | 1 | 0 | edx [bit 26] |
SSE3 | 1 | 0 | ecx [bit 0] |
SSSE3 | 1 | 0 | ecx [bit 9] |
SSE4.1 | 1 | 0 | ecx [bit 19] |
SSE4.2 | 1 | 0 | ecx [bit 20] |
SSE4A | 0x80000001 | 0 | ecx [bit 6] |
AVX | 1 | 0 | ecx [bit 28] |
AVX2 | 7 | 0 | ebx [bit 5] |
FMA | 1 | 0 | ecx [bit 12] |
AVX512F | 7 | 0 | ebx [bit 16] |
AVX512BW | 7 | 0 | ebx [bit 30] |
AVX512CD | 7 | 0 | ebx [bit 28] |
AVX512DQ | 7 | 0 | ebx [bit 17] |
AVX512ER | 7 | 0 | ebx [bit 27] |
AVX512IFMA52 | 7 | 0 | ebx [bit 21] |
AVX512PF | 7 | 0 | ebx [bit 26] |
AVX512VL | 7 | 0 | ebx [bit 31] |
AVX512_4FMAPS | 7 | 0 | edx [bit 2] |
AVX512_4VNNIW | 7 | 0 | edx [bit 3] |
AVX512BITALG | 7 | 0 | ecx [bit 12] |
AVX512VPOPCNTDQ | 7 | 0 | ecx [bit 14] |
AVX512VBMI | 7 | 0 | ecx [bit 1] |
AVX512VBMI2 | 7 | 0 | ecx [bit 6] |
AVX512VNNI | 7 | 0 | ecx [bit 11] |
ã¡ãªã¿ã«ï¼x64ã§ã¯SSEï¼SSE2ã¯å©ç¨å¯è½ã§ããã¨ã®ãã¨ãªã®ã§ï¼ããããå¤å®ããå¿ è¦ã¯ãªãï¼
以ä¸ãè¸ã¾ãã¦ï¼ä»¥ä¸ã®ãããªã¤ã³ã©ã¤ã³é¢æ°ãå®ç¾©ããããããã¡ã¤ã«ãç¨æãã¦ããã¨ä¾¿å©ã§ããï¼ ãªãï¼åå空éãå ããçï¼å¤å°æ¹è¯ãããã®ãGitHubã«ç½®ãã¦ããï¼
// cpuid.hpp #ifndef CPUID_HPP #define CPUID_HPP #include <algorithm> #include <array> #include <string> #include <type_traits> #include <utility> #if defined(__GNUC__) # include <cpuid.h> #elif defined(_MSC_VER) # include <intrin.h> #endif /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam T int* * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° */ template< typename T, typename std::enable_if<std::is_same<T, int*>::value, std::nullptr_t>::type = nullptr > static inline void cpuid(T cpuInfo, int eax) noexcept { #if defined(__GNUC__) ::__cpuid(eax, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) ::__cpuid(cpuInfo, eax); #endif // defined(__GNUC__) } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®é åã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuid(int (&cpuInfo)[kSize], int eax) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuid(&cpuInfo[0], eax); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®std::arrayã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuid(std::array<int, kSize>& cpuInfo, int eax) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuid(cpuInfo.data(), eax); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam T int* * @param [out] cpuInfo cpuidã®çµææ ¼ç´å ï¼cpuInfo[0]ããcpuInfo[3]ã«çµæãæ ¼ç´ãããï¼ * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template< typename T, typename std::enable_if<std::is_same<T, int*>::value, std::nullptr_t>::type = nullptr > static inline void cpuidex(T cpuInfo, int eax, int ecx) noexcept { #if defined(__GNUC__) ::__cpuid_count(eax, ecx, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) ::__cpuidex(cpuInfo, eax, ecx); #endif // defined(__GNUC__) } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®é åã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuidex(int (&cpuInfo)[kSize], int eax, int ecx) noexcept { static_assert(kSize >= 4, "[util::cpuidex] CPU info array size must be four or more"); cpuidex(&cpuInfo[0], eax, ecx); } /*! * @brief cpuidã®å®è¡çµæã第ä¸å¼æ°ã®std::arrayã«æ ¼ç´ãã * * å®è¡çµæã®eaxãcpuInfo[0]ï¼ebxãcpuInfo[1]ï¼ecxãcpuInfo[2]ï¼edxãcpuInfo[3]ã«ã³ãã¼ãã * * @tparam kSize é åãµã¤ãº * @param [out] cpuInfo cpuidã®çµææ ¼ç´å é åï¼è¦ç´ æ°ã4以ä¸ã§ãªããã°ã³ã³ãã¤ã«ã¨ã©ã¼ã¨ãªã * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° */ template<std::size_t kSize> static inline void cpuidex(std::array<int, kSize>& cpuInfo, int eax, int ecx) noexcept { static_assert(kSize >= 4, "[util::cpuidex] CPU info array size must be four or more"); cpuidex(cpuInfo.data(), eax, ecx); } /*! * @brief cpuidã®å®è¡çµæã®ãã¡ï¼æå®ã¬ã¸ã¹ã¿ã®æå®ããããç«ã£ã¦ãããã©ãã調ã¹ã * @param [in] eax cpuidã®å¼æ° * @param [in] index cpuidã®çµæã®ã¤ã³ããã¯ã¹ï¼0ãªãeaxï¼1ãªãebxï¼2ãªãecxï¼3ãªãedx * @param [in] nBit ç«ã£ã¦ãããã©ãã調ã¹ããããã * @return æå®ã¬ã¸ã¹ã¿ã®æå®ããããç«ã£ã¦ãããªãtrueï¼ããã§ãªããã°false */ static inline bool cpuidBit(int eax, int index, int nBit) noexcept { std::array<int, 4> cpuInfo; cpuid(cpuInfo, eax); return (cpuInfo[index] & (1 << nBit)) != 0; } /*! * @brief cpuidã®å®è¡çµæã®ãã¡ï¼æå®ã¬ã¸ã¹ã¿ã®æå®ããããç«ã£ã¦ãããã©ãã調ã¹ã * @param [in] eax cpuidã®å¼æ° * @param [in] ecx cpuidã®å¼æ° * @param [in] index cpuidã®çµæã®ã¤ã³ããã¯ã¹ï¼0ãªãeaxï¼1ãªãebxï¼2ãªãecxï¼3ãªãedx * @param [in] nBit ç«ã£ã¦ãããã©ãã調ã¹ããããã * @return æå®ã¬ã¸ã¹ã¿ã®æå®ããããç«ã£ã¦ãããªãtrueï¼ããã§ãªããã°false */ static inline bool cpuidexBit(int eax, int ecx, int index, int nBit) noexcept { std::array<int, 4> cpuInfo; cpuidex(cpuInfo, eax, ecx); return (cpuInfo[index] & (1 << nBit)) != 0; } /*! * @brief MMXå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return MMXå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isMmxAvailable() noexcept { return cpuidBit(1, 3, 23); } /*! * @brief SSEå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSEå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSseAvailable() noexcept { return cpuidBit(1, 3, 25); } /*! * @brief SSE2å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSE2å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSse2Available() noexcept { return cpuidBit(1, 3, 26); } /*! * @brief SSE3å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSE3å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSse3Available() noexcept { return cpuidBit(1, 2, 0); } /*! * @brief SSSE3å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSSE3å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSsse3Available() noexcept { return cpuidBit(1, 2, 9); } /*! * @brief SSE4.1å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSE4.1å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSse41Available() noexcept { return cpuidBit(1, 2, 19); } /*! * @brief SSE4.2å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSE4.2å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSse42Available() noexcept { return cpuidBit(1, 2, 20); } /*! * @brief SSE4Aå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return SSE4Aå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isSse4aAvailable() noexcept { std::array<int, 4> cpuInfo; cpuid(cpuInfo, 0x80000000); if (static_cast<unsigned int>(cpuInfo[0]) < 0x80000001U) { return false; } return cpuidBit(0x80000001, 2, 6); } /*! * @brief AVXå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVXå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvxAvailable() noexcept { return cpuidBit(1, 2, 28); } /*! * @brief AVX2å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX2å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx2Available() noexcept { return cpuidBit(7, 1, 5); } /*! * @brief FMAå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return FMAå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isFmaAvailable() noexcept { return cpuidBit(1, 2, 12); } /*! * @brief AVX512Få½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512Få½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512FAvailable() noexcept { return cpuidBit(7, 1, 16); } /*! * @brief AVX512BWå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512BWå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512BwAvailable() noexcept { return cpuidBit(7, 1, 30); } /*! * @brief AVX512CDå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512CDå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512CdAvailable() noexcept { return cpuidBit(7, 1, 28); } /*! * @brief AVX512DQå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512DQå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512DqAvailable() noexcept { return cpuidBit(7, 1, 17); } /*! * @brief AVX512ERå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512ERå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512ErAvailable() noexcept { return cpuidBit(7, 1, 27); } /*! * @brief AVX512IFMA52å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512IFMA52å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512Ifma52Available() noexcept { return cpuidBit(7, 1, 21); } /*! * @brief AVX512PFå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512PFå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512PfAvailable() noexcept { return cpuidBit(7, 1, 26); } /*! * @brief AVX512VLå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512VLå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512VlAvailable() noexcept { return cpuidBit(7, 1, 31); } /*! * @brief AVX512_4FMAPSå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512_4FMAPSå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512_4fmapsAvailable() noexcept { return cpuidBit(7, 3, 2); } /*! * @brief AVX512_4VNNIWå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512_4VNNIWå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512_4vnniwAvailable() noexcept { return cpuidBit(7, 3, 3); } /*! * @brief AVX512BITALGå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512BITALGå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512BitalgAvailable() noexcept { return cpuidBit(7, 2, 12); } /*! * @brief AVX512VPOPCNTDQå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512VPOPCNTDQå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512VpopcntdqAvailable() noexcept { return cpuidBit(7, 2, 14); } /*! * @brief AVX512VBMIå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512VBMIå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512VbmiAvailable() noexcept { return cpuidBit(7, 2, 1); } /*! * @brief AVX512VBMI2å½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512VBMI2å½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512Vbmi2Available() noexcept { return cpuidBit(7, 2, 6); } /*! * @brief AVX512VNNIå½ä»¤ãå©ç¨å¯è½ãã©ããã調ã¹ãï¼ * @return AVX512VNNIå½ä»¤ãå©ç¨å¯è½ãªãã°trueï¼ããã§ãªããã°falseï¼ */ static inline bool isAvx512VnniAvailable() noexcept { return cpuidBit(7, 2, 6); } //// 以ä¸ã¯ãã¾ã /*! * @brief CPUã®ãã³ãIDã第ä¸å¼æ°ã®ãã¤ã³ã¿ã®æãã¡ã¢ãªé åã«ã³ãã¼ãã * * å é ãã13byteã®ä¸æ¸ããè¡ã * * @tparam T char* * @param [out] vendorId CPUã®ãã³ãID */ template< typename T, typename std::enable_if<std::is_same<T, char*>::value, std::nullptr_t>::type = nullptr > static inline void copyCpuVendorId(T vendorId) noexcept { std::array<int, 4> cpuInfo; cpuid(cpuInfo, 0); const auto p = reinterpret_cast<int*>(vendorId); p[0] = cpuInfo[1]; p[1] = cpuInfo[3]; p[2] = cpuInfo[2]; vendorId[12] = '\0'; } /*! * @brief CPUã®ãã³ãIDã第ä¸å¼æ°ã®é åã«ã³ãã¼ãã * * é åã®è¦ç´ æ°ã¯13å以ä¸ã§ãªããã°ãªããªã * * @tparam kSize é åã®ãµã¤ãº * @param [out] vendorId CPUã®ãã³ãID */ template<std::size_t kSize> static inline void copyCpuVendorId(char (&vendorId)[kSize]) noexcept { static_assert(kSize >= 12, "CPU vendor ID array size must be 12 or more"); copyCpuVendorId(vendorId.data()); } /*! * @brief CPUã®ãã³ãIDã第ä¸å¼æ°ã®std::arrayã«ã³ãã¼ãã * * std::arrayã®è¦ç´ æ°ã¯13å以ä¸ã§ãªããã°ãªããªã * * @tparam kSize é åã®ãµã¤ãº * @param [out] vendorId CPUã®ãã³ãID */ template<std::size_t kSize> static inline void copyCpuVendorId(std::array<char, kSize>& vendorId) noexcept { static_assert(kSize >= 12, "CPU vendor ID array size must be 12 or more"); copyCpuVendorId(vendorId.data()); } /*! * @brief CPUã®ãã³ãIDãstd::stringã¨ãã¦å¾ã * @return CPUã®ãã³ãID */ static inline std::string getCpuVendorId() noexcept { std::array<char, 32> vendorId; std::fill(std::begin(vendorId), std::end(vendorId), '\0'); copyCpuVendorId(vendorId); return std::string{ vendorId.data() }; } /*! * @brief CPUã®ãã©ã³ãæååã第ä¸å¼æ°ã®ãã¤ã³ã¿ã®æãã¡ã¢ãªé åã«ã³ãã¼ãã * @tparam T char* * @param [out] brandString ãã©ã³ãæåååºåå é å */ template< typename T, typename std::enable_if<std::is_same<T, char*>::value, std::nullptr_t>::type = nullptr > static inline void copyCpuBrandString(T brandString) noexcept { std::array<int, 4> cpuInfo; cpuid(cpuInfo, 0x80000000); if (static_cast<unsigned int>(cpuInfo[0]) < 0x80000004) { brandString[0] = '\0'; return; } const auto p = reinterpret_cast<int*>(brandString); cpuid(cpuInfo, 0x80000002); std::copy(std::begin(cpuInfo), std::end(cpuInfo), &p[0]); cpuid(cpuInfo, 0x80000003); std::copy(std::begin(cpuInfo), std::end(cpuInfo), &p[cpuInfo.size()]); cpuid(cpuInfo, 0x80000004); std::copy(std::begin(cpuInfo), std::end(cpuInfo), &p[cpuInfo.size() * 2]); } /*! * @brief CPUã®ãã©ã³ãæååã第ä¸å¼æ°ã®é åã«ã³ãã¼ãã * @param [out] brandString ãã©ã³ãæåååºåå é å */ template<std::size_t kSize> static inline void copyCpuBrandString(char (&brandstring)[kSize]) noexcept { static_assert(kSize >= 64, "CPU brand string array size must be 64 or more"); copyCpuBrandString(brandstring); } /*! * @brief CPUã®ãã©ã³ãæååã第ä¸å¼æ°ã®std::arrayã«ã³ãã¼ãã * @param [out] brandString ãã©ã³ãæåååºåå é å */ template<std::size_t kSize> static inline void copyCpuBrandString(std::array<char, kSize>& brandstring) noexcept { static_assert(kSize >= 64, "CPU brand string array size must be 64 or more"); copyCpuBrandString(brandstring.data()); } /*! * @brief CPUã®ãã©ã³ãæååãstd::stringã¨ãã¦å¾ã * @return CPUã®ãã©ã³ãæåå */ static inline std::string getCpuBrandString() noexcept { std::array<char, 64> brandStringArray; std::fill(std::begin(brandStringArray), std::end(brandStringArray), '\0'); copyCpuVendorId(brandStringArray); return std::string{ brandStringArray.data() }; } #endif // CPUID_HPP
ä¸è¨ã®é¢æ°ãç¨ããã¨ï¼ä¾ãã°ï¼AVX2ãå©ç¨å¯è½ã§ãããã©ããã¯
auto hasAvx2 = isAvx2Available();
ã®ããã«ãã¦èª¿ã¹ãããï¼
MSDNã®cpuidã®ãµã³ãã«ã³ã¼ã
ã¡ãªã¿ã«ï¼MSDNã«ã __cpuid()
ãå©ç¨ãã¦å©ç¨å¯è½ãªSIMDå½ä»¤ã調ã¹ããµã³ãã«ã³ã¼ããããï¼
ãã®ãµã³ãã«ã³ã¼ãã¯MSVCã§ã¯ã³ã³ãã¤ã«ã§ãããï¼gccã§ã¯ã³ã³ãã¤ã«ã§ããªãï¼
両è
å
±ã«ã³ã³ãã¤ã«ã§ããããã«ãããªãï¼ä»¥ä¸ã®ããã«æ¸ãç´ãã¨ããï¼
Wandboxã§ã®å®è¡çµæã¯ãã®ããã«ãªãï¼
// InstructionSet.cpp Compile by using: cl /EHsc /W4 InstructionSet.cpp // processor: x86, x64 // Uses the __cpuid intrinsic to get information about // CPU extended instruction set support. #include <algorithm> #include <array> #include <bitset> #include <iostream> #include <string> #include <vector> #if defined(__GNUC__) # include <cpuid.h> #elif defined(_MSC_VER) # include <intrin.h> #endif static inline void cpuid(int* cpuInfo, int eax) noexcept { #if defined(__GNUC__) __cpuid(eax, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) __cpuid(cpuInfo, eax); #endif // defined(__GNUC__) } template<std::size_t kSize> static inline void cpuid(std::array<int, kSize>& cpuInfo, int eax) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuid(cpuInfo.data(), eax); } static inline void cpuidex(int* cpuInfo, int eax, int ecx) noexcept { #if defined(__GNUC__) __cpuid_count(eax, ecx, cpuInfo[0], cpuInfo[1], cpuInfo[2], cpuInfo[3]); #elif defined(_MSC_VER) __cpuidex(cpuInfo, eax, ecx); #endif // defined(__GNUC__) } template<std::size_t kSize> static inline void cpuidex(std::array<int, kSize>& cpuInfo, int eax, int ecx) noexcept { static_assert(kSize >= 4, "CPU info array size must be four or more"); cpuidex(cpuInfo.data(), eax, ecx); } class InstructionSet { // forward declarations class InstructionSet_Internal; public: // getters static std::string Vendor() noexcept { return CPU_Rep.vendor_; } static std::string Brand() noexcept { return CPU_Rep.brand_; } static bool SSE3() noexcept { return CPU_Rep.f_1_ECX_[0]; } static bool PCLMULQDQ() noexcept { return CPU_Rep.f_1_ECX_[1]; } static bool MONITOR() noexcept { return CPU_Rep.f_1_ECX_[3]; } static bool SSSE3() noexcept { return CPU_Rep.f_1_ECX_[9]; } static bool FMA() noexcept { return CPU_Rep.f_1_ECX_[12]; } static bool CMPXCHG16B() noexcept { return CPU_Rep.f_1_ECX_[13]; } static bool SSE41() noexcept { return CPU_Rep.f_1_ECX_[19]; } static bool SSE42() noexcept { return CPU_Rep.f_1_ECX_[20]; } static bool MOVBE() noexcept { return CPU_Rep.f_1_ECX_[22]; } static bool POPCNT() noexcept { return CPU_Rep.f_1_ECX_[23]; } static bool AES() noexcept { return CPU_Rep.f_1_ECX_[25]; } static bool XSAVE() noexcept { return CPU_Rep.f_1_ECX_[26]; } static bool OSXSAVE() noexcept { return CPU_Rep.f_1_ECX_[27]; } static bool AVX() noexcept { return CPU_Rep.f_1_ECX_[28]; } static bool F16C() noexcept { return CPU_Rep.f_1_ECX_[29]; } static bool RDRAND() noexcept { return CPU_Rep.f_1_ECX_[30]; } static bool MSR() noexcept { return CPU_Rep.f_1_EDX_[5]; } static bool CX8() noexcept { return CPU_Rep.f_1_EDX_[8]; } static bool SEP() noexcept { return CPU_Rep.f_1_EDX_[11]; } static bool CMOV() noexcept { return CPU_Rep.f_1_EDX_[15]; } static bool CLFSH() noexcept { return CPU_Rep.f_1_EDX_[19]; } static bool MMX() noexcept { return CPU_Rep.f_1_EDX_[23]; } static bool FXSR() noexcept { return CPU_Rep.f_1_EDX_[24]; } static bool SSE() noexcept { return CPU_Rep.f_1_EDX_[25]; } static bool SSE2() noexcept { return CPU_Rep.f_1_EDX_[26]; } static bool FSGSBASE() noexcept { return CPU_Rep.f_7_EBX_[0]; } static bool BMI1() noexcept { return CPU_Rep.f_7_EBX_[3]; } static bool HLE() noexcept { return CPU_Rep.isIntel_ && CPU_Rep.f_7_EBX_[4]; } static bool AVX2() noexcept { return CPU_Rep.f_7_EBX_[5]; } static bool BMI2() noexcept { return CPU_Rep.f_7_EBX_[8]; } static bool ERMS() noexcept { return CPU_Rep.f_7_EBX_[9]; } static bool INVPCID() noexcept { return CPU_Rep.f_7_EBX_[10]; } static bool RTM() noexcept { return CPU_Rep.isIntel_ && CPU_Rep.f_7_EBX_[11]; } static bool AVX512F() noexcept { return CPU_Rep.f_7_EBX_[16]; } static bool AVX512DQ() noexcept { return CPU_Rep.f_7_EBX_[17]; } static bool RDSEED() noexcept { return CPU_Rep.f_7_EBX_[18]; } static bool ADX() noexcept { return CPU_Rep.f_7_EBX_[19]; } static bool AVX512IFMA() noexcept { return CPU_Rep.f_7_EBX_[21]; } static bool AVX512PF() noexcept { return CPU_Rep.f_7_EBX_[26]; } static bool AVX512ER() noexcept { return CPU_Rep.f_7_EBX_[27]; } static bool AVX512CD() noexcept { return CPU_Rep.f_7_EBX_[28]; } static bool SHA() noexcept { return CPU_Rep.f_7_EBX_[29]; } static bool AVX512BW() noexcept { return CPU_Rep.f_7_EBX_[30]; } static bool AVX512VL() noexcept { return CPU_Rep.f_7_EBX_[31]; } static bool PREFETCHWT1() noexcept { return CPU_Rep.f_7_ECX_[0]; } static bool AVX512VBMI() noexcept { return CPU_Rep.f_7_ECX_[1]; } static bool AVX512VBMI2() noexcept { return CPU_Rep.f_7_ECX_[6]; } static bool AVX512VNNI() noexcept { return CPU_Rep.f_7_ECX_[11]; } static bool AVX512BITALG() noexcept { return CPU_Rep.f_7_ECX_[12]; } static bool AVX512VPOPCNTDQ() noexcept { return CPU_Rep.f_7_ECX_[14]; } static bool AVX512_4VNNIW() noexcept { return CPU_Rep.f_7_EDX_[2]; } static bool AVX512_4FMAPS() noexcept { return CPU_Rep.f_7_EDX_[3]; } static bool LAHF() noexcept { return CPU_Rep.f_81_ECX_[0]; } static bool LZCNT() noexcept { return CPU_Rep.isIntel_ && CPU_Rep.f_81_ECX_[5]; } static bool ABM() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_ECX_[5]; } static bool SSE4a() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_ECX_[6]; } static bool XOP() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_ECX_[11]; } static bool TBM() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_ECX_[21]; } static bool SYSCALL() noexcept { return CPU_Rep.isIntel_ && CPU_Rep.f_81_EDX_[11]; } static bool MMXEXT() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_EDX_[22]; } static bool RDTSCP() noexcept { return CPU_Rep.isIntel_ && CPU_Rep.f_81_EDX_[27]; } static bool _3DNOWEXT() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_EDX_[30]; } static bool _3DNOW() noexcept { return CPU_Rep.isAMD_ && CPU_Rep.f_81_EDX_[31]; } private: static const InstructionSet_Internal CPU_Rep; class InstructionSet_Internal { public: InstructionSet_Internal() : nIds_{0} , nExIds_{0} , vendor_{} , brand_{} , isIntel_{false} , isAMD_{false} , f_1_ECX_{0} , f_1_EDX_{0} , f_7_EBX_{0} , f_7_ECX_{0} , f_7_EDX_{0} , f_81_ECX_{0} , f_81_EDX_{0} , data_{} , extdata_{} { std::array<int, 4> cpui; // Calling __cpuid with 0x0 as the function_id argument // gets the number of the highest valid function ID. cpuid(cpui, 0); nIds_ = cpui[0]; for (int i = 0; i <= nIds_; ++i) { cpuidex(cpui, i, 0); data_.push_back(cpui); } // Capture vendor string std::array<char, 0x20> vendor; std::fill(std::begin(vendor), std::end(vendor), '\0'); *reinterpret_cast<int*>(&vendor[0]) = data_[0][1]; *reinterpret_cast<int*>(&vendor[4]) = data_[0][3]; *reinterpret_cast<int*>(&vendor[8]) = data_[0][2]; vendor_ = std::string(vendor.data()); if (vendor_ == "GenuineIntel") { isIntel_ = true; } else if (vendor_ == "AuthenticAMD") { isAMD_ = true; } // load bitset with flags for function 0x00000001 if (nIds_ >= 1) { f_1_ECX_ = data_[1][2]; f_1_EDX_ = data_[1][3]; } // load bitset with flags for function 0x00000007 if (nIds_ >= 7) { f_7_EBX_ = data_[7][1]; f_7_ECX_ = data_[7][2]; f_7_EDX_ = data_[7][3]; } // Calling __cpuid with 0x80000000 as the function_id argument // gets the number of the highest valid extended ID. cpuid(cpui, 0x80000000); nExIds_ = cpui[0]; std::array<char, 0x40> brand; std::fill(std::begin(brand), std::end(brand), '\0'); for (int i = 0x80000000; i <= nExIds_; ++i) { cpuidex(cpui, i, 0); extdata_.push_back(cpui); } // load bitset with flags for function 0x80000001 if (static_cast<unsigned int>(nExIds_) >= 0x80000001) { f_81_ECX_ = extdata_[1][2]; f_81_EDX_ = extdata_[1][3]; } // Interpret CPU brand string if reported if (static_cast<unsigned int>(nExIds_) >= 0x80000004) { std::copy(std::cbegin(extdata_[2]), std::cend(extdata_[2]), reinterpret_cast<int*>(&brand[0])); std::copy(std::cbegin(extdata_[3]), std::cend(extdata_[3]), reinterpret_cast<int*>(&brand[0] + sizeof(extdata_[0]))); std::copy(std::cbegin(extdata_[4]), std::cend(extdata_[4]), reinterpret_cast<int*>(&brand[0] + sizeof(extdata_[0]) * 2)); brand_ = std::string(brand.data()); } }; int nIds_; int nExIds_; std::string vendor_; std::string brand_; bool isIntel_; bool isAMD_; std::bitset<32> f_1_ECX_; std::bitset<32> f_1_EDX_; std::bitset<32> f_7_EBX_; std::bitset<32> f_7_ECX_; std::bitset<32> f_7_EDX_; std::bitset<32> f_81_ECX_; std::bitset<32> f_81_EDX_; std::vector<std::array<int, 4>> data_; std::vector<std::array<int, 4>> extdata_; }; // class InstructionSet_Internal }; // class InstructionSet // Initialize static member data const InstructionSet::InstructionSet_Internal InstructionSet::CPU_Rep; // Print out supported instruction set extensions int main() { auto &outstream = std::cout; auto support_message = [&outstream](std::string isa_feature, bool is_supported) { outstream << isa_feature << (is_supported ? " supported" : " not supported") << std::endl; }; std::cout << InstructionSet::Vendor() << std::endl; std::cout << InstructionSet::Brand() << std::endl; support_message("3DNOW", InstructionSet::_3DNOW()); support_message("3DNOWEXT", InstructionSet::_3DNOWEXT()); support_message("ABM", InstructionSet::ABM()); support_message("ADX", InstructionSet::ADX()); support_message("AES", InstructionSet::AES()); support_message("AVX", InstructionSet::AVX()); support_message("AVX2", InstructionSet::AVX2()); support_message("AVX512CD", InstructionSet::AVX512CD()); support_message("AVX512ER", InstructionSet::AVX512ER()); support_message("AVX512F", InstructionSet::AVX512F()); support_message("AVX512DQ", InstructionSet::AVX512DQ()); support_message("AVX512IFMA", InstructionSet::AVX512IFMA()); support_message("AVX512PF", InstructionSet::AVX512PF()); support_message("AVX512BW", InstructionSet::AVX512BW()); support_message("AVX512VL", InstructionSet::AVX512VL()); support_message("AVX512VBMI", InstructionSet::AVX512VBMI()); support_message("AVX512VBMI2", InstructionSet::AVX512VBMI2()); support_message("AVX512VNNI", InstructionSet::AVX512VNNI()); support_message("AVX512BITALG", InstructionSet::AVX512BITALG()); support_message("AVX512VPOPCNTDQ", InstructionSet::AVX512VPOPCNTDQ()); support_message("AVX512_4VNNIW", InstructionSet::AVX512_4VNNIW()); support_message("AVX512_4FMAPS", InstructionSet::AVX512_4FMAPS()); support_message("BMI1", InstructionSet::BMI1()); support_message("BMI2", InstructionSet::BMI2()); support_message("CLFSH", InstructionSet::CLFSH()); support_message("CMPXCHG16B", InstructionSet::CMPXCHG16B()); support_message("CX8", InstructionSet::CX8()); support_message("ERMS", InstructionSet::ERMS()); support_message("F16C", InstructionSet::F16C()); support_message("FMA", InstructionSet::FMA()); support_message("FSGSBASE", InstructionSet::FSGSBASE()); support_message("FXSR", InstructionSet::FXSR()); support_message("HLE", InstructionSet::HLE()); support_message("INVPCID", InstructionSet::INVPCID()); support_message("LAHF", InstructionSet::LAHF()); support_message("LZCNT", InstructionSet::LZCNT()); support_message("MMX", InstructionSet::MMX()); support_message("MMXEXT", InstructionSet::MMXEXT()); support_message("MONITOR", InstructionSet::MONITOR()); support_message("MOVBE", InstructionSet::MOVBE()); support_message("MSR", InstructionSet::MSR()); support_message("OSXSAVE", InstructionSet::OSXSAVE()); support_message("PCLMULQDQ", InstructionSet::PCLMULQDQ()); support_message("POPCNT", InstructionSet::POPCNT()); support_message("PREFETCHWT1", InstructionSet::PREFETCHWT1()); support_message("RDRAND", InstructionSet::RDRAND()); support_message("RDSEED", InstructionSet::RDSEED()); support_message("RDTSCP", InstructionSet::RDTSCP()); support_message("RTM", InstructionSet::RTM()); support_message("SEP", InstructionSet::SEP()); support_message("SHA", InstructionSet::SHA()); support_message("SSE", InstructionSet::SSE()); support_message("SSE2", InstructionSet::SSE2()); support_message("SSE3", InstructionSet::SSE3()); support_message("SSE4.1", InstructionSet::SSE41()); support_message("SSE4.2", InstructionSet::SSE42()); support_message("SSE4a", InstructionSet::SSE4a()); support_message("SSSE3", InstructionSet::SSSE3()); support_message("SYSCALL", InstructionSet::SYSCALL()); support_message("TBM", InstructionSet::TBM()); support_message("XOP", InstructionSet::XOP()); support_message("XSAVE", InstructionSet::XSAVE()); }
cpuidå½ä»¤èªä½ãå©ç¨å¯è½ãã©ããã調ã¹ã
cpuidå½ä»¤èªä½ãå©ç¨å¯è½ãã©ããã調ã¹ãå¿ è¦ãããã®ã§ã¯ãªããï¼ã¨çåãæããã人ããããããããªãï¼ å®ã¯ãã®éãã§ï¼ããªãæã®CPUã§ã¯cpuidå½ä»¤ããªãã£ããããï¼
cpuidå½ä»¤ãå©ç¨å¯è½ãã©ããã¯ï¼ã¤ã³ãã«ã®ããã¥ã¡ã³ãã«è¨è¼ãã¦ããããã«ï¼eflagsã®21bitç®ãå¤æ´å¯è½ã§ãããã©ããã調ã¹ãã¨ããï¼
ãã ãï¼ããã¯Cè¨èªï¼C++ã§è¨è¿°ãããã¨ã¯ã§ããªãã®ã§ï¼ã¤ã³ã©ã¤ã³ã¢ã»ã³ãã©ã«é ¼ãå¿ è¦ãããï¼
#if defined(_MSC_VER) && defined(_WIN64) # ifndef WIN32_LEAN_AND_MEAN # define WIN32_LEAN_AND_MEAN # define CPUID_WIN32_LEAN_AND_MEAN_IS_NOT_DEFINED # endif // !WIN32_LEAN_AND_MEAN # ifndef NOMINMAX # define NOMINMAX # define CPUID_NOMINMAX_IS_NOT_DEFINED # endif // !NOMINMAX # include <windows.h> # ifdef CPUID_WIN32_LEAN_AND_MEAN_IS_NOT_DEFINED # undef CPUID_WIN32_LEAN_AND_MEAN_IS_NOT_DEFINED # undef WIN32_LEAN_AND_MEAN # endif // CPUID_WIN32_LEAN_AND_MEAN_IS_NOT_DEFINED # ifdef CPUID_NOMINMAX_IS_NOT_DEFINED # undef CPUID_NOMINMAX_IS_NOT_DEFINED # undef NOMINMAX # endif // CPUID_NOMINMAX_IS_NOT_DEFINED #endif // defined(_MSC_VER) && defined(_WIN64) static inline bool isCpuidSupported() noexcept { #if defined(__x86_64__) || defined(_WIN64) || defined(__MINGW64__) // x64ã¨ã (å ¨ã¦ã®Intel x64ããã»ããµã§ã¯cpuidå½ä»¤ã¯å©ç¨å¯è½ãªããï¼ãã®ããã«çé¢ç®ã«èª¿ã¹ãå¿ è¦ã¯ãªã) # if defined(__GNUC__) bool result; __asm__ __volatile__ ( "pushfq\n\t" "pushfq\n\t" "pop %%rax\n\t" "mov %%rax, %%rcx\n\t" "xor $0x200000, %%rax\n\t" "push %%rax\n\t" "popfq\n\t" "pushfq\n\t" "pop %%rax\n\t" "xor %%rcx, %%rax\n\t" "shr $21, %%rax\n\t" "popfq\n\t" : "=a" (result) : : "cc", "%rcx"); return result; # elif defined(_MSC_VER) // MSVCã®x64ã§ã¯ã¤ã³ã©ã¤ã³ã¢ã»ã³ãã©ãå©ç¨ã§ããªãã®ã§ã // ãã·ã³ã³ã¼ãé åãç¨æãããã®ã¡ã¢ãªé åã«å®è¡æ¨©éãä¸ãã¦ã // eflagsã®21bitç®ãå¤æ´å¯è½ãã©ããã調ã¹ã // cdecl function code std::uint8_t code[] = { 0x9c, // pushfq 0x9c, // pushfq 0x58, // pop rax 0x48, 0x89, 0xc1, // mov rcx,rax 0x48, 0x35, 0x00, 0x00, 0x20, 0x00, // xor rax,200000h 0x50, // push rax 0x9d, // popfq 0x9c, // pushfq 0x58, // pop rax 0x48, 0x31, 0xc8, // xor rax,rcx 0x48, 0xc1, 0xe8, 0x15, // shr rax,21 0x9d, // popfq 0xc3 // ret }; ::DWORD oldProtect; ::VirtualProtect(code, sizeof(code), PAGE_EXECUTE_READWRITE, &oldProtect); const auto result = reinterpret_cast<bool(__cdecl*)()>(reinterpret_cast<unsigned char*>(code))(); ::VirtualProtect(code, sizeof(code), oldProtect, &oldProtect); return result; # endif // defined(__GNUC__) #else // x86ã®ã¨ã # if defined(__GNUC__) bool result; __asm__ __volatile__ ( "pushfl\n\t" "pushfl\n\t" "pop %%eax\n\t" "mov %%eax, %%ecx\n\t" "xorl $0x200000, %%eax\n\t" "push %%eax\n\t" "popfl\n\t" "pushfl\n\t" "pop %%eax\n\t" "xorl %%ecx, %%eax\n\t" "shrl $21, %%eax\n\t" "popfl\n\t" : "=a" (result) : : "cc", "%ecx"); return result; # elif defined(_MSC_VER) bool result; __asm { pushfd pushfd pop eax mov ecx, eax xor eax, 200000h push eax popfd pushfd pop eax xor eax, ecx shr eax, 21 mov result, al popfd } return result; # endif // defined(__GNUC__) #endif // defined(__x86_64__) || defined(_WIN64) || defined(__MINGW64__) }
Intelã«ããã¨ï¼å
¨ã¦ã®x64ããã»ããµã§cpuidå½ä»¤ãå©ç¨å¯è½ã§ããããï¼x64ã®æ¹ã®ã³ã¼ãã¯ä¸è¦ã§ï¼å¸¸ã« true
ãè¿ãããã«ãã¦ãããï¼
ã¾ã¨ã
ãã®è¨äºã§ã¯ä»¥ä¸ã®ãã¨ãç´¹ä»ããï¼
- SIMDã®æ¦è¦
- SIMDã®çµã¿è¾¼ã¿é¢æ°ã®å©ç¨æ¹æ³
- ã³ã³ãã¤ã©ã®å·®ãå¸åããã¢ã©ã¤ã³ã¡ã³ãã®æå®æ¹æ³
- ãã¯ãã«ã®å ç©ãè¨ç®ãããµã³ãã«ã³ã¼ã
- SSE/AVXçã®å®è¡æå©ç¨å¯è½å¤å®
ç¹ã«ï¼SIMDã®çµã¿è¾¼ã¿é¢æ°ã®å©ç¨æ¹æ³ãç°¡åã«ã¾ã¨ããã¨ä»¥ä¸ã®ããã«ãªãï¼
alignas(alignof(__m256i)) ...
ã®å½¢ã§ï¼å¤æ°ã®ã¢ã©ã¤ã³ã¡ã³ãæå®- å¤ãMSVCãªã
__declspec(align(32))
- å¤ãgccãªã
__attribute__((aligned(32)))
- å¤ãMSVCãªã
- gcc
#include <x86intrin>
$ g++ -march=native ...
pisix_memalign()
ã§ã¢ã©ã¤ã³ãããåçã¡ã¢ãªç¢ºä¿ï¼std::free()
ã§è§£æ¾
- MSVC
#include <intrin>
> cl.exe /arch:AVX2 ...
_aligned_malloc()
ã§ã¢ã©ã¤ã³ãããåçã¡ã¢ãªç¢ºä¿ï¼_aligned_free()
ã§è§£æ¾
- AVX-512é対å¿ã®CPUã§AVX-512ããã¹ãããå ´åã¯ï¼Intelã®ã¨ãã¥ã¬ã¼ã¿ãå©ç¨
ãã®è¨äºã¯ããã¾ã§SIMDã®åºç¤ã«éããªããï¼ãã¨ã¯çµã¿è¾¼ã¿é¢æ°ã調ã¹ï¼ãã¾ãçµã¿åããããã¨ã§ï¼SIMDãããã°ã©ã ã«çµã¿è¾¼ããããã«ãªããããããªãï¼
åèæç®
- Intel Intrinsics Guide
- Intel® Software Development Emulator | Intel® Software
- ãããªåç· » SSEã¨AVXã§é«æ¬¡å ãã¯ãã«ã®å ç©è¨ç®ãé«éåãã¦ã¿ã
- SSE.æµ®åå°æ°ç¹æ¼ç®æåæé©åã¯æ¬å½ã«å¹æçãªã®ã - ãã¼
- SIMDæ¼ç® - MUGI COM
- æ¦è¦: ã¹ããªã¼ãã³ã° SIMD æ¡å¼µå½ä»¤
- x86/x64 SIMDå½ä»¤ä¸è¦§è¡¨ãï¼SSEï½AVX2ï¼
- ARM NEON Intrinsics - Using the GNU Compiler Collection (GCC)
- ARM NEON Development
- SIMD Assembly Tutorial: ARM NEON
- NEON ã使ç¨ã㦠Zynq-7000 AP SoC ã§ã®ã½ããã¦ã§ã¢æ§è½ãåä¸
- ARM gcc ããããã¦ãã¦é
- 2012 Intel® Processor Identification and the CPUID Instruction
- AMD CPUID Specification
- __cpuid, __cpuidex | Microsoft Docs