Switching between CPU and GPU with OpenCL?
Lately I have been getting the occasional email, sent to me directly, with questions from people who are interested in PG-Strom.
One of the comments I received contained an interesting insight.
"GPU acceleration is certainly an interesting feature, but when you get right down to it, isn't PG-Strom's real strength the pipelined processing? If so, wouldn't it be fine to run the computation on the CPU instead?"
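It is true that GPU parallel processing delivers tremendous cost-effectiveness when the workload fits, but some operations, such as regular expression matching, are not well suited to GPUs.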
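In PG-Strom, a function that evaluates rows is generated automatically from the conditions given in the SQL WHERE clause, then JIT-compiled and executed. It should probably be feasible to generate two kinds of functions automatically at the planner stage, one for CPU execution and one for GPU execution, and hand them to the computing server.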
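To make the idea of generating two row-evaluation functions from one WHERE condition more concrete, here is a minimal sketch. It is not PG-Strom's actual code generator: the expression classes, the function name `generate_qual_source`, the kernel name `qual`, and the assumption that every column is a `float` array are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Column:
    name: str


@dataclass
class Const:
    value: float


@dataclass
class BinOp:
    op: str            # e.g. "+", "<", ">", "==", "&&", "||"
    left: "Expr"
    right: "Expr"


Expr = Union[Column, Const, BinOp]


def columns(expr):
    """Collect the names of all columns referenced by the expression."""
    if isinstance(expr, Column):
        return {expr.name}
    if isinstance(expr, BinOp):
        return columns(expr.left) | columns(expr.right)
    return set()


def to_c(expr, index):
    """Render the expression as C source, reading each column at `index`."""
    if isinstance(expr, Column):
        return f"{expr.name}[{index}]"
    if isinstance(expr, Const):
        return f"{float(expr.value)!r}f"
    if isinstance(expr, BinOp):
        return f"({to_c(expr.left, index)} {expr.op} {to_c(expr.right, index)})"
    raise TypeError(f"unsupported expression node: {expr!r}")


def generate_qual_source(expr, target):
    """Emit source text for a hypothetical row qualifier, for "cpu" or "gpu"."""
    cols = sorted(columns(expr))
    cond = to_c(expr, "i")
    if target == "gpu":
        # OpenCL C kernel: one work-item per row (assumes the global work
        # size equals the number of rows, so there is no bounds check).
        params = [f"__global const float *{c}" for c in cols]
        params.append("__global char *result")
        return (f"__kernel void qual({', '.join(params)})\n"
                "{\n"
                "    size_t i = get_global_id(0);\n"
                f"    result[i] = {cond};\n"
                "}\n")
    if target == "cpu":
        # Plain C function that loops over the rows sequentially.
        params = [f"const float *{c}" for c in cols]
        params += ["char *result", "size_t nrows"]
        return (f"void qual({', '.join(params)})\n"
                "{\n"
                "    for (size_t i = 0; i < nrows; i++)\n"
                f"        result[i] = {cond};\n"
                "}\n")
    raise ValueError(f"unknown target: {target!r}")


if __name__ == "__main__":
    # Corresponds to something like: WHERE x > 10 AND y < 2.5
    where = BinOp("&&", BinOp(">", Column("x"), Const(10)),
                  BinOp("<", Column("y"), Const(2.5)))
    print(generate_qual_source(where, "gpu"))
    print(generate_qual_source(where, "cpu"))
```

Both variants share the same rendered condition and differ only in how rows are iterated, which hints at why a single source language for both targets would be attractive.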
Unlike CUDA, which assumes NVidia GPUs, OpenCL also supports AMD GPUs and Intel's Xeon Phi, and on top of that, code written in OpenCL C can be compiled for CPUs as well.
Partly for these reasons, I am currently reimplementing PG-Strom on OpenCL (work in progress).
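Still, I was not looking forward to writing separate computing servers for the CPU and the GPU, since that would make the implementation more complicated. That worry turned out to be a misunderstanding on my part.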
The gpuinfo output below comes from an environment with NVidia's CUDA 4.2 and Intel's OpenCL SDK installed.
Remarkably, the GPU is recognized under Platform-1 and the CPU under Platform-2.
The result is the same whether I use NVidia's OpenCL library or Intel's.
Which means that a single computing server can pull off the trick of switching between CPU and GPU according to the characteristics of the requested computation.
Hmm. Interesting. This really makes me want to implement it.
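As a rough illustration of what "switching according to the characteristics of the computation" could look like, here is a small sketch of a device-selection policy. Everything in it is an assumption made for illustration: the `Device` record, the `GPU_UNFRIENDLY` set, and the `choose_device` function are hypothetical, and the two sample devices are simply copied from the gpuinfo output in this post. A real implementation would obtain the device list from the OpenCL API (`clGetPlatformIDs` and `clGetDeviceIDs`) rather than from a hard-coded table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    platform_index: int
    device_type: str      # "GPU" or "CPU"
    name: str
    compute_units: int


# Sample values taken from the gpuinfo output shown in this post.
DEVICES = [
    Device(1, "GPU", "GeForce GT 640", 2),
    Device(2, "CPU", "Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz", 32),
]

# Hypothetical list of operations assumed to run poorly on a GPU.
GPU_UNFRIENDLY = {"regex_match"}


def choose_device(devices, operations):
    """Pick a device for a job described by the operations it needs.

    Jobs containing a GPU-unfriendly operation prefer the CPU; all other
    jobs prefer the GPU. If the preferred type is missing, fall back to
    the first device in the list.
    """
    if not devices:
        raise ValueError("no OpenCL devices available")
    wanted = "CPU" if GPU_UNFRIENDLY & set(operations) else "GPU"
    for dev in devices:
        if dev.device_type == wanted:
            return dev
    return devices[0]


if __name__ == "__main__":
    print(choose_device(DEVICES, ["float_compare"]).name)
    print(choose_device(DEVICES, ["float_compare", "regex_match"]).name)
```

The policy here is deliberately naive. A more realistic one would probably also weigh data-transfer cost and row counts, but that goes beyond what this post describes.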
The gpuinfo command used below is available at the following URL:
https://github.com/kaigai/gpuinfo
[kaigai@iwashi gpuinfo]$ ./gpuinfo
platform-index: 1
platform-vendor: NVIDIA Corporation
platform-name: NVIDIA CUDA
platform-version: OpenCL 1.1 CUDA 4.2.1
platform-profile: FULL_PROFILE
platform-extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll
  Device-01
  Device type: GPU
  Vendor: NVIDIA Corporation (id: 000010de)
  Name: GeForce GT 640
  Version: OpenCL 1.1 CUDA
  Driver version: 310.32
  OpenCL C version: OpenCL C 1.1
  Profile: FULL_PROFILE
  Device available: yes
  Address bits: 32
  Compiler available: yes
  Double FP config: Denorm, INF/NaN, R/nearest, R/zero, R/INF, FMA
  Endian: little
  Error correction support: no
  Execution capability: kernel, native kernel
  Extensions: cl_khr_byte_addressable_store cl_khr_icd \
    cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query \
    cl_nv_pragma_unroll cl_khr_global_int32_base_atomics \
    cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics \
    cl_khr_local_int32_extended_atomics cl_khr_fp64
  Global memory cache size: 32 KB
  Global memory cache type: read-write
  Global memory cacheline size: 128
  Global memory size: 2047 MB
  Host unified memory: no
  Image support: yes
  Image 2D max size: 32768 x 32768
  Image 3D max size: 4096 x 4096 x 4096
  Local memory size: 49152
  Local memory type: SRAM
  Max clock frequency: 901
  Max compute units: 2
  Max constant args: 9
  Max constant buffer size: 65536
  Max memory allocation size: 511 MB
  Max parameter size: 4352
  Max read image args: 256
  Max samplers: 32
  Max work-group size: 1024
  Max work-item sizes: {1024,1024,64}
  Max write image args: 16
  Memory base address align: 4096
  Min data type align size: 128
  Native vector width - char: 1
  Native vector width - short: 1
  Native vector width - int: 1
  Native vector width - long: 1
  Native vector width - float: 1
  Native vector width - double: 1
  Preferred vector width - char: 1
  Preferred vector width - short: 1
  Preferred vector width - int: 1
  Preferred vector width - long: 1
  Preferred vector width - float: 1
  Preferred vector width - double: 1
  Profiling timer resolution: 1000
  Queue properties: out-of-order execution, profiling
  Sindle FP config: Denorm, INF/NaN, R/nearest, R/zero, R/INF, FMA
platform-index: 2
platform-vendor: Intel(R) Corporation
platform-name: Intel(R) OpenCL
platform-version: OpenCL 1.2 LINUX
platform-profile: FULL_PROFILE
platform-extensions: cl_khr_fp64 cl_khr_icd cl_khr_global_int32_base_atomics \
  cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics \
  cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store \
  cl_intel_printf cl_ext_device_fission cl_intel_exec_by_local_thread
  Device-01
  Device type: CPU
  Vendor: Intel(R) Corporation (id: 00008086)
  Name: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  Version: OpenCL 1.2 (Build 56860)
  Driver version: 1.2
  OpenCL C version: OpenCL C 1.2
  Profile: FULL_PROFILE
  Device available: yes
  Address bits: 64
  Compiler available: yes
  Double FP config: Denorm, INF/NaN, R/nearest, R/zero, R/INF, FMA
  Endian: little
  Error correction support: no
  Execution capability: kernel, native kernel
  Extensions: cl_khr_fp64 cl_khr_icd \
    cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics \
    cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics \
    cl_khr_byte_addressable_store cl_intel_printf cl_ext_device_fission \
    cl_intel_exec_by_local_thread
  Global memory cache size: 256 KB
  Global memory cache type: read-write
  Global memory cacheline size: 64
  Global memory size: 386942 MB
  Host unified memory: yes
  Image support: yes
  Image 2D max size: 16384 x 16384
  Image 3D max size: 2048 x 2048 x 2048
  Local memory size: 32768
  Local memory type: DRAM
  Max clock frequency: 2600
  Max compute units: 32
  Max constant args: 480
  Max constant buffer size: 131072
  Max memory allocation size: 96735 MB
  Max parameter size: 3840
  Max read image args: 480
  Max samplers: 480
  Max work-group size: 1024
  Max work-item sizes: {1024,1024,1024}
  Max write image args: 480
  Memory base address align: 1024
  Min data type align size: 128
  Native vector width - char: 16
  Native vector width - short: 8
  Native vector width - int: 4
  Native vector width - long: 2
  Native vector width - float: 4
  Native vector width - double: 2
  Preferred vector width - char: 16
  Preferred vector width - short: 8
  Preferred vector width - int: 4
  Preferred vector width - long: 2
  Preferred vector width - float: 4
  Preferred vector width - double: 2
  Profiling timer resolution: 1
  Queue properties: out-of-order execution, profiling
  Sindle FP config: Denorm, INF/NaN, R/nearest