What...? What did you say...?
Ever since I heard Tim Child's talk in Ottawa last year, I have been nursing an idea that I wanted to implement: could the processing power of a GPU be used to speed up query processing in PostgreSQL?
Queries that contain complex calculations in particular often cannot be turned into an Index-Scan and fall back to a full table scan. My hunch was that a GPU could help in exactly those cases.
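I was able to set aside a fairly solid block of development time over the Christmas holidays, so I wrote a module that works as a PostgreSQL FDW (Foreign Data Wrapper).
The module is called PG-Strom, pronounced the German way, roughly "shtrohm".
The name comes from the Streaming Multiprocessor, the processing unit of a GPU.
Since it is built on the current FDW interface, it comes with a number of restrictions: updates are not possible, and sorting and aggregate functions cannot be pushed down to the module. Still, for a prototype the performance is decent.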
Note: the GPU-related descriptions below reflect the understanding of the author (two months of CUDA programming experience). If anything is wrong, please point it out. Or rather, please teach me!
Benchmark
This is admittedly a rather cherry-picked test case, but I ran a query that performs a full scan of a table with 20 million records on the machine I have at hand. The GPU installed is an Nvidia GTX450eco.
-- conventional table
mytest=# SELECT count(*) FROM pgbench_accounts WHERE (xval - 23.45) * (xval - 23.45) + (yval - 54.32) * (yval - 54.32) < 100;
 count
--------
 629291
(1 row)

Time: 29030.738 ms

-- using PG-Strom
mytest=# SELECT count(*) FROM pgstrom_accounts WHERE (xval - 23.45) * (xval - 23.45) + (yval - 54.32) * (yval - 54.32) < 100;
 count
--------
 629291
(1 row)

Time: 2337.475 ms
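Surprisingly, the query finished in less than one tenth of the response time (about 29.0 s versus 2.3 s).
And since the GPU used here is a modest card that costs around 100 euros, there seems to be room for improvement.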
Once more, this time with a size at which the whole table fits in the buffer cache, to remove the effect of disk I/O (shared_buffers = 960MB, with the row count reduced to 5 million).
-- conventional table
mytest=# SELECT count(*) FROM t1 WHERE (xval - 23.45) * (xval - 23.45) + (yval - 54.32) * (yval - 54.32) < 100;
 count
--------
 157800
(1 row)

Time: 4106.045 ms

mytest=# SELECT count(*) FROM t2 WHERE (xval - 23.45) * (xval - 23.45) + (yval - 54.32) * (yval - 54.32) < 100;
 count
--------
 157800
(1 row)

Time: 393.346 ms
Whoa!
The idea
In PostgreSQL, a full table scan is basically a repetition of two steps: (1) read from disk (or from the buffer), and (2) evaluate the WHERE clause against the contents of the tuple.
As a result, (2) cannot run while (1) is in progress, and (1) cannot run while (2) is in progress. Making the CPU side multi-threaded is one idea, but PostgreSQL is not structured to be thread-safe, memory management and I/O included, so that would be very hard.
PG-Strom instead hands step (2), the evaluation of the WHERE clause, to the GPU so that the CPU can concentrate on I/O.
The GPU performs the computation asynchronously, so from the CPU's point of view the plan is: ask the GPU to "evaluate the data at such-and-such location", go on with I/O for a while, and find the result ready afterwards.
That said, simply handing the work to a GPU does not make everything fine.
To compute on a GPU, the data has to be transferred to the device memory on the GPU board, and that transfer goes over PCI-E. The bandwidth there is much smaller than between CPU and memory (even an x16 bus tops out at 2.5GB/s in each direction).
So the less data transferred to the GPU device, the better.
The WHERE clause of an SQL query is unlikely to reference every column of a table.
After all, the purpose of a query is usually to "fetch the records that satisfy such-and-such condition".
PG-Strom evaluates the WHERE clause on the GPU, and only the columns needed for that computation are transferred to the GPU device. Numeric data is usually shorter than strings, so I would guess that only about 10%-20% of the whole table has to cross PCI-E to the GPU device.
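As a rough illustration of that estimate, the share of each row that has to cross PCI-E can be computed from the column widths. This is only a back-of-the-envelope sketch: the function name `transfer_fraction` and the example widths are made up for illustration and are not part of PG-Strom.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fraction of each row that must be sent to the GPU when only the
 * columns referenced by the WHERE clause are transferred.
 *   row_width  : average width of a full row, in bytes
 *   col_widths : widths of the referenced columns, in bytes
 * (hypothetical helper, not a PG-Strom function)
 */
static double transfer_fraction(size_t row_width,
                                const size_t *col_widths, size_t ncols)
{
    size_t needed = 0;

    assert(row_width > 0);
    for (size_t i = 0; i < ncols; i++)
        needed += col_widths[i];
    return (double) needed / (double) row_width;
}
```

For example, two 8-byte double columns out of a 100-byte row give 16 / 100 = 0.16, which sits inside the 10%-20% range guessed above.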
Data structure and asynchronous processing
PG-Strom's internal data structures are organized along the same lines.
For example, when you define a FOREIGN TABLE with four columns a, b, c and d, PG-Strom creates one table per column in the "pg_strom" schema. Each of these tables has a rowid (int64) that uniquely identifies a row, plus a column whose type holds the data of that column packed into an array.
This is the kind of column-oriented layout that has become popular lately, but the layout is defined strictly within the framework of PostgreSQL's transaction management, so PG-Strom itself does not have to touch any of that troublesome machinery.
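To make the per-column layout a little more concrete, here is a minimal C sketch of how the values of one column could be packed into an array keyed by a rowid. Everything in it is an assumption for illustration: the struct names, the column types, the array length `ROWS_PER_ENTRY`, and the choice that `rowid` marks the first row of the array are not taken from the PG-Strom source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROWS_PER_ENTRY 8    /* hypothetical array length */

/* One row of a hypothetical four-column table (a, b, c, d). */
typedef struct {
    int32_t a;
    int32_t b;
    double  c;
    double  d;
} RowTuple;

/* One entry of the per-column table for column "c" (assumed layout). */
typedef struct {
    int64_t rowid;                    /* assumed: rowid of values[0] */
    double  values[ROWS_PER_ENTRY];   /* column values packed into an array */
} ColumnEntryF8;

/* Pack column "c" of up to ROWS_PER_ENTRY rows into one entry. */
static void pack_column_c(const RowTuple *rows, size_t nrows,
                          int64_t first_rowid, ColumnEntryF8 *out)
{
    assert(nrows <= ROWS_PER_ENTRY);
    out->rowid = first_rowid;
    for (size_t i = 0; i < ROWS_PER_ENTRY; i++)
        out->values[i] = (i < nrows) ? rows[i].c : 0.0;  /* zero-pad the tail */
}
```

With a layout like this, a scan that needs only column c reads contiguous arrays of doubles and never touches a, b or d.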
Thanks to this layout, PG-Strom can quickly read from the database the data to be sent to the GPU device over PCI-E. The data is accumulated in fixed-length buffers called chunks, which are sent to the GPU device one after another; the GPU performs the computation and the results are written back.
In practice this whole sequence runs asynchronously, so the CPU can keep reading data from the database and setting up the next chunk in the meantime, which keeps both the CPU and the GPU busy.
Just-in-time compile and native code execution
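CPUs and GPUs each have things they are good and bad at. A GPU achieves high computing power by running a very large number of parallel execution units in concert, but it has weak spots too. One of them is conditional branching.
On NVidia GPUs, processing is carried out SIMD-style in units called Streaming Multiprocessors, each containing 32 execution units. When the GPU code contains a conditional branch and the IF condition is true in some threads and false in others, every thread processes both the true block and the false block of the IF statement, and the results of the block that does not match the condition are discarded. The cost of evaluating conditionals on the GPU therefore becomes hard to ignore, especially as the IF blocks grow larger.
Internally PostgreSQL keeps the WHERE clause as a tree-shaped data structure. Walking that tree node by node and branching with a switch() statement ("the next one is a '+' operator, so ...") would be about the least efficient thing to do on a GPU.
Note: what happens when the IF condition evaluates to the same result in all threads running in parallel is something I could not work out from what I read. If the penalty can be avoided in that case, doing the control processing on the GPU side is another idea.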
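The branching cost described above can be put into a toy model. The sketch below is plain C, not GPU code, and is a deliberate simplification: it assumes 32 threads run in lockstep (CUDA calls such a group a warp) and that a branch on which every thread agrees costs only the side actually taken, which is how NVidia's CUDA documentation describes it. The function name and the cost units are made up.

```c
#include <assert.h>
#include <stdbool.h>

#define WARP_SIZE 32    /* threads assumed to execute in lockstep */

/*
 * Toy cost model of an if/else executed by WARP_SIZE lockstep threads.
 * If the threads disagree on the condition, the group walks through
 * both blocks; if they all agree, only the taken block is paid for.
 * (Simplified assumption, not a measurement.)
 */
static int warp_branch_cost(const bool cond[WARP_SIZE],
                            int then_cost, int else_cost)
{
    bool any_true = false;
    bool any_false = false;

    assert(then_cost >= 0 && else_cost >= 0);
    for (int lane = 0; lane < WARP_SIZE; lane++) {
        if (cond[lane])
            any_true = true;
        else
            any_false = true;
    }
    return (any_true ? then_cost : 0) + (any_false ? else_cost : 0);
}
```

Under this model a single thread that disagrees with the other 31 is enough to make the whole group pay for both blocks, which is why a large per-node switch() is so unattractive on a GPU.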
To avoid that kind of per-node branching on the GPU, PG-Strom uses just-in-time compilation to generate native GPU code dynamically and execute it.
When a user's query references a foreign table managed by PG-Strom, the query planner asks PG-Strom to build an execution plan. At that point the PG-Strom planner generates source code for the GPU device from the WHERE clause and runs nvcc (NVidia's compiler for GPUs) to produce native GPU code. Launching the compiler every time would of course be a waste of time, so the generated binary is cached in shared memory.
Next, when the query executor calls the PG-Strom executor, the native GPU code is sent to the device together with the data read from the pg_strom schema, as described earlier, and executed asynchronously.
Since the WHERE clause has already been expanded at the planner stage, there is no need to process a huge switch statement all over again.
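The idea of expanding the WHERE clause at plan time can be sketched in a few lines of C. The code walks a small expression tree once, on the CPU, and emits the source text of the expression; the switch over node kinds runs once per query at code-generation time rather than once per row on the GPU. The `Expr` type and the function names are invented for this sketch and have nothing to do with PostgreSQL's real node structures or PG-Strom's actual generator.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { EXPR_CONST, EXPR_VAR, EXPR_OP } ExprKind;

/* Hypothetical expression node, far simpler than PostgreSQL's own. */
typedef struct Expr {
    ExprKind kind;
    double value;               /* EXPR_CONST: literal value */
    const char *name;           /* EXPR_VAR: variable name in generated code */
    char op;                    /* EXPR_OP: '+', '-', '*' or '<' */
    const struct Expr *left;    /* EXPR_OP: left operand */
    const struct Expr *right;   /* EXPR_OP: right operand */
} Expr;

/* Append s to buf (current length len, capacity cap); return new length. */
static size_t append(char *buf, size_t len, size_t cap, const char *s)
{
    size_t n = strlen(s);

    assert(len + n < cap);      /* sketch only: no truncation handling */
    memcpy(buf + len, s, n + 1);
    return len + n;
}

/* Emit the source text of expression e; return the new length of buf. */
static size_t emit_expr(const Expr *e, char *buf, size_t len, size_t cap)
{
    char tmp[32];

    switch (e->kind) {
    case EXPR_CONST:
        snprintf(tmp, sizeof(tmp), "%g", e->value);
        return append(buf, len, cap, tmp);
    case EXPR_VAR:
        return append(buf, len, cap, e->name);
    case EXPR_OP:
        len = append(buf, len, cap, "(");
        len = emit_expr(e->left, buf, len, cap);
        snprintf(tmp, sizeof(tmp), " %c ", e->op);
        len = append(buf, len, cap, tmp);
        len = emit_expr(e->right, buf, len, cap);
        return append(buf, len, cap, ")");
    }
    return len;
}
```

Feeding it a tree for `cv5 - 23.45` produces the text `(cv5 - 23.45)`, which could then be pasted into a kernel template and handed to the compiler.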
Incidentally, the EXPLAIN statement shows what GPU code has been generated.
mytest=# EXPLAIN SELECT * FROM pgstrom_accounts WHERE (xval - 23.45) * (xval - 23.45) + (yval - 54.32) * (yval - 54.32) < 100;
                          QUERY PLAN
--------------------------------------------------------------
 Foreign Scan on pgstrom_accounts  (cost=2.00..0.00 rows=1000 width=368)
   Required Cols : aid, bid, abalance, filler, xval, yval
   Used in clause : xval, yval
     1: typedef unsigned long size_t;
     2: typedef long __clock_t;
     3: typedef __clock_t clock_t;
     4: #include "crt/device_runtime.h"
     5:
     6: typedef char bool_t;
     7:
     8: __global__ void
     9: pgstrom_qual(unsigned char rowmap[],
    10:              double c5_values[],
    11:              unsigned char c5_nulls[],
    12:              double c6_values[],
    13:              unsigned char c6_nulls[])
    14: {
    15:     int offset_base = blockIdx.x * blockDim.x + threadIdx.x;
    16:     int offset = offset_base * 8;
    17:     unsigned char result = rowmap[offset_base];
    18:     unsigned char errors = 0;
    19:     unsigned char cn5 = c5_nulls[offset_base];
    20:     unsigned char cn6 = c6_nulls[offset_base];
    21:     int bitmask;
    22:
    23:     for (bitmask=1; bitmask < 256; bitmask <<= 1)
    24:     {
    25:         double cv5 = c5_values[offset];
    26:         double cv6 = c6_values[offset];
    27:
    28:         if ((result & bitmask) && !((((cv5 - 23.45) * (cv5 - 23.45)) + ((cv6 - 54.32) * (cv6 - 54.32))) < 100))
    29:             result &= ~bitmask;
    30:         offset++;
    31:     }
    32:     rowmap[offset_base] = (result & ~errors);
    33: }
(36 rows)
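To see what the generated kernel does, it can help to read a plain C version in which an ordinary loop stands in for the GPU threads. This is a reference sketch written for this explanation, not code from PG-Strom: the name `qual_reference` is invented, and the NULL bitmaps and the `errors` mask of the generated kernel are left out because the listing above never uses them for anything.

```c
#include <assert.h>
#include <stddef.h>

/*
 * CPU reference for the generated qualifier. Each iteration of the
 * outer loop plays the role of one GPU thread: it owns one byte of
 * rowmap, i.e. 8 rows, and clears the bit of every row that does not
 * satisfy the WHERE clause. Bits that are already 0 stay 0.
 */
static void qual_reference(unsigned char rowmap[],
                           const double xval[], const double yval[],
                           size_t nthreads)
{
    assert(rowmap != NULL && xval != NULL && yval != NULL);

    for (size_t t = 0; t < nthreads; t++) {
        size_t offset = t * 8;              /* first row owned by "thread" t */
        unsigned char result = rowmap[t];

        for (int bitmask = 1; bitmask < 256; bitmask <<= 1) {
            double x = xval[offset];
            double y = yval[offset];

            if ((result & bitmask) &&
                !((x - 23.45) * (x - 23.45) +
                  (y - 54.32) * (y - 54.32) < 100))
                result &= (unsigned char) ~bitmask;
            offset++;
        }
        rowmap[t] = result;
    }
}
```

The only branch is the final "does this row survive" test, whose body is a single assignment; all the arithmetic of the WHERE clause is straight-line code.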
Release
For now it is published on GitHub. The license is GPLv3.
https://github.com/kaigai/pg_strom
It is still at the prototype stage, so the specification will change with my mood, and you should not expect documentation for the time being. If there is anyone brave enough to want to try it anyway, please give me a shout on Twitter (@kkaigai) or elsewhere.