pg2arrowã並ååãã
ä»åã¯çããã大好ããªä¾¿å©ãã¼ã«ãpg2arrow
ãã®ã話ã§ãã
PostgreSQLã§ãã¼ã¿ãã«ãªåæåãã¼ã¿å½¢å¼ Apache Arrow ãèªã¿åºãã«ã¯ãArrow_Fdwãå©ç¨ããäºãã§ãã¾ãã
PG-Stromã§ã¯GPU-Direct SQLã«ã対å¿ãã¦ãã¾ãããåæåãã¼ã¿ã¨ããäºããã£ã¦ã被åç
§åããI/Oãçºçããªããåãåã®ãã¼ã¿ãè¿åã«åºã¾ã£ã¦ããã¨ãã大éãã¼ã¿å¦çã«é©ããç¹æ§ãæã£ã¦ããã¾ãã
ã¾ããApache Arrowå½¢å¼ã®ãã¡ã¤ã«ãä½æããã«ã¯PyArrowãPandasãªã©æ§ã
ãªãã¼ã«ãããã¾ãããæã
DBå±ã¨ãã¦ã¯PostgreSQLã«æ ¼ç´ããããã©ã³ã¶ã¯ã·ã§ãã«ãªãã¼ã¿ããåæç¨ã«Apache Arrowå½¢å¼ã¨ãã¦åãåºããã¨ã¨ã¦ãå¬ããããããªæã«ä½¿ãããã¼ã«ãpg2arrow
ãªã®ã§ãã
pg2arrow
ã¯ãPostgreSQLã«ã¯ã¨ãªãæãããã®ååãçµæãApache Arrowå½¢å¼ã®ãã¡ã¤ã«ã¨ãã¦ä¿åããããã®ãã¼ã«ã§ãPG-Stromã¨å梱ãã¦é
å¸ããã¦ãã¾ãï¼ã½ã¼ã¹ã³ã¼ãã®ããªãã®é¨åã Arrow_Fdw ã¨å
±ç¨ãã¦ããããã§ãï¼ãåãåããã«ä½¿ç¨ããSQLã³ãã³ãã¯ããã¼ãã«ãåç´ã«ãã³ãããã ãã§ã¯ãªããä¾ãã°WHEREå¥ã§æ½åºæ¡ä»¶ãæå®ããããè¤æ°ã®ãã¼ãã«ãJOINããçµæãå«ããã¨ãã£ãäºãå¯è½ã§ãã
GitHubã®ãã°ã確èªããã¨ãããæåã®ã³ãããã2019å¹´4æã§ãã®ã§ãããã4å¹´ã»ã©åã«è¨è¨ã»éçºãããã¼ã«ã¨ããäºã§ããã
ä¸æ¹ãpg2arrowã«ã¯ä¸¦ååä½ããµãã¼ããã¦ããªãã¨ããå¼±ç¹ãããã¾ããã
ãã®ããããã¼ãã«ããã³ãããPostgreSQLå´ã®CPUãããã¼ã¿ãåãåã£ã¦æ¸ãè¾¼ãpg2arrowå´ã®CPUããã¾ããã®éã®ãããã¯ã¼ã¯ããªã½ã¼ã¹ã«ä½è£ãããã«ãé¢ãããçµæ§ãªéã³ç¶æ
ã¨ãªã£ã¦ããããä¾ãã°1TBãããã®ãã¼ãã«ãArrowã«å¤æãã¦ãã³ããã¼ã¯ãåããï¼ãã¨ãã£ãå ´åã§ããå¤2:00é *1ã«ããã£ã¨pg2arrowãèµ°ããã¦ãç¿æçµæã確èªããã¨ããäºã常ã§ããã
æ®éã«èãã¦ã並ååããã°å¤§ããé«éåããã¯ãã§ãã
ãã¼ã¿ã®éè¤ãé²ãããã«
DBããèªã¿åºãããã¼ã¿ãå¥ã®å½¢å¼ã«ãã¦ä¿åããã ãã§ãã®ã§ãåççã«ã¯ãè¤æ°ã®ã»ãã·ã§ã³ãä½æãã¦ãããããã並åã«å¦çãè¡ãã°æ¸ã話ã§ãã
ä¾ãã°ããããã¼ãã«ããã«ãã³ããã¦Arrowã«å¤æããã¨ãã¦ãåDBã¯ã©ã¤ã¢ã³ããäºãã«éãªãåããããããæ¼ãã®ç¡ããããªæ¡ä»¶ã§åãåãããå®è¡ããã°è¯ãããã§ããå
¸åçã«ã¯ä½ãã®ãã£ã¼ã«ãã«ããã·ã¥é¢æ°ãä¸ããã¯ã©ã¤ã¢ã³ãæ°ã§å²ã£ãæã®å°ä½ãã¯ã©ã¤ã¢ã³ãçªå·ã«ä¸è´ãããã®ã ããåãåºãã°äºè¶³ãã¾ãã
ï¼ãã㯠PostgreSQL ã®ãã©ã¬ã«ã¯ã¨ãªã§ã使ããã¦ããã¢ã¤ãã¢ã§ãï¼
ããããèæ
®ãã¹ãç¹ã¯ããä¸ã¤ããã¾ãã
ä¾ãã°ãã¯ã©ã¤ã¢ã³ãAãDBã«æ¥ç¶ããå¾ãå¥ã®èª°ãã対象ãã¼ãã«ã100è¡æ´æ°ãããã®å¾ã«ã¯ã©ã¤ã¢ã³ãBãã¯ã©ã¤ã¢ã³ãCãDBã«æ¥ç¶ããã¨ãã¾ãã
ãã®æãæ´æ°ããã100è¡ã®ãã¡30è¡ã¯ãªã¬ã³ã¸è²ã®ã40è¡ã¯æ°´è²ã®ã30è¡ã¯ã¦ã°ã¤ã¹è²ã®è¡ã ã¨ãã¾ããããããã¨ãçµæã¨ãã¦çæãããArrowãã¡ã¤ã«ã¯ãé¨åçã«æ´æ°ããããä¸æ´åãªç¶æ
ã¨ãªã£ã¦ãã¾ãã¾ãã
ããã§ã¯å¤å
¸çãªãã¯ããã¯ã§ããスナップショット同期関数ã使ãã¾ãã
æåã®ã¯ã©ã¤ã¢ã³ãAãDBã«æ¥ç¶ãããã©ã³ã¶ã¯ã·ã§ã³ãéå§ããå¾ãpg_export_snapshot()
ãå¼ã³åºãã¾ãããã®é¢æ°ã¯ãã©ã³ã¶ã¯ã·ã§ã³ã®ã¹ãããã·ã§ããã«ç´ã¥ããã¦ãã¼ã¯ãªèå¥åãè¿ãã¾ããããããä»ã®ã»ãã·ã§ã³ã§ã¤ã³ãã¼ãããã¨ãç°ãªãã»ãã·ã§ã³ã§ãã£ã¦ãå
¨ãåããã¥ã¼ãåç¾ããäºãã§ãã¾ãã
ããã¯ã馴æã¿pg_dump
ã®ä¸¦åãã³ãã§ãç¨ãããã¦ãããå®éã®æåã¯ä»¥ä¸ã®ãããªã¤ã¡ã¼ã¸ã§ãã
ã¯ã©ã¤ã¢ã³ãAï¼æåã«æ¥ç¶ããï¼
ssbm=# BEGIN READ ONLY; BEGIN ssbm=*# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; SET ssbm=*# SELECT pg_catalog.pg_export_snapshot(); pg_export_snapshot --------------------- 0000000B-000000B0-1 (1 row)
ã¯ã©ã¤ã¢ã³ãBãCãï¼ï¼ï¼ï¼ã¯ã¼ã«ã¼ï¼
ssbm=# BEGIN READ ONLY; BEGIN ssbm=*# SET TRANSACTION ISOLATION LEVEL REPEATABLE READ; SET ssbm=*# SET TRANSACTION SNAPSHOT '0000000B-000000B0-1'; SET
pg2arrowã並åã«èµ·åãã¦ã¿ãã
pg2arrowã®ä¸¦åã¢ã¼ãããµãã¼ãããããã«è¿½å ãããã®ã¯ã-n|--num-workers=N_WORKERS
ãªãã·ã§ã³ã¨ã-k|--parallel-keys=PARALLEL_KEYS
ãªãã·ã§ã³ã®ï¼ã¤ã§ãã
ãããã¯äºãã«æä»çã§ãåããã¼ãã«ãã¹ãã£ã³ããéã«èªã¿åºããã¼ã¿ãéãªããªãããæ¤ç´¢æ¡ä»¶ã調æ´ããããã®æ¹æ³ãè¥å¹²ç°ãªã£ã¦ãã¾ãã
ããããå
é¨çãªä¸¦åå¦çï¼ã¯ã¼ã«ã¼ã¹ã¬ããã®æåï¼ã¯å¤ããã¾ããã®ã§ã対象ã¨ãªããã¼ãã«ã®è¨è¨ããã¼ã¿ã®ç¹æ§ã«ãã£ã¦ä½¿ãåãã¦ãã ããã
$ ./pg2arrow --help Usage: pg2arrow [OPTION] [database] [username] General options: -d, --dbname=DBNAME Database name to connect to -c, --command=COMMAND SQL command to run -t, --table=TABLENAME Equivalent to '-c SELECT * FROM TABLENAME' (-c and -t are exclusive, either of them must be given) -n, --num-workers=N_WORKERS Enables parallel dump mode. It requires the SQL command contains $(WORKER_ID) and $(N_WORKERS), to be replaced by the numeric worker-id and number of workers. -k, --parallel-keys=PARALLEL_KEYS Enables yet another parallel dump. It requires the SQL command contains $(PARALLEL_KEY) to be replaced by the comma separated token in the PARALLEL_KEYS. (-n and -k are exclusive, either of them can be give if parallel dump. It is user's responsibility to avoid data duplication.) --inner-join=SUB_COMMAND --outer-join=SUB_COMMAND -o, --output=FILENAME result file in Apache Arrow format --append=FILENAME result Apache Arrow file to be appended (--output and --append are exclusive. If neither of them are given, it creates a temporary file.) -S, --stat[=COLUMNS] embeds min/max statistics for each record batch COLUMNS is a comma-separated list of the target columns if partially enabled. Arrow format options: -s, --segment-size=SIZE size of record batch for each Connection options: -h, --host=HOSTNAME database server host -p, --port=PORT database server port -u, --user=USERNAME database user name -w, --no-password never prompt for password -W, --password force password prompt Other options: --dump=FILENAME dump information of arrow file --progress shows progress of the job --set=NAME:VALUE config option to set before SQL execution --help shows this message Report bugs to <[email protected]>.
-n|--num-workers=N_WORKERS
ãªãã·ã§ã³
-n
ãªãã·ã§ã³ã¯ã·ã³ãã«ã«ã¯ã¼ã«ã¼æ°ãæå®ãã¾ãã
ãã®æãArrowãã¡ã¤ã«ã®å
ã¨ãªãååãSQLæã«ã¯$(WORKER_ID)
ã¨$(N_WORKERS)
ã¨ãããã¯ããåãè¾¼ãäºãã§ãããã®é¨åã¯0ããå§ã¾ãã¦ãã¼ã¯ãªã¯ã¼ã«ã¼IDã¨ããªãã·ã§ã³ã§æå®ãã並å度ã«ããããç½®ãæãããã¾ãã
ã¤ã¾ãã-c
ãªãã·ã§ã³ã§æå®ããSQLããã®ããã«å¤ããã°è¯ãããã§ãã
ãªãªã¸ãã«ï¼
SELECT * FROM lineorder
ä¿®æ£å¾ï¼
SELECT * FROM lineorder WHERE lo_orderkey % $(N_WORKERS) = $(WORKER_ID)
-k|--parallel-keys=PARALLEL_KEYS
ãªãã·ã§ã³
-k
ãªãã·ã§ã³ã¯PARALLEL_KEYSã«æå®ããã«ã³ãåºåãã®ãã¼ã¯ã³æ¯ã«ã¯ã¼ã«ã¼ã¹ã¬ãããèµ·åãããã®ãã¼ã¯ã³ãããããSQLã³ãã³ãä¸ã®$(PARALLEL_KEY)
ã«ç½®ãæãã¾ãã
ä¾ã使ã£ã¦èª¬æãã¾ãããã以ä¸ã®ããã«ãã¼ãã£ã·ã§ã³åããããã¼ãã«ãåå¨ãããã¼ãã£ã·ã§ã³ã®åãã¼ãã«æ¯ã«ã¯ã¼ã«ã¼ã¹ã¬ãããèµ·åãã¦ä¸¦åã«ãã¼ãã«ãèªã¿åºãããå ´åã-k
ãªãã·ã§ã³ã次ã®ããã«ä½¿ãã¾ãã
ssbm=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Access method | Size | Description --------+------------------+-------------------+--------+-------------+---------------+------------+------------- public | lineorder | partitioned table | kaigai | permanent | | 0 bytes | public | lineorder__p1992 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1993 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1994 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1995 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1996 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1997 | table | kaigai | permanent | heap | 13 GB | public | lineorder__p1998 | table | kaigai | permanent | heap | 7894 MB | public | lineorder__p1999 | table | kaigai | permanent | heap | 8192 bytes | public | lineorder_unsort | table | kaigai | permanent | heap | 87 GB | (10 rows)
以ä¸ã®ä¾ã§ã¯ããã¼ãã£ã·ã§ã³åãã¼ãã«ã®ãµãã£ãã¯ã¹ã§ããå¹´å·é¨åã$(PARALLEL_KEY)
ã«ãã£ã¦ç½®ãæãã¾ãã
ããããã¨ã-k
ãªãã·ã§ã³ã§æå®ããã«ã³ãåºåãã®ãã¼å¤ãã¨ã«ããããã¯ã¼ã«ã¼ãçæãããçµæã¨ãã¦ããããåã
ãã¼ãã£ã·ã§ã³åãã¼ãã«ã®æ¡ä»¶ãªãã¹ãã£ã³ãå®è¡ãã¦ãã¾ãã
$ pg2arrow -d ssbm -c 'SELECT * FROM lineorder__p$(PARALLEL_KEY)' -o /opt/hoge/f_lineorder.arrow -k=1992,1993,1994,1995,1996,1997,1998 --progress worker:1 SQL=[SELECT * FROM lineorder__p1993] worker:3 SQL=[SELECT * FROM lineorder__p1995] worker:2 SQL=[SELECT * FROM lineorder__p1994] worker:4 SQL=[SELECT * FROM lineorder__p1996] worker:5 SQL=[SELECT * FROM lineorder__p1997] worker:6 SQL=[SELECT * FROM lineorder__p1998] 2024-03-31 20:33:58 RecordBatch[0]: offset=1648 length=268436376 (meta=920, body=268435456) nitems=1303083 by worker:0 2024-03-31 20:33:59 RecordBatch[1]: offset=268438024 length=268436376 (meta=920, body=268435456) nitems=1303083 by worker:3 : : worker:0 merged pending results by worker:4 2024-03-31 20:38:15 RecordBatch[460]: offset=123480734608 length=127664216 (meta=920, body=127663296) nitems=619722 by worker:0 Total elapsed time: 00:04:26
ã¯ã¼ã«ã¼ã®åä½ã«ã¤ãã¦
ããã§ã大éæãªå¦çã®æµãã«ã触ãã¦ããäºã«ãã¾ãã
Pg2Arrowã並åã¢ã¼ãã§èµ·åããã¨ãmainã¹ã¬ããã§ããworker-0ãã¹ãã¼ãå®ç¾©ãä½æãããªã©ã®åæè¨å®ãè¡ãããã®å¾ãä»ã®ã¯ã¼ã«ã¼ãé 次起åãã¦ããã¾ãã
åã¯ã¼ã«ã¼ã¯ããããPostgreSQLã«æ¥ç¶ããããããç´256MBã®ãããã¡((-s
ãªãã·ã§ã³ã§å¤æ´å¯))ãåã¾ããã³ã«åºåå
ã®Arrowãã¡ã¤ã«ã¸ã¨æ¸ãè¾¼ã¿ãè¡ãã¾ãã
Arrowãã¡ã¤ã«ã¯å
é¨ãRecord Batchã¨å¼ã°ãããããã¯ã«åå²ããã¦ããããã¡ã¤ã«ãã¤ã³ã¿ãé²ããé¨åããã¢ãããã¯ã«è¡ã£ã¦ãã¾ãã°ã以éã®æ¸è¾¼ã¿å¦çã¯ãã«ãã¹ã¬ãããè¤æ°ã®å¥åã®é åã«ç¬ç«ãã¦æ¸ãè¾¼ãäºãã§ãã¾ãããã®ãããã·ã¼ã±ã³ã·ã£ã«ã«å®è¡ããªããã°ãªããªãã¯ãªãã£ã«ã«ã»ã¯ã·ã§ã³ã¯æå°éã«æãããã¦ãã¾ãã
ã¾ããã¯ã¨ãªã®æå¾ã¾ã§èªã¿åºããã«ãé¢ããã256MBã®ãããã¡ãåããããªãã£ãå ´åã¯ãé£æ¥ã¹ã¬ããã®ãããã¡ã«ãã¼ã¸ãããæçµçã«ã¯ worker-0 ã®ãããã¡ã«ãã¼ã¸ããã¾ãã®ã§ã並åã¯ã¼ã«ã¼ã®æ°ãå¢ãããã¨ãã¦ã Record Batch ã®å¤§ããã極端ã«å°ããï¼= PG-Stromã§ã®å®è¡å¹çãä½ä¸ããï¼ãã¼ã¿ãä½ãããããã§ã¯ããã¾ããã
çæãã Arrow ãã¡ã¤ã«ãFDWçµç±ã§åç §ãã
ããã§ã¯ãçæãã Arrow ãã¡ã¤ã«ãåç
§ãã¦ã¿ãäºã«ãã¾ãã
åã
ã®ãã¼ãã«å®ç¾©ï¼ã«ã©ã åããã¼ã¿åï¼ãæå®ããã®ã¯é¢åã§ãã®ã§ãIMPORT FOREIGN SCHEMA
æã使ç¨ããã®ããå§ãã§ãã
ssbm=# import foreign schema f_lineorder from server arrow_fdw into public options (file '/opt/hoge/f_lineorder.arrow'); IMPORT FOREIGN SCHEMA ssbm=# \d f_lineorder Foreign table "public.f_lineorder" Column | Type | Collation | Nullable | Default | FDW options --------------------+---------------+-----------+----------+---------+------------- lo_orderkey | numeric | | | | lo_linenumber | integer | | | | lo_custkey | numeric | | | | lo_partkey | integer | | | | lo_suppkey | numeric | | | | lo_orderdate | integer | | | | lo_orderpriority | character(15) | | | | lo_shippriority | character(1) | | | | lo_quantity | numeric | | | | lo_extendedprice | numeric | | | | lo_ordertotalprice | numeric | | | | lo_discount | numeric | | | | lo_revenue | numeric | | | | lo_supplycost | numeric | | | | lo_tax | numeric | | | | lo_commit_date | character(8) | | | | lo_shipmode | character(10) | | | | Server: arrow_fdw FDW options: (file '/opt/hoge/f_lineorder.arrow')
試ãã«ãSSBMã®Q1_2ãèµ°ããã¦ã¿ã¾ãã
ssbm=# select sum(lo_extendedprice*lo_discount) as revenue from f_lineorder, date1 where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; revenue --------------- 9624332170119 (1 row)
ãã¡ãããArrowãã¡ã¤ã«ã®å
ã¨ãªã£ãlineorder
ãã¼ãã«ã使ã£ã¦ãåãå¤ãè¿ã£ã¦ãã¾ãã
ssbm=# select sum(lo_extendedprice*lo_discount) as revenue from lineorder, date1 where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; revenue --------------- 9624332170119 (1 row)
並åPg2Arrowã®ããã©ã¼ãã³ã¹
æå¾ã«ããã¡ããæ°ã«ãªã並åPg2Arrowã®ããã©ã¼ãã³ã¹ãè¦ã¦ã¿ãäºã«ãã¾ãã
測å®ç°å¢ã®ã¹ããã¯ã¯ä»¥ä¸ã®éãã
- CPU: AMD EPYC 7402P (24C/2.8GHz) x1
- RAM: 16GB DDR4-3200 (ECC) x8 [total: 128GB]
- SSD: Intel SSD D7-P5510 [U.2/3.84TB] x4 (md-raid0)
- OS: Red Hat Enterprise Linux 8.7
- DB: PostgreSQL 16.2
以ä¸ã®ãªãã·ã§ã³ã§ pg2arrow ãèµ·åãã87GBã®lineorderãã¼ãã«ãå ¨ã¦Arrowå½¢å¼ã«å¤æããã¾ã§ã®æéãè¨æ¸¬ãã¾ããã
$ pg2arrow -d ssbm -c 'SELECT * FROM lineorder_unsort WHERE lo_orderkey % $(N_WORKERS) = $(WORKER_ID)' -o /opt/hoge/f_lineorder.arrow -n N_WORKERS --progress
ã¾ãç·ã®ç¸¦æ£ã並åå®è¡ãªãã®pg2arrowã§87GBã®lineorderãã¼ãã«ãArrowã«å¤æããæã®ãã®ã§ã1,520ç§ããããã25åã¡ããè¦ãã¦ãã¾ãã
ããã-n
ãªãã·ã§ã³ã§ä¸¦åæ°ãå¢ãããå ´åãããã400ç§ï¼6å40ç§ï¼ãå°ãä¸åã£ã辺ãã§é æã¡ã¨ãªã£ã¦ããããã§ãã
ããã¯æ¤ç´¢æ¡ä»¶ï¼lo_orderkey % $(N_WORKERS) = $(WORKER_ID)
ï¼ã§éè¤è¡ãæé¤ãã¦ãããããã¯ã©ã¤ã¢ã³ãæ°ãå¢å ããã«ãããã£ã¦ãåæã« lineorder ãã¼ãã«ãéè¤ãã¦ã¹ãã£ã³ããªããã°ãªããªããªãããã¼ã¿ã«ã®ãããã¡ããã®èªåºãè² è·ãå¢ãã¦ãã¾ã£ãããã¨è¨ããã§ãããã
ä¸æ¹ããã¼ãã£ã·ã§ã³åãã¼ãã«ã®ååãä¸é¨ã-k
ãªãã·ã§ã³ã§ç½®ãæããããããã®ã¯ã©ã¤ã¢ã³ããå®å
¨ã«ç¬ç«ããé åãã¹ãã£ã³ããäºã«ãªã£ããã¿ã¼ã³ã§ã¯ã並å度ã7ã§ããï¼ãããlineorder__p1998
ã®å®¹éã¯ä»ã®åãã¼ãã«ã®ååãªã®ã§ãå®è³ª6.5並åï¼ã«ãé¢ãããã266ç§ã¨ä¸¦åå®è¡ãªãã®ãã¿ã¼ã³ã«æ¯ã¹ã¦5.7åã®å®è¡æéãè¨é²ãã¦ãã¾ãã
ãã®äºããã並åPg2Arrowã®ããã©ã¼ãã³ã¹åä¸ã享åããã«ã¯ã以ä¸ã®ç¹ã«æ³¨æãæãå¿ è¦ãããã§ãããã
- åç´ã«ä¸¦å度ãå¢ããã ãã§ãå¹æã¯ããããåãé åãéè¤ãã¦ã¹ãã£ã³ããå¦çãå¢ããã¨ãå¹æã¯éå®çã«ãªããã¡ã
- PostgreSQLãã¹ãã£ã³ããéãæ¸ãããããªæ¤ç´¢æ¡ä»¶ã®ä¸ãæ¹ãæã¾ããããã¼ãã£ã·ã§ã³ãªã©ã§åå²ããã¦ãããçæ³çã
*1:ãã£ã¨æ©ãå¯ãï½
PG-Strom v5.0
ããã¶ããç¡æ²æ±°ã®ããã°è¨äºã¨ãªãã¾ããã
ä»åã¯ãè¨è¨ãä¸æ°ãã¦éããé å¼·ã«ãªã£ã PG-Strom v5.0 ããç´¹ä»ãã¾ãã
ãªãåè¨è¨ãå¿ è¦ã ã£ãã®ãï¼
åãã¼ã¸ã§ã³ã® PG-Strom v3.x ã·ãªã¼ãºã®åºæ¬çãªè¨è¨ã¯ã2018å¹´ã®PG-Strom v2.0ã®é ãã大ããå¤ãã£ã¦ãã¾ããã
å½æã®ææ°GPUã¢ãã«ã¯ Volta ä¸ä»£ï¼TESLA V100ï¼ã§ãCUDAã®ãã¼ã¸ã§ã³ã¯9.2ã§ããããããªãã®å¤§æã¨ããäºã¯ãåããé ããã¨æãã¾ãã
ãã®é ãPG-Stromã®éçºã«ããã¦æåªå
ãã¹ã課é¡ã¯ãå
ãå®ç¨ã¨ãªããã¼ã¸ã§ã³ããªãªã¼ã¹ããäºã§ãããï¼â» HeteroDB社ã®åµæ¥ã¯2017å¹´7æã§ãï¼
ã¯ã¨ãªã®å¦çé度ãé«éåããäºã¯å½ç¶ãªã®ã§ããããã以ä¸ã«ãã¾ã PG-Stromã®å
é¨ã¤ã³ãã©ãååã«æ¯ãã¦ããªãä¸ã§ãã¯ã©ãã·ã¥ããã«èµ°ãåãäºãããã°ããã£ãã¨ãã¦ã容æã«åå ç®æãç¹å®ã§ããäºãåªå
ã§ãã£ãã®ã§ããã¾ããGPUå´ã§SQLãå®è¡ããããã¤ã¹ã³ã¼ãã«ãã¦ããæ§ã
ãªå®è£
æ¹å¼ã«ãã©ã¤ãã¦ãã®ä¸ã§æè¯ãé¸æããã¨ããããããå
ãã¯åºãã¨ãåè² ã§ãåãã¢ãããåªå
ããã¨ããç¶æ³ã§ããã
ãã ãã®å¾ãæ°å¹´ãçµã¦ãæããã«ãªã£ã¦ããåé¡ãè¤æ°ããããããã¯ã©ããã®ã¿ã¤ãã³ã°ã§å¤§è¦æ¨¡ãªãªãã¡ã¯ã¿ãªã³ã°ãè¡ããããå¾ãªãã¨èãã¦ãã¾ãããä¾ãã°ä»¥ä¸ã®ãããªåé¡ç¹ã§ãã
åé¡â ï¼CUDA Contextãæ¶è²»ãããªã½ã¼ã¹
ãåç¥ã®ããã«PostgreSQLã¯ãã«ãããã»ã¹ã§åä½ãã¾ããã¯ã©ã¤ã¢ã³ãããã®æ¥ç¶ãçºçãããã³ã«ããã¯ã¨ã³ãããã»ã¹ãfork(2)ãããã®ããã»ã¹ãSQLå¦çã®å¤§åãæ
ãã¾ããã¾ãããµã¤ãºã®å¤§ããªãã¼ãã«ãã¹ãã£ã³ããæãªã©ãä¸æçãªã¯ã¼ã«ã¼ããã»ã¹ãèµ·åãã¦SQLå¦çã並åã«å®è¡ããäºãããã¾ãã
PG-Stromã®è¿½å ããå®è¡è¨ç»ï¼GpuScanãGpuJoinãGpuPreAggï¼ãæ¡ç¨ãããå ´åããããã¯GPUã使ç¨ããããã«CUDA Contextã¨ãããã®ãä½æãã¾ãããããCUDA Contextãä½æããã¨ãããèªä½ãGPUã®ãªã½ã¼ã¹ï¼ããã¤ã¹ã¡ã¢ãªæ°ç¾MBï½ï¼ãæ¶è²»ãã¦ãã¾ããããPostgreSQLã¸ã®åææ¥ç¶æ°ãå¢å ããã¨ãã¯ã¼ãã³ã°ã«å©ç¨ã§ããã¡ã¢ãªãã»ã¨ãã©æ®ããªãäºã«ãªãã¾ãã
åé¡â¡ï¼è¤éãããGPUã³ã¼ãèªåçæãã¸ãã¯
2012å¹´ã«PG-Stromã®æåã®ãããã¿ã¤ããä½æããæãããPG-Stromã¯SQLã¨ãã¦ä¸ããããScanæ¡ä»¶å¼ï¼WHEREå¥ï¼ãJoinçµåæ¡ä»¶ï¼ONï½å¥ï¼ããèªåçã«CUDA C++ã®ã½ã¼ã¹ã³ã¼ããçæãããããå®è¡æã³ã³ãã¤ã«ãã¦GPUç¨ã®ãã¤ãã£ããã¤ããªãçæãã¦ãã¾ããã
ããããæ§ã
ãªç¶æ³ã«å¯¾å¿ãã¦SQLããCUDA C++ç¨ã®ã½ã¼ã¹ã³ã¼ããçæãããã¸ãã¯ã¯é常ã«è¤éã§ãä¾ãã°ãGpuJoinç¨ã®ã½ã¼ã¹ã³ã¼ãçæãå«ãsrc/gpujoin.c
ã¯8500è¡è¿ãã®è¦æ¨¡ããããã½ã¼ã¹ã³ã¼ããä¿å®ããä¸ã§ããªãæ©ã¾ããåé¡ãæ±ãã¦ãã¾ãããï¼è¦ã¯ã¹ãã²ããã£ï¼
åé¡â¢ï¼æä½éå¿ è¦ãª300ms
PG-Strom v3.x以åã¯åããã¯ã¨ã³ãããã»ã¹ãCUDA Contextãä½æããGPUã®ã¡ã¢ãªå²ãå½ã¦ãã¿ã¹ã¯ï¼GPU Kernelï¼ã®æå
¥ãè¡ã£ã¦ãã¾ããã
ãã®CUDA Contextã®ä½æã«ã¯å®ã¯å°ãæéãããããããã100ï½150msç¨åº¦ã®é
延ãä¸å¯é¿ã§ãããã¾ããCUDA C++ã®ã½ã¼ã¹ã³ã¼ããä½æãããããGPUåãã®ãã¤ããªã«ã³ã³ãã¤ã«ããéã«ããæä½ã§200msç¨åº¦ã®æéãããã£ã¦ãã¾ããï¼ãã¡ããå¦çã®è¤éãã«ããã¾ãï¼ã
ä½åç§ããããå¦çãªãã¨ããããæ°åmsç¨åº¦ã®å¿çé度ãè¦æ±ãããã¯ã¨ãªã§ãã®ããã«ãã£ã¯å²ã¨å³ãããã®ãããã¾ãã
åé¡â£ï¼NVIDIA GPU以å¤ã¸ã®æ¡å¼µæ§
ããã¯ç¾æç¹ã§ã¯å¯è½æ§ã®è©±ã«ããã¾ããããä¾ãã°ãPG-Stromã®ä»çµã¿ãComputational Storage Drive (CSD)ã«å®è£ ããäºãã§ããã°ãã¹ãã¬ã¼ã¸å´ã®ããã»ããµã§ScanãJoinãGroupByã¨ãã£ãSQLå¦çã®ä¸é¨ãå®è¡ããããProjectionå¦çãè¡ãäºã§è¢«åç §åã®ã¿ããã¹ãã«è¿ãã¨ããåæåãã¼ã¿æ§é ã«è¿ãäºãã§ããã¯ãã§ããããããPG-StromãCUDA C++ãåæã¨ããã½ã¼ã¹ã³ã¼ãçæã«æ³¨åãã¦ããå ´åãCSDã§å®è¡å¯è½ãªå¦çã®èªåçæé¨åãäºéã«æã¤äºã¨ãªããéçºå¹ç以ä¸ã«ã½ããã¦ã§ã¢å質çã«æ©ã¾ããåé¡ãæ±ããäºã¨ãªãããã§ãã
PG-Strom v5.0ã®ã¢ã¼ããã¯ãã£
ãããã®åé¡ã解決ãããããPG-Strom v5.0ã§ã¯ã¾ãPostgreSQLã®åããã¯ã¨ã³ãããã»ã¹ãCUDA Contextãæã¤æ§é ãå»æ¢ãã¾ããã
代ããã«ã常é§ããã»ã¹ã§ããPG-Strom GPU-Serviceã ããå¯ä¸GPUã¨ç¸å¯¾ãã¦ãªã½ã¼ã¹ã®ç®¡çãã¿ã¹ã¯ã®æå
¥ãè¡ãã¾ãã
PostgreSQLã®ããã¯ã¨ã³ãããã»ã¹ï¼ã§åä½ããPG-Stromã®CustomScanãã³ãã©ï¼ã¯ãããã»ã¹ééä¿¡ãéãã¦GPU-Serviceã«ãªã¯ã¨ã¹ããéåºãããã®å¦ççµæãå¾
ã¤ã ãã§ããGPU-Serviceã¯ãã«ãã¹ã¬ããã§åä½ããpg_strom.max_async_tasks
ãä¸éã¨ãã¦ä¸¦åã«ã¿ã¹ã¯ãå¦çããäºãã§ãã¾ãã
以ä¸ã®æ¨¡å¼å³ãã覧ãã ããã
PostgreSQLããã¯ã¨ã³ãããã»ã¹ãåã
ã«GPUã管çããå ´åã¨ãPG-Strom GPU-Serviceã ããGPUã管çããä»ã®ããã»ã¹ã¯GPU-Serviceã«ãªã¯ã¨ã¹ããéåºããã¢ãã«ã§ã¯ãCUDAã«ãã£ã¦æ¶è²»ãããGPUããã¤ã¹ã¡ã¢ãªã®éã段éãï¼ç¹ã«ã¯ã©ã¤ã¢ã³ãæ°ãå¤ãå ´åï¼ã§ããäºãããåããã¨æãã¾ãã
ããã«ãã®æ§é ã¯ãCUDA Contextãåæåããã¨ãï¼cuCtxCreate()
ï¼ã®100msï½150msé
延ã®åé¡ã解決ãã¾ãããªããªããCUDA Contextã¯GPU-Serviceã®èµ·åæã«æ¢ã«ä½ææ¸ã¿ã§ãããã«æ¯ã¹ãã°UNIXãã¡ã¤ã³ã½ã±ãããéãã¦GPU-Serviceã¨ã®éã«ã³ãã¯ã·ã§ã³ã確ç«ããå¦çæéãªã©å¾®ã
ãããã®ã«ãããããªãããã§ãã
CUDA C++ãã¤ãã£ãã³ã¼ããããçä¼¼å½ä»¤ã³ã¼ãã¸
ããä¸ã¤ãããã¾ã§SQLã¯ã¼ã¯ãã¼ããGPUã§å®è¡ããããã«CUDA C++ã½ã¼ã¹ã³ã¼ããçæãããããå®è¡æã³ã³ãã¤ã©ï¼NVRTCï¼NVIDIA Run Time Compilerï¼ãæé©åãå®è¡æãã¤ããªã®çæã¨ããã¹ããããè¸ãã§ãã¾ããããGpuJoinãªã©ã§CUDA C++ã½ã¼ã¹ã³ã¼ããçæããããã®ãã¸ãã¯ãè¤éã«ãªããããäºãããPG-Strom v5.0ã§ã¯çä¼¼å½ä»¤ã³ã¼ããçæããããã«ãªãã¾ããã
以ä¸ã®å®è¡è¨ç»ãã覧ãã ããã
ããã¯æ§ã
ãªæ¡ä»¶ã§çµãè¾¼ã¿ãè¡ã£ãdate1
ãã¼ãã«ã¨lineorder
ãã¼ãã«ãJOINããlo_extendedprice*lo_discount
ã®çµæãéè¨ããã¯ã¨ãªã®å®è¡è¨ç»ã§ãã
VERBOSEãªãã·ã§ã³ãä»å ããã¨ãGpuPreAggãã©ã³ã®ä¸ã®æ¹ã«ãxxx OpCodeãã¨ãããã®ãåºåããã¦ãã¾ããï¼â» VERBOSEãªãã·ã§ã³æãã ã¨ããã¾ã§è³ãããªEXPLAINåºåã«ã¯ãªãã¾ããï¼
ãã®OpCodeã¨ããã®ã¯ãGPUä¸ã§å®è¡ããæ¼ç®åãååç
§ã®æé ããã¤ããªå½¢å¼ã§ãããã³ã°ãããã®ã§ãEXPLAINã®åºåã§ã¯å¯èªãªå½¢å¼ã«ç´ãããã®ãåºåãã¦ãã¾ãã
å¾æ¥ã¯ãããã«çæããCUDA C++ã®ã½ã¼ã¹ã³ã¼ãã®ãã¡ã¤ã«åãåºåããã¦ãã¾ããããããNVRTCã«æ¸¡ãã¦å®è¡æã³ã³ãã¤ã«ããGPUç¨ã®ãã¤ããªãçæããããã§ãã
=# explain verbose select sum(lo_extendedprice*lo_discount) as revenue from lineorder,date1 where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=12713126.00..12713126.01 rows=1 width=32) Output: pgstrom.sum_fp_num((pgstrom.psum(((lineorder.lo_extendedprice * lineorder.lo_discount))::double precision))) -> Custom Scan (GpuPreAgg) on public.lineorder (cost=12713125.99..12713126.00 rows=1 width=32) Output: (pgstrom.psum(((lineorder.lo_extendedprice * lineorder.lo_discount))::double precision)) GPU Projection: pgstrom.psum(((lineorder.lo_extendedprice * lineorder.lo_discount))::double precision) GPU Scan Quals: ((lineorder.lo_discount >= '1'::numeric) AND (lineorder.lo_discount <= '3'::numeric) AND (lineorder.lo_quantity < '25'::numeric)) [rows: 2400065000 -> 315657500] GPU Join Quals [1]: (date1.d_datekey = lineorder.lo_orderdate) ... [nrows: 315657500 -> 45076290] GPU Outer Hash [1]: lineorder.lo_orderdate GPU Inner Hash [1]: date1.d_datekey GPU-Direct SQL: enabled (GPU-0) KVars-Slot: <slot=0, type='numeric', expr='lineorder.lo_discount'>, <slot=1, type='numeric', expr='lineorder.lo_quantity'>, <slot=2, type='float8', expr='(lineorder.lo_extendedprice * lineorder.lo_discount)'>, <slot=3, type='numeric', expr='lineorder.lo_extendedprice'>, <slot=4, type='int4', expr='date1.d_datekey'>, <slot=5, type='int4', expr='lineorder.lo_orderdate'> KVecs-Buffer: nbytes: 192512, ndims: 3, items=[kvec0=<0x0000-dfff, type='numeric', expr='lo_discount'>, kvec1=<0xe000-1bfff, type='numeric', expr='lo_quantity'>, kvec2=<0x1c000-29fff, type='numeric', expr='lo_extendedprice'>, kvec3=<0x2a000-2c7ff, type='int4', expr='d_datekey'>, kvec4=<0x2c800-2efff, type='int4', expr='lo_orderdate'>] LoadVars OpCode: {Packed items[0]={LoadVars(depth=0): kvars=[<slot=5, type='int4' resno=6(lo_orderdate)>, <slot=1, type='numeric' resno=9(lo_quantity)>, <slot=3, type='numeric' resno=10(lo_extendedprice)>, <slot=0, type='numeric' resno=12(lo_discount)>]}, items[1]={LoadVars(depth=1): kvars=[<slot=4, type='int4' resno=1(d_datekey)>]}} MoveVars OpCode: {Packed items[0]={MoveVars(depth=0): items=[<slot=0, offset=0x0000-dfff, type='numeric', expr='lo_discount'>, <slot=3, offset=0x1c000-29fff, type='numeric', expr='lo_extendedprice'>, <slot=5, offset=0x2c800-2efff, type='int4', expr='lo_orderdate'>]}}, items[1]={MoveVars(depth=1): items=[<offset=0x0000-dfff, type='numeric', expr='lo_discount'>, <offset=0x1c000-29fff, type='numeric', expr='lo_extendedprice'>]}}} Scan Quals OpCode: {Bool::AND args=[{Func(bool)::numeric_ge args=[{Var(numeric): slot=0, expr='lo_discount'}, {Const(numeric): value='1'}]}, {Func(bool)::numeric_le args=[{Var(numeric): slot=0, expr='lo_discount'}, {Const(numeric): value='3'}]}, {Func(bool)::numeric_lt args=[{Var(numeric): slot=1, expr='lo_quantity'}, {Const(numeric): value='25'}]}]} Join Quals OpCode: {Packed items[1]={JoinQuals: {Func(bool)::int4eq args=[{Var(int4): slot=4, expr='d_datekey'}, {Var(int4): kvec=0x2c800-2f000, expr='lo_orderdate'}]}}} Join HashValue OpCode: {Packed items[1]={HashValue arg={Var(int4): kvec=0x2c800-2f000, expr='lo_orderdate'}}} Partial Aggregation OpCode: {AggFuncs <psum::fp[slot=2, expr='(lo_extendedprice * lo_discount)']> arg={SaveExpr: <slot=2, type='float8'> arg={Func(float8)::float8 arg={Func(numeric)::numeric_mul args=[{Var(numeric): kvec=0x1c000- 2a000, expr='lo_extendedprice'}, {Var(numeric): kvec=0x0000-e000, expr='lo_discount'}]}}}} Partial Function BufSz: 16 -> Seq Scan on public.date1 (cost=0.00..78.95 rows=365 width=4) Output: date1.d_datekey Filter: (date1.d_year = 1993) (22 rows)
ããããPG-Strom v3.xãèªåçæããCUDA C++ã³ã¼ãã¯ãã©ã¤ãã©ãªé¨åãäºããã«ããã¦ããæ¹å¼ã«åãæ¿ããäºãããå®è³ªçã«ã¯SQLæãä¸ãããããã³ã«å¤åããå¶å¾¡æ§é ãã½ã¼ã¹ã³ã¼ãèªåçæã¨ããå½¢ã§å¸åãã¦ããã¨ãè¨ãã¾ãã
ããã§ããã®å¶å¾¡æ§é èªä½ãGPUå´ã¸æã¡è¾¼ãã°ï¼ï¼æ¡ä»¶åå²ã«ç¸å½ããé¨åã¯äºãé¢æ°ãã¤ã³ã¿ãã»ãããããªã©ãã¦å®è¡ã³ã¹ããæããï¼ãå®è¡æã³ã³ãã¤ã«ã®æéãçããã¨èããããã§ãã
å®éã«ãæ¯è¼çè¦æ¨¡ã®å°ããªãã¼ãã«ï¼800ä¸ä»¶ï¼ã®éè¨å¦çã PG-Strom v3.5 㨠PG-Strom v5.0 ã§æ¯è¼ãã¦ã¿ã¾ãã
- PG-Strom v3.5
ssbm=# select count(*) from lineorder_8m where lo_orderpriority = '2-HIGH'; count --------- 1604233 (1 row) Time: 1132.471 ms (00:01.132)
- PG-Strom v5.0
=# select count(*) from lineorder_8m where lo_orderpriority = '2-HIGH'; count --------- 1604233 (1 row) Time: 114.781 ms
å ¨ä½ã§æ°åç§ï½ãè¦ããã¯ã¨ãªã§ããã°åæã»ããã¢ããã®æéå·®ã¯å¤§ããå½±é¿ãã¾ããããæ¯è¼çå°ããªãã¼ã¿ã»ããã§ããã°ãã¯ã¨ãªã®å®è¡æéãé å¼µã£ã¦éããã¦ãããä¸å®ã®é度以ä¸ã«ã¯é«éåã§ããªãã¨ããåé¡ãããã¾ãããããããv5.0ã§ã¯GPU-Serviceãæ¢ã«CUDA Contextãåæåãã¦ããä¸ãã³ã¼ãã®ã³ã³ãã¤ã«ï¼æé©åãä¸è¦ã§ãããããæ¯è¼çå°ããªãã¼ãã«ã§ãã£ã¦ãGPUå¦çã®æ©æµãå¾ãããã¨ããã¡ãªãããããã¾ãã
GPU-Serviceã®ãã«ãã¹ã¬ããåã¨ããã©ã¼ãã³ã¹æ¹å
v5.0ã§ã®å¤§è¦æ¨¡ãªãªãã¡ã¯ã¿ãªã³ã°ã«ãã£ã¦ãGPUã管çããã®ã¯GPU-Serviceããã»ã¹ä¸åã ãã«çµãããããã«GPU-Serviceã¯ãã«ãã¹ã¬ããã«ããPostgreSQLããã¯ã¨ã³ãããã»ã¹ããã®ãªã¯ã¨ã¹ãã次ã ã¨æãã¦ããã¾ããããã¯GPUãCUDAã®ã¬ã¤ã¤ããè¦ãã¨å¤§ããªå¤åã§ã並åã«åä½ããPostgreSQLããã¯ã¨ã³ãããã®è¦æ±ãå¦çãããã³ã«CUDA Contextãåãæ¿ããå¿ è¦ããªããªããGPU Kernelãèµ·åããéã®ã¹ã«ã¼ããããåä¸ãã¾ãã
ããã¯å¦çé度ãã·ãã¢ãªç¶æ³ã§å¼ã£ããã£ã¦ããäºããããä¾ãã°ããã¯åãæ§æã®ãµã¼ã*1ä¸ã§Star Schema Benchmark (SSBM)ãå®è¡ããå ´åããã®SSDã®SeqReadé度ã¯6500MB/sã§ãã®ã§ãçè«ä¸ã4æ¬æããå ´åã¯26,000MB/sã¾ã§ã®èªåºãã¹ã«ã¼ããããçºæ®ã§ããã¯ãã§ãã
ããããv3.5ã®çµæãè¦ãã¨20GB/sç¨åº¦ã§æ§è½å¤ãé æã¡ã«ãªã£ã¦ããä¸æ¹ãv5.0ã§ã¯24GB/sç¨åº¦ã¾ã§å¦çæ§è½ã伸ã³ã¦ããäºãåããã¾ããããªãå¤ãã®é¨åã§ä¿®æ£ãå ãããã¦ãããããããã ããé«éåã®è¦å ã¨ããããã§ã¯ãªãã§ãããããv5.0ã«ãªãGPUã¸ã¿ã¹ã¯ãæ¾ãè¾¼ãã¹ã±ã¸ã¥ã¼ãªã³ã°ãããæ´ç·´ãããããã«ãªã£ã¦ããã¨ããäºãåããã¾ãã
ã¾ã¨ã
PG-Stromã«ããããããå é¨ã¢ã¼ããã¯ãã£ã®ä¸æ°ã¯ãå®å®æ§ã»ä¿å®æ§ã大ããé«ããã¨å ±ã«ãGPU-Direct SQLã§ãããã¼ãã¦ã§ã¢çè«é度ã«è¿ãããã©ã¼ãã³ã¹ãçºæ®ããããã«CUDA Contextã®çæãCUDA C++ã³ã¼ãã®ã³ã³ãã¤ã«ã«è¦ããæéã®åæ¸å¹æã§ãã¨ãããæ¯è¼çå°ããªãã¼ã¿ã»ããï¼ï½20GBç¨åº¦ï¼ã§ãã£ã¦ãGPUå©ç¨ã®å¹æãå®æã§ããããã«ãªãã¾ããã
ãããPG-Strom v5.0ã®ç¹å¾´ãä¿®æ£ç¹ã«ã¤ãã¦ã¯ãææ¥ï¼3/15ï¼ã®ã»ããã¼ã§ã話ãããã¦ããã ãã¾ãã®ã§ããã²ãã¡ããä½µãã¦ãåå ããã ããã°ã¨æãã¾ãã
FluentdåãApache Arrowãã©ã°ã¤ã³ã«ã¤ãã¦
æ§æ³ã¯åå¹´ã»ã©åï¼ããä¸ã¶æã»ã©éä¸ãã¦éçºã«åãçµãã§ãããFluentdåãApache Arrowãã©ã°ã¤ã³ãããããåãããã«ãªã£ãã®ã§ãä»åã¯ãã¡ãã®ã¢ã¸ã¥ã¼ã«ã«ã¤ãã¦ãç´¹ä»ãã¾ãã
ããããPG-Stromã¯ãIoT/M2Mé åã§å¤§éã«çºçãããã¼ã¿ãé«éã«å¦çã§ãã¾ãã¨ããã®ãã»ã¼ã«ã¹ãã¤ã³ãã§ãGPU-Direct SQLã¯ãããå種ã®æ©è½ã«ãã£ã¦ãããå®ç¾ãã¦ããã¯ã±ã§ãããå®éã«éç¨ããéã«ã¯ãçºçãããã¼ã¿ããã©ããã£ã¦SQLã§å¦çã§ããããDBã«ã¤ã³ãã¼ããããï¼ãã¨ããåé¡ãããã¾ãã
ä¾ãã°ãPostgreSQLã«ä¸è¡ãã¤INSERTããã¨ããã®ãä¸ã¤ã®è§£ã§ãããã ããåç´ãªI/Oã«æ¯ã¹ãã¨ãDBã¸ã®æ¸ãè¾¼ã¿ã¯ã©ããã¦ãå¦çããã«ããã¯ã«ãªããã¡ã§ãã
ããã§ã大éã«åéãããã°ãã¼ã¿ããå°ãªãæéãã¹ã§ï¼ã¤ã¾ãä¸æãã¡ã¤ã«ã«ä¿åãããã¼ã¿ãå度DBã«ã¤ã³ãã¼ããããªã©ã®æéããããäºãªãï¼æ¤ç´¢ãéè¨ã§ããç¶æ ã«æã£ã¦è¡ãããã«ã以ä¸ã®ããã« Fluentd ãã Apache Arrow å½¢å¼ãã¡ã¤ã«ãåºåãããããç´æ¥ PG-Strom ããèªã¿åºãã¨ããã¹ãã¼ã ãä½ãã¾ããã
Fluentdã¨ã¯ Treasure Data ã®å¤æ©è²ä¹æ°ã«ãã£ã¦éçºããããã°åéãã¼ã«ã§ãSyslogã®ãããªãµã¼ããã°ããIoT/M2Mæ©å¨ã®ããã¤ã¹ãã°ã«è³ãã¾ã§ãå¤ç¨®å¤æ§ãªãã°ãã¼ã¿ãéç©ã»ä¿åããããã«äºå®ä¸ã®ã¹ã¿ã³ãã¼ãã¨ãã¦å©ç¨ããã¦ããã½ããã¦ã§ã¢ã§ãã
Ruby ã§è¨è¿°ããããã©ã°ã¤ã³ã追å ããäºã§ããã°ãã¼ã¿ã®å
¥åºåãå å·¥ãèªå¨ã«ã«ã¹ã¿ãã¤ãºãããã¨ãã§ãã¾ãã
arrow-file ãã©ã°ã¤ã³
Fluentdã®ãã©ã°ã¤ã³ã«ã¯ããã¤ãã«ãã´ãªããããå¤é¨ãããã°ãåãåãInputãã©ã°ã¤ã³ããã°ãæå½¢ããParserãã©ã°ã¤ã³ãåä¿¡ãããã°ãä¸æçã«èç©ããBufferãã©ã°ã¤ã³ããã°ãåºåããOutputãã©ã°ã¤ã³ããªã©ã®ç¨®é¡ãããã¾ãã
Fluentdããã°ãåãåãã¨ãInput/Parserãã©ã°ã¤ã³ã«ãã£ã¦ãã°ã¯å
±éã®å
é¨å½¢å¼ã¸ã¨å¤æããã¾ãã
ããã¯ããã°ã®æ¯ãåãã«å©ç¨ã§ããèå¥åã®tag
ããã°ã®ã¿ã¤ã ã¹ã¿ã³ãtime
ããã³ãçãã°ãæ´å½¢ããé£æ³é
åã§ããrecord
ã§ãã
Bufferãã©ã°ã¤ã³ã¯ããã°ã Output ãã©ã°ã¤ã³ã«æ¸¡ãã¦æ¸ãåºãã¾ã§ã®éãä¸æçã«ãããä¿æãã¾ããããã«ããã渡ãã¾ã§ã®éãä¸æçã«ãããä¿æãã¾ããããã«ãããè¤æ°ã¬ã³ã¼ããã¾ã¨ãã¦æ¸ãè¾¼ãäºã§åºåã®ããã©ã¼ãã³ã¹ãåä¸ããããé害æã®ãªãã©ã¤ãåç´åããäºãã§ãã¾ãã
æå¾ã«ãOutputãã©ã°ã¤ã³ãBufferãã©ã°ã¤ã³ãã渡ããããã°ãããããã®ãã©ã°ã¤ã³ã«å¿ããåºåå
ã«æ¸ãåºãã¾ãã
ä»åãä½æããfluent-plugin-arrow-file
ã¢ã¸ã¥ã¼ã«ã¯ããã® Output ãã©ã°ã¤ã³ã«ç¸å½ãããã®ã§ãåºåå
ã¨ãã¦æå®ããããã¡ã¤ã«ã« Apache Arrow ãã¡ã¤ã«å½¢å¼ã§æ¸ãè¾¼ã¿ã¾ãã
ã¤ã³ã¹ãã¼ã«
ããã§ã¯ãTreasure Data社ã®æä¾ãã Fluentd ã®å®å®æ¿ td-agent ãå©ç¨ãã¾ãã
ã¾ããarrow-fileãã©ã°ã¤ã³ã®ã¤ã³ã¹ãã¼ã«ã«ã¯rake-compiler
ã¢ã¸ã¥ã¼ã«ãå¿
è¦ã§ãã®ã§ãäºãã¤ã³ã¹ãã¼ã«ãã¦ããã¾ãã
Fluentdã®ã¤ã³ã¹ãã¼ã«è©³ç´°ã«ã¤ãã¦ã¯ãこちらãåç §ãã¦ãã ããã
$ curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent4.sh | sh $ sudo /opt/td-agent/bin/fluent-gem install rake-compiler
次ã«ãPG-Stromã®ã½ã¼ã¹ã³ã¼ãããã¦ã³ãã¼ãããfluentd ãã£ã¬ã¯ããªä»¥ä¸ã®ç©ä»¶ããã«ããã¾ãã
$ git clone https://github.com/heterodb/pg-strom.git $ cd pg-strom/fluentd $ make TD_AGENT=1 gem $ sudo make TD_AGENT=1 install
Fluentdã®ãã©ã°ã¤ã³ãã¤ã³ã¹ãã¼ã«ããã¦ããäºã確èªããããã以ä¸ã®ã³ãã³ããå®è¡ãã¾ãã
fluent-plugin-arrow-file
ã表示ããã¦ããã°ãã¤ã³ã¹ãã¼ã«ã¯æåã§ãã
åããã¦ã¿ã
ã§ã¯å®éã«åããã¦ã¿ãäºã«ãã¾ãã
ç°¡åãªä¾ã¨ãã¦ããã¼ã«ã«ã®Apache Httpdãµã¼ãã®ãã°ãç£è¦ããããããã£ã¼ã«ãæ¯ã«ãã¼ã¹ãã¦Apache Arrowå½¢å¼ãã¡ã¤ã«ã«æ¸ãè¾¼ã¿ã¾ãã
<source>ã§/var/log/httpd/access_log
ããã¼ã¿ã½ã¼ã¹ã¨ãã¦æå®ãã¦ããã»ããapache2ã®Parseãã©ã°ã¤ã³ãç¨ãã¦ãhost, user, time, method, path, code, size, referer, agentã®åãã£ã¼ã«ããåãåºãã¦ãã¾ãã
ï¼ããã¯å
¬å¼ãµã¤ãã®Exampleããã®ã³ããã§ãï¼
å¾åã®<match>以ä¸ãarrow-fileãã©ã°ã¤ã³ã®è¨å®ã§ãã
path
ã§åºåå
ãæå®ãã¦ãã¾ããããã§ã¯/tmp/mytest%Y%m%d.%p.arrow
ã¨è¨è¿°ãã¦ãã¾ãããæ¸ãè¾¼ã¿æã«ã%Y
ã%m
ã%d
ã¯ããããå¹´ãæãæ¥ã«ã%p
ã¯ããã»ã¹ã®PIDã«ç½®ãæãããã¾ãã
schema_defs
ã§ã¯ãåºåå
Apache Arrow ãã¡ã¤ã«ã®ã¹ãã¼ãæ§é ãå®ç¾©ãã¾ãã
tsãã¿ã¤ã ã¹ã¿ã³ããhostãmethodãpathãrefererãagentãããããæååï¼Utf8ï¼ã§ãcodeã¨sizeã¯Int32ã§è¨å®ãã¦ãã¾ãã
ã¾ãããããã¡ã«é¢ãã¦ã¯ããå°ã大ããªãµã¤ãºãæå®ãã¹ãã§ãããããã§ã¯åä½ç¢ºèªã®ããæ¯è¼çå°ããªãµã¤ãºï¼4MBã200è¡ï¼ã§ããã¤æ¸ãåºãã®ã¤ã³ã¿ã¼ãã«ã10sã«æå®ãã¦ãã¾ããå®éã«ã¯PG-StromãGPU-Direct SQLãçºåããã®ã«åãããµã¤ãºã®ãããã¡ãµã¤ãºãæå®ããäºããå§ããã¾ããï¼ä¾ãã°ããã©ã«ãå¤ã® 256MB ãªã©ï¼
<source> @typetail path /var/log/httpd/access_log pos_file /var/log/td-agent/httpd_access.pos tag httpd format apache2 <parse> @typeapache2 expression /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)")?$/ time_format %d/%b/%Y:%H:%M:%S %z </parse> </source> <match httpd> @typearrow_file path /tmp/mytest%Y%m%d.%p.arrow schema_defs "ts=Timestamp[sec],host=Utf8,method=Utf8,path=Utf8,code=Int32,size=Int32,referer=Utf8,agent=Utf8" ts_column "ts" <buffer> flush_interval 10s chunk_limit_size 4MB chunk_limit_records 200 </buffer> </match>
ãã¦ãtd-agentãèµ·åãã¾ãã
sudo systemctl start td-agent
以ä¸ã®ããã«ãApache Httpdã®ãã°ã path ã§è¨å®ãã /tmp/mytest%Y%m%d.%p.arrow
ãå±éãããå
ã§ãã /tmp/mytest20220124.3206341.arrow
ã«æ¸ãåºããã¦ãã¾ãã
ä¸èº«ãè¦ã¦ã¿ãã¨ãããã£ã½ãæãã«ãªã£ã¦ããã®ãåããã¾ãã
$ arrow2csv /tmp/mytest20220124.3206341.arrow --head --offset 300 --limit 10 "ts","host","method","path","code","size","referer","agent" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/js/theme_extra.js",200,195,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/js/theme.js",200,4401,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/img/fluentd_overview.png",200,121459,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/search/main.js",200,3027,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/fonts/Lato/lato-regular.woff2",200,182708,"http://buri/docs/ja/css/theme.css","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/fonts/fontawesome-webfont.woff2?v=4.7.0",200,77160,"http://buri/docs/ja/css/theme.css","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/fonts/RobotoSlab/roboto-slab-v7-bold.woff2",200,67312,"http://buri/docs/ja/css/theme.css","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:42","192.168.77.95","GET","/docs/ja/fonts/Lato/lato-bold.woff2",200,184912,"http://buri/docs/ja/css/theme.css","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:43","192.168.77.95","GET","/docs/ja/search/worker.js",200,3724,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36" "2022-01-24 06:13:43","192.168.77.95","GET","/docs/ja/img/favicon.ico",200,1150,"http://buri/docs/ja/fluentd/","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36"
ããã PG-Strom ã®Arrow_Fdwãç¨ãã¦PostgreSQLã«ãããã³ã°ãã¦ã¿ã¾ãã
postgres=# IMPORT FOREIGN SCHEMA mytest FROM SERVER arrow_fdw INTO public OPTIONS (file '/tmp/mytest20220124.3206341.arrow'); IMPORT FOREIGN SCHEMA postgres=# SELECT ts, host, path FROM mytest WHERE code = 404; ts | host | path ---------------------+---------------+---------------------- 2022-01-24 12:02:06 | 192.168.77.73 | /~kaigai/ja/fluentd/ (1 row) postgres=# EXPLAIN SELECT ts, host, path FROM mytest WHERE code = 404; QUERY PLAN ------------------------------------------------------------------------------ Custom Scan (GpuScan) on mytest (cost=4026.12..4026.12 rows=3 width=72) GPU Filter: (code = 404) referenced: ts, host, path, code files0: /tmp/mytest20220124.3206341.arrow (read: 128.00KB, size: 133.94KB) (4 rows)
çæããã Apache Arrow ãã¡ã¤ã«ãå¤é¨ãã¼ãã«ã¨ãã¦ãããã³ã°ãããããSQLããåç §ãã¦ãã¾ãã
Fluentdå´ã§æå½¢ããããã°ã®åãã£ã¼ã«ããåç §ããæ¤ç´¢æ¡ä»¶ãä¸ããäºãã§ãã¾ãã ä¸è¨ã®ä¾ã§ã¯ãHTTPã¹ãã¼ã¿ã¹ã³ã¼ã404ã®ãã°ãæ¤ç´¢ãã1件ãããããã¦ãã¾ãã
ã¾ã¨ã
以ä¸ã®ããã«ãFluentdã§åãåã£ããã°ã Apache Arrow å½¢å¼ãã¡ã¤ã«ã¨ãã¦æ¸ãåºããããããã®ã¾ã¾ãã¤ã¾ãæ¹ãã¦ãã¼ã¿ãã¤ã³ãã¼ãããäºãªã PostgreSQL ããåç
§ããäºãã§ããäºãåããã¾ããã
ããã¯ããã°éç©ç³»ã®ã·ã¹ãã ãããæ¤ç´¢ã»åæç³»ã®ã·ã¹ãã ã¸ãã¼ã¿ã移éããã¨ããæéãªãã«SQLå¦çãçºè¡ã§ããäºãæå³ããã»ããä¾ãã°ããã使ããªããªã£ãå¤ããã°ãã¼ã¿ãOSä¸ã§ã³ãã¼ãã¦éé¿ããã°ãããã ãã§ã¢ã¼ã«ã¤ãä½æ¥ãçµäºãã¾ããï¼Apache Arrowå½¢å¼ã®å ´åããã¡ã¤ã«ã«ã¹ãã¼ãæ§é ãå
å
ãã¦ãããããå¾ã«ãªã£ã¦ãããï¼ãã®ãã¼ãã«ã®DDLã¯ï¼ããªãã¦äºãããã¾ããï¼
å ãã¦ãFluentdã®arrow-fileãã©ã°ã¤ã³ã¯ã¿ã¤ã ã¹ã¿ã³ãã«çµ±è¨æ
å ±ãä»å ããäºãã§ãããããæ¤ç´¢æ¡ä»¶ã«æ¥ä»æå»ç¯å²ã®çµãè¾¼ã¿ãå«ãã±ã¼ã¹ã§ã¯å¤§å¹
ãªæ¤ç´¢æéã®é«éåãè¦è¾¼ããã¨ãã§ãã¾ãã
kaigai.hatenablog.com
課é¡ã¨ãã¦ã¯ãç¾ç¶ãã¾ã ãåãããã«ãªã£ããã¨ããã¬ãã«ã§ãã®ã§ãå®éã« Fluentd ã®ã¤ã³ã¹ã¿ã³ã¹ãä½å°ãç«ã¦ã¦æ¤è¨¼æ¸ã¿ã§ãããã¨ãã訳ã§ã¯ããã¾ãããã§ãã®ã§ããã®è¾ºã¯ãã²ãä¸ç·ã«æ¤è¨¼ãã¾ãããï¼ãã¨ããæ¹ãããã£ãããã¾ããããã声ããããã ããã°ã¨æãã¾ãã
å精度浮åå°æ°ç¹åï¼Float2ï¼ã«ã¤ãã¦
ãã®ã¨ã³ããªã¯PostgreSQL Advent Calendar 2021ã«åå ãã¦ãã¾ãã
å®ã¯ãç¾å¨éçºä¸ã®å¥ã®æ©è½ã«ã¤ãã¦æ¸ãããã£ãã®ã§ãããéã«åãã¾ããã§ãããåçã
ããã§ãæ¥é½ãã¿ãç¨æããã®ããåçãã¯ããããã¯ãããã©â¦ãµã©ããããããã¦ãï¼ï¾ï¾ï½ºï½°ï¾!!
ã¨ããäºã§ãPostgreSQLã§å©ç¨ã§ããå精度浮åå°æ°ç¹åï¼float2ï¼ã®äºã«ã¤ãã¦æ¸ããã¨æãã¾ãã
å精度浮åå°æ°ç¹ã¨ã¯
Cã§è¨ãã°32bitã®float
ã«å¯¾ãã¦64bitã®double
ãå精度ã¨å¼ã¶ããã«ãfloat
ã®ååã§ãã16bitã®æµ®åå°æ°ç¹ãã©ã¼ããããå精度浮åå°æ°ç¹å½¢å¼ã§ãã
ãã¡ããããã¼ã¿éãå°ãªãåã表ç¾ã§ããç¯å²ã精度ã«å¶éã¯ããã®ã§ãããä¸æ¹ã§å¿
è¦ãªã¹ãã¬ã¼ã¸é åã¯å°ãããã¾ãSIMDãGPUã¨ãã£ããã¯ãã«æ¼ç®ãè¡ãå ´åã«ã¯ã¡ã¢ãªãã¹ãæå¹æ´»ç¨ã§ãããã¨ãããæ©æ¢°å¦ç¿ã®åéãªã©ã§æ´»ç¨ãé²ãã§ãã¾ãã
åå | ãããå¹ | ææ°é¨ | ä»®æ°é¨ |
---|---|---|---|
å精度 | 64bit | 11bit | 52bit |
å精度 | 32bit | 8bit | 23bit |
å精度 | 16bit | 5bit | 10bit |
PostgreSQL 㧠float2 åãå®ç¾©ãã
ãã¦ããã®å精度浮åå°æ°ç¹åã§ãããPostgreSQLæ¬ä½ã§ã¯ã¾ã 対å¿ãã¦ãã¾ããã
ãããããApache Arrowã¨ã®ãã¼ã¿äº¤æã«å¿
è¦ã§ãã£ãã®ã§ PG-Strom æ¡å¼µã¢ã¸ã¥ã¼ã«ã®ä¸é¨ã¨ãã¦ä½æããã¢ãã»ã»ã»ã§ã¯ããã®ã§ãããå¥ã«ããèªä½ã¯GPUãNVMEãå¿
è¦ã¨ãããã®ã§ã¯ããã¾ããã®ã§ãããåä½ãåãåºãã¦å©ç¨ããäºãå¯è½ã§ãã
x86_64ã®CPUã§ã¯ä»ã®ã¨ããå精度浮åå°æ°ç¹ããã®ã¾ã¾è¨ç®ããäºã¯ã§ãã¾ããã®ã§ãå é¨çã«ã¯ãããfloat4ãfloat8ã«å¤æããä¸ã§æ¼ç®ãè¡ã£ã¦ãã¾ããGPUå´ã§ããã°float2ã®ã¾ã¾è¨ç®ããäºãã§ããã®ã§ããã
float2 -> float4/float8 ã¸ã®å¤æã¯ããã»ã©é£ãããã¨ã§ã¯ããã¾ãããææ°é¨ãä»®æ°é¨ãããå¹ ãåºããªãæ¹ã«åãã®ã§ãfloat2ã§è¡¨ç¾ã§ããå¤ã¯ç¢ºå®ã«float4/float8ã¸ã¨å¤æããäºãã§ãã¾ãã
ä¾ãã°ãfloat2ããfloat4ã¸ã®å¤æã¯ããã®ãããªåç´ãªãããæä½ã ãã§å¯è½ã§ãã
static inline float fp16_to_fp32(half_t fp16val) { uint32_t sign = ((uint32_t)(fp16val & 0x8000) << 16); int32_t expo = ((fp16val & 0x7c00) >> 10); int32_t frac = ((fp16val & 0x03ff)); uint32_t result; if (expo == 0x1f) { if (frac == 0) result = (sign | 0x7f800000); /* +/-Infinity */ else result = 0xffffffff; /* NaN */ } else if (expo == 0 && frac == 0) result = sign; /* +/-0.0 */ else { if (expo == 0) { expo = FP16_EXPO_MIN; while ((frac & 0x400) == 0) { frac <<= 1; expo--; } frac &= 0x3ff; } else expo -= FP16_EXPO_BIAS; expo += FP32_EXPO_BIAS; result = (sign | (expo << FP32_FRAC_BITS) | (frac << 13)); } return int_as_float(result); }
ä¸æ¹ã§ãfloat4/float8 -> float2 ã¸ã®å¤æã¯ã表ç¾å¯è½ãªç¯å²ãè¶
ããã¨+/-Inf
ã«çºæ£ãã¦å³¶åãã注æãå¿
è¦ã§ãã
postgres=# select 65000::float2; float2 -------- 64992 (1 row) postgres=# select 66000::float2; float2 ---------- Infinity (1 row)
ãã¼ãã«å®ç¾©ã§ float2 ãç¨ãã
å精度浮åå°æ°ç¹ãã¼ã¿å㯠PG-Strom ã«å«ã¾ãã¦ããããã以ä¸ã®ããã«CREATE EXTENSION
ã³ãã³ãã§ã¤ã³ã¹ãã¼ã«ããäºãã§ãã¾ãã
postgres=# CREATE EXTENSION pg_strom ; CREATE EXTENSION postgres=# \dT float2 List of data types Schema | Name | Description ------------+--------+------------- pg_catalog | float2 | (1 row)
æ©éããã¼ãã«ãå®ç¾©ãã¦ããã¼ã¿ãæµãè¾¼ãã§ã¿ã¾ãã
postgres=# CREATE TABLE fp16_test ( id int, a float2, b float2, c float2, d float2, e float2, f float2, g float2, h float2 ); postgres=# insert into fp16_test (select x, 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random() from generate_series(1, 4000000) x); INSERT 0 4000000
ä¸æ¹ãæ¯è¼ã®ããã«å精度浮åå°æ°ç¹ã§åãããã«ãã¼ãã«ãå®ç¾©ãã¦ã¿ã¾ãã
postgres=# CREATE TABLE fp64_test ( id int, a float8, b float8, c float8, d float8, e float8, f float8, g float8, h float8 ); CREATE TABLE postgres=# insert into fp64_test (select x, 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random(), 1000*random() from generate_series(1, 4000000) x); INSERT 0 4000000
ãããåã®è©±ã§ã¯ããã¾ããã大ãããµã¤ãºãå¤ãã£ã¦ãã¾ãã
ï¼ãã ããã¿ãã«ã®ããã 24ãã¤ãåã¯å¿
ããã£ã¤ãã®ã§ãåç´ã«4åéããã¨ã¯ãªãã¾ãããï¼
postgres=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Size | Description --------+-------------------+-------------------+--------+-------------+------------+------------- public | fp16_test | table | kaigai | permanent | 199 MB | public | fp64_test | table | kaigai | permanent | 386 MB |
ã¤ã³ããã¯ã¹ãå¼µããã¨ãã§ãã¾ãã
postgres=# create index on fp16_test(b); CREATE INDEX postgres=# explain select * from fp16_test where b between 100 and 150; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on fp16_test (cost=21238.43..61716.43 rows=1000000 width=20) Recheck Cond: ((b >= '100'::double precision) AND (b <= '150'::double precision)) -> Bitmap Index Scan on fp16_test_b_idx (cost=0.00..20988.43 rows=1000000 width=0) Index Cond: ((b >= '100'::double precision) AND (b <= '150'::double precision)) (4 rows)
ãªãå精度浮åå°æ°ç¹ãGPUã§å¥½ã¾ããã®ãï¼
æå¾ã«ããªãGPUã¢ã¯ã»ã©ã¬ã¼ã·ã§ã³ã®æèã§å精度浮åå°æ°ç¹å½¢å¼ã使ãããããã«ãªã£ã¦ããã®ãã説æãã¾ãã
GPUã®ããã«å¤æ°ã®ã³ã¢ã並åã«åä½ããã¨ããã¨ãããNVIDIAã®GPUã§ã¯Warpã¨å¼ã°ãã32ã¹ã¬ããåä½ã§ã®ã¹ã±ã¸ã¥ã¼ãªã³ã°ãè¡ããã¾ãã*1ãé£æ¥ããã³ã¢ãé£æ¥ããã¡ã¢ãªãããã¼ã¿ããã¼ãããéãcoalescingã¨ãã£ã¦ãä¸åã®ã¡ã¢ãªãã©ã³ã¶ã¯ã·ã§ã³ã§è¤æ°ã¹ã¬ããåã®ãã¼ã¿ããã¼ãããäºãããã¾ãã
ä¾ãã°ãä¸åã®ã¡ã¢ãªãã©ã³ã¶ã¯ã·ã§ã³ã§32byteï¼= 256bitï¼åã®ãã¼ã¿ãL2ãã£ãã·ã¥ãããã¼ã*2ã§ããå ´åãããã32bitã®å精度浮åå°æ°ç¹ãªããæ大ã§8ã¹ã¬ããã«ãã¼ã¿ãä¾çµ¦ã§ãããä¸æ¹ãããã16bitã®å精度浮åå°æ°ç¹ãªããæ大ã§16ã¹ã¬ããã«ãã¼ã¿ãä¾çµ¦ã§ããã¨ããè¨ç®ã«ãªãã
é常ããã®æã®ã¯ã¼ã¯ãã¼ãã§ããã°ãã¡ã¢ãªã¢ã¯ã»ã¹ãæ大ã®å¾éè¦å ã¨ãªã£ã¦ãã¾ãã®ã§ãããããã¨ã大éã®è¨ç®ãããªããã°ãªããªãæ©æ¢°å¦ç¿ã®ãããªã¯ã¼ã¯ãã¼ãã§ãè¨ç®ç²¾åº¦ã«ããç¨åº¦ç®ãã¤ã¶ããï¼-1.0ï½1.0ãååã«è¡¨ç¾ã§ããã°ããããªã©ï¼å ´åã«ã¯ãåä½æéãããã®è¨ç®éãå¢ããããã«ãã¼ã¿éãåãã¨ããå¤æãã¢ãªã¨ãªãã
ãã®è¾ºã«ã¤ãã¦ã¯ãCUDA C++ Programming GuideのMaximize Memory Throughputã®ç« ã詳ããã
è¨ãã¾ã§ããªãããããã¯ãã¼ã¿ãåç´é
åã®å½¢ã§ä¸¦ãã§ãããããªå ´åã®è©±ã§ãä¾ãã°PostgreSQLã®è¡ãã¼ã¿ï¼Heapå½¢å¼ï¼ã§ã¯ãã以åã®æ®µéã§ããããã ãã大æµã®SQLæ¡ä»¶å¼ã®æ¤ç´¢ã¨ããã®ã¯ãç¹å®ã®åãä¸åã ãåç
§ãã¦WHERE X BETWEEN 100 AND 200
ã®æ¡ä»¶ãè©ä¾¡ãããã®ã§ããã®ã§ããã®ããã ãã«è¡âåå¤æã¨ããã®ã¯ãªã¼ãºããã«ã§ã¯ãªãã
ï¼ãã¤ã¦ä¸åº¦å®è£
ãããã¨ãããããããï½ï¼
PG-Stromã§ã対å¿ãã¦ãã Apache Arrow å½¢å¼ãããããã¯GPU Cacheæ©è½ã®ããã«ããã¼ã¿ãåç´é
åã®ããã«ä¸¦ã¶ãã¨ã«ãªã£ã¦ãããã¼ã¿å½¢å¼ã§ããã°ããããã£ãcoalesced accessã«ããã¡ã¢ãªèªã¿åºãã®é«éåå¹æã¨ãããã®ãæå¾
ã§ãããããããªãã
ï¼ãã ããRAM => GPUã¸ã®è»¢éã¨ããã®ããããããã£ã¨é
ãã®ã§ããã¾ãå·®åã¯è¦ãã¦ããªãããâ¦ãï¼
PCI-E 4.0ããã£ã¦ããï¼
çªç¶ã§ããããµã¼ããæ°èª¿ãã¾ããã
EPYC2æè¼ãµã¼ã+PCIe4.0対å¿SSDx4ãã¿ã¼ã pic.twitter.com/SB7UxdR6pP
— æµ·å¤ æµ©å¹³ï½KaiGai Koheið» (@kkaigai) 2021å¹´10æ15æ¥
æ¨å¹´ãå è¡ã㦠NVIDIA A100 ã調éãã¦ããã®ã§ãããææã¡ã®ãµã¼ãèªä½ã¯Skylake-Spã§PCI-E 3.0ä¸ä»£ãªã®ã§ãã¾ã GPUèªä½ã®æã¤ããã³ã·ã£ã«ãè©ä¾¡ã§ããã»ã»ã»ã¨ãã£ãã¨ããã§ããã
調éãããµã¼ãã¯ãSupermicro社ã®AS-2014CS-TRã¨ããã¢ãã«ã§ãæ§æèªä½ã¯ä»¥ä¸ã®éãã¨ãªãã¾ãã
- çä½: AS-2014CS-TR
- CPU: AMD EPYC 7402P (24C; 2.85GHz) x1
- RAM: 128GB [16GB DDR4-3200 (ECC) x8]
- GPU: NVIDIA A100 (PCI-E; 40GB) x1
- SSD: Intel D7-P5510 (U.2; 3.84TB) x4
- HDD: Toshiba 3.5 1.0TB (7.2krpm; 6.0Gb/s) x2
- N/W: AIOM 2-port 10Gbase-T x1
ä»åãã¯ããã¦CPUã¨SSDãPCI-E 4.0ã«å¯¾å¿ããä¸ä»£ã®ãã®ã調éãã¦ãããã§CPU/GPU/SSDãPCI-E 4.0ã«æãã¦ã®ãã³ããã¼ã¯ãå¯è½ã¨ãªãã¾ãã
CPUã«Millanä¸ä»£ã§ã¯ãªãRomaä¸ä»£ãé¸ãã ã®ã¯ããã®ä¸ä»£ã§ã¯I/Oå¨ãã®å¤åããªãï¼ãããï¼ã¨ããã¦ããäºã¨ãæ¨ä»ã®åå°ä½ä¸è¶³ã®å½±é¿ã§ã¿ãã§ããé·ãç´æãããã«å»¶ã³ãæ¸å¿µãã»ã»ã»ã¨ããã¯ã±ã§ããï¼ãããããã®ãµã¼ããçºæ³¨ããã®ã¯ï¼æã ï½ï¼
ãã¦ãæ©éããã¤ãã®ããã« SSBM ã®ãã¼ã¿ãã¼ã¹ãSF=999ã§æ§ç¯ãã13æ¬ã®ã¯ã¨ãªã®å¿çé度ãè¨æ¸¬ãã¦ã¿ã¾ãã
æããµã¤ãºã®å¤§ããªlineorder
ãã¼ãã«ã®ãµã¤ãºã¯ 875GB ã¨ãªãã®ã§ãã¹ãã¬ã¼ã¸ä¸å¿ã®ãã³ããã¼ã¯ã«ã¯ååãªãµã¤ãºã§ãã
ã¾ãçµæã¯ãã®éãã
åãããããã®ããã¹ã«ã¼ããã表è¨ã§ã°ã©ãã«ãã¦ãã¾ãããããã¯åç´ã«ï¼875GB ÷ ã¯ã¨ãªå¿çæéï¼ã§ãã®ã§ã縦軸ã®ãQuery Execution Throughput [MB/s]ãã®å¤ã大ããã»ã©å¦çæ§è½ãé«ãäºã示ãã¦ãã¾ãã
Filesystem I/Oã®å ´åãå ã PCI-E 3.0ä¸ä»£ã§ãSSDæ§è½ãé¥ãã«ä¸åãå¦çæ§è½ããåºãã¦ãã¾ããã§ããããH/Wãæ°ãããªã£ã¦ãåããããªæ§è½å¤ã§ããã¨ããäºã¯ãã¹ãã¬ã¼ã¸èªã¿åºãã§ã¯ãªãããããã¡ã³ãã¼ã®ç¹°ãè¿ããªã©å¥ã®ã¨ããã«ããã«ããã¯ãããäºã示åãã¦ãã¾ãã
ä¸æ¹ãGPU-Direct SQLã®å ´åãã¯ã¨ãªã«ãã£ã¦ã¯19GB/sã«è¿«ãå¤ãåºãã¦ãã¾ãã
PCI-E3.0ã®Skylake-Spä¸ä»£ã§ã¯8.5GB/sï½9.0GB/sç¨åº¦ã§é æã¡ã«ãªã£ã¦ããäºãèããã¨ãä¸ã
ã®ã¹ã³ã¢ã¨ãããã§ããããã
ç¶ãã¦ãiostatã§æ¸¬å®ããnvme0ï½nvme3ã®åããã¤ã¹ã®ã¯ã¨ãªå®è¡ä¸ã®èªã¿åºãã¹ã«ã¼ããããè¨æ¸¬ããã¨ãããã¯ããã§ä¸ã
ã²ã©ãã
åããµã¤ãºã®ãã¼ã¿ãèªã¿åºãã«ããçæéã§ã¬ããï¼ã¨èªã¿åºãããæéãããã¦ãã³ã¿ã©èªã¿åºããã¨ããã®ãå¯è¦åããã¦ããã¨æãã¾ãã
ããã¾ã§ã¯åããã¼ã¿å½¢å¼ã®å ´åã
ã§ã¯ããã®875GB/60åè¡ã®lineorder
ãã¼ãã«ã Pg2Arrow ã§Apache Arrowå½¢å¼ã«å¤æããåãã¼ã¿ã¨ãã¦èªã¿åºãå ´åã§ããã°ã©ããï¼
è¡ãã¼ã¿ã¨åãã¼ã¿ã®å¦çæ§è½ãã¹ã«ã¼ãããã§æ¸¬ãã®ã¯é©åã¨ã¯è¨ãã¾ããã®ã§ãä»åº¦ã¯ï¼60å ÷ ã¯ã¨ãªå¿çæéï¼ããï¼ç§ãããå¦çããè¡æ°ãã¨ãã¦ãããããã¦ã¿ã¾ããããã¨ãI/Oã®å¹çãè¯ãåãããã«å·®ãåºããçµæã¨ãªãã¾ããã
ç´°ããã¨ããã§è¨ãã¨ããç§éã§10åè¡ããéæãã¦ããQ1_1ã¨ãQ1_2ããã³Q1_3ã®ãã¼ã¿èªã¿åºããµã¤ãºã¯çããã®ã§ãå¦çæéãããã¨åãç¨åº¦ã«ãªã£ã¦ãã¦ã»ããã®ã§ããããªããã£ãã¤ã¶ããã¾ä¸ã¤ãããã«ãªå®è¡è¨ç»ãä½ã£ã¦ãã¾ã£ãããã§ãããã®è¾ºã¯è¦æ¹åã¨ãã£ãã¨ããã§ããããã
Q1_1ã®EXPLAIN ANALYZEçµæ
postgres=# explain analyze select sum(lo_extendedprice*lo_discount) as revenue from flineorder,date1 where lo_orderdate = d_datekey and d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Aggregate (cost=9717634.19..9717634.20 rows=1 width=8) (actual time=4670.374..4781.510 rows=1 loops=1) -> Gather (cost=9717633.96..9717634.17 rows=2 width=8) (actual time=4387.503..4781.496 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Custom Scan (GpuPreAgg) (cost=9716633.96..9716633.97 rows=1 width=8) (actual time=4359.970..4359.976 rows=1 loops=3) Reduction: NoGroup Combined GpuJoin: enabled GPU Preference: GPU0 (NVIDIA A100-PCIE-40GB) -> Parallel Custom Scan (GpuJoin) on flineorder (cost=17101.66..9716355.33 rows=594409 width=8) (never executed) Outer Scan: flineorder (cost=17060.26..9711145.81 rows=4162493 width=12) (actual time=156.771..564.225 rows=5993990673 loops=1) Outer Scan Filter: ((lo_discount >= 1) AND (lo_discount <= 3) AND (lo_quantity < 25)) Rows Removed by Outer Scan Filter: 5209361385 Depth 1: GpuHashJoin(plan nrows: 4162493...1426582, actual nrows: 784629288...119025391) HashSize: 20.50KB (estimated: 63.28KB) HashKeys: flineorder.lo_orderdate JoinQuals: (flineorder.lo_orderdate = date1.d_datekey) GPU Preference: GPU0 (NVIDIA A100-PCIE-40GB) with GPUDirect SQL referenced: lo_orderdate, lo_quantity, lo_extendedprice, lo_discount files0: /opt/pgdata13/flineorder.arrow (read: 89.37GB, size: 681.05GB) -> Seq Scan on date1 (cost=0.00..78.95 rows=365 width=4) (actual time=0.060..0.317 rows=365 loops=1) Filter: (d_year = 1993) Rows Removed by Filter: 2191 Planning Time: 2.334 ms Execution Time: 4910.442 ms (24 rows)
ä¸è¨ã®ããã«GpuPreAggã®ç´ä¸ã«GpuJoinãå
¥ã£ã¦ãããå ãã¦ãCombined GpuJoin: enabledãã¨åºåããã¦ãã¾ãã
ãã®ã¨ããGpuJoinã®å¦ççµæã¯ä¸åº¦CPUã«æ»ãããäºãªãGPUä¸ã§éç´ããããããæãå¹çã®ãããã¿ã¼ã³ã¨ãªãã¾ãã
ã¾ããApache Arrowãã¡ã¤ã«ã¯å
¨ä½ã§681GBããããã®ãã¡89GBãèªã¿åºããäºãåããã¾ããã¤ã¾ãã20GB/sç¨åº¦ã®èªã¿åºãé度ãããã°ã5ç§ç¨åº¦ã§å¦çãçµããã®ã¯ä¸æè°ãªäºã§ã¯ãªãã¨ããäºã«ãªãã¾ãã
Q1_2ã®EXPLAIN ANALYZEçµæ
postgres=# explain analyze select sum(lo_extendedprice*lo_discount) as revenue from flineorder, date1 where lo_orderdate = d_datekey and d_yearmonthnum = 199401 and lo_discount between 4 and 6 and lo_quantity between 26 and 35; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Finalize Aggregate (cost=10085531.30..10085531.31 rows=1 width=8) (actual time=13043.719..13155.506 rows=1 loops=1) -> Gather (cost=10085531.08..10085531.29 rows=2 width=8) (actual time=12720.924..13155.496 rows=3 loops=1) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (cost=10084531.08..10084531.09 rows=1 width=8) (actual time=12709.551..12709.556 rows=1 loops=3) -> Hash Join (cost=17686.41..10084527.30 rows=757 width=8) (actual time=4078.065..12645.401 rows=1404451 loops=3) Hash Cond: (flineorder.lo_orderdate = date1.d_datekey) -> Parallel Custom Scan (GpuScan) on flineorder (cost=17607.07..10084283.79 rows=62438 width=12) (actual time=313.555..6825.444 rows=108984171 loops=3) GPU Filter: ((lo_discount >= 4) AND (lo_discount <= 6) AND (lo_quantity >= 26) AND (lo_quantity <= 35)) Rows Removed by GPU Filter: 5667038161 GPU Preference: GPU0 (NVIDIA A100-PCIE-40GB) with GPUDirect SQL referenced: lo_orderdate, lo_quantity, lo_extendedprice, lo_discount files0: /opt/pgdata13/flineorder.arrow (read: 89.37GB, size: 681.05GB) -> Hash (cost=78.95..78.95 rows=31 width=4) (actual time=0.521..0.522 rows=31 loops=3) Buckets: 1024 Batches: 1 Memory Usage: 10kB -> Seq Scan on date1 (cost=0.00..78.95 rows=31 width=4) (actual time=0.171..0.513 rows=31 loops=3) Filter: (d_yearmonthnum = '199401'::numeric) Rows Removed by Filter: 2525 Planning Time: 2.308 ms Execution Time: 13268.534 ms (20 rows)
ä¸æ¹ãQ1_2ã®çµæãè¦ã¦ã¿ãã¨ãGpuPreAggãé¸æããã¦ããããGpuScanã«ãããã£ã«ã¿ãå®è¡ããå¾ã¯CPUã§HashJoinã¨Aggregateãå®è¡ãã¦ãã¾ããGpuScanã®è¿ãè¡æ°ã®æ¨å®å¤ã62438è¡è¶³ãããªã®ã§ããããªç¨åº¦ã®å¦çã«GPUã使ãã¾ã§ããªããã¨ããã®ã¯åããã®ã§ãããå®éã«ã¯3åè¡ã¡ããï¼1åè¡Ã3ã¯ã¼ã«ã¼ï¼ãèªã¿åºãã¦ããäºã«ãªãã®ã§ãèªã¿ã大ããºã¬ã ã£ãã¨ããããã§ãã
ãã¼ãã¦ã§ã¢ã®å¢å¼·ã ãã§ã¯é«éåãéæããäºã¯ã§ããªãã¨ããããä¾ã§ããããã®è¾ºã¯ã追ã£ã¦èª¿ã¹ã¦ã¿ããã¨æãã¾ãã
ãªããä»åã®ãã®æ¸¬å®çµæãå«ãã大éãã¼ã¿ãå¦çããããã® PG-Strom ã®è«¸æ©è½ã«ã¤ãã¦ã¯ã11月12日(金)のPostgreSQL Conference Japan 2021ã«ã¦çºè¡¨ãè¡ãã¾ãã
ãã®ãæä¸ããªããªãé¡ãçªãåããããã£ã¹ã«ãã·ã§ã³ã®æ©ä¼ãå°ãªãã®ã§ããããã¡ãã¯ãªãã©ã¤ã³ã§ã®éå¬äºå®ã¨ãªã£ã¦ããã¾ãã
ãã²ã®ãåå ããå¾
ã¡ãã¦ããã¾ãã
HyperLogLogã使ã£ãã«ã¼ãã£ããªãã£ã®æ¨æ¸¬ï¼è£è¶³ï¼
å°ãããã¡ãã®ãã©ãã¼ã¢ããè¨äºã¨ãªãã¾ãã
ããã°å ¬éå¾ãä½ä»¶ãã³ã¡ã³ããããã ãã¾ããã
PG-Stromã§COUNT(distinct KEY) ã«HyperLogLogã使ãã話ãå°ç¨ã®é¢æ°ã使ãã®ãã¨æã£ããããã©ã¡ã¼ã¿ã®on/offã§COUNTèªä½ã®è¨ç®æ¹æ³ï¼å®è¡è¨ç»ï¼ãå¤ããã®ãã
— ãã° - as a DB Engineer - (@tzkb) 2021å¹´9æ25æ¥
Citusã«ãåæ§ã®ä»çµã¿ããã£ã¦ããã¡ãã¯error_rateãæå®ããã®ããªã https://t.co/Z4JCh8EoHQ
ãããã¨ããããã¾ããå人çã«ã¯hll_count(KEY) ã好ã¿ã§ããå¤ã人éãªã®ã§ãCOUNTã¯æ£ç¢ºãªçµæã ã¨æãè¾¼ã¿ãã¡ã§ãã
— ãã° - as a DB Engineer - (@tzkb) 2021å¹´9æ25æ¥
ã¡ãªã¿ã«å³ä¸ã«ããHLL Registersã¯ãsketchã¨è¨ããããã®ãªã®ã§ããããã
ãªãã»ã©ç¢ºãã«ãGUCãã©ã¡ã¼ã¿ã®å¤ã«å¿ãã¦COUNT(distinct KEY)
ãç½®ãæããæ§é ã ã¨ããã®ã¤ããããªãã®ã«ãHyperLogLogã使ã£ãã«ã¼ãã£ããªãã£ã®æ¨è¨ãè¡ã£ã¦ãã¾ãã»ã»ã»ã¨ããäºæ
ãçºçãã¦ãã¾ãããããã¬ã
ã¨ããäºã§ãååã®è¨äºã§èª¬æããpg_strom.enable_hll_count
ã¯å»æ¢ãã代ããã«ãã¦ã¼ã¶ãæ示çã«HyperLogLogã使ç¨ããäºãæããããã«ãhll_count(KEY)
ã¨ããéç´é¢æ°ã追å ãã¦ãã¾ãã
使ç¨æ³ã¨ãã¦ã¯ãããªæãã
=# select hll_count(lo_custkey) from lineorder ; hll_count ----------- 2005437 (1 row)
ããã«ããä¸ç¹ãhll_count(KEY)
ã¯HyperLogLogã使ã£ã¦ä½æããHLL Sketchï¼ååè¨äºã§HLL Registersã¨å¼ãã§ãããã®ãç¨èªãçµ±ä¸ãï¼ãå
ã«æ¨è¨å¤ãåºãé¢æ°ã§ãããæ¨è¨å¤ãåºãã®ã§ã¯ãªãããã®ã¾ã¾HLL Sketchãbyteaåã§ä¿åã§ããããã«ãã¾ããã
ãã¡ãã¯ãhll_sketch(KEY)
ã¨ããéç´é¢æ°ã«ãªãããã¨ã§ä¿åãã¦ãããHLL Sketchãhll_merge(SKETCH)
ã«é£ããã¦ãæ¹ãã¦æ¨è¨å¤ãåºåã§ããããã«ãªãã¾ãã
使ãæ¹ã¨ãã¦ã¯ãä¾ãã°ãäºãé±æ¬¡ãæ次ã®ãã¼ã¿ã§ HLL Sketch ãä½æãã¦ããã°ããã¨ã§å¿ è¦ãªç¯å²ã ãã® HLL Sketch ããã¼ã¸ãã¦ã«ã¼ãã£ããªãã£ã®æ¨è¨å¤ãåºåããã¨ãã£ã使ãæ¹ãèãããã¾ãã
使ç¨æ³ã¨ãã¦ã¯ãããªæãã
--- å¹´åä½ã§ HLL Sketch ãåºåãã =# select lo_orderdate / 10000 as year, hll_sketch(lo_custkey) as sketch into pg_temp.annual from lineorder group by 1; SELECT 7 --- HLL Sketchããã¹ãã°ã©ã ã«ãã¦åºåãã =# select year, hll_sketch_histogram(sketch) from pg_temp.annual order by year; year | hll_sketch_histogram ------+------------------------------------------------------- 1992 | {0,0,0,0,0,0,0,0,0,22,73,132,118,82,39,26,12,2,4,2} 1993 | {0,0,0,0,0,0,0,0,0,9,59,118,125,96,50,30,15,2,6,2} 1994 | {0,0,0,0,0,0,0,0,0,4,33,111,133,113,53,36,17,4,6,2} 1995 | {0,0,0,0,0,0,0,0,0,2,21,99,131,121,62,42,18,5,7,3,1} 1996 | {0,0,0,0,0,0,0,0,0,1,17,84,119,131,73,50,20,5,7,4,1} 1997 | {0,0,0,0,0,0,0,0,0,0,14,71,118,128,82,53,23,10,7,4,2} 1998 | {0,0,0,0,0,0,0,0,0,0,13,64,114,126,86,61,23,11,8,4,2} (7 rows) --- ç´¯ç©å¤ã§ lo_custkey ã®ã«ã¼ãã£ããªãã£ãæ¨æ¸¬ =# select max_y, (select hll_merge(sketch) from pg_temp.annual where year < max_y) from generate_series(1993,1999) max_y; max_y | hll_merge -------+----------- 1993 | 854093 1994 | 1052429 1995 | 1299916 1996 | 1514915 1997 | 1700274 1998 | 1889527 1999 | 2005437 (7 rows)
ä¾ãã°ãã¦ãã¼ã¯ã¦ã¼ã¶æ°ã®éè¨ãæ¥æ¬¡ã»é±æ¬¡ã§éè¨ããæãªã©ãæ¯å COUNT(distinct KEY)
ã§ãã£ã¦ãã¦ã¯é
ãã¦ãã¾ããªããã¿ãããªç¶æ³ã§ããã°ãå©ç¨ä¾¡å¤ã®ããææ³ããããã¾ããã
æ¬æ¥ãPostgreSQL Unconference (online) ã«ã¦ãã®è¾ºã®ãããã¯ã«ã¤ãã¦è©±ãã¾ãã®ã§ããæéããæ¹ã¯ãã²ã覧ãã ããã
pgunconf.connpass.com
HyperLogLogã使ã£ãã«ã¼ãã£ããªãã£ã®æ¨æ¸¬
é«æ ¡çã®é ã¾ã§ã¯æ»è³çã«ä½ãã§ããäºããããå¤ãåå¼·ã®åéã«ãKBS京é½ã§æ¾éããã¦ãããæ¥é«ã®ãåã®ã¯ãã±ããã¤ãããèãã¦ãããæ¥ã ãã¿ãèãã¦ã¯ãçªçµã¸ãã¬ããæ稿ãã常é£ã ã£ã*1ã®ã§ããï¼âåå¼·ã¯ã©ãããï¼ãä»åã¯ãPG-Stromã«å®è£ ãããã¯ãã±ãããªæ©è½ãç´¹ä»ãããã¨æãã¾ãã
SELECT COUNT(distinct KEY) ã¯çµæ§é£ãã
SELECT COUNT(KEY) FROM my_table;
ã
SELECT COUNT(distinct KEY) FROM my_table;
ã«ãªã£ãç¬éãç¹ã«ãµã¤ãºã®å¤§ããªãã¼ãã«ãã¹ãã£ã³ããå ´åã«ã¯ãé常ã«é£ããåé¡ã«ãªã£ã¦ãã¾ãã¾ãã
æåã®ä¾ã¯ãKEYãéNULLã§ããè¡æ°ãå ¨é¨ã«ã¦ã³ããã¦è¿ãã°è¯ãã®ã§ãããå¾è ã®å ´åã¯KEYãéè¤ããå ´åã«ã¯ã«ã¦ã³ãããªããããéè¤æé¤ãè¡ãããã®å·¥å¤«ãå¿ è¦ã«ãªãã¾ãããããã«ã¼ãã£ããªãã£ãè¨ç®ããã¨è¨ãã¾ãã
ãããDBã§å®è£ ããã«ã¯ï¼éãã®æ¹æ³ãèãããã¾ãã
æ¹æ³â
å
¥åã¹ããªã¼ã ãäºãKEYå¤ã§ã½ã¼ããã¦ãããKEYå¤ãå¤ãããã³ã«ã«ã¦ã³ã¿ãã¤ã³ã¯ãªã¡ã³ãããã
KEYå¤ã«ã¤ã³ããã¯ã¹ãå¼µããã¦ããå ´åãªã©ã«ã¯æå¹ãªæ¹æ³ã ããããã§ãªããã°ãå®è¡æã«ãã¼ãã«å
¨ä½ã®ã½ã¼ããå¿
è¦ã«ãªããããããé ååå²ã«ãã並åå¦çãä¸å¯è½ã§ããã®ã§ãä»®ã«å
¥åã¬ã³ã¼ããæ°åè¡ãã£ãã¨ããã¨ãå¾åã«COUNT(distinct KEY)
é¢æ°ãæ°ååå®è¡ããã°ãªããªãã
æ¹æ³â¡
éç´é¢æ°ãå®è¡ãã Agg ãã¼ãã§ããã·ã¥è¡¨ãæã£ã¦ãããKEYå¤ãããã¾ã§ã«ã¹ãã£ã³ããã¬ã³ã¼ãã«å«ã¾ãã¦ãããã©ãããå¤å®ãããæçµçãªCOUNT(distinct KEY)
ã®çµæã¯ããã®ããã·ã¥è¡¨ã®ã¨ã³ããªæ°ã¨ãªãã
ã½ã¼ãã¯å¿
è¦ãªãããã¡ã¢ãªæ¶è²»éãäºåã«äºæ¸¬ä¸å¯è½ã§ãããã·ã¥è¡¨ã®ãµã¤ãºã«ãã£ã¦ã¯ä¸¦åå¦çãé£ãããï¼é常ãã¡ã¢ãªæ¶è²»ãåé¡ã«ãªããããªç¶æ³ã§ã¯ãã¼ã¸å¦çã大å¤ãªè² è·ã«ãªãï¼
ãªã®ã§ã大éã®ãã¼ã¿ã»ããã®ä¸ããæ£ç¢ºãªã«ã¼ãã£ããªãã£ãåºåãããã¨ããã¨ããããã大å¤ï¼= å¦çæéãããã£ã¦ãã¾ãï¼ã¨ããäºã«ãªãã¾ãã
ããã£ãããã§ããããªãã§ããï¼
ãã ããã¯ãå³å¯ãªéè¤æé¤ãè¡ã£ãéè¨ãè¡ãä¸ã§ã®å¶éäºé ã§ãä¸ã®ä¸ã«ã¯ããã£ããã¨ããæ°ãç¥ããããã§ååãªã±ã¼ã¹ãåå¨ãã¾ããä¾ãã°ãã¢ã¯ã»ã¹ãã°ããã¢ã¯ãã£ããªã¦ã¼ã¶æ°ãéè¨ãã¦ã°ã©ãã«åºããããã¨ãã£ãå ´åãªã©ãå¤å°ã®èª¤å·®ã¯è¨±å®¹ã§ããã¦ã¼ã¹ã±ã¼ã¹ã§ãã
ãããæ¯è¼ç精度ããæ¨å®ã§ããæ¹æ³ã¨ãã¦ãHyperLogLogã¨ããææ³ãç¥ããã¦ãããããã¤ãã®ããã°ãã¼ã¿å¦çåããã¼ã¿ãã¼ã¹ã«å®è£ ããã¦ãããã®ãããã¾ãã
- Amazon RedShift ... HLL_CARDINALITY 関数 - Amazon Redshift
- Google BigQuery ... 標準 SQL の HyperLogLog++ 関数
- Microsoft CitusDB ... Distributed Distinct Count with HyperLogLog on Postgres — Citus 10.2 documentation
ä»åã¯ãGPUä¸ã§ã®GROUP BYå¦çãè¡ãGpuPreAggæ©è½ã®æ¡å¼µã¨ãã¦ãPG-Stromã«HyperLogLogãå®è£ ãã¦ã¿ã¾ããã
HyperLogLogã¢ã«ã´ãªãºã ã®èãæ¹
HyperLogLogã¢ã«ã´ãªãºã ã®èãæ¹ããã£ãã説æãã¾ãã
- åæâ ï¼
COUNT(distinct KEY)
ã®KEYå¤ãããã·ã¥é¢æ°ã«ãããã¨ãã©ã³ãã ãªãããåãçæãããããºã§ããã - åæâ¡ï¼KEYå¤ã®ã«ã¼ãã£ããªãã£ãé«ããã°ã
...10100000
ã®ããã«0ãé£ç¶ãããã¿ã¼ã³ãå«ã¾ããããºã§ããã
ãããã£ã¦ããã¼ãã«ãã¹ãã£ã³ãã¦KEYå¤ã®ããã·ã¥ãè¨ç®ãããã®ä¸ã§ä¸ä½ãããããé£ç¶ãã0ã®åæ°ã®æ大å¤ãè¨é²ãã¦ããã°ããã®éåã®ã«ã¼ãã£ããªãã£ã¯ç¨åº¦ã§ããã¨æ¨å®ããäºãã§ãã¾ãã
ãã¡ããããã®ããã«nã®å¤ã«å¿ãã¦ã§å¢ãã¦ããæ¨å®å¤ã¨ããã®ã¯ãã¾ãã«ã誤差ã大ããã§ãã®ã§ãããå°ã工夫ãå ãã¾ãã
ããã·ã¥å¤ã®ä¸ä½bãããåãï½äºãã«ç¬ç«ããã«ã¦ã³ã¿ã§ããHLLã¬ã¸ã¹ã¿ã®ã¤ã³ããã¯ã¹ã¨è¦ãªããæ®ãã®ãããåããé£ç¶ãã0ã®åæ°ãã«ã¦ã³ããã¦ãã¤ã³ããã¯ã¹ãããHLLã¬ã¸ã¹ã¿ã«ãã®æ大å¤ãè¨é²ãã¾ãã
æå¾ã«ãããã®å¹³åå¤ãè¨ç®ããäºã§ããã°ãã°æ··ãã£ã¦ãã¾ãä¾å¤çãªããã·ã¥å¤ã®å½±é¿ãæé¤ããããå°ãçã®å¤ã«è¿ãKEYå¤ã®ã«ã¼ãã£ããªãã£ãæ¨å®ããï¼ï¼ï¼ã¨ããæµãã«ãªãã¾ãã
ãã®ææ³ã®è¯ãã¨ããã¯ãå
¥åå¤ãã½ã¼ãããå¿
è¦ããªããã¾ãããã¼ãã«ãåå²çµ±æ²»ãã¦äºãã«ç¬ç«ãªHLLã¬ã¸ã¹ã¿ãä½ã£ãã¨ãã¦ããããã»ã©å¤§éã®ã¡ã¢ãªãæ¶è²»ããªãããã並åå¦çã«åãã¦ããã¨ããã§ãã
ä¾ãã°ã64bitã®ããã·ã¥å¤ã§ã¬ã¸ã¹ã¿ã®ã»ã¬ã¯ã¿ã«10bitã使ã£ãå ´åãåã¬ã¸ã¹ã¿ã¯8bitããã°ã«ã¦ã³ã¿ã¨ãã¦ååã«æ©è½ãããããHLLã¬ã¸ã¹ã¿ã¨ãã¦å¿
è¦ãªã®ã¯å
ã1.0kBã ãã¨ããäºã«ãªãã¾ãã
PG-Stromã«ãããHyperLogLog
ããã§ã¯ä¾ã¨ãã¦ãStar Schema Benchmarkãã¼ã¿ã»ããã® lineorder ãã¼ãã«ãããlo_custkey*2ã®ã«ã¼ãã£ããªãã£ã調ã¹ã¦ã¿ãäºã«ãã¾ãã
scale factorã¯100ãªã®ã§ããã¼ãã«ã®ãµã¤ãºã¯æ¦ã87GBã¨ãªãã¾ãã
nvme=# \d+ List of relations Schema | Name | Type | Owner | Persistence | Size | Description --------+-----------+-------+--------+-------------+--------+------------- public | customer | table | kaigai | permanent | 406 MB | public | date1 | table | kaigai | permanent | 416 kB | public | lineorder | table | kaigai | permanent | 87 GB | public | part | table | kaigai | permanent | 160 MB | public | supplier | table | kaigai | permanent | 132 MB | (5 rows)
nvme=# explain select count(distinct lo_custkey) from lineorder; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=18896094.80..18896094.81 rows=1 width=8) Output: count(DISTINCT lo_custkey) -> Seq Scan on public.lineorder (cost=0.00..17396057.84 rows=600014784 width=6) Output: lo_orderkey, ...(snip)..., lo_shipmode (4 rows)
ããã©ã«ãè¨å®ã§ã¯ããã®ããã« count(distinct ...)
ãå«ãã¯ã¨ãªãGPUã§å®è¡ã§ãã¾ããã
ããã¯ãHyperLogLogã«ããæ¨å®å¤ã§ count(distinct ...)
ã代æ¿ããäºã§çµæãå¤ãã£ã¦ãã¾ããããããã©ã«ãã§ã¯ç¡å¹åããã¦ããããã§ãã
nvme=# select count(distinct lo_custkey) from lineorder; count --------- 2000000 (1 row) Time: 409851.751 ms (06:49.852)
å®è¡ããã¨ãå³å¯ãªcount(distinct lo_custkey)
㯠2,000,000 ã§ããä¸æ¹ããã®å®è¡ã«ã¯ 409 ç§ãè¦ãã¦ããäºãåããã¾ãã
ï¼Sortã®é«éåãç®çã¨ããCPU並åã¯ã¨ãªããæå¹ã«ãªã£ã¦ããªãã®ã§ãå½ç¶ã¨è¨ãã°å½ç¶ã¨è¨ãã¾ããï¼
次ã«ãPG-Stromã®HyperLogLogæ©è½ã«ãã count(distinct ...)
ã®ç½®ãæããæå¹ã«ãã¾ãã
nvme=# set pg_strom.enable_hll_count = on; SET
å®è¡è¨ç»ãè¦ã¦ã¿ã¾ãããã
nvme=# explain verbose select count(distinct lo_custkey) from lineorder; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=7444397.37..7444397.38 rows=1 width=8) Output: pgstrom.hll_count((pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey)))) -> Gather (cost=7444397.14..7444397.35 rows=2 width=32) Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) Workers Planned: 2 -> Parallel Custom Scan (GpuPreAgg) on public.lineorder (cost=7443397.14..7443397.15 rows=1 width=32) Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) GPU Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) GPU Setup: pgstrom.hll_hash(lo_custkey) Reduction: NoGroup Outer Scan: public.lineorder (cost=2833.33..7365270.22 rows=250006160 width=6) GPU Preference: GPU0 (Tesla V100-PCIE-16GB) Kernel Source: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_35128.1.gpu Kernel Binary: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_35128.2.ptx (14 rows)
GPUãç¨ããéç´é¢æ°ã§ããGpuPreAggãé¸æããã¦ããã»ããå
ã
count(DISTINCT lo_custkey)
ãåºåãã¦ãã Aggregate ãã¼ããã代ããã«pgstrom.hll_count((pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))))
ã®å®è¡çµæãåºåããããã«æ¸ãæãããã¦ãã¾ãã
å
å´ããé ã«èª¬æããã¨ãpgstrom.hll_hash(lo_custkey)
é¢æ°ã¯ãHyperLogLogã«ä½¿ç¨ããããã·ã¥å¤ãè¨ç®ããããã®é¢æ°ã§ãããã§ã¯è»½éãã¤æ¯è¼çã©ã³ãã ãª64bitã®ããã·ã¥å¤ãå¾ãããã¨ããäºã§SipHashアルゴリズムã使ç¨ãã¦ãã¾ãã
次ã«ãpgstrom.hll_pcount(HASH)
é¢æ°ã¯ãHLLã¬ã¸ã¹ã¿é
åãã»ããã¢ããããå¼æ°ã¨ãã¦ä¸ãããã64bitã®ããã·ã¥å¤ãå
ã«ããã次ã
ã¨æ´æ°ãã¦ããã¾ããéè¦ãªã®ã¯ãpgstrom.hll_pcount(HASH)
é¢æ°ã¯HLLã¬ã¸ã¹ã¿é
åã ããåºåãããããã©ãã ã巨大ãªãã¼ãã«ãã¹ãã£ã³ããäºã«ãªã£ãã¨ãã¦ããpgstrom.hll_pcount(HASH)
é¢æ°ããå¾ã®å·¥ç¨ã§ã¯ãã£ã1è¡ããï¼GROUP BYå¥ãæå®ããã¦ããå ´åã¯ã°ã«ã¼ãã®æ°ã ãããï¼è¿ããªãã¨ããäºã§ãã
ãããã£ã¦ãåã¯ã¼ã«ã¼ããã»ã¹ããè¿å´ããããã®ãå«ããCPUã§HLLã¬ã¸ã¹ã¿é
åããã¼ã¸ããäºã«ãªãpgstrom.hll_count()
é¢æ°ã¯ãå
ã1è¡ x 3ããã»ã¹åã®çµæãå¦çããã ãã§ãHyperLogLogã«ããlo_custkeyå¤ã®ã«ã¼ãã£ããªãã£ã®æ¨å®ãå¯è½ã«ãªãã¨ããäºã§ãã
ãã®ãããªæ§è³ªã«ãããGPU/CPUã®ä¸¦åå¦çã®æ©æµãæ大éã«åããäºãã§ãããããããã©ã¼ãã³ã¹ãè¯å¥½ã§ãã
å³å¯ãªéè¨å¤ãå°åºããããã«409ç§ãè¦ãã¦ããä¸æ¹ãå®éã®å¤ã¨ 0.3% ç¨åº¦ãããºã¬ã®ãªã 2,005,437 ã¨ããæ¨å®å¤ã9.2ç§ã§å°åºãã¦ãã¾ãã
nvme=# select count(distinct lo_custkey) from lineorder; count --------- 2005437 (1 row) Time: 9212.712 ms (00:09.213)
å®è¡è¨ç»ã®è©³ç´°ãè¦ã¦ã¿ã¾ãããã
nvme=# explain (verbose, analyze) select count(distinct lo_custkey) from lineorder; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4992387.95..4992387.96 rows=1 width=8) (actual time=9045.729..9081.690 rows=1 loops=1) Output: pgstrom.hll_count((pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey)))) -> Gather (cost=4992387.72..4992387.93 rows=2 width=32) (actual time=8892.195..9081.633 rows=3 loops=1) Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) Workers Planned: 2 Workers Launched: 2 -> Parallel Custom Scan (GpuPreAgg) on public.lineorder (cost=4991387.72..4991387.73 rows=1 width=32) (actual time=8760.881..8760.885 rows=1 loops=3) Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) GPU Output: (pgstrom.hll_pcount(pgstrom.hll_hash(lo_custkey))) GPU Setup: pgstrom.hll_hash(lo_custkey) Reduction: NoGroup Outer Scan: public.lineorder (cost=2833.33..4913260.79 rows=250006160 width=6) (actual time=159.316..2800.578 rows=600037902 loops=1) GPU Preference: GPU0 (Tesla V100-PCIE-16GB) GPUDirect SQL: load=11395910 Kernel Source: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_39266.2.gpu Kernel Binary: /var/lib/pgdata/pgsql_tmp/pgsql_tmp_strom_39266.3.ptx Worker 0: actual time=8694.640..8694.644 rows=1 loops=1 Worker 1: actual time=8699.829..8699.833 rows=1 loops=1 Planning Time: 0.129 ms Execution Time: 9194.200 ms (20 rows)
ãã®ã¯ã¨ãªã«ã¯å
¨ä½ã§9.2ç§ãè¦ãã¦ãã¾ããããã®ãã¡8.760ç§ã GpuPreAgg ã§ã®å®è¡ã«è¦ãã¦ãã¾ãã
ããã§ã¯ GPU-Direct SQL ãç¨ãã¦ã4å°ã®NVME-SSDãã 10GB/s ç¨åº¦ã®ã¹ã«ã¼ãããã§åè¨6åè¡ãèªã¿åºãã¦ãã¾ãããGpuPreAggãåºåãã¦ããã®ã¯HLLã¬ã¸ã¹ã¿é
åã®1è¡ã ãã§ããã®ã§ãé常ã«å¹ççãªãã¼ã¿è»¢éãè¡ããã¦ããã¨è¨ãã¾ãã
ä¸ã®å³ã§è¨ãã°ããã¼ãã«ï¼ãã£ã¹ã¯ï¼ãããã¼ã¿ãèªã¿åºããGPUä¸ã§å®è¡ããã hll_pcount() é¢æ°ã«ãã¼ãããã¨ããã¾ã§ããã¹ã«ã¼ãããçªé·ã§ãã PG-Strom ã®ç骨é ã§ãã¯ã¨ãªã®æ¸ãæãã¨ã¢ã«ã´ãªãºã ã®å·¥å¤«ã«ãããåä»ãªCOUNT(distinct KEY)
ããã®ãããªå½¢æ
ã®å¦çã«æ¸ãæããã¨ããã HyperLogLog ã®æ©æµã¨è¨ããã§ãããã
çµè«
COUNT(distinct KEY)
é¢æ°ã§ã大ã¾ããªæ¨å®å¤ããå¾ãã°ååã§ããå ´åãHyperLogLogã使ã£ã¦ç¸å¿ã«ç²¾åº¦ã®è¯ãæ¨å®å¤ãå¾ãäºãã§ãããCOUNT(distinct KEY)
é¢æ°ããdistinctå¥ã®ä»ããªãéç´é¢æ°ã«æ¸ãæããäºã§ãé ååå²ã¨ä¸¦åå¦çãå¯è½ãªå½¢å¼ã«å¤æã§ããããã®ãã¿ã¼ã³ã«è½ã¨ãäºãã§ããã°ãGPU-Direct SQLã§ã»ã¼ã»ã¼ãã¼ãã¦ã§ã¢ã®éçã«è¿ãé度ã§éè¨å¦çãåããã¨ãã§ããã
ã²ã¨ã¾ããç¾ç¶ã§ã¯è«æã«æ¸ããã¦ããå 容ããã®ã¾ã¾ä½ãèããå½¢ãªã®ã§ãä¾ãã°ã«ã¼ãã£ããªãã£ãå°ããæã®æ¨å®å¤ã®ãºã¬ããããæ£ç¢ºãªæ¨å®å¤ãå¾ãããã®è£æ£ï¼é¢é£ç 究ã§ããããã®ããããããï¼ã«ã¤ãã¦ã¯ãå ¨ãä½ãå ¥ã£ã¦ãã¾ããã誰ãããããã®ã«å¼·ã人ãããããæ¸ãã¦ããããããã¨å©ããã¾ãï¼ãã½ã