Hello, I'm Hiroyuki Inoue (@inohiro) from the Data Intelligence Group in the Marketing Support Division. I usually work on the private DMP (data management platform) that we use for marketing. In this post I'd like to share some techniques for writing idempotent data processing jobs that I picked up along the way. The discussion assumes data processing with SQL on an RDBMS, but the same ideas should apply to data processing in other languages and environments.
First, I'll briefly describe Cookpad's DMP and what an idempotent job is, then list the key points for making a job idempotent. After that, I'll show examples of idempotent jobs implemented with bricolage, a SQL batch job framework.
Cookpad's DMP and idempotent jobs
Cookpad's private DMP is built on our data warehouse (a huge in-house analytics database, for which Cookpad uses Amazon Redshift; hereafter DWH). It is used mainly for targeted advertising on cookpad.com and for internal data analysis. The raw material is ad impression logs and the search and recipe-view logs from Cookpad. We also import data obtained from other companies into the DWH and make use of it.
The set of batch jobs that uses this data is relatively large even by in-house standards. Because a job may stop partway through, each job is basically developed so that it produces idempotent results.
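I'll skip a detailed explanation of idempotence, but put simply it means that "no matter how many times a job runs, it produces the same result." In data processing in particular, it must never happen that "an aggregation job failed partway, so the data for some day ended up duplicated or missing." If jobs are developed to be idempotent, retrying after a failure becomes relatively easy. Even a job that never fails may happen to run more than once (through operator error, for example), and it should produce the same result every time.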
On top of that, I recommend developing jobs to be idempotent because it makes verification easy when you try running them locally during development.
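To make the definition concrete, here is a minimal sketch in Python of the kind of local check mentioned above: run a job once and twice from the same starting state and compare the results. The helper is_idempotent and the two toy jobs are hypothetical names invented for this illustration; they are not part of bricolage or of Cookpad's DMP.

```python
import copy


def is_idempotent(job, initial_state):
    """Run `job` once and twice on separate copies of the state and compare."""
    once = copy.deepcopy(initial_state)
    job(once)
    twice = copy.deepcopy(initial_state)
    job(twice)
    job(twice)
    return once == twice


def append_summary(table):
    # Not idempotent: every run appends another copy of the same row.
    table.append(("2019-07-12", 42))


def replace_summary(table):
    # Idempotent: remove the rows for that day first, then write them again.
    table[:] = [row for row in table if row[0] != "2019-07-12"]
    table.append(("2019-07-12", 42))


append_ok = is_idempotent(append_summary, [])
replace_ok = is_idempotent(replace_summary, [])
print(append_ok, replace_ok)
```

The second job already hints at the "delete, then write" idea that appears later in this post.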
Key points for making a job idempotent
The key lesson from developing the private DMP for making jobs idempotent is, simply put, "use transactions."
Rolling back with a transaction
When you design a batch job that writes a large amount of data over a long time (N hours), you should plan in advance for the job stopping partway and for recovering (retrying) from that point. If the destination is a database that supports transactions (a typical RDBMS, for example), use them. A series of operations grouped into one transaction is guaranteed to end up either "all succeeded" or "all failed (rolled back)," never in a half-finished state. If the job fails partway you have to rewrite everything from the beginning, but idempotence is preserved.
Cookpad's DMP is built on Amazon Redshift, a parallel distributed RDB, so we make full use of transactions.
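As a rough illustration of this all-or-nothing behaviour, the following is a minimal sketch that uses Python's built-in sqlite3 module in place of Redshift. The table daily_pv, the function load_rows, and the sample rows are hypothetical and exist only for this example.

```python
import sqlite3


def load_rows(conn, rows):
    """Write all rows in one transaction; undo everything if any row fails."""
    conn.execute("BEGIN")
    try:
        for day, pv in rows:
            if pv < 0:
                raise ValueError(f"invalid pv for {day}: {pv}")
            conn.execute("INSERT INTO daily_pv (day, pv) VALUES (?, ?)", (day, pv))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN / COMMIT / ROLLBACK statements above are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE daily_pv (day TEXT, pv INTEGER)")

try:
    # The second row is invalid, so the first row must not survive either.
    load_rows(conn, [("2019-07-11", 10), ("2019-07-12", -1)])
except ValueError:
    pass
count_after_failure = conn.execute("SELECT count(*) FROM daily_pv").fetchone()[0]

# Retrying with corrected input simply writes everything from the start.
load_rows(conn, [("2019-07-11", 10), ("2019-07-12", 20)])
count_after_retry = conn.execute("SELECT count(*) FROM daily_pv").fetchone()[0]
print(count_after_failure, count_after_retry)
```

Because the failed run leaves no rows behind, the retry does not need to know how far the first attempt got.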
Rolling back by yourself
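Consider the case where an aggregation job that has already run is run again. There are many possible reasons for a rerun, and "it was run again by mistake" can be treated as the same kind of situation. If the rerun produces the same result as the first run there is no problem. But if the aggregated results are duplicated, downstream jobs may fail or, in the worst case, some decision may be made on the basis of incorrect analysis results.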
In other words, the table that the current job writes to may already contain data for the very condition it is about to write. So, before writing the "new result," we delete the existing rows (rolling back by ourselves) to avoid duplicates. Furthermore, by putting the "delete" and the "write of the new result" into a single transaction, the job becomes idempotent.
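The "delete, then write, in one transaction" idea can be sketched as follows, again with Python's sqlite3 module standing in for Redshift. The tables pv_log and pv_summary and the function write_daily_summary are hypothetical names chosen for this illustration.

```python
import sqlite3


def write_daily_summary(conn, data_date):
    """Delete the day's existing rows and rewrite them in one transaction."""
    conn.execute("BEGIN")
    try:
        # "Roll back by yourself": clear whatever an earlier run wrote.
        conn.execute("DELETE FROM pv_summary WHERE data_date = ?", (data_date,))
        conn.execute(
            "INSERT INTO pv_summary (data_date, article_id, pv) "
            "SELECT ?, article_id, count(*) FROM pv_log "
            "WHERE log_date = ? GROUP BY article_id",
            (data_date, data_date),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE pv_log (log_date TEXT, article_id INTEGER)")
conn.execute("CREATE TABLE pv_summary (data_date TEXT, article_id INTEGER, pv INTEGER)")
conn.executemany(
    "INSERT INTO pv_log VALUES (?, ?)",
    [("2019-07-12", 1), ("2019-07-12", 1), ("2019-07-12", 2)],
)

query = "SELECT data_date, article_id, pv FROM pv_summary ORDER BY article_id"
write_daily_summary(conn, "2019-07-12")
first = conn.execute(query).fetchall()
write_daily_summary(conn, "2019-07-12")  # accidental second run
second = conn.execute(query).fetchall()
print(first == second)
```

Without the DELETE, the second run would double every row for that day. Without the transaction, a failure between the DELETE and the INSERT would leave the day empty.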
Using idempotent data structures
On the other hand, when you use a NoSQL database that does not support transactions, making a job idempotent is not so easy. One possible solution in that situation is to use a data structure whose contents do not change no matter how many times the same data is written: a set or a hash table. These data structures do not guarantee the order of the data, but writing a value (or key) that already exists does not create a duplicate element.
The targeted-advertising data produced by Cookpad's DMP is ultimately written to Amazon DynamoDB *1, and the ad delivery server uses that data. A batch job writes tens of millions of elements of this data at a time in parallel. Because this job occasionally fails, and because elements written in the past are sometimes written again later, we use the SS (string set) type. In the past we also used the Redis set type.
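The property that makes sets suitable here can be shown with plain Python data structures. This is only a sketch: the dictionary store stands in for a key-value table, write_segments is a hypothetical helper, and nothing below uses the actual DynamoDB or Redis APIs.

```python
def write_segments(store, user_id, segments):
    """Add segments to the user's set; re-adding an existing element is a no-op."""
    store.setdefault(user_id, set()).update(segments)


store = {}
write_segments(store, "user-1", ["cooking", "bento"])
snapshot = {key: set(value) for key, value in store.items()}

# A retry after a partial failure writes the same elements again.
write_segments(store, "user-1", ["cooking", "bento"])
unchanged = store == snapshot

# For contrast, an append-only list grows on every retry.
log = []
log.extend(["cooking", "bento"])
log.extend(["cooking", "bento"])
print(unchanged, len(log))
```

Since every write is a set union, parallel writers and retries can repeat work freely without creating duplicates.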
Implementing idempotent jobs with bricolage
bricolage is the de facto standard for writing SQL batch jobs not only in Cookpad's DMP but across the company. It provides several "job classes" that are handy for writing jobs that follow common patterns, and using them makes it easy to implement idempotent jobs. This section shows implementation examples of the "roll back with a transaction" pattern and the "roll back by yourself" pattern using bricolage.
I won't explain bricolage in detail here. For details, please see the earlier article "Splitting up and composing a huge batch: the SQL batch framework Bricolage" and the RubyKaigi 2019 lightning talk "Write ETL or ELT data processing jobs with bricolage." A demo project is also available at inohiro/rubykaigi2019_bricolage_demo.
The "roll back with a transaction" pattern
With the rebuild-drop or rebuild-rename job class, you can easily implement a job that, in a single transaction, either "drops the current table and writes the aggregation result into a newly created table" or "creates a new table, writes the aggregation result into it, and swaps it with the current table." rebuild-drop runs drop table on the target table before rebuilding it, while rebuild-rename keeps the old, swapped-out table under a different name.
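The swap described above can be imitated in a few lines of Python with sqlite3. This is a rough analogue only: the SQL that bricolage actually generates for rebuild-rename may differ, and the table names summary, summary_new, and summary_old as well as the function rebuild_with_swap are assumptions made for this sketch.

```python
import sqlite3


def rebuild_with_swap(conn, rows):
    """Build a fresh table and swap it in, all inside one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute("DROP TABLE IF EXISTS summary_new")
        conn.execute("CREATE TABLE summary_new (day TEXT, pv INTEGER)")
        for day, pv in rows:
            if pv < 0:
                raise ValueError(f"invalid pv for {day}: {pv}")
            conn.execute("INSERT INTO summary_new (day, pv) VALUES (?, ?)", (day, pv))
        # Keep the previous table under another name, then swap in the new one.
        conn.execute("DROP TABLE IF EXISTS summary_old")
        conn.execute("ALTER TABLE summary RENAME TO summary_old")
        conn.execute("ALTER TABLE summary_new RENAME TO summary")
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def read_summary(conn):
    return conn.execute("SELECT day, pv FROM summary ORDER BY day").fetchall()


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE summary (day TEXT, pv INTEGER)")
conn.execute("INSERT INTO summary VALUES ('2019-07-11', 5)")

try:
    rebuild_with_swap(conn, [("2019-07-11", 5), ("2019-07-12", -1)])
except ValueError:
    pass
after_failure = read_summary(conn)  # the current table is untouched

new_rows = [("2019-07-11", 5), ("2019-07-12", 7)]
rebuild_with_swap(conn, new_rows)
after_first = read_summary(conn)
rebuild_with_swap(conn, new_rows)  # rerun gives the same table contents
after_second = read_summary(conn)
print(after_failure, after_first == after_second)
```

This relies on the database supporting DDL inside a transaction, which SQLite does; the example from this post shows Redshift running drop table and create table between begin transaction and commit in the same way.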
The following is an example that uses the rebuild-drop job class to implement a summary table that is rebuilt every day.
    /*
    class: rebuild-drop    -- specify the job class
    dest-table: $public_schema.articles_summary
    table-def: articles_summary.ct
    src-tables:
      pv_log: $public_schema.pv_log
    analyze: false
    */

    insert into $dest_table
    select
        date_trunc('day', logtime)::date as day
        , id_param::integer as article_id
        , count(*) as pv
    from
        $pv_log
    where
        controller = 'articles'
        and action = 'show'
        and logtime < '$today'::date
    group by
        1, 2
    ;
This job is converted into the following SQL and executed.
    \timing on

    begin transaction;    -- start of the transaction

    drop table if exists public.articles_summary cascade;    -- drop the existing table

    /* /Users/hiroyuki-inoue/devel/github/rubykaigi2019_bricolage_demo/demo/articles_summary.ct */
    create table public.articles_summary
    ( day date
    , article_id integer
    , pv bigint
    )
    ;

    /* demo/articles_summary-rebuild.sql.job */
    insert into public.articles_summary
    select
        date_trunc('day', logtime)::date as day
        , id_param::integer as article_id
        , count(*) as pv
    from
        public.pv_log
    where
        controller = 'articles'
        and action = 'show'
        and logtime < '2019-07-13'::date
    group by
        1, 2
    ;

    commit;    -- end of the transaction
The whole job is wrapped in begin transaction; and commit;, so if the aggregation query has a problem and fails, the original table is not dropped and remains as it was.
The "roll back by yourself" pattern
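The insert-delta job class is used to write a delta into an existing table. Immediately before writing the delta, it runs delete with the specified condition. The whole sequence of SQL runs inside a single transaction, so you are safe even if the query that aggregates the delta fails right after the delete.
The following is an example of a job that writes the aggregation result for the previous day ($data_date) *2 into impressions_summary, a table that accumulates ad impressions day by day. The delete condition is specified in delete-cond:. In this example, it specifies the date, which is one of the aggregation keys.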
    /*
    class: insert-delta    -- specify the job class
    dest-table: $public_schema.impressions_summary
    table-def: impressions_summary.ct
    src-tables:
      impressions: $ad_schema.impressions
    delete-cond: "data_date = '$data_date'::date"    -- specify the delete condition
    analyze: false
    */

    insert into $dest_table
    select
        '$data_date'::date as data_date
        , platform_id
        , device_type
        , count(*) as impressions
    from
        $impressions
    group by
        1, 2, 3
    ;
This job is converted into SQL like the following and executed.
    \timing on

    begin transaction;    -- start of the transaction

    delete from impressions_summary where data_date = '2019-07-12'::date;    -- delete existing rows with the specified condition

    /* demo/impressions_summary-add.sql.job */
    insert into impressions_summary
    select
        '2019-07-12'::date as data_date
        , platform_id
        , device_type
        , count(*) as impressions
    from
        ad.impressions
    group by
        1, 2, 3
    ;

    commit;    -- end of the transaction
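You can see that, before anything is written to the table, a delete query runs with the specified condition (delete-cond: "data_date = '$data_date'::date") to "clean up," and only then does the writing query run. If there are no matching rows, nothing is deleted; if matching rows exist, they are deleted before the new result is written.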
Summary
In this post I introduced several techniques used in the development of Cookpad's DMP to write "idempotent data processing jobs," and showed examples of implementing such jobs with bricolage.
As shown here, when you use a database that has transactions, the easiest approach is to lean on them as much as possible. Also, rather than cramming many things into one job, keep each job small: the scope of a rollback becomes smaller, and retries after a failure stay simple. By making good use of bricolage's job classes, you can easily implement idempotent data processing jobs that rely on transactions. Please give it a try.
*1: While writing this post I remembered that Amazon DynamoDB now supports transactions. https://aws.amazon.com/jp/blogs/news/new-amazon-dynamodb-transactions/
*2: The variable is assumed to hold the previous day's date, but it can also be overridden with a job option.