Hello, this is Sato from the Data Infrastructure Group in the Engineering Department. Lately my main work has been Queuery (pronounced "kyuuri"), our platform for transferring data from the DWH to external systems. We have now released it as OSS on GitHub, so this article introduces it.
Queuery was originally built by id:koba789 around the spring of 2017, and it plays an important role in Cookpad's data infrastructure.
Background
Traditionally, running a retrieval query such as a SELECT statement on Redshift meant connecting to Redshift directly and issuing the query. With this approach, a huge result set could exhaust resources on the client side.
Using a cursor to avoid that, however, quickly makes the Redshift leader node unhealthy. Running a query that returns a huge result from outside Redshift required various workarounds.
In addition, the usual connection method (over the PostgreSQL protocol) makes it hard to connect from a remote location (a different AWS region), and if the connection drops, the transaction is lost with it and the result can no longer be retrieved. The AWS Security Group settings are also easy to forget. Setup goes through ActiveRecord, so configuration is simply tedious, and because ActiveRecord can then be used in all sorts of ways, standardization is difficult.
Queuery exists to solve these problems. With Queuery, clients can run retrieval queries through an HTTP API without connecting to Redshift directly.

How it works
Queuery is split into an API server, which is responsible for sending UNLOAD statements to Redshift, and a client, which fetches the UNLOAD results from S3. The HTTP API receives the SELECT query sent by the client, wraps it in an UNLOAD statement, and sends it to Redshift. The client keeps polling for the result, and once the UNLOAD has completed it accesses S3 to fetch the result.
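The wrapping step described above can be sketched roughly as follows. This is only an illustration, not Queuery's actual implementation: the helper name build_unload_statement, the bucket prefix, the IAM role ARN, and the chosen UNLOAD options are all assumptions made for the example.

```ruby
# Hypothetical helper: wraps a SELECT statement in an UNLOAD statement.
# The option list (CSV, GZIP, MANIFEST VERBOSE) is an assumption for illustration.
def build_unload_statement(select_stmt, s3_prefix:, iam_role:)
  # Drop a trailing semicolon, then double every single quote so the query
  # can be embedded inside the quoted UNLOAD argument.
  query = select_stmt.strip.sub(/;\s*\z/, '').gsub("'", "''")
  <<~SQL
    UNLOAD ('#{query}')
    TO '#{s3_prefix}'
    IAM_ROLE '#{iam_role}'
    FORMAT AS CSV
    GZIP
    MANIFEST VERBOSE
  SQL
end

# Example values below are placeholders.
puts build_unload_statement(
  "select column_a from the_great_table where column_b = 'x';",
  s3_prefix: 's3://example-bucket/queuery/result-',
  iam_role: 'arn:aws:iam::123456789012:role/example-unload-role'
)
```

Doubling the single quotes matters because the whole SELECT is passed to UNLOAD as one quoted string. A production version would also need to handle trailing SQL comments and similar edge cases, which this sketch ignores.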
To keep development as simple as possible for Queuery users, the client is packaged as a gem. Add it to your Gemfile, add a configuration file, and it is ready to use.
Client sample code
Write the code below in a job on the client side and run it whenever needed; that is all it takes to work with data stored in Redshift.
Queuery configuration file
    # configuration
    RedshiftConnector.logger = Logger.new($stdout)

    GarageClient.configure do |config|
      config.name = "queuery-example"
    end

    QueueryClient.configure do |config|
      config.endpoint = 'queuery_api_server_host'
      config.token = 'XXXXXXXXXXXXXXXXXXXXX'
      config.token_secret = '*******************'
    end
Queuery client code
    select_stmt = 'select column_a, column_b from the_great_table; -- an awesome query shows amazing fact up'
    bundle = QueueryClient.query(select_stmt)
    bundle.each do |row|
      # do some useful works
      p row
    end
Console
A web console, simple as it is, is also bundled. From the console you can issue and disable the tokens that clients need for authentication, and check the status of queries recently sent to Queuery.

The API side of the Queuery server is built with plain Rails, and the console frontend is an SPA written in TypeScript and React.
Recent improvements
While introducing Queuery, I will also describe the parts I reworked this year.
Linking Queuery accounts to Redshift users
Previously, anyone could create a Queuery account with any name from the Queuery console, and anyone could enable or disable the authentication tokens of existing Queuery accounts. In addition, UNLOAD statements were executed on Redshift by a single Redshift user dedicated to Queuery.
Left as it was, any employee could tamper with another team's Queuery account. It was also impossible to give each Queuery account its own permissions for the user that runs UNLOAD. As part of advancing DevOps around the DWH, we needed to separate user permissions properly, so that a Queuery account can only be handled by the Redshift user of the person or team that owns it.
So we now require authentication as a Redshift user whenever a Queuery account is created, an authentication token is created for it, or it is deleted. The authentication itself only sends a simple check query directly to Redshift, and once the identity is confirmed, only the user name is recorded.
After that, when an UNLOAD statement is actually executed, the registered user name is used with the GetClusterCredentials API to obtain temporary credentials for that user.
    temporal_credential = Aws::Redshift::Client.new.get_cluster_credentials({
      db_user: redshift_user,
      db_name: database_name,
      cluster_identifier: cluster_identifier,
      auto_create: false
    })
    ds.config.merge!(username: temporal_credential.db_user, password: temporal_credential.db_password)
    export_execute(datasource: ds, query_statement: sql, logger: logger)
As a result, a Queuery account is managed only by the Redshift user who owns it, and queries for each account are executed as the Redshift user linked to that account.
However, Redshift users and Queuery accounts are currently managed in two places, which makes permission management needlessly complex. To address this, we are considering dropping account management on the Queuery side and treating Redshift users directly as Queuery accounts.
Type casting with the UNLOAD manifest.json
Until now, every file output by Queuery was written to S3 as compressed and split CSV, and the Queuery client could not determine the column types automatically. Developers using the Queuery client therefore had to write code that cast the retrieved results by hand.
The Redshift UNLOAD statement has various options, one of which is the MANIFEST option, which outputs metadata about the UNLOAD result.
https://docs.aws.amazon.com/ja_jp/redshift/latest/dg/r_UNLOAD.html
The JSON manifest file produced by this option contains the column names and data types. We modified both the Queuery server and the Queuery client so that they read this manifest file, determine the type of each column automatically, and cast the values accordingly.
    sql = "select 1, 1::bigint, 1.0, 'hoge', false, date '2021-01-01', timestamp '2021-01-01 00:00:00', null"

    bundle1 = QueueryClient.query(sql) # previous behavior
    bundle1.each do |row|
      p row # => ["1", "1", "1.0", "hoge", "f", "2021-01-01", "2021-01-01 00:00:00", ""]
    end

    bundle2 = QueueryClient.query(sql, enable_cast: true) # with the type-cast option added
    bundle2.each do |row|
      p row # => [1, 1, 1.0, "hoge", false, Fri, 01 Jan 2021, "2021-01-01 00:00:00", nil]
    end
As a by-product, null, which used to be hard to tell apart from an empty string in string columns, can now be distinguished properly.
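To make the idea of manifest-driven casting more concrete, here is a minimal sketch. It is not the Queuery client's actual code: the manifest excerpt, the CASTERS table, and cast_row are hypothetical, and the "schema.elements[].type.base" layout is an assumption modeled on the verbose manifest that UNLOAD can produce.

```ruby
require 'json'
require 'date'

# Hypothetical manifest excerpt (layout assumed, see the note above).
MANIFEST_TEXT = <<~MANIFEST
  {
    "schema": {
      "elements": [
        {"name": "id",         "type": {"base": "integer"}},
        {"name": "price",      "type": {"base": "double precision"}},
        {"name": "label",      "type": {"base": "character varying"}},
        {"name": "active",     "type": {"base": "boolean"}},
        {"name": "created_on", "type": {"base": "date"}}
      ]
    }
  }
MANIFEST

# Illustrative casters keyed by base type; unknown types stay as strings.
CASTERS = {
  'smallint'         => ->(v) { Integer(v) },
  'integer'          => ->(v) { Integer(v) },
  'bigint'           => ->(v) { Integer(v) },
  'double precision' => ->(v) { Float(v) },
  'boolean'          => ->(v) { v == 't' },
  'date'             => ->(v) { Date.parse(v) }
}.freeze

# Casts one row of string values using the column types from the manifest.
def cast_row(row, manifest)
  types = manifest.dig('schema', 'elements').map { |e| e.dig('type', 'base') }
  row.zip(types).map do |value, type|
    next nil if value.nil? # nil is assumed to stand for NULL and passes through
    caster = CASTERS[type]
    caster ? caster.call(value) : value
  end
end

manifest = JSON.parse(MANIFEST_TEXT)
p cast_row(['1', '1.0', 'hoge', 'f', '2021-01-01'], manifest)
```

Keeping the casters in a lookup table means unsupported types simply fall back to the raw string, which mirrors the pre-cast behavior.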
Migrating asynchronous processing from Barbeque to the Redshift Data API
In Queuery, the time between accepting SQL and returning the result of the UNLOAD statement depends on the SQL itself. Some SQL takes a very long time, so the processing has to be asynchronous. Originally we used a queue system called Barbeque to run jobs asynchronously. Barbeque is a job queue system built on Docker and SQS.
This worked well for a while, but after being affected by the SQS outage in April 2020, and with Queuery's architecture growing more complex, we had been wondering whether we could move to something simpler and more robust.
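Then, in 2020, the Redshift Data API was announced, and the idea came up that we could drop the Barbeque dependency by using the executeStatement and describeStatement operations it provides. When we looked into it, we found that we would no longer need to maintain the asynchronous processing machinery ourselves, and that Queuery's architecture could be simplified as well.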

Since the migration, Queuery has been running stably without any real problems, and we were able to remove the dependency on Barbeque.
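The executeStatement / describeStatement pattern mentioned above can be sketched as follows. This is an illustration under assumptions, not how Queuery is actually implemented: run_statement and FakeDataApiClient are hypothetical names, and the fake client stands in for Aws::RedshiftDataAPIService::Client from the AWS SDK for Ruby so that the sketch runs without AWS access.

```ruby
# Statuses after which DescribeStatement will not change any more.
TERMINAL_STATUSES = %w[FINISHED FAILED ABORTED].freeze

# Submits SQL, then polls describe_statement until a terminal status is reached.
# `client` only needs execute_statement / describe_statement, in the same shape
# as Aws::RedshiftDataAPIService::Client.
def run_statement(client, sql:, cluster_identifier:, database:, db_user:,
                  interval: 1, max_polls: 600)
  id = client.execute_statement(
    cluster_identifier: cluster_identifier,
    database: database,
    db_user: db_user,
    sql: sql
  ).id

  max_polls.times do
    desc = client.describe_statement(id: id)
    return desc if TERMINAL_STATUSES.include?(desc.status)
    sleep interval
  end
  raise "statement #{id} did not finish after #{max_polls} polls"
end

# Hypothetical stand-in that replays a fixed list of statuses.
class FakeDataApiClient
  Response = Struct.new(:id, :status)

  def initialize(statuses)
    @statuses = statuses.dup
  end

  def execute_statement(**_params)
    Response.new('stmt-1', 'SUBMITTED')
  end

  def describe_statement(id:)
    Response.new(id, @statuses.shift || 'FINISHED')
  end
end

result = run_statement(
  FakeDataApiClient.new(%w[SUBMITTED STARTED FINISHED]),
  sql: 'select 1',
  cluster_identifier: 'example-cluster',
  database: 'example_db',
  db_user: 'example_user',
  interval: 0
)
puts result.status
```

Because the Data API keeps track of the running statement, the caller only has to remember the statement id, which is what makes a separate job queue unnecessary in this style.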
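Queuery and the architecture of our data infrastructure
Some may ask: "If the Redshift Data API is available anyway, why not let each developer call executeStatement freely and run UNLOAD on their own? Wouldn't that make this system and its operation unnecessary?"
One reason is that, as described in the "Background" section, we simply want to restrict data retrieval to UNLOAD statements, but there is a more fundamental reason. Our data infrastructure has several policies that follow from its design philosophy and from operational concerns such as permission management and data governance. (The following is an excerpt; there are other policies of this kind as well.)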
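- To keep the inside of Redshift from descending into chaos, writes to Redshift are managed by the DWH team and every means of writing is restricted
- Bulk and streaming loads into Redshift, ETL batches inside the DWH (aggregation and so on), and data transfer to external systems each use a dedicated tool, so the workflows are kept separate
- Automate as much as possible, and wherever permissions can be delegated, delegate the strongest permissions possible to each team and let them do the work themselves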
This is why we build and operate DWH tools such as Queuery and Bricolage. As we promote data use that is as democratic as possible rather than centralized under the DWH team, a rich set of tools around the DWH is what we need to avoid disorder and chaos, and Queuery is one of those tools.
With Queuery, any developer in the company can use Redshift casually, while the DWH team can still keep track of data flows and respond to incidents easily. If restricting data retrieval from Redshift to Queuery causes some inconvenience or problem, our policy is to work out a solution as a team each time, keeping the policies above in mind, and implement it.
In short, Queuery was needed so that we could loosen user permissions and let people use Redshift freely, while fixing the means where necessary and reducing the operational load on the DWH team.
Queuery and the ecosystem for building a DWH platform
Improvements like those above continued throughout 2021 and Queuery has been under active development, yet only the client implementation had been published as OSS on GitHub. Now that ongoing development has simplified the architecture, we have decided to release the server implementation as OSS as well. With this, all the Queuery-related tools are now available under bricolages, our family of batch-system tools for Redshift.
- Queuery API server: https://github.com/bricolages/queuery
- Queuery client: https://github.com/bricolages/queuery_client
- gem for redshift_connector: https://github.com/bricolages/redshift_connector-queuery
(Note: redshift_connector is a gem that, after fetching data from Redshift, makes it easy to update RDBMS tables through ActiveRecord.)
[Added 2021-12-09] A Python client has also been released. It is available from PyPI. https://github.com/bricolages/queuery_client_python
As we have described several times on Techlife (the 2017, 2019, and 2020 editions), our Data Infrastructure Group builds the DWH and its surrounding systems around Redshift. Most of the tools needed to build a data platform on Redshift are developed in-house, and we operate the platform by combining them.
This time we were able to publish one more piece of the ecosystem that makes up our data platform: Queuery. Beyond the DWH itself, Cookpad develops and publishes many data-related tools as OSS, such as bdash-server and Dmemo. We hope for a future in which these tools are used by more people, are actively developed, are expected to work with one another, and come to form an ecosystem.