Articles for the one-month period starting 2021-09-01
■ Introduction: In https://dk521123.hatenablog.com/entry/2021/09/29/131101 I ran Apache Airflow on AWS (MWAA; Amazon Managed Workflows for Apache Airflow) and found quite a few small gaps in what I knew, so I am noting a collection of basic tips. Contents: 【…
■ Introduction: Notes on MWAA (Amazon Managed Workflows for Apache Airflow), which I have been working with. Contents: 【1】MWAA (Amazon Managed Workflows for Apache Airflow) 1) Advantages 2) Disadvantages 3) Supported versions 【2】Technical notes 1) Creating a DAG 2) DAG …
■ Introduction: I had studied Airflow ahead of time in https://dk521123.hatenablog.com/entry/2021/07/18/004531 https://dk521123.hatenablog.com/entry/2021/07/24/233012 https://dk521123.hatenablog.com/entry/2021/07/28/234319 but I have completely forgotten it. …
■ Introduction: As described in https://dk521123.hatenablog.com/entry/2021/08/13/120956 I tried to pack CSV values into an associative array and build […] from it, but it failed, so I am recording the fix. Contents: 【1】Problem 【2】Cause 【3】Proposed solutions Solution 1…
■ Introduction: I covered Glue 3.0 in https://dk521123.hatenablog.com/entry/2021/09/18/232556 and it appears to use Apache Arrow (v2.0) internally, so I looked into what Arrow is. Contents: 【1】Apache Arrow 1) Official site 2) Features 3) Supported languages 【…
■ Introduction: I covered AWS Glue 2.0 in https://dk521123.hatenablog.com/entry/2020/08/19/130118 and Glue 3.0 was apparently released on 2021/08/19, so I looked into it. For a quick overview, see the official site below. https://aws.amazon.com…
■ Introduction: To get aggregate information such as counts in Hive and Redshift, I joined completely unrelated tables, so I am noting how I did it. (I will update this whenever I find a better way.) Contents: 【1】Join methods Method 1: Joining with JOIN 【2】Samples …
■ Introduction: I did not really understand the "set -- xxxx" mentioned in https://dk521123.hatenablog.com/entry/2021/09/23/223401 so I looked into it. 【1】The set command 【2】Checking and changing shell settings 1) Stopping at the point where an error occurs 【3】Current…
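For the `set -- xxxx` entry above, a minimal sketch of what the construct does, assuming bash. The argument values are made up for illustration and are not taken from the linked article.

```shell
#!/usr/bin/env bash
# Illustrative values only: `set --` replaces the positional parameters
# ($1, $2, ...) of the current shell with the words that follow it.
set -- alpha "beta gamma" delta

count=$#      # number of positional parameters after the reset
first=$1
second=$2     # quoting keeps "beta gamma" as a single parameter

echo "count=$count first=$first second=$second"

# `set --` with nothing after it clears the positional parameters.
set --
cleared=$#
echo "cleared=$cleared"
```

The `--` marks the end of options, so words that begin with `-` are treated as parameters rather than as flags to `set`.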
■ Introduction: I touched briefly on pipelines in https://dk521123.hatenablog.com/entry/2021/09/11/000000 https://dk521123.hatenablog.com/entry/2021/09/12/000000 https://dk521123.hatenablog.com/entry/2021/09/23/223401 and this time I dig deeper…
■ Introduction: I did not really understand the "shopt -s lastpipe" mentioned in https://dk521123.hatenablog.com/entry/2021/09/23/223401 so I looked into it. Contents: 【1】The shopt command 【2】Syntax 【3】Main options 【1】The shopt command * bash shell optio…
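For the `shopt -s lastpipe` entry above, a small sketch of the behavior the option changes, assuming bash 4.2 or later run as a non-interactive script (the option only takes effect when job control is off). The variable names are hypothetical.

```shell
#!/usr/bin/env bash
# By default each stage of a pipeline runs in a subshell, so a variable
# assigned by `read` at the end of a pipeline is gone once the pipeline ends.
unset without_lastpipe
printf 'hello\n' | read -r without_lastpipe
echo "without lastpipe: '${without_lastpipe:-}'"

# With lastpipe enabled, the last command of the pipeline runs in the
# current shell, so the assignment survives.
shopt -s lastpipe
printf 'hello\n' | read -r with_lastpipe
echo "with lastpipe: '${with_lastpipe:-}'"
```

Under those assumptions the first variable should come out empty and the second should hold the piped line.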
■ Introduction: A continuation of https://dk521123.hatenablog.com/entry/2021/08/11/000000 This time I cover associative arrays. Contents: 【1】Associative arrays (dictionaries) 【2】Syntax 1) How to define one 2) Checking whether a key exists 【3】Samples Example 1: Hello world Example 2: Key…
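For the associative array entry above, a minimal sketch of defining one and checking for a key, assuming bash 4.0 or later. The keys and values are invented for illustration and are not the article's own samples.

```shell
#!/usr/bin/env bash
# `declare -A` creates an associative array (bash 4.0+).
declare -A colors=( [apple]="red" [banana]="yellow" )

# Look up a value by key.
apple_color=${colors[apple]}

# Check whether a key exists: ${array[key]+word} expands to `word`
# only when the key is set, even if its value is empty.
if [[ -n "${colors[banana]+set}" ]]; then has_banana=yes; else has_banana=no; fi
if [[ -n "${colors[cherry]+set}" ]]; then has_cherry=yes; else has_cherry=no; fi

key_count=${#colors[@]}

# Iterate over the keys; the iteration order is not guaranteed.
for key in "${!colors[@]}"; do
  echo "$key -> ${colors[$key]}"
done
```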
■ Introduction: This article covers SQL window functions (analytic functions). They work not only in PostgreSQL but also in Redshift. This time I focus on LAG / LEAD. For ROW_NUMBER / RANK, see the related article below that I wrote earlier. ROW_NUMBER / RANK + PARTITI…
■ Introduction: A continuation of https://dk521123.hatenablog.com/entry/2021/09/08/202004 Redshift is said to be PostgreSQL-compatible, but lately I keep finding small differences. This time I cover string concatenation, including how it differs from PostgreSQL. Con…
■ Introduction: A small tip. While using Amazon Redshift I noticed there are several ways to cast, so I took a look. In particular, I did not know about the cast that uses "::", so I am writing it down as a reminder to myself. Contents: 【1】Official site 【2】In Amazon Redshift…
■ Introduction: A continuation of https://dk521123.hatenablog.com/entry/2021/08/21/000000 This time I cover split_part. Contents: 【1】split_part 【2】Syntax 【3】Samples Example 1: Hello world Example 2: Parsing an S3 path Example 3: Parsing an email address 【1】split_part * …
■ Introduction: I needed to concatenate files, and the cat command made it easy, so I am noting it. Contents: 【1】The cat command 【2】Usage 1) Viewing 2) Concatenating files 3) Creating an empty file 【3】Samples Example 1: Concatenating files Example 2: In a specified folder…
■ Introduction: In https://dk521123.hatenablog.com/entry/2021/09/04/172021 under "Disadvantages" of "Interleaved sort key (Interleaved SortKey)", I wrote: ~~~~~~~~~~~ * To maintain the performance of an interleaved sort key, periodically VACUUM REINDEX…
■ Introduction: In https://dk521123.hatenablog.com/entry/2021/08/29/000000 I touched on the differences between PostgreSQL and Amazon Redshift. When I ran its "Sample table" on PostgreSQL, several statements failed. Among them, "distkey(listid)", "compound sor…
■ Introduction: While migrating Hive table data to Redshift in https://dk521123.hatenablog.com/entry/2021/09/01/200818 I ran into several problems with the COPY command, so I am summarizing them. Contents: 【1】Error during the COPY command: "Forbidden: HTTP…
■ Introduction: I needed to split a string in a shell script, so I am noting how. Contents: 【1】How to split in the shell 【2】Samples Example 1: Comma-separated Example 2: Dot-separated (IP address) Example 3: export XXX1=YYY1;export XXX2=YYY2;... 【3】Supplement: getting the length of an array…
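For the shell split entry above, one common way to split a string in bash is to set `IFS` for a single `read -a` call. This is a sketch under that assumption; the linked article may use a different technique, and the sample strings are made up.

```shell
#!/usr/bin/env bash
# Comma-separated: IFS is set only for this one `read`, so the shell's
# global IFS is left untouched.
csv_line="apple,banana,cherry"
IFS=',' read -r -a fruits <<< "$csv_line"
fruit_count=${#fruits[@]}     # length of the resulting array

# Dot-separated (an IP address).
ip_address="192.168.10.1"
IFS='.' read -r -a octets <<< "$ip_address"

echo "fruits: ${fruits[*]} (count=$fruit_count)"
echo "first octet: ${octets[0]}, last octet: ${octets[3]}"
```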
■ Introduction: I had to migrate Hive table data to Redshift, so I am noting the approach I took. Contents: 【1】How to migrate Hive table data to Redshift 1) Proposed migration steps 2) About the problems 【2】Samples 1) Saving to S3 with a Hive External Table…