Articles for the month starting 2021-05-01
■ Introduction After running the crawler described in https://dk521123.hatenablog.com/entry/2019/12/01/003455 I opened [Databases]-[Tables] in AWS Glue to check the crawl results and found some of the displayed items unclear. So this time, focusing mainly on that page, …
■ Introduction https://dk521123.hatenablog.com/entry/2020/05/20/195621 covered how to define a PySpark UDF (User Defined Function). The free Udacity course "Spark" presents a different approach. https://www.udacity.com/cour…
■ Introduction I fixed a query to address the warning described in https://dk521123.hatenablog.com/entry/2021/02/11/233633 ~~~~~~~ unix_timestamp(void) is deprecated. Use current_timestamp instead. ~~~~~~~ When comparing the results before and after the fix, the SQL statement "MINUS…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/01/04/150942. This time I cover aggregating table data. Contents 【1】 agg (aggregation) 【2】 min/max (minimum/maximum) 【3】 count (count) 【4】 countDistinct (distinct count) In addition, sum (…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/01/04/150942 https://dk521123.hatenablog.com/entry/2020/05/18/154829 https://dk521123.hatenablog.com/entry/2020/08/28/183706. This time, in aggregating table data, table…
■ Introduction When I wrote files with PySpark, empty 0-byte files were output, so I looked into how to deal with that. While I am at it, I also note how to merge the output into a single file. Contents 【0】 Solution 【1】 Output files end up empty 1) Cause of the output…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/05/20/195621. Here I list the pitfalls and caveats I ran into with PySpark UDFs (User Defined Functions). Contents 【1】 Memory consumption 【2】 Caveats on the decorator-based implementation 【3】 …
■ Introduction Split out from https://dk521123.hatenablog.com/entry/2021/04/06/001709 with additions. This covers converting between RDD and DataFrame (RDD <=> DataFrame). Contents 【1】 RDD => DataFrame 1) createDataFrame() 2) spark.read.csv() Note: changing the delimiter, e.g. for TSV, …
■ Introduction Using RDD.saveAsTextFile() on AWS Glue raised the error "DirectOutputCommitter not found", so here are my troubleshooting notes. Contents 【1】 Error details 【2】 Code that triggered it (excerpt) 【3】 Possible fixes Option 1: DirectFileOutputCom…
■ Introduction While investigating https://dk521123.hatenablog.com/entry/2021/05/15/130604 I found that the "reference site" below stated "once Glue's Spark version becomes 2.3.0". Then, on the official AWS Glue site below https://docs.aws.amazon.com/ja_jp/glue…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2021/05/14/095125. This considers updating partitions from a Glue Job. Contents ■ Options for implementing partition updates from a Job ■ Option 1: Implement it using the GlueContext class Method 1: write_dynamic_fram…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2019/12/01/003455. This time I note the knowledge and know-how I gained when turning files created by Glue into an external table. I meant to keep it short, but it turned into quite a long post... Contents 【1】 Glue…
■ Introduction In PySpark I needed to write files with partitions, so I am collecting partition-related tips, including that case. cf. Partition = dividing wall, division, distribution Contents 【1】 Basic partition operations 1) Current parti…
■ Introduction When PySpark writes files, you can specify the output path, but the file names are chosen automatically. I looked into how to change those file names. Contents 【1】 How to rename in PySpark 【2】 Sample 【3】 Note 1: About files with the CRC extension…
■ Introduction There are a few caveats when running Spark SQL from Glue against DataCatalog tables, so I note them here. Contents 【1】 Usage notes 1) Enable the Glue DataCatalog in the Glue Job 2) Instead of "select * from [DB].[Table] ...", "use [DB]…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2019/10/25/232155 https://dk521123.hatenablog.com/entry/2020/10/12/152659 https://dk521123.hatenablog.com/entry/2021/02/16/145848. Once again, about troubles that occurred in an AWS Glue job…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/04/12/145237 https://dk521123.hatenablog.com/entry/2020/04/03/000000 https://dk521123.hatenablog.com/entry/2020/04/05/000000. Some points of Go grammar that caught my attention, as follows. ~~~~…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/04/12/145237 https://dk521123.hatenablog.com/entry/2020/04/03/000000 https://dk521123.hatenablog.com/entry/2020/04/05/000000 https://dk521123.hatenablog.com/entry/2021/05/01/000000 …
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/04/12/145237 https://dk521123.hatenablog.com/entry/2020/04/03/000000 https://dk521123.hatenablog.com/entry/2020/04/05/000000. This time I cover "functions". Note that the execution environment is the following…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/04/12/145237 https://dk521123.hatenablog.com/entry/2020/04/03/000000 https://dk521123.hatenablog.com/entry/2020/04/05/000000. Other points of grammar that caught my attention, as follows. ~~~~~~~~~~~~~~…
■ Introduction Continuation of https://dk521123.hatenablog.com/entry/2020/04/12/145237 https://dk521123.hatenablog.com/entry/2020/04/03/000000 https://dk521123.hatenablog.com/entry/2020/04/05/000000. Other points of grammar that caught my attention, as follows. ~~~~~~~~~~~~~~…