I'm Fukazawa (@qluto).
I currently lead development as a manager on the Bakuraku AI-OCR team.
As we evolve and scale the product, our data platform keeps improving, and I feel the benefits every day while developing. Recently, a migration from BigQuery to Snowflake has been under way.
From Google BigQuery to Snowflake: a case study of Bakuraku's data platform migration - Findy Tools
In this article, I introduce what I learned about Snowflake's semi-structured data during the migration work.
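I didn't know about semi-structured data
When I first started using Snowflake, I unknowingly mass-produced slow queries built on functions such as json_extract_path_text. I later realized that understanding and using Snowflake's semi-structured data makes queries dramatically more efficient.
Against a field holding JSON data that was already stored as a semi-structured data type, I was writing queries like the following.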
with extracted as (
    select json_extract_path_text(json_data, 'payload.a_nested_value_list')
    from my_table
)
select
    -- processing and aggregation of the extracted data follows
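This does work, but reaching for json_extract_path_text without knowing what it does is risky. Depending on the data volume, the query takes a long time to run, and it is not kind to your wallet either.
I'll explain why this is a poor choice while walking through semi-structured data.
BigQuery has a similar function called json_extract_scalar, and I ended up using json_extract_path_text in Snowflake because I found it while thinking along the same lines. BigQuery has a native JSON type, and using it well lets you process JSON data with less query code and good computational efficiency. Snowflake, however, has a mechanism that is even more flexible and can raise computational efficiency further.
That mechanism is the semi-structured data types.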
Semi-structured data in Snowflake
In addition to ordinary structured data, Snowflake provides dedicated types for handling semi-structured data (VARIANT, OBJECT, ARRAY). With these types, you can work directly with data that has no strict schema (or only a loose one), such as JSON, XML, Avro, and Parquet.
Introduction to loading semi-structured data | Snowflake Documentation
In Snowflake, you can load the file formats above as-is, without explicitly specifying their structure at all. The loaded data becomes semi-structured data represented mainly by three data types: VARIANT, OBJECT, and ARRAY.
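As a minimal sketch of what loading into a semi-structured type can look like, the statements below create a table with a single VARIANT column and insert one JSON document into it. The table name my_table, the column name data, and the sample document are assumptions chosen to match the query shown later in this article, not the actual Bakuraku schema.

```sql
-- Hypothetical table: one VARIANT column that holds each JSON document as-is.
create or replace table my_table (data variant);

-- PARSE_JSON turns a JSON string into a VARIANT value once, at insert time.
-- INSERT ... SELECT is used here rather than a VALUES clause.
insert into my_table (data)
select parse_json('{"object": {"key1": "a", "key2": "b"}}');
```

For files in a stage, the same idea applies with COPY INTO and a JSON file format. The point is that parsing happens once at load time instead of on every query.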
Benefits of semi-structured data
Speed from columnar storage x micro-partitions
Internally, Snowflake manages data in columnar form, in units of micro-partitions of up to about 16 MB each (compressed size).
Micro-partitions & data clustering | Snowflake Documentation
This holds for structured data, of course, but it also applies to semi-structured data: at load time, Snowflake parses the JSON or XML and stores the keys and paths of its tree structure, as metadata, in columnar form inside micro-partitions. At query time, Snowflake reads only the paths it needs and skips unneeded partitions, which is what makes the speedup possible.
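With a query like the following, you can extract values by specifying the nested JSON structure directly. Because Snowflake internally manages this metadata as columns from the moment the data is loaded, there is no need to parse a string as JSON on every access.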
select
    data:object.key1::varchar as col1,
    data:object.key2::varchar as col2
from my_table;
This lets you access the data efficiently while avoiding both parsing and full scans.
Is json_extract_path_text a bad choice?
The json_extract_path_text function that Snowflake provides is in fact equivalent to the following.
TO_VARCHAR( GET_PATH( PARSE_JSON(<JSON string>), 'PATH' ) )
In other words, every time the query runs, it parses a string as JSON and then walks the path to pull out the value. That completely wastes the advantage Snowflake offers of storing semi-structured data in a columnar store.
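The equivalence can be checked with a small self-contained query. This is only a sketch: the JSON literal and the path payload.name are made up for illustration, and both expressions parse the same string at query time.

```sql
select
    -- Helper form: takes a JSON string and a path.
    json_extract_path_text(
        '{"payload": {"name": "invoice"}}',
        'payload.name'
    ) as via_helper,
    -- Expanded form: parse the string, follow the path, convert to text.
    to_varchar(
        get_path(
            parse_json('{"payload": {"name": "invoice"}}'),
            'payload.name'
        )
    ) as via_expanded;
```

Both columns should hold the same text, which makes it visible that the helper parses the string on each call.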
As a rule, it is far more efficient to design for loading into a VARIANT column from the start and make full use of Snowflake's columnar engine.
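As a rough sketch, the slow query from earlier in this article could be rewritten with path notation as follows. It assumes that json_data is a VARIANT column and that a_nested_value_list is a JSON array. The aggregation at the end is a placeholder, since the original processing is not shown.

```sql
-- Assumption: my_table.json_data is VARIANT and
-- payload.a_nested_value_list is a JSON array.
with extracted as (
    select
        f.value as item  -- one row per array element
    from my_table t,
        lateral flatten(input => t.json_data:payload.a_nested_value_list) f
)
select
    item::varchar as item_text,
    count(*) as occurrences  -- placeholder aggregation
from extracted
group by item_text;
```

The path json_data:payload.a_nested_value_list is resolved against the stored VARIANT value, so no string is re-parsed. FLATTEN then expands the array into rows for further processing.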
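One caveat when using the VARIANT type: whether a field is actually extracted into the columnar store depends on a few conditions.
Fields that contain JSON null values, fields that mix types such as strings and numbers, and fields beyond the limit of about 200 extracted elements are not extracted into columnar form. In most common cases, though, extraction does apply.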
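One way to spot fields that may miss out on columnar extraction is to look at the types that actually occur under a path. The sketch below uses TYPEOF for this. The table and path are the illustrative ones used in this article.

```sql
-- Count the value types seen under one path.
-- More than one non-null type, or any NULL_VALUE rows (JSON null),
-- suggests the field may not be extracted into columnar form.
select
    typeof(data:object.key1) as value_type,
    count(*) as row_count
from my_table
group by value_type
order by row_count desc;
```

Rows where the key is missing show up as SQL NULL, which is a different case from JSON null.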
Summary
- Semi-structured data stored in Snowflake's VARIANT type is stored in an optimized, columnar form.
- Functions such as json_extract_path_text parse JSON from a string on every call, so they are inefficient in both performance and cost.
- Prefer a design that loads input files into a VARIANT column from the start wherever possible and takes advantage of Snowflake strengths such as partition skipping.
When you handle JSON, XML, and similar data in Snowflake, do actively consider semi-structured data types such as VARIANT, OBJECT, and ARRAY. From there, monitor query performance and how the data is split across micro-partitions, and tune table clustering and partitioning as needed. That should let you draw out the full processing power of Snowflake.
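As a hedged sketch of that monitoring step, Snowflake's SYSTEM$CLUSTERING_INFORMATION function reports how well a table is clustered on a given expression. The table name and the path data:object.key1 are the illustrative ones from this article, and whether a clustering key pays off depends on your data and query patterns.

```sql
-- Inspect clustering quality for a candidate key (an expression on a VARIANT path).
select system$clustering_information(
    'my_table',
    '(data:object.key1::varchar)'
);

-- If the results justify it, define the clustering key on the table.
alter table my_table cluster by (data:object.key1::varchar);
```

Clustering incurs maintenance cost, so it is usually worth checking query profiles first and applying it only to large tables that are filtered on the same path repeatedly.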
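Closing
At LayerX, we still need many more teammates to make our customers' experience "bakuraku" by making the most of data and machine learning technology. We are actively hiring people to work with us!
If this caught your interest even a little, we look forward to your application or a casual interview!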