ãã¡ãã¯ã¨ã ã¹ãªã¼ Advent Calendar 2024ã®20æ¥ç®ã®è¨äºã§ãã
ãã¸ã¹ããã¼ã ã®ç°å£ã§ãã ãã¸ã¹ã診çï¼ä»¥éãã¸ã¹ãï¼ã¯QRã³ã¼ãã«ãããã§ãã¯ã¤ã³ãèªåå¾æãããªã³ã©ã¤ã³è¨ºçãªã©æ°ããå»çä½é¨ãæä¾ãããµã¼ãã¹ã§ãã
ããããããã¨ã«ãµã¼ãã¹ã¯æé·ãç¶ãã¦ãããããã«ä¼´ã£ã¦KPIã®æ·±æããæ½çã®å¹ææ¤è¨¼ã®ããã®ãã¼ã¿åæãããç©æ¥µçã«è¡ãããããã«ãªã£ã¦ãã¾ããã
ãã¸ã¹ããã¼ã ã§ã¯ãã¼ã¿åæåºç¤ã«BigQueryãå©ç¨ãã¦ãããAmazon Auroraãå種ãã°ãã¼ã¿ãBigQueryã«é£æºããæ§ã ãªåæããã¦ãã¾ãã å¤ç¨®å¤æ§ãªãã¼ã¿ãé£æºãã¦ããé½åä¸ããã¼ã¿ã®ä¸ã«ã¯JSONå½¢å¼ã®ã«ã©ã ã®ãã®ãããããã®å 容ãæ¤ç´¢ãéè¨ãããã±ã¼ã¹ãåºã¦ãã¾ãã åæããä¸ã§JSONå½¢å¼ã®ãã¼ã¿ããã¼ãã«ã«å¤æããããªã£ããããJSONã®ã«ã©ã ããã¼ãã«ã«å¤æããæ¹æ³ãèãã¦ã¿ã¾ããã
- ã¯ããã«
- ãµã³ãã«ãã¼ã¿
- JSONé¢æ°ãç¨ããåæ
- JSONããã¼ãã«ã«å¤æ
- ãããã«
- We are hiring!!
ã¯ããã«
ãã¸ã¹ãã§BigQueryã«é£æºãã¦ãããã¼ã¿ã®ä¸ã«ã¯ãæ§é çãªãã¼ã¿ãé åã表ç¾ããããã®ãJSONå½¢å¼ã§ä¿åãã¦ãããã®ãããã¾ãã ããã¯ãã¼ã¿ä¿åã®è¦³ç¹ã§ã¯æè»ã§ãããããæ ã«SQLã§åæããéã¯å·¥å¤«ãå¿ è¦ã§ãã
ãµã³ãã«ãã¼ã¿
æ¬è¨äºã§ã¯ããã¡ãã® 1.7m users (504.3 MB)
ãã¼ã¿ããµã³ãã«ã¨ãã¦å©ç¨ãã¾ãã
Exploring the power of JSON: a real-life JSON file example collection
[ { "id": 0, "name": "Elijah", "city": "Austin", "age": 78, "friends": [ { "name": "Michelle", "hobbies": [ "Watching Sports", "Reading", "Skiing & Snowboarding" ] }, { "name": "Robert", "hobbies": [ "Traveling", "Video Games" ] } ] }, { "id": 1, "name": "Noah", "city": "Boston", "age": 97, "friends": [ { "name": "Oliver", "hobbies": [ "Watching Sports", "Skiing & Snowboarding", "Collecting" ] }, ... ] }, ... ]
ãµã³ãã«ã®ãã¼ã¿ã¯ä¸è¨ã®ãããªæ§é ã«ãªã£ã¦ãã¾ãã
BigQueryã«ã¯ãä¸è¨ãã¼ã¿ã1ã¤ã㤠user_json
ã«ã©ã ã«æ ¼ç´ãã users
ãã¼ãã«ãä½æãã¾ãã
ã«ã©ã å | ãã¼ã¿å | å¤ã®ä¾ |
---|---|---|
user_json | JSON | {"id":0,"name":"Elijah","city":"Austin","age":78,...} |
JSONé¢æ°ãç¨ããåæ
BigQueryã§ã¯JSONåããµãã¼ããã¦ãããJSONãæä½ããé¢æ°ãæ§ã ãªãã®ãæä¾ããã¦ãã¾ãã
ä¾ãã°ãå¹´é½¢ã50æ³ä»¥ä¸ã®ã¦ã¼ã¶ã¼ãåå¾ããã«ã¯æ¬¡ã®ãããªã¯ã¨ãªã«ãªãã¾ãã
SELECT JSON_VALUE(user_json, '$.id') AS id, JSON_VALUE(user_json, '$.name') AS name, JSON_VALUE(user_json, '$.age') AS age, FROM test.users -- `test` ãã¼ã¿ã»ããã«ãã¼ãã«ãä½æ WHERE CAST(JSON_VALUE(user_json, '$.age') AS int64 ) >= 50 ORDER BY CAST(id AS int64)
ãã®ã¯ã¨ãªã®çµæï¼ä¸é¨ï¼ã¯æ¬¡ã®ããã«ãªãã¾ãã
JSON_VALUEé¢æ°ãç¨ãããã¨ã§ãJSONã®å¤ããç¹å®ã®ãã£ã¼ã«ãã®å¤ãæ½åºã§ãã¾ãã ãã£ã¼ã«ãã¯JSONPathå½¢å¼ã§æå®ãã¾ãã
ãã®ããã«JSONé¢æ°ãç¨ãããã¨ã§ãJSONã®ãã£ã¼ã«ãã§åæã§ãã¾ãã
JSONããã¼ãã«ã«å¤æ
JSONé¢æ°ã«ãã£ã¦ããç¨åº¦JSONãã£ã¼ã«ãã®å¤ãå©ç¨ããåæãå¯è½ã«ãªãã¾ãããå¥ã®ãã¼ãã«ã¨joinããå ´åãªã©ãè¾ãã±ã¼ã¹ãåºã¦ãã¾ãã ããã§ãJSONãã£ã¼ã«ãããã©ãããªãã¼ãã«ã§è¡¨ç¾ã§ããªããèãã¾ãã
SELECT JSON_VALUE(user_json, '$.id') AS id, JSON_VALUE(user_json, '$.name') AS name, JSON_VALUE(user_json, '$.age') AS age, JSON_VALUE(user_json, '$. friends.name') AS friend_names, ... FROM test.users
ä¸è¨ã®ããã«ãã£ã¼ã«ãã1ã¤ãã¤æå®ãããã¨ã«ãã£ã¦ãã©ãããªå½¢å¼ã«å¤æãããã¨ã¯ã§ãã¾ããããã£ã¼ã«ãæ°ãå¤ãå ´åãæ§é ãè¤éãªå ´åã¯æå®ãæéã«ãªãã¾ããã¾ãJSONã®æ§é ãå¤ãã£ãæã«å¤æã®ã¯ã¨ãªãä¿®æ£ããå¿ è¦ãããã¾ãã
ããã§ãJSONã®å®ãã¼ã¿ããã¹ãã¼ããèªåçã«æ½åºãããã©ãããªãã¼ãã«ã«ããSQLãèãã¾ããã
-- 1. UDF: JSONãã¼ã¨å¤ãåæã«åå¾ CREATE TEMP FUNCTION extract_keys_and_values(input STRING) RETURNS ARRAY<STRUCT<key STRING, value STRING>> LANGUAGE js AS """ function flattenObj(obj, parent = '', res = []) { for (let key in obj) { const propName = parent ? parent + '.' + key : key; if (typeof obj[key] === 'object' && obj[key] !== null) { flattenObj(obj[key], propName, res); } else { res.push({ key: propName, value: String(obj[key]) }); } } return res; } return flattenObj(JSON.parse(input)); """; -- 2. ä¸æãã¼ãã«ã®ä½æ: JSONããã¼ã¨å¤ã®ãã¢ã«å±é CREATE TEMP TABLE temp_table AS SELECT row_num, kv.key, kv.value FROM ( SELECT *, ROW_NUMBER() OVER () AS row_num, FROM test.users ), UNNEST(extract_keys_and_values(TO_JSON_STRING(user_json))) AS kv; -- 3. ããããã¯ã¨ãªã®åççæ EXECUTE IMMEDIATE ( SELECT FORMAT(""" SELECT * EXCEPT(row_num) FROM temp_table PIVOT ( ANY_VALUE(value) FOR REPLACE(key, '.', '__') IN (%s) ) ORDER BY CAST(id AS INT64) """, STRING_AGG(FORMAT('"%s"', REPLACE(key, '.', '__')), ',')) FROM ( SELECT DISTINCT key FROM temp_table ) );
ãã®ãããªSQLãå®è¡ãããã¨ã«ãã£ã¦ãuser_json
ã次ã®ãããªãã¼ãã«ã«å¤æã§ãã¾ãã
ãã®SQLã«ã¤ãã¦è©³ããè¦ã¦ããã¾ãã
1. UDF: JSONãã¼ã¨å¤ãåæã«åå¾
ã¯ããã«ãJSONã®ãã¼ã¨å¤ã®ãã¢ãåºåããé¢æ°ãä½æãã¾ãã
BigQueryã§ã¯ã¦ã¼ã¶ã¼å®ç¾©ã®é¢æ°ãä½æãããã¨ãã§ããSQLã¾ãã¯JavaScriptã³ã¼ãã§å¦çãå®ç¾©ã§ãã¾ãã
ããã§ã¯ãJSONæååãå
¥åã¨ãããã¼ã¨å¤ã®ãã¢ãåºåãã¾ãããã¹ãããããã¼ã¯ parent.child
å½¢å¼ã«å¤æãã¾ããã¾ããå¤ã¯ãã¹ã¦STRINGåã§è¿å´ããã¾ãã
[ { key: "id", value: "0" }, { key: "name", value: "Elijah" }, { key: "age", value: "78" }, { key: "friends.0.name": "Michelle" }, { key: "friends.0.hobbies.0": "Watching Sports" }, ... ]
2. ä¸æãã¼ãã«ã®ä½æ: JSONããã¼ã¨å¤ã®ãã¢ã«å±é
次ã«ããã®é¢æ°ãå©ç¨ãã¦ãã¼ã¨å¤ã®ãã¢ã«å±éããä¸æãã¼ãã«ã«ä¿åãã¾ãã ä¸æãã¼ãã«ã®æ§é ã¯æ¬¡ã®ããã«ãªãã¾ãã
row_num | key | value |
---|---|---|
1 | id | 0 |
1 | name | Elijah |
1 | age | 78 |
1 | friends.0.name | Michelle |
1 | friends.0.hobbies.0 | Watching Sports |
... | ||
2 | id | 1 |
2 | name | Noah |
2 | age | 97 |
2 | friends.0.name | Oliver |
2 | friends.0.hobbies.0 | Watching Sports |
å¾ã§åã¦ã¼ã¶ã¼æ¯ã«ã°ã«ã¼ãã³ã°ã§ããããä¸æã«ãªãå¤ãæã£ã¦ããå¿
è¦ãããã¾ããä»å㯠ROW_NUMBER
é¢æ°ã§è¡çªå·ãä»ä¸ããããã«ãã¾ããã
3. ããããã¯ã¨ãªã®åççæ
æå¾ã«ã2.ã§ä½æããä¸æãã¼ãã«ã«ã¤ãã¦ããã¼æ¯ã«åãä½æãããããããã¼ãã«ãåçã«çæãã¾ãã
SELECT DISTINCT key FROM temp_table
ã§JSONã®ãã¼ã®ä¸è¦§ãåå¾ãããããããããã®åã¨ãã¦æå®ãã¾ãã
ã¾ããEXECUTE IMMEDIATEãå©ç¨ããINå¥ã®æ¡ä»¶ãåçã«çµã¿ç«ã¦ã¦ãã¾ãã
REPLACE(key, '.', '__')
ã¯ããã¼ãã«ã®ååã« .
ã使ããªãããã¢ã³ãã¼ã¹ã³ã¢ __
ã«æ¸ãæãã¦ãã¾ãã
ãã®ãããªã¯ã¨ãªãå®è¡ãããã¨ã§ãæçµçã«JSONãã¼ã¿ããã©ããåãããã¼ãã«ãåºåãããã¨ãã§ãã¾ãã
id | name | age | friends__0__name | friends__0__hobbies__0 | ... |
---|---|---|---|---|---|
0 | Elijah | 78 | Michelle | Watching Sports | ... |
1 | Noah | 97 | Oliver | Watching Sports | ... |
ãããã«
BigQueryã®JSONã«ã©ã ã«å¯¾ãã¦ãJSONé¢æ°ã§åæããæ¹æ³ãããã©ãããªãã¼ãã«ã¨ãã¦åºåããæ¹æ³ãç´¹ä»ãã¾ããã
ããã¾ã§æ¸ãã¦ããã¦ã«ãªãã¾ãããBigQueryã§ã¯å ¬å¼ã§ãã¡ã¤ã«ããJSONãã¼ã¿ãèªã¿è¾¼ãæ¹æ³ãæä¾ããã¦ããããããã¼ã¿ãBigQueryã«é£æºããåã«åå¾ã§ããã®ã§ããã°ããã®ãããªæ©è½ãå©ç¨ããæ¹ãåã§ãã ç¬èªã®ã¯ã¨ãªãã¡ã³ããã³ã¹ããå¿ è¦ããªããã¾ãä»åã®ãããªSQLã¯JSONã«ã©ã ãã¹ãã£ã³ããã®ã«ã³ã¹ãããããããã奥ã®æã¨ãã¦ä½¿ãã®ãè¯ãã§ãããã
We are hiring!!
ã¨ã ã¹ãªã¼ã§ã¯çµ¶è³ã¨ã³ã¸ãã¢ãåéä¸ã§ãï¼ ãã¸ã¹ã診ç以å¤ã«ãæ§ã ãªãããã¯ããããã¾ãã®ã§ããèå³ããæ¹ã¯æ¯éã«ã¸ã¥ã¢ã«é¢è«çãå¿åãã ããï¼