I'm Kaneyama (@PENGUINANA_) from the search department (検索編成部).
At Cookpad, we use window functions when developing data products. With window functions, it is easy to start developing and evaluating products built on behavior logs. In this post I use "related keywords" as an example to show how window functions can be put to work.
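What are related keywords?
The search results page shows related keywords to help users search.
(Related keywords on the search results page for "朝ごはん" (breakfast))
When searching for recipes, there are times when you cannot easily find the recipe you want.
The reasons vary, but here are some typical situations.
- "Typing the query is a hassle."
- "I can't quite picture what I want."
- "I can't think of a suitable search term in the first place."
In these situations, related keywords give the user information like the following.
People who searched for "朝ごはん" (breakfast) also searched for "朝ごはん 和食" (breakfast, Japanese-style), "パンケーキ" (pancakes), and "フレンチトースト" (French toast).
By seeing the answers that everyone's breakfast ideas have in common, users can probably save typing effort and pick up new ideas.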
Implementing it with window functions
Let's implement it right away. With window functions, it can be written simply.
What you need:
- An access log with the following fields
  - time (access time)
  - unique_id (an ID that uniquely identifies the accessing user)
  - keyword (search term)
  - controller, action (used to narrow the access log down to search logs; Rails controller and action are assumed)
- Presto (Redshift, PostgreSQL, Hive, and others also work)
Implementation
with query_transitions as (
    select
        time,
        unique_id,
        keyword as origin_query,
        lead(keyword) over (partition by unique_id order by time) as next_query -- this is the window function!
    from access_log
    where time >= 'yyyy-mm-dd'
      and time < 'yyyy-mm-dd'
      and controller = 'search'
      and action = 'show'
    order by unique_id, time
)
select
    origin_query,
    next_query,
    count(distinct unique_id) as cnt
from query_transitions
where origin_query != ''
  and next_query != ''
  and origin_query != next_query
group by origin_query, next_query
order by cnt desc
That's the whole implementation!
Running it produces results like the following.
At Cookpad, logs can mostly be queried from TreasureData or Redshift, so the scale of the data is rarely a problem.
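To try the query without a production log, here is a minimal sketch that runs it against a tiny in-memory table. It makes several assumptions that are not from the original post: SQLite (3.25 or later, via Python's sqlite3 module) stands in for Presto, the rows in ROWS are invented toy data, and the date range is passed as parameters instead of the 'yyyy-mm-dd' placeholders. The helper name related_keywords is hypothetical.

```python
import sqlite3

# Invented toy access log: (time, unique_id, keyword, controller, action).
ROWS = [
    ("2016-01-01 08:00:00", "id1", "朝ごはん", "search", "show"),
    ("2016-01-01 08:01:00", "id1", "朝ごはん トースト", "search", "show"),
    ("2016-01-01 08:02:00", "id1", "フレンチトースト", "search", "show"),
    ("2016-01-01 09:00:00", "id2", "朝ごはん", "search", "show"),
    ("2016-01-01 09:01:00", "id2", "朝ごはん トースト", "search", "show"),
    # Not a search request, so the controller filter drops it.
    ("2016-01-01 09:02:00", "id3", "朝ごはん", "recipe", "show"),
]

QUERY = """
with query_transitions as (
    select
        time,
        unique_id,
        keyword as origin_query,
        lead(keyword) over (partition by unique_id order by time) as next_query
    from access_log
    where time >= ? and time < ?
      and controller = 'search' and action = 'show'
)
select origin_query, next_query, count(distinct unique_id) as cnt
from query_transitions
where origin_query != '' and next_query != '' and origin_query != next_query
group by origin_query, next_query
order by cnt desc, origin_query, next_query
"""


def related_keywords(rows, start, end):
    """Count, per (origin_query, next_query) pair, the distinct users who made that transition."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table access_log "
        "(time text, unique_id text, keyword text, controller text, action text)"
    )
    conn.executemany("insert into access_log values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (start, end)).fetchall()
    conn.close()
    return result


for row in related_keywords(ROWS, "2016-01-01", "2016-01-02"):
    print(row)
```

In this toy data, two users move from "朝ごはん" to "朝ごはん トースト", so that pair ranks first.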
Organizing transitions with window functions
What does this SQL do?
The window function does its work in the definition of query_transitions, which is defined in the first with clause.
Look at the following part.
select
    time,
    unique_id,
    keyword as origin_query,
    lead(keyword) over (partition by unique_id order by time) as next_query -- receives the keyword of the next row
lead() is one of the window functions. See 5.13. Window Functions - Presto 0.130 Documentation.
With the rows grouped by unique_id and ordered by time, it can fetch the value of any field from the row that follows the one currently being looked at (the "next row").
time | unique_id | keyword | lead(keyword) |
---|---|---|---|
time | id1 | 朝ごはん (breakfast) | 朝ごはん トースト (breakfast toast) |
time | id1 | 朝ごはん トースト (breakfast toast) | フレンチトースト (French toast) |
time | id1 | フレンチトースト (French toast) | N/A |
time | id2 | デビルドエッグ (deviled eggs) | デビルドエッグ 簡単 (deviled eggs, easy) |
time | id2 | デビルドエッグ 簡単 (deviled eggs, easy) | N/A |
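It may help to think of lead() as moving time forward (leading) only for the expression inside its parentheses, relative to the current row. Once the window function has placed the "search term" and the "next search term" on the same row, all that remains is a simple aggregation in SQL. One caveat: when there is no next row, lead() returns NULL (N/A in the table above). This query ignores such rows; the next_query != '' condition removes them, because a comparison against NULL never evaluates to true.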
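The behavior of lead() described above can be checked with a small sketch. As assumptions for illustration only, SQLite (3.25 or later, via Python's sqlite3 module) stands in for Presto, integer times replace real access times, and the helper name with_next_query is hypothetical. The rows mirror the table above.

```python
import sqlite3

# Toy rows mirroring the table above: (time, unique_id, keyword).
# The integer times are placeholders for real access times.
LOG = [
    (1, "id1", "朝ごはん"),
    (2, "id1", "朝ごはん トースト"),
    (3, "id1", "フレンチトースト"),
    (1, "id2", "デビルドエッグ"),
    (2, "id2", "デビルドエッグ 簡単"),
]

LEAD_SQL = """
select
    unique_id,
    keyword,
    lead(keyword) over (partition by unique_id order by time) as next_query
from access_log
order by unique_id, time
"""


def with_next_query(log):
    """Return (unique_id, keyword, next_query); next_query is None when no next row exists."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table access_log (time integer, unique_id text, keyword text)")
    conn.executemany("insert into access_log values (?, ?, ?)", log)
    rows = conn.execute(LEAD_SQL).fetchall()
    conn.close()
    return rows


for row in with_next_query(LOG):
    print(row)
```

The last row of each unique_id partition comes back with None (SQL NULL), and lead() never reaches across into another user's rows.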
For more detailed explanations, the following are useful references.
Sessionization in SQL, Hive, Pig and Python - Dataiku
- Explains window functions and sessionization in PostgreSQL and Hive.
10年戦えるデータ分析入門 - by 青木峰郎 (Minero Aoki)
- Chapters 8 and 10 are the relevant ones.
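Improving the data product
As a simple starting point, this is quite good. *1 In practice, though, there is room for improvement. For example:
It suggests unrelated keywords (e.g. "夕食" (dinner) → "カレー" (curry))
[Hypothesis 1] It picks up logs where a user searched for "カレー" the day after searching for "夕食".
- Remedy 1: Sessionize the log with a 30-minute timeout. The two searches can then be treated as separate search sessions.
[Hypothesis 2] It picks up sessions from users who are wandering back and forth.
- Remedy 2: Run the query on useful sessions only. In some cases this "session selection" can also be done in SQL alone using window functions.
Both remedies can be implemented with just SQL and window functions. There is no need to insist on SQL, but once the data is gathered in one place and more of the trial and error can happen in SQL, prototyping becomes easier.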
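As a rough illustration of Remedy 1, the sketch below splits each user's log into sessions whenever the gap since the previous search exceeds 30 minutes, then applies lead() within each session. This is one common way to sessionize, not necessarily what Cookpad runs. The assumptions are mine: SQLite (3.25 or later, via Python's sqlite3 module) stands in for Presto, time is stored as epoch seconds, and the names gap, session_no, and sessionized_transitions are hypothetical.

```python
import sqlite3

TIMEOUT_SECONDS = 30 * 60  # the 30-minute timeout from Remedy 1

# Toy log: (time in epoch seconds, unique_id, keyword). The same user searches
# "夕食" on one day and "カレー" roughly a day later.
LOG = [
    (0, "id1", "夕食"),
    (600, "id1", "夕食 簡単"),
    (86400, "id1", "カレー"),
    (86500, "id1", "カレー 簡単"),
]

SESSION_SQL = """
with gaps as (
    select
        time, unique_id, keyword,
        -- seconds since this user's previous search (NULL for the first one)
        time - lag(time) over (partition by unique_id order by time) as gap
    from access_log
),
sessions as (
    select
        time, unique_id, keyword,
        -- running count of session starts gives a per-user session number
        sum(case when gap is null or gap > ? then 1 else 0 end)
            over (partition by unique_id order by time) as session_no
    from gaps
)
select
    unique_id,
    session_no,
    keyword as origin_query,
    lead(keyword) over (partition by unique_id, session_no order by time) as next_query
from sessions
order by unique_id, time
"""


def sessionized_transitions(log, timeout=TIMEOUT_SECONDS):
    """Return (unique_id, session_no, origin_query, next_query) with lead() confined to a session."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table access_log (time integer, unique_id text, keyword text)")
    conn.executemany("insert into access_log values (?, ?, ?)", log)
    rows = conn.execute(SESSION_SQL, (timeout,)).fetchall()
    conn.close()
    return rows


for row in sessionized_transitions(LOG):
    print(row)
```

Because lead() is now partitioned by session as well as by user, the "夕食 簡単" row no longer picks up the next day's "カレー" as its next query.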
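This post does not go into detail, but sessionization lets you split a user's behavior log into "chunks that each share a single purpose". Interpreting where each session starts and ends then makes it possible to select sessions. The technique is widely used for related keywords and there are many papers on it, so look them up if you are interested. Sessionization is effective when you want to raise the quality to a level you can release to users.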
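The post does not define what makes a session useful, so the following is only a sketch of the mechanics of selecting sessions with window functions. The rule used here (drop sessions with more than MAX_QUERIES searches, as a crude stand-in for "wandering" users) is purely hypothetical, as are the names session_log, pos, session_length, is_start, and is_end. SQLite (3.25 or later, via Python's sqlite3 module) again stands in for Presto, and session_no is assumed to have been assigned already.

```python
import sqlite3

MAX_QUERIES = 3  # hypothetical threshold, not taken from the post

# Toy sessionized log: (time, unique_id, session_no, keyword).
SESSION_LOG = [
    (1, "id1", 1, "夕食"),
    (2, "id1", 1, "夕食 簡単"),
    (1, "id2", 1, "夕食"),
    (2, "id2", 1, "カレー"),
    (3, "id2", 1, "パスタ"),
    (4, "id2", 1, "サラダ"),
]

SELECT_SQL = """
with annotated as (
    select
        time, unique_id, session_no, keyword,
        -- position of the search within its session
        row_number() over (partition by unique_id, session_no order by time) as pos,
        -- total number of searches in the session
        count(*) over (partition by unique_id, session_no) as session_length
    from session_log
)
select
    unique_id,
    session_no,
    keyword,
    pos = 1 as is_start,
    pos = session_length as is_end
from annotated
where session_length <= ?
order by unique_id, session_no, time
"""


def select_sessions(log, max_queries=MAX_QUERIES):
    """Keep sessions with at most max_queries searches and flag each session's first and last search."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table session_log "
        "(time integer, unique_id text, session_no integer, keyword text)"
    )
    conn.executemany("insert into session_log values (?, ?, ?, ?)", log)
    rows = conn.execute(SELECT_SQL, (max_queries,)).fetchall()
    conn.close()
    return rows


for row in select_sessions(SESSION_LOG):
    print(row)
```

The is_start and is_end flags (1 or 0 in SQLite) mark the boundaries of each surviving session, which is the kind of information a real selection rule could build on.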
Summary
In this post we implemented related keywords using window functions. Many features, such as "related keywords", "related products", and "related users", can be built by organizing user transitions.
If you thought "window functions and data products sound interesting", or "this is common knowledge, there are better ways to do it!", please get in touch with us.
Cookpad has plenty of problems and data waiting for you!
*1: Talk Summary: Building Great Data Products · Coding VC (Japanese translation)
Whatever you work on, I think you will end up experiencing the lessons described in this article. I want to reread it regularly.