I'm Serizawa, a server-side engineer.
Toreta uses BigQuery as its data store for search and lookup work. We don't do anything exotic with it, but this post walks through how we use BigQuery at Toreta.
System architecture
The system architecture around BigQuery can be summarized in a single diagram. Nothing in it is unusual, but that also makes it a stable setup.
Data imported into BigQuery
Broadly, we import two kinds of data into BigQuery.
1. Data from the RDB that the API reads
We import data from a slave of the RDB (Amazon Aurora) that the API reads, and use it for data analysis and for investigative data lookups.
2. Various logs
We store the following logs in BigQuery via fluentd.
- nginx access logs
- custom logs that Rails writes once per request
- sidekiq logs (the logs written by Sidekiq.logger)
- shoryuken logs (the logs written by Shoryuken.logger)
- SendGrid email delivery logs
For the SendGrid logs, our colleague Sano has previously written an explanatory article, so please refer to that as well.
Use as a data visualization tool
We use Redash to visualize data for internal use. With dashboards built per use case, the sales, customer support, and marketing teams can look up the numbers they track day to day without relying on engineers.
We also use it on the engineering side, for example to graph the logs stored in BigQuery.
Introducing Redash apparently came with a number of pitfalls, such as account permission management. I expect the server-side engineer who led the Redash rollout to cover that story on this blog.
Use as a data aggregation platform
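Recently there has been growing momentum at Toreta to make use of the data accumulated in our RDB, and we have started aggregating and analyzing it. (For now, the focus is more on aggregation and visualization.)
When we want to do something with data, during the trial-and-error stage we preprocess and aggregate a small sample with R or Python + Jupyter Notebook. When the same processing is applied to the full production data set, however, we increasingly hit cases where it takes far too long, sometimes a full day.
To address this, we push complex aggregations and heavy processing down into SQL as much as possible and let BigQuery do the work.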
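For example, an aggregation that has to loop over every record when written in ordinary code, and takes tens of seconds as a result, can be rewritten in SQL with window functions and the like. Run as a query on BigQuery, it returns in a few seconds as if nothing had happened. There is an upper limit on how much data a single query can process, so not everything can be solved in one query, but in most cases it runs without problems.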
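To make the contrast concrete, here is a minimal sketch that computes a running total per restaurant twice: once with an explicit loop and once with a SQL window function. It uses Python's built-in sqlite3 module as a local stand-in for BigQuery (this assumes SQLite 3.25 or later, which added window functions). The reservations table and its columns are hypothetical and are not taken from Toreta's schema.

```python
import sqlite3

# Hypothetical sample data: (restaurant_id, day, reservations).
ROWS = [
    (1, "2017-01-01", 3),
    (1, "2017-01-02", 5),
    (2, "2017-01-01", 2),
    (1, "2017-01-03", 1),
    (2, "2017-01-02", 4),
]


def running_totals_loop(rows):
    """Running total per restaurant, computed by looping over every row."""
    totals = {}
    result = []
    for restaurant_id, day, count in sorted(rows):
        totals[restaurant_id] = totals.get(restaurant_id, 0) + count
        result.append((restaurant_id, day, totals[restaurant_id]))
    return result


def running_totals_sql(rows):
    """The same aggregation expressed as one query with a window function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reservations (restaurant_id INTEGER, day TEXT, cnt INTEGER)"
    )
    conn.executemany("INSERT INTO reservations VALUES (?, ?, ?)", rows)
    query = """
        SELECT restaurant_id,
               day,
               SUM(cnt) OVER (PARTITION BY restaurant_id ORDER BY day) AS running_total
        FROM reservations
        ORDER BY restaurant_id, day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result
```

The SUM(...) OVER (PARTITION BY ... ORDER BY ...) form is standard SQL, so the query part should carry over to BigQuery standard SQL with little change, while the database engine takes over the per-row bookkeeping that the loop does by hand.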
As an aside, pandas, the Python data analysis library, provides a function called read_gbq that sends a query to BigQuery and returns the result as a DataFrame in a single call, which is very convenient.
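A minimal sketch of that workflow follows, assuming the pandas-gbq package is installed and Google Cloud credentials are already configured. It calls pandas_gbq.read_gbq, the function that pandas.read_gbq delegates to. The project ID, the table name, and the build_query helper are placeholders for illustration, not Toreta's actual values.

```python
def build_query(table, limit=10):
    """Build a standard SQL query that counts log rows per worker."""
    return (
        "SELECT worker, COUNT(*) AS jobs "
        f"FROM `{table}` "
        "GROUP BY worker "
        "ORDER BY jobs DESC "
        f"LIMIT {int(limit)}"
    )


def fetch_dataframe(query, project_id):
    """Run the query on BigQuery and return the result as a pandas DataFrame."""
    # Imported lazily so this module still loads where pandas-gbq is not installed.
    import pandas_gbq

    return pandas_gbq.read_gbq(query, project_id=project_id, dialect="standard")


# Usage (placeholder names, requires credentials and network access):
# df = fetch_dataframe(build_query("my_dataset.sidekiq_logs"), "my-project")
# print(df.head())
```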
As we moved more of the work into SQL, some jobs ended up needing nothing more than running a query with the bq command-line tool and writing the result to another table. We haven't used BigQuery UDFs yet, but making use of them should push this script-less approach even further.
Toreta is also looking for data scientists who are strong in this kind of data engineering and in business analytics. We'd love to hear from you.
Use as a log search platform
The body of the sidekiq and shoryuken logs includes the worker name, the job ID, the status, the ID of the executing thread, and so on. To use these values as keys, the logs have to be parsed and structured. Personal information can also end up in the output, so masking is essential.
To solve these problems, we replaced the log Formatter with our own. Currently we output logs in the following JSON format and import them into BigQuery.
{
  "datetime": 1472553986,
  "hostname": "sidekiq-server",
  "pid": "17863",
  "tid": "owww1vp08",
  "severity": "INFO",
  "worker": "TestWorker",
  "jid": "JID-08102742edd82acc5b698f7e",
  "job_action": "done",
  "processing_time": 10.000,
  "formatter_err": null,
  "message": "{\"class_name\":\"String\",\"body\":\"done\"}",
  "log_version": 1
}
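The formatter itself is not shown in this post, and since Sidekiq is a Ruby library the real one would be written in Ruby. The following Python sketch only illustrates the two ideas described above: masking personal data and emitting one structured JSON record whose message field is itself a JSON string. The masking rule (e-mail addresses only) and the format_log and mask helpers are assumptions made for this example.

```python
import json
import re
import time

# Assumption: e-mail addresses are the personal data to mask; real rules would be broader.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask(text):
    """Replace anything that looks like an e-mail address with a fixed token."""
    return EMAIL_PATTERN.sub("[MASKED]", text)


def format_log(severity, worker, jid, body, now=None):
    """Build one JSON log line whose message field is itself a JSON string."""
    message = {"class_name": type(body).__name__, "body": mask(str(body))}
    record = {
        "datetime": int(now if now is not None else time.time()),
        "severity": severity,
        "worker": worker,
        "jid": jid,
        "message": json.dumps(message),  # nested JSON, stored as a string
        "log_version": 1,
    }
    return json.dumps(record)
```

Masking before serialization means the unmasked value never reaches the log pipeline, so nothing downstream has to be trusted to scrub it.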
The data passed as arguments to Sidekiq.logger methods such as info and error is stored as JSON under the message key. By creating a temporary VIEW that expands it into columns with BigQuery's JSON functions and querying that VIEW, we can filter on the message body.
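Because message holds a JSON document serialized as a string inside the outer JSON record, reading a field from it takes two decoding steps. The sketch below shows this in Python on a shortened copy of the sample log record; extract_message is a hypothetical helper that plays the role the JSON functions play in BigQuery, and is not part of any library.

```python
import json

# Shortened copy of the sample log record; message is JSON encoded as a string.
LOG_LINE = (
    '{"datetime":1472553986,"worker":"TestWorker","severity":"INFO",'
    '"message":"{\\"class_name\\":\\"String\\",\\"body\\":\\"done\\"}"}'
)


def extract_message(log_line, key):
    """Return one field of the nested message JSON, or None if it is absent."""
    record = json.loads(log_line)            # first pass: the outer log record
    message = json.loads(record["message"])  # second pass: the embedded JSON string
    return message.get(key)
```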
For example, to search the last 10 days of logs for entries whose message contains the string error, SQL like the following does the job.
#standardSQL
-- Expand the JSON stored in message into separate columns.
WITH MESSAGE_EXTRACTED AS (
  SELECT
    TIMESTAMP_SECONDS(datetime) AS UTC_DATETIME,
    worker,
    JSON_EXTRACT_SCALAR(message, '$.class_name') AS message_class_name,
    JSON_EXTRACT_SCALAR(message, '$.body') AS message_body
  FROM
    `sidekiq_logs*`
  WHERE
    -- Limit the scan to tables whose date suffix falls in the last 10 days (Asia/Tokyo).
    _TABLE_SUFFIX BETWEEN
      FORMAT_DATE("%Y%m%d", DATE_SUB(DATE(CURRENT_TIMESTAMP(), 'Asia/Tokyo'), INTERVAL 10 day))
      AND FORMAT_DATE("%Y%m%d", DATE(CURRENT_TIMESTAMP(), 'Asia/Tokyo'))
)
SELECT *
FROM MESSAGE_EXTRACTED
WHERE message_body LIKE '%error%'
ORDER BY UTC_DATETIME DESC
LIMIT 1000
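To see which tables the _TABLE_SUFFIX condition selects, the same date arithmetic can be reproduced outside BigQuery. This is a small sketch of that calculation in Python; table_suffix_range is a hypothetical helper, and it models Asia/Tokyo as a fixed UTC+9 offset, which holds because Japan does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time is a fixed UTC+9 offset with no daylight saving time.
JST = timezone(timedelta(hours=9))


def table_suffix_range(now_utc, days=10):
    """Return the (oldest, newest) YYYYMMDD suffixes that the query covers."""
    today = now_utc.astimezone(JST).date()
    oldest = today - timedelta(days=days)
    return oldest.strftime("%Y%m%d"), today.strftime("%Y%m%d")
```

Because BETWEEN is inclusive at both ends, the range spans 11 date suffixes: today in Tokyo and the 10 days before it.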
Summary
This was a brief overview of how we use BigQuery at Toreta.
As the use of data becomes more important to business by the day, Toreta also wants to put the knowledge gained from its data to work and accelerate the business.