The official documentation describes how to analyze ALB logs with Athena. However, if you create the database by following that procedure, even a query with a date condition scans a large amount of data, so it is both slow and expensive.
What I recommend instead is setting up partitions when you create the database.
...That is the article I was planning to write, but the official documentation now uses partitions as well, so this article lost its purpose.
I considered deleting it, but since the way I cut the partitions differs from the official approach, I am posting it anyway to lay it to rest.
Creating the Athena table
In Athena, select or create a suitable database, then run the query below to create the table.
When you create it for real, adjust the following parts of the SQL.
- LOCATION: 's3://<name of the S3 bucket where the ALB logs are stored>/AWSLogs/<AWS account ID>/elasticloadbalancing/<region>/'
- TBLPROPERTIES
  - projection.date_time.range: starts from the date you began collecting logs (the sample query sets it to one year back)
  - storage.location.template: the same bucket path as LOCATION, followed by ${date_time}
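Since LOCATION and storage.location.template must agree apart from the trailing placeholder, it can help to generate both from the same inputs. The following is a minimal sketch, not part of the original procedure; the function name `alb_log_paths` is hypothetical, and the sample values are the placeholders used in the sample query.

```python
def alb_log_paths(bucket: str, account_id: str, region: str) -> tuple[str, str]:
    """Return (LOCATION, storage.location.template) for an ALB log table."""
    location = f"s3://{bucket}/AWSLogs/{account_id}/elasticloadbalancing/{region}/"
    # The template is LOCATION plus the literal placeholder ${date_time},
    # which is replaced by each partition value (for example 2021/06/15).
    template = location + "${date_time}"
    return location, template


location, template = alb_log_paths("my_log_bucket", "1111222233334444", "ap-northeast-1")
print("LOCATION:", location)
print("storage.location.template:", template)
```

Building both strings from one prefix avoids the easy mistake of editing the bucket or region in only one of the two places.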
Table creation query
CREATE EXTERNAL TABLE IF NOT EXISTS alb_logs (
    type string,
    time string,
    elb string,
    client_ip string,
    client_port int,
    target_ip string,
    target_port int,
    request_processing_time double,
    target_processing_time double,
    response_processing_time double,
    elb_status_code string,
    target_status_code string,
    received_bytes bigint,
    sent_bytes bigint,
    request_verb string,
    request_url string,
    request_proto string,
    user_agent string,
    ssl_cipher string,
    ssl_protocol string,
    target_group_arn string,
    trace_id string,
    domain_name string,
    chosen_cert_arn string,
    matched_rule_priority string,
    request_creation_time string,
    actions_executed string,
    redirect_url string,
    lambda_error_reason string,
    target_port_list string,
    target_status_code_list string,
    classification string,
    classification_reason string
)
PARTITIONED BY (`date_time` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
    'serialization.format' = '1',
    'input.regex' = '([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*):([0-9]*) ([^ ]*)[:-]([0-9]*) ([-.0-9]*) ([-.0-9]*) ([-.0-9]*) (|[-0-9]*) (-|[-0-9]*) ([-0-9]*) ([-0-9]*) \"([^ ]*) ([^ ]*) (- |[^ ]*)\" \"([^\"]*)\" ([A-Z0-9-]+) ([A-Za-z0-9.-]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^\"]*)\" ([-.0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\" \"([^ ]*)\" \"([^\s]+?)\" \"([^\s]+)\" \"([^ ]*)\" \"([^ ]*)\"'
)
LOCATION 's3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/ap-northeast-1/'
TBLPROPERTIES (
    'classification'='csv',
    'projection.date_time.format'='yyyy/MM/dd',
    'projection.date_time.interval'='1',
    'projection.date_time.interval.unit'='DAYS',
    'projection.date_time.range'='2020/06/16,NOW',
    'projection.date_time.type'='date',
    'projection.enabled'='true',
    'projection.pvid.type'='injected',
    'storage.location.template'='s3://my_log_bucket/AWSLogs/1111222233334444/elasticloadbalancing/ap-northeast-1/${date_time}'
);
Benefits of this approach
Using this table has two main benefits.
- The range of data scanned in the S3 bucket is limited
- The table does not need periodic updates
Explanation
The table created by this query has an additional column, date_time (string) (partitioned).

When you include this column in a search query, Athena scans only the S3 bucket folders that fall within the range of the condition on that column.
For example, to investigate the logs for 2021/06/15, you can write a query like the following (it checks which IP addresses made the most requests on 6/15).
SELECT COUNT(request_verb) AS count, request_verb, client_ip
FROM alb_logs
WHERE date_time = '2021/06/15'
GROUP BY request_verb, client_ip;
Note that the dates ALB uses are in UTC, so if you want to look at a day in JST you need to specify a range that spans two days. For the range, either an IN clause or BETWEEN works, so use whichever you prefer.
If you need to specify the time precisely, use the time column.
For example, to make the earlier query cover 6/15 in JST, it looks like this.
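SELECT COUNT(request_verb) AS count, request_verb, client_ip
FROM alb_logs
WHERE date_time IN ('2021/06/14', '2021/06/15')
  -- Half-open range: 6/15 00:00 JST (inclusive) to 6/16 00:00 JST (exclusive), in UTC,
  -- so requests during the final second (14:59:59.xxx UTC) are not dropped.
  AND parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') >= parse_datetime('2021-06-14-15:00:00','yyyy-MM-dd-HH:mm:ss')
  AND parse_datetime(time,'yyyy-MM-dd''T''HH:mm:ss.SSSSSS''Z') < parse_datetime('2021-06-15-15:00:00','yyyy-MM-dd-HH:mm:ss')
GROUP BY request_verb, client_ip;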
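The UTC partition values and time bounds for a JST day are easy to get wrong by hand, so here is a minimal sketch that derives them. It is an illustration rather than part of the original procedure: the function name `jst_day_to_utc_filter` is hypothetical, and it treats the day as a half-open interval (start inclusive, end exclusive).

```python
from datetime import date, datetime, time, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time: fixed UTC+9, no DST


def jst_day_to_utc_filter(jst_day: date) -> tuple[list[str], datetime, datetime]:
    """Return (date_time partition values, UTC start, UTC end) for one JST day.

    The end bound is exclusive: it is midnight JST of the following day.
    """
    start_utc = datetime.combine(jst_day, time.min, tzinfo=JST).astimezone(timezone.utc)
    end_utc = start_utc + timedelta(days=1)
    # The last instant inside the interval decides which UTC date it ends on.
    last_instant = end_utc - timedelta(microseconds=1)
    partitions = sorted({start_utc.strftime("%Y/%m/%d"), last_instant.strftime("%Y/%m/%d")})
    return partitions, start_utc, end_utc


partitions, start_utc, end_utc = jst_day_to_utc_filter(date(2021, 6, 15))
print("date_time IN", tuple(partitions))
print("time >=", start_utc.strftime("%Y-%m-%d-%H:%M:%S"))
print("time < ", end_utc.strftime("%Y-%m-%d-%H:%M:%S"))
```

The printed timestamps use the same `yyyy-MM-dd-HH:mm:ss` layout as the literals passed to `parse_datetime`, so they can be pasted into the WHERE clause.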
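Also, with the earlier style of table, the range of data that could be scanned did not grow automatically after the table was created.
This table, however, specifies NOW as the end of the range in projection.date_time.range, so you can always search up to the latest logs without recreating or updating the table.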
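To make the effect of a range ending in NOW concrete, the sketch below enumerates the partition values that a daily range starting at 2020/06/16 implies and substitutes each one into the location template. This only illustrates the idea and is not how Athena is implemented; the function name `projected_locations` is hypothetical, and the current UTC date stands in for NOW.

```python
from datetime import date, datetime, timedelta, timezone

TEMPLATE = (
    "s3://my_log_bucket/AWSLogs/1111222233334444/"
    "elasticloadbalancing/ap-northeast-1/${date_time}"
)


def projected_locations(range_start: date, today: date, template: str = TEMPLATE) -> list[str]:
    """List one S3 prefix per day from range_start through today (inclusive)."""
    days = (today - range_start).days + 1
    values = [(range_start + timedelta(days=i)).strftime("%Y/%m/%d") for i in range(days)]
    # Each partition value replaces the ${date_time} placeholder in the template.
    return [template.replace("${date_time}", value) for value in values]


today_utc = datetime.now(timezone.utc).date()  # stand-in for NOW
locations = projected_locations(date(2020, 6, 16), today_utc)
print(len(locations), "daily partitions, latest:", locations[-1])
```

Because the end of the list is computed when the code runs, it grows by one entry per day with no change to the table definition, which is the point of using NOW.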
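Differences from the partitioning in the official documentation
In the official documentation, the folder partitions are managed as separate numeric year, month, and day values, so when you write a query you have to specify the year, month, and day individually. Also, because its range is set to 2020,2021, the table has to be updated again to cover logs from 2022 onward.
Considering those differences, I suppose this article may have had some value after all.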