Logs from AWS ALB (Application Load Balancer) are stored in S3. When you want to take a quick look at what is in them, the standard approach is Athena: as the guide referenced below describes, you create a table with partition projection and query it from Athena.
I used to do it that way too, but Athena feels sluggish in the browser. That is fine if you just want to run one fixed query once and grab the result, but when you want to fire off query after query to explore, it is clunky.
I recently started using DuckDB on another project and have been impressed by how pleasant it is to use, so I decided to use it for exploring ALB logs as well.
To install DuckDB, see the official installation page. To prepare for querying files on S3, see AWS Extension – DuckDB and S3 API Support – DuckDB. In short, start DuckDB from a shell in which environment variables holding credentials with S3 access are loaded, then run:
INSTALL aws;
LOAD aws;
INSTALL httpfs;
LOAD httpfs;
CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
You can now query any S3 file that the loaded credentials have permission to read. For example, to load the logs for November 2024, do the following; changing the glob pattern lets you load data for any period you like. The column definitions are based on the ones in "Create the table for ALB access logs in Athena using partition projection - Amazon Athena".
CREATE TABLE alb_log_202411 AS
SELECT *
FROM read_csv(
    's3://[YOUR_S3_BUCKET_NAME]/AWSLogs/[YOUR_ACCOUNT_ID]/elasticloadbalancing/[YOUR_REGION]/2024/11/**/*.log.gz',
    columns={
        'type': 'VARCHAR',
        'timestamp': 'TIMESTAMP',
        'elb': 'VARCHAR',
        'client_ip_port': 'VARCHAR',
        'target_ip_port': 'VARCHAR',
        'request_processing_time': 'DOUBLE',
        'target_processing_time': 'DOUBLE',
        'response_processing_time': 'DOUBLE',
        'elb_status_code': 'INTEGER',
        'target_status_code': 'VARCHAR',
        'received_bytes': 'BIGINT',
        'sent_bytes': 'BIGINT',
        'request': 'VARCHAR',
        'user_agent': 'VARCHAR',
        'ssl_cipher': 'VARCHAR',
        'ssl_protocol': 'VARCHAR',
        'target_group_arn': 'VARCHAR',
        'trace_id': 'VARCHAR',
        'domain_name': 'VARCHAR',
        'chosen_cert_arn': 'VARCHAR',
        'matched_rule_priority': 'VARCHAR',
        'request_creation_time': 'TIMESTAMP',
        'actions_executed': 'VARCHAR',
        'redirect_url': 'VARCHAR',
        'error_reason': 'VARCHAR',
        'target_port_list': 'VARCHAR',
        'target_status_code_list': 'VARCHAR',
        'classification': 'VARCHAR',
        'classification_reason': 'VARCHAR',
        'conn_trace_id': 'VARCHAR'
    },
    delim=' ',
    quote='"',
    escape='"',
    header=False,
    auto_detect=False
);
If many files on S3 match the pattern, building the table takes quite a while, but once the data has been loaded locally, everything after that is fast.
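If DuckDB was started without a database file, the table lives only in memory and has to be pulled from S3 again in the next session. One option, my suggestion rather than part of the workflow above, is to dump it once to a local Parquet file. So that the snippet runs on its own, it builds a hypothetical one-row stand-in called alb_log_demo; with real data, use alb_log_202411 in its place (the file name is also arbitrary).

```sql
-- Hypothetical one-row stand-in for alb_log_202411
CREATE TABLE alb_log_demo AS
    SELECT TIMESTAMP '2024-11-01 00:00:01' AS timestamp,
           200                            AS elb_status_code;

-- Write the loaded table to a local Parquet file once
COPY alb_log_demo TO 'alb_log_demo.parquet' (FORMAT parquet);

-- In a later session, rebuild the table from the local file instead of S3
CREATE OR REPLACE TABLE alb_log_demo AS
    SELECT * FROM read_parquet('alb_log_demo.parquet');
```

Starting the CLI with a database file name, e.g. `duckdb alb_logs.duckdb`, is another way to keep tables around between sessions.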
- Check that the table exists
D show tables;
┌────────────────┐
│      name      │
│    varchar     │
├────────────────┤
│ alb_log_202411 │
└────────────────┘
- Check the schema
D describe alb_log_202411;
┌──────────────────────────┬─────────────┬─────────┬─────────┬─────────┬─────────┐
│       column_name        │ column_type │  null   │   key   │ default │  extra  │
│         varchar          │   varchar   │ varchar │ varchar │ varchar │ varchar │
├──────────────────────────┼─────────────┼─────────┼─────────┼─────────┼─────────┤
│ type                     │ VARCHAR     │ YES     │         │         │         │
│ timestamp                │ TIMESTAMP   │ YES     │         │         │         │
│ elb                      │ VARCHAR     │ YES     │         │         │         │
│ client_ip_port           │ VARCHAR     │ YES     │         │         │         │
│ target_ip_port           │ VARCHAR     │ YES     │         │         │         │
│ request_processing_time  │ DOUBLE      │ YES     │         │         │         │
│ target_processing_time   │ DOUBLE      │ YES     │         │         │         │
│ response_processing_time │ DOUBLE      │ YES     │         │         │         │
│ elb_status_code          │ INTEGER     │ YES     │         │         │         │
│ target_status_code       │ VARCHAR     │ YES     │         │         │         │
│ received_bytes           │ BIGINT      │ YES     │         │         │         │
│ sent_bytes               │ BIGINT      │ YES     │         │         │         │
│ request                  │ VARCHAR     │ YES     │         │         │         │
│ user_agent               │ VARCHAR     │ YES     │         │         │         │
│ ssl_cipher               │ VARCHAR     │ YES     │         │         │         │
│ ssl_protocol             │ VARCHAR     │ YES     │         │         │         │
│ target_group_arn         │ VARCHAR     │ YES     │         │         │         │
│ trace_id                 │ VARCHAR     │ YES     │         │         │         │
│ domain_name              │ VARCHAR     │ YES     │         │         │         │
│ chosen_cert_arn          │ VARCHAR     │ YES     │         │         │         │
│ matched_rule_priority    │ VARCHAR     │ YES     │         │         │         │
│ request_creation_time    │ TIMESTAMP   │ YES     │         │         │         │
│ actions_executed         │ VARCHAR     │ YES     │         │         │         │
│ redirect_url             │ VARCHAR     │ YES     │         │         │         │
│ error_reason             │ VARCHAR     │ YES     │         │         │         │
│ target_port_list         │ VARCHAR     │ YES     │         │         │         │
│ target_status_code_list  │ VARCHAR     │ YES     │         │         │         │
│ classification           │ VARCHAR     │ YES     │         │         │         │
│ classification_reason    │ VARCHAR     │ YES     │         │         │         │
│ conn_trace_id            │ VARCHAR     │ YES     │         │         │         │
├──────────────────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┤
│ 30 rows                                                              6 columns │
└────────────────────────────────────────────────────────────────────────────────┘
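The schema shows that client_ip_port and target_ip_port keep the address and the port together in one VARCHAR. To count requests per client address, one approach (mine, not from the original workflow) is regexp_extract, keeping everything before the last colon that is followed only by digits. The CTE with made-up documentation-range addresses is just a stand-in so the snippet runs on its own; drop it to run against the real alb_log_202411.

```sql
-- Made-up rows standing in for alb_log_202411; remove the CTE to use the real table
WITH alb_log_202411(client_ip_port) AS (
    VALUES ('192.0.2.10:51234'), ('192.0.2.10:51240'), ('198.51.100.7:443')
)
SELECT
    -- greedy (.*) backtracks to the last ':' followed only by digits,
    -- so group 1 is the address; values that do not match yield ''
    regexp_extract(client_ip_port, '^(.*):(\d+)$', 1) AS client_ip,
    count(*)                                          AS requests
FROM alb_log_202411
GROUP BY client_ip
ORDER BY requests DESC;
```

On the stand-in rows this returns 192.0.2.10 with 2 requests, then 198.51.100.7 with 1.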
After that, query away however you like.
select * from alb_log_202411 where elb_status_code != 200 LIMIT 1;
select timestamp, request, elb_status_code, target_status_code, domain_name from alb_log_202411 where elb_status_code != 200 LIMIT 100;
select timestamp, request, elb_status_code, target_status_code, domain_name from alb_log_202411 where elb_status_code != 200 and domain_name != 'foobar.com' LIMIT 100;
Being able to dig around exploratorily like this, and quickly, is the best part.
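In the same exploratory spirit, the request column packs method, URL, and protocol into one string, so split_part can pull them apart for grouping. This is a sketch with a two-row made-up CTE standing in for alb_log_202411; remove the CTE to query the real table. The FILTER on target_processing_time assumes AWS's documented convention that the field is -1 when the load balancer could not dispatch the request to a target.

```sql
-- Made-up rows standing in for alb_log_202411; remove the CTE to use the real table
WITH alb_log_202411(request, elb_status_code, target_processing_time) AS (
    VALUES
        ('GET https://example.com:443/health HTTP/1.1', 200, 0.002::DOUBLE),
        ('POST https://example.com:443/api/items?page=2 HTTP/1.1', 502, -1::DOUBLE)
)
SELECT
    split_part(request, ' ', 1)                     AS method,
    -- drop the query string so ?page=1 and ?page=2 land in the same group
    split_part(split_part(request, ' ', 2), '?', 1) AS url,
    elb_status_code,
    count(*)                                        AS requests,
    -- skip the -1 sentinel so it does not drag the percentile down
    quantile_cont(target_processing_time, 0.95)
        FILTER (WHERE target_processing_time >= 0)  AS p95_target_time
FROM alb_log_202411
GROUP BY ALL
ORDER BY requests DESC;
```

GROUP BY ALL groups by every non-aggregate column in the SELECT list (method, url, elb_status_code), which saves retyping them while iterating on a query.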
※ Addendum
Because the table has 30 columns, the command line can't display all of them in the default output mode.
So either narrow down the columns in the SELECT clause, or switch to .mode line or .mode box, which show every column.
To go back to the default, use .mode duckbox.
For the full list of available modes, see Output Formats – DuckDB.
※ Addendum 2
A quick search turned up an article doing something similar, so I'm linking it here:
Analysing AWS Application Load Balancer Logs with DuckDB: Unleashing Performance Insights