Hello, this is Fukushima, and I love the AWS CLI.
Introduction
This time, I built a setup for analyzing DynamoDB data with Athena.
A bit of background: I built it because I wanted to analyze the DynamoDB table that stores conversation history and feedback results in a RAG application.
At first, the RAG app scanned DynamoDB and aggregated the retrieved data itself, in a rather forced way, before displaying it on screen. Depending on how often you analyze the data, running a Scan every time is not a great approach, so I moved the analysis to Athena.
Overview diagram
Explanation
- DynamoDB exports cannot be scheduled on their own, so EventBridge Scheduler runs them periodically.
- Each DynamoDB export is written under a different object key, so the S3 path referenced by the Athena table has to be updated after every export. I couldn't find a better way to handle this, so I created a Lambda function that performs this update.
- An export produces several files, but the data itself is in the json.gz files under data/. The Lambda is therefore triggered by the creation of objects whose keys end in json.gz.
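The scheduling part can be sketched with the AWS CLI. This is a minimal sketch, not the repository's SAM template: it assumes EventBridge Scheduler's universal target `arn:aws:scheduler:::aws-sdk:dynamodb:exportTableToPointInTime`, and the schedule name `sample-dynamodb-export`, the daily cron expression and the helper names are hypothetical. The role passed in must be allowed to call `dynamodb:ExportTableToPointInTime` and write to the export bucket, and the table needs point-in-time recovery enabled for exports to work.

```shell
# Hypothetical helpers: schedule a periodic DynamoDB export with EventBridge Scheduler.

# Print the --target JSON for `aws scheduler create-schedule`.
# Args: table ARN, export bucket, ARN of the IAM role the scheduler assumes.
build_export_target() {
  # "Input" is a JSON document embedded as a string, hence the escaped quotes.
  printf '{"Arn":"arn:aws:scheduler:::aws-sdk:dynamodb:exportTableToPointInTime","RoleArn":"%s","Input":"{\\"TableArn\\":\\"%s\\",\\"S3Bucket\\":\\"%s\\"}"}\n' \
    "$3" "$1" "$2"
}

# Create a schedule that starts an export every day at 00:00 (UTC by default).
# Args: table ARN, export bucket, scheduler role ARN.
create_export_schedule() {
  aws scheduler create-schedule \
    --name sample-dynamodb-export \
    --schedule-expression "cron(0 0 * * ? *)" \
    --flexible-time-window Mode=OFF \
    --target "$(build_export_target "$1" "$2" "$3")"
}

# Example (the role name is a placeholder):
# create_export_schedule "${DYNAMODB_TABLE_ARN}" "${S3_BUCKET_FOR_EXPORT_DYNAMODB}" \
#   "arn:aws:iam::${ACCOUNT_ID}:role/<scheduler-role>"
```

Building the target JSON in its own function keeps the nested escaping in one place and makes it easy to inspect before creating the schedule.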
Reference: S3 object structure of a DynamoDB export
DestinationBucket/DestinationPrefix
.
└── AWSDynamoDB
    ├── 01693685827463-2d8752fd     // the single full export
    │   ├── manifest-files.json     // manifest points to files under 'data' subfolder
    │   ├── manifest-files.checksum
    │   ├── manifest-summary.json   // stores metadata about request
    │   ├── manifest-summary.md5
    │   ├── data                    // The data exported by full export
    │   │   ├── asdl123dasas.json.gz   ⬅️ this file holds the actual data
    │   │   ...
    │   └── _started                // empty file for permission check
* Quoted from https://docs.aws.amazon.com/ja_jp/amazondynamodb/latest/developerguide/S3DataExport.Output.html
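Given this layout, the new table location can be derived from the key of the json.gz object that triggered the Lambda: drop the file name and keep the data/ folder. The sketch below expresses that logic with the AWS CLI and Athena's `ALTER TABLE ... SET LOCATION`. It is an assumption about how such a Lambda could work (the repository's function may update the table through the Glue API instead), and the helper names are hypothetical.

```shell
# Hypothetical helpers mirroring what the Lambda has to do after each export.

# Print the S3 folder that holds the data file whose key is given.
# Args: bucket name, object key (e.g. AWSDynamoDB/<export-id>/data/<file>.json.gz)
data_location_from_key() {
  # ${2%/*} removes the shortest trailing "/..." part, i.e. the file name.
  printf 's3://%s/%s/\n' "$1" "${2%/*}"
}

# Print the DDL that points dynamodb.sample (the table created in step ②) at a new folder.
alter_location_sql() {
  printf "ALTER TABLE dynamodb.sample SET LOCATION '%s'\n" "$1"
}

# Submit the DDL to Athena. Args: bucket, key, query result location (s3://...).
repoint_table() {
  aws athena start-query-execution \
    --query-string "$(alter_location_sql "$(data_location_from_key "$1" "$2")")" \
    --result-configuration OutputLocation="$3"
}
```

An export can contain several json.gz files, so the trigger may fire more than once per export. Every run computes the same folder, which makes repeating the update harmless.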
Steps
① Deploy the resources
- Download the source code
git clone https://github.com/kazuya9831/blog-sample.git
- Change to the directory
cd blog-sample/athena-dynamodb/
- Build the source code
sam build
- Deploy the resources
sam deploy
② Set up Athena (Glue)
- Set the variables
export ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export S3_BUCKET_FOR_ATHENA_QUERY_LOG="sample-athena-query-results-${ACCOUNT_ID}"
- Create the database
aws athena start-query-execution \
  --query-string "$(cat sql/create-database.sql)" \
  --result-configuration OutputLocation=s3://"${S3_BUCKET_FOR_ATHENA_QUERY_LOG}"
- Create the table
aws athena start-query-execution \
  --query-string "$(cat sql/create-table.sql | sed -e s/\${ACCOUNT_ID}/${ACCOUNT_ID}/g)" \
  --result-configuration OutputLocation=s3://"${S3_BUCKET_FOR_ATHENA_QUERY_LOG}"
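`start-query-execution` only submits a query and returns its QueryExecutionId; it does not wait for the query to finish, so the table DDL could in principle be submitted before the database exists. A small polling helper avoids that race. This is a sketch: the helper names, poll interval and retry limit are hypothetical.

```shell
# Return 0 if an Athena query state is final (QUEUED and RUNNING are not).
athena_state_is_final() {
  case "$1" in
    SUCCEEDED|FAILED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the current state of a query execution.
athena_query_state() {
  aws athena get-query-execution \
    --query-execution-id "$1" \
    --query 'QueryExecution.Status.State' \
    --output text
}

# Poll until the query finishes; succeed only if it SUCCEEDED.
# Args: query execution id, poll interval in seconds (default 2), max polls (default 60).
athena_wait() {
  local id="$1" interval="${2:-2}" tries="${3:-60}" i=0 state
  while [ "$i" -lt "$tries" ]; do
    state="$(athena_query_state "$id")"
    if athena_state_is_final "$state"; then
      # Final state reached: the exit status is that of this test.
      [ "$state" = SUCCEEDED ]
      return
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  return 1
}

# Example:
# qid=$(aws athena start-query-execution \
#   --query-string "$(cat sql/create-database.sql)" \
#   --result-configuration OutputLocation=s3://"${S3_BUCKET_FOR_ATHENA_QUERY_LOG}" \
#   --query QueryExecutionId --output text)
# athena_wait "$qid" && echo "database created"
```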
If you check the Athena console, you should see that a table named sample has been created.
③ Register sample data in DynamoDB
aws dynamodb batch-write-item \
  --request-items file://sample-data/request-items.json
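The contents of sample-data/request-items.json are not shown here. A batch-write-item request file maps the table name to a list of PutRequest elements, and the sketch below prints that shape for rows like the ones listed next. The attribute types (S for strings, N for ConversationAt) are inferred from the `.s`/`.n` accessors used in the Athena queries later, the helper names are hypothetical, and the repository's file may differ in its details. batch-write-item accepts at most 25 requests per call, which the 11 sample rows fit into.

```shell
# Print one PutRequest element for `aws dynamodb batch-write-item`.
# Args: ChatId ConversationAt UserId Question Answer Feedback
# No JSON escaping is done, so values must not contain quotes or backslashes.
put_request() {
  printf '{"PutRequest":{"Item":{"ChatId":{"S":"%s"},"ConversationAt":{"N":"%s"},"UserId":{"S":"%s"},"Question":{"S":"%s"},"Answer":{"S":"%s"},"Feedback":{"S":"%s"}}}}' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# Wrap PutRequest elements (passed as arguments) in a request-items document.
# Args: table name, then one PutRequest JSON per argument.
request_items() {
  local table="$1" sep="" req
  shift
  printf '{"%s":[' "$table"
  for req in "$@"; do
    printf '%s%s' "$sep" "$req"
    sep=","
  done
  printf ']}\n'
}

# Example:
# request_items sample-chat-history-table \
#   "$(put_request 1 1 user1 'Question 1' 'Answer 1' Good)" > /tmp/request-items.json
```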
This time, we register the following sample data.
ChatId | ConversationAt | UserId | Question | Answer | Feedback |
---|---|---|---|---|---|
1 | 1 | user1 | Question 1 | Answer 1 | Good |
1 | 2 | user1 | Question 2 | Answer 2 | Good |
1 | 3 | user1 | Question 3 | Answer 3 | Good |
2 | 1 | user2 | Question 1 | Answer 1 | Good |
2 | 2 | user2 | Question 2 | Answer 2 | Bad |
3 | 1 | user3 | Question 1 | Answer 1 | Good |
3 | 2 | user3 | Question 2 | Answer 2 | Good |
3 | 3 | user3 | Question 3 | Answer 3 | Bad |
3 | 4 | user3 | Question 4 | Answer 4 | Good |
3 | 5 | user3 | Question 5 | Answer 5 | Bad |
3 | 6 | user3 | Question 6 | Answer 6 | Good |
④ Export DynamoDB
- Set the variables
S3_BUCKET_FOR_EXPORT_DYNAMODB="sample-export-dynamodb-${ACCOUNT_ID}"
DYNAMODB_TABLE_ARN=arn:aws:dynamodb:ap-northeast-1:${ACCOUNT_ID}:table/sample-chat-history-table
- Export the DynamoDB table
export_arn=$(aws dynamodb export-table-to-point-in-time \
  --table-arn "${DYNAMODB_TABLE_ARN}" \
  --s3-bucket "${S3_BUCKET_FOR_EXPORT_DYNAMODB}" \
  --query "ExportDescription.ExportArn" \
  --output text)
- Check the export status
aws dynamodb describe-export \
  --export-arn "${export_arn}" \
  --query 'ExportDescription.ExportStatus' \
  --output text
Right after the export starts, the status should be "IN_PROGRESS". It should finish in roughly five minutes, so wait until it shows "COMPLETED".
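Instead of re-running the status check by hand, the wait can be scripted. A minimal sketch, assuming a check every 30 seconds for up to 40 attempts (both values, and the helper names, are hypothetical); FAILED is the other terminal export status besides COMPLETED.

```shell
# Print the current status of an export (IN_PROGRESS, COMPLETED or FAILED).
export_status() {
  aws dynamodb describe-export \
    --export-arn "$1" \
    --query 'ExportDescription.ExportStatus' \
    --output text
}

# Poll until the export completes.
# Args: export ARN, poll interval in seconds (default 30), max polls (default 40).
wait_for_export() {
  local arn="$1" interval="${2:-30}" tries="${3:-40}" i=0 status
  while [ "$i" -lt "$tries" ]; do
    status="$(export_status "$arn")"
    case "$status" in
      COMPLETED) return 0 ;;
      FAILED) echo "export failed: $arn" >&2; return 1 ;;
    esac
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for $arn" >&2
  return 1
}

# Example:
# wait_for_export "${export_arn}" && echo "export finished"
```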
In the DynamoDB console, you should also be able to see that the export has run.
⑤ Verify
Finally, to verify that everything works, run SQL from the Athena console.
https://ap-northeast-1.console.aws.amazon.com/athena/home
- Check the sample data
select
  item.ChatId.s as ChatId,
  item.ConversationAt.n as ConversationAt,
  item.UserId.s as UserId,
  item.Question.s as Question,
  item.Answer.s as Answer,
  item.Feedback.s as Feedback
from dynamodb.sample
- Count Good and Bad feedback
select
  COUNT(*) AS Total,
  COUNT(CASE WHEN item.Feedback.s = 'Good' THEN 1 END) AS Good,
  COUNT(CASE WHEN item.Feedback.s = 'Bad' THEN 1 END) AS Bad
from dynamodb.sample
- User ranking
SELECT item.UserId.s AS UserId, COUNT(*) AS Count FROM dynamodb.sample GROUP BY item.UserId.s ORDER BY Count DESC;
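To know what these queries should return, the same counts can be computed locally from the sample data registered in step ③. This awk sketch uses only the ChatId, UserId and Feedback columns, typed in by hand from that table; the function names are hypothetical.

```shell
# ChatId, UserId and Feedback of the sample rows from step ③.
sample_rows() {
  cat <<'EOF'
1 user1 Good
1 user1 Good
1 user1 Good
2 user2 Good
2 user2 Bad
3 user3 Good
3 user3 Good
3 user3 Bad
3 user3 Good
3 user3 Bad
3 user3 Good
EOF
}

# Same logic as the Good/Bad query: total rows and counts per feedback value.
feedback_summary() {
  sample_rows | awk '{ total++; if ($3 == "Good") good++; else if ($3 == "Bad") bad++ }
    END { printf "Total=%d Good=%d Bad=%d\n", total, good, bad }'
}

# Same logic as the user ranking query: rows per UserId, highest count first.
user_ranking() {
  sample_rows | awk '{ n[$2]++ } END { for (u in n) print u, n[u] }' | sort -k2,2nr
}
```

With the sample data, `feedback_summary` prints `Total=11 Good=8 Bad=3` and `user_ranking` prints user3 6, user1 3, user2 2, which is what the Athena queries should return once the export has completed and the table points at it.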
Conclusion
This time, I built a setup for analyzing DynamoDB data with Athena. I hope it helps someone.