I'm Kenju Wagatsuma (我妻謙樹) from the Marketing Service Development Group in the Media Product Development Department. I'm responsible for new development, maintenance, and operation of Cookpad's advertising systems in general.
The previous article introduced our marketing business as a whole and how the team is organized. Reading it should give you an overview of the organizational structure of the Media Product division and the technology stack of the Marketing Service Development Group.
This time I'll introduce the technical work on the ad delivery server that the previous article also touched on, focusing in particular on how we use Amazon DynamoDB Accelerator (DAX).
Background
Previously, to show ads in the app, we embedded the ad SDK, which the Marketing Service Development Group owns and develops, into the main Cookpad app and had it send requests to the ad delivery server asynchronously.
This time, the iOS app was due for a major specification change. For the new version it had already been decided that the client would receive the responses it needs through a BFF server called Orcha, introduced in the article "Rebuilding an existing API server with a modern BFF". During the requirements phase, this raised a question: instead of having the client request ads directly and asynchronously from the ad delivery server as before, could we take advantage of Orcha's position as the BFF server and have Orcha query the ad delivery server?

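As of the previous article, which gave an overview of the ad delivery system, the ad delivery server was a web server written on the assumption that it would accept HTTP requests directly from iPhone, Android, and the web. We therefore first considered extending the existing ad delivery server so that it could also accept service-to-service communication over gRPC.
However, as requirements definition progressed, it became clear that extending the existing ad delivery server would be difficult to achieve from a performance standpoint.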
When Orcha queries for ads, the latency of the ad delivery server affects the latency of the whole user request. Concretely, if Orcha's own latency is A ms and the ad delivery server's latency is B ms, the total latency is (A+B) ms. To keep ads from degrading the user experience of the main app, the ad delivery server has to return ads to Orcha as quickly as possible. We therefore chose latency as the SLI for the ad delivery server and set the SLO at 15 ms at p95.
The existing ad delivery server, however, had a latency of around 100 ms at p95 and 120-140 ms at p99. Moreover, because of its original data model, we found that no amount of scale-up or scale-out would let it meet the target SLI/SLO.
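To make the target concrete, the following is a minimal sketch of checking a set of latency samples against a p95 target. The function names and the sample values are illustrative assumptions, not taken from our monitoring setup, and the percentile uses the simple nearest-rank method.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samples using the
// nearest-rank method. It returns 0 when samples is empty.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...) // copy so the caller's slice is untouched
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// meetsSLO reports whether the p95 of latenciesMs is at or below targetMs.
func meetsSLO(latenciesMs []float64, targetMs float64) bool {
	return percentile(latenciesMs, 95) <= targetMs
}

func main() {
	// Made-up samples in milliseconds, not measurements from the article.
	latencies := []float64{4, 5, 5, 6, 5, 4, 7, 5, 12, 5}
	fmt.Printf("p95=%.0fms, within 15ms target: %t\n",
		percentile(latencies, 95), meetsSLO(latencies, 15))
}
```

In practice the percentile would come from the metrics backend rather than from raw samples in the application; the sketch only shows what "15 ms at p95" means as a pass/fail condition.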
The reason is that the existing ad delivery server ran on AWS RDS (MySQL) + ElastiCache (memcached). memcached itself was reasonably fast and could answer a single get command in 1-2 ms. The real problem we had to tackle lay in the relations of the existing data model. With the existing business logic, running the ad lottery structurally required sending 30-40 get commands to memcached, and 50-60 in heavier cases. In other words, of the 100 ms at p95, we observed that roughly half or more was spent in the memcached part.
At this point we had no choice but to conclude that the existing ad delivery server could not meet the target SLI/SLO.
Solution
We therefore decided to replace the ad delivery server in the following order.
- refactoring
  - a-1. Pay down past technical debt as far as possible and reduce the amount of code
  - a-2. Revisit the specifications themselves and reduce the amount of code
- re-architecture
  - b-1. Enumerate every access pattern, then rethink the data model from the ground up
  - b-2. Migrate from an RDBMS (MySQL) to NoSQL (DynamoDB)
We carried out "a-1" and "a-2" as a form of exploratory refactoring [1], both to understand the existing business logic and as preparation that reduces the risk and the work involved in the re-architecture phase that follows. This included unglamorous but indispensable work such as agreeing with the directors on which specifications were still needed at all.
In the "b-1" phase, we catalogued every SQL pattern issued by the existing ad delivery server, as well as the INPUT/OUTPUT of the whole data flow from ad submission to delivery. In the "b-2" phase, we stored the resulting denormalized data model in DynamoDB, so that the existing logic could be implemented by issuing a single BatchGetItem call per ad lottery.
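The idea behind collapsing many cache lookups into one batched read can be sketched as follows. This is only an illustration under stated assumptions: the `BatchGetter` interface, the key layout (`slot#...`, `segment#...`), and the in-memory `countingStore` are hypothetical stand-ins and not our actual table design or the AWS SDK API.

```go
package main

import "fmt"

// Key identifies one denormalized item. The attribute layout is hypothetical.
type Key struct {
	PK string
	SK string
}

// BatchGetter abstracts a store that offers a BatchGetItem-style call.
type BatchGetter interface {
	BatchGet(keys []Key) (map[Key]string, error)
}

// countingStore is an in-memory stand-in that counts round trips.
type countingStore struct {
	items      map[Key]string
	roundTrips int
}

func (s *countingStore) BatchGet(keys []Key) (map[Key]string, error) {
	s.roundTrips++
	out := make(map[Key]string, len(keys))
	for _, k := range keys {
		if v, ok := s.items[k]; ok {
			out[k] = v
		}
	}
	return out, nil
}

// candidateKeys derives every key the lottery needs from the request alone.
// Because no key depends on the result of an earlier read, one batch suffices.
func candidateKeys(slot string, segments []string) []Key {
	keys := []Key{{PK: "slot#" + slot, SK: "meta"}}
	for _, seg := range segments {
		keys = append(keys, Key{PK: "slot#" + slot, SK: "segment#" + seg})
	}
	return keys
}

// lookup fetches all candidate items in a single round trip.
func lookup(store BatchGetter, slot string, segments []string) (map[Key]string, error) {
	return store.BatchGet(candidateKeys(slot, segments))
}

func main() {
	store := &countingStore{items: map[Key]string{
		{PK: "slot#top", SK: "meta"}:           "slot settings",
		{PK: "slot#top", SK: "segment#recipe"}: "ad candidates for recipe",
	}}
	items, err := lookup(store, "top", []string{"recipe", "news"})
	fmt.Println(len(items), store.roundTrips, err)
}
```

The design point is that the data model, not the client code, makes this possible: each key must be computable up front. Note also that a real BatchGetItem call accepts at most 100 items, so a larger key set would need to be split.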
DynamoDB itself scales well horizontally and returns responses within a few ms. However, from our experience using DynamoDB for the targeting feature of ad delivery, we knew that its latency can be unstable. Specifically, the maximum latency jumped to several hundred ms or more a few times a day. In addition, the application is read-heavy by nature, so we needed to reduce the number of direct operations against DynamoDB to keep infrastructure costs down. Finally, we needed to be able to scale out reads easily as the service grows.
For these reasons, we settled on DAX.
What is DAX
DAX has the following characteristics.
- It is an in-memory cache store specialized for having DynamoDB as its backend.
- It uses a single-leader configuration. The primary node accepts all writes, and the replica nodes replicate the items.
- Writes to DynamoDB are write-through: the write is processed synchronously, up to and including the result of the write request to DynamoDB.
- As for caching strategy, it implements the necessary minimum, such as Least Recently Used (LRU) eviction, a negative cache, and TTL.
For further details, see the official developer guide as well as the material I prepared when introducing DAX to the team.
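To illustrate how the characteristics above fit together, here is a small conceptual sketch of a read-through, write-through cache with LRU eviction, a single TTL, and negative caching. It is not how DAX is implemented internally, and all type and function names are assumptions made for this example.

```go
package main

import (
	"container/list"
	"fmt"
	"time"
)

// Store is the backing table (DynamoDB in the article, a map in this sketch).
type Store interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

type mapStore map[string]string

func (m mapStore) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m mapStore) Put(k, v string)             { m[k] = v }

type entry struct {
	key     string
	value   string
	found   bool // false marks a negative entry: the key is absent in the store
	expires time.Time
}

// Cache is a read-through, write-through LRU cache with one TTL for all items.
type Cache struct {
	store    Store
	capacity int
	ttl      time.Duration
	now      func() time.Time
	order    *list.List // front = most recently used
	index    map[string]*list.Element
}

func NewCache(store Store, capacity int, ttl time.Duration) *Cache {
	return &Cache{store: store, capacity: capacity, ttl: ttl, now: time.Now,
		order: list.New(), index: make(map[string]*list.Element)}
}

// remember stores or refreshes an entry and evicts the least recently used one
// when the cache grows beyond its capacity.
func (c *Cache) remember(key, value string, found bool) {
	if el, ok := c.index[key]; ok {
		c.order.Remove(el)
	}
	c.index[key] = c.order.PushFront(&entry{key, value, found, c.now().Add(c.ttl)})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.index, oldest.Value.(*entry).key)
	}
}

// Get serves a fresh entry from memory. Otherwise it reads the store and caches
// the result, including the fact that a key does not exist.
func (c *Cache) Get(key string) (value string, found, hit bool) {
	if el, ok := c.index[key]; ok {
		e := el.Value.(*entry)
		if c.now().Before(e.expires) {
			c.order.MoveToFront(el)
			return e.value, e.found, true
		}
	}
	value, found = c.store.Get(key)
	c.remember(key, value, found)
	return value, found, false
}

// Put writes to the store first and updates the cache afterwards (write-through).
func (c *Cache) Put(key, value string) {
	c.store.Put(key, value)
	c.remember(key, value, true)
}

func main() {
	cache := NewCache(mapStore{"ad#1": "banner"}, 2, time.Minute)
	_, _, firstHit := cache.Get("ad#1")
	_, _, secondHit := cache.Get("ad#1")
	fmt.Println(firstHit, secondHit)
}
```

The sketch is single-threaded and keeps everything on one node. The single-leader replication described above is deliberately left out.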
Results
To state the result first, we can return responses in around 5 ms under normal conditions. Compared with the old ad delivery server's average of 100-120 ms, that is roughly a 20x improvement. We have determined that the spikes where latency becomes unstable coincide with the moments when the cache TTL expires. Even in those cases, the request goes to DynamoDB and returns in about 10-15 ms.
The figures above come from load testing, and the replaced server is not yet handling all existing ad products. Still, by design the number of requests to DAX/DynamoDB does not grow as ad products are added. So as long as the ECS task CPU/memory and the DAX instance type are sized correctly for the number of items held in the cache (the working set), we expect the numbers not to drift far from these values.

We also expect a large reduction in infrastructure cost once the migration is complete. Skipping the detailed calculation, moving from the AWS RDS + AWS ElastiCache (memcached) setup used by the old delivery server to DynamoDB + DAX yields a cost optimization of around 100,000 JPY per month. In addition, the application server is moving from Ruby (Rails) to Go, so we expect further savings from reducing the number of running tasks in the ECS service.
As described in the article "The importance of infrastructure cost optimization and how Cookpad maintains its RIs (Reserved Instances)", the SRE team applies its technical expertise to building the foundations and providing the information needed for cost optimization. Thanks to that work, we as application developers increasingly have the groundwork to optimize infrastructure cost while still meeting requirements.
We have also implemented a mechanism that falls back to DynamoDB in case the DAX cluster is restarting or happens to be unreachable. Because operations against DAX are transparent with respect to DynamoDB, this can be done without bringing unnecessary complexity into the application code, which is another advantage of choosing DAX.
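A fallback of this kind can be sketched as a thin wrapper around two clients that share the same read operation. The `ItemGetter` interface, the error values, and `WithFallback` below are hypothetical names for illustration; the real clients take richer inputs, and a production version would likely also bound the time spent waiting on the primary.

```go
package main

import (
	"errors"
	"fmt"
)

// ItemGetter is the read operation both clients are assumed to share.
type ItemGetter interface {
	GetItem(key string) (string, error)
}

// ErrNotFound means the item does not exist, so asking elsewhere would not help.
var ErrNotFound = errors.New("item not found")

// ErrUnavailable stands in for a connectivity failure of the cache cluster.
var ErrUnavailable = errors.New("cache cluster unavailable")

// GetterFunc adapts a plain function to ItemGetter.
type GetterFunc func(key string) (string, error)

func (f GetterFunc) GetItem(key string) (string, error) { return f(key) }

// WithFallback tries primary first and falls back to secondary on any error
// other than ErrNotFound.
func WithFallback(primary, secondary ItemGetter) ItemGetter {
	return GetterFunc(func(key string) (string, error) {
		v, err := primary.GetItem(key)
		if err == nil || errors.Is(err, ErrNotFound) {
			return v, err
		}
		return secondary.GetItem(key)
	})
}

func main() {
	cache := GetterFunc(func(string) (string, error) { return "", ErrUnavailable })
	table := GetterFunc(func(key string) (string, error) { return "item for " + key, nil })

	v, err := WithFallback(cache, table).GetItem("ad#1")
	fmt.Println(v, err)
}
```

Because both sides satisfy one interface, the calling code does not need to know which of them answered.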
Constraints
That said, DAX is of course no silver bullet. It is a rather idiosyncratic piece of middleware, so before introducing it in production, examine the following constraints carefully.
- It cannot cache any data store other than DynamoDB
  - It can, however, be used with multiple DynamoDB tables
- The quality and support status of the DAX SDKs need scrutiny
  - For example, there is no SDK for Ruby, so you would have to implement one yourself
  - For example, aws-dax-go leaves many APIs unimplemented [2]
- Because of the single-leader configuration, pay attention to performance if the application has a demanding write workload
  - If acceptable, a write strategy known as Write-Around can be used
- The TTL is shared by all items and cannot be set per item
  - For example, a flexible setup such as "this item has a TTL of 1 min, that item has a TTL of 60 min" is not possible
- When a Parameter Group is updated to change the TTL, the change cannot be applied to running instances
  - Applying it without downtime requires an operation such as standing up a separate cluster, gradually switching requests over, and then taking down the old cluster
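The gradual switch between an old and a new cluster mentioned in the last item could, for example, be driven from the application side with a percentage-based router. This is a hypothetical sketch and not a description of our actual procedure; `Router` and its methods are invented for illustration.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Router decides, per request, whether to use the new cluster.
type Router struct {
	counter    uint64 // number of decisions made so far (kept first for 64-bit alignment)
	percentNew int32  // share of traffic for the new cluster, 0 to 100
}

// SetPercentNew clamps p to [0, 100] and applies it to subsequent requests.
func (r *Router) SetPercentNew(p int) {
	if p < 0 {
		p = 0
	}
	if p > 100 {
		p = 100
	}
	atomic.StoreInt32(&r.percentNew, int32(p))
}

// UseNew reports whether the next request should go to the new cluster.
// Multiplying by 37, which is coprime with 100, spreads the chosen slots out,
// so while the percentage is unchanged, any 100 consecutive calls send exactly
// percentNew of them to the new cluster.
func (r *Router) UseNew() bool {
	n := atomic.AddUint64(&r.counter, 1) - 1
	slot := int32(n * 37 % 100)
	return slot < atomic.LoadInt32(&r.percentNew)
}

func main() {
	r := &Router{}
	for _, pct := range []int{0, 10, 50, 100} {
		r.SetPercentNew(pct)
		toNew := 0
		for i := 0; i < 1000; i++ {
			if r.UseNew() {
				toNew++
			}
		}
		fmt.Printf("percentNew=%d -> %d of 1000 requests to the new cluster\n", pct, toNew)
	}
}
```

Raising the percentage in small steps while watching latency and cache hit ratio gives the new cluster time to warm its cache before it takes all traffic.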
Operations and maintenance
Next, I'll introduce what we do to keep an application that uses DAX available over the medium to long term, from the following perspectives.
- Monitoring
- Alerting
- Runbook
- Maintenance
Monitoring
The full list of DAX CloudWatch metrics is available in the Developer Guide. Of these, based on the nature of the application and our business priorities, we decided to monitor the following metrics first.
| Metrics | Description |
|---|---|
| CPUUtilization | % of CPU utilization |
| CacheHitRatio | ItemCacheHits / (ItemCacheHits + ItemCacheMisses) |
| ThrottlingRequestCount | # of requests throttled by the node or cluster |
| FailedRequestCount | # of requests that resulted in an error reported |
| EvictedSize | Check whether the working set is increasing or not |
We collect these metrics in a Grafana dashboard. The dashboard gives an overview of the state of the whole application, not only DAX. Its purpose is to support monitoring before and after deployments and to help isolate causes during incidents.

We also display the SLI/SLO we defined on the Grafana dashboard. The aim is that even new members without the background context can judge whether the application is in a healthy state, and that the dashboard can be used to judge whether medium to long term improvements are working.
In addition, we use github.com/prometheus/client_golang [3] to monitor the state of the application itself. On the DAX side, the code that queries DAX is implemented with goroutines and channels, so we made goroutine behavior, GC state, and similar values visible on the same dashboard. Custom metrics can be recorded as needed, so it is also easy to bring items that CloudWatch metrics alone cannot measure into the monitoring tools.
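As a rough illustration of the kind of values involved, the sketch below reads goroutine and GC figures straight from the Go `runtime` package and keeps one custom counter. It deliberately uses only the standard library instead of client_golang, whose Go collector exposes comparable runtime values. The `Snapshot` type and the `daxRequests` counter are assumptions made for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Snapshot is a point-in-time view of a few runtime values and one custom counter.
type Snapshot struct {
	Goroutines  int
	HeapAllocB  uint64
	NumGC       uint32
	DAXRequests uint64
}

// daxRequests is a hypothetical custom metric: requests sent to the cache cluster.
var daxRequests uint64

// countDAXRequest would be called wherever the application issues a request.
func countDAXRequest() { atomic.AddUint64(&daxRequests, 1) }

// snapshot collects the current values.
func snapshot() Snapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return Snapshot{
		Goroutines:  runtime.NumGoroutine(),
		HeapAllocB:  m.HeapAlloc,
		NumGC:       m.NumGC,
		DAXRequests: atomic.LoadUint64(&daxRequests),
	}
}

func main() {
	stop := make(chan struct{})
	for i := 0; i < 3; i++ {
		go func() { <-stop }() // goroutines that stay blocked, as a leak would
	}
	countDAXRequest()
	fmt.Printf("%+v\n", snapshot())
	close(stop)
}
```

A steadily growing goroutine count in such a snapshot is the typical signature of the goroutine leak that the dashboard is meant to make visible.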

If you want to display CloudWatch metrics in Grafana, I have published a dashboard template under Grafana Labs > Dashboard, so feel free to use it as a reference. One thing to note: autocompletion for DAX metrics when building a Grafana dashboard was added in v6.6.0.
Alerting
Monitoring alone will not tell us when, one day, the working set suddenly grows and the application becomes slow to respond, or when a bug in the goroutine-based implementation causes a memory leak. We therefore also introduced the following alerting.
CloudWatch Alarm
We have set up CloudWatch Alarms for the following two metrics.
- CPUUtilization
  - Uses the CloudWatch metric value as is
- CacheHitRatio
  - Expressed with CloudWatch Metric Math from ItemCacheHits and ItemCacheMisses
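The ratio itself is simple arithmetic, shown here in Go only to make the definition and its edge case explicit. The alarm threshold in the example is an arbitrary value chosen for illustration, not the one we use.

```go
package main

import "fmt"

// cacheHitRatio returns hits / (hits + misses). ok is false when there was no
// traffic in the period, in which case the ratio is undefined.
func cacheHitRatio(hits, misses float64) (ratio float64, ok bool) {
	total := hits + misses
	if total == 0 {
		return 0, false
	}
	return hits / total, true
}

// shouldAlarm reports whether the ratio is defined and below threshold.
func shouldAlarm(hits, misses, threshold float64) bool {
	ratio, ok := cacheHitRatio(hits, misses)
	return ok && ratio < threshold
}

func main() {
	ratio, ok := cacheHitRatio(9500, 500)
	fmt.Println(ratio, ok, shouldAlarm(9500, 500, 0.9))
}
```

The zero-traffic case matters for the alarm as well: a period without requests has no defined ratio, so it is worth deciding explicitly how missing data should be treated.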
Notifications follow a common pattern and reach the developers' Slack channel through the following path.
CloudWatch Alarm --> SNS Topic --> AWS Lambda --> Slack
Within the company, AWS resources are managed with Terraform. Only the apply step is restricted to users with certain permissions, so any team member can change thresholds or add and remove alarms.
Runbook
In our team we use runbooks, aiming for a state where, even during an incident, someone other than the implementer who owns the system can carry out operations such as scale-up or scale-out.
By including the runbook URL in each alert notification, we document in advance, as far as we can foresee, the whole flow from checking business impact to handling the incident, with the goal that anyone can respond.
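The step that turns an alarm into a Slack message with a runbook link might look like the following sketch. It assumes the alarm arrives as the JSON document that CloudWatch Alarms publish to SNS and uses only three of its fields. The alarm names, the runbook URLs, and the message layout are hypothetical, and the HTTP call to the Slack webhook is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// alarmMessage holds the fields used here from the CloudWatch Alarm JSON.
type alarmMessage struct {
	AlarmName      string `json:"AlarmName"`
	NewStateValue  string `json:"NewStateValue"`
	NewStateReason string `json:"NewStateReason"`
}

// runbooks maps alarm names to runbook URLs. Names and URLs are hypothetical.
var runbooks = map[string]string{
	"dax-cpu-utilization": "https://example.com/runbooks/dax-cpu",
	"dax-cache-hit-ratio": "https://example.com/runbooks/dax-cache-hit-ratio",
}

const defaultRunbook = "https://example.com/runbooks/default"

// alertText renders a human-readable alert that always carries a runbook URL.
func alertText(snsMessage string) (string, error) {
	var m alarmMessage
	if err := json.Unmarshal([]byte(snsMessage), &m); err != nil {
		return "", fmt.Errorf("decode alarm message: %w", err)
	}
	runbook, ok := runbooks[m.AlarmName]
	if !ok {
		runbook = defaultRunbook
	}
	return fmt.Sprintf("[%s] %s\n%s\nRunbook: %s",
		m.NewStateValue, m.AlarmName, m.NewStateReason, runbook), nil
}

// slackPayload wraps the alert text in the JSON body of a Slack incoming webhook.
func slackPayload(snsMessage string) (string, error) {
	text, err := alertText(snsMessage)
	if err != nil {
		return "", err
	}
	body, err := json.Marshal(map[string]string{"text": text})
	if err != nil {
		return "", err
	}
	return string(body), nil
}

func main() {
	msg := `{"AlarmName":"dax-cache-hit-ratio","NewStateValue":"ALARM","NewStateReason":"Threshold Crossed"}`
	payload, err := slackPayload(msg)
	fmt.Println(payload, err)
}
```

Falling back to a default runbook keeps the rule "every alert links to a next action" intact even for alarms that were added without a dedicated page.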
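Alerting is never just a matter of configuring notifications. It only has value when it raises the availability of the application and contributes to the growth of the business. Quality improves only when a notification is followed by appropriate action or by a fix to the application, so introducing alerting to a team is meaningless unless that follow-up is considered as part of the package. [4]
I have a bitter memory here. As described earlier in "The real-time log aggregation platform behind ad delivery for cookpad storeTV", I built a mechanism for detecting delayed logs and alerting on them in the storeTV ad delivery log platform, following best practices for stream processing. After I built the system and handed its operation over to other members, notifications kept arriving, but I had not properly handed over the design philosophy, the background, or the action plans. The alerts became a broken window, and we ended up in a situation where errors that should have been looked at were overlooked.
Drawing on that lesson, we set the goal of building a team culture in which anyone can do at least the minimum of operation and maintenance. We share concrete procedures in study sessions and circulate the pitfalls we ran into as shared knowledge. Furthermore, so that thresholds, once set, can be removed or tuned flexibly as the team's situation, members' skills, the nature of the application, and the growth of the service change, we also use study sessions and similar means to share what each metric means and to align our views.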
Maintenance
DAX can send maintenance events to an SNS topic.
When a maintenance event occurs, DAX can notify you using Amazon Simple Notification Service (Amazon SNS). To configure notifications, choose an option from the Topic for SNS notification selector. You can create a new Amazon SNS topic, or use an existing topic. https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.cluster-management.html#DAX.cluster-management.custom-settings
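Whenever the configuration of the DAX cluster changes in some way, for example during scale-up, scale-out, or a restart caused by updating the Parameter Group to change the TTL, we need to keep a close watch on the application. These events are also sent to the developers' Slack channel, using the following setup, which resembles the one for CloudWatch Alarms.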
DAX Events --> SNS Topic --> AWS Lambda --> Slack
Summary
This article introduced how we use DAX in our ad delivery server.
The marketing domain is a very exciting one: it has many technically challenging problems, and the work often contributes directly to revenue. There is also a particular appeal in running our own delivery server and holding our own user data for our own business rather than relying on an ad network, so it is a place where people with a strong interest in business development can make a real impact as well.
The Media Product Development Department is looking for people to work with us. If this sounds even a little interesting, please consider applying.
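1. From レガシーソフトウェア改善ガイド (the Japanese edition of "Re-Engineering Legacy Software").
2. Functions that call d.unImpl() are currently unimplemented.
3. A Grafana dashboard template for the metrics that can be collected with the Prometheus client is available at https://grafana.com/grafana/dashboards/6671
4. Chapter 3 of 入門 監視 (the Japanese edition of "Practical Monitoring") also says, in essence, that monitoring is the act of continuously observing and checking the behavior and output of a system and its components, and that alerts are only one way of achieving that goal. I want to stay mindful that alerting, as a means, does not become the goal itself.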