Hello, I'm @chaspy from SRE.
At Quipper, we use Amazon EKS (hereafter, EKS) as our Application Platform.*1 Until now we upgraded Clusters with a Blue/Green approach, but this time we switched Clusters with a Canary approach.*2
In this article, I explain how we Canary Switch*3 AWS EKS and describe the benefits this brings. I hope it is useful not only to people running AWS EKS, but also to anyone considering Canary Switching for their Platform.
Why do it?
As described in the previous article, our Cluster Switch had the following two problems:
- A Canary Switch was not possible when switching over Production
- The new cluster could not be tested while the old cluster was still available
Let's look at them one at a time.
Benefits of a Canary Switch
To think about this, let's revisit how we think about reliability.
Three metrics are important for expressing reliability:
- MTTD: Mean Time to Detect
- MTTR: Mean Time to Resolve
- Impact Users
Using these metrics, the impact of an incident can be expressed by the following formula.*4
Impact Users * (MTTD + MTTR)
What a Canary Switch helps with is reducing Impact Users. If you expose a change to all users at once, the damage will be quite large even if you roll back immediately and keep MTTD + MTTR to a minimum.
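To make the effect of the formula concrete, here is a worked example with purely hypothetical numbers, not Quipper's: 10,000 affected users at full exposure, MTTD + MTTR of 30 minutes, and impact measured in user-minutes.

```latex
\begin{aligned}
\text{Impact}_{100\%} &= 10{,}000 \times 30 = 300{,}000 \ \text{user-minutes} \\
\text{Impact}_{1\%} &= (0.01 \times 10{,}000) \times 30 = 3{,}000 \ \text{user-minutes}
\end{aligned}
```

With MTTD + MTTR held equal, sending only 1% of traffic cuts the impact a hundredfold; the canary changes only the Impact Users term.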
Benefits of testing the new cluster independently of the old one
Clusters are in constant use, in Staging and Production alike. With the Blue / Green Deployment approach, however, you switch over without ever sending real traffic to the new cluster. If a problem appears after the switch, you have to roll back and investigate. That takes time, and because the cluster is unusable while the problem persists, a broken Staging Cluster severely reduces Developer Productivity. This is a problem we actually ran into during the previous EKS migration.
In Staging, letting Developers test the new environment themselves while "business as usual" continues benefits both sides.
How we did it
We retired the Ingress tied to each Cluster and placed an ALB that does not depend on any Cluster. The resulting architecture is shown in the figures below.
Before
以å㯠DNS 㧠loadbalancer ã® record ã対象ã«ãã¦ãã Record ãåãæ¿ãããã¨ã§ã100% ä¸æ°ã«åãæ¿ãã¦ãã¾ããã
After: Canary Switching
With the ALB Weighted Target Groups feature, we can now switch on a percentage basis, as shown above.
Specifically, the AutoScalingGroup created along with each Managed Node Group is attached to a Target Group, and the ALB forwards requests to that Target Group through a Listener Rule.
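The wiring described above can be sketched with the AWS CLI as follows. This is a minimal sketch under stated assumptions: the cluster, node group, and target group names are hypothetical, the node group is assumed to have a single Auto Scaling group, and the naming convention (a cluster number at the end, which footnote *6 relies on) is a guess at the scheme, not Quipper's actual one.

```bash
#!/bin/bash
set -euo pipefail

# Look up the Auto Scaling group that EKS created for a Managed Node Group.
node_group_asg_name() {
  local cluster_name=$1 node_group_name=$2
  aws eks describe-nodegroup \
    --cluster-name "$cluster_name" \
    --nodegroup-name "$node_group_name" \
    --query 'nodegroup.resources.autoScalingGroups[0].name' \
    --output text
}

# Attach that Auto Scaling group to a target group, so every node it launches
# is registered with the target group automatically.
attach_asg_to_target_group() {
  local asg_name=$1 target_group_arn=$2
  aws autoscaling attach-load-balancer-target-groups \
    --auto-scaling-group-name "$asg_name" \
    --target-group-arns "$target_group_arn"
}

# Hypothetical naming convention: one numbered target group per cluster.
target_group_name() {
  local product=$1 environment=$2 cluster_number=$3
  printf 'k8s-%s-%s-%s\n' "$product" "$environment" "$cluster_number"
}
```

A usage example with made-up names: `attach_asg_to_target_group "$(node_group_asg_name k8s-quipper-production-2 default)" "$NEW_TARGET_GROUP_ARN"`.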
Before the switch, 100% of traffic goes to the current cluster; we then gradually raise the percentage sent to the new cluster.
After: Testing a new cluster by host-based routing
In addition, Host-Based Routing lets us route requests whose host name matches a specific value to the new cluster.
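A host-based rule like this could be added with `aws elbv2 create-rule`. A minimal sketch, assuming a hypothetical host name such as `new-cluster.example.com` and a free rule priority; the function names are illustrative, not part of the original tooling.

```bash
#!/bin/bash
set -euo pipefail

# Listener rule condition matching a single host name.
build_host_header_conditions() {
  local host=$1
  printf '[{"Field":"host-header","HostHeaderConfig":{"Values":["%s"]}}]\n' "$host"
}

# Forward requests for the given host to the new cluster's target group.
# Listener rules are evaluated before the default action, so the weighted
# split on the default action does not apply to these requests.
add_host_rule() {
  local listener_arn=$1 priority=$2 host=$3 new_target_group_arn=$4
  aws elbv2 create-rule \
    --listener-arn "$listener_arn" \
    --priority "$priority" \
    --conditions "$(build_host_header_conditions "$host")" \
    --actions "Type=forward,TargetGroupArn=${new_target_group_arn}"
}
```

Developers can then reach the new cluster through that host name while ordinary traffic keeps following the weights.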
How we switched
I wrote a simple shell script*6 wrapping the aws cli and switched manually while monitoring.
```bash
#!/bin/bash
set -euo pipefail

# Validate arguments
if [ $# -ne 4 ]; then
  echo "Invalid arguments: $*"
  echo "Usage: ./modify_weighted_target_group.sh product environment weight1(current) weight2(new)"
  echo "Example: ./modify_weighted_target_group.sh quipper staging 10 90"
  echo "NOTE: Please run show_weighted_target_group.sh before running this script."
  exit 1
fi

PRODUCT=${1}
ENVIRONMENT=${2}
WEIGHT1=${3}
WEIGHT2=${4}

LOAD_BALANCER_ARN=$(aws elbv2 describe-load-balancers --names "k8s-${PRODUCT}-${ENVIRONMENT}" | jq -r '.LoadBalancers[].LoadBalancerArn')
# Assumes the ALB has a single listener.
LISTENER_ARN=$(aws elbv2 describe-listeners --load-balancer-arn "${LOAD_BALANCER_ARN}" | jq -r '.Listeners[].ListenerArn')

# Sort the two target group ARNs, since the order returned by describe-target-groups
# is non-deterministic. Fetching both in one call also guarantees they are distinct.
SORTED_TARGET_GROUP_ARN=$(aws elbv2 describe-target-groups --load-balancer-arn "${LOAD_BALANCER_ARN}" \
  | jq -r '.TargetGroups[0,1].TargetGroupArn' | sort)
TARGET_GROUP_ARN=$(echo "${SORTED_TARGET_GROUP_ARN}" | head -n1)      # current cluster
NEW_TARGET_GROUP_ARN=$(echo "${SORTED_TARGET_GROUP_ARN}" | tail -n1)  # new cluster

# Build the forward action with jq so that ARNs are quoted and weights stay numbers.
DEFAULT_ACTIONS=$(jq -n \
  --arg tg1 "${TARGET_GROUP_ARN}" --argjson w1 "${WEIGHT1}" \
  --arg tg2 "${NEW_TARGET_GROUP_ARN}" --argjson w2 "${WEIGHT2}" \
  '[{Type: "forward", Order: 1, ForwardConfig: {TargetGroups: [
     {TargetGroupArn: $tg1, Weight: $w1},
     {TargetGroupArn: $tg2, Weight: $w2}]}}]')

aws elbv2 modify-listener \
  --listener-arn "${LISTENER_ARN}" \
  --default-actions "${DEFAULT_ACTIONS}"
```
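Footnote *6 explains why the script sorts the target group ARNs. Plain `sort` compares them character by character, which orders numbered target groups correctly as long as the cluster numbers have the same number of digits; going from cluster 9 to cluster 10 would put the new cluster first. One way to guard against that, assuming the names end in the cluster number, is GNU `sort -V` (version sort), as in this hypothetical helper. The example ARNs in any usage are made up.

```bash
#!/bin/bash
set -euo pipefail

# Print target group ARNs ordered by the cluster number embedded in their names.
# -V (version sort) compares digit runs numerically, so "...-9/" sorts before "...-10/".
order_target_groups() {
  printf '%s\n' "$@" | sort -V
}
```

With this ordering, `head -n1` still yields the current cluster and `tail -n1` the new one after the cluster number gains a digit.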
以ä¸ãå®è¡ãããã¨ã§ 1% ã ãæ°ã¯ã©ã¹ã¿ã«ãã©ãã£ãã¯ãæµãã¾ããDashboard ã確èªããã¨ã©ã¼ã¬ã¼ããããã£ã¦ãªããã¨ã確èªãã¦ãå¾ã ã« Percentage ãããããã¨ãç¹°ãè¿ãã¾ãã
```bash
bash modify_weighted_target_group.sh quipper production 99 1
```
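The script's usage message refers to `show_weighted_target_group.sh`, whose contents are not shown here. A hypothetical stand-in that prints the current weights on the listener's default forward action could use the CLI's `--query` option, assuming the listener ARN is already known.

```bash
#!/bin/bash
set -euo pipefail

# Print "ARN<TAB>weight" for each target group in the listener's default forward action.
# Hypothetical stand-in for show_weighted_target_group.sh.
show_weights() {
  local listener_arn=$1
  aws elbv2 describe-listeners \
    --listener-arns "$listener_arn" \
    --query 'Listeners[].DefaultActions[].ForwardConfig.TargetGroups[].[TargetGroupArn, Weight]' \
    --output text
}
```

Checking the weights before and after each change makes it easy to confirm which target group is receiving which share.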
In practice, the flow was as follows.
1. Confirm that all Kubernetes Resources on the new cluster are Running.
2. Send 1% of traffic to the new cluster.
3. Check the error rate on the Datadog Dashboard.
4. Repeat steps 2 and 3 to increase traffic gradually. This time we raised it in steps of 10%, 20%, 50%, and 100%.
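The steps above can be sketched as a loop. This is an illustrative helper, not what was actually run (the switch was done by hand): it assumes `modify_weighted_target_group.sh` is in the current directory, uses weights on a 0-100 scale, and stops at an interactive prompt so a person can check the dashboard between steps.

```bash
#!/bin/bash
set -euo pipefail

# Weights "current new" for a given percentage of traffic to the new cluster (0-100 scale).
weights_for_percent() {
  local percent=$1
  printf '%d %d\n' "$((100 - percent))" "$percent"
}

# Walk through 1% -> 10% -> 20% -> 50% -> 100%, waiting between steps.
# The wait time and the confirmation prompt are assumptions of this sketch.
rollout() {
  local product=$1 environment=$2 wait_seconds=${3:-600}
  local percent current new answer
  for percent in 1 10 20 50 100; do
    read -r current new <<< "$(weights_for_percent "$percent")"
    bash modify_weighted_target_group.sh "$product" "$environment" "$current" "$new"
    echo "Sent ${percent}% to the new cluster. Check the error rate on the dashboard."
    [ "$percent" -eq 100 ] && break
    sleep "$wait_seconds"
    read -r -p "Continue to the next step? [y/N] " answer
    [ "$answer" = "y" ] || { echo "Stopped at ${percent}%."; return 1; }
  done
}
```

For example, `rollout quipper staging 600` waits 10 minutes between steps, matching the pace described below.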
We increased traffic every 10 minutes, so the whole switch took just under an hour.
What we learned: cluster switches can be done during the day
Because ALB Weighted Target Groups accept a Weight from 0 to 999, we can start by sending as little as 0.1% of normal traffic to the new cluster (weights of 999 and 1). This keeps the blast radius small, and we were able to switch with confidence.
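Each target group receives its weight divided by the sum of the weights, so weights of 999 and 1 send 1 / 1000 = 0.1% to the new cluster. A small helper that checks the 0-999 range and prints the resulting percentage might look like this; the function names are hypothetical.

```bash
#!/bin/bash
set -euo pipefail

# True if $1 is an integer ALB accepts as a target group weight (0-999, no leading zeros).
valid_weight() {
  local re='^(0|[1-9][0-9]{0,2})$'
  [[ $1 =~ $re ]]
}

# Percentage of requests the new target group receives for a pair of weights.
new_cluster_share() {
  local current=$1 new=$2
  valid_weight "$current" && valid_weight "$new" || return 1
  # With both weights at 0 the share is undefined, so reject that pair.
  [ $((current + new)) -gt 0 ] || return 1
  awk -v c="$current" -v n="$new" 'BEGIN { printf "%g\n", n * 100 / (c + n) }'
}
```

For example, `new_cluster_share 999 1` prints `0.1`, and `new_cluster_share 99 1` prints `1`.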
When we switched with the Blue / Green Deployment approach, we did it late at night, when traffic was low, to avoid user impact as much as possible. With Canary Switching, we realized we can switch safely enough during the day instead.
This reduces how often SREs have to work late at night. SRE health is service health. Sleep at night.
Future work
For now, we are not considering automatic Rollout / Rollback driven by Metrics. It would of course be nice, but the SLI/SLO for making that decision is not yet clear*7, and cluster switches happen only about once every three months at most.
That said, having actually done the switch, the process of changing the percentage and waiting took an hour and felt tedious. Once we settle on Metrics / Alerts that can drive automatic Rollback/Rollout, we would like to try it.
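If such Metrics are settled on, the decision step itself could be as small as the sketch below. It is hypothetical throughout: the error rate and threshold are plain inputs (fetching them from Datadog is not shown), and a rollback simply reuses `modify_weighted_target_group.sh` to send all traffic back to the current cluster.

```bash
#!/bin/bash
set -euo pipefail

# Decide the next step from an observed error rate and a threshold, both in percent.
# Which SLI and SLO produce these numbers is the open question; here they are inputs.
next_action() {
  local error_rate=$1 threshold=$2
  awk -v e="$error_rate" -v t="$threshold" \
    'BEGIN { if (e + 0 > t + 0) print "rollback"; else print "promote" }'
}

# On rollback, send all traffic back to the current cluster.
apply_action() {
  local action=$1 product=$2 environment=$3
  if [ "$action" = "rollback" ]; then
    bash modify_weighted_target_group.sh "$product" "$environment" 100 0
  fi
}
```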
Conclusion
We can now upgrade EKS Clusters even more safely. By achieving "Progressive Delivery"*8, which balances Productivity and Reliability, at the Infrastructure Layer, we want to make it "the norm" and easy to extend this to Applications as well.
Quipper is looking for teammates to help deliver learning to the ends of the earth.
*1: For the story of the switch to EKS, see "Migration from a Self-Hosted Cluster to EKS and Platform Production Readiness".
*2: We upgraded Kubernetes from v1.15 to v1.17.
*3: Switching clusters the way a Canary Release rolls out a change.
*4: The definition is based on GOOGLE CLOUD PLATFORM "Know thy enemy: how to prioritize and communicate risks—CRE life lessons".
*5: The Service Router is the Nginx at the cluster's entrance that routes requests to each Kubernetes Service. https://quipper.hatenablog.com/entry/2020/08/11/migration-to-eks#f-8ab45b30
*6: We sort TARGET_GROUP_ARN because the order of target groups in the JSON returned by the CLI is non-deterministic. At Quipper, clusters are numbered, so after sorting, the first weight argument always applies to the current cluster and the second to the new cluster.
*7: We did define a Platform SLO, but it only evaluates the quality of Deploys.
*8: The idea of deploying in a controllable way. Canary Release is one technique for it.