Hi, this is @chaspy from the SRE team.
At Quipper we run Kubernetes clusters on AWS to deliver our services. Until now we self-hosted those clusters with kube-aws, but we have recently migrated to Amazon EKS, a managed service. (From here on, Amazon EKS is referred to simply as EKS.)
This article explains the problems we ran into while migrating our Kubernetes clusters and how we solved them. It also shares what we learned about the points to consider when migrating a platform that runs a large number of applications.
I hope it is useful not only to those considering a move to EKS but to anyone involved in a platform migration.
- Background
- Considerations
- Networking
- Distributing kubeconfig
- Limitations of Managed Node Groups
- Arbitrary Security Groups cannot be attached
- Taints are not supported
- Managed Node Groups tags do not propagate to the ASG or instances
- Tags containing kubernetes.io cannot be attached
- Cannot scale to 0
- Launch Templates are not supported
- No rolling update when the instance class changes
- Migration procedure
- Platform Production Readiness
- Future work
- Acknowledgments
- Conclusion
Background
Before getting to the main topic, let me explain the background.
Why we were running self-hosted clusters
Because EKS was not available in the Tokyo region when we moved to Kubernetes.
Quipper's platform started on Heroku. We then hosted Deis*1, an open-source Heroku-like PaaS, on AWS, moved on to the Kubernetes-based Deis v2, and finally migrated to plain Kubernetes clusters.
EKS became available in the Tokyo region in December 2018. We switched our production platform to Kubernetes in September 2018 for Global and in December 2018 for StudySapuri in Japan.
EKS went GA in June 2018, so it was technically available for Global. However, we needed to keep Global in step with StudySapuri, and our migration testing had started even earlier, so EKS was not an option at the time.
Why we migrated to EKS
The main aims were to reduce the cost of managing the control plane and the cost of switching clusters.
Self-hosting a Kubernetes cluster means managing the control plane and etcd ourselves. Fortunately, the control plane has never caused an incident in production*2, but when something does go wrong we have to deal with it ourselves. With a managed service, we no longer need to think about any of this.
Cluster deployment is automated with kube-aws, so it is not especially painful, but even with practice it still takes about a day. We wanted to cut that time down, even if only a little.
That said, even taking the above into account, our honest feeling was "this isn't really hurting us."*3 So why migrate now? The reason was how slowly kube-aws follows new Kubernetes versions. The deciding factor was that v1.15 reached EOL while we were still waiting for v1.16 support.*4
Kubernetes releases a new minor version roughly every three months and only three minor versions are supported at a time, so being able to upgrade quickly is very important. We expected that moving to EKS would make upgrades easier.
Considerations
Next, the things to consider when migrating to EKS.
Networking
The biggest difference is probably Pod networking. EKS uses the Amazon VPC Container Network Interface (CNI) plugin for Kubernetes, which means every Pod consumes an IP address from the subnet its worker node belongs to.
At Quipper the VPC has a /16 IPv4 CIDR block, and the private subnets used by worker nodes and other EC2 instances are /24, one per AZ.
Suppose the existing private subnets look like this.
cidr | first | last | total | AZ |
---|---|---|---|---|
10.10.1.0/24 | 10.10.1.0 | 10.10.1.255 | 256 | AZ1 |
10.10.3.0/24 | 10.10.3.0 | 10.10.3.255 | 256 | AZ2 |
10.10.5.0/24 | 10.10.5.0 | 10.10.5.255 | 256 | AZ3 |
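Together the private subnets offer roughly 750 usable IP addresses (3 x 256 = 768, minus the 5 addresses AWS reserves in each subnet), which is plenty for our current setup.
However, once every Pod consumes one IP address, it was clear we would run out: a rough count showed that every cluster was running well over 1,000 Pods.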
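The shortfall can be checked with a few lines of arithmetic. The following is a minimal sketch using Python's standard ipaddress module; the subnet list is the illustrative one from the table above, and the 1,000-Pod figure is the rough lower bound mentioned in the text. The only outside fact assumed is that AWS reserves five addresses in every subnet.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidrs):
    """Return the number of assignable IPv4 addresses across the given subnets."""
    return sum(
        ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET
        for cidr in cidrs
    )


# Illustrative private subnets, one /24 per AZ.
existing_private_subnets = ["10.10.1.0/24", "10.10.3.0/24", "10.10.5.0/24"]
capacity = usable_addresses(existing_private_subnets)
pods_per_cluster = 1000  # rough lower bound on Pods per cluster

print(capacity)                      # 753
print(capacity >= pods_per_cluster)  # False
```

Worker nodes and other EC2 instances also draw from the same pool, so the real headroom is smaller still.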
Building a new VPC for each cluster was also an option. But the existing VPC CIDR block had plenty of room, and a VPC's CIDR range can be extended if it ever runs short, so we decided to create new subnets for EKS.
From the unused space, we carved out the three ranges marked "new" below.
cidr | first | last | total | memo |
---|---|---|---|---|
10.10.0.0/18 | 10.10.0.0 | 10.10.63.255 | 16384 | used |
10.10.64.0/18 | 10.10.64.0 | 10.10.127.255 | 16384 | new |
10.10.128.0/18 | 10.10.128.0 | 10.10.191.255 | 16384 | new |
10.10.192.0/18 | 10.10.192.0 | 10.10.255.255 | 16384 | new |
This lets us accommodate up to 49,152 Pods in total (3 x 16,384). That is dozens of times the current Pod count, so we judged that we will not run out for the foreseeable future.
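The four /18 ranges in the table can be reproduced mechanically. This is a small sketch with the standard ipaddress module, assuming the illustrative 10.10.0.0/16 VPC range used in this article rather than our real addresses.

```python
import ipaddress

vpc = ipaddress.ip_network("10.10.0.0/16")

# Splitting a /16 into /18 blocks yields four equal ranges.
blocks = list(vpc.subnets(new_prefix=18))

for block in blocks:
    # cidr, first address, last address, total addresses
    print(block, block[0], block[-1], block.num_addresses)

# The first block already contains the existing subnets; the other three are new.
new_blocks = blocks[1:]
total_new = sum(block.num_addresses for block in new_blocks)
print(total_new)  # 49152
```

The total counts every address in the three ranges; the handful of addresses AWS reserves per subnet is negligible at this size.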
Distributing kubeconfig
We had been using aws-iam-authenticator for Kubernetes authentication. Each IAM user authenticates by assuming, according to the IAM group they belong to, the IAM role that corresponds to a Kubernetes ClusterRole.
Whenever a cluster was added or removed, we generated a kubeconfig and placed it in S3. We had also built a mechanism where running the script that installs kubectl updates the kubeconfig, and a notification was posted to Slack whenever that happened.
This time, however, we did not carry that approach over and decided to use aws eks update-kubeconfig from the AWS CLI instead, simply because it is simpler.
Now, Quipper has three kinds of roles per cluster. Listed together with their contexts, they look like this.
Context | ClusterRole | role |
---|---|---|
cluster-name-admin | quipper-admin | KubernetesAdminProduction |
cluster-name-app-admin | quipper:app-admin*5 | KubernetesAppAdminProduction |
cluster-name-viewer | quipper-viewer | KubernetesViewerProduction |
So, to generate a kubeconfig that provides these contexts, I thought it would be enough to run a command like the following for each role, but it was not that simple.
aws eks update-kubeconfig --name CLUSTER_NAME --role-arn ROLE_ARN --alias ALIAS
The problem is that update-kubeconfig can hold only one entry per cluster in the kubeconfig. In other words, even if you run it several times against the same cluster, only the last run remains.
After some thought, we decided to statically fix which context each IAM group uses.
Until now the kubeconfig contexts for all three roles were added for every cluster across the board, even though, for example, developers cannot use admin at all and can only use viewer on production clusters. Of course, the proper practice is to pick the role that suits the task and use viewer when you are not making changes, to be safe, but people who actually worked that way appeared to be in the minority. So we chose this simpler approach.
Even so, cluster names change every time, and nobody wants to work out the role ARN to assume and the alias name each time a cluster is added or removed, so I wrote a wrapper tool for update-kubeconfig in Go. Its specification is:
- Determine the role to assume from the IAM groups the current IAM user belongs to
- Run eks update-kubeconfig against every cluster that is currently Ready
Whether a cluster is Ready is determined by a "quipper/ready" tag on the EKS cluster. Similarly, a "quipper/latest" tag is used to provide a context that points at the newest cluster without the cluster-specific serial number in its name.
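To make the specification concrete, here is a rough sketch of the decision logic. It is not the actual tool (which is written in Go and is not shown in this article); it is written in Python for brevity and only builds the command lines instead of running them. The group names, the account ID, the cluster names, the alias format and the tag value "true" are all assumptions for illustration. Only the flags --name, --role-arn and --alias and the two tag keys come from the text.

```python
# Hypothetical IAM group -> (IAM role name, context suffix) mapping.
GROUP_TO_ROLE = {
    "sre": ("KubernetesAdminProduction", "admin"),
    "developer": ("KubernetesViewerProduction", "viewer"),
}
ACCOUNT_ID = "123456789012"  # placeholder AWS account ID


def build_commands(group, clusters):
    """Return one `aws eks update-kubeconfig` argv per context to create."""
    role, suffix = GROUP_TO_ROLE[group]
    role_arn = f"arn:aws:iam::{ACCOUNT_ID}:role/{role}"
    commands = []
    for cluster in clusters:
        tags = cluster.get("tags", {})
        if tags.get("quipper/ready") != "true":
            continue  # skip clusters that are not marked Ready
        aliases = [f"{cluster['name']}-{suffix}"]
        if tags.get("quipper/latest") == "true":
            aliases.append(f"latest-{suffix}")  # alias without the serial number
        for alias in aliases:
            commands.append([
                "aws", "eks", "update-kubeconfig",
                "--name", cluster["name"],
                "--role-arn", role_arn,
                "--alias", alias,
            ])
    return commands


# Hypothetical cluster inventory, as it might be read from EKS cluster tags.
clusters = [
    {"name": "production-41", "tags": {"quipper/ready": "true"}},
    {"name": "production-42", "tags": {"quipper/ready": "true", "quipper/latest": "true"}},
    {"name": "production-43", "tags": {}},
]
for argv in build_commands("developer", clusters):
    print(" ".join(argv))
```

Because each group maps to exactly one role, every cluster gets a single role per user and the overwrite problem described above does not arise.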
As an aside, the Quipper SRE team treats Go as the standard for any program longer than 100 lines. Much of the OSS that SREs work closely with, such as Kubernetes and Terraform, is written in Go, so being able to write Go pays off when reading code, sending patches upstream, or writing plugins. Another reason is that many current SRE team members are good at Go. I am glad I had the chance to write Go at work myself this time, even if it was only a simple tool.
Limitations of Managed Node Groups
This time we adopted Managed Node Groups without thinking too deeply about it. The benefit is that we do not have to maintain the AMI for the node groups. However, they currently come with quite a few restrictions, and adopting them was very painful.
Arbitrary Security Groups cannot be attached
As background, at Quipper the EC2 instances and RDS instances in the VPC carry a special Security Group (SG) called "Default", and the Default SG allows traffic from the Default SG.
Worker nodes used to carry the Default SG as well, which let them talk to other instances and RDS, but that is not possible with Managed Node Groups.
To work around this, we allowed traffic from the Cluster SG, which is generated when the EKS cluster is created, to the Default SG.
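As an illustration of that rule, the sketch below builds the parameters for the EC2 AuthorizeSecurityGroupIngress API, adding an ingress rule on the Default SG whose source is the Cluster SG. The SG IDs are placeholders, and allowing all protocols is an assumption; this article does not describe how we actually created the rule.

```python
def ingress_from_cluster_sg(default_sg_id, cluster_sg_id):
    """Build kwargs for EC2 AuthorizeSecurityGroupIngress.

    The rule is added to the Default SG and names the Cluster SG as its source.
    """
    return {
        "GroupId": default_sg_id,
        "IpPermissions": [
            {
                "IpProtocol": "-1",  # assumption: all protocols and ports
                "UserIdGroupPairs": [
                    {"GroupId": cluster_sg_id, "Description": "EKS cluster SG"}
                ],
            }
        ],
    }


# Placeholder SG IDs.
params = ingress_from_cluster_sg("sg-0123456789abcdef0", "sg-0fedcba9876543210")
print(params)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("ec2").authorize_security_group_ingress(**params)
```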
Taints are not supported
Until now we used taints to control scheduling per node group; anything without an explicit setting was scheduled onto the default node group. Because taints are not supported, we had to specify Node Affinity explicitly on every Deployment.
Hand-editing YAML to add Node Affinity to nearly every service was pretty exhausting.
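For reference, the kind of change that had to be made to each Deployment looks roughly like the sketch below, which adds a required Node Affinity to a manifest held as a Python dict. The label key eks.amazonaws.com/nodegroup, the node group name and the sample manifest are assumptions for illustration; we edited the YAML by hand rather than with a script like this.

```python
import copy


def add_node_affinity(deployment, key, values):
    """Return a copy of the Deployment with a required nodeAffinity on `key`."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]
    affinity = pod_spec.setdefault("affinity", {})
    affinity["nodeAffinity"] = {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {
                    "matchExpressions": [
                        {"key": key, "operator": "In", "values": list(values)}
                    ]
                }
            ]
        }
    }
    return patched


# Hypothetical minimal Deployment manifest.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-app"},
    "spec": {"template": {"spec": {"containers": [{"name": "app", "image": "example"}]}}},
}

# Assumed label key and node group name.
patched = add_node_affinity(deployment, "eks.amazonaws.com/nodegroup", ["default"])
print(patched["spec"]["template"]["spec"]["affinity"])
```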
Managed Node Groups tags do not propagate to the ASG or instances
Quipper uses Datadog for monitoring. Tags on AWS resources can generally be used as-is in Datadog. Because these tags do not propagate, however, the Monitors and Dashboards we had been relying on could not be used as they were.
@d-kuro worked around this by attaching the tags we had been using on the Datadog Agent side. Thank you.
[EKS] [request]: Nodegroup should support tagging ASGs · Issue #608 · aws/containers-roadmap · GitHub
[EKS]: EKS Cluster Tagging Propagation · Issue #374 · aws/containers-roadmap · GitHub
Tags containing kubernetes.io cannot be attached
Quipper uses fluentd-gcp to ship container logs to Cloud Logging (formerly Stackdriver Logging).
The downloaded YAML, when applied, uses beta.kubernetes.io/fluentd-ds-ready=true as its Node Selector. However, kubernetes.io is reserved, and trying to attach such a label to Managed Node Groups is rejected by validation. The label can be added with the kubectl label command, but nodes come and go all the time, so we want it to be applied on the node group side.
Since these components basically run as a DaemonSet anyway, we handled it by removing the Node Selector before applying.
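Removing the Node Selector before applying amounts to a small manifest transformation. A minimal sketch, assuming the DaemonSet manifest has already been loaded into a Python dict (the sample manifest below is made up and far smaller than the real fluentd-gcp one):

```python
import copy

LABEL = "beta.kubernetes.io/fluentd-ds-ready"


def strip_node_selector(daemonset, label=LABEL):
    """Return a copy of the DaemonSet without the given nodeSelector entry."""
    patched = copy.deepcopy(daemonset)
    pod_spec = patched["spec"]["template"]["spec"]
    selector = pod_spec.get("nodeSelector", {})
    selector.pop(label, None)
    if not selector:
        # Drop the now-empty nodeSelector so the Pods can run on every node.
        pod_spec.pop("nodeSelector", None)
    return patched


# Hypothetical, heavily trimmed manifest.
daemonset = {
    "kind": "DaemonSet",
    "metadata": {"name": "fluentd-gcp"},
    "spec": {
        "template": {
            "spec": {
                "nodeSelector": {LABEL: "true"},
                "containers": [{"name": "fluentd-gcp"}],
            }
        }
    },
}

cleaned = strip_node_selector(daemonset)
print("nodeSelector" in cleaned["spec"]["template"]["spec"])  # False
```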
Similarly, the role shown by kubectl get node comes from node-role.kubernetes.io labels, which cannot be attached either, so no role is displayed. Very inconvenient.
Cannot scale to 0
Because the minimum size is 1, we cannot shrink a node group all the way down when it is not needed. Likewise, replacing a Managed Node Group cannot be done by simply scaling it in; we have to drain the nodes or delete the Managed Node Group itself.
[EKS] [request]: Managed Nodes scale to 0 · Issue #724 · aws/containers-roadmap · GitHub
Launch Templates are not supported
On our previous clusters we had modified kernel parameters. Fortunately no related problems have occurred since the production migration, but there is no guarantee they never will, so more flexibility would be preferable.
[EKS] Managed Node Groups Launch Template Support · Issue #585 · aws/containers-roadmap · GitHub
No rolling update when the instance class changes
This is another thing we would like to see supported if possible.
Because we had not anticipated these limitations, we struggled quite a bit during migration testing. Now that it is done, we have no particular motivation to drop Managed Node Groups, but we may do so if demand for using Spot Instances grows.
I still wonder which option we should have chosen. At the very least, I regret not taking a little more time to compare the two options and discuss them, and not checking containers-roadmap beforehand.
Migration procedure
We carried out the work in roughly the following order.
- Migrate from the self-hosted cluster to an EKS cluster (Staging)
- Rehearse the EKS cluster migration (Staging)
- Migrate from the self-hosted cluster to an EKS cluster (Production)
Going through this process let developers try out the kubeconfig update and let us flush out migration problems.
The rehearsals also allowed us to establish the following procedure for switching each cluster.
- Create a new cluster with Terraform
- Deploy Cluster-Level Kubernetes Resources
- Backup and Restore application pods with velero
- Update deploy definition of monorepo
- Switch DNS record for service-router(branch-router)*6
- Remove the cluster definition at a kubernetes-clusters repository
- Destroy an old cluster with Terraform
The third step, "Backup and Restore application pods with velero", uses velero to back up and restore application resources. @d-kuro introduced this method some time ago. This time, partly as a handover, we codified the velero installation and scripted the procedure, which made it easier to carry out.
For the actual switchover we switched DNS during a low-traffic time window. As before, if the error rate spikes we can return to the old cluster immediately by reverting.
In this way, we took a sufficiently long migration period to eliminate problems in advance, went into the production switchover ready to roll back immediately, and completed the switch without incident.
Platform Production Readiness
So, we finished the platform migration safely. Until now we had planned and executed safe platform migrations somewhat intuitively, based on past experience. This is a good opportunity to put Production Readiness for a platform into words more explicitly.
For new services (applications), we already have processes such as Design Docs and a Production Readiness Checklist before a production release, and our approach to making a service production-ready is becoming well established. I used those as a reference here.
Service Level
A platform that runs services needs its own service level definition.
During migration testing, the CI workflow used for deployment started failing at a fairly high rate, so we hurriedly set Platform SLOs. The SLIs and their targets are:
- 99% Job Success Rate: the diff-detection job *7
- 99% Job Success Rate: the deploy job
We chose these as SLIs because both call the Kubernetes cluster API from CI and both are jobs that every workflow is guaranteed to pass through.
Using Datadog's recently released SLO Error Budget Alert, we investigate the cause*8 and make improvements whenever we fall below the target.
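The arithmetic behind a 99% Job Success Rate target and its error budget can be sketched as follows. This is a generic illustration, not Datadog's implementation, and the job counts are made-up numbers.

```python
def error_budget(total_jobs, failed_jobs, target=0.99):
    """Return (success_rate, remaining_budget) for a job-success-rate SLO.

    remaining_budget is the fraction of allowed failures not yet consumed;
    it goes negative once the SLO is violated.
    """
    if total_jobs == 0:
        return 1.0, 1.0  # no jobs ran, so nothing has been consumed
    success_rate = (total_jobs - failed_jobs) / total_jobs
    allowed_failures = total_jobs * (1 - target)
    remaining_budget = 1 - failed_jobs / allowed_failures
    return success_rate, remaining_budget


# Made-up numbers: 2,000 deploy jobs in the window, 5 of them failed.
rate, remaining = error_budget(total_jobs=2000, failed_jobs=5)
print(f"{rate:.2%}")       # 99.75%
print(f"{remaining:.0%}")  # 75%
```

With a 99% target, 2,000 jobs allow 20 failures, so 5 failures consume a quarter of the budget. An alert on the remaining budget fires before the target itself is breached.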
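With metrics like these, developers building on the platform no longer stop at "it seems to fail a lot lately 🤔" and can instead work with SRE, on the basis of facts, to improve the numbers.
As a fundamental prerequisite, it is also important that the services running on the platform have SLOs that are set and properly operated.*9 Knowing that we would notice if a platform problem caused a service to violate its SLO gave us a great deal of confidence during the migration.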
Monitoring / Logging
These matter for a platform too. Beyond the items chosen as SLIs/SLOs: what should be measured, and what should appear on a dashboard? Are the necessary logs shipped to persistent storage so they can be viewed at any time? It may be harder to standardize than for applications running on Kubernetes, but every platform needs this perspective.
Migration
When you actually migrate a platform, the blast radius is wide and the risk is often high, precisely because it is a platform. And however much you prepare, something unexpected will happen. What should you consider, then?
Run in Staging for a sufficient period
After moving to EKS, we ended up running it in the staging environment for about a month. At Quipper, a regression test is run once a week for the weekly release. Seeing it pass repeatedly on the new platform is very reassuring. On the operational side, too, having developers use it for a while helps avoid overlooked problems.
Be able to roll back immediately when something goes wrong
This may be the most important point.
As a basic prerequisite, infrastructure changes must be managed as code so that a revert PR can be opened and applied right away.
Guarantee that old and new are identical
If you can guarantee by some means that the old and new platforms are identical, you can release with more confidence.
Managing configuration as code probably achieves most of this, but differences can still arise in unexpected places.
Ideally, you would dump the current configuration from the actual platforms and diff it.
This time, for cluster-level resources, we applied the same manifests from CI and then did a simple check that kubectl returned the same number of resources.
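A count comparison of this kind can be sketched as below. It assumes the input is the JSON list that kubectl get ... -o json prints for each cluster; the sample data is made up, and the real check was done by hand rather than with this script.

```python
import json
from collections import Counter


def count_by_kind(kubectl_json):
    """Count resources per kind in the output of `kubectl get ... -o json`."""
    items = json.loads(kubectl_json)["items"]
    return Counter(item["kind"] for item in items)


def diff_counts(old, new):
    """Return {kind: (old_count, new_count)} for kinds whose counts differ."""
    return {
        kind: (old[kind], new[kind])
        for kind in sorted(set(old) | set(new))
        if old[kind] != new[kind]
    }


# Made-up dumps from the old and new clusters.
old_dump = json.dumps({"items": [
    {"kind": "ClusterRole", "metadata": {"name": "quipper-admin"}},
    {"kind": "ClusterRole", "metadata": {"name": "quipper-viewer"}},
    {"kind": "DaemonSet", "metadata": {"name": "datadog-agent"}},
]})
new_dump = json.dumps({"items": [
    {"kind": "ClusterRole", "metadata": {"name": "quipper-admin"}},
    {"kind": "DaemonSet", "metadata": {"name": "datadog-agent"}},
]})

differences = diff_counts(count_by_kind(old_dump), count_by_kind(new_dump))
print(differences)  # {'ClusterRole': (2, 1)}
```

Counting only catches missing or extra resources; comparing names or full specs would get closer to the dump-and-diff ideal described above.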
For applications, as mentioned earlier, we guaranteed equivalence by backing up and restoring with velero.
Future work
Finally, some future work for the platform.
GitOps for System Components
The System Components applied at the cluster level (e.g. Datadog, RBAC, Ingress, Fluentd, ClusterAutoscaler) live in a single repository, where a shell script and kustomize expand the per-cluster differences before applying them.
However, this script has grown large enough that it is becoming hard to maintain. Version upgrades of System Components also happen only when someone happens to notice and does them.
@d-kuro is currently moving applications to GitOps with ArgoCD, and we plan to take the opportunity to move System Components to GitOps as well.
Multi-Cluster Support
Quipper basically switches clusters in a Blue/Green Deployment style, using DNS. This lets us roll back if there is a problem.
We use this approach because traffic passes through an internet-facing Reverse Proxy in front and then reaches the cluster through the ALB Ingress.
We currently see two issues.
- Canary Release is not possible during the production switchover
- We cannot verify the new cluster while the old cluster remains in service
Strictly speaking, neither is impossible:
- Route53 Weighted Routing is an option today (untested). Still, percentage-based traffic splitting at the proxy layer would give us more control.
- If, say, we want only the learn-exp.quipper.com subdomain to reach the new cluster, we have to change the Reverse Proxy config for each virtual host, which is a bit tedious.
To solve these issues, we would like to remove the ALB Ingress and control proxying to both clusters from an ALB that is not managed by Kubernetes.
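For a sense of how weighted DNS routing would translate into a canary split, the sketch below computes the expected share of traffic per record. With Route53 Weighted Routing, each record receives traffic in proportion to its weight divided by the sum of all weights. The record names and weights are hypothetical, and since we have not tested this option, DNS caching effects are not modelled.

```python
def traffic_share(weights):
    """Return the expected fraction of DNS answers per record.

    Each record's share is its weight divided by the sum of all weights.
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("at least one record needs a non-zero weight")
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical canary: send about 5% of lookups to the new cluster.
shares = traffic_share({"old-cluster": 95, "new-cluster": 5})
for name, share in shares.items():
    print(f"{name}: {share:.0%}")
```

Because clients and resolvers cache DNS answers, the observed split only approximates these numbers, which is one reason splitting at the proxy layer would be easier to control.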
Acknowledgments
My heartfelt thanks to @d-kuro, who has very deep knowledge of Kubernetes and gave me a great deal of advice drawn from having switched clusters many times before. Thank you also for being present during the production switchover. This task was originally going to be done by @d-kuro, but having me do it, with no cluster-switching experience, meant that (although it took longer) the team achieved a Knowledge Transfer, and he was able to focus on separating CI and CD with GitOps*10. I think it worked out well in the end.
My heartfelt thanks to @yuya-takeyama and @d-kuro for their advice in our weekly meetings. As the SRE team has grown, we have recently adopted a project-based structure in which a small group, one lead plus one or two supporters, tackles each problem. I think this was a case where that structure worked very well.
My heartfelt thanks to our technical advisor @mumoshu for discussing the authentication approach and our concerns about EKS.
My heartfelt thanks to the Go lovers @pankona and @suzuki-shunsuke for reviewing the wrapper tool for updating kubeconfig.
My heartfelt thanks to everyone on the SRE team for their day-to-day support in so many ways.
Thanks to @global-web-developers for giving me feedback on the migration.
Although it is a personal activity, my heartfelt thanks to @koudaiii of Wantedly and @_a0i of Cybozu for joining sre.fm#2 as guests. Thinking about Platform Production Readiness on that occasion helped me in this project.
Conclusion
By migrating to EKS, a managed service, we no longer need to manage the control plane and can upgrade our Kubernetes clusters quickly. I believe this is an important first step toward evolving the platform further.
The SRE team will keep evolving in order to provide the best possible platform, one that makes product teams blazingly productive while ensuring Reliability.
Quipper is looking for people who want to bring learning to every corner of the world.
*1: Acquired by Microsoft in 2017.
*2: It has happened about twice in Staging.
*3: For this reason, our motivation to move to EKS was not high even after EKS became available in the Tokyo region.
*4: At the time of writing, the supported version is v1.16.10.
*5: app-admin is a partially restricted ClusterRole that can touch everything except System Components.
*6: We call the Nginx that receives traffic entering the cluster and performs service routing the service-router. For details, see "Kubernetes導入で実現したい世界とその先にあるMicroservices" (The world we want to realize by adopting Kubernetes, and the microservices beyond it).
*7: A mechanism that detects whether there is a difference from the actual Deployment and deploys only what has changed. For details, see "CI の修正をリリース前に本番と同じ条件下で検証出来る仕組みを構築した話" (How we built a way to verify CI changes under production-equivalent conditions before release).
*8: When a script fails, we send information including metadata such as the CircleCI build URL to Sentry, so we can jump straight to the failed job.
*9: For service SLOs, see my earlier post "SRE NEXT 2020 で「SLO Review」というタイトルで登壇しました #srenext" (I gave a talk titled "SLO Review" at SRE NEXT 2020). Our practice has evolved since then, and I plan to write a separate article about it.
*10: This should be published soon! Probably.