Our team operates many Elasticsearch clusters on Kubernetes as our search infrastructure. The namespace that manages these Elasticsearch clusters requests more resources than any other namespace in our multi-tenant Kubernetes cluster.
However, because we had fixed the size of each cluster to match peak-time traffic, resource utilization was very low. Elasticsearch Enterprise and Elastic Cloud do offer an autoscaling feature, but it is not meant for scaling in and out. It provides scale up/down with respect to disk size, so it did not meet our requirements.
We therefore developed an autoscaling mechanism for scaling in and out using HPA. It improved resource utilization and reduced costs by about 40%, and this article explains the details.
Elasticsearch and ECK
At Mercari we manage Elasticsearch on Kubernetes using ECK (https://github.com/elastic/cloud-on-k8s). ECK consists of a Custom Resource called Elasticsearch and its controller. When you create a resource like the one below, the corresponding StatefulSets, Services, ConfigMaps, Secrets, and other resources are created automatically.
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: example
spec:
  version: 8.8.1
  nodeSets:
  - name: coordinating
    count: 2
  - name: master
    count: 3
  - name: data
    count: 6
From this definition, three StatefulSets are created: coordinating, master, and data.
We would like to autoscale these StatefulSets with the Horizontal Pod Autoscaler (HPA), but doing so raises the following challenges.
- The Elasticsearch resource itself cannot be an HPA target. It does not define a scale subresource (described later), so there is no way to tell which of its several nodeSets should grow or shrink.
- Scaling Elasticsearch means changing not only the number of Pods but also the number of replicas of the index placed on those Pods. The unit of scaling is therefore (number of shards in the index / number of shards per Pod); in the case shown in the figure below this is (3 / 1) = 3. HPA, on the other hand, may set any value between minReplicas and maxReplicas. Elasticsearch's auto_expand_replicas option does not fit our use case here, because it makes the number of shards per Pod equal to the number of shards in the index and would put three shards on every Pod, so we have to change the replica count ourselves.
- If an HPA targets a StatefulSet managed by the Elasticsearch resource directly, then in addition to the second problem, the Pod count adjusted by the HPA is reset to the value in the parent resource whenever the parent Elasticsearch resource is updated.
To solve these problems, we created a new Kubernetes Custom Resource and controller.
Custom Resource and controller
The following is an example of the newly introduced Custom Resource.
apiVersion: search.mercari.in/v1alpha1
kind: ScalableElasticsearchNodeSet
metadata:
  name: example
spec:
  clusterName: example
  count: 6
  index:
    name: index1
    shardsPerNode: 1
  nodeSetName: data
This corresponds to the nodeSet named data in the Elasticsearch resource shown earlier. The resource has no direct parent-child relationship with the Elasticsearch resource, and it provides a scale subresource, so it can be the target of the kubectl scale command or of an HPA. We generate the Custom Resource definition with kubebuilder, and adding a comment like the following is enough to provide the scale subresource.
//+kubebuilder:subresource:scale:specpath=.spec.count,statuspath=.status.count,selectorpath=.status.selector
This declares that .spec.count of the ScalableElasticsearchNodeSet above is what HPA and the kubectl scale command operate on, and that the current count is recorded in .status.count. In addition, .status.selector records the selector used to select what this resource manages, that is, the Pods managed by the target StatefulSet. None of this is recorded automatically, of course. We have to implement the controller ourselves so that it is.
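As a rough illustration of what the controller has to record, the sketch below builds the two status fields that the scale subresource reads. It is written in Python for brevity, not in the controller's own language. The function name, the label keys, and the `<cluster>-es-<nodeSet>` StatefulSet naming are assumptions about how ECK labels its Pods, not details taken from the controller described here.

```python
def build_scale_status(cluster_name: str, node_set_name: str, current_count: int) -> dict:
    """Build the status fields exposed through the scale subresource."""
    # Assumed ECK naming convention for the StatefulSet behind a nodeSet.
    statefulset_name = f"{cluster_name}-es-{node_set_name}"
    # Assumed ECK label keys; adjust to whatever labels the Pods really carry.
    labels = {
        "elasticsearch.k8s.elastic.co/cluster-name": cluster_name,
        "elasticsearch.k8s.elastic.co/statefulset-name": statefulset_name,
    }
    # The scale subresource expects the selector as a serialized
    # label-selector string ("k1=v1,k2=v2"), not as a structured object.
    selector = ",".join(f"{key}={value}" for key, value in labels.items())
    return {"count": current_count, "selector": selector}
```

A real controller would write these values to the resource status on every reconcile so that HPA can find the Pods it should measure.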
The actual replica count of the StatefulSet is then calculated from count and shardsPerNode in this Custom Resource's spec and from the number of shards in the target index, as follows.
ceil(ceil(count * shardsPerNode / shards) * shards / shardsPerNode)
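The formula can be tried out with a small sketch. The function and variable names are illustrative only, and reading the inner ceil as "whole copies of the index" is an interpretation of the formula, not a statement from the controller's source.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def statefulset_replicas(count: int, shards_per_node: int, index_shards: int) -> int:
    """Round the requested node count up to a whole number of index copies."""
    # Inner ceil: how many whole copies of the index are needed so that
    # `count` nodes can each hold `shards_per_node` shards.
    copies = ceil_div(count * shards_per_node, index_shards)
    # Outer ceil: how many nodes are needed to hold that many copies.
    return ceil_div(copies * index_shards, shards_per_node)


for requested in (4, 6, 7):
    print(requested, "->", statefulset_replicas(requested, 1, 3))
```

With 3 shards and shardsPerNode of 1, a requested count of 4 or 6 yields 6 nodes and a requested count of 7 yields 9, so the StatefulSet always grows in steps of 3.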
By reading the HPA source code, we confirmed that HPA behaves correctly even when .spec.count of the scale subresource does not match the actual count (at least for type: Resource metrics). The current replica count that HPA uses when calculating the desired replica count is the number of Pods selected by .status.selector.
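To see why the selected Pod count matters, here is a simplified sketch of the replica calculation described in the Kubernetes HPA documentation. It ignores Pod readiness, missing metrics, minReplicas/maxReplicas clamping, and stabilization windows. The function name is hypothetical and the 10% tolerance is the documented default, not a value from this article.

```python
import math


def desired_replicas(selected_pods: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale the current Pod count by the usage ratio."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return selected_pods
    return math.ceil(selected_pods * ratio)
```

Because the starting point is the number of Pods matched by the selector, the result stays sensible even if the controller has rounded the StatefulSet size up beyond the .spec.count that HPA last wrote.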
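When scaling out, the controller first sets the count of the relevant nodeSet in the Elasticsearch resource to the value computed by the formula above, and once all Pods are Ready it increases the number of index replicas through the Elasticsearch API. When scaling in, it does the reverse: it decreases the number of index replicas first and then changes the count in the Elasticsearch resource.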
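The ordering can be sketched as a small planning function. The step names are hypothetical labels, not real API calls. The only real API shape assumed is that the replica count is changed by sending an index.number_of_replicas setting to the index's _settings endpoint.

```python
def plan_scaling(current_nodes: int, desired_nodes: int,
                 index_name: str, desired_replicas: int) -> list:
    """Return the ordered steps for moving from current_nodes to desired_nodes."""
    # Hypothetical step: patch the nodeSet count on the Elasticsearch resource.
    set_count = ("set_nodeset_count", desired_nodes)
    # Hypothetical step: update the index settings with the new replica count.
    set_replicas = (
        "put_index_settings",
        f"/{index_name}/_settings",
        {"index": {"number_of_replicas": desired_replicas}},
    )
    if desired_nodes > current_nodes:
        # Scale out: add Pods first, wait for them, then add replicas.
        return [set_count, ("wait_for_pods_ready",), set_replicas]
    if desired_nodes < current_nodes:
        # Scale in: drop replicas first, then remove Pods.
        return [set_replicas, set_count]
    return []
```

Ordering the steps this way means replicas are only ever assigned to Pods that already exist and are Ready, and Pods are only removed after their shards are gone.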
This solves the first and second challenges listed earlier. We solved the third with a MutatingWebhookConfiguration, a mechanism for specifying a hook that is called when the Elasticsearch resource is updated. In the hook, if an annotation such as "search.mercari.in/ignore-count-change": "data,coordinating" is present, the count of each nodeSet named in the annotation is overwritten with its current count. As a result, the count is no longer reset when the Elasticsearch resource is changed through GitOps or similar while it is under HPA control.
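The mutation itself is small. The following is a minimal sketch that works on plain dictionaries shaped like the old and new Elasticsearch objects an admission webhook receives. The function name is hypothetical, and a real webhook would return this change as a patch in its admission response.

```python
import copy

ANNOTATION = "search.mercari.in/ignore-count-change"


def preserve_counts(old: dict, new: dict) -> dict:
    """Return a copy of `new` whose annotated nodeSets keep the counts from `old`."""
    mutated = copy.deepcopy(new)
    annotations = mutated.get("metadata", {}).get("annotations") or {}
    # The annotation value is a comma-separated list of nodeSet names.
    ignored = {name.strip() for name in annotations.get(ANNOTATION, "").split(",") if name.strip()}
    current = {ns["name"]: ns["count"] for ns in old.get("spec", {}).get("nodeSets", [])}
    for node_set in mutated.get("spec", {}).get("nodeSets", []):
        name = node_set["name"]
        if name in ignored and name in current:
            # Keep the count that autoscaling has already set.
            node_set["count"] = current[name]
    return mutated
```

NodeSets that are not listed in the annotation, such as master in the example annotation above, still take whatever count the updated manifest specifies.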
Problems encountered during rollout and how we solved them
When we actually deployed the controller implemented along these lines, we found several issues, which we describe here.
- Latency increases immediately after scale-out
- Force merge prevents using CPU utilization as the HPA metric
- The bottleneck metric changes during low-traffic hours
Latency increases immediately after scale-out
We had already observed this issue in other situations such as rolling updates: latency was very high immediately after a Data node started, had shards allocated to it, and began accepting search requests. It was not limited to Data nodes. When we introduced Istio to the microservices that send requests to Elasticsearch, the same thing happened on Coordinating nodes (nodes that hold no shards and only receive requests first, then perform routing and merging).
The cause is the so-called JVM cold start problem. With Istio, the issue was that the sidecar immediately sends an equal share of requests to a newly added Pod. Before Istio was introduced this had not been a problem, because HTTP keep-alive caused traffic to shift only gradually to newly added Pods.
To address this we used passthrough (sending traffic through as is, without relying on Istio's service discovery) and the warmupDurationSecs setting of DestinationRule (gradually increasing traffic to a new Pod over the specified number of seconds). For Data nodes, however, routing depends entirely on Elasticsearch and there was nothing we could do from the outside, so we decided to modify Elasticsearch itself. We have submitted the change upstream as a Pull Request: https://github.com/elastic/elasticsearch/pull/90897
Force merge prevents using CPU utilization as the HPA metric
Our indices see frequent document deletions and updates (in Lucene, the search library Elasticsearch is built on, an update is internally a delete followed by an add), so every day during low-traffic hours we ran a force merge to purge logically deleted documents. In the past, forgetting this force merge had left us unable to handle traffic a few days later.
However, force merge is a CPU-intensive operation, and by its nature it should not trigger a scale-out at the same time, so we could not use CPU utilization as the HPA metric. Initially we therefore considered using the number of search requests as an external metric via Datadog. But when a new microservice starts calling the cluster, the query pattern changes and so does the load pattern, so fundamentally it is preferable to use CPU utilization as the HPA metric.
While reading the Lucene source code, we found an option called deletes_pct_allowed. It specifies the allowed percentage of logically deleted documents, and its default value is 33. Running performance tests while varying this value showed that latency degrades sharply from around 30%. We therefore set it to the minimum value of 20 (in the latest Elasticsearch the default is 20 and the minimum is 5; see https://github.com/elastic/elasticsearch/pull/93188), which let us remove the force merge job. As a result, we can now use CPU utilization as the HPA metric.
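As an illustration, the setting could be applied with an index settings update like the one built below. The sketch only constructs the request and does not send it. The base URL and function name are placeholders, and it assumes the option is exposed as the index setting index.merge.policy.deletes_pct_allowed and can be changed on a live index.

```python
import json
import urllib.request


def deletes_pct_request(base_url: str, index: str, pct: float) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request."""
    body = {"index": {"merge": {"policy": {"deletes_pct_allowed": pct}}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder URL and index name; pass the result to
# urllib.request.urlopen() to actually apply the change.
request = deletes_pct_request("http://localhost:9200", "index1", 20)
```

A lower value makes the regular merge policy reclaim deleted documents more aggressively, which is what replaces the nightly force merge.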
The bottleneck metric changes during low-traffic hours
Elasticsearch achieves low latency by keeping index contents in the filesystem cache. We also aim to keep all the data we need in the filesystem cache, so large indices use a lot of memory. During hours with a reasonable amount of traffic the bottleneck is CPU, and autoscaling works well with CPU utilization as the HPA metric.
However, even during hours with extremely little traffic we must keep a minimum number of replicas for availability. In those hours the bottleneck becomes memory, and we end up allocating far more CPU than is actually needed.
The original configuration set the amount of memory to twice the size of the index on disk, and memory.usage showed high values, but memory.working_set suggested there was still plenty of headroom. In Kubernetes, memory.working_set is memory.usage minus inactive files. Roughly speaking, inactive files is the size of the filesystem cache that is hardly ever referenced. In Kubernetes this filesystem cache is evicted before the container reaches its memory limit, which means we can allocate less memory than we had been.
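The relationship between the two metrics is simple arithmetic, sketched below. The byte values are made-up illustrative numbers, not measurements from our clusters, and clamping at zero is an assumption about how the metric is reported.

```python
GIB = 1024 ** 3


def working_set_bytes(usage_bytes: int, inactive_file_bytes: int) -> int:
    """memory.working_set = memory.usage - inactive files, never below zero."""
    return max(usage_bytes - inactive_file_bytes, 0)


# Illustrative values: a container whose usage is mostly rarely touched
# filesystem cache.
usage = 60 * GIB
inactive_file = 35 * GIB
print(working_set_bytes(usage, inactive_file) // GIB, "GiB working set")
```

In this made-up case usage looks close to a 64 GiB limit, while the working set is only 25 GiB, which is the kind of gap that suggested a smaller memory allocation would be safe.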
Of course, filesystem cache that is not counted as inactive files is also evicted when necessary, but evicting it degrades performance. The difficulty is that the conditions under which cache stops being inactive are surprisingly loose, so it is not explicitly clear how much can safely be evicted, and we have not been able to set the memory request very aggressively. Even so, this allowed us to reduce the total CPU request during the hours when memory is the bottleneck.
Elasticsearch is a stateful application, so it is difficult to apply VPA, which requires Pod restarts. Once In-place Update of Pod Resources (https://kubernetes.io/blog/2023/05/12/in-place-pod-resize-alpha/) becomes available, CPU requests can be scaled down without a restart, so we expect this problem to be mitigated as well.
Conclusion
This article described how to autoscale Elasticsearch clusters running on Kubernetes with ECK, using HPA driven by CPU utilization. This reduced the Kubernetes cost of operating Elasticsearch by about 40%. We expect that Elastic Cloud will eventually provide this kind of autoscaling as part of its Serverless offering, but in our current situation we consider this an effective approach.
The search infra team is currently looking for people to work with us. If you are interested, please feel free to apply.