Hello, this is Takahashi from the Engineering Group.
Since joining M3 as an SRE last November, I have been handling basic infrastructure work such as server setup, and on top of that defining service levels for each service and building the mechanisms to monitor them.
In this post I would like to introduce how we monitor those service levels.
Outline of this post
- Defining SLIs
- Defining SLOs
- Collecting metrics
- Alerting
- Building a monitoring dashboard
- Summary
Overview
The figure below gives a rough overall picture.
Before and after this initiative, we saw the following changes (effects).
- Before
  - Logs were being collected, but not for every service
  - The log collection path differed from server to server (for example, Service A shipping straight to Elasticsearch)
- After
  - Access logs for all services are collected and browsable
  - The log collection path is unified
  - Service levels are defined for most services (more than 70)
  - We can now monitor for service-level violations
Defining SLIs
The first thing to decide is what to use as the service level indicators (SLIs). At M3 we chose availability and HTTP request latency as our main targets.
Having too many makes the initial setup painful, so our policy is to configure only the bare minimum and add more later as needed.
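Defining SLOs
Deciding what values to start with is quite a dilemma.
For the handful of services where we began monitoring on a trial basis, we based the targets on the current SLI values. Doing that for every service would take far too long, however, so for now we set rough uniform values across the board and revise them in meetings between the development teams and SRE held every three to six months.
SLOs we set initially
- Availability
  - At least 99.9% to 99.99% of requests to the health-check URL return a 200 response
  - For applications where uptime matters less, such as the email delivery service, we are considering other indicators such as error rate
- HTTP request latency
  - The 98th to 99th percentile for the top page or for all URLs is 1000 ms or less
  - Additional URLs are added per service when there are ones that deserve closer monitoring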
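To make the latency objective concrete, here is a minimal sketch of checking a percentile against a threshold. It uses the simple nearest-rank definition of a percentile, which is an assumption made for illustration and not necessarily how Elasticsearch estimates percentiles; the function names and the sample data are hypothetical.

```python
import math


def nearest_rank_percentile(values, percent):
    """Return the nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    # Nearest rank: the smallest value with at least `percent`% of the data at or below it.
    rank = max(1, math.ceil(percent * len(ordered) / 100))
    return ordered[rank - 1]


def meets_latency_slo(latencies_ms, percent=99, threshold_ms=1000):
    """True when the given percentile of the latencies is within the threshold."""
    return nearest_rank_percentile(latencies_ms, percent) <= threshold_ms


# 100 hypothetical request latencies: 98 fast requests and 2 slow ones.
sample = [200] * 98 + [1500] * 2
print(meets_latency_slo(sample, percent=98))
print(meets_latency_slo(sample, percent=99))
```

With this sample the 98th percentile is still 200 ms while the 99th is 1500 ms, which shows how much the choice between 98 and 99 can matter for the same traffic.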
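Collecting metrics
Next, here is how we collect the values we defined as SLIs.
Availability
We mainly use the following two tools for service liveness monitoring.
- Nagios
  - Liveness monitoring of each server
- NodePing
  - External (synthetic) monitoring of URLs from outside the company
  - Very cheap (which we appreciate!)
  - It can also measure response time, but we do not use that because the values vary with where the request comes from
Collecting availability from NodePing
I suspect most URL monitoring services of this kind offer something similar: NodePing has an API that returns uptime, so we periodically fetch it for a narrowed time window and store the result in Elasticsearch.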
https://nodeping.com/docs-api-results.html#uptime
Collecting availability from Nagios
Nagios has a feature called Availability Report that shows the availability of each monitored target.
However, being a rather mature tool, it has no API that returns the data in an easy-to-handle format such as JSON. (There seem to be a few REST API plugins, but they did not support the Availability Report.)
We could of course build one ourselves, but we wanted something simple. So, although it is a brute-force approach, we parse the HTML with Python's BeautifulSoup and read the Time value for State OK.
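    import re
    from bs4 import BeautifulSoup

    # res: the HTTP response for the Availability Report page (fetched elsewhere)
    soup = BeautifulSoup(res.text, "html.parser")
    elements = soup.select('.serviceOK')[3:]
    # Extract the four numbers of the Time value: days, hours, minutes, seconds
    pattern = r'[0-9]+'
    lists = re.findall(pattern, elements[0].text)
    lists[0] = int(lists[0]) * 24 * 60 * 60  # days -> seconds
    lists[1] = int(lists[1]) * 60 * 60       # hours -> seconds
    lists[2] = int(lists[2]) * 60            # minutes -> seconds
    lists[3] = int(lists[3])                 # seconds
    uptime = sum(lists)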
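The conversion step can be tried without a Nagios server. The sketch below assumes the Time cell is a string of the form "6d 23h 59m 59s" (an assumption inferred from the day/hour/minute/second multipliers in the snippet) and turns OK time and total time into an availability percentage. The function names are hypothetical.

```python
import re


def duration_to_seconds(text):
    """Convert a duration such as '6d 23h 59m 59s' into seconds."""
    numbers = re.findall(r"[0-9]+", text)[:4]
    # Raises ValueError if fewer than four numbers are present.
    days, hours, minutes, seconds = (int(n) for n in numbers)
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def availability_percent(ok_text, total_text):
    """Availability as OK time divided by total time, in percent."""
    total = duration_to_seconds(total_text)
    if total == 0:
        raise ValueError("total duration must be greater than zero")
    return 100.0 * duration_to_seconds(ok_text) / total


print(duration_to_seconds("6d 23h 59m 59s"))
print(round(availability_percent("6d 23h 50m 0s", "7d 0h 0m 0s"), 3))
```

Ten minutes of non-OK time in a seven-day window already brings availability down to roughly 99.9%, which gives a feel for how tight the 99.9 to 99.99% objectives are.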
HTTP request latency
We basically obtain request latency (response time) from the access logs of Nginx, Apache, Play and so on. Each server runs Fluentd as a log forwarder, which sends logs written in LTSV format to a relay Fluentd.
The relay Fluentd performs various filtering steps as well as sampling to build an index for long-term retention.
The following configuration thins out events whose tags carry the raw prefix and emits them under the sample prefix:
    @type sampling_filter
    interval [sampling rate chosen according to access volume]
    remove_prefix raw
    add_prefix sample
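To illustrate what interval sampling with a tag rewrite amounts to, here is a small sketch in Python. It is not the plugin's implementation: the class name, the choice to keep the last event of each interval, and the sample tag `raw.nginx.access` are all assumptions made for the example.

```python
from collections import defaultdict


class IntervalSampler:
    """Keep one event out of every `interval` events per tag and rewrite the tag prefix."""

    def __init__(self, interval, remove_prefix="raw", add_prefix="sample"):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self.remove_prefix = remove_prefix
        self.add_prefix = add_prefix
        self._counts = defaultdict(int)  # events seen so far, per original tag

    def process(self, tag, record):
        """Return (new_tag, record) for every `interval`-th event of a tag, else None."""
        self._counts[tag] += 1
        if self._counts[tag] % self.interval != 0:
            return None
        prefix = self.remove_prefix + "."
        rest = tag[len(prefix):] if tag.startswith(prefix) else tag
        return (f"{self.add_prefix}.{rest}", record)


sampler = IntervalSampler(interval=10)
results = (sampler.process("raw.nginx.access", {"seq": i}) for i in range(100))
kept = [r for r in results if r is not None]
print(len(kept), kept[0][0])
```

Because the record itself is passed through unchanged, the sampled stream keeps the same fields as the raw one; only the tag and the volume differ.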
Sampling lowers precision, but it has the following benefits.
- The data volume does not balloon much even when data is kept for a long time
- Lower load on Elasticsearch at aggregation time (important, since we run it on a low-resource VM)
- The fields are the same as in the original logs, so graphs can be reused
Another option would be to accumulate values pre-aggregated over fixed intervals, but we did not adopt it because weighted-average graphs are hard to build in Kibana.
Example:
02:00-03:00 requests: 1, response time: 5000 ms
03:00-04:00 requests: 100, response time: 1000 ms
-> The average ends up as 3000 ms (the values need to be weighted by request count)
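The arithmetic behind this example can be worked through in a few lines. The sketch assumes that each window's response time is the average for that window; the field names are hypothetical.

```python
# The two hourly windows from the example (per-window averages are an assumption).
buckets = [
    {"window": "02:00-03:00", "requests": 1, "avg_ms": 5000},
    {"window": "03:00-04:00", "requests": 100, "avg_ms": 1000},
]


def plain_average(buckets):
    """Average of the per-window averages, ignoring how many requests each window had."""
    return sum(b["avg_ms"] for b in buckets) / len(buckets)


def weighted_average(buckets):
    """Average weighted by request count, i.e. the true per-request average."""
    total_requests = sum(b["requests"] for b in buckets)
    return sum(b["avg_ms"] * b["requests"] for b in buckets) / total_requests


print(plain_average(buckets))               # (5000 + 1000) / 2
print(round(weighted_average(buckets), 1))  # (1 * 5000 + 100 * 1000) / 101
```

The plain average is 3000 ms, while the request-weighted average is about 1040 ms, so a single slow request in a quiet hour nearly triples the reported figure when the weights are dropped.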
Alerting
We already had alerts such as the following:
- Failure alert emails from Nagios, NodePing, Warder*1, CloudWatch and Prometheus + Grafana
- Alert emails from a script that periodically checks response time
In addition to these, we now check whether any SLO has been violated over the past 5 days.
To do so, a Python job periodically runs aggregation queries (roughly, aggregate functions) against Elasticsearch to compute availability and response-time percentiles, and when a result violates its SLO it notifies Slack through an Incoming Webhook.
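As an illustration of the kind of aggregation query involved, here is a minimal sketch that builds an Elasticsearch `percentiles` aggregation over the last 5 days and reads the value back from a response. The field names `response_time` and `@timestamp`, the aggregation name `sli`, and the sample response are assumptions; the query is built as a plain dict here, not from a template.

```python
def build_percentile_query(field, percent, days=5, query_string=None):
    """Build a search body that computes one percentile of `field` over the last `days` days."""
    filters = [{"range": {"@timestamp": {"gte": f"now-{days}d"}}}]
    if query_string:
        # Optional Lucene-style filter, e.g. to restrict the check to one URL.
        filters.append({"query_string": {"query": query_string}})
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {"sli": {"percentiles": {"field": field, "percents": [percent]}}},
    }


def extract_percentile(response, percent, agg_name="sli"):
    """Read the percentile value from a search response; None if it is absent."""
    values = response["aggregations"][agg_name]["values"]
    return values.get(str(float(percent)))


query = build_percentile_query("response_time", 99, query_string='uri: "/"')
# Hand-written stand-in for a real response, for illustration only.
fake_response = {"aggregations": {"sli": {"values": {"99.0": 870.5}}}}
print(extract_percentile(fake_response, 99))
```

Setting `size` to 0 keeps the response small, which matters when the same job runs one query per service and indicator.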
Each SLO is managed in a YAML file like the following:
    xxx:  # service name
      channel: xxx  # Slack channel to post to
      dashboard: xxx  # URL of the monitoring dashboard
      slo:
        xxx:  # name unique to each SLI (e.g. ~avail)
          template: xxx.j2  # template for the Elasticsearch query (jinja2)
          index: xxx  # index to query
          operator: "<="  # whether the SLI should be above or below the value below
          value: 1000  # SLO value
          var:  # variables embedded in the Elasticsearch query
            query_list:
              - 'uri: [any URL]'
Per-service notifications go to the channel used to communicate with each development team,
and a message giving an overview of all violations goes to the SRE channel.
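The comparison and notification step could look roughly like the sketch below. It assumes that `operator` expresses the condition under which the SLO holds (for example, latency `<=` 1000), which is one reading of the config comment; the SLI names, channel, dashboard URL and measured values are hypothetical, and the payload only uses the `text` field of a Slack Incoming Webhook. Actually posting it is left out.

```python
import operator

OPERATORS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def find_violations(slo_config, sli_values):
    """Return (name, measured, operator, target) for every SLO that does not hold."""
    violations = []
    for name, slo in slo_config.items():
        # Assumption: the SLO holds when "measured <operator> value" is true.
        holds = OPERATORS[slo["operator"]](sli_values[name], slo["value"])
        if not holds:
            violations.append((name, sli_values[name], slo["operator"], slo["value"]))
    return violations


def slack_message(service, dashboard, violations):
    """Build the JSON-serializable body for an Incoming Webhook post."""
    lines = [f"SLO violation in {service} (dashboard: {dashboard})"]
    for name, measured, op, target in violations:
        lines.append(f"- {name}: measured {measured}, expected {op} {target}")
    return {"text": "\n".join(lines)}


service_config = {
    "channel": "#hypothetical-dev-team",
    "dashboard": "https://kibana.example.com/hypothetical",
    "slo": {
        "top_latency": {"operator": "<=", "value": 1000},
        "health_avail": {"operator": ">=", "value": 99.9},
    },
}
measured = {"top_latency": 1240.0, "health_avail": 99.95}
found = find_violations(service_config["slo"], measured)
if found:
    print(slack_message("example-service", service_config["dashboard"], found)["text"])
```

Keeping the operator in the config lets the same code handle indicators where higher is better (availability) and where lower is better (latency).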
Building a monitoring dashboard
To see the situation at a glance, we built a Kibana dashboard that shows SLO violations across all services.
Selecting a service and an indicator in the SLI Selector at the top right narrows down the graphs.
For historical data, we aggregate values month by month and graph them with a feature called Timelion.
In the Kibana graph, the red line is the SLO and the blue line is the SLI value.
Note: the figure above plots Apache response times for a service called MR君.
Timelion expression example
    $sli=.es(index=[index storing the monthly values],split=indicator:1,metric=max:sli),
    $slo=.es(index=[index storing the monthly values],split=indicator:1,metric=max:slo),
    ($sli).lines().label("SLI"),
    ($slo).lines().label("SLO"),
    ($sli).if(gt,($slo), $sli, null).points().color(red).label(""),   <- highlight SLO violations with red dots
    ($sli).trend().label("Trend")   <- linear regression trend
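For readers unfamiliar with Timelion, the two less obvious parts of the expression, the conditional highlight and the linear regression trend, can be sketched in plain Python. This is an illustration under assumptions (ordinary least squares over equally spaced months, hypothetical monthly values), not a description of Timelion's internals.

```python
def highlight_violations(sli, slo):
    """Keep the SLI value where it is greater than the SLO, else None (like if(gt, ...))."""
    return [s if s > o else None for s, o in zip(sli, slo)]


def linear_trend(values):
    """Fit y = intercept + slope * x by least squares, with x = 0, 1, 2, ... per point."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * x for x in range(n)]


# Hypothetical monthly SLI values (ms) against a constant SLO of 1000 ms.
monthly_sli = [900, 950, 1100, 1050]
monthly_slo = [1000] * len(monthly_sli)
print(highlight_violations(monthly_sli, monthly_slo))
print([round(v, 1) for v in linear_trend(monthly_sli)])
```

A rising trend line is a useful early warning even in months where the SLI itself is still under the SLO.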
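Summary
Through the work described above, we were able to put in place a mechanism, simple as it is, for monitoring the service level of each service and alerting on it.
That said, I think what matters most is building an organization that quickly decides on and carries out improvements so that SLOs are not violated, so we want to move forward on that front as well.
Also, for services that have been split into microservices, a single access log often cannot capture the end-to-end latency, so at present we have not been able to set very fine-grained SLOs. Some services use Zipkin or X-Ray for distributed tracing, and we intend to strengthen that area going forward.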
We're Hiring!
We are actively looking for infrastructure engineers and SREs to work with us!
- Practicing CI/CD with GitLab and Jenkins
- Log collection and management with tools such as Fluentd
- Visualizing system metrics with Kibana, Grafana, CloudWatch and the like
- Hands-on Infrastructure as Code skills with Ansible and Terraform
- Experience designing, building and operating systems both on-premises and in public clouds
- Monitoring and operating microservices systems
- System architecture and operational know-how with Docker and ECS
If any of the above sounds even slightly interesting, please feel free to apply through the form below.
*1: M3's in-house monitoring tool. It handles health checks for protocols that Nagios does not support.