20171027 Monitoring Study Group (モニタリング勉強会)

Operating Prometheus
Monitoring Study Group
2017/10/27 @kfdm

Self Introduction
• Paul Traylor
• LINE Fukuoka Development Office
• Currently responsible for updating the monitoring environment at LINE Fukuoka
• https://github.com/line/promgen
• https://promcon.io/2017-munich/talks/prometheus-as-a-internal-service/

Operating Prometheus at LINE Fukuoka
• 4 HA Pairs
• ~2000 targets per machine
• ~800k samples per machine
• ~3.5 million samples
• ~7000 exporters
https://github.com/line/promgen

Scaling Prometheus ‒ HA
• Run multiple Prometheus instances with the same targets
• Alerts are de-duplicated by Alertmanager
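The de-duplication step above can be pictured as keeping one alert per unique label set. The following is a minimal sketch, assuming each alert is represented as a plain dict of labels and that both Prometheus instances in a pair send identical labels; the function name `deduplicate` is hypothetical and this is not Alertmanager's actual code.

```python
def deduplicate(alerts):
    """Keep one alert per unique label set, preserving first-seen order."""
    seen = set()
    unique = []
    for labels in alerts:
        # A frozenset of (name, value) pairs ignores label ordering.
        key = frozenset(labels.items())
        if key not in seen:
            seen.add(key)
            unique.append(labels)
    return unique
```

If the two instances attach different labels to the same alert (for example a per-replica label), the label sets no longer match and the alerts would not be merged under this scheme.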

Scaling Prometheus ‒ Shard
• Split targets across multiple servers
• Alertmanager de-duplicates alerts
• Proxy or remote read
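One way to picture the target split is a stable hash of each target address taken modulo the number of servers. This is only an illustrative sketch: the names `shard_for` and `split_targets`, the example addresses, and the choice of MD5 are assumptions and do not come from the slides.

```python
import hashlib


def shard_for(target: str, num_shards: int) -> int:
    """Map a target address to a shard index in [0, num_shards)."""
    digest = hashlib.md5(target.encode("utf-8")).digest()
    # Use the digest as a big integer so the result is stable across runs,
    # unlike Python's built-in hash(), which is randomized per process.
    return int.from_bytes(digest, "big") % num_shards


def split_targets(targets, num_shards):
    """Group targets by shard index; each server scrapes only its own group."""
    shards = {i: [] for i in range(num_shards)}
    for target in targets:
        shards[shard_for(target, num_shards)].append(target)
    return shards
```

Calling `split_targets(["host1:9100", "host2:9100"], 2)` returns a dict keyed by shard index. Prometheus itself offers a `hashmod` relabel action for a similar purpose, so in practice the split is usually expressed in configuration rather than in external code.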

Prometheus 1.8 ‒ Storage Format
https://promcon.io/2016-berlin/talks/the-prometheus-time-series-database/
http://labs.gree.jp/blog/2017/10/16614/
• One series per file
• Rewrites may have to touch millions of files
• Queries may also touch millions of files
• No easy way to back up

Prometheus 2.0 ‒ New Storage Format
https://promcon.io/2017-munich/slides/storing-16-bytes-at-scale.pdf
https://fabxc.org/blog/2017-04-10-writing-a-tsdb/
• Chunks stored in buckets by time
• Chunks past retention setting are just deleted
• Easier to back up
• Easier to compress
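Because data is grouped into time buckets, retention can be decided per bucket rather than per series. A minimal sketch of that decision, assuming each bucket is described by a start and end timestamp in milliseconds; the `Block` type and `expired_blocks` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Block(NamedTuple):
    name: str
    min_time: int  # start of the covered range, milliseconds since epoch
    max_time: int  # end of the covered range, milliseconds since epoch


def expired_blocks(blocks, now_ms, retention_ms):
    """Return the blocks whose whole time range is older than the retention window."""
    cutoff = now_ms - retention_ms
    return [b for b in blocks if b.max_time < cutoff]
```

Every block returned here can be removed as a whole directory, which is why no individual series file has to be rewritten.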

Prometheus 2.0 ‒ Backups
├── 01BX40G8TA6T1MNSS8JJE7ENPY/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── 01BX5Y9SSE10VBZK4CMZ86WDR6/
│   ├── chunks/
│   ├── index
│   ├── meta.json
│   └── tombstones
├── lock
└── wal/
    ├── 000760
    └── 000761
• https://github.com/Gouthamve/agni
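With the layout above, a simple backup can copy each block directory as a unit. The sketch below assumes that a block is any directory containing `meta.json` and that the blocks are not being modified during the copy; the function names and paths are hypothetical, and this is not how the linked tool works.

```python
import shutil
from pathlib import Path


def find_blocks(data_dir):
    """Return block directories (those containing meta.json), sorted by name."""
    root = Path(data_dir)
    return sorted(
        p for p in root.iterdir() if p.is_dir() and (p / "meta.json").is_file()
    )


def backup_blocks(data_dir, backup_dir):
    """Copy every block directory into backup_dir, skipping blocks already copied."""
    dest_root = Path(backup_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for block in find_blocks(data_dir):
        dest = dest_root / block.name
        if dest.exists():
            continue  # block was copied by an earlier run
        shutil.copytree(block, dest)
        copied.append(block.name)
    return copied
```

The `lock` file and `wal/` directory are skipped because they contain no `meta.json`, so samples that exist only in the write-ahead log are not covered by this sketch.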

Prometheus 2.0 ‒ Flag Changes
• Most flags move from single dash to double dash
• Many storage settings move to tsdb settings
• -config.file -> --config.file
• -storage.local.path -> --storage.tsdb.path
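The two examples above can be turned into a small helper for rewriting start-up scripts. This is a sketch under stated assumptions: the rename table contains only the mapping shown on the slide, the function name `convert_flag` is hypothetical, and flags that were removed entirely in 2.0 are not detected.

```python
# Renames taken from the slide; any other single-dash flag only gains a dash.
RENAMED = {
    "-storage.local.path": "--storage.tsdb.path",
}


def convert_flag(arg: str) -> str:
    """Convert one Prometheus 1.x command-line argument to the 2.0 style."""
    name, sep, value = arg.partition("=")
    if name in RENAMED:
        name = RENAMED[name]
    elif name.startswith("-") and not name.startswith("--"):
        name = "-" + name
    # Arguments that are not flags (for example a bare path) pass through unchanged.
    return name + sep + value
```

For example, `convert_flag("-config.file=/etc/prometheus/prometheus.yml")` yields the double-dash form. The converted command line should still be checked against `prometheus --help`, since not every 1.x flag has a 2.0 equivalent.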

Prometheus 2.0 ‒ Rule Format Changes
https://www.robustperception.io/converting-rules-to-the-prometheus-2-0-format/
groups:
- name: alert.rules
  rules:
  - alert: HighErrorRate
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    annotations:
      summary: High request latency
  - alert: DailyTest
    expr: vector(1)
    for: 1m
    annotations:
      summary: Daily alert test
• ./promtool update rules /path/to/rules

Prometheus 2.0 ‒ Remote Read
• Prometheus 1.8 (Read)
• InfluxDB (Read and Write)
• Graphite (Write)
• OpenTSDB (Write)
• TimescaleDB (Read and Write)
• https://prometheus.io/docs/operating/integrations/
• https://github.com/prometheus/prometheus/tree/master/documentation/examples/remote_storage/remote_storage_adapter

Open Metrics
• https://github.com/RichiH/OpenMetrics
• https://github.com/RichiH/OpenMetrics/blob/master/CONTRIBUTORS.md
