This document summarizes and compares several open source monitoring tools: Nagios, Graphite, StatsD, Logstash, and Sensu. Nagios is introduced as a commonly used tool that some love and some find frustrating. Graphite is described as a tool for storing and graphing time-series data. StatsD aggregates counters and timers and sends them to backend services like Graphite. Logstash is a tool for managing logs and events that can input, filter, and output data. Sensu is a monitoring router that connects check scripts to handler scripts in order to alert on or process monitoring data. Examples are given for each tool, along with the types of metrics to collect.
Open Source Monitoring Tools
1. The State of Open
Source Monitoring
Tools
Michael Richardson (@m_richo)
Energized Work
2. What tools are we currently using to
monitor and troubleshoot our systems?
3. What tools are we currently using to
monitor and troubleshoot our systems?
• Nagios
• ssh + grep <something_bad> /some/random/log/file.log
• tail -f /some/random/log/file.log
• Others?
19. Graphite
• Everything stored in Graphite has a path with
components delimited by dots, e.g.
servers.HOSTNAME.METRIC
applications.APPNAME.METRIC
servers.database01.memfree
applications.trading.loginattempts
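A minimal sketch of how a dotted path like the ones above could be sent to Graphite, assuming Carbon's plaintext listener (one "path value timestamp" line per metric) is reachable. The host name and port 2003 are assumptions about the local setup, and format_metric and send_metric are hypothetical helper names, not part of Graphite.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one newline-terminated line: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="localhost", port=2003):
    """Send one metric over TCP. host and port are assumptions about your setup."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_metric("servers.database01.memfree", 2048) would write a single data point under that path, with no need to define the metric beforehand.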
20. Graphite
• No need to pre-define metric end-points
• Determine granularity of data upfront.
/opt/graphite/conf/storage-schemas.conf
[stats]
pattern = ^stats.*
retentions = 10:2160,60:10080,600:262974
[catchall]
priority = 0
pattern = ^.*
retentions = 30:86400,300:525600
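Each retentions entry above reads as secondsPerPoint:points, so the time span an archive covers is the product of the two numbers. The sketch below works that arithmetic out; parse_retentions and retention_seconds are hypothetical helper names, not part of Graphite.

```python
def parse_retentions(spec):
    """Parse 'secondsPerPoint:points,...' into (seconds_per_point, points) tuples."""
    archives = []
    for part in spec.split(","):
        seconds_per_point, points = part.strip().split(":")
        archives.append((int(seconds_per_point), int(points)))
    return archives


def retention_seconds(archive):
    """Total time span covered by one archive, in seconds."""
    seconds_per_point, points = archive
    return seconds_per_point * points


# Walk the [stats] rule from the slide and report how long each archive lasts.
for sec, pts in parse_retentions("10:2160,60:10080,600:262974"):
    days = retention_seconds((sec, pts)) / 86400
    print(f"one point every {sec}s, kept for {days:.2f} days")
```

Under that reading, the [stats] rule keeps 10-second data for 6 hours, 1-minute data for 7 days, and 10-minute data for roughly 5 years.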
21. Graphite
What should I graph/trend?
1. Application Profiling Data
2. Operational Profiling Data
3. Regression Testing (releases)
Why should I graph/trend?
1. Trends can tell you when something is about to break.
2. …instead of hearing from your customers that it’s broken
3. Data can tell you when something is already broken but
you don’t yet know it (regression).
Source: Jason Dixon (@obfuscurity)
25. StatsD
• Written in Node.js
• ~400 lines of JavaScript
• Listens to statistics (counters & timers),
and sends aggregates to backend
services (like graphite).
• simple
28. StatsD
Don’t like Javascript or Node.js??
Google “statsd alternatives”…..
20+ rewrites/clones for you including..
Ruby, Python, Scala, Python+Twisted,
Erlang, Clojure, C, Groovy
29. StatsD
Concepts
• Buckets (a name that translates to graphite end-point)
• Values
• Flush (default 10 seconds)
Counter metrics
successfullogins:1|c|@0.1
Timing metrics
apitimer:320|ms
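The two metric lines above follow the pattern bucket:value|type, with an optional |@rate suffix. A minimal sketch of building and sending them over UDP follows; it assumes a StatsD daemon on localhost port 8125 (both are assumptions about the setup), and format_stat and send_stat are hypothetical helper names.

```python
import socket


def format_stat(bucket, value, kind, sample_rate=None):
    """Build a line such as 'apitimer:320|ms' or 'successfullogins:1|c|@0.1'."""
    line = f"{bucket}:{value}|{kind}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_stat(line, host="localhost", port=8125):
    """Fire-and-forget UDP send. host and port are assumptions about your setup."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))
```

Calling send_stat(format_stat("apitimer", 320, "ms")) would report one 320 ms timing. Because the transport is UDP, the caller does not wait for a reply from StatsD.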
31. StatsD
Timer examples
• How fast is our function blah()
• How fast is a database query
• How fast is our 3rd party API service
• How fast is our internet access
• How fast are our page response times.
34. LogStash
• Tool for managing events and logs
• http://logstash.net
• https://github.com/logstash/logstash
• Apache 2.0 license
• Created by Jordan Sissel (@jordansissel)
44. LogStash
Kibana
• Web interface for viewing logstash
records stored in Elasticsearch
• http://kibana.org/
• http://github.com/rashidkpc/Kibana
• Search for records
• Stream records (near realtime)
• Create RSS feeds based on search
results
• Score, trend data
52. Sensu
• Message oriented architecture
(messages are JSON objects)
• Described as a monitoring router
• Connects “check” scripts on Sensu
Clients to “handler” scripts on Sensu
Servers
53. Sensu
Checks can
• Determine whether a service like Apache is up
and running (via the check's exit code)
• Collect metrics like page views or
database cache usage.
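A sketch of what an "is the service up?" check could look like, assuming the Nagios-style exit code convention (0 = OK, 1 = WARNING, 2 = CRITICAL). The host, port, and the check_tcp and main names are hypothetical choices for illustration.

```python
import socket
import sys

# Exit codes following the Nagios plugin convention.
OK, WARNING, CRITICAL = 0, 1, 2


def check_tcp(host, port, timeout=3.0):
    """Return (exit_code, message) depending on whether a TCP connect succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"CheckTCP OK: {host}:{port} is accepting connections"
    except OSError as exc:
        return CRITICAL, f"CheckTCP CRITICAL: {host}:{port} unreachable ({exc})"


def main():
    # localhost:80 is an assumption, standing in for a local Apache instance.
    code, message = check_tcp("localhost", 80)
    print(message)   # the line printed here becomes the check output
    sys.exit(code)   # the exit code carries the check status
```

A real check script would end by calling main(). The printed line is what downstream handlers would see as the check's output.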
55. Sensu
Output of checks is routed to one or more
handlers, which determine what to do.
• Send alerts via email, pagerduty, IRC,
twitter, basecamp, xmpp, hipchat,
campfire, etc, etc
56. Sensu
Output of checks is routed to one or more
handlers, which determine what to do.
• Send alerts via email, pagerduty, IRC,
twitter, basecamp, xmpp, hipchat,
campfire, etc, etc
• Feed metrics to backend services like
graphite, librato, opentsdb, etc, etc
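A sketch of a handler script, assuming the event arrives as a JSON object with "client" and "check" sections. The field names used here (name, status, output) are assumptions about the event layout, and format_alert and handle are hypothetical helper names.

```python
import json

# Map check exit codes to severity labels (Nagios-style convention).
LEVELS = {0: "OK", 1: "WARNING", 2: "CRITICAL"}


def format_alert(event):
    """Turn one event dict into a single alert line."""
    client = event["client"]["name"]
    check = event["check"]
    level = LEVELS.get(check["status"], "UNKNOWN")
    return f"[{level}] {client}/{check['name']}: {check['output'].strip()}"


def handle(raw_json):
    """Parse the raw JSON event text and return the alert line."""
    return format_alert(json.loads(raw_json))
```

A script wired up as a handler might call print(handle(sys.stdin.read())) and then pass the resulting line on to email, IRC, or another destination.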
Anyone want a quick rundown of how it works? Fault detection, notifications, escalations, acknowledgements, adding new nodes, no AJAX.
Graphite is a highly scalable real-time graphing system, written in Python and released under the Apache 2.0 license.
Web – Django
Whisper – metrics database format (similar to RRDTool). Accepts out-of-order data and supports pipelining of data in a single operation.
Carbon – storage engine (agent + cache + persister)
Web – Django
Whisper – database for storing time series data
Carbon – listening service for capturing data
Why graphing and trending
Application profiling data
Operational profiling data
Counter example: adds 1 to the given bucket. The count is sent at the flush interval and then reset to 0. The @0.1 suffix tells StatsD that the counter is sampled 1/10th of the time.
Timing example: the API service took 320 ms to complete. StatsD determines percentiles, average (mean), standard deviation, sum, and lower and upper bounds for the flush interval. It can also store a histogram of values (not enabled by default).
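To make the flush step concrete, here is a simplified sketch of the kind of aggregation described in these notes. It is not StatsD's actual code. aggregate_counter and aggregate_timer are hypothetical names, and the percentile rule (keep the fastest N percent of samples and report the largest of them) is an assumption chosen for illustration.

```python
import statistics


def aggregate_counter(events):
    """events: list of (value, sample_rate). Scale each value by 1/sample_rate."""
    return sum(value / rate for value, rate in events)


def aggregate_timer(samples_ms, percentile=90):
    """Summarise the timer samples collected during one flush interval."""
    if not samples_ms:
        raise ValueError("no samples in this flush interval")
    ordered = sorted(samples_ms)
    # Keep the fastest `percentile` percent of samples (at least one).
    keep = max(1, round(len(ordered) * percentile / 100))
    return {
        "count": len(ordered),
        "lower": ordered[0],
        "upper": ordered[-1],
        "sum": sum(ordered),
        "mean": statistics.mean(ordered),
        "std": statistics.pstdev(ordered),
        f"upper_{percentile}": ordered[keep - 1],
    }
```

With a sample rate of 0.1, each received count of 1 stands in for roughly ten real events, which is why the counter is scaled up before it is reported.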