Initially presented at the OpenWest 2014 conference.
Graphite and StatsD gather time series data and offer a robust set of APIs to access that data. While the tools are robust, the dashboards are straight from 1992 and alerting on the data is nonexistent. Nark, an open source project, solves both of these problems. It provides easy-to-use dashboards and readily available alerts and notifications to users. It has been used in production at Lucid Software for almost a year. Related to Nark are the tools required to make Graphite highly available.
2. About Alyssa
Software Developer at Lucid Software Inc
BYU graduate with a Bachelor's in Computer Science
I love
Playing the carillon and piano
Fast-paced board games
Hats
Traveling
Playing foosball
3. About “The Barlocker”
• Chief Architect at Lucid Software Inc
• Bachelor's degree from BYU in Computer Science
• I love to
• play board games
• go 4-wheeling
• wrestle my sons
• fly airplanes
• Follow me on nineofclouds.blogspot.com
5. Graphite
Graphite is a highly scalable real-time graphing system
Initially developed by Chris Davis at Orbitz.com
Composed of 3 related projects
Carbon – collects and records metrics
Whisper – Backend storage mechanism
Graphite-Web – HTTP frontend that displays graphs
Written in Python
http://graphite.wikidot.com/
https://github.com/graphite-project/
6. StatsD
A network daemon that aggregates statistics for backend services.
Developed by Etsy
Written in Node.js
https://github.com/etsy/statsd/
http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
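As a rough illustration of what an application sends to StatsD: the StatsD wire format is a plain-text line of the form name:value|type, usually sent over UDP (port 8125 by default). The sketch below is not from the talk; the helper names format_metric and send_metric and the default host are assumptions made for the example.

```python
import socket

# StatsD type codes: counter, timer (milliseconds), gauge.
METRIC_TYPES = {"counter": "c", "timer": "ms", "gauge": "g"}


def format_metric(name, value, kind="counter"):
    """Build one StatsD line, for example 'app.logins:1|c'."""
    return "{}:{}|{}".format(name, value, METRIC_TYPES[kind])


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget one metric line over UDP (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()
```

Calling send_metric(format_metric("production.applications.chart.users.login", 1)) would count one login. Because UDP is connectionless, the call does not block or fail when no StatsD daemon is listening.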
7. HA Receiver
Used to make StatsD highly available and scalable.
Initially developed by Matthew Barlocker at Lucid Software Inc
Written in Node.js
https://github.com/lucidsoftware/statsd-ha-receiver
8. Nark
Nark is an alerting and dashboard frontend for Graphite.
Under active development by Lucid Software.
Written in Scala using the Play! Framework
MySQL backed
https://github.com/lucidchart/nark
11. Data Flows IN
Applications report different types of metrics
StatsD aggregates metrics
Carbon-cache gathers and groups metrics
Whisper stores metrics to disk
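To make the StatsD-to-Carbon hop concrete: Carbon's plaintext protocol takes one metric per line as "path value timestamp", by default over TCP port 2003. A minimal sketch, assuming a Carbon daemon is reachable on localhost; the function names carbon_line and send_to_carbon are made up for this example.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one metric as 'path value timestamp' plus a trailing newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


def send_to_carbon(lines, host="127.0.0.1", port=2003):
    """Send already-formatted lines to Carbon over TCP (hypothetical helper)."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
```

For example, carbon_line("stats.production.applications.chart.users.login", 42, 1400000000) yields a line Carbon can store at that timestamp.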
12. Data Flows OUT
User initiates request over HTTP
Graphite-web requests information from carbon-cache
Carbon-cache reads data from disk using whisper
Graphite-web builds graph using data
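The HTTP request in the first step typically goes to graphite-web's /render endpoint, which accepts target, from, and format parameters. The sketch below only builds such a URL; the base URL is a placeholder and render_url is a name invented for this example.

```python
from urllib.parse import urlencode


def render_url(base_url, target, since="-1h", fmt="json"):
    """Build a graphite-web render URL for one target (illustrative helper)."""
    query = urlencode({"target": target, "from": since, "format": fmt})
    return "{}/render?{}".format(base_url.rstrip("/"), query)
```

Fetching the resulting URL with format=json returns datapoints instead of a rendered image, which is the kind of API a dashboard or alerting frontend can build on.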
14. StatsD - Options
We can put StatsD in 3 places:
On the reporting server
  Scales as well as your reporting servers do
  As available as the reporting servers are
  Can’t get vital metrics like stats.production.applications.chart.users.login
On a central server
  Doesn’t scale
  Single point of failure
On a load-balanced set of servers
  AWS ELB doesn’t listen on UDP
  One stat will be aggregated in multiple places
15. StatsD - Solution
StatsD with smart-repeater on reporting servers
Accepts UDP and sends TCP for reliability
Reduces chattiness over the wire
Allows aggregation to occur at a centralized location
As scalable and available as the application servers
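One way to picture "accepts UDP and sends TCP" while reducing chattiness is a small buffer that collects metric lines from incoming datagrams and forwards them in newline-joined batches. This is only a sketch of the idea, not the smart-repeater's actual implementation; the class name, the max_lines threshold, and the send callback are assumptions.

```python
class BatchingForwarder:
    """Collects metric lines and passes them to `send` in batches."""

    def __init__(self, send, max_lines=50):
        self.send = send            # callable taking bytes, e.g. a TCP socket's sendall
        self.max_lines = max_lines  # assumed batch size, not from the talk
        self.pending = []

    def add(self, datagram):
        # One UDP datagram may carry several newline-separated metrics.
        for line in datagram.decode("ascii").splitlines():
            if line:
                self.pending.append(line)
        if len(self.pending) >= self.max_lines:
            self.flush()

    def flush(self):
        # Send everything buffered so far as a single payload.
        if self.pending:
            self.send(("\n".join(self.pending) + "\n").encode("ascii"))
            self.pending = []
```

In a real repeater, flush would presumably also run on a timer so that metrics are not delayed when traffic is low.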
16. StatsD - Solution
AWS Elastic Load Balancer distributes traffic to ha-receivers
HA-receivers:
  Duplicate and transform metrics
  Deliver metrics to the correct server for aggregation
  Are stateless – they scale horizontally
  Are highly available behind the ELB
17. StatsD - Solution
HA-receivers pass the data to StatsD
StatsD does the final aggregation
Every metric has exactly one StatsD destination
Aggregated metrics are sent to carbon
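The slides do not say how an ha-receiver decides which StatsD instance owns a metric. One common way to guarantee "exactly one destination" from stateless workers is to hash the metric name, sketched here under that assumption; pick_destination and the host list are hypothetical.

```python
import zlib


def pick_destination(metric_name, destinations):
    """Map a metric name to exactly one destination, deterministically."""
    checksum = zlib.crc32(metric_name.encode("utf-8"))
    return destinations[checksum % len(destinations)]


# Hypothetical StatsD hosts; any stateless receiver given the same list
# will pick the same host for the same metric name.
STATSD_HOSTS = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]
```

Because the choice depends only on the metric name and the destination list, any number of receivers can sit behind the load balancer and still agree on where each metric is aggregated.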
18. Carbon & Whisper
Carbon and whisper direct data to disk
The daemons are stateless except for buffers
Carbon consists of multiple daemons
  Carbon-relay: Directs traffic to other carbon daemons
  Carbon-aggregator: A mix between carbon-relay and StatsD
  Carbon-cache: Gathers metrics in a buffer, and writes them to disk using whisper
Whisper is called from carbon-cache, and is short-lived
19. Carbon & Whisper
We chose to use sharding
Every server holds 1/n of the metrics, where n = # shards
All servers in a shard hold the same data
Syncing data requires a single rsync
A binary tree of carbon-relays is used to pick a shard
Adding new shards is as easy as adding a new node in the binary tree of carbon-relays
Retrieving data can be done by checking one server from every shard
20. Carbon & Whisper
StatsD sends metrics to the root carbon-relay on localhost
Carbon-relays are set up in a binary tree to pick a shard
Every metric goes to exactly one shard
Every carbon-relay forwards to either 1 shard or 2 relays
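The relay topology can be modeled as a tree in which each node is either a leaf pointing at one shard or a branch with two child relays. The sketch below models only that shape; real carbon-relays are set up through their own configuration, and the Relay class, the hash-bit routing, and the shard names are assumptions for illustration.

```python
import zlib


class Relay:
    """A relay that forwards to exactly 1 shard or exactly 2 child relays."""

    def __init__(self, shard=None, children=None):
        is_leaf = shard is not None and children is None
        is_branch = shard is None and children is not None and len(children) == 2
        if not (is_leaf or is_branch):
            raise ValueError("a relay needs exactly 1 shard or exactly 2 child relays")
        self.shard = shard
        self.children = children

    def route(self, metric, depth=0):
        """Return the single shard that owns this metric."""
        if self.shard is not None:
            return self.shard
        # Use one bit of the metric's checksum per tree level to pick a child.
        bit = (zlib.crc32(metric.encode("utf-8")) >> depth) & 1
        return self.children[bit].route(metric, depth + 1)


# Hypothetical three-shard layout: the root splits, and one side splits again.
ROOT = Relay(children=[
    Relay(shard="shard-1"),
    Relay(children=[Relay(shard="shard-2"), Relay(shard="shard-3")]),
])
```

In this model, adding a shard means replacing a leaf with a branch that has two leaves. Existing data for the metrics that move would still need to be copied to the new shard.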
21. Carbon & Whisper
Carbon-cache receives the metrics from the final relay
Metrics are written to disk using whisper on localhost
Carbon-cache has a last-in-wins policy
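Last-in-wins means that when two values arrive for the same metric in the same storage interval, the later one replaces the earlier one instead of being added to it. A toy model of that behavior, assuming a 10-second interval; the store function and the dictionary layout are illustrative, not Carbon's internals.

```python
def store(points, metric, timestamp, value, step=10):
    """Record a value in its time slot; a later write to the slot overwrites."""
    slot = timestamp - (timestamp % step)  # align to the start of the interval
    points[(metric, slot)] = value
    return slot


points = {}
store(points, "stats.logins", 1000, 5)
store(points, "stats.logins", 1005, 7)  # same 10-second slot: replaces the 5
```

This is why every metric needs exactly one aggregation point upstream: if two aggregators each flushed a partial count for the same metric, one partial count would overwrite the other rather than being summed.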
22. graphite-web
Graphite-web is stateless
All state is contained within carbon-cache
Reading data out from a highly available, scalable graphite installation is the same as reading from a single server
Use the same ELB as the ha-receiver
23. Nark
Nark is stateless
All state is contained in MySQL and Graphite
Nark will be no more highly available than your MySQL and Graphite installations
Use an ELB, an autoscale group, and a multi-AZ RDS instance
26. Join The Team
• Building the next generation of collaborative web applications
• VC funded
• High growth rate
• Profitable
• Graduates from Harvard, MIT, Stanford
• Former Google, Amazon, Microsoft employees
https://www.golucid.co/jobs