GRAPHITE: HIGHLY AVAILABLE
Alyssa Stringham & Matthew Barlocker
About Alyssa
• Software Developer at Lucid Software Inc
• BYU graduate with a Bachelor's degree in Computer Science
• I love
  • Playing the carillon and piano
  • Fast-paced board games
  • Hats
  • Traveling
  • Playing foosball
About “The Barlocker”
• Chief Architect at Lucid Software Inc
• Bachelor's degree from BYU in Computer Science
• I love to
  • play board games
  • go 4-wheeling
  • wrestle my sons
  • fly airplanes
• Follow me on nineofclouds.blogspot.com
Tools
Graphite
• Graphite is a highly scalable real-time graphing system
• Initially developed by Chris Davis at Orbitz.com
• Composed of 3 related projects:
  • Carbon – collects and records metrics
  • Whisper – backend storage mechanism
  • Graphite-Web – HTTP frontend that displays graphs
• Written in Python
• http://graphite.wikidot.com/
• https://github.com/graphite-project/
StatsD
• A network daemon that aggregates statistics and flushes the aggregates to backend services such as Graphite
• Developed by Etsy
• Written in Node.js
• https://github.com/etsy/statsd/
• http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/
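Applications talk to StatsD with a small text protocol, one `name:value|type` line per stat (optionally with `|@rate` for sampled data), usually sent over UDP to port 8125. The sketch below builds and sends such a line; `format_metric` and `send_metric` are hypothetical helper names, not part of StatsD or any client library.

```python
import socket


def format_metric(name: str, value: float, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line such as 'stats.logins:1|c'."""
    # c = counter, ms = timer, g = gauge, s = set
    if metric_type not in ("c", "ms", "g", "s"):
        raise ValueError(f"unsupported StatsD metric type: {metric_type}")
    line = f"{name}:{value}|{metric_type}"
    if sample_rate < 1.0:
        # Mark the stat as sampled so StatsD can scale counts back up
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP: the application never waits on StatsD."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))
```

Because UDP is connectionless, a StatsD outage costs lost stats rather than slower requests, which is why the protocol is a good fit for instrumenting hot code paths.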
HA Receiver
• Used to make StatsD highly available and scalable
• Initially developed by Matthew Barlocker at Lucid Software Inc
• Written in Node.js
• https://github.com/lucidsoftware/statsd-ha-receiver
Nark
• Nark is an alerting and dashboard frontend for Graphite
• Under active development by Lucid Software
• Written in Scala using the Play! Framework
• MySQL-backed
• https://github.com/lucidchart/nark
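An alerting frontend ultimately compares Graphite data against a rule. The sketch below is not Nark's code; it assumes a simple "latest value above a threshold" rule and parses the JSON returned by graphite-web's render API (`format=json`), which is a list of series, each with a `target` and `[value, timestamp]` datapoints where missing values are `null`. The function names are hypothetical.

```python
import json
from typing import Optional


def latest_value(series: dict) -> Optional[float]:
    """Return the most recent non-null value of one render-API series."""
    for value, _timestamp in reversed(series["datapoints"]):
        if value is not None:
            return value
    return None


def breached(render_json: str, threshold: float) -> list[str]:
    """Names of targets whose latest value is above the threshold."""
    return [
        series["target"]
        for series in json.loads(render_json)
        if (value := latest_value(series)) is not None and value > threshold
    ]
```

Skipping trailing `null`s matters: the newest bucket is often still empty when the check runs, and treating it as "no data" would hide a real breach.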
Demo
Data Flow Overview
Data Flows IN
• Applications report different types of metrics
• StatsD aggregates metrics
• Carbon-cache gathers and groups metrics
• Whisper stores metrics to disk
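The hand-off between StatsD and carbon can be sketched as: sum the counter lines received during one flush interval, then emit each total in carbon's plaintext protocol (`<path> <value> <timestamp>`, one per line). This is a simplified illustration, not StatsD's implementation; it only handles counters, and the `stats_counts.` prefix is an assumption modeled on StatsD's legacy naming.

```python
from collections import defaultdict


def aggregate_counters(lines: list[str]) -> dict[str, float]:
    """Sum StatsD counter lines ('name:value|c[|@rate]') seen during one flush interval."""
    totals: dict[str, float] = defaultdict(float)
    for line in lines:
        name, rest = line.split(":", 1)
        fields = rest.split("|")
        value, metric_type = float(fields[0]), fields[1]
        if metric_type != "c":
            continue  # timers, gauges and sets are aggregated differently
        if len(fields) > 2 and fields[2].startswith("@"):
            value /= float(fields[2][1:])  # undo client-side sampling
        totals[name] += value
    return dict(totals)


def to_carbon_plaintext(totals: dict[str, float], timestamp: int) -> str:
    """Render totals in carbon's plaintext protocol, one '<path> <value> <timestamp>' line each."""
    return "".join(
        f"stats_counts.{name} {value} {timestamp}\n" for name, value in sorted(totals.items())
    )
```

Every raw increment collapses into one datapoint per metric per flush, which is what keeps the write load on carbon-cache and whisper manageable.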
Data Flows OUT
• User initiates a request over HTTP
• Graphite-web requests information from carbon-cache
• Carbon-cache reads data from disk using whisper
• Graphite-web builds the graph from the data
High Availability & Scaling
StatsD - Options
• We can put StatsD in 3 places:
  • On the reporting server
    • Scales as well as your reporting servers do
    • As available as the reporting servers are
    • Each server only sees its own traffic, so we can't get cluster-wide totals for vital metrics like stats.production.applications.chart.users.login
  • On a central server
    • Doesn't scale
    • Single point of failure
  • On a load-balanced set of servers
    • AWS ELB doesn't listen on UDP
    • One stat will be aggregated in multiple places
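Why aggregating one stat in multiple places is harmful can be shown with a small simulation: spread counter increments round-robin over two StatsD instances, let both flush a partial total for the same metric, and apply carbon-cache's last-in-wins policy. The sketch assumes both flushes land in the same timestamp slot; all names are hypothetical.

```python
def simulate_round_robin(increments: int, statsd_servers: int) -> list[float]:
    """Spread counter increments across servers and return each server's partial sum."""
    partial = [0.0] * statsd_servers
    for i in range(increments):
        partial[i % statsd_servers] += 1
    return partial


def carbon_last_in_wins(writes: list[tuple[str, int, float]]) -> dict[tuple[str, int], float]:
    """Keep only the last value written for each (metric, timestamp) slot."""
    stored: dict[tuple[str, int], float] = {}
    for path, timestamp, value in writes:
        stored[(path, timestamp)] = value
    return stored


partials = simulate_round_robin(increments=10, statsd_servers=2)  # [5.0, 5.0]
writes = [("stats_counts.logins", 1700000000, p) for p in partials]
print(carbon_last_in_wins(writes))  # {('stats_counts.logins', 1700000000): 5.0}
```

Ten logins happened, but only one server's partial total of 5 survives. Routing every metric to a single aggregator, as the solution slides do, avoids this.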
StatsD - Solution
• StatsD with a smart-repeater on the reporting servers
• Accepts UDP and sends TCP for reliability
• Reduces chattiness over the wire
• Allows aggregation to occur at a centralized location
• As scalable and available as the application servers
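The repeater idea (accept UDP locally, forward reliably over TCP, send fewer and larger messages) can be sketched as a buffer that batches incoming lines. This is a hypothetical illustration, not the smart-repeater's code; the `Repeater` class, `max_batch` size and `send_tcp` helper are assumptions, and a real repeater would also call `flush` on a timer.

```python
import socket


class Repeater:
    """Buffer StatsD lines received over UDP and forward them in batches over TCP."""

    def __init__(self, max_batch: int = 500) -> None:
        self.max_batch = max_batch
        self.buffer: list[str] = []

    def receive(self, datagram: bytes) -> list[bytes]:
        """Accept one UDP payload (possibly several lines); return full batches ready to send."""
        self.buffer.extend(line for line in datagram.decode("utf-8").split("\n") if line)
        ready = []
        while len(self.buffer) >= self.max_batch:
            batch, self.buffer = self.buffer[: self.max_batch], self.buffer[self.max_batch :]
            ready.append(("\n".join(batch) + "\n").encode("utf-8"))
        return ready

    def flush(self) -> bytes:
        """Drain whatever is left so partial batches are not delayed forever."""
        payload = ("\n".join(self.buffer) + "\n").encode("utf-8") if self.buffer else b""
        self.buffer = []
        return payload


def send_tcp(payload: bytes, host: str, port: int) -> None:
    """Deliver one batch over a TCP connection (host and port are deployment-specific)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(payload)
```

The application still pays only for a local UDP send, while the hop across the network gets TCP's delivery guarantees and far fewer packets.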
StatsD - Solution
• AWS Elastic Load Balancer distributes traffic to ha-receivers
• HA-receivers:
  • Duplicate and transform metrics
  • Deliver metrics to the correct server for aggregation
  • Are stateless – they scale horizontally
  • Are highly available behind the ELB
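Stateless receivers can still agree on "the correct server" if the choice is a pure function of the metric name. The sketch below hashes the name with MD5 and takes it modulo the number of StatsD servers; the host list and function names are hypothetical, and the ha-receiver's actual routing scheme may differ.

```python
import hashlib

STATSD_SERVERS = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]  # hypothetical hosts


def metric_name(line: str) -> str:
    """'stats.logins:1|c' -> 'stats.logins'."""
    return line.split(":", 1)[0]


def pick_statsd(line: str, servers: list[str] = STATSD_SERVERS) -> str:
    """Map a metric to exactly one server by hashing its name."""
    digest = hashlib.md5(metric_name(line).encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


def route(lines: list[str], servers: list[str] = STATSD_SERVERS) -> dict[str, list[str]]:
    """Group incoming lines by their destination StatsD server."""
    out: dict[str, list[str]] = {server: [] for server in servers}
    for line in lines:
        out[pick_statsd(line, servers)].append(line)
    return out
```

MD5 is used instead of Python's built-in `hash` because it gives the same answer in every process, so any receiver behind the ELB routes a metric identically. Plain modulo reshuffles most metrics when a server is added; consistent hashing would reduce that.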
StatsD - Solution
• HA-receivers pass the data to StatsD
• StatsD does the final aggregation
• Every metric has exactly one StatsD destination
• Aggregated metrics are sent to carbon
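Final aggregation only works when one server sees all samples for a metric, which matters most for timers: a percentile cannot be computed from per-server percentiles. The sketch below summarizes one flush interval of timer samples, loosely following the kind of fields StatsD reports (count, lower, upper, mean, and upper/mean within a percentile); it is a simplified assumption, not StatsD's exact output.

```python
import math


def timer_stats(values: list[float], pct: int = 90) -> dict[str, float]:
    """Summarize one flush interval of timer samples for a single metric."""
    if not values:
        return {"count": 0}
    ordered = sorted(values)
    count = len(ordered)
    total = sum(ordered)
    stats = {"count": count, "lower": ordered[0], "upper": ordered[-1],
             "sum": total, "mean": total / count}
    # Number of the lowest samples that fall inside the percentile (round half up)
    in_threshold = math.floor(pct / 100 * count + 0.5)
    if in_threshold > 0:
        head = ordered[:in_threshold]
        stats[f"upper_{pct}"] = head[-1]
        stats[f"mean_{pct}"] = sum(head) / in_threshold
    return stats
```

With samples split across two aggregators, each would report its own `upper_90`, and no combination of those two numbers recovers the true 90th percentile.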
Carbon & Whisper
• Carbon and whisper direct data to disk
• The daemons are stateless except for their buffers
• Carbon consists of multiple daemons:
  • Carbon-relay: directs traffic to other carbon daemons
  • Carbon-aggregator: a mix between carbon-relay and StatsD
  • Carbon-cache: gathers metrics in a buffer and writes them to disk using whisper
• Whisper is called from carbon-cache and is short-lived
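The "stateless except for buffers" point can be made concrete with a toy carbon-cache buffer: datapoints queue in memory per metric and are drained one metric at a time to a write callback. In real carbon-cache that write goes through whisper (for example its `update_many` call); here `write` is a stand-in, and the most-points-first drain order is an assumption chosen to batch more points per disk write.

```python
from collections import defaultdict
from typing import Callable

Points = list[tuple[int, float]]


class MetricBuffer:
    """In-memory buffer of datapoints, drained to storage in per-metric batches."""

    def __init__(self) -> None:
        self.points: dict[str, Points] = defaultdict(list)

    def store(self, path: str, timestamp: int, value: float) -> None:
        """Queue one datapoint; nothing touches disk yet."""
        self.points[path].append((timestamp, value))

    def drain_one(self, write: Callable[[str, Points], None]) -> bool:
        """Write the metric with the most queued points; return False when the buffer is empty."""
        if not self.points:
            return False
        path = max(self.points, key=lambda p: len(self.points[p]))
        write(path, self.points.pop(path))
        return True
```

Whatever sits in `points` when the process dies is lost, which is exactly the state that the replicated shards described next are meant to protect against.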
Carbon & Whisper
• We chose to use sharding
• Every server holds 1/n of the metrics, where n = number of shards
• All servers in a shard hold the same data
• Syncing data requires a single rsync
• A binary tree of carbon-relays is used to pick a shard
• Adding a new shard is as easy as adding a new node to the binary tree of carbon-relays
• Retrieving data can be done by checking one server from every shard
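Reading from a sharded layout means asking one server per shard and merging the answers; because shards hold disjoint metrics, the merge never has to reconcile conflicting values. The sketch below picks the first healthy replica in each shard; `fan_out`, the health check and the query callback are hypothetical stand-ins for whatever transport is actually used.

```python
from typing import Callable


def fan_out(
    shards: list[list[str]],
    query: Callable[[str], dict[str, list]],
    healthy: Callable[[str], bool],
) -> dict[str, list]:
    """Ask one healthy replica per shard and merge the disjoint results."""
    merged: dict[str, list] = {}
    for replicas in shards:
        server = next((r for r in replicas if healthy(r)), None)
        if server is None:
            raise RuntimeError(f"no healthy replica in shard {replicas}")
        merged.update(query(server))  # shards hold disjoint metrics, so keys never collide
    return merged
```

Any replica in a shard can answer because all of them hold the same data, so losing one server per shard degrades nothing as long as another replica is up.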
Carbon & Whisper
• StatsD sends metrics to the root carbon-relay on localhost
• Carbon-relays are set up in a binary tree to pick a shard
• Every metric goes to exactly one shard
• Every carbon-relay forwards to either 1 shard or 2 relays
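The binary tree can be modeled directly: an internal relay has two children, a leaf relay points at one shard, and each level consumes one bit of a stable hash of the metric name, so every metric reaches exactly one leaf. This is a hypothetical model; real carbon-relays would be configured with their own routing method, which may split traffic differently.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Relay:
    """A relay forwards either to one shard (a leaf) or to two child relays."""
    shard: Optional[str] = None
    left: Optional["Relay"] = None
    right: Optional["Relay"] = None


def pick_shard(root: Relay, metric: str) -> str:
    """Walk from the root relay to a leaf, using one hash bit per level."""
    bits = int.from_bytes(hashlib.md5(metric.encode("utf-8")).digest(), "big")
    node, depth = root, 0
    while node.shard is None:
        node = node.right if (bits >> depth) & 1 else node.left
        depth += 1
    return node.shard


# Shard A receives half the metrics; B and C a quarter each
root = Relay(left=Relay(shard="A"), right=Relay(left=Relay(shard="B"), right=Relay(shard="C")))
```

Adding a shard means turning one leaf into a relay with two leaves, so only that leaf's metrics are split (and rsynced), but an unbalanced tree also gives shards unequal shares of the load.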
Carbon & Whisper
• Carbon-cache receives the metrics from the final relay
• Metrics are written to disk using whisper on localhost
• Carbon-cache has a last-in-wins policy
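Last-in-wins follows from how whisper stores data: each archive has a fixed resolution, a timestamp is aligned down to its slot, and writing the same slot again replaces the old value. The toy class below shows only that behavior; real whisper keeps fixed-size circular archives in a file rather than a dict, and `FixedResolutionArchive` is a hypothetical name.

```python
class FixedResolutionArchive:
    """Toy fixed-interval store: one value per slot, later writes overwrite earlier ones."""

    def __init__(self, seconds_per_point: int) -> None:
        self.step = seconds_per_point
        self.slots: dict[int, float] = {}

    def update(self, timestamp: int, value: float) -> None:
        slot = timestamp - (timestamp % self.step)  # align to the archive's interval
        self.slots[slot] = value  # last write wins


archive = FixedResolutionArchive(seconds_per_point=10)
archive.update(1700000003, 5.0)
archive.update(1700000007, 7.0)  # same 10-second slot as the previous write
print(archive.slots)  # {1700000000: 7.0}
```

This is why each metric must be fully aggregated in one place before it reaches carbon: two partial values for the same slot cannot both be kept.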
graphite-web
• Graphite-web is stateless
• All state is contained within carbon-cache
• Reading data from a highly available, scalable graphite installation is the same as reading from a single server
• Use the same ELB as the ha-receiver
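Because reads look the same as on a single server, clients only need graphite-web's render API: repeated `target` parameters select series, `from` sets the time range, and `format=json` returns data instead of a PNG. The sketch builds such a URL; the host name and the `render_url` helper are hypothetical.

```python
from urllib.parse import urlencode


def render_url(base: str, targets: list[str], since: str = "-1h", fmt: str = "json") -> str:
    """Build a graphite-web render API URL for one or more targets."""
    params = [("target", t) for t in targets] + [("from", since), ("format", fmt)]
    return f"{base.rstrip('/')}/render?{urlencode(params)}"
```

Pointing `base` at the ELB (for example a hypothetical `http://graphite.example.com`) lets any graphite-web instance serve the request, and the response can be fetched with `urllib.request.urlopen` and parsed with `json.loads`.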
Nark
• Nark is stateless
• All state is contained in MySQL and Graphite
• Nark will be no more highly available than your MySQL and Graphite installations
• Use an ELB, an autoscale group, and a multi-AZ RDS instance
Recap
Questions?
Feature Requests?
Thanks For Your Time
Join The Team
• Building the next generation of collaborative web applications
• VC funded
• High growth rate
• Profitable
• Graduates from Harvard, MIT, Stanford
• Former Google, Amazon, Microsoft employees
https://www.golucid.co/jobs
