Cloud Architecture Tutorial

Constructing Cloud Architecture the Netflix Way

Gluecon, May 23rd, 2012

Adrian Cockcroft
@adrianco  #netflixcloud
http://www.linkedin.com/in/adriancockcroft
Tutorial Abstract – Set Context

•  Dispensing with the usual questions: "Why Netflix, why cloud, why AWS?" as they are old hat now.

•  This tutorial explains how developers use the Netflix cloud, and how it is built and operated.

•  The real meat of the tutorial comes when we look at how to construct an application with a host of important properties: elastic, dynamic, scalable, agile, fast, cheap, robust, durable, observable, secure. Over the last three years Netflix has figured out cloud based solutions with these properties, deployed them globally at large scale, and refined them into a global Java oriented Platform as a Service. The PaaS is based on low cost open source building blocks such as Apache Tomcat, Apache Cassandra, and Memcached. Components of this platform are in the process of being open-sourced by Netflix, so that other companies can get a start on building their own customized PaaS that leverages advanced features of AWS and supports rapid agile development.

•  The architecture is described in terms of anti-patterns – things to avoid in the datacenter to cloud transition. A scalable global persistence tier based on Cassandra provides a highly available and durable underpinning. Lessons learned cover solutions to common problems: availability, robustness, and observability. Attendees should leave the tutorial with a clear understanding of what is different about the Netflix cloud architecture, how it empowers and supports developers, and a set of flexible and scalable open source building blocks that can be used to construct their own cloud platform.
Presentation vs. Tutorial

•  Presentation
   –  Short duration, focused subject
   –  One presenter to many anonymous audience
   –  A few questions at the end

•  Tutorial
   –  Time to explore in and around the subject
   –  Tutor gets to know the audience
   –  Discussion, rat-holes, "bring out your dead"
Cloud Tutorial Sections

Intro: Who are you, what are your questions?

Part 1 – Writing and Performing
   Developer Viewpoint

Part 2 – Running the Show
   Operator Viewpoint

Part 3 – Making the Instruments
   Builder Viewpoint
Adrian Cockcroft

•  Director, Architecture for Cloud Systems, Netflix Inc.
   –  Previously Director for Personalization Platform

•  Distinguished Availability Engineer, eBay Inc. 2004-7
   –  Founding member of eBay Research Labs

•  Distinguished Engineer, Sun Microsystems Inc. 1988-2004
   –  2003-4 Chief Architect High Performance Technical Computing
   –  2001 Author: Capacity Planning for Web Services
   –  1999 Author: Resource Management
   –  1995 & 1998 Author: Sun Performance and Tuning
   –  1996 Japanese Edition of Sun Performance and Tuning
      •  SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)

•  Heavy Metal Bass Guitarist in "Black Tiger" 1980-1982
   –  Influenced by Van Halen, Yesterday & Today, AC/DC

•  More
   –  Twitter @adrianco – Blog http://perfcap.blogspot.com
   –  Presentations at http://www.slideshare.net/adrianco
Attendee Introductions

•  Who are you, where do you work
•  Why are you here today, what do you need
•  "Bring out your dead"
   –  Do you have a specific problem or question?
   –  One sentence elevator pitch
•  What instrument do you play?
Writing and Performing
Developer Viewpoint
Part 1 of 3
Van Halen

Audience and Fans
Listen to Songs and Albums
Written and Played by Van Halen
Using Instruments and Studios

Developers
(Toons from gapingvoid.com)

Customers
Use Products
Built by Developers
That run on Infrastructure
Why Use Cloud?

"Runnin' with the Devil – Van Halen"
Things we don't do

"Unchained – Van Halen"
What do developers care about?

"Right Now – Van Halen"
Keeping up with Developer Trends

In production at Netflix:
•  Big Data/Hadoop                              2009
•  Cloud                                        2009
•  Application Performance Management           2010
•  Integrated DevOps Practices                  2010
•  Continuous Integration/Delivery              2010
•  NoSQL                                        2010
•  Platform as a Service                        2010
•  Social coding, open development/github       2011
AWS specific feature dependence….

"Why can't this be love? – Van Halen"
Portability vs. Functionality

•  Portability – the Operations focus
   –  Avoid vendor lock-in
   –  Support datacenter based use cases
   –  Possible operations cost savings

•  Functionality – the Developer focus
   –  Less complex test and debug, one mature supplier
   –  Faster time to market for your products
   –  Possible developer cost savings
Portable PaaS

•  Portable IaaS Base - some AWS compatibility
   –  Eucalyptus – AWS licensed compatible subset
   –  CloudStack – Citrix Apache project
   –  OpenStack – Rackspace, Cloudscaling, HP etc.

•  Portable PaaS
   –  Cloud Foundry - run it yourself in your DC
   –  AppFog and Stackato – Cloud Foundry/OpenStack
   –  Vendor options: Rightscale, Enstratus, Smartscale
Functional PaaS

•  IaaS base - all the features of AWS
   –  Very large scale, mature, global, evolving rapidly
   –  ELB, Autoscale, VPC, SQS, EIP, EMR, DynamoDB etc.
   –  Large files and multipart writes in S3

•  Functional PaaS – based on Netflix features
   –  Very large scale, mature, flexible, customizable
   –  Asgard console, Monkeys, Big data tools
   –  Cassandra/Zookeeper data store automation
Developers choose Functional

Don't let the roadie write the set list!
(yes you do need all those guitars on tour…)
Freedom and Responsibility

•  Developers leverage cloud to get freedom
   –  Agility of a single organization, no silos

•  But now developers are responsible
   –  For compliance, performance, availability etc.

"As far as my rehab is concerned, it is within my ability to change and change for the better - Eddie Van Halen"
Amazon Cloud Terminology Reference
See http://aws.amazon.com/ This is not a full list of Amazon Web Service features

•  AWS – Amazon Web Services (common name for Amazon cloud)
•  AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
•  EC2 – Elastic Compute Cloud
   –  Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
   –  Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
   –  Reserved Instances – pre-paid to reduce cost for long term usage
   –  Availability Zone – datacenter with own power and cooling hosting cloud instances
   –  Region – group of Avail Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
•  ASG – Auto Scaling Group (instances booting from the same AMI)
•  S3 – Simple Storage Service (http access)
•  EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
•  RDS – Relational Database Service (managed MySQL master and slaves)
•  DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
•  SQS – Simple Queue Service (http based message queue)
•  SNS – Simple Notification Service (http and email based topics and messages)
•  EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
•  ELB – Elastic Load Balancer
•  EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
•  VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
•  DirectConnect – secure pipe from AWS VPC to external datacenter
•  IAM – Identity and Access Management (fine grain role based security keys)
Netflix Deployed on AWS

Content (2009):  Content Management; EC2 Encoding; S3 (Petabytes)
Logs (2009):     S3 (Terabytes); EMR; Hive & Pig; Business Intelligence
Play (2010):     DRM; CDN routing; Bookmarks; Logging
WWW (2010):      Sign-Up; Search; Movie Choosing; Ratings
API (2010):      Metadata; Device Config; TV Movie Choosing; Social Facebook
CS (2011):       International CS lookup; Diagnostics & Actions; Customer Call Log; CS Analytics

Delivery: CDNs -> ISPs (Terabits) -> Customers
Datacenter to Cloud Transition Goals

"Go ahead and Jump – Van Halen"

•  Faster
   –  Lower latency than the equivalent datacenter web pages and API calls
   –  Measured as mean and 99th percentile
   –  For both first hit (e.g. home page) and in-session hits for the same user
•  Scalable
   –  Avoid needing any more datacenter capacity as subscriber count increases
   –  No central vertically scaled databases
   –  Leverage AWS elastic capacity effectively
•  Available
   –  Substantially higher robustness and availability than datacenter services
   –  Leverage multiple AWS availability zones
   –  No scheduled down time, no central database schema to change
•  Productive
   –  Optimize agility of a large development team with automation and tools
   –  Leave behind complex tangled datacenter code base (~8 year old architecture)
   –  Enforce clean layered interfaces and re-usable components
Datacenter Anti-Patterns

What do we currently do in the datacenter that prevents us from meeting our goals?

"Me Wise Magic – Van Halen"
Netflix Datacenter vs. Cloud Arch

Central SQL Database          ->  Distributed Key/Value NoSQL
Sticky In-Memory Session      ->  Shared Memcached Session
Chatty Protocols              ->  Latency Tolerant Protocols
Tangled Service Interfaces    ->  Layered Service Interfaces
Instrumented Code             ->  Instrumented Service Patterns
Fat Complex Objects           ->  Lightweight Serializable Objects
Components as Jar Files       ->  Components as Services
The Central SQL Database

•  Datacenter has a central database
   –  Everything in one place is convenient until it fails

•  Schema changes require downtime
   –  Customers, movies, history, configuration

Anti-pattern impacts scalability, availability
The Distributed Key-Value Store

•  Cloud has many key-value data stores
   –  More complex to keep track of, do backups etc.
   –  Each store is much simpler to administer
   –  Joins take place in java code
   –  No schema to change, no scheduled downtime

•  Minimum Latency for Simple Requests
   –  Memcached is dominated by network latency <1ms
   –  Cassandra cross zone replication around one millisecond
   –  DynamoDB replication and auth overheads around 5ms
   –  SimpleDB higher replication and auth overhead >10ms
The Sticky Session

•  Datacenter Sticky Load Balancing
   –  Efficient caching for low latency
   –  Tricky session handling code

•  Encourages concentrated functionality
   –  one service that does everything
   –  Middle tier load balancer had issues in practice

Anti-pattern impacts productivity, availability
Shared Session State

•  Elastic Load Balancer
   –  We don't use the cookie based routing option
   –  External "session caching" with memcached

•  More flexible fine grain services
   –  Any instance can serve any request
   –  Works better with auto-scaled instance counts
Chatty Opaque and Brittle Protocols

•  Datacenter service protocols
   –  Assumed low latency for many simple requests

•  Based on serializing existing java objects
   –  Inefficient formats
   –  Incompatible when definitions change

Anti-pattern causes productivity, latency and availability issues
Robust and Flexible Protocols

•  Cloud service protocols
   –  JSR311/Jersey is used for REST/HTTP service calls
   –  Custom client code includes service discovery
   –  Support complex data types in a single request

•  Apache Avro
   –  Evolved from Protocol Buffers and Thrift
   –  Includes JSON header defining key/value protocol
   –  Avro serialization is half the size and several times faster than Java serialization, more work to code
Persisted Protocols

•  Persist Avro in Memcached
   –  Save space/latency (zigzag encoding, half the size)
   –  New keys are ignored
   –  Missing keys are handled cleanly

•  Avro protocol definitions
   –  Less brittle across versions
   –  Can be written in JSON or generated from POJOs (see the sketch below)
   –  It's hard, needs better tooling
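As a rough flavor of the above, here is a minimal Java sketch using the standard Apache Avro API: a JSON schema header and a compact binary encoding of one record. The schema, record name, and fields are invented for illustration; they are not Netflix's actual protocol definitions.

// Minimal Avro serialization sketch (illustrative schema and field names).
// Requires org.apache.avro:avro on the classpath.
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import java.io.ByteArrayOutputStream;

public class AvroSketch {
    // JSON header defining the key/value protocol; a "default" lets old
    // readers handle a missing key cleanly, and unknown keys are ignored.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Session\",\"fields\":[" +
        "{\"name\":\"userId\",\"type\":\"string\"}," +
        "{\"name\":\"lastSeen\",\"type\":\"long\",\"default\":0}]}";

    public static byte[] serialize() throws Exception {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord record = new GenericData.Record(schema);
        record.put("userId", "adrianco");
        record.put("lastSeen", System.currentTimeMillis());

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
        encoder.flush();
        // Compact zigzag-encoded bytes, ready to hand to a memcached client
        return out.toByteArray();
    }
}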
Tangled Service Interfaces

•  Datacenter implementation is exposed
   –  Oracle SQL queries mixed into business logic

•  Tangled code
   –  Deep dependencies, false sharing

•  Data providers with sideways dependencies
   –  Everything depends on everything else

Anti-pattern affects productivity, availability
Untangled Service Interfaces

•  New Cloud Code With Strict Layering
   –  Compile against interface jar
   –  Can use spring runtime binding to enforce
   –  Fine grain services as components

•  Service interface is the service
   –  Implementation is completely hidden
   –  Can be implemented locally or remotely
   –  Implementation can evolve independently
Untangled Service Interfaces
Poundcake – Van Halen

Two layers (see the sketch below):
•  SAL - Service Access Library
   –  Basic serialization and error handling
   –  REST or POJOs defined by data provider

•  ESL - Extended Service Library
   –  Caching, conveniences, can combine several SALs
   –  Exposes faceted type system (described later)
   –  Interface defined by data consumer in many cases
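A hypothetical Java sketch of the two layers; every name here (VideoSAL, VideoESL, etc.) is invented for illustration and is not an actual Netflix library class.

// Hypothetical sketch of the SAL/ESL layering; names are invented.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ServiceException extends Exception {}

class VideoPojo { /* fields defined by the data provider */ }
class Rating { /* fields defined by the data provider */ }

class VideoSummary {
    final VideoPojo video;
    final Rating rating;
    VideoSummary(VideoPojo video, Rating rating) { this.video = video; this.rating = rating; }
}

// SAL - basic serialization and error handling, defined by the data provider
interface VideoSAL   { VideoPojo getVideoById(long id) throws ServiceException; }
interface RatingsSAL { Rating getRating(long id) throws ServiceException; }

// ESL - caching and conveniences; combines several SALs behind an
// interface shaped by what the data consumer needs.
class VideoESL {
    private final VideoSAL videos;
    private final RatingsSAL ratings;
    private final Map<Long, VideoSummary> cache = new ConcurrentHashMap<Long, VideoSummary>();

    VideoESL(VideoSAL videos, RatingsSAL ratings) {
        this.videos = videos;
        this.ratings = ratings;
    }

    VideoSummary getSummary(long id) throws ServiceException {
        VideoSummary summary = cache.get(id);
        if (summary == null) {
            summary = new VideoSummary(videos.getVideoById(id), ratings.getRating(id));
            cache.put(id, summary); // benign race: both threads compute the same value
        }
        return summary;
    }
}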
Service Interaction Pattern
Sample Swimlane Diagram
Service Architecture Patterns

•  Internal Interfaces Between Services
   –  Common patterns as templates
   –  Highly instrumented, observable, analytics
   –  Service Level Agreements – SLAs

•  Library templates for generic features
   –  Instrumented Netflix Base Servlet template
   –  Instrumented generic client interface template
   –  Instrumented S3, SimpleDB, Memcached clients
[Diagram: a service request instruments every step in the call. On the CLIENT side, timestamps are captured for: request start; client outbound serialize start/end; client network send; client network receive; client inbound deserialize start/end; request end. On the SERVICE side: service network receive; service inbound deserialize start/end; SERVICE execute request start/end; service outbound serialize start/end; service network send.]
Boundary Interfaces

•  Isolate teams from external dependencies
   –  Fake SAL built by cloud team
   –  Real SAL provided by data provider team later
   –  ESL built by cloud team using faceted objects

•  Fake data sources allow development to start
   –  e.g. Fake Identity SAL for a test set of customers
   –  Development solidifies dependencies early
   –  Helps external team provide the right interface
One Object That Does Everything
Can't Get This Stuff No More – Van Halen

•  Datacenter uses a few big complex objects
   –  Good choice for a small team and one instance
   –  Problematic for large teams and many instances

•  False sharing causes tangled dependencies
   –  Movie and Customer objects are foundational
   –  Unproductive re-integration work

Anti-pattern impacting productivity and availability
An Interface For Each Component

•  Cloud uses faceted Video and Visitor
   –  Basic types hold only the identifier
   –  Facets scope the interface you actually need
   –  Each component can define its own facets

•  No false-sharing and dependency chains (sketch below)
   –  Type manager converts between facets as needed
   –  video.asA(PresentationVideo) for www
   –  video.asA(MerchableVideo) for middle tier
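A hypothetical sketch of how a faceted basic type might hang together in Java. The Video/asA names echo the slide, but the type-manager mechanics here are guessed for illustration (with Class tokens added), not the actual implementation.

// Hypothetical facet sketch: the basic type holds only the identifier,
// and asA() asks a type manager to build the facet a component needs.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

interface PresentationVideo { String getTitle(); }        // facet for the www tier
interface MerchableVideo { double getPredictedRating(); } // facet for the middle tier

final class Video {
    private final long id;
    Video(long id) { this.id = id; }
    long getId() { return id; }

    // Convert to the facet the caller actually needs;
    // beware: this conversion may be a lot of work (see next slide).
    <T> T asA(Class<T> facet) { return TypeManager.convert(this, facet); }
}

final class TypeManager {
    interface FacetFactory<T> { T build(Video video); }
    private static final Map<Class<?>, FacetFactory<?>> factories =
        new ConcurrentHashMap<Class<?>, FacetFactory<?>>();

    static <T> void register(Class<T> facet, FacetFactory<T> factory) {
        factories.put(facet, factory);
    }

    @SuppressWarnings("unchecked")
    static <T> T convert(Video video, Class<T> facet) {
        FacetFactory<T> factory = (FacetFactory<T>) factories.get(facet);
        if (factory == null) throw new IllegalArgumentException("no facet: " + facet);
        return factory.build(video); // fetch/assemble only this facet's data
    }
}

// Usage, mirroring the slide:
//   PresentationVideo pv = video.asA(PresentationVideo.class);
//   MerchableVideo    mv = video.asA(MerchableVideo.class);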
Stan Lanning's Soap Box
Listen to the bearded guru…

•  Business Level Object - Level Confusion
   –  Don't pass around IDs when you mean to refer to the BLO

•  Using Basic Types helps the compiler help you
   –  Compile time problems are better than run time problems

•  More readable by people
   –  But beware that asA operations may be a lot of work

•  Multiple-inheritance for Java?
   –  Kinda-sorta…
Model Driven Architecture

•  Traditional Datacenter Practices
   –  Lots of unique hand-tweaked systems
   –  Hard to enforce patterns
   –  Some use of Puppet to automate changes

•  Model Driven Cloud Architecture
   –  Perforce/Ivy/Jenkins based builds for everything
   –  Every production instance is a pre-baked AMI
   –  Every application is managed by an Autoscaler

Every change is a new AMI
Netflix PaaS Principles

•  Maximum Functionality
   –  Developer productivity and agility

•  Leverage as much of AWS as possible
   –  AWS is making huge investments in features/scale

•  Interfaces that isolate Apps from AWS
   –  Avoid lock-in to specific AWS API details

•  Portability is a long term goal
   –  Gets easier as other vendors catch up with AWS
Netflix Global PaaS Features

•  Supports all AWS Availability Zones and Regions
•  Supports multiple AWS accounts {test, prod, etc.}
•  Cross Region/Acct Data Replication and Archiving
•  Internationalized, Localized and GeoIP routing
•  Security is fine grain, dynamic AWS keys
•  Autoscaling to thousands of instances
•  Monitoring for millions of metrics
•  Productive for 100s of developers on one product
•  25M+ users USA, Canada, Latin America, UK, Eire
Basic PaaS Entities

•  AWS Based Entities
   –  Instances and Machine Images, Elastic IP Addresses
   –  Security Groups, Load Balancers, Autoscale Groups
   –  Availability Zones and Geographic Regions

•  Netflix PaaS Entities
   –  Applications (registered services)
   –  Clusters (versioned Autoscale Groups for an App)
   –  Properties (dynamic hierarchical configuration)
Core PaaS Services

•  AWS Based Services
   –  S3 storage, to 5TB files, parallel multipart writes
   –  SQS – Simple Queue Service. Messaging layer.

•  Netflix Based Services
   –  EVCache – memcached based ephemeral cache
   –  Cassandra – distributed persistent data store

•  External Services
   –  GeoIP Lookup interfaced to a vendor
   –  Secure Keystore HSM
Instance Architecture

Linux Base AMI (CentOS or Ubuntu)
•  Optional Apache frontend, memcached, non-java apps
•  Monitoring: log rotation to S3, AppDynamics machineagent, Epic
•  Java (JDK 6 or 7) with AppDynamics appagent monitoring, GC and thread dump logging
•  Tomcat
   –  Application war file, base servlet, platform, interface jars for dependent services
   –  Healthcheck, status servlets, JMX interface, Servo autoscale
Security Architecture

•  Instance Level Security baked into base AMI
   –  Login: ssh only allowed via portal (not between instances)
   –  Each app type runs as its own userid app{test|prod}

•  AWS Security, Identity and Access Management
   –  Each app has its own security group (firewall ports)
   –  Fine grain user roles and resource ACLs

•  Key Management
   –  AWS Keys dynamically provisioned, easy updates
   –  High grade app specific key management support
Continuous Integration / Release
Lightweight process scales as the organization grows

•  No centralized two-week sprint/release "train"
•  Thousands of builds a day, tens of releases
•  Engineers release at their own pace
•  Unit of release is a web service, over 200 so far…
•  Dependencies handled as exceptions
Hello World?
Getting started for a new developer…

•  Register the "helloadrian" app name in Asgard
•  Get the example helloworld code from perforce
•  Edit some properties to update the name etc.
•  Check-in the changes
•  Clone a Jenkins build job
•  Build the code
•  Bake the code into an Amazon Machine Image
•  Use Asgard to setup an AutoScaleGroup with the AMI
•  Check instance healthcheck is "Up" using Asgard
•  Hit the URL to get "HTTP 200, Hello" back
Register new application name

naming rules: all lower case with underscore, no spaces or dashes
Portals and Explorers

•  Netflix Application Console (Asgard/NAC)
   –  Primary AWS provisioning/config interface
•  AWS Usage Analyzer
   –  Breaks down costs by application and resource
•  Cassandra Explorer
   –  Browse clusters, keyspaces, column families
•  Base Server Explorer
   –  Browse service endpoints configuration, perf
AWS Usage
for test, carefully omitting any $ numbers…
Platform Services

•  Discovery – service registry for "Applications"
•  Introspection – Entrypoints
•  Cryptex – Dynamic security key management
•  Geo – Geographic IP lookup
•  Configuration Service – Dynamic properties
•  Localization – manage and lookup local translations
•  Evcache – ephemeral volatile cache
•  Cassandra – Cross zone/region distributed data store
•  Zookeeper – Distributed Coordination (Curator)
•  Various proxies – access to old datacenter stuff
Introspection - Entrypoints

•  REST API for tools, apps, explorers, monkeys…
   –  E.g. GET /REST/v1/instance/$INSTANCE_ID

•  AWS Resources
   –  Autoscaling Groups, EIP Groups, Instances

•  Netflix PaaS Resources
   –  Discovery Applications, Clusters of ASGs, History

•  Full History of all Resources
   –  Supports Janitor Monkey cleanup of unused resources
Entrypoints Queries
MongoDB used for low traffic complex queries against complex objects

Description                                          Range expression
Find all active instances.                           all()
Find all instances associated with a group name.     %(cloudmonkey)
Find all instances associated with a discovery       /^cloudmonkey$/discovery()
group.
Find all auto scale groups with no instances.        asg(),-has(INSTANCES;asg())
How many instances are not in an auto scale group?   count(all(),-info(eval(INSTANCES;asg())))
What groups include an instance?                     *(i-4e108521)
What auto scale groups and elastic load balancers    filter(TYPE;asg,elb;*(i-4e108521))
include an instance?
What instance has a given public ip?                 filter(PUBLIC_IP;174.129.188.{0..255};all())
Metrics Framework

•  System and Application
   –  Collection, Aggregation, Querying and Reporting
   –  Non-blocking logging, avoids log4j lock contention
   –  Honu-Streaming -> S3 -> EMR -> Hive

•  Performance, Robustness, Monitoring, Analysis
   –  Tracers, Counters – explicit code instrumentation log
   –  SLA – service level response time percentiles
   –  Servo annotated JMX extract to Cloudwatch (sketch below)

•  Latency Testing and Inspection Infrastructure
   –  Latency Monkey injects random delays and errors into service responses
   –  Base Server Explorer inspects client timeouts
   –  Global property management to change client timeouts
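A minimal sketch of what the annotation-driven style might look like, assuming the Servo library's @Monitor annotations are on the classpath; the class and metric names are examples, not Netflix production code.

// Sketch of Servo-style annotated metrics exposed over JMX, where a
// poller can extract them and forward to CloudWatch (names are examples).
import com.netflix.servo.annotations.DataSourceType;
import com.netflix.servo.annotations.Monitor;
import com.netflix.servo.monitor.Monitors;
import java.util.concurrent.atomic.AtomicInteger;

public class RequestStats {
    @Monitor(name = "requestCount", type = DataSourceType.COUNTER)
    private final AtomicInteger requestCount = new AtomicInteger(0);

    @Monitor(name = "activeSessions", type = DataSourceType.GAUGE)
    private final AtomicInteger activeSessions = new AtomicInteger(0);

    public RequestStats() {
        // Registers the annotated fields so they show up via JMX
        Monitors.registerObject("requestStats", this);
    }

    public void onRequest() { requestCount.incrementAndGet(); }
}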
Interprocess Communication

•  Discovery Service registry for "applications" (heartbeat sketch below)
   –  "here I am" call every 30s, drop after 3 missed
   –  "where is everyone" call
   –  Redundant, distributed, moving to Zookeeper

•  NIWS – Netflix Internal Web Service client
   –  Software Middle Tier Load Balancer
   –  Failure retry moves to next instance
   –  Many options for encoding, etc.
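To make the registration contract concrete, a tiny hypothetical sketch of the "here I am" heartbeat loop; the DiscoveryClient interface is invented for illustration, not the actual client.

// Hypothetical heartbeat sketch: renew every 30s; the registry drops the
// instance after 3 missed heartbeats (~90s). Interface name is invented.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class HeartbeatTask {
    interface DiscoveryClient {
        void renewLease(String appName, String instanceId); // "here I am"
    }

    static void start(final DiscoveryClient client,
                      final String appName, final String instanceId) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                client.renewLease(appName, instanceId);
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}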
Security Key Management

•  AKMS
   –  Dynamic Key Management interface
   –  Update AWS keys at runtime, no restart
   –  All keys stored securely, none on disk or in AMI

•  Cryptex - Flexible key store
   –  Low grade keys processed in client
   –  Medium grade keys processed by Cryptex service
   –  High grade keys processed by hardware (Ingrian)
AWS Persistence Services

•  SimpleDB
   –  Got us started, migrated to Cassandra now
   –  NFSDB - Instrumented wrapper library
   –  Domain and Item sharding (workarounds)

•  S3
   –  Upgraded/Instrumented JetS3t based interface
   –  Supports multipart upload and 5TB files
   –  Global S3 endpoint management
Netflix Platform Persistence

•  Ephemeral Volatile Cache – evcache
   –  Discovery-aware memcached based backend
   –  Client abstractions for zone aware replication
   –  Option to write to all zones, fast read from local

•  Cassandra
   –  Highly available and scalable (more later…)
•  MongoDB
   –  Complex object/query model for small scale use
•  MySQL
   –  Hard to scale, legacy and small relational models
Priam – Cassandra Automation
Available at http://github.com/netflix

•  Netflix Platform Tomcat Code
•  Zero touch auto-configuration
•  State management for Cassandra JVM
•  Token allocation and assignment
•  Broken node auto-replacement
•  Full and incremental backup to S3
•  Restore sequencing from S3
•  Grow/Shrink Cassandra "ring"
Astyanax
Available at http://github.com/netflix

•  Cassandra java client
•  API abstraction on top of Thrift protocol
•  "Fixed" Connection Pool abstraction (vs. Hector)
   –  Round robin with Failover
   –  Retry-able operations not tied to a connection
   –  Netflix PaaS Discovery service integration
   –  Host reconnect (fixed interval or exponential backoff)
   –  Token aware to save a network hop – lower latency
   –  Latency aware to avoid compacting/repairing nodes – lower variance
•  Batch mutation: set, put, delete, increment
•  Simplified use of serializers via method overloading (vs. Hector)
•  ConnectionPoolMonitor interface for counters and tracers
•  Composite Column Names replacing deprecated SuperColumns
Astyanax Query Example
Paginate through all columns in a row

// imports from com.netflix.astyanax: ColumnList, Column, RowQuery,
// RangeBuilder, ConnectionException
ColumnList<String> columns;
int pagesize = 10;
try {
    RowQuery<String, String> query = keyspace
        .prepareQuery(CF_STANDARD1)
        .getKey("A")
        .setIsPaginating()
        .withColumnRange(new RangeBuilder().setMaxSize(pagesize).build());

    // Each execute() returns the next page, until an empty result ends the loop
    while (!(columns = query.execute().getResult()).isEmpty()) {
        for (Column<String> c : columns) {
            // process each column in the current page
        }
    }
} catch (ConnectionException e) {
    // handle connection pool / Cassandra failures
}
High Availability

•  Cassandra stores 3 local copies, 1 per zone
   –  Synchronous access, durable, highly available
   –  Read/Write One: fastest, least consistent - ~1ms
   –  Read/Write Quorum: 2 of 3, consistent - ~3ms (sketch below)

•  AWS Availability Zones
   –  Separate buildings
   –  Separate power etc.
   –  Fairly close together
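A short sketch of how a caller might pick between these trade-offs per operation, assuming the open-sourced Astyanax client API; the keyspace, column family, row and column names are illustrative, as in the earlier query example.

// Sketch: choosing consistency per write with Astyanax (illustrative names).
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.MutationBatch;
import com.netflix.astyanax.connectionpool.exceptions.ConnectionException;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ConsistencyLevel;

class ConsistencySketch {
    static void writeBothWays(Keyspace keyspace,
                              ColumnFamily<String, String> cf) throws ConnectionException {
        // Fastest, least consistent: ack from one replica (~1ms)
        MutationBatch fast = keyspace.prepareMutationBatch()
                .setConsistencyLevel(ConsistencyLevel.CL_ONE);
        fast.withRow(cf, "A").putColumn("lastSeen", System.currentTimeMillis(), null);
        fast.execute();

        // Consistent: quorum, 2 of 3 replicas must ack (~3ms)
        MutationBatch quorum = keyspace.prepareMutationBatch()
                .setConsistencyLevel(ConsistencyLevel.CL_QUORUM);
        quorum.withRow(cf, "A").putColumn("lastSeen", System.currentTimeMillis(), null);
        quorum.execute();
    }
}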
  
“TradiMonal”	
  Cassandra	
  Write	
  Data	
  Flows	
  
            Single	
  Region,	
  MulMple	
  Availability	
  Zone,	
  Not	
  Token	
  Aware	
  

                                                               Cassandra	
  
                                                               • Disks	
  
                                                               • Zone	
  A	
  
                                                              2	
                 2	
  
                                                                        4	
   2	
  
1.  Client	
  Writes	
  to	
  any	
     Cassandra	
  3	
                                  3	
  
                                                                                           Cassandra	
         If	
  a	
  node	
  goes	
  offline,	
  
    Cassandra	
  Node	
                 • Disks	
   5                                      • Disks	
   5	
     hinted	
  handoff	
  
2.  Coordinator	
  Node	
               • Zone	
  C	
                  1                   • Zone	
  A	
       completes	
  the	
  write	
  
    replicates	
  to	
  nodes	
                                                                                when	
  the	
  node	
  comes	
  
    and	
  Zones	
  
                                                             Non	
  Token	
                                    back	
  up.	
  
3.  Nodes	
  return	
  ack	
  to	
  
                                                              Aware	
                                          	
  
    coordinator	
                                             Clients	
                                        Requests	
  can	
  choose	
  to	
  
4.  Coordinator	
  returns	
                                                                 3	
               wait	
  for	
  one	
  node,	
  a	
  
                                        Cassandra	
                                        Cassandra	
  
    ack	
  to	
  client	
               • Disks	
                                          • Disks	
   5	
     quorum,	
  or	
  all	
  nodes	
  to	
  
5.  Data	
  wri=en	
  to	
              • Zone	
  C	
                                      • Zone	
  B	
       ack	
  the	
  write	
  
    internal	
  commit	
  log	
                                                                                	
  
    disk	
  (no	
  more	
  than	
                              Cassandra	
                                     SSTable	
  disk	
  writes	
  and	
  
                                                               • Disks	
  
    10	
  seconds	
  later)	
                                  • Zone	
  B	
  
                                                                                                               compacMons	
  occur	
  
                                                                                                               asynchronously	
  
Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware

1.  Client writes to nodes and zones
2.  Nodes return ack to client
3.  Data written to internal commit log disks (no more than 10 seconds later)

If a node goes offline, hinted handoff completes the write when the node comes back up.
Requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.

[Diagram: token aware clients writing directly to the replica nodes across Zones A, B and C, skipping the coordinator hop.]
Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum

1.  Client writes to local replicas
2.  Local write acks returned to Client, which continues when 2 of 3 local nodes are committed
3.  Local coordinator writes to remote coordinator
4.  When data arrives, remote coordinator node acks and copies to other remote zones
5.  Remote nodes ack to local coordinator
6.  Data flushed to internal commit log disks (no more than 10 seconds later)

If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.

[Diagram: US and EU clusters, each with nodes across Zones A, B and C, linked by 100+ms latency between US and EU clients/regions.]
Part 2. Running the Show
Operator Viewpoint
Rules of the Roadie

•  Don't lose stuff
•  Make sure it scales
•  Figure out when it breaks and what broke
•  Yell at the right guy to fix it
•  Keep everything organized
Cassandra Backup

•  Full Backup
   –  Time based snapshot
   –  SSTable compress -> S3

•  Incremental
   –  SSTable write triggers compressed copy to S3

•  Archive
   –  Copy cross region

[Diagram: a ring of Cassandra nodes backing up to S3, with an archive copy to another region.]
ETL for Cassandra

•  Data is de-normalized over many clusters!
•  Too many to restore from backups for ETL
•  Solution – read backup files using Hadoop
•  Aegisthus
   –  http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
   –  High throughput raw SSTable processing
   –  Re-normalizes many clusters to a consistent view
   –  Extract, Transform, then Load into Teradata
Cassandra Archive
Appropriate level of paranoia needed…

•  Archive could be un-readable
   –  Restore S3 backups weekly from prod to test, and daily ETL

•  Archive could be stolen
   –  PGP Encrypt archive

•  AWS East Region could have a problem
   –  Copy data to AWS West

•  Production AWS Account could have an issue
   –  Separate Archive account with no-delete S3 ACL

•  AWS S3 could have a global problem
   –  Create an extra copy on a different cloud vendor….
Tools and Automation

•  Developer and Build Tools
   –  Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
   –  Builds, creates .war file, .rpm, bakes AMI and launches

•  Custom Netflix Application Console
   –  AWS Features at Enterprise Scale (hide the AWS security keys!)
   –  Auto Scaler Group is unit of deployment to production

•  Open Source + Support
   –  Apache, Tomcat, Cassandra, Hadoop
   –  Datastax support for Cassandra, AWS support for Hadoop via EMR

•  Monitoring Tools
   –  Alert processing gateway into Pagerduty
   –  AppDynamics – Developer focus for cloud http://appdynamics.com
Scalability Testing

•  Cloud Based Testing – frictionless, elastic
   –  Create/destroy any sized cluster in minutes
   –  Many test scenarios run in parallel
•  Test Scenarios
   –  Internal app specific tests
   –  Simple "stress" tool provided with Cassandra
•  Scale test, keep making the cluster bigger
   –  Check that tooling and automation works…
   –  How many ten column row writes/sec can we do?

<DrEvil>ONE MILLION</DrEvil>
  
Scale-Up Linearity

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

(Chart: Client Writes/s by node count, Replication Factor = 3. Throughput scales near-linearly with cluster size, through 174,373 then 366,828 then 537,172 and finally 1,099,837 client writes/s as the node count approaches 300.)
Availability and Resilience
  
Chaos Monkey

•  Computers (Datacenter or AWS) randomly die
   –  Fact of life, but too infrequent to test resiliency
•  Test to make sure systems are resilient
   –  Allow any instance to fail without customer impact
•  Chaos Monkey hours
   –  Monday-Thursday 9am-3pm random instance kill
•  Application configuration option
   –  Apps now have to opt-out from Chaos Monkey
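The core mechanic is small enough to sketch. This is not the real Chaos Monkey (which was still being cleaned up for release), just an illustrative stand-in that terminates one random instance via the EC2 API during the stated hours; the credentials are placeholders.

    import java.util.Calendar;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.TerminateInstancesRequest;

    public class MiniChaosMonkey {
        private final AmazonEC2Client ec2 =
            new AmazonEC2Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
        private final Random random = new Random();

        // Kill one random candidate, but only Mon-Thu 9am-3pm so people are around
        public void unleash(List<String> candidateInstanceIds) {
            Calendar now = Calendar.getInstance();
            int day = now.get(Calendar.DAY_OF_WEEK);
            int hour = now.get(Calendar.HOUR_OF_DAY);
            boolean monkeyHours = day >= Calendar.MONDAY && day <= Calendar.THURSDAY
                    && hour >= 9 && hour < 15;
            if (!monkeyHours || candidateInstanceIds.isEmpty()) return;

            String victim =
                candidateInstanceIds.get(random.nextInt(candidateInstanceIds.size()));
            ec2.terminateInstances(
                new TerminateInstancesRequest(Collections.singletonList(victim)));
        }
    }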
  
Responsibility and Experience

•  Make developers responsible for failures
   –  Then they learn and write code that doesn't fail
•  Use Incident Reviews to find gaps to fix
   –  Make sure it's not about finding "who to blame"
•  Keep timeouts short, fail fast
   –  Don't let cascading timeouts stack up
•  Make configuration options dynamic
   –  You don't want to push code to tweak an option
  
Resilient Design – Circuit Breakers

http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
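The pattern in that post wraps each dependency call so a failing service is cut off quickly and served from a fallback, instead of tying up request threads on timeouts. Below is a minimal single-threaded sketch of the idea; the production version adds thread pools, metrics and dynamic configuration.

    public class CircuitBreaker {
        enum State { CLOSED, OPEN }

        private final int failureThreshold;
        private final long retryAfterMillis;
        private State state = State.CLOSED;
        private int consecutiveFailures = 0;
        private long openedAt = 0;

        public CircuitBreaker(int failureThreshold, long retryAfterMillis) {
            this.failureThreshold = failureThreshold;
            this.retryAfterMillis = retryAfterMillis;
        }

        /** Run the call if the circuit is closed, otherwise return the fallback fast. */
        public <T> T execute(java.util.concurrent.Callable<T> call, T fallback) {
            if (state == State.OPEN) {
                if (System.currentTimeMillis() - openedAt < retryAfterMillis) {
                    return fallback;            // fail fast, don't stack up timeouts
                }
                state = State.CLOSED;           // half-open: let one call probe
            }
            try {
                T result = call.call();
                consecutiveFailures = 0;
                return result;
            } catch (Exception e) {
                if (++consecutiveFailures >= failureThreshold) {
                    state = State.OPEN;         // trip the breaker
                    openedAt = System.currentTimeMillis();
                }
                return fallback;
            }
        }
    }

Calls go through breaker.execute(...), so once a dependency trips the breaker, requests get the fallback immediately until the retry window lets a probe through.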
  
PaaS Operational Model

•  Developers
   –  Provision and run their own code in production
   –  Take turns to be on call if it breaks (pagerduty)
   –  Configure autoscalers to handle capacity needs
•  DevOps and PaaS (aka NoOps)
   –  DevOps is used to build and run the PaaS
   –  PaaS constrains Dev to use automation instead
   –  PaaS puts more responsibility on Dev, with tools
  
What's Left for Corp IT?

•  Corporate Security and Network Management
   –  Billing and remnants of streaming service back-ends in DC
•  Running Netflix' DVD Business
   –  Tens of Oracle instances
   –  Hundreds of MySQL instances
   –  Thousands of VMWare VMs
   –  Zabbix, Cacti, Splunk, Puppet
•  Employee Productivity
   –  Building networks and WiFi
   –  SaaS OneLogin SSO Portal
   –  Evernote Premium, Safari Online Bookshelf, Dropbox for Teams
   –  Google Enterprise Apps, Workday HCM/Expense, Box.com
   –  Many more SaaS migrations coming…

(Chart: Corp WiFi Performance)
  
Implications for IT Operations

•  Cloud is run by developer organization
   –  Product group's "IT department" is the AWS API and PaaS
   –  CorpIT handles billing and some security functions
•  Cloud capacity is 10x bigger than Datacenter
   –  Datacenter oriented IT didn't scale up as we grew
   –  We moved a few people out of IT to do DevOps for our PaaS
•  Traditional IT Roles and Silos are going away
   –  We don't have SA, DBA, Storage, Network admins for cloud
   –  Developers deploy and "run what they wrote" in production
  
Netflix PaaS Organization

Developer Org Reporting into Product Development, not ITops

(Org chart: the Netflix Cloud Platform Team spans Cloud Ops Reliability Engineering, Build Tools and Automation, Platform and Architecture, Cloud Database Engineering, Cloud Performance and Cloud Solutions. Responsibilities called out include alert routing and incident lifecycle via PagerDuty, Perforce/Jenkins/Artifactory/JIRA, Base AMI and Bakery, the Netflix App Console, security architecture, future planning and efficiency, platform jars, key store, Zookeeper, Cassandra monitoring and benchmarking, JVM GC tuning, Wiresharking, Entrypoints, the Monkeys, and Hyperguard plus AWS VPC access to the AWS API and instances. And Powerpoint ☺)
  
Part 3. Making the Instruments
Builder Viewpoint
  
Components

•  Continuous build framework turns code into AMIs
•  AWS accounts for test, production, etc.
•  Cloud access gateway
•  Service registry
•  Configuration properties service
•  Persistence services
•  Monitoring, alert forwarding
•  Backups, archives
  
Common Build Framework

Extracted from
"Building and Deploying Netflix in the Cloud"
by @bmoyles and @garethbowles
On slideshare.net/netflix
  	
  
Build Pipeline

(Diagram: Jenkins runs the CBF steps: sync, check, resolve, compile, build, test, publish, report. Source comes from Perforce or GitHub, library dependencies resolve from Artifactory, and built artifacts are published back to Artifactory and on to yum.)
  
Jenkins Architecture

(Diagram: a single Jenkins master, Red Hat Linux on 2x quad core x86_64 with 26G RAM, drives a standard group of m1.xlarge x86_64 Amazon Linux slaves in the us-west-1 VPC, plus a custom group of ~40 custom and ad-hoc slaves of miscellaneous O/S and architectures, maintained by product teams, in the Netflix data center and office.)
  
Other Uses of Jenkins

•  Maintenance of test and prod Cassandra clusters
•  Automated integration tests for bake and deploy
•  Production bake and deployment
•  Housekeeping of the build / deploy infrastructure
  
Netflix Extensions to Jenkins

•  Job DSL plugin: allow jobs to be set up with minimal definition, using templates and a Groovy-based DSL
•  Housekeeping and maintenance processes implemented as Jenkins jobs, system Groovy scripts
  
The DynaSlave Plugin
What We Have

•  Exposes a new endpoint in Jenkins that EC2 instances in VPC use for registration
•  Allows a slave to name itself, label itself, tell Jenkins how many executors it can support
•  EC2 == Ephemeral. Disconnected nodes that are gone for > 30 mins are reaped
•  Sizing handled by EC2 ASGs, tweaks passed through via user data (labels, names, etc)
  
The DynaSlave Plugin
What's Next

•  Enhanced security/registration of nodes
•  Dynamic resource management
   –  have Jenkins respond to build demand
•  Slave groups
   –  Allows us to create specialized pools of build nodes
•  Refresh mechanism for slave tools
   –  JDKs, Ant versions, etc.
•  Give it back to the community
   –  watch techblog.netflix.com!
  
The Bakery

•  Create base AMIs
   –  We have CentOS, Ubuntu and Windows base AMIs
   –  All the generic code, apache, tomcat etc.
   –  Standard system and application monitoring tools
   –  Update ~monthly with patches and new versions
•  Add yummy topping and bake
   –  Build app specific AMI including all code etc.
   –  Bakery mounts EBS snapshot, installs and bakes
   –  One bakery per region, delivers into paastest
   –  Tweak config and publish AMI to paasprod
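As a rough illustration of the final step, the baked root volume snapshot gets registered as a bootable AMI through the EC2 API. This is a toy sketch, not the Netflix bakery; the device path, naming scheme and credentials are assumptions.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.BlockDeviceMapping;
    import com.amazonaws.services.ec2.model.EbsBlockDevice;
    import com.amazonaws.services.ec2.model.RegisterImageRequest;

    public class Bakery {
        private final AmazonEC2Client ec2 =
            new AmazonEC2Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        /** Register a snapshot of the baked root volume as a bootable AMI. */
        public String registerBakedAmi(String snapshotId, String appName, String version) {
            RegisterImageRequest req = new RegisterImageRequest()
                .withName(appName + "-" + version)      // e.g. helloworld-1.0.3
                .withRootDeviceName("/dev/sda1")
                .withBlockDeviceMappings(new BlockDeviceMapping()
                    .withDeviceName("/dev/sda1")
                    .withEbs(new EbsBlockDevice().withSnapshotId(snapshotId)));
            return ec2.registerImage(req).getImageId(); // delivered into paastest
        }
    }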
  
AWS Accounts

Accounts Isolate Concerns

•  paastest – for development and testing
   –  Fully functional deployment of all services
   –  Developer tagged "stacks" for separation
•  paasprod – for production
   –  Autoscale groups only, isolated instances are terminated
   –  Alert routing, backups enabled by default
•  paasaudit – for sensitive services
   –  To support SOX, PCI, etc.
   –  Extra access controls, auditing
•  paasarchive – for disaster recovery
   –  Long term archive of backups
   –  Different region, perhaps different vendor
  
Reservations and Billing

•  Consolidated Billing
   –  Combine all accounts into one bill
   –  Pooled capacity for bigger volume discounts
   http://docs.amazonwebservices.com/AWSConsolidatedBilling/1.0/AWSConsolidatedBillingGuide.html

•  Reservations
   –  Save up to 71% on your baseline load
   –  Priority when you request reserved capacity
   –  Unused reservations are shared across accounts
	
  
Cloud Access Gateway

•  Datacenter or office based
   –  A separate VM for each AWS account
   –  Two per account for high availability
   –  Mount NFS shared home directories for developers
   –  Instances trust the gateway via a security group
•  Manage how developers login to cloud
   –  Access control via ldap group membership
   –  Audit logs of every login to the cloud
   –  Similar to awsfabrictasks ssh wrapper
   http://readthedocs.org/docs/awsfabrictasks/en/latest/
  
Cloud Access Control

(Diagram: developers ssh into the Cloud Access Gateway, which logs in to www-prod as userid wwwprod, to dal-prod as userid dalprod, and to cass-prod as userid cassprod. Security groups don't allow ssh between instances.)
  
Now Add Code

Netflix has open sourced a lot of
what you need, more is on the way…
  	
  
Netflix Open Source Strategy

•  Release PaaS Components git-by-git
   –  Source at github.com/netflix – we build from it…
   –  Intros and techniques at techblog.netflix.com
   –  Blog post or new code every few weeks
•  Motivations
   –  Give back to Apache licensed OSS community
   –  Motivate, retain, hire top engineers
   –  "Peer pressure" code cleanup, external contributions
  
Open Source Projects and Posts

(Grid of projects, each flagged per the legend as Github / Techblog, Apache Contributions, Techblog Post, or Coming Soon)

•  Priam – Cassandra as a Service
•  Exhibitor – Zookeeper as a Service
•  Servo and Autoscaling Scripts
•  Astyanax – Cassandra client for Java
•  Honu – Log4j streaming to Hadoop
•  Curator – Zookeeper Patterns
•  CassJMeter – Cassandra test suite
•  EVCache – Memcached as a Service
•  Circuit Breaker – Robust service pattern
•  Cassandra – Multi-region EC2 datastore support
•  Asgard – AutoScaleGroup based AWS console
•  Discovery Service – Directory
•  Aegisthus – Hadoop ETL for Cassandra
•  Configuration Properties Service
•  Chaos Monkey – Robustness verification
  
Asgard

Not quite out yet…

•  Runs in a VM in our datacenter
   –  So it can deploy to an empty account
   –  Groovy/Grails/JVM based
   –  Supports all AWS regions on a global basis
•  Hides the AWS credentials
   –  Use AWS IAM to issue restricted keys for Asgard
   –  Each Asgard instance manages one account
   –  One install each for paastest, paasprod, paasaudit
  
"Discovery" - Service Directory

•  Map an instance to a service type
   –  Load balance over clusters of instances
   –  Private namespace, so DNS isn't useful
   –  Foundation service, first to deploy
•  Highly available distributed coordination
   –  Deploy one Apache Zookeeper instance per zone
   –  Netflix Curator includes simple discovery service
   –  Netflix Exhibitor manages Zookeeper reliably
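Curator's discovery recipe builds on Zookeeper ephemeral nodes: an instance registers itself under its service's path, and the entry disappears automatically if the instance dies. A bare-bones sketch with the raw Zookeeper client follows (Curator wraps this with retries and a cleaner API); the connect string, service path and payload are invented, and the parent znodes are assumed to exist already.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class ServiceRegistration {
        public static void main(String[] args) throws Exception {
            // Connect string is invented; one Zookeeper per zone as above
            ZooKeeper zk = new ZooKeeper("zk-a:2181,zk-b:2181,zk-c:2181", 30000, null);

            // Ephemeral node: deleted automatically when this instance's session
            // ends, so dead instances drop out of the directory with no cleanup job.
            String path = zk.create("/services/helloworld/instance-",
                    "10.1.2.3:7001".getBytes("UTF-8"),      // host:port payload
                    ZooDefs.Ids.OPEN_ACL_UNSAFE,
                    CreateMode.EPHEMERAL_SEQUENTIAL);
            System.out.println("Registered as " + path);

            // A client lists the children and load balances across them
            for (String instance : zk.getChildren("/services/helloworld", false)) {
                System.out.println("Available: " + instance);
            }
        }
    }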
  
Configuration Properties Service

•  Dynamic hierarchical & propagates in seconds
   –  Client timeouts, feature set enables
   –  Region specific service endpoints
   –  Cassandra token assignments etc. etc.
•  Used to configure everything
   –  So everything depends on it…
   –  Coming soon to github
   –  Pluggable backend storage interface
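A sketch of the essential trick, not the Netflix implementation: callers re-read a value that a background task refreshes from a (hypothetical) properties service, so changing an option takes effect in seconds with no code push.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class DynamicProperties {
        private final Map<String, String> current = new ConcurrentHashMap<String, String>();

        public DynamicProperties() {
            // Poll the (hypothetical) properties service every few seconds;
            // the real system layers hierarchical scopes (global/region/app/stack)
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
                public void run() { current.putAll(fetchFromPropertiesService()); }
            }, 0, 10, TimeUnit.SECONDS);
        }

        /** Callers re-read on every use, so updates propagate without restarts. */
        public int getInt(String name, int defaultValue) {
            String v = current.get(name);
            return v == null ? defaultValue : Integer.parseInt(v);
        }

        private Map<String, String> fetchFromPropertiesService() {
            // Stub: would GET the backing store (e.g. a REST endpoint) here
            return new ConcurrentHashMap<String, String>();
        }
    }

A client timeout then becomes props.getInt("client.readTimeout", 2000), evaluated per request instead of being a constant baked into the AMI (property name is an example).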
  
	
  
Persistence services

•  Use SimpleDB as a bootstrap
   –  Good use case for DynamoDB or SimpleDB
•  Netflix Priam
   –  Cassandra automation
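Bootstrapping state in SimpleDB is only a few calls with the AWS SDK for Java. A sketch; the domain, item and attribute names are invented for illustration.

    import java.util.Arrays;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.simpledb.AmazonSimpleDBClient;
    import com.amazonaws.services.simpledb.model.CreateDomainRequest;
    import com.amazonaws.services.simpledb.model.PutAttributesRequest;
    import com.amazonaws.services.simpledb.model.ReplaceableAttribute;

    public class SimpleDbBootstrap {
        public static void main(String[] args) {
            AmazonSimpleDBClient sdb =
                new AmazonSimpleDBClient(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

            sdb.createDomain(new CreateDomainRequest("platform_bootstrap")); // hypothetical

            // Record a Cassandra seed list that instances can read at boot time
            sdb.putAttributes(new PutAttributesRequest(
                "platform_bootstrap", "cass_cluster_1",
                Arrays.asList(
                    new ReplaceableAttribute("seeds", "10.1.1.1,10.1.2.1,10.1.3.1", true),
                    new ReplaceableAttribute("region", "us-east-1", true))));
        }
    }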
  
	
  
Monitoring, alert forwarding

•  Multiple monitoring systems
   –  Internally developed data collection runs on AWS
   –  AppDynamics APM product runs as external SaaS
   –  When one breaks the other is usually OK…
•  Alerts routed to the developer of that app
   –  Alert gateway combines alerts from all sources
   –  Deduplication, source quenching, routing
   –  Warnings sent via email, critical via pagerduty
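The gateway behaviour can be pictured as a small dispatcher: collapse duplicates inside a window, then route by severity. A toy sketch, not the Netflix gateway; the window length is an example.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class AlertGateway {
        private static final long DEDUP_WINDOW_MS = 5 * 60 * 1000; // suppress repeats 5 min
        private final Map<String, Long> lastSent = new ConcurrentHashMap<String, Long>();

        public void onAlert(String app, String check, boolean critical, String message) {
            String fingerprint = app + "|" + check;    // same app+check == duplicate
            long now = System.currentTimeMillis();
            Long prev = lastSent.get(fingerprint);
            if (prev != null && now - prev < DEDUP_WINDOW_MS) {
                return;                                // deduplication / source quenching
            }
            lastSent.put(fingerprint, now);

            if (critical) {
                pageOnCallDeveloper(app, message);     // e.g. via the Pagerduty API
            } else {
                emailOwningTeam(app, message);         // warnings go to email
            }
        }

        private void pageOnCallDeveloper(String app, String msg) { /* stub */ }
        private void emailOwningTeam(String app, String msg) { /* stub */ }
    }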
  
Backups, archives

•  Cassandra Backup via Priam to S3 bucket
   –  Create versioned S3 bucket with TTL option
   –  Setup service to encrypt and copy to archive
•  Archive Account with Read/Write ACL to prod
   –  Setup in a different AWS region from production
   –  Create versioned S3 bucket with TTL option
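Creating the versioned, auto-expiring bucket is a one-time setup step. A sketch with the AWS SDK for Java; the bucket name, prefix and retention period are examples, and an S3 lifecycle rule plays the role of the "TTL option".

    import java.util.Arrays;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
    import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
    import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

    public class BackupBucketSetup {
        public static void main(String[] args) {
            AmazonS3Client s3 =
                new AmazonS3Client(new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            String bucket = "example-cassandra-archive";  // hypothetical

            s3.createBucket(bucket);

            // Versioning protects against accidental overwrites and deletes
            s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                bucket,
                new BucketVersioningConfiguration(BucketVersioningConfiguration.ENABLED)));

            // Lifecycle rule acts as the "TTL": expire old backups automatically
            BucketLifecycleConfiguration.Rule ttl = new BucketLifecycleConfiguration.Rule()
                .withId("expire-old-backups")
                .withPrefix("incremental/")
                .withExpirationInDays(90)
                .withStatus(BucketLifecycleConfiguration.ENABLED);
            s3.setBucketLifecycleConfiguration(bucket,
                new BucketLifecycleConfiguration(Arrays.asList(ttl)));
        }
    }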
  
Chaos Monkey

•  Install it on day 1 in test and production
•  Prevents people from doing local persistence
•  Kill anything not protected by an ASG
•  Supports whitelist for temporary do-not-kill

•  Open source soon, code cleanup in progress…
  
You take it from here…

•  Keep watching github for more goodies
•  Add your own code
•  Let us know what you find useful
•  Bugs, patches and additions all welcome
•  See you at AWS Re:Invent?
  
Roadmap for 2012

•  More resiliency and improved availability
•  More automation, orchestration
•  "Hardening" the platform, code clean-up
•  Lower latency for web services and devices
•  IPv6 support
•  More open sourced components
  
Wrap Up

Answer your remaining questions…

What was missing that you wanted to cover?
  
Takeaway

Netflix has built and deployed a scalable global Platform as a Service.

Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.

http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix

http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud

End of Part 3 of 3
  
You want an Encore?

If there is enough time… (there wasn't)
  
Something for the hard core complex adaptive systems people to digest.
  
A Discussion of Workloads and How They Behave
  
Workload Characteristics

•  A quick tour through a taxonomy of workload types
•  Start with the easy ones and work up
•  Why personalized workloads are different and hard
•  Some examples and coping strategies
  

  
Simple Random Arrivals

•  Random arrival of transactions with fixed mean service time
   –  Little's Law: QueueLength = Throughput * Response
   –  Utilization Law: Utilization = Throughput * ServiceTime
•  Complex models are often reduced to this model
   –  By averaging over longer time periods, since the formulas only work if you have stable averages
   –  By wishful thinking (i.e. how to fool yourself)
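A quick worked example of both laws, with invented numbers:

    % Little's Law: a service handling X = 200 req/s with mean response time
    % R = 0.05 s holds N requests in flight on average
    \[ N = X \cdot R = 200\,\mathrm{req/s} \times 0.05\,\mathrm{s} = 10 \text{ requests in the system} \]
    % Utilization Law: at a mean service time of S = 0.004 s per request
    \[ U = X \cdot S = 200\,\mathrm{req/s} \times 0.004\,\mathrm{s} = 0.8 \text{ (80\% busy)} \]

Both results only mean anything if the 200 req/s and the service time are stable averages over the measurement period, which is exactly the caveat above.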
  

  
Mixed random arrivals of transactions with stable mean service times

•  Think of the grocery store checkout analogy
   –  Trolleys full of shopping vs. baskets full of shopping
   –  Baskets are quick to service, but get stuck behind carts
   –  Relative mixture of transaction types starts to matter
•  Many transactional systems handle a mixture
   –  Databases, web services
•  Consider separating fast and slow transactions
   –  So that we have a "10 items or less" line just for baskets
   –  Separate pools of servers for different services
   –  The old rule - don't mix OLTP with DSS queries in databases
•  Performance is often thread-limited
   –  Thread limit and slow transactions constrains maximum throughput
•  Model mix using analytical solvers (e.g. PDQ perfdynamics.com)
  

  
Load dependent servers – varying mean service times

•  Mean service time may increase at high throughput
   –  Due to non-scalable algorithms, lock contention
   –  System runs out of memory and starts paging or frequent GC
•  Mean service time may also decrease at high throughput
   –  Elevator seek and write cancellation optimizations in storage
   –  Load shedding and simplified fallback modes
•  Systems have "tipping points" if the service time increases
   –  Hysteresis means they don't come back when load drops
   –  This is why you have to kill catatonic systems
   –  Best designs shed load to be stable at the limit – circuit breaker pattern
   –  Practical option is to try to avoid tipping points by reducing variance
•  Model using discrete event simulation tools
   –  Behaviour is non-linear and hard to model
  

  
Self-similar / fractal workloads

•  Bursty rather than random arrival rates
•  Self-similar
   –  Looks "random" at close up, stays "random" as you zoom out
   –  Work arrives in bursts, transactions aren't independent
   –  Bursts cluster together in super-bursts, etc.
•  Network packet streams tend to be fractal
•  Common in practice, too hard to model
   –  Probably the most common reason why your model is wrong!
  


  
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon
Netflix Architecture Tutorial at Gluecon

More Related Content

Netflix Architecture Tutorial at Gluecon

  • 1. Cloud  Architecture  Tutorial   Construc2ng  Cloud  Architecture  the  Ne5lix  Way   Gluecon  May  23rd,  2012   Adrian  Cockcro7   @adrianco  #ne:lixcloud   h=p://www.linkedin.com/in/adriancockcro7  
  • 3. Tutorial  Abstract  –  Set  Context   •  Dispensing  with  the  usual  quesMons:  “Why  Ne:lix,  why  cloud,  why  AWS?”  as  they  are  old  hat  now.   •  This  tutorial  explains  how  developers  use  the  Ne:lix  cloud,  and  how  it  is  built  and  operated.   •  The  real  meat  of  the  tutorial  comes  when  we  look  at  how  to  construct  an  applicaMon  with  a  host  of   important  properMes:  elasMc,  dynamic,  scalable,  agile,  fast,  cheap,  robust,  durable,  observable,   secure.  Over  the  last  three  years  Ne:lix  has  figured  out  cloud  based  soluMons  with  these   properMes,  deployed  them  globally  at  large  scale  and  refined  them  into  a  global  Java  oriented   Pla:orm  as  a  Service.  The  PaaS  is  based  on  low  cost  open  source  building  blocks  such  as  Apache   Tomcat,  Apache  Cassandra,  and  Memcached.  Components  of  this  pla:orm  are  in  the  process  of   being  open-­‐sourced  by  Ne:lix,  so  that  other  companies  can  get  a  start  on  building  their  own   customized  PaaS  that  leverages  advanced  features  of  AWS  and  supports  rapid  agile  development.   •  The  architecture  is  described  in  terms  of  anM-­‐pa=erns  -­‐  things  to  avoid  in  the  datacenter  to  cloud   transiMon.  A  scalable  global  persistence  Mer  based  on  Cassandra  provides  a  highly  available  and   durable  under-­‐pinning.  Lessons  learned  will  cover  soluMons  to  common  problems,  availability  and   robustness,  observability.  A=endees  should  leave  the  tutorial  with  a  clear  understanding  of  what  is   different  about  the  Ne:lix  cloud  architecture,  how  it  empowers  and  supports  developers,  and  a  set   of  flexible  and  scalable  open  source  building  blocks  that  can  be  used  to  construct  their  own  cloud   pla:orm.  
  • 4. PresentaMon  vs.  Tutorial   •  PresentaMon   –  Short  duraMon,  focused  subject   –  One  presenter  to  many  anonymous  audience   –  A  few  quesMons  at  the  end   •  Tutorial   –  Time  to  explore  in  and  around  the  subject   –  Tutor  gets  to  know  the  audience   –  Discussion,  rat-­‐holes,  “bring  out  your  dead”  
  • 5. Cloud  Tutorial  SecMons   Intro:  Who  are  you,  what  are  your  quesMons?     Part  1  –  WriMng  and  Performing    Developer  Viewpoint     Part  2  –  Running  the  Show    Operator  Viewpoint     Part  3  –  Making  the  Instruments    Builder  Viewpoint  
  • 6. Adrian  Cockcro7   •  Director,  Architecture  for  Cloud  Systems,  Ne:lix  Inc.   –  Previously  Director  for  PersonalizaMon  Pla:orm   •  DisMnguished  Availability  Engineer,  eBay  Inc.  2004-­‐7   –  Founding  member  of  eBay  Research  Labs   •  DisMnguished  Engineer,  Sun  Microsystems  Inc.  1988-­‐2004   –  2003-­‐4  Chief  Architect  High  Performance  Technical  CompuMng   –  2001  Author:  Capacity  Planning  for  Web  Services   –  1999  Author:  Resource  Management   –  1995  &  1998  Author:  Sun  Performance  and  Tuning   –  1996  Japanese  EdiMon  of  Sun  Performance  and  Tuning   •   SPARC  &  Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)   •  Heavy  Metal  Bass  Guitarist  in  “Black  Tiger”  1980-­‐1982   –  Influenced  by  Van  Halen,  Yesterday  &  Today,  AC/DC   •  More   –  Twi=er  @adrianco  –  Blog  h=p://perfcap.blogspot.com   –  PresentaMons  at  h=p://www.slideshare.net/adrianco  
  • 7. A=endee  IntroducMons   •  Who  are  you,  where  do  you  work   •  Why  are  you  here  today,  what  do  you  need   •  “Bring  out  your  dead”   –  Do  you  have  a  specific  problem  or  quesMon?   –  One  sentence  elevator  pitch   •  What  instrument  do  you  play?    
  • 8. WriMng  and  Performing   Developer  Viewpoint   Part  1  of  3  
  • 9. Van  Halen   Audience  and  Fans   Listen  to  Songs  and  Albums   Wri=en  and  Played  by  Van  Halen   Using  Instruments  and  Studios  
  • 10. Developers   Toons  from  gapingvoid.com   Customers   Use  Products   Built  by  Developers   That  run  on  Infrastructure  
  • 11. Why  Use  Cloud?     “Runnin’  with  the  Devil  –  Van  Halen”  
  • 12. Things  we  don’t  do   “Unchained  –  Van  Halen”  
  • 13. What  do  developers  care  about?   “Right  Now  –  Van  Halen”  
  • 15. Keeping  up  with  Developer  Trends   In  producMon   at  Ne:lix   •  Big  Data/Hadoop   2009   •  Cloud   2009   •  ApplicaMon  Performance  Management   2010   •  Integrated  DevOps  PracMces   2010   •  ConMnuous  IntegraMon/Delivery   2010   •  NoSQL   2010   •  Pla:orm  as  a  Service   2010   •  Social  coding,  open  development/github   2011  
  • 16. AWS  specific  feature  dependence….     “Why  can’t  this  be  love?  –  Van  Halen”  
  • 17. Portability  vs.  FuncMonality   •  Portability  –  the  OperaMons  focus   –  Avoid  vendor  lock-­‐in   –  Support  datacenter  based  use  cases   –  Possible  operaMons  cost  savings   •  FuncMonality  –  the  Developer  focus   –  Less  complex  test  and  debug,  one  mature  supplier   –  Faster  Mme  to  market  for  your  products   –  Possible  developer  cost  savings  
  • 18. Portable  PaaS   •  Portable  IaaS  Base  -­‐  some  AWS  compaMbility   –  Eucalyptus  –  AWS  licensed  compaMble  subset   –  CloudStack  –  Citrix  Apache  project   –  OpenStack  –  Rackspace,  Cloudscaling,  HP  etc.   •  Portable  PaaS   –  Cloud  Foundry  -­‐  run  it  yourself  in  your  DC   –  AppFog  and  Stackato  –  Cloud  Foundry/Openstack   –  Vendor  opMons:  Rightscale,  Enstratus,  Smartscale  
  • 19. FuncMonal  PaaS   •  IaaS  base  -­‐  all  the  features  of  AWS   –  Very  large  scale,  mature,  global,  evolving  rapidly   –  ELB,  Autoscale,  VPC,  SQS,  EIP,  EMR,  DynamoDB  etc.   –  Large  files  and  mulMpart  writes  in  S3   •  FuncMonal  PaaS  –  based  on  Ne:lix  features   –  Very  large  scale,  mature,  flexible,  customizable   –  Asgard  console,  Monkeys,  Big  data  tools   –  Cassandra/Zookeeper  data  store  automaMon  
  • 20. Developers  choose  FuncMonal     Don’t  let  the  roadie  write  the  set  list!   (yes  you  do  need  all  those  guitars  on  tour…)  
  • 21. Freedom  and  Responsibility   •  Developers  leverage  cloud  to  get  freedom   –  Agility  of  a  single  organizaMon,  no  silos   •  But  now  developers  are  responsible   –  For  compliance,  performance,  availability  etc.   “As  far  as  my  rehab  is  concerned,  it  is  within  my   ability  to  change  and  change  for  the  beNer  -­‐  Eddie   Van  Halen”    
  • 22. Amazon Cloud Terminology Reference See http://aws.amazon.com/ This is not a full list of Amazon Web Service features •  AWS  –  Amazon  Web  Services  (common  name  for  Amazon  cloud)   •  AMI  –  Amazon  Machine  Image  (archived  boot  disk,  Linux,  Windows  etc.  plus  applicaMon  code)   •  EC2  –  ElasMc  Compute  Cloud   –  Range  of  virtual  machine  types  m1,  m2,  c1,  cc,  cg.  Varying  memory,  CPU  and  disk  configuraMons.   –  Instance  –  a  running  computer  system.  Ephemeral,  when  it  is  de-­‐allocated  nothing  is  kept.   –  Reserved  Instances  –  pre-­‐paid  to  reduce  cost  for  long  term  usage   –  Availability  Zone  –  datacenter  with  own  power  and  cooling  hosMng  cloud  instances   –  Region  –  group  of  Avail  Zones  –  US-­‐East,  US-­‐West,  EU-­‐Eire,  Asia-­‐Singapore,  Asia-­‐Japan,  SA-­‐Brazil,  US-­‐Gov   •  ASG  –  Auto  Scaling  Group  (instances  booMng  from  the  same  AMI)   •  S3  –  Simple  Storage  Service  (h=p  access)   •  EBS  –  ElasMc  Block  Storage  (network  disk  filesystem  can  be  mounted  on  an  instance)   •  RDS  –  RelaMonal  Database  Service  (managed  MySQL  master  and  slaves)   •  DynamoDB/SDB  –  Simple  Data  Base  (hosted  h=p  based  NoSQL  datastore,  DynamoDB  replaces  SDB)   •  SQS  –  Simple  Queue  Service  (h=p  based  message  queue)   •  SNS  –  Simple  NoMficaMon  Service  (h=p  and  email  based  topics  and  messages)   •  EMR  –  ElasMc  Map  Reduce  (automaMcally  managed  Hadoop  cluster)   •  ELB  –  ElasMc  Load  Balancer   •  EIP  –  ElasMc  IP  (stable  IP  address  mapping  assigned  to  instance  or  ELB)   •  VPC  –  Virtual  Private  Cloud  (single  tenant,  more  flexible  network  and  security  constructs)   •  DirectConnect  –  secure  pipe  from  AWS  VPC  to  external  datacenter   •  IAM  –  IdenMty  and  Access  Management  (fine  grain  role  based  security  keys)  
  • 23. Ne:lix  Deployed  on  AWS   2009   2009   2010   2010   2010   2011   Content   Logs   Play   WWW   API   CS   Content   S3   InternaMonal   Management   DRM   Sign-­‐Up   Metadata   CS  lookup   Terabytes   EC2   Device   DiagnosMcs   EMR   CDN  rouMng   Search   Config   &  AcMons   Encoding   S3   Movie   TV  Movie   Customer   Hive  &  Pig   Bookmarks   Choosing   Choosing   Call  Log   Petabytes   Business   Social   Logging   RaMngs   Facebook   CS  AnalyMcs   Intelligence   CDNs   ISPs   Terabits   Customers  
  • 24. Datacenter  to  Cloud  TransiMon  Goals   “Go  ahead  and  Jump  –  Van  Halen”   •  Faster   –  Lower  latency  than  the  equivalent  datacenter  web  pages  and  API  calls   –  Measured  as  mean  and  99th  percenMle   –  For  both  first  hit  (e.g.  home  page)  and  in-­‐session  hits  for  the  same  user   •  Scalable   –  Avoid  needing  any  more  datacenter  capacity  as  subscriber  count  increases   –  No  central  verMcally  scaled  databases   –  Leverage  AWS  elasMc  capacity  effecMvely   •  Available   –  SubstanMally  higher  robustness  and  availability  than  datacenter  services   –  Leverage  mulMple  AWS  availability  zones   –  No  scheduled  down  Mme,  no  central  database  schema  to  change   •  ProducMve   –  OpMmize  agility  of  a  large  development  team  with  automaMon  and  tools   –  Leave  behind  complex  tangled  datacenter  code  base  (~8  year  old  architecture)   –  Enforce  clean  layered  interfaces  and  re-­‐usable  components  
  • 25. Datacenter  AnM-­‐Pa=erns   What  do  we  currently  do  in  the   datacenter  that  prevents  us  from   meeMng  our  goals?   “Me  Wise  Magic  –  Van  Halen”    
  • 26. Ne:lix  Datacenter  vs.  Cloud  Arch   Central  SQL  Database   Distributed  Key/Value  NoSQL   SMcky  In-­‐Memory  Session   Shared  Memcached  Session   Cha=y  Protocols   Latency  Tolerant  Protocols   Tangled  Service  Interfaces   Layered  Service  Interfaces   Instrumented  Code   Instrumented  Service  Pa=erns   Fat  Complex  Objects   Lightweight  Serializable  Objects   Components  as  Jar  Files   Components  as  Services  
  • 27. The  Central  SQL  Database   •  Datacenter  has  a  central  database   –  Everything  in  one  place  is  convenient  unMl  it  fails   •  Schema  changes  require  downMme   –  Customers,  movies,  history,  configuraMon     AnS-­‐paNern  impacts  scalability,  availability  
  • 28. The  Distributed  Key-­‐Value  Store   •  Cloud  has  many  key-­‐value  data  stores   –  More  complex  to  keep  track  of,  do  backups  etc.   –  Each  store  is  much  simpler  to  administer   DBA   –  Joins  take  place  in  java  code   –  No  schema  to  change,  no  scheduled  downMme   •  Minimum  Latency  for  Simple  Requests   –  Memcached  is  dominated  by  network  latency  <1ms   –  Cassandra  cross  zone  replicaMon  around  one  millisecond   –  DynamoDB  replicaMon  and  auth  overheads  around  5ms   –  SimpleDB  higher  replicaMon  and  auth  overhead  >10ms  
  • 29. The  SMcky  Session   •  Datacenter  SMcky  Load  Balancing   –  Efficient  caching  for  low  latency   –  Tricky  session  handling  code   •  Encourages  concentrated  funcMonality   –  one  service  that  does  everything   –  Middle  Mer  load  balancer  had  issues  in  pracMce     AnS-­‐paNern  impacts  producSvity,  availability  
  • 30. Shared  Session  State   •  ElasMc  Load  Balancer     –  We  don’t  use  the  cookie  based  rouMng  opMon   –  External  “session  caching”  with  memcached   •  More  flexible  fine  grain  services   –  Any  instance  can  serve  any  request   –  Works  be=er  with  auto-­‐scaled  instance  counts  
  • 31. Cha=y  Opaque  and  Bri=le  Protocols   •  Datacenter  service  protocols   –  Assumed  low  latency  for  many  simple  requests   •  Based  on  serializing  exisMng  java  objects   –  Inefficient  formats   –  IncompaMble  when  definiMons  change     AnS-­‐paNern  causes  producSvity,  latency  and   availability  issues  
  • 32. Robust  and  Flexible  Protocols   •  Cloud  service  protocols   –  JSR311/Jersey  is  used  for  REST/HTTP  service  calls   –  Custom  client  code  includes  service  discovery   –  Support  complex  data  types  in  a  single  request   •  Apache  Avro   –  Evolved  from  Protocol  Buffers  and  Thri7   –  Includes  JSON  header  defining  key/value  protocol   –  Avro  serializaMon  is  half  the  size  and  several  Mmes   faster  than  Java  serializaMon,  more  work  to  code  
  • 33. Persisted  Protocols   •  Persist  Avro  in  Memcached   –  Save  space/latency  (zigzag  encoding,  half  the  size)   –  New  keys  are  ignored   –  Missing  keys  are  handled  cleanly   •  Avro  protocol  definiMons   –  Less  bri=le  across  versions   –  Can  be  wri=en  in  JSON  or  generated  from  POJOs   –  It’s  hard,  needs  be=er  tooling  
  • 34. Tangled  Service  Interfaces   •  Datacenter  implementaMon  is  exposed   –  Oracle  SQL  queries  mixed  into  business  logic   •  Tangled  code   –  Deep  dependencies,  false  sharing   •  Data  providers  with  sideways  dependencies   –  Everything  depends  on  everything  else   AnS-­‐paNern  affects  producSvity,  availability  
  • 35. Untangled  Service  Interfaces   •  New  Cloud  Code  With  Strict  Layering   –  Compile  against  interface  jar   –  Can  use  spring  runMme  binding  to  enforce   –  Fine  grain  services  as  components   •  Service  interface  is  the  service   –  ImplementaMon  is  completely  hidden   –  Can  be  implemented  locally  or  remotely   –  ImplementaMon  can  evolve  independently  
  • 36. Untangled  Service  Interfaces   Poundcake  –  Van  Halen   Two  layers:   •  SAL  -­‐  Service  Access  Library   –  Basic  serializaMon  and  error  handling   –  REST  or  POJO’s  defined  by  data  provider   •  ESL  -­‐  Extended  Service  Library   –  Caching,  conveniences,  can  combine  several  SALs   –  Exposes  faceted  type  system  (described  later)   –  Interface  defined  by  data  consumer  in  many  cases  
  • 37. Service  InteracMon  Pa=ern   Sample  Swimlane  Diagram  
  • 38. Service  Architecture  Pa=erns   •  Internal  Interfaces  Between  Services   –  Common  pa=erns  as  templates   –  Highly  instrumented,  observable,  analyMcs   –  Service  Level  Agreements  –  SLAs   •  Library  templates  for  generic  features   –  Instrumented  Ne:lix  Base  Servlet  template   –  Instrumented  generic  client  interface  template   –  Instrumented  S3,  SimpleDB,  Memcached  clients  
  • 39. CLIENT   Request  Start   Timestamp,   Client   Inbound   Request  End   outbound   deserialize  end   Timestamp   serialize  start   Mmestamp   Mmestamp   Inbound   Client   deserialize   outbound   start   serialize  end   Mmestamp   Mmestamp   Client  network   receive   Mmestamp   Service  Request   Client  Network   send   Mmestamp   Instruments  Every   Service   network  send   Mmestamp   Step  in  the  call   Service   Network   receive   Mmestamp   Service   Service   outbound   inbound   serialize  end   serialize  start   Mmestamp   Mmestamp   Service   Service   outbound   inbound   serialize  start   SERVICE  execute   serialize  end   request  start   Mmestamp   Mmestamp   Mmestamp,   execute  request   end  Mmestamp  
  • 40. Boundary  Interfaces   •  Isolate  teams  from  external  dependencies   –  Fake  SAL  built  by  cloud  team   –  Real  SAL  provided  by  data  provider  team  later   –  ESL  built  by  cloud  team  using  faceted  objects   •  Fake  data  sources  allow  development  to  start   –  e.g.  Fake  IdenMty  SAL  for  a  test  set  of  customers   –  Development  solidifies  dependencies  early   –  Helps  external  team  provide  the  right  interface  
  • 41. One  Object  That  Does  Everything   Can’t  Get  This  Stuff  No  More  –  Van  Halen   •  Datacenter  uses  a  few  big  complex  objects   –  Good  choice  for  a  small  team  and  one  instance   –  ProblemaMc  for  large  teams  and  many  instances   •  False  sharing  causes  tangled  dependencies   –  Movie  and  Customer  objects  are  foundaMonal   –  UnproducMve  re-­‐integraMon  work     AnS-­‐paNern  impacSng  producSvity  and  availability  
  • 42. An  Interface  For  Each  Component   •  Cloud  uses  faceted  Video  and  Visitor   –  Basic  types  hold  only  the  idenMfier   –  Facets  scope  the  interface  you  actually  need   –  Each  component  can  define  its  own  facets   •  No  false-­‐sharing  and  dependency  chains   –  Type  manager  converts  between  facets  as  needed   –  video.asA(PresentaMonVideo)  for  www   –  video.asA(MerchableVideo)  for  middle  Mer  
  • 43. Stan  Lanning’s  Soap  Box   •  Business  Level  Object  -­‐  Level  Confusion   Listen  to  the  bearded  guru…   –  Don’t  pass  around  IDs  when  you  mean  to  refer  to  the  BLO   •  Using  Basic  Types  helps  the  compiler  help  you   –  Compile  Mme  problems  are  be=er  than  run  Mme  problems   •  More  readable  by  people   –  But  beware  that  asA  operaMons  may  be  a  lot  of  work   •  MulMple-­‐inheritance  for  Java?   –  Kinda-­‐sorta…  
  • 44. Model  Driven  Architecture   •  TradiMonal  Datacenter  PracMces   –  Lots  of  unique  hand-­‐tweaked  systems   –  Hard  to  enforce  pa=erns   –  Some  use  of  Puppet  to  automate  changes   •  Model  Driven  Cloud  Architecture   –  Perforce/Ivy/Jenkins  based  builds  for  everything   –  Every  producMon  instance  is  a  pre-­‐baked  AMI   –  Every  applicaMon  is  managed  by  an  Autoscaler   Every  change  is  a  new  AMI  
  • 45. Ne:lix  PaaS  Principles   •  Maximum  FuncMonality   –  Developer  producMvity  and  agility   •  Leverage  as  much  of  AWS  as  possible   –  AWS  is  making  huge  investments  in  features/scale   •  Interfaces  that  isolate  Apps  from  AWS   –  Avoid  lock-­‐in  to  specific  AWS  API  details   •  Portability  is  a  long  term  goal   –  Gets  easier  as  other  vendors  catch  up  with  AWS  
  • 46. Ne:lix  Global  PaaS  Features   •  Supports  all  AWS  Availability  Zones  and  Regions   •  Supports  mulMple  AWS  accounts  {test,  prod,  etc.}   •  Cross  Region/Acct  Data  ReplicaMon  and  Archiving   •  InternaMonalized,  Localized  and  GeoIP  rouMng   •  Security  is  fine  grain,  dynamic  AWS  keys   •  Autoscaling  to  thousands  of  instances   •  Monitoring  for  millions  of  metrics   •  ProducMve  for  100s  of  developers  on  one  product   •  25M+  users  USA,  Canada,  LaMn  America,  UK,  Eire  
  • 47. Basic  PaaS  EnMMes   •  AWS  Based  EnMMes   –  Instances  and  Machine  Images,  ElasMc  IP  Addresses   –  Security  Groups,  Load  Balancers,  Autoscale  Groups   –  Availability  Zones  and  Geographic  Regions   •  Ne:lix  PaaS  EnMMes   –  ApplicaMons  (registered  services)   –  Clusters  (versioned  Autoscale  Groups  for  an  App)   –  ProperMes  (dynamic  hierarchical  configuraMon)  
  • 48. Core  PaaS  Services   •  AWS  Based  Services   –  S3  storage,  to  5TB  files,  parallel  mulMpart  writes   –  SQS  –  Simple  Queue  Service.  Messaging  layer.   •  Ne:lix  Based  Services   –  EVCache  –  memcached  based  ephemeral  cache   –  Cassandra  –  distributed  persistent  data  store   •  External  Services   –  GeoIP  Lookup  interfaced  to  a  vendor   –  Secure  Keystore  HSM  
  • 49. Instance  Architecture   Linux  Base  AMI  (CentOS  or  Ubuntu)   OpMonal   Apache   frontend,   Java  (JDK  6  or  7)   memcached,   non-­‐java  apps   AppDynamics   Monitoring   appagent   monitoring   Tomcat   Log  rotaMon   ApplicaMon  war  file,  base   Healthcheck,  status   to  S3   GC  and  thread   servlet,  pla:orm,  interface   servlets,  JMX  interface,   AppDynamics   dump  logging   jars  for  dependent  services   Servo  autoscale   machineagent   Epic    
  • 50. Security  Architecture   •  Instance  Level  Security  baked  into  base  AMI   –  Login:  ssh  only  allowed  via  portal  (not  between  instances)   –  Each  app  type  runs  as  its  own  userid  app{test|prod}   •  AWS  Security,  IdenMty  and  Access  Management   –  Each  app  has  its  own  security  group  (firewall  ports)   –  Fine  grain  user  roles  and  resource  ACLs   •  Key  Management   –  AWS  Keys  dynamically  provisioned,  easy  updates   –  High  grade  app  specific  key  management  support  
  • 51. ConMnuous  IntegraMon  /  Release   Lightweight  process  scales  as  the  organizaMon  grows   •  No  centralized  two-­‐week  sprint/release  “train”   •  Thousands  of  builds  a  day,  tens  of  releases   •  Engineers  release  at  their  own  pace   •  Unit  of  release  is  a  web  service,  over  200  so  far…   •  Dependencies  handled  as  excepMons  
  • 52. Hello  World?   Ge•ng  started  for  a  new  developer…   •  Register  the  “helloadrian”  app  name  in  Asgard   •  Get  the  example  helloworld  code  from  perforce   •  Edit  some  properMes  to  update  the  name  etc.   •  Check-­‐in  the  changes   •  Clone  a  Jenkins  build  job   •  Build  the  code   •  Bake  the  code  into  an  Amazon  Machine  Image   •  Use  Asgard  to  setup  an  AutoScaleGroup  with  the  AMI   •  Check  instance  healthcheck  is  “Up”  using  Asgard   •  Hit  the  URL  to  get  “HTTP  200,  Hello”  back  
  • 53. Register  new  applicaMon  name     naming  rules:  all  lower  case  with  underscore,  no  spaces  or  dashes
  • 59. Portals  and  Explorers   •  Ne:lix  ApplicaMon  Console  (Asgard/NAC)   –  Primary  AWS  provisioning/config  interface   •  AWS  Usage  Analyzer   –  Breaks  down  costs  by  applicaMon  and  resource   •  Cassandra  Explorer   –  Browse  clusters,  keyspaces,  column  families   •  Base  Server  Explorer   –  Browse  service  endpoints  configuraMon,  perf  
  • 60. AWS  Usage   for  test,  carefully  omi•ng  any  $  numbers…  
  • 61. Pla:orm  Services   •  Discovery  –  service  registry  for  “ApplicaMons”   •  IntrospecMon  –  Entrypoints   •  Cryptex  –  Dynamic  security  key  management   •  Geo  –  Geographic  IP  lookup   •  ConfiguraMon  Service  –  Dynamic  properMes   •  LocalizaMon  –  manage  and  lookup  local  translaMons   •  Evcache  –  ephemeral  volaMle  cache   •  Cassandra  –  Cross  zone/region  distributed  data  store   •  Zookeeper  –  Distributed  CoordinaMon  (Curator)   •  Various  proxies  –  access  to  old  datacenter  stuff  
  • 62. IntrospecMon  -­‐  Entrypoints   •  REST  API  for  tools,  apps,  explorers,  monkeys…   –  E.g.  GET  /REST/v1/instance/$INSTANCE_ID   •  AWS  Resources   –  Autoscaling  Groups,  EIP  Groups,  Instances   •  Ne:lix  PaaS  Resources   –  Discovery  ApplicaMons,  Clusters  of  ASGs,  History   •  Full  History  of  all  Resources   –  Supports  Janitor  Monkey  cleanup  of  unused  resources  
  • 63. Entrypoints  Queries   MongoDB  used  for  low  traffic  complex  queries  against  complex  objects   Descrip2on   Range  expression   Find  all  acMve  instances.     all()   Find  all  instances  associated  with  a  group   %(cloudmonkey)   name.   Find  all  instances  associated  with  a   /^cloudmonkey$/discovery()   discovery  group.     Find  all  auto  scale  groups  with  no  instances.   asg(),-­‐has(INSTANCES;asg())   How  many  instances  are  not  in  an  auto   count(all(),-­‐info(eval(INSTANCES;asg())))     scale  group?   What  groups  include  an  instance?   *(i-­‐4e108521)   What  auto  scale  groups  and  elasMc  load   filter(TYPE;asg,elb;*(i-­‐4e108521))   balancers  include  an  instance?   What  instance  has  a  given  public  ip?   filter(PUBLIC_IP;174.129.188.{0..255};all())  
  • 64. Metrics  Framework   •  System  and  ApplicaMon   –  CollecMon,  AggregaMon,  Querying  and  ReporMng   –  Non-­‐blocking  logging,  avoids  log4j  lock  contenMon   –  Honu-­‐Streaming  -­‐>  S3  -­‐>  EMR  -­‐>  Hive   •  Performance,  Robustness,  Monitoring,  Analysis   –  Tracers,  Counters  –  explicit  code  instrumentaMon  log   –  SLA  –  service  level  response  Mme  percenMles   –  Servo  annotated  JMX  extract  to  Cloudwatch   •  Latency  TesMng  and  InspecMon  Infrastructure   –  Latency  Monkey  injects  random  delays  and  errors  into  service  responses   –  Base  Server  Explorer  Inspect  client  Mmeouts   –  Global  property  management  to  change  client  Mmeouts  
  • 65. Interprocess  Communica2on   •  Discovery  Service  registry  for  “applicaMons”   –  “here  I  am”  call  every  30s,  drop  a7er  3  missed   –  “where  is  everyone”  call   –  Redundant,  distributed,  moving  to  Zookeeper   •  NIWS  –  Ne:lix  Internal  Web  Service  client   –  So7ware  Middle  Tier  Load  Balancer   –  Failure  retry  moves  to  next  instance   –  Many  opMons  for  encoding,  etc.  
  • 66. Security  Key  Management   •  AKMS   –  Dynamic  Key  Management  interface   –  Update  AWS  keys  at  runMme,  no  restart   –  All  keys  stored  securely,  none  on  disk  or  in  AMI   •  Cryptex  -­‐  Flexible  key  store   –  Low  grade  keys  processed  in  client   –  Medium  grade  keys  processed  by  Cryptex  service   –  High  grade  keys  processed  by  hardware  (Ingrian)  
  • 67. AWS  Persistence  Services   •  SimpleDB   –  Got  us  started,  migrated  to  Cassandra  now   –  NFSDB  -­‐  Instrumented  wrapper  library   –  Domain  and  Item  sharding  (workarounds)   •  S3   –  Upgraded/Instrumented  JetS3t  based  interface   –  Supports  mulMpart  upload  and  5TB  files   –  Global  S3  endpoint  management  
  • 68. Ne5lix  Pla5orm  Persistence   •  Ephemeral  VolaMle  Cache  –  evcache   –  Discovery-­‐aware  memcached  based  backend   –  Client  abstracMons  for  zone  aware  replicaMon   –  OpMon  to  write  to  all  zones,  fast  read  from  local   •  Cassandra   –  Highly  available  and  scalable  (more  later…)   •  MongoDB   –  Complex  object/query  model  for  small  scale  use   •  MySQL   –  Hard  to  scale,  legacy  and  small  relaMonal  models  
  • 69. Priam  –  Cassandra  AutomaMon   Available  at  h=p://github.com/ne:lix   •  Ne:lix  Pla:orm  Tomcat  Code   •  Zero  touch  auto-­‐configuraMon   •  State  management  for  Cassandra  JVM   •  Token  allocaMon  and  assignment   •  Broken  node  auto-­‐replacement   •  Full  and  incremental  backup  to  S3   •  Restore  sequencing  from  S3   •  Grow/Shrink  Cassandra  “ring”  
  • 70. Astyanax   Available  at  h=p://github.com/ne:lix   •  Cassandra  java  client   •  API  abstracMon  on  top  of  Thri7  protocol   •  “Fixed”  ConnecMon  Pool  abstracMon  (vs.  Hector)   –  Round  robin  with  Failover   –  Retry-­‐able  operaMons  not  Med  to  a  connecMon   –  Ne:lix  PaaS  Discovery  service  integraMon   –  Host  reconnect  (fixed  interval  or  exponenMal  backoff)   –  Token  aware  to  save  a  network  hop  –  lower  latency   –  Latency  aware  to  avoid  compacMng/repairing  nodes  –  lower  variance   •  Batch  mutaMon:  set,  put,  delete,  increment   •  Simplified  use  of  serializers  via  method  overloading  (vs.  Hector)   •  ConnecMonPoolMonitor  interface  for  counters  and  tracers   •  Composite  Column  Names  replacing  deprecated  SuperColumns  
• 71. Astyanax Query Example
Paginate through all columns in a row:

    ColumnList<String> columns;
    int pageSize = 10;
    try {
        // Build a paginating query against row "A" of column family CF_STANDARD1
        RowQuery<String, String> query = keyspace
                .prepareQuery(CF_STANDARD1)
                .getKey("A")
                .setIsPaginating()
                .withColumnRange(new RangeBuilder().setMaxSize(pageSize).build());

        // Each execute() returns the next page of columns until none remain
        while (!(columns = query.execute().getResult()).isEmpty()) {
            for (Column<String> c : columns) {
                // process column c
            }
        }
    } catch (ConnectionException e) {
        // connection pool failure; retry or surface the error
    }
• 72. High Availability
• Cassandra stores 3 local copies, 1 per zone
 – Synchronous access, durable, highly available
 – Read/Write One: fastest, least consistent – ~1ms
 – Read/Write Quorum: 2 of 3, consistent – ~3ms
• AWS Availability Zones
 – Separate buildings
 – Separate power etc.
 – Fairly close together
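Why quorum reads are consistent (standard Cassandra quorum arithmetic, not Netflix-specific): with replication factor $N$, a read of $R$ replicas is guaranteed to overlap a write acknowledged by $W$ replicas whenever

\[ R + W > N \qquad \text{here: } 2 + 2 > 3 \]

so at least one replica in every quorum read already holds the latest quorum-committed write.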
• 73. "Traditional" Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Not Token Aware
[Diagram: six Cassandra nodes spread across zones A, B and C, with non token aware clients]
1. Client writes to any Cassandra node
2. Coordinator node replicates to nodes and zones
3. Nodes return ack to coordinator
4. Coordinator returns ack to client
5. Data written to internal commit log disk (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
• 74. Astyanax – Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
[Diagram: the same six-node, three-zone cluster, now with token aware clients writing directly to the replicas]
1. Client writes to nodes and zones
2. Nodes return ack to client
3. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up. Requests can choose to wait for one node, a quorum, or all nodes to ack the write. SSTable disk writes and compactions occur asynchronously.
• 75. Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
[Diagram: US and EU clusters, each spanning three zones, linked by 100+ms latency]
1. Client writes to local replicas
2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
  • 76. Part  2.  Running  the  Show   Operator  Viewpoint  
• 77. Rules of the Roadie
• Don't lose stuff
• Make sure it scales
• Figure out when it breaks and what broke
• Yell at the right guy to fix it
• Keep everything organized
• 78. Cassandra Backup
[Diagram: a Cassandra ring backing up to S3]
• Full Backup
 – Time based snapshot
 – SSTable compress -> S3
• Incremental
 – SSTable write triggers compressed copy to S3
• Archive
 – Copy cross region
• 79. ETL for Cassandra
• Data is de-normalized over many clusters!
• Too many to restore from backups for ETL
• Solution – read backup files using Hadoop
• Aegisthus
 – http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
 – High throughput raw SSTable processing
 – Re-normalizes many clusters to a consistent view
 – Extract, Transform, then Load into Teradata
• 80. Cassandra Archive
Appropriate level of paranoia needed…
• Archive could be un-readable
 – Restore S3 backups weekly from prod to test, and daily ETL
• Archive could be stolen
 – PGP Encrypt archive
• AWS East Region could have a problem
 – Copy data to AWS West
• Production AWS Account could have an issue
 – Separate Archive account with no-delete S3 ACL
• AWS S3 could have a global problem
 – Create an extra copy on a different cloud vendor…
• 81. Tools and Automation
• Developer and Build Tools
 – Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
 – Builds, creates .war file, .rpm, bakes AMI and launches
• Custom Netflix Application Console
 – AWS Features at Enterprise Scale (hide the AWS security keys!)
 – Auto Scaler Group is unit of deployment to production
• Open Source + Support
 – Apache, Tomcat, Cassandra, Hadoop
 – Datastax support for Cassandra, AWS support for Hadoop via EMR
• Monitoring Tools
 – Alert processing gateway into PagerDuty
 – AppDynamics – Developer focus for cloud http://appdynamics.com
• 82. Scalability Testing
• Cloud Based Testing – frictionless, elastic
 – Create/destroy any sized cluster in minutes
 – Many test scenarios run in parallel
• Test Scenarios
 – Internal app specific tests
 – Simple "stress" tool provided with Cassandra
• Scale test, keep making the cluster bigger
 – Check that tooling and automation works…
 – How many ten column row writes/sec can we do?
• 84. Scale-Up Linearity
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
[Chart: Client Writes/s by node count – Replication Factor = 3]
Measured points: 174,373 / 366,828 / 537,172 / 1,099,837 client writes/s – near-linear scaling as the cluster grows (48 to 288 nodes in the linked post).
• 86. Chaos Monkey
• Computers (Datacenter or AWS) randomly die
 – Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient
 – Allow any instance to fail without customer impact
• Chaos Monkey hours
 – Monday–Thursday 9am–3pm random instance kill
• Application configuration option
 – Apps now have to opt-out from Chaos Monkey
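Chaos Monkey itself was not yet open source when this tutorial was given; the core action it performs can be sketched with the stock AWS SDK for Java (the SDK calls below are real, but the scheduling window, opt-out checks and audit logging of the real tool are omitted):

    import java.util.List;
    import java.util.Random;

    import com.amazonaws.services.autoscaling.AmazonAutoScalingClient;
    import com.amazonaws.services.autoscaling.model.AutoScalingGroup;
    import com.amazonaws.services.autoscaling.model.DescribeAutoScalingGroupsRequest;
    import com.amazonaws.services.autoscaling.model.Instance;
    import com.amazonaws.services.ec2.AmazonEC2Client;
    import com.amazonaws.services.ec2.model.TerminateInstancesRequest;

    // Illustrative sketch only: pick one random instance from an auto
    // scale group and terminate it. The ASG replaces the instance,
    // proving the service survives single-instance failure.
    public class MiniChaosMonkey {
        private final Random random = new Random();

        public void killRandomInstance(AmazonAutoScalingClient asgClient,
                                       AmazonEC2Client ec2, String asgName) {
            AutoScalingGroup group = asgClient.describeAutoScalingGroups(
                    new DescribeAutoScalingGroupsRequest()
                            .withAutoScalingGroupNames(asgName))
                    .getAutoScalingGroups().get(0);
            List<Instance> instances = group.getInstances();
            if (instances.isEmpty()) {
                return; // nothing to kill
            }
            String victim = instances.get(random.nextInt(instances.size()))
                    .getInstanceId();
            ec2.terminateInstances(
                    new TerminateInstancesRequest().withInstanceIds(victim));
        }
    }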
• 87. Responsibility and Experience
• Make developers responsible for failures
 – Then they learn and write code that doesn't fail
• Use Incident Reviews to find gaps to fix
 – Make sure it's not about finding "who to blame"
• Keep timeouts short, fail fast
 – Don't let cascading timeouts stack up
• Make configuration options dynamic
 – You don't want to push code to tweak an option
• 88. Resilient Design – Circuit Breakers
http://techblog.netflix.com/2012/02/fault-tolerance-in-high-volume.html
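The linked post describes Netflix's dependency-command approach; as a generic illustration of the circuit breaker pattern it builds on (a minimal sketch, not Netflix's implementation), a breaker trips open after repeated failures, serves a fallback while open, and lets a probe call through after a cool-down:

    import java.util.concurrent.Callable;
    import java.util.concurrent.atomic.AtomicInteger;

    // Minimal circuit breaker sketch: fail fast to a fallback instead of
    // letting cascading timeouts stack up.
    public class CircuitBreaker {
        private final int failureThreshold;
        private final long cooldownMillis;
        private final AtomicInteger consecutiveFailures = new AtomicInteger();
        private volatile long openedAt = 0;

        public CircuitBreaker(int failureThreshold, long cooldownMillis) {
            this.failureThreshold = failureThreshold;
            this.cooldownMillis = cooldownMillis;
        }

        public <T> T call(Callable<T> primary, Callable<T> fallback) throws Exception {
            boolean open = consecutiveFailures.get() >= failureThreshold;
            if (open && System.currentTimeMillis() - openedAt < cooldownMillis) {
                return fallback.call();    // circuit open: fail fast
            }
            try {
                T result = primary.call(); // closed, or half-open probe
                consecutiveFailures.set(0);
                return result;
            } catch (Exception e) {
                consecutiveFailures.incrementAndGet();
                openedAt = System.currentTimeMillis();
                return fallback.call();
            }
        }
    }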
• 89. PaaS Operational Model
• Developers
 – Provision and run their own code in production
 – Take turns to be on call if it breaks (PagerDuty)
 – Configure autoscalers to handle capacity needs
• DevOps and PaaS (aka NoOps)
 – DevOps is used to build and run the PaaS
 – PaaS constrains Dev to use automation instead
 – PaaS puts more responsibility on Dev, with tools
• 90. What's Left for Corp IT?
• Corporate Security and Network Management
 – Billing and remnants of streaming service back-ends in DC
• Running Netflix' DVD Business
 – Tens of Oracle instances
 – Hundreds of MySQL instances
 – Thousands of VMware VMs
 – Zabbix, Cacti, Splunk, Puppet
• Employee Productivity
 – Building networks and WiFi [chart: Corp WiFi Performance]
 – SaaS OneLogin SSO Portal
 – Evernote Premium, Safari Online Bookshelf, Dropbox for Teams
 – Google Enterprise Apps, Workday HCM/Expense, Box.com
 – Many more SaaS migrations coming…
• 91. Implications for IT Operations
• Cloud is run by developer organization
 – Product group's "IT department" is the AWS API and PaaS
 – CorpIT handles billing and some security functions
• Cloud capacity is 10x bigger than Datacenter
 – Datacenter oriented IT didn't scale up as we grew
 – We moved a few people out of IT to do DevOps for our PaaS
• Traditional IT Roles and Silos are going away
 – We don't have SA, DBA, Storage, Network admins for cloud
 – Developers deploy and "run what they wrote" in production
• 92. Netflix PaaS Organization
Developer Org reporting into Product Development, not ITops
[Org chart: the Netflix Cloud Platform Team, split into groups covering cloud ops reliability engineering, build tools and automation, platform and database engineering, cloud solutions and cloud performance/architecture. Responsibilities listed across the groups include: Perforce, Jenkins, Artifactory, JIRA, platform jars, key store, Base AMI and Bakery, Netflix App Console, AWS API and AWS instances; Cassandra, Zookeeper, Entrypoints, benchmarking, monitoring, alert routing, PagerDuty, Monkeys, incident lifecycle; security architecture, Hyperguard, AWS VPC, JVM GC tuning, wiresharking, efficiency, future planning – and PowerPoint :-)]
  • 93. Part  3.  Making  the  Instruments   Builder  Viewpoint  
• 94. Components
• Continuous build framework turns code into AMIs
• AWS accounts for test, production, etc.
• Cloud access gateway
• Service registry
• Configuration properties service
• Persistence services
• Monitoring, alert forwarding
• Backups, archives
• 95. Common Build Framework
Extracted from "Building and Deploying Netflix in the Cloud" by @bmoyles and @garethbowles, on slideshare.net/netflix
• 96. Build Pipeline
[Diagram: source syncs from Perforce and GitHub into Jenkins, which runs the CBF steps – sync, check, resolve, compile, build, test, publish, report – resolving libraries from Artifactory and yum, and publishing artifacts back to Artifactory]
• 97. Jenkins Architecture
[Diagram: Jenkins build farm]
 – Single Master: Red Hat Linux on m1.xlarge (2x quad core x86_64, 26G RAM)
 – Standard slave group: x86_64 slaves (buildnode01…) running Amazon Linux in a us-west-1 VPC
 – Custom group: ~40 custom slaves maintained by product teams, various O/S and architectures, in the Netflix data center and office
 – Ad-hoc slaves: misc. O/S and architectures
• 98. Other Uses of Jenkins
Maintenance of test and prod Cassandra clusters
Automated integration tests for bake and deploy
Production bake and deployment
Housekeeping of the build / deploy infrastructure
• 99. Netflix Extensions to Jenkins
• Job DSL plugin: allow jobs to be set up with minimal definition, using templates and a Groovy-based DSL
• Housekeeping and maintenance processes implemented as Jenkins jobs, system Groovy scripts
• 100. The DynaSlave Plugin
What We Have
• Exposes a new endpoint in Jenkins that EC2 instances in VPC use for registration
• Allows a slave to name itself, label itself, tell Jenkins how many executors it can support
• EC2 == Ephemeral. Disconnected nodes that are gone for > 30 mins are reaped
• Sizing handled by EC2 ASGs, tweaks passed through via user data (labels, names, etc)
• 101. The DynaSlave Plugin
What's Next
• Enhanced security/registration of nodes
• Dynamic resource management
 – have Jenkins respond to build demand
• Slave groups
 – Allows us to create specialized pools of build nodes
• Refresh mechanism for slave tools
 – JDKs, Ant versions, etc.
• Give it back to the community
 – watch techblog.netflix.com!
• 102. The Bakery
• Create base AMIs
 – We have CentOS, Ubuntu and Windows base AMIs
 – All the generic code, apache, tomcat etc.
 – Standard system and application monitoring tools
 – Update ~monthly with patches and new versions
• Add yummy topping and bake
 – Build app specific AMI including all code etc.
 – Bakery mounts EBS snapshot, installs and bakes
 – One bakery per region, delivers into paastest
 – Tweak config and publish AMI to paasprod
• 104. Accounts Isolate Concerns
• paastest – for development and testing
 – Fully functional deployment of all services
 – Developer tagged "stacks" for separation
• paasprod – for production
 – Autoscale groups only, isolated instances are terminated
 – Alert routing, backups enabled by default
• paasaudit – for sensitive services
 – To support SOX, PCI, etc.
 – Extra access controls, auditing
• paasarchive – for disaster recovery
 – Long term archive of backups
 – Different region, perhaps different vendor
• 105. Reservations and Billing
• Consolidated Billing
 – Combine all accounts into one bill
 – Pooled capacity for bigger volume discounts
 – http://docs.amazonwebservices.com/AWSConsolidatedBilling/1.0/AWSConsolidatedBillingGuide.html
• Reservations
 – Save up to 71% on your baseline load
 – Priority when you request reserved capacity
 – Unused reservations are shared across accounts
• 106. Cloud Access Gateway
• Datacenter or office based
 – A separate VM for each AWS account
 – Two per account for high availability
 – Mount NFS shared home directories for developers
 – Instances trust the gateway via a security group
• Manage how developers login to cloud
 – Access control via ldap group membership
 – Audit logs of every login to the cloud
 – Similar to awsfabrictasks ssh wrapper http://readthedocs.org/docs/awsfabrictasks/en/latest/
• 107. Cloud Access Control
[Diagram: developers ssh into the Cloud Access Gateway, which connects onward to www-prod (userid wwwprod), dal-prod (userid dalprod) and cass-prod (userid cassprod). Security groups don't allow ssh between instances.]
• 108. Now Add Code
Netflix has open sourced a lot of what you need, more is on the way…
• 109. Netflix Open Source Strategy
• Release PaaS Components git-by-git
 – Source at github.com/netflix – we build from it…
 – Intros and techniques at techblog.netflix.com
 – Blog post or new code every few weeks
• Motivations
 – Give back to Apache licensed OSS community
 – Motivate, retain, hire top engineers
 – "Peer pressure" code cleanup, external contributions
• 110. Open Source Projects and Posts
(Legend on the slide marks each as Github/Techblog, Apache Contribution, Techblog Post, or Coming Soon)
 – Priam – Cassandra as a Service
 – Exhibitor – Zookeeper as a Service
 – Servo and Autoscaling Scripts
 – Astyanax – Cassandra client for Java
 – Honu – Log4j streaming to Hadoop
 – Curator – Zookeeper Patterns
 – EVCache – Memcached as a Service
 – CassJMeter – Cassandra test suite
 – Circuit Breaker – Robust service pattern
 – Cassandra – Multi-region EC2 datastore support
 – Asgard – AutoScaleGroup based AWS console
 – Discovery Service – Directory
 – Aegisthus – Hadoop ETL for Cassandra
 – Configuration Properties Service
 – Chaos Monkey – Robustness verification
• 111. Asgard
Not quite out yet…
• Runs in a VM in our datacenter
 – So it can deploy to an empty account
 – Groovy/Grails/JVM based
 – Supports all AWS regions on a global basis
• Hides the AWS credentials
 – Use AWS IAM to issue restricted keys for Asgard
 – Each Asgard instance manages one account
 – One install each for paastest, paasprod, paasaudit
• 112. "Discovery" – Service Directory
• Map an instance to a service type
 – Load balance over clusters of instances
 – Private namespace, so DNS isn't useful
 – Foundation service, first to deploy
• Highly available distributed coordination
 – Deploy one Apache Zookeeper instance per zone
 – Netflix Curator includes simple discovery service (sketch below)
 – Netflix Exhibitor manages Zookeeper reliably
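A sketch of registering and looking up a service with Curator's discovery recipe (package names as in the Netflix Curator releases of this era; the ensemble address and service name are illustrative):

    import com.netflix.curator.framework.CuratorFramework;
    import com.netflix.curator.framework.CuratorFrameworkFactory;
    import com.netflix.curator.retry.ExponentialBackoffRetry;
    import com.netflix.curator.x.discovery.ServiceDiscovery;
    import com.netflix.curator.x.discovery.ServiceDiscoveryBuilder;
    import com.netflix.curator.x.discovery.ServiceInstance;

    public class DiscoveryExample {
        public static void main(String[] args) throws Exception {
            // Connect to the per-zone Zookeeper ensemble
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "zk-a:2181,zk-b:2181,zk-c:2181",
                    new ExponentialBackoffRetry(1000, 3));
            client.start();

            // Register this instance under a private service name
            ServiceInstance<Void> me = ServiceInstance.<Void>builder()
                    .name("helloservice").port(7001).build();
            ServiceDiscovery<Void> discovery = ServiceDiscoveryBuilder
                    .builder(Void.class).client(client)
                    .basePath("/services").thisInstance(me).build();
            discovery.start();

            // "where is everyone" – list all registered instances
            for (ServiceInstance<Void> i : discovery.queryForInstances("helloservice")) {
                System.out.println(i.getAddress() + ":" + i.getPort());
            }
        }
    }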
• 113. Configuration Properties Service
• Dynamic hierarchical & propagates in seconds
 – Client timeouts, feature set enables
 – Region specific service endpoints
 – Cassandra token assignments etc. etc.
• Used to configure everything
 – So everything depends on it…
 – Coming soon to github
 – Pluggable backend storage interface (see the sketch below)
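Since the service itself wasn't yet on github when this deck was written, here is a hypothetical sketch of the client-side pattern: a background poller refreshes a local cache, and callers read the property on every use so changes propagate in seconds without a code push. All names here are invented for illustration:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class DynamicProperties {
        // Pluggable backend storage interface (hypothetical)
        interface PropertySource {
            Map<String, String> fetchAll();
        }

        private final Map<String, String> cache =
                new ConcurrentHashMap<String, String>();

        public DynamicProperties(final PropertySource source, long pollSeconds) {
            ScheduledExecutorService poller =
                    Executors.newSingleThreadScheduledExecutor();
            poller.scheduleAtFixedRate(new Runnable() {
                public void run() { cache.putAll(source.fetchAll()); }
            }, 0, pollSeconds, TimeUnit.SECONDS);
        }

        public int getInt(String key, int defaultValue) {
            String v = cache.get(key);
            return v == null ? defaultValue : Integer.parseInt(v);
        }
    }

    // Usage: read the value on every call rather than caching it locally
    //   int timeoutMs = props.getInt("niws.client.readTimeout", 2000);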
• 114. Persistence services
• Use SimpleDB as a bootstrap
 – Good use case for DynamoDB or SimpleDB
• Netflix Priam
 – Cassandra automation
• 115. Monitoring, alert forwarding
• Multiple monitoring systems
 – Internally developed data collection runs on AWS
 – AppDynamics APM product runs as external SaaS
 – When one breaks the other is usually OK…
• Alerts routed to the developer of that app
 – Alert gateway combines alerts from all sources
 – Deduplication, source quenching, routing
 – Warnings sent via email, critical via PagerDuty
• 116. Backups, archives
• Cassandra Backup via Priam to S3 bucket
 – Create versioned S3 bucket with TTL option (sketch below)
 – Setup service to encrypt and copy to archive
• Archive Account with Read/Write ACL to prod
 – Setup in a different AWS region from production
 – Create versioned S3 bucket with TTL option
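The versioning and TTL setup can be sketched with the stock AWS SDK for Java (the bucket name, prefix and 90-day expiration below are illustrative, not Netflix's actual settings):

    import java.util.Arrays;

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
    import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
    import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

    public class BackupBucketSetup {
        public static void main(String[] args) {
            AmazonS3 s3 = new AmazonS3Client(); // credential setup omitted
            String bucket = "example-cassandra-backups";

            // Versioning makes overwrites and deletes recoverable
            s3.setBucketVersioningConfiguration(
                    new SetBucketVersioningConfigurationRequest(bucket,
                            new BucketVersioningConfiguration()
                                    .withStatus(BucketVersioningConfiguration.ENABLED)));

            // The "TTL option": a lifecycle rule expires old backup objects
            BucketLifecycleConfiguration.Rule expire =
                    new BucketLifecycleConfiguration.Rule()
                            .withId("expire-old-backups")
                            .withPrefix("backups/")
                            .withExpirationInDays(90)
                            .withStatus(BucketLifecycleConfiguration.ENABLED);
            s3.setBucketLifecycleConfiguration(bucket,
                    new BucketLifecycleConfiguration()
                            .withRules(Arrays.asList(expire)));
        }
    }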
• 117. Chaos Monkey
• Install it on day 1 in test and production
• Prevents people from doing local persistence
• Kill anything not protected by an ASG
• Supports whitelist for temporary do-not-kill
• Open source soon, code cleanup in progress…
• 118. You take it from here…
• Keep watching github for more goodies
• Add your own code
• Let us know what you find useful
• Bugs, patches and additions all welcome
• See you at AWS Re:Invent?
• 119. Roadmap for 2012
• More resiliency and improved availability
• More automation, orchestration
• "Hardening" the platform, code clean-up
• Lower latency for web services and devices
• IPv6 support
• More open sourced components
• 120. Wrap Up
Answer your remaining questions…
What was missing that you wanted to cover?
• 121. Takeaway
Netflix has built and deployed a scalable global Platform as a Service.
Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
End of Part 3 of 3
• 123. You want an Encore?
If there is enough time… (there wasn't)
Something for the hard core complex adaptive systems people to digest.
  • 124. A  Discussion  of  Workloads  and   How  They  Behave  
• 125. Workload Characteristics
• A quick tour through a taxonomy of workload types
• Start with the easy ones and work up
• Why personalized workloads are different and hard
• Some examples and coping strategies
• 126. Simple Random Arrivals
• Random arrival of transactions with fixed mean service time
 – Little's Law: QueueLength = Throughput * Response
 – Utilization Law: Utilization = Throughput * ServiceTime
• Complex models are often reduced to this model
 – By averaging over longer time periods, since the formulas only work if you have stable averages
 – By wishful thinking (i.e. how to fool yourself)
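A quick worked example of the two laws with illustrative numbers (not from the deck): at throughput $X = 100$ requests/s, mean service time $S = 5$ ms and mean response time $R = 50$ ms,

\[ U = X \cdot S = 100 \times 0.005 = 0.5 \ (\text{50\% busy}), \qquad Q = X \cdot R = 100 \times 0.05 = 5 \text{ requests in the system.} \]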
• 127. Mixed random arrivals of transactions with stable mean service times
• Think of the grocery store checkout analogy
 – Trolleys full of shopping vs. baskets full of shopping
 – Baskets are quick to service, but get stuck behind carts
 – Relative mixture of transaction types starts to matter
• Many transactional systems handle a mixture
 – Databases, web services
• Consider separating fast and slow transactions
 – So that we have a "10 items or less" line just for baskets
 – Separate pools of servers for different services
 – The old rule – don't mix OLTP with DSS queries in databases
• Performance is often thread-limited
 – Thread limit and slow transactions constrain maximum throughput
• Model mix using analytical solvers (e.g. PDQ perfdynamics.com)
• 128. Load dependent servers – varying mean service times
• Mean service time may increase at high throughput
 – Due to non-scalable algorithms, lock contention
 – System runs out of memory and starts paging or frequent GC
• Mean service time may also decrease at high throughput
 – Elevator seek and write cancellation optimizations in storage
 – Load shedding and simplified fallback modes
• Systems have "tipping points" if the service time increases
 – Hysteresis means they don't come back when load drops
 – This is why you have to kill catatonic systems
 – Best designs shed load to be stable at the limit – circuit breaker pattern
 – Practical option is to try to avoid tipping points by reducing variance
• Model using discrete event simulation tools
 – Behaviour is non-linear and hard to model
• 129. Self-similar / fractal workloads
• Bursty rather than random arrival rates
• Self-similar
 – Looks "random" at close up, stays "random" as you zoom out
 – Work arrives in bursts, transactions aren't independent
 – Bursts cluster together in super-bursts, etc.
• Network packet streams tend to be fractal
• Common in practice, too hard to model
 – Probably the most common reason why your model is wrong!