The latest version of the Netflix Cloud Architecture story was given at Gluecon on May 23rd, 2012. Gluecon rocks, and lots of Van Halen references were added for the occasion. The tradeoff between a developer-driven, high-functionality AWS-based PaaS and an operations-driven, low-cost portable PaaS is discussed. The three sections cover the developer view, the operator view and the builder view.
Netflix Architecture Tutorial at Gluecon
1. Cloud Architecture Tutorial
Constructing Cloud Architecture the Netflix Way
Gluecon, May 23rd, 2012
Adrian Cockcroft
@adrianco #netflixcloud
http://www.linkedin.com/in/adriancockcroft
3. Tutorial Abstract – Set Context
• Dispensing with the usual questions: "Why Netflix, why cloud, why AWS?" as they are old hat now.
• This tutorial explains how developers use the Netflix cloud, and how it is built and operated.
• The real meat of the tutorial comes when we look at how to construct an application with a host of important properties: elastic, dynamic, scalable, agile, fast, cheap, robust, durable, observable, secure. Over the last three years Netflix has figured out cloud based solutions with these properties, deployed them globally at large scale and refined them into a global Java oriented Platform as a Service. The PaaS is based on low cost open source building blocks such as Apache Tomcat, Apache Cassandra, and Memcached. Components of this platform are in the process of being open-sourced by Netflix, so that other companies can get a start on building their own customized PaaS that leverages advanced features of AWS and supports rapid agile development.
• The architecture is described in terms of anti-patterns - things to avoid in the datacenter to cloud transition. A scalable global persistence tier based on Cassandra provides a highly available and durable under-pinning. Lessons learned will cover solutions to common problems, availability and robustness, observability. Attendees should leave the tutorial with a clear understanding of what is different about the Netflix cloud architecture, how it empowers and supports developers, and a set of flexible and scalable open source building blocks that can be used to construct their own cloud platform.
4. Presentation vs. Tutorial
• Presentation
– Short duration, focused subject
– One presenter to many anonymous audience
– A few questions at the end
• Tutorial
– Time to explore in and around the subject
– Tutor gets to know the audience
– Discussion, rat-holes, "bring out your dead"
5. Cloud Tutorial Sections
Intro: Who are you, what are your questions?
Part 1 – Writing and Performing: Developer Viewpoint
Part 2 – Running the Show: Operator Viewpoint
Part 3 – Making the Instruments: Builder Viewpoint
6. Adrian Cockcroft
• Director, Architecture for Cloud Systems, Netflix Inc.
– Previously Director for Personalization Platform
• Distinguished Availability Engineer, eBay Inc. 2004-7
– Founding member of eBay Research Labs
• Distinguished Engineer, Sun Microsystems Inc. 1988-2004
– 2003-4 Chief Architect High Performance Technical Computing
– 2001 Author: Capacity Planning for Web Services
– 1999 Author: Resource Management
– 1995 & 1998 Author: Sun Performance and Tuning
– 1996 Japanese Edition of Sun Performance and Tuning
• SPARC & Solarisパフォーマンスチューニング (サンソフトプレスシリーズ)
• Heavy Metal Bass Guitarist in "Black Tiger" 1980-1982
– Influenced by Van Halen, Yesterday & Today, AC/DC
• More
– Twitter @adrianco
– Blog http://perfcap.blogspot.com
– Presentations at http://www.slideshare.net/adrianco
7. Attendee Introductions
• Who are you, where do you work
• Why are you here today, what do you need
• "Bring out your dead"
– Do you have a specific problem or question?
– One sentence elevator pitch
• What instrument do you play?
15. Keeping up with Developer Trends
In production at Netflix:
• Big Data/Hadoop 2009
• Cloud 2009
• Application Performance Management 2010
• Integrated DevOps Practices 2010
• Continuous Integration/Delivery 2010
• NoSQL 2010
• Platform as a Service 2010
• Social coding, open development/github 2011
17. Portability vs. Functionality
• Portability – the Operations focus
– Avoid vendor lock-in
– Support datacenter based use cases
– Possible operations cost savings
• Functionality – the Developer focus
– Less complex test and debug, one mature supplier
– Faster time to market for your products
– Possible developer cost savings
18. Portable PaaS
• Portable IaaS Base - some AWS compatibility
– Eucalyptus – AWS licensed compatible subset
– CloudStack – Citrix Apache project
– OpenStack – Rackspace, Cloudscaling, HP etc.
• Portable PaaS
– Cloud Foundry - run it yourself in your DC
– AppFog and Stackato – Cloud Foundry/Openstack
– Vendor options: Rightscale, Enstratus, Smartscale
19. Functional PaaS
• IaaS base - all the features of AWS
– Very large scale, mature, global, evolving rapidly
– ELB, Autoscale, VPC, SQS, EIP, EMR, DynamoDB etc.
– Large files and multipart writes in S3
• Functional PaaS – based on Netflix features
– Very large scale, mature, flexible, customizable
– Asgard console, Monkeys, Big data tools
– Cassandra/Zookeeper data store automation
20. Developers choose Functional
Don't let the roadie write the set list!
(yes you do need all those guitars on tour…)
21. Freedom and Responsibility
• Developers leverage cloud to get freedom
– Agility of a single organization, no silos
• But now developers are responsible
– For compliance, performance, availability etc.
"As far as my rehab is concerned, it is within my ability to change and change for the better" - Eddie Van Halen
22. Amazon Cloud Terminology Reference
See http://aws.amazon.com/ This is not a full list of Amazon Web Services features.
• AWS – Amazon Web Services (common name for Amazon cloud)
• AMI – Amazon Machine Image (archived boot disk, Linux, Windows etc. plus application code)
• EC2 – Elastic Compute Cloud
– Range of virtual machine types m1, m2, c1, cc, cg. Varying memory, CPU and disk configurations.
– Instance – a running computer system. Ephemeral, when it is de-allocated nothing is kept.
– Reserved Instances – pre-paid to reduce cost for long term usage
– Availability Zone – datacenter with own power and cooling hosting cloud instances
– Region – group of Availability Zones – US-East, US-West, EU-Eire, Asia-Singapore, Asia-Japan, SA-Brazil, US-Gov
• ASG – Auto Scaling Group (instances booting from the same AMI)
• S3 – Simple Storage Service (http access)
• EBS – Elastic Block Storage (network disk filesystem can be mounted on an instance)
• RDS – Relational Database Service (managed MySQL master and slaves)
• DynamoDB/SDB – Simple Data Base (hosted http based NoSQL datastore, DynamoDB replaces SDB)
• SQS – Simple Queue Service (http based message queue)
• SNS – Simple Notification Service (http and email based topics and messages)
• EMR – Elastic Map Reduce (automatically managed Hadoop cluster)
• ELB – Elastic Load Balancer
• EIP – Elastic IP (stable IP address mapping assigned to instance or ELB)
• VPC – Virtual Private Cloud (single tenant, more flexible network and security constructs)
• DirectConnect – secure pipe from AWS VPC to external datacenter
• IAM – Identity and Access Management (fine grain role based security keys)
23. Netflix Deployed on AWS
[Diagram: Netflix services deployed on AWS from 2009 to 2011, grouped into Content, Logs, Play, WWW, API and CS columns – including content management, encoding, S3 storage, DRM, device diagnostics, CDN routing, sign-up, search, movie and TV movie choosing, metadata, device config & actions, ratings, social/Facebook, international CS lookup, customer call log, CS analytics, Hive & Pig, EMR and business intelligence – with data volumes growing from terabytes to petabytes and traffic to terabits, served via CDNs and ISPs to customers]
24. Datacenter to Cloud Transition Goals
"Go ahead and Jump" – Van Halen
• Faster
– Lower latency than the equivalent datacenter web pages and API calls
– Measured as mean and 99th percentile
– For both first hit (e.g. home page) and in-session hits for the same user
• Scalable
– Avoid needing any more datacenter capacity as subscriber count increases
– No central vertically scaled databases
– Leverage AWS elastic capacity effectively
• Available
– Substantially higher robustness and availability than datacenter services
– Leverage multiple AWS availability zones
– No scheduled down time, no central database schema to change
• Productive
– Optimize agility of a large development team with automation and tools
– Leave behind complex tangled datacenter code base (~8 year old architecture)
– Enforce clean layered interfaces and re-usable components
25. Datacenter Anti-Patterns
What do we currently do in the datacenter that prevents us from meeting our goals?
"Me Wise Magic" – Van Halen
26. Netflix Datacenter vs. Cloud Arch
Central SQL Database → Distributed Key/Value NoSQL
Sticky In-Memory Session → Shared Memcached Session
Chatty Protocols → Latency Tolerant Protocols
Tangled Service Interfaces → Layered Service Interfaces
Instrumented Code → Instrumented Service Patterns
Fat Complex Objects → Lightweight Serializable Objects
Components as Jar Files → Components as Services
27. The Central SQL Database
• Datacenter has a central database
– Everything in one place is convenient until it fails
• Schema changes require downtime
– Customers, movies, history, configuration
Anti-pattern impacts scalability, availability
28. The Distributed Key-Value Store
• Cloud has many key-value data stores
– More complex to keep track of, do backups etc.
– Each store is much simpler to administer (no DBA)
– Joins take place in java code
– No schema to change, no scheduled downtime
• Minimum Latency for Simple Requests
– Memcached is dominated by network latency <1ms
– Cassandra cross zone replication around one millisecond
– DynamoDB replication and auth overheads around 5ms
– SimpleDB higher replication and auth overhead >10ms
29. The Sticky Session
• Datacenter Sticky Load Balancing
– Efficient caching for low latency
– Tricky session handling code
• Encourages concentrated functionality
– one service that does everything
– Middle tier load balancer had issues in practice
Anti-pattern impacts productivity, availability
30. Shared Session State
• Elastic Load Balancer
– We don't use the cookie based routing option
– External "session caching" with memcached
• More flexible fine grain services
– Any instance can serve any request
– Works better with auto-scaled instance counts
31. Chatty Opaque and Brittle Protocols
• Datacenter service protocols
– Assumed low latency for many simple requests
• Based on serializing existing java objects
– Inefficient formats
– Incompatible when definitions change
Anti-pattern causes productivity, latency and availability issues
32. Robust and Flexible Protocols
• Cloud service protocols
– JSR311/Jersey is used for REST/HTTP service calls (a minimal resource sketch follows below)
– Custom client code includes service discovery
– Support complex data types in a single request
• Apache Avro
– Evolved from Protocol Buffers and Thrift
– Includes JSON header defining key/value protocol
– Avro serialization is half the size and several times faster than Java serialization, more work to code
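As a concrete illustration of the JSR311/Jersey style of REST endpoint mentioned above, here is a minimal resource sketch; the path, payload and class name are hypothetical examples, not Netflix's actual API.

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Hypothetical resource; path and payload are illustrative only.
    @Path("/v1/videos/{videoId}/bookmark")
    public class BookmarkResource {

        // GET /v1/videos/70143836/bookmark -> {"videoId":70143836,"positionSeconds":1234}
        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public String getBookmark(@PathParam("videoId") long videoId) {
            // A real service would delegate to a service access library (SAL) here.
            int positionSeconds = 1234; // placeholder value
            return "{\"videoId\":" + videoId + ",\"positionSeconds\":" + positionSeconds + "}";
        }
    }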
33. Persisted Protocols
• Persist Avro in Memcached (sketched below)
– Save space/latency (zigzag encoding, half the size)
– New keys are ignored
– Missing keys are handled cleanly
• Avro protocol definitions
– Less brittle across versions
– Can be written in JSON or generated from POJOs
– It's hard, needs better tooling
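To make the persisted-Avro idea concrete, here is a minimal sketch (my illustration, not Netflix platform code) that serializes a record to compact Avro binary bytes suitable for a memcached put; the VideoBookmark schema and field names are invented for the example.

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class AvroCacheValueExample {
        // Hypothetical schema for a cached value; the default on positionSeconds lets
        // older readers handle a missing key cleanly, as the slide describes.
        private static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"VideoBookmark\",\"fields\":["
          + "{\"name\":\"videoId\",\"type\":\"long\"},"
          + "{\"name\":\"positionSeconds\",\"type\":\"int\",\"default\":0}]}");

        public static byte[] serialize(long videoId, int position) throws Exception {
            GenericRecord record = new GenericData.Record(SCHEMA);
            record.put("videoId", videoId);
            record.put("positionSeconds", position);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new GenericDatumWriter<GenericRecord>(SCHEMA).write(record, encoder);
            encoder.flush();
            return out.toByteArray(); // compact zigzag-encoded bytes, ready to put() into memcached
        }
    }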
34. Tangled Service Interfaces
• Datacenter implementation is exposed
– Oracle SQL queries mixed into business logic
• Tangled code
– Deep dependencies, false sharing
• Data providers with sideways dependencies
– Everything depends on everything else
Anti-pattern affects productivity, availability
35. Untangled Service Interfaces
• New Cloud Code With Strict Layering
– Compile against interface jar (sketched below)
– Can use spring runtime binding to enforce
– Fine grain services as components
• Service interface is the service
– Implementation is completely hidden
– Can be implemented locally or remotely
– Implementation can evolve independently
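A minimal sketch of the strict layering idea, using invented names (RatingsService is not a real Netflix interface): consumers compile only against the interface jar, while the implementation stays hidden and free to evolve or move behind a remote call.

    // Interface jar: the only artifact a consumer compiles against.
    public interface RatingsService {
        int getStarRating(long customerId, long videoId);
    }

    // Implementation jar or remote service: hidden from consumers.
    // It could be bound locally via Spring at runtime, or fronted by a REST client
    // that implements the same interface.
    class RatingsServiceImpl implements RatingsService {
        @Override
        public int getStarRating(long customerId, long videoId) {
            return 4; // placeholder; a real implementation would query the ratings store
        }
    }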
36. Untangled Service Interfaces
"Poundcake" – Van Halen
Two layers:
• SAL - Service Access Library
– Basic serialization and error handling
– REST or POJO's defined by data provider
• ESL - Extended Service Library
– Caching, conveniences, can combine several SALs
– Exposes faceted type system (described later)
– Interface defined by data consumer in many cases
38. Service Architecture Patterns
• Internal Interfaces Between Services
– Common patterns as templates
– Highly instrumented, observable, analytics
– Service Level Agreements – SLAs
• Library templates for generic features
– Instrumented Netflix Base Servlet template
– Instrumented generic client interface template
– Instrumented S3, SimpleDB, Memcached clients
39. Instrument Every Step in the Call
[Diagram: a client/service request timeline with a timestamp captured at every step – client request start, client outbound serialize start/end, client network send, service network receive, service inbound deserialize start/end, service execute request start/end, service outbound serialize start/end, service network send, client network receive, client inbound deserialize start/end, client request end]
40. Boundary Interfaces
• Isolate teams from external dependencies
– Fake SAL built by cloud team
– Real SAL provided by data provider team later
– ESL built by cloud team using faceted objects
• Fake data sources allow development to start
– e.g. Fake Identity SAL for a test set of customers
– Development solidifies dependencies early
– Helps external team provide the right interface
41. One Object That Does Everything
"Can't Get This Stuff No More" – Van Halen
• Datacenter uses a few big complex objects
– Good choice for a small team and one instance
– Problematic for large teams and many instances
• False sharing causes tangled dependencies
– Movie and Customer objects are foundational
– Unproductive re-integration work
Anti-pattern impacting productivity and availability
42. An Interface For Each Component
• Cloud uses faceted Video and Visitor
– Basic types hold only the identifier
– Facets scope the interface you actually need
– Each component can define its own facets
• No false-sharing and dependency chains
– Type manager converts between facets as needed
– video.asA(PresentationVideo) for www
– video.asA(MerchableVideo) for middle tier (sketched below)
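The faceted type idea can be sketched roughly as follows; this is an illustration of the pattern using the slide's vocabulary (Video, PresentationVideo, asA), not the Netflix type manager itself.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    interface VideoFacet { long videoId(); }

    // Facets scope the interface each component actually needs.
    interface PresentationVideo extends VideoFacet { String title(); }
    interface MerchableVideo extends VideoFacet { double popularity(); }

    // Basic type holds only the identifier; a registered converter builds each facet on demand.
    final class Video {
        interface FacetFactory<T extends VideoFacet> { T create(long videoId); }

        private static final Map<Class<?>, FacetFactory<?>> FACTORIES = new ConcurrentHashMap<>();
        private final long id;

        Video(long id) { this.id = id; }

        static <T extends VideoFacet> void register(Class<T> facet, FacetFactory<T> factory) {
            FACTORIES.put(facet, factory);
        }

        // video.asA(PresentationVideo.class) converts between facets as needed.
        <T extends VideoFacet> T asA(Class<T> facet) {
            FacetFactory<?> factory = FACTORIES.get(facet);
            if (factory == null) throw new IllegalArgumentException("No facet registered: " + facet);
            return facet.cast(factory.create(id));
        }
    }

Because each component registers only the facets it needs, there is no false sharing: www code compiles against PresentationVideo without pulling in the middle tier's MerchableVideo dependencies.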
43. Stan Lanning's Soap Box
Listen to the bearded guru…
• Business Level Object – Level Confusion
– Don't pass around IDs when you mean to refer to the BLO
• Using Basic Types helps the compiler help you
– Compile time problems are better than run time problems
• More readable by people
– But beware that asA operations may be a lot of work
• Multiple-inheritance for Java?
– Kinda-sorta…
44. Model Driven Architecture
• Traditional Datacenter Practices
– Lots of unique hand-tweaked systems
– Hard to enforce patterns
– Some use of Puppet to automate changes
• Model Driven Cloud Architecture
– Perforce/Ivy/Jenkins based builds for everything
– Every production instance is a pre-baked AMI
– Every application is managed by an Autoscaler
Every change is a new AMI
45. Netflix PaaS Principles
• Maximum Functionality
– Developer productivity and agility
• Leverage as much of AWS as possible
– AWS is making huge investments in features/scale
• Interfaces that isolate Apps from AWS
– Avoid lock-in to specific AWS API details
• Portability is a long term goal
– Gets easier as other vendors catch up with AWS
46. Netflix Global PaaS Features
• Supports all AWS Availability Zones and Regions
• Supports multiple AWS accounts {test, prod, etc.}
• Cross Region/Acct Data Replication and Archiving
• Internationalized, Localized and GeoIP routing
• Security is fine grain, dynamic AWS keys
• Autoscaling to thousands of instances
• Monitoring for millions of metrics
• Productive for 100s of developers on one product
• 25M+ users USA, Canada, Latin America, UK, Eire
47. Basic PaaS Entities
• AWS Based Entities
– Instances and Machine Images, Elastic IP Addresses
– Security Groups, Load Balancers, Autoscale Groups
– Availability Zones and Geographic Regions
• Netflix PaaS Entities
– Applications (registered services)
– Clusters (versioned Autoscale Groups for an App)
– Properties (dynamic hierarchical configuration)
48. Core PaaS Services
• AWS Based Services
– S3 storage, to 5TB files, parallel multipart writes
– SQS – Simple Queue Service. Messaging layer.
• Netflix Based Services
– EVCache – memcached based ephemeral cache
– Cassandra – distributed persistent data store
• External Services
– GeoIP Lookup interfaced to a vendor
– Secure Keystore HSM
49. Instance Architecture
[Diagram of a standard instance:]
• Linux Base AMI (CentOS or Ubuntu)
• Optional Apache frontend, memcached, non-java apps
• Java (JDK 6 or 7) and Tomcat
• Application war file, base servlet, platform and interface jars for dependent services
• Healthcheck, status servlets, JMX interface, Servo autoscale
• AppDynamics monitoring (appagent and machineagent), Epic monitoring
• Log rotation to S3, GC and thread dump logging
50. Security Architecture
• Instance Level Security baked into base AMI
– Login: ssh only allowed via portal (not between instances)
– Each app type runs as its own userid app{test|prod}
• AWS Security, Identity and Access Management
– Each app has its own security group (firewall ports)
– Fine grain user roles and resource ACLs
• Key Management
– AWS Keys dynamically provisioned, easy updates
– High grade app specific key management support
51. Continuous Integration / Release
Lightweight process scales as the organization grows
• No centralized two-week sprint/release "train"
• Thousands of builds a day, tens of releases
• Engineers release at their own pace
• Unit of release is a web service, over 200 so far…
• Dependencies handled as exceptions
52. Hello World?
Getting started for a new developer…
• Register the "helloadrian" app name in Asgard
• Get the example helloworld code from perforce
• Edit some properties to update the name etc.
• Check-in the changes
• Clone a Jenkins build job
• Build the code
• Bake the code into an Amazon Machine Image
• Use Asgard to setup an AutoScaleGroup with the AMI
• Check instance healthcheck is "Up" using Asgard
• Hit the URL to get "HTTP 200, Hello" back (a minimal servlet sketch follows below)
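The end state of this checklist is just a tiny web service baked into an AMI and run in an AutoScaleGroup. A hypothetical stand-in for the helloworld example (not the actual code in perforce) could be as small as this servlet:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Returns HTTP 200 and "Hello", matching the final step of the checklist.
    public class HelloServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
            resp.setStatus(HttpServletResponse.SC_OK);
            resp.setContentType("text/plain");
            resp.getWriter().write("Hello");
        }
    }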
53. Register new application name
Naming rules: all lower case with underscore, no spaces or dashes
59. Portals and Explorers
• Netflix Application Console (Asgard/NAC)
– Primary AWS provisioning/config interface
• AWS Usage Analyzer
– Breaks down costs by application and resource
• Cassandra Explorer
– Browse clusters, keyspaces, column families
• Base Server Explorer
– Browse service endpoints configuration, perf
61. Platform Services
• Discovery – service registry for "Applications"
• Introspection – Entrypoints
• Cryptex – Dynamic security key management
• Geo – Geographic IP lookup
• Configuration Service – Dynamic properties
• Localization – manage and lookup local translations
• Evcache – ephemeral volatile cache
• Cassandra – Cross zone/region distributed data store
• Zookeeper – Distributed Coordination (Curator)
• Various proxies – access to old datacenter stuff
62. Introspection - Entrypoints
• REST API for tools, apps, explorers, monkeys…
– E.g. GET /REST/v1/instance/$INSTANCE_ID
• AWS Resources
– Autoscaling Groups, EIP Groups, Instances
• Netflix PaaS Resources
– Discovery Applications, Clusters of ASGs, History
• Full History of all Resources
– Supports Janitor Monkey cleanup of unused resources
63. Entrypoints Queries
MongoDB used for low traffic complex queries against complex objects
• Find all active instances: all()
• Find all instances associated with a group name: %(cloudmonkey)
• Find all instances associated with a discovery group: /^cloudmonkey$/discovery()
• Find all auto scale groups with no instances: asg(),-has(INSTANCES;asg())
• How many instances are not in an auto scale group? count(all(),-info(eval(INSTANCES;asg())))
• What groups include an instance? *(i-4e108521)
• What auto scale groups and elastic load balancers include an instance? filter(TYPE;asg,elb;*(i-4e108521))
• What instance has a given public ip? filter(PUBLIC_IP;174.129.188.{0..255};all())
64. Metrics Framework
• System and Application
– Collection, Aggregation, Querying and Reporting
– Non-blocking logging, avoids log4j lock contention
– Honu-Streaming -> S3 -> EMR -> Hive
• Performance, Robustness, Monitoring, Analysis
– Tracers, Counters – explicit code instrumentation log (sketched below)
– SLA – service level response time percentiles
– Servo annotated JMX extract to Cloudwatch
• Latency Testing and Inspection Infrastructure
– Latency Monkey injects random delays and errors into service responses
– Base Server Explorer inspects client timeouts
– Global property management to change client timeouts
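A rough sketch of the explicit tracer/counter instrumentation pattern described above, in plain Java (this is not the Netflix metrics library): counters are lock-free atomics so recording a metric never contends like a synchronized log appender, and a tracer wraps a call to record its count and latency for later extraction by a poller such as a JMX exporter.

    import java.util.concurrent.Callable;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import java.util.concurrent.atomic.AtomicLong;

    public final class Metrics {
        private static final ConcurrentMap<String, AtomicLong> COUNTERS =
                new ConcurrentHashMap<String, AtomicLong>();

        private static AtomicLong counter(String name) {
            AtomicLong c = COUNTERS.get(name);
            if (c == null) {
                AtomicLong created = new AtomicLong();
                c = COUNTERS.putIfAbsent(name, created);
                if (c == null) c = created;
            }
            return c;
        }

        public static void increment(String name) {
            counter(name).incrementAndGet();
        }

        // Tracer: time one call and record count and total latency without blocking other threads.
        public static <T> T trace(String name, Callable<T> call) throws Exception {
            long start = System.nanoTime();
            try {
                return call.call();
            } finally {
                long micros = (System.nanoTime() - start) / 1000L;
                counter(name + ".count").incrementAndGet();
                counter(name + ".totalMicros").addAndGet(micros);
            }
        }
    }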
65. Interprocess Communication
• Discovery Service registry for "applications"
– "here I am" call every 30s, drop after 3 missed
– "where is everyone" call
– Redundant, distributed, moving to Zookeeper
• NIWS – Netflix Internal Web Service client
– Software Middle Tier Load Balancer
– Failure retry moves to next instance (sketched below)
– Many options for encoding, etc.
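The NIWS behaviour described above (a software middle-tier load balancer whose failure retry moves to the next instance) can be sketched generically as follows; class and method names are invented for illustration and this is not the NIWS client itself.

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    public class RoundRobinClient {
        private final AtomicInteger next = new AtomicInteger();

        interface Call<T> { T invoke(String instanceHost) throws Exception; }

        // Try instances from the discovery registry in round-robin order;
        // a failure moves the request on to the next instance.
        public <T> T execute(List<String> instances, Call<T> call) throws Exception {
            Exception last = null;
            for (int attempt = 0; attempt < instances.size(); attempt++) {
                int index = (next.getAndIncrement() & 0x7fffffff) % instances.size();
                try {
                    return call.invoke(instances.get(index));
                } catch (Exception e) {
                    last = e; // retry against the next instance
                }
            }
            throw last != null ? last : new IllegalStateException("no instances registered");
        }
    }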
66. Security Key Management
• AKMS
– Dynamic Key Management interface
– Update AWS keys at runtime, no restart
– All keys stored securely, none on disk or in AMI
• Cryptex - Flexible key store
– Low grade keys processed in client
– Medium grade keys processed by Cryptex service
– High grade keys processed by hardware (Ingrian)
67. AWS Persistence Services
• SimpleDB
– Got us started, migrated to Cassandra now
– NFSDB - Instrumented wrapper library
– Domain and Item sharding (workarounds)
• S3
– Upgraded/Instrumented JetS3t based interface
– Supports multipart upload and 5TB files
– Global S3 endpoint management
68. Netflix Platform Persistence
• Ephemeral Volatile Cache – evcache
– Discovery-aware memcached based backend
– Client abstractions for zone aware replication
– Option to write to all zones, fast read from local
• Cassandra
– Highly available and scalable (more later…)
• MongoDB
– Complex object/query model for small scale use
• MySQL
– Hard to scale, legacy and small relational models
69. Priam – Cassandra Automation
Available at http://github.com/netflix
• Netflix Platform Tomcat Code
• Zero touch auto-configuration
• State management for Cassandra JVM
• Token allocation and assignment
• Broken node auto-replacement
• Full and incremental backup to S3
• Restore sequencing from S3
• Grow/Shrink Cassandra "ring"
70. Astyanax
Available at http://github.com/netflix
• Cassandra java client
• API abstraction on top of Thrift protocol
• "Fixed" Connection Pool abstraction (vs. Hector)
– Round robin with Failover
– Retry-able operations not tied to a connection
– Netflix PaaS Discovery service integration
– Host reconnect (fixed interval or exponential backoff)
– Token aware to save a network hop – lower latency
– Latency aware to avoid compacting/repairing nodes – lower variance
• Batch mutation: set, put, delete, increment
• Simplified use of serializers via method overloading (vs. Hector)
• ConnectionPoolMonitor interface for counters and tracers
• Composite Column Names replacing deprecated SuperColumns
71. Astyanax Query Example
Paginate through all columns in a row:

    ColumnList<String> columns;
    int pagesize = 10;
    try {
        RowQuery<String, String> query = keyspace
            .prepareQuery(CF_STANDARD1)
            .getKey("A")
            .setIsPaginating()
            .withColumnRange(new RangeBuilder().setMaxSize(pagesize).build());

        while (!(columns = query.execute().getResult()).isEmpty()) {
            for (Column<String> c : columns) {
                // process each column in the current page here
            }
        }
    } catch (ConnectionException e) {
        // handle connection pool failure
    }
72. High Availability
• Cassandra stores 3 local copies, 1 per zone
– Synchronous access, durable, highly available
– Read/Write One fastest, least consistent - ~1ms
– Read/Write Quorum 2 of 3, consistent - ~3ms
• AWS Availability Zones
– Separate buildings
– Separate power etc.
– Fairly close together
73. "Traditional" Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Not Token Aware
1. Client writes to any Cassandra node
2. Coordinator node replicates to nodes and zones
3. Nodes return ack to coordinator
4. Coordinator returns ack to client
5. Data written to internal commit log disk (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Non token aware clients; requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.
[Diagram: clients writing through a coordinator to Cassandra nodes spread across zones A, B and C]
74. Astyanax - Cassandra Write Data Flows
Single Region, Multiple Availability Zone, Token Aware
1. Client writes to Cassandra nodes and zones
2. Nodes return ack to client
3. Data written to internal commit log disks (no more than 10 seconds later)
If a node goes offline, hinted handoff completes the write when the node comes back up.
Token aware clients; requests can choose to wait for one node, a quorum, or all nodes to ack the write.
SSTable disk writes and compactions occur asynchronously.
[Diagram: token aware clients writing directly to the replica nodes in zones A, B and C]
75. Data Flows for Multi-Region Writes
Token Aware, Consistency Level = Local Quorum
1. Client writes to local replicas
2. Local write acks returned to client, which continues when 2 of 3 local nodes are committed
3. Local coordinator writes to remote coordinator
4. When data arrives, remote coordinator node acks and copies to other remote zones
5. Remote nodes ack to local coordinator
6. Data flushed to internal commit log disks (no more than 10 seconds later)
If a node or region goes offline, hinted handoff completes the write when the node comes back up. Nightly global compare and repair jobs ensure everything stays consistent.
[Diagram: US and EU regions, each with clients and Cassandra nodes in zones A, B and C; cross-region replication adds 100+ms latency]
77. Rules of the Roadie
• Don't lose stuff
• Make sure it scales
• Figure out when it breaks and what broke
• Yell at the right guy to fix it
• Keep everything organized
78. Cassandra Backup
• Full Backup
– Time based snapshot
– SSTable compress -> S3
• Incremental Backup
– SSTable write triggers compressed copy to S3
• Archive
– Copy cross region
[Diagram: Cassandra ring nodes backing up to S3]
79. ETL for Cassandra
• Data is de-normalized over many clusters!
• Too many to restore from backups for ETL
• Solution – read backup files using Hadoop
• Aegisthus
– http://techblog.netflix.com/2012/02/aegisthus-bulk-data-pipeline-out-of.html
– High throughput raw SSTable processing
– Re-normalizes many clusters to a consistent view
– Extract, Transform, then Load into Teradata
80. Cassandra Archive
Appropriate level of paranoia needed…
• Archive could be un-readable
– Restore S3 backups weekly from prod to test, and daily ETL
• Archive could be stolen
– PGP Encrypt archive
• AWS East Region could have a problem
– Copy data to AWS West
• Production AWS Account could have an issue
– Separate Archive account with no-delete S3 ACL
• AWS S3 could have a global problem
– Create an extra copy on a different cloud vendor….
81. Tools and Automation
• Developer and Build Tools
– Jira, Perforce, Eclipse, Jenkins, Ivy, Artifactory
– Builds, creates .war file, .rpm, bakes AMI and launches
• Custom Netflix Application Console
– AWS Features at Enterprise Scale (hide the AWS security keys!)
– Auto Scaler Group is unit of deployment to production
• Open Source + Support
– Apache, Tomcat, Cassandra, Hadoop
– Datastax support for Cassandra, AWS support for Hadoop via EMR
• Monitoring Tools
– Alert processing gateway into Pagerduty
– AppDynamics – Developer focus for cloud http://appdynamics.com
82. Scalability Testing
• Cloud Based Testing – frictionless, elastic
– Create/destroy any sized cluster in minutes
– Many test scenarios run in parallel
• Test Scenarios
– Internal app specific tests
– Simple "stress" tool provided with Cassandra
• Scale test, keep making the cluster bigger
– Check that tooling and automation works…
– How many ten column row writes/sec can we do?
86. Chaos Monkey
• Computers (Datacenter or AWS) randomly die
– Fact of life, but too infrequent to test resiliency
• Test to make sure systems are resilient
– Allow any instance to fail without customer impact
• Chaos Monkey hours
– Monday-Thursday 9am-3pm random instance kill (sketched below)
• Application configuration option
– Apps now have to opt-out from Chaos Monkey
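A toy sketch of the "Chaos Monkey hours" rule above (Monday-Thursday, 9am-3pm, kill one random instance); the real Chaos Monkey works against ASGs and honours the opt-out configuration, so treat this purely as an illustration of the scheduling logic.

    import java.util.Calendar;
    import java.util.List;
    import java.util.Random;

    public class ChaosMonkeySketch {
        private final Random random = new Random();

        // Returns the instance id to terminate, or null outside Chaos Monkey hours.
        public String pickVictim(List<String> optedInInstanceIds, Calendar now) {
            int day = now.get(Calendar.DAY_OF_WEEK);
            int hour = now.get(Calendar.HOUR_OF_DAY);
            boolean chaosHours = day >= Calendar.MONDAY && day <= Calendar.THURSDAY
                    && hour >= 9 && hour < 15;
            if (!chaosHours || optedInInstanceIds.isEmpty()) {
                return null;
            }
            return optedInInstanceIds.get(random.nextInt(optedInInstanceIds.size()));
        }
    }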
87. Responsibility and Experience
• Make developers responsible for failures
– Then they learn and write code that doesn't fail
• Use Incident Reviews to find gaps to fix
– Make sure it's not about finding "who to blame"
• Keep timeouts short, fail fast
– Don't let cascading timeouts stack up
• Make configuration options dynamic
– You don't want to push code to tweak an option
89. PaaS Operational Model
• Developers
– Provision and run their own code in production
– Take turns to be on call if it breaks (pagerduty)
– Configure autoscalers to handle capacity needs
• DevOps and PaaS (aka NoOps)
– DevOps is used to build and run the PaaS
– PaaS constrains Dev to use automation instead
– PaaS puts more responsibility on Dev, with tools
90. What's Left for Corp IT?
• Corporate Security and Network Management
– Billing and remnants of streaming service back-ends in DC
• Running Netflix' DVD Business
– Tens of Oracle instances
– Hundreds of MySQL instances
– Thousands of VMWare VMs
– Zabbix, Cacti, Splunk, Puppet
• Employee Productivity
– Building networks and WiFi (Corp WiFi performance)
– SaaS OneLogin SSO Portal
– Evernote Premium, Safari Online Bookshelf, Dropbox for Teams
– Google Enterprise Apps, Workday HCM/Expense, Box.com
– Many more SaaS migrations coming…
91. Implications for IT Operations
• Cloud is run by the developer organization
– Product group's "IT department" is the AWS API and PaaS
– CorpIT handles billing and some security functions
• Cloud capacity is 10x bigger than Datacenter
– Datacenter oriented IT didn't scale up as we grew
– We moved a few people out of IT to do DevOps for our PaaS
• Traditional IT Roles and Silos are going away
– We don't have SA, DBA, Storage, Network admins for cloud
– Developers deploy and "run what they wrote" in production
97. Jenkins Architecture
[Diagram: a single Jenkins master (Red Hat Linux, m1.xlarge, 2x quad core x86_64, 26G RAM) in a us-west-1 VPC drives three pools of slaves – a standard group of x86_64 Amazon Linux build nodes, ad-hoc slaves covering miscellaneous O/S and architectures, and a custom group of ~40 slaves maintained by product teams – spread across the VPC and the Netflix data center and office]
98. Other Uses of Jenkins
• Maintenance of test and prod Cassandra clusters
• Automated integration tests for bake and deploy
• Production bake and deployment
• Housekeeping of the build / deploy infrastructure
99. Netflix Extensions to Jenkins
• Job DSL plugin: allow jobs to be set up with minimal definition, using templates and a Groovy-based DSL
• Housekeeping and maintenance processes implemented as Jenkins jobs, system Groovy scripts
100. The DynaSlave Plugin – What We Have
• Exposes a new endpoint in Jenkins that EC2 instances in VPC use for registration
• Allows a slave to name itself, label itself, tell Jenkins how many executors it can support
• EC2 == Ephemeral. Disconnected nodes that are gone for > 30 mins are reaped
• Sizing handled by EC2 ASGs, tweaks passed through via user data (labels, names, etc)
101. The DynaSlave Plugin – What's Next
• Enhanced security/registration of nodes
• Dynamic resource management
– have Jenkins respond to build demand
• Slave groups
– Allows us to create specialized pools of build nodes
• Refresh mechanism for slave tools
– JDKs, Ant versions, etc.
• Give it back to the community
– watch techblog.netflix.com!
102. The Bakery
• Create base AMIs
– We have CentOS, Ubuntu and Windows base AMIs
– All the generic code, apache, tomcat etc.
– Standard system and application monitoring tools
– Update ~monthly with patches and new versions
• Add yummy topping and bake
– Build app specific AMI including all code etc.
– Bakery mounts EBS snapshot, installs and bakes
– One bakery per region, delivers into paastest
– Tweak config and publish AMI to paasprod
104. Accounts Isolate Concerns
• paastest – for development and testing
– Fully functional deployment of all services
– Developer tagged "stacks" for separation
• paasprod – for production
– Autoscale groups only, isolated instances are terminated
– Alert routing, backups enabled by default
• paasaudit – for sensitive services
– To support SOX, PCI, etc.
– Extra access controls, auditing
• paasarchive – for disaster recovery
– Long term archive of backups
– Different region, perhaps different vendor
105. Reservations and Billing
• Consolidated Billing
– Combine all accounts into one bill
– Pooled capacity for bigger volume discounts
– http://docs.amazonwebservices.com/AWSConsolidatedBilling/1.0/AWSConsolidatedBillingGuide.html
• Reservations
– Save up to 71% on your baseline load
– Priority when you request reserved capacity
– Unused reservations are shared across accounts
106. Cloud Access Gateway
• Datacenter or office based
– A separate VM for each AWS account
– Two per account for high availability
– Mount NFS shared home directories for developers
– Instances trust the gateway via a security group
• Manage how developers login to cloud
– Access control via ldap group membership
– Audit logs of every login to the cloud
– Similar to awsfabrictasks ssh wrapper http://readthedocs.org/docs/awsfabrictasks/en/latest/
108. Now Add Code
Netflix has open sourced a lot of what you need, more is on the way…
109. Netflix Open Source Strategy
• Release PaaS Components git-by-git
– Source at github.com/netflix – we build from it…
– Intros and techniques at techblog.netflix.com
– Blog post or new code every few weeks
• Motivations
– Give back to Apache licensed OSS community
– Motivate, retain, hire top engineers
– "Peer pressure" code cleanup, external contributions
110. Open Source Projects and Posts
Legend: Github / Techblog, Techblog Post, Coming Soon, Apache Contributions
• Priam – Cassandra as a Service
• Exhibitor – Zookeeper as a Service
• Servo and Autoscaling Scripts
• Astyanax – Cassandra client for Java
• Honu – Log4j streaming to Hadoop
• Curator – Zookeeper Patterns
• EVCache – Memcached as a Service
• CassJMeter – Cassandra test suite
• Circuit Breaker – Robust service pattern
• Asgard – AutoScaleGroup based AWS console
• Discovery Service – Directory
• Cassandra – Multi-region EC2 datastore support
• Aegisthus – Hadoop ETL for Cassandra
• Configuration Properties Service
• Chaos Monkey – Robustness verification
111. Asgard
Not quite out yet…
• Runs in a VM in our datacenter
– So it can deploy to an empty account
– Groovy/Grails/JVM based
– Supports all AWS regions on a global basis
• Hides the AWS credentials
– Use AWS IAM to issue restricted keys for Asgard
– Each Asgard instance manages one account
– One install each for paastest, paasprod, paasaudit
112. "Discovery" - Service Directory
• Map an instance to a service type
– Load balance over clusters of instances
– Private namespace, so DNS isn't useful
– Foundation service, first to deploy
• Highly available distributed coordination
– Deploy one Apache Zookeeper instance per zone
– Netflix Curator includes simple discovery service
– Netflix Exhibitor manages Zookeeper reliably
113. Configuration Properties Service
• Dynamic, hierarchical & propagates in seconds
– Client timeouts, feature set enables
– Region specific service endpoints
– Cassandra token assignments etc. etc.
• Used to configure everything
– So everything depends on it…
– Coming soon to github
– Pluggable backend storage interface
114. Persistence services
• Use SimpleDB as a bootstrap
– Good use case for DynamoDB or SimpleDB
• Netflix Priam – Cassandra automation
115. Monitoring, alert forwarding
• Multiple monitoring systems
– Internally developed data collection runs on AWS
– AppDynamics APM product runs as external SaaS
– When one breaks the other is usually OK…
• Alerts routed to the developer of that app
– Alert gateway combines alerts from all sources
– Deduplication, source quenching, routing
– Warnings sent via email, critical via pagerduty
116. Backups, archives
• Cassandra Backup via Priam to S3 bucket
– Create versioned S3 bucket with TTL option
– Setup service to encrypt and copy to archive
• Archive Account with Read/Write ACL to prod
– Setup in a different AWS region from production
– Create versioned S3 bucket with TTL option
117. Chaos Monkey
• Install it on day 1 in test and production
• Prevents people from doing local persistence
• Kill anything not protected by an ASG
• Supports whitelist for temporary do-not-kill
• Open source soon, code cleanup in progress…
118. You take it from here…
• Keep watching github for more goodies
• Add your own code
• Let us know what you find useful
• Bugs, patches and additions all welcome
• See you at AWS Re:Invent?
119. Roadmap for 2012
• More resiliency and improved availability
• More automation, orchestration
• "Hardening" the platform, code clean-up
• Lower latency for web services and devices
• IPv6 support
• More open sourced components
120. Wrap Up
Answer your remaining questions…
What was missing that you wanted to cover?
121. Takeaway
Netflix has built and deployed a scalable global Platform as a Service. Key components of the Netflix PaaS are being released as Open Source projects so you can build your own custom PaaS.
http://github.com/Netflix
http://techblog.netflix.com
http://slideshare.net/Netflix
http://www.linkedin.com/in/adriancockcroft
@adrianco #netflixcloud
End of Part 3 of 3
123. You want an Encore?
If there is enough time… (there wasn't)
Something for the hard core complex adaptive systems people to digest.
125. Workload Characteristics
• A quick tour through a taxonomy of workload types
• Start with the easy ones and work up
• Why personalized workloads are different and hard
• Some examples and coping strategies
126. Simple Random Arrivals
• Random arrival of transactions with fixed mean service time
– Little's Law: QueueLength = Throughput * Response (worked example below)
– Utilization Law: Utilization = Throughput * ServiceTime
• Complex models are often reduced to this model
– By averaging over longer time periods, since the formulas only work if you have stable averages
– By wishful thinking (i.e. how to fool yourself)
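As a quick worked example with illustrative numbers (not from the talk): at a steady throughput of 200 requests/s with a mean response time of 0.05 s, Little's Law gives QueueLength = 200 × 0.05 = 10 requests in the system on average; if the mean service time is 0.02 s, the Utilization Law gives Utilization = 200 × 0.02 = 4 busy servers (threads) on average.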
127. Mixed random arrivals of transactions with stable mean service times
• Think of the grocery store checkout analogy
– Trolleys full of shopping vs. baskets full of shopping
– Baskets are quick to service, but get stuck behind carts
– Relative mixture of transaction types starts to matter
• Many transactional systems handle a mixture
– Databases, web services
• Consider separating fast and slow transactions
– So that we have a "10 items or less" line just for baskets
– Separate pools of servers for different services
– The old rule - don't mix OLTP with DSS queries in databases
• Performance is often thread-limited
– Thread limit and slow transactions constrain maximum throughput
• Model mix using analytical solvers (e.g. PDQ perfdynamics.com)
128. Load dependent servers – varying mean service times
• Mean service time may increase at high throughput
– Due to non-scalable algorithms, lock contention
– System runs out of memory and starts paging or frequent GC
• Mean service time may also decrease at high throughput
– Elevator seek and write cancellation optimizations in storage
– Load shedding and simplified fallback modes
• Systems have "tipping points" if the service time increases
– Hysteresis means they don't come back when load drops
– This is why you have to kill catatonic systems
– Best designs shed load to be stable at the limit – circuit breaker pattern (sketched below)
– Practical option is to try to avoid tipping points by reducing variance
• Model using discrete event simulation tools
– Behaviour is non-linear and hard to model
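Since the circuit breaker pattern is called out above as the way to shed load and stay stable at the limit, here is a minimal sketch with illustrative thresholds; it is a generic illustration, not a specific library's API.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;

    public class CircuitBreaker {
        private final int failureThreshold;       // consecutive failures before opening
        private final long openMillis;            // how long to fail fast before probing again
        private final AtomicInteger consecutiveFailures = new AtomicInteger();
        private final AtomicLong openedAt = new AtomicLong(0);

        public CircuitBreaker(int failureThreshold, long openMillis) {
            this.failureThreshold = failureThreshold;
            this.openMillis = openMillis;
        }

        public boolean allowRequest() {
            long opened = openedAt.get();
            if (opened == 0) return true;                                        // closed: pass traffic
            if (System.currentTimeMillis() - opened > openMillis) return true;   // half-open probe
            return false;                                                        // open: fail fast, shed load
        }

        public void recordSuccess() {
            consecutiveFailures.set(0);
            openedAt.set(0);
        }

        public void recordFailure() {
            if (consecutiveFailures.incrementAndGet() >= failureThreshold) {
                openedAt.compareAndSet(0, System.currentTimeMillis());
            }
        }
    }

Failing fast while the breaker is open keeps timeouts from stacking up, which is exactly the behaviour needed to stay on the stable side of a tipping point.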
129. Self-similar / fractal workloads
• Bursty rather than random arrival rates
• Self-similar
– Looks "random" at close up, stays "random" as you zoom out
– Work arrives in bursts, transactions aren't independent
– Bursts cluster together in super-bursts, etc.
• Network packet streams tend to be fractal
• Common in practice, too hard to model
– Probably the most common reason why your model is wrong!