This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
11. General recommendations
• Immutability as the default
• Referential Transparency (FP)
• Laziness
• Think about your data:
• Different data need different guarantees
24. You can only pick 2 (at a given point in time):
• Consistency
• Availability
• Partition tolerance
25. Centralized system
• In a centralized system (RDBMS etc.) we don’t have network partitions, i.e. no P in CAP
• So you get both:
  • Availability
  • Consistency
27. Distributed system
• In a distributed system we (will) have network partitions, i.e. the P in CAP
• So you only get to pick one of:
  • Availability
  • Consistency
28. CAP in practice:
• ...there are only two types of systems:
  1. CP
  2. AP
• ...there is only one choice to make: in case of a network partition, what do you sacrifice?
  1. C: Consistency
  2. A: Availability
42. Replication
• Active replication - Push
• Passive replication - Pull
  • Data not available locally: read from a peer, then store it locally (see the sketch below)
  • Works well with timeout-based caches
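A minimal sketch of the pull (passive replication) approach, assuming a hypothetical peerLookup function that fetches the value from another replica; entries are stored locally with a timeout so stale copies eventually expire.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Pull-based (passive) replication sketch: on a local miss, read from a peer
    // replica and store the value locally with a time-to-live.
    public class PullThroughCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final long expiresAt;
            Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
        }

        private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
        private final Function<K, V> peerLookup;   // hypothetical: asks a peer replica
        private final long ttlMillis;

        public PullThroughCache(Function<K, V> peerLookup, long ttlMillis) {
            this.peerLookup = peerLookup;
            this.ttlMillis = ttlMillis;
        }

        public V get(K key) {
            Entry<V> e = local.get(key);
            if (e != null && e.expiresAt > System.currentTimeMillis()) {
                return e.value;                     // fresh local copy
            }
            V value = peerLookup.apply(key);        // data not available locally: pull from peer
            local.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
            return value;
        }
    }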
64. ORM + rich domain model anti-pattern
• Attempt:
  • Read an object from DB
• Result:
  • You sit with your whole database in your lap
65. Think about your data
• When do you need ACID?
• When is Eventually Consistent a better fit?
• Different kinds of data have different needs
Think again
80. Node ring with Consistent Hashing
Find data in log(N) jumps
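A minimal in-process sketch of a consistent-hashing ring (node lookup for a key). It does not show the O(log N) finger-table routing used by systems like Chord, and the hash function and virtual-node count are only illustrative.

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Consistent hashing: nodes and keys are hashed onto the same ring;
    // a key is owned by the first node found clockwise from the key's hash.
    public class ConsistentHashRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();
        private final int virtualNodes;

        public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

        public void addNode(String node) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);   // virtual nodes even out the distribution
            }
        }

        public void removeNode(String node) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.remove(hash(node + "#" + i));
            }
        }

        public String nodeFor(String key) {
            if (ring.isEmpty()) throw new IllegalStateException("no nodes");
            SortedMap<Integer, String> tail = ring.tailMap(hash(key));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }

        private int hash(String s) {
            // Toy hash; a real ring would use a stronger hash such as MD5 or Murmur.
            return s.hashCode() & 0x7fffffff;
        }
    }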
81. “How can we build a DB on top of Google File System?”
• Paper: Bigtable: A distributed storage system for structured data, 2006
• Rich data-model, structured storage
• Clones:
  • HBase
  • Hypertable
  • Neptune
Bigtable
82. “How can we build a distributed hash table for the data center?”
• Paper: Dynamo: Amazon’s highly available key-value store, 2007
• Focus: partitioning, replication and availability
• Eventually Consistent
• Clones:
  • Voldemort
  • Dynomite
Dynamo
91. memcached
• Very fast
• Simple
• Key-Value (string -> binary)
• Clients for most languages
• Distributed
• Not replicated - so 1/N chance for local access in cluster
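A short usage sketch, assuming the spymemcached Java client and two local memcached servers; each key hashes to exactly one server, since memcached itself does not replicate.

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    // Typical memcached usage from Java (spymemcached client assumed).
    public class MemcachedExample {
        public static void main(String[] args) throws Exception {
            MemcachedClient client = new MemcachedClient(
                    new InetSocketAddress("127.0.0.1", 11211),
                    new InetSocketAddress("127.0.0.1", 11212));

            client.set("user:42", 3600, "Jonas");       // cache for one hour
            String name = (String) client.get("user:42");
            System.out.println(name);

            client.shutdown();
        }
    }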
93. Data Grids/Clustering
Parallel data storage
• Data replication
• Data partitioning
• Continuous availability
• Data invalidation
• Fail-over
• C + P in CAP
98. Shared-State Concurrency
• Everyone can access anything anytime
• Totally indeterministic
• Introduce determinism at well-defined places...
• ...using locks
99. Shared-State Concurrency
• Problems with locks:
  • Locks do not compose
  • Taking too few locks
  • Taking too many locks
  • Taking the wrong locks
  • Taking locks in the wrong order
  • Error recovery is hard
100. Shared-State Concurrency
Please use java.util.concurrent.*
• ConcurrentHashMap
• BlockingQueue
• ConcurrentLinkedQueue
• ExecutorService
• ReentrantReadWriteLock
• CountDownLatch
• ParallelArray
• and much, much more...
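A small example of leaning on java.util.concurrent instead of hand-rolled locks: an ExecutorService runs the tasks, a ConcurrentHashMap holds the shared result, and a CountDownLatch signals completion. The word-counting scenario is made up for illustration.

    import java.util.concurrent.*;

    // Shared-state concurrency with java.util.concurrent: no explicit locks.
    public class SharedCounts {
        public static void main(String[] args) throws InterruptedException {
            String[] words = { "foo", "bar", "foo", "baz", "bar", "foo" };
            ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
            ExecutorService pool = Executors.newFixedThreadPool(4);
            CountDownLatch done = new CountDownLatch(words.length);

            for (String word : words) {
                pool.execute(() -> {
                    counts.merge(word, 1, Integer::sum);   // atomic update of shared state
                    done.countDown();
                });
            }

            done.await();
            pool.shutdown();
            System.out.println(counts);   // e.g. {bar=2, foo=3, baz=1}
        }
    }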
102. Actors
• Originates in a 1973 paper by Carl Hewitt
• Implemented in Erlang, Occam, Oz
• Encapsulates state and behavior
• Closer to the definition of OO than classes
103. Actors
• Share NOTHING
• Isolated lightweight processes
• Communicate through messages
• Asynchronous and non-blocking
• No shared state
… hence, nothing to synchronize.
• Each actor has a mailbox (message queue)
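Erlang, or a library such as Akka, would normally provide actors; the sketch below is only a toy illustration of the mailbox idea in plain Java (class and message names are made up): private state, a mailbox, and a single thread that processes one message at a time, so there is nothing to synchronize.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy actor: state is only ever touched by the actor's own thread.
    public class CounterActor {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        private int count = 0;

        public CounterActor() {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String msg = mailbox.take();          // block until a message arrives
                        if (msg.equals("stop")) break;
                        if (msg.equals("increment")) count++;
                        if (msg.equals("print")) System.out.println("count = " + count);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // Sending is asynchronous and non-blocking for the caller.
        public void send(String message) {
            mailbox.offer(message);
        }
    }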
104. Actors
• Easier to reason about
• Raised abstraction level
• Easier to avoid:
  • Race conditions
  • Deadlocks
  • Starvation
  • Livelocks
109. STM: overview
• See the memory (heap and stack) as a transactional dataset
• Similar to a database:
  • begin
  • commit
  • abort/rollback
• Transactions are retried automatically upon collision
• Rolls back the memory on abort
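Java has no built-in STM, so the sketch below is not one; it only illustrates the "retried automatically upon collision" idea using an AtomicReference and compare-and-set, which is the same optimistic approach applied to a single reference.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    // Optimistic update on one reference: read a snapshot, compute a new value,
    // retry if another writer committed in between (compareAndSet fails on collision).
    public class OptimisticRef<T> {
        private final AtomicReference<T> ref;

        public OptimisticRef(T initial) { this.ref = new AtomicReference<>(initial); }

        public T get() { return ref.get(); }

        public T update(UnaryOperator<T> transaction) {
            while (true) {
                T snapshot = ref.get();                 // "begin": take a snapshot
                T updated = transaction.apply(snapshot);
                if (ref.compareAndSet(snapshot, updated)) {
                    return updated;                     // "commit" succeeded
                }
                // collision: another writer got there first, retry the transaction
            }
        }
    }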
115. Event-Driven Architecture
“Four years from now, ‘mere mortals’ will begin to adopt an event-driven architecture (EDA) for the sort of complex event processing that has been attempted only by software gurus [until now]”
-- Roy Schulte (Gartner), 2003
117. Domain Events
“It's really become clear to me in the last
couple of years that we need a new building
block and that is the Domain Events”
-- Eric Evans, 2009
118. Domain Events
“Domain Events represent the state of entities
at a given time when an important event
occurred and decouple subsystems with event
streams. Domain Events give us clearer, more
expressive models in those cases.”
-- Eric Evans, 2009
119. Domain Events
“State transitions are an important part of
our problem space and should be modeled
within our domain.”
-- Greg Young, 2008
120. Event Sourcing
• Every state change is materialized in an Event
• All Events are sent to an EventProcessor
• EventProcessor stores all events in an Event Log
• System can be reset and Event Log replayed
• No need for ORM, just persist the Events
• Many different EventListeners can be added to
EventProcessor (or listen directly on the Event log)
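A toy, in-memory sketch of the points above; the Event interface, EventProcessor and the account events are made-up names, and a real system would persist the event log durably rather than hold it in a list.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Event sourcing sketch: every state change is an Event, all events go to an
    // event log, listeners react to them, and state is rebuilt by replaying the log.
    public class EventSourcingExample {
        interface Event {}
        record Deposited(String account, long amount) implements Event {}
        record Withdrawn(String account, long amount) implements Event {}

        static class EventProcessor {
            private final List<Event> eventLog = new ArrayList<>();
            private final List<Consumer<Event>> listeners = new ArrayList<>();

            void subscribe(Consumer<Event> listener) { listeners.add(listener); }

            void publish(Event event) {
                eventLog.add(event);                       // persist the event, not the state
                listeners.forEach(l -> l.accept(event));
            }

            void replay(Consumer<Event> listener) {        // reset + replay the log
                eventLog.forEach(listener);
            }
        }

        public static void main(String[] args) {
            EventProcessor processor = new EventProcessor();
            processor.subscribe(e -> System.out.println("audit: " + e));

            processor.publish(new Deposited("acc-1", 100));
            processor.publish(new Withdrawn("acc-1", 30));

            long[] balance = { 0 };                        // rebuild state from the log
            processor.replay(e -> {
                if (e instanceof Deposited d) balance[0] += d.amount();
                if (e instanceof Withdrawn w) balance[0] -= w.amount();
            });
            System.out.println("balance = " + balance[0]); // 70
        }
    }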
122. Command and Query Responsibility Segregation (CQRS) pattern
“A single model cannot be appropriate for reporting, searching and transactional behavior.”
-- Greg Young, 2008
129. CQRS in a nutshell
• All state changes are represented by Domain Events
• Aggregate roots receive Commands and publish Events
• Reporting (query database) is updated as a result of the published Events
• All Queries from Presentation go directly to Reporting and the Domain is not involved
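A matching toy sketch of the split (all names are illustrative): commands go to the write side, which validates them and publishes events; the reporting view is updated only from those events; queries read the view and never touch the domain model.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // CQRS sketch: commands on the write side, queries against a reporting view.
    public class CqrsExample {
        record DepositCommand(String account, long amount) {}
        record AmountDeposited(String account, long amount) {}

        // Reporting / query side: a denormalized view, updated only by events.
        static final Map<String, Long> balanceView = new ConcurrentHashMap<>();

        // Write side: validate the command and emit a Domain Event.
        static void handle(DepositCommand cmd) {
            if (cmd.amount() <= 0) throw new IllegalArgumentException("amount must be positive");
            apply(new AmountDeposited(cmd.account(), cmd.amount()));
        }

        static void apply(AmountDeposited event) {
            balanceView.merge(event.account(), event.amount(), Long::sum);
        }

        // Queries go directly to Reporting; the domain is not involved.
        static long queryBalance(String account) {
            return balanceView.getOrDefault(account, 0L);
        }

        public static void main(String[] args) {
            handle(new DepositCommand("acc-1", 100));
            handle(new DepositCommand("acc-1", 50));
            System.out.println(queryBalance("acc-1"));   // 150
        }
    }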
131. CQRS: Benefits
• Fully encapsulated domain that only exposes behavior
• Queries do not use the domain model
• No object-relational impedance mismatch
• Bullet-proof auditing and historical tracing
• Easy integration with external systems
• Performance and scalability
146. Compute Grids
Parallel execution
• Divide and conquer
1. Split up the job into independent tasks
2. Execute tasks in parallel
3. Aggregate and return result
• MapReduce - Master/Worker
150. Load balancing
• Random allocation
• Round robin allocation
• Weighted allocation
• Dynamic load balancing:
  • Least connections
  • Least server CPU
  • etc.
151. Load balancing
• DNS Round Robin (simplest)
• Ask DNS for IP for host
• Get a new IP every time
• Reverse Proxy (better)
• Hardware Load Balancing
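A minimal sketch of round-robin allocation at the application level (class name and backend addresses are made up); a weighted or least-connections strategy would change only choose().

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Round-robin allocation: each request is handed to the next backend in turn.
    public class RoundRobinBalancer {
        private final List<String> backends;
        private final AtomicInteger next = new AtomicInteger();

        public RoundRobinBalancer(List<String> backends) { this.backends = backends; }

        public String choose() {
            int i = Math.floorMod(next.getAndIncrement(), backends.size());
            return backends.get(i);
        }

        public static void main(String[] args) {
            RoundRobinBalancer lb = new RoundRobinBalancer(
                    List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
            for (int i = 0; i < 6; i++) {
                System.out.println(lb.choose());   // cycles through the three backends
            }
        }
    }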
154. Parallel Computing
• UE: Unit of Execution
  • Process
  • Thread
  • Coroutine
  • Actor
• SPMD Pattern
• Master/Worker Pattern
• Loop Parallelism Pattern
• Fork/Join Pattern
• MapReduce Pattern
155. SPMD Pattern
• Single Program Multiple Data
• Very generic pattern, used in many other patterns
• Use a single program for all the UEs
• Use the UE’s ID to select different pathways through the program, e.g.:
  • Branching on ID
  • Use ID in loop index to split loops (see the sketch below)
• Keep interactions between UEs explicit
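A small SPMD sketch with threads as the UEs: every thread runs the same code and uses its own ID in the loop index to pick its slice of the work (the summing example is made up).

    // SPMD: same program for all UEs, the ID selects the pathway through it.
    public class SpmdSum {
        public static void main(String[] args) throws InterruptedException {
            int numUEs = 4;
            double[] data = new double[1_000_000];
            java.util.Arrays.fill(data, 1.0);
            double[] partial = new double[numUEs];

            Thread[] threads = new Thread[numUEs];
            for (int id = 0; id < numUEs; id++) {
                final int myId = id;                       // the UE's ID
                threads[id] = new Thread(() -> {
                    double sum = 0;
                    // Use the ID in the loop index to split the loop across UEs.
                    for (int i = myId; i < data.length; i += numUEs) {
                        sum += data[i];
                    }
                    partial[myId] = sum;                   // interaction kept explicit
                });
                threads[id].start();
            }

            double total = 0;
            for (int id = 0; id < numUEs; id++) {
                threads[id].join();
                total += partial[id];
            }
            System.out.println(total);                     // 1000000.0
        }
    }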
157. Master/Worker
• Good scalability
• Automatic load-balancing
• How to detect termination?
  • Bag of tasks is empty
  • Poison pill (see the sketch below)
• What if we bottleneck on a single queue?
  • Use multiple work queues
  • Work stealing
• What about fault tolerance?
  • Use “in-progress” queue
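A minimal Master/Worker sketch with a poison pill for termination (names are illustrative): the master fills the shared "bag of tasks", then adds one pill per worker to signal that the work is done.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Master/Worker: workers pull from a shared queue; a poison pill stops them.
    public class MasterWorker {
        private static final String POISON_PILL = "__STOP__";

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
            int workers = 3;

            for (int w = 0; w < workers; w++) {
                new Thread(() -> {
                    try {
                        while (true) {
                            String task = tasks.take();
                            if (task.equals(POISON_PILL)) return;   // termination detected
                            System.out.println(Thread.currentThread().getName() + " -> " + task);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }

            // The master fills the bag of tasks, then one pill per worker.
            for (int i = 0; i < 10; i++) tasks.put("task-" + i);
            for (int w = 0; w < workers; w++) tasks.put(POISON_PILL);
        }
    }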
158. Loop Parallelism
• Workflow:
  1. Find the loops that are bottlenecks
  2. Eliminate coupling between loop iterations
  3. Parallelize the loop
• If too few iterations to pull its weight:
  • Merge loops
  • Coalesce nested loops
• OpenMP: omp parallel for
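Java has no OpenMP; a parallel stream (Java 8+) is used here only as an analogue of "omp parallel for", and it is safe precisely because the iterations have no coupling.

    import java.util.stream.IntStream;

    // Loop parallelism: independent iterations, so the loop can run in parallel.
    public class ParallelLoop {
        public static void main(String[] args) {
            double[] result = new double[1_000_000];
            IntStream.range(0, result.length)
                     .parallel()
                     .forEach(i -> result[i] = Math.sqrt(i));   // no cross-iteration coupling
            System.out.println(result[42]);
        }
    }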
160. What if task creation can’t be handled by:
• parallelizing loops (Loop Parallelism)
• putting them on work queues (Master/Worker)
Enter Fork/Join
161. Fork/Join
• Use when the relationship between tasks is simple
• Good for recursive data processing
• Can use work-stealing
1. Fork: tasks are dynamically created
2. Join: tasks are later terminated and data aggregated
162. Fork/Join
• Direct task/UE mapping
  • 1-1 mapping between Task/UE
  • Problem: dynamic UE creation is expensive
• Indirect task/UE mapping
  • Pool the UEs
  • Control (constrain) the resource allocation
  • Automatic load balancing
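A small RecursiveTask example using Java 7's ForkJoinPool (the summing task is made up for illustration): tasks are forked recursively until the slice is small enough, joined to aggregate results, and the pool's work-stealing scheduler maps the dynamically created tasks onto a fixed set of worker threads.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Fork/Join: recursive decomposition with work-stealing.
    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;
        private final long[] data;
        private final int from, to;

        public SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) / 2;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                               // fork: create a subtask
            return right.compute() + left.join();      // join: aggregate the results
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1L);
            long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(total);                 // 1000000
        }
    }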
164. Java 7 ParallelArray (Fork/Join DSL)

    ParallelArray students = new ParallelArray(fjPool, data);

    double bestGpa = students.withFilter(isSenior)
                             .withMapping(selectGpa)
                             .max();
165. MapReduce
• Originates from a Google paper, 2004
• Used internally @ Google
• Variation of Fork/Join
• Work divided upfront, not dynamically
• Usually distributed
• Normally used for massive data crunching
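A toy, single-process map/reduce to show the shape of the model only; the real thing (Hadoop and friends) splits the work up front and distributes the map and reduce phases across many machines.

    import java.util.*;

    // Toy MapReduce: map each record to (key, value) pairs, group by key, reduce each group.
    public class MiniMapReduce {
        public static void main(String[] args) {
            List<String> lines = List.of("the quick brown fox", "the lazy dog", "the fox");

            // Map phase: emit (word, 1) for every word.
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : lines) {
                for (String word : line.split(" ")) {
                    mapped.add(Map.entry(word, 1));
                }
            }

            // Shuffle phase: group the intermediate values by key.
            Map<String, List<Integer>> grouped = new HashMap<>();
            for (Map.Entry<String, Integer> kv : mapped) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }

            // Reduce phase: sum the values for each key.
            Map<String, Integer> counts = new HashMap<>();
            grouped.forEach((word, ones) ->
                    counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

            System.out.println(counts);   // e.g. {the=3, fox=2, quick=1, ...}
        }
    }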
166. MapReduce Products
• Hadoop (OSS), used @ Yahoo
• Amazon Elastic MapReduce
• Many NOSQL DBs utilize it for searching/querying
173. Let it crash
• Embrace failure as a natural state in the life-cycle of the application
• Instead of trying to prevent failure, manage it
• Process supervision
• Supervisor hierarchies (from Erlang)
187. Bulkheads
• Partition the system and tolerate failure in one part
• Redundancy
• Applies to threads as well:
  • A separate pool for admin tasks, so they can still run even when all other threads are blocked (see the sketch below)
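A minimal sketch of the thread-pool bulkhead mentioned above (names and pool sizes are illustrative): separate pools per concern, so a flood of slow user requests cannot exhaust the threads needed for admin tasks.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Bulkheads for threads: one pool per concern, failures stay contained.
    public class Bulkheads {
        private final ExecutorService userPool = Executors.newFixedThreadPool(20);
        private final ExecutorService adminPool = Executors.newFixedThreadPool(2);

        public void handleUserRequest(Runnable work) {
            userPool.execute(work);      // may saturate under load...
        }

        public void handleAdminTask(Runnable work) {
            adminPool.execute(work);     // ...while admin tasks still get a thread
        }
    }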
188. Steady State
• Clean up after yourself
• Logging:
  • RollingFileAppender (log4j)
  • logrotate (Unix)
  • Scribe - server for aggregating streaming log data
• Always put logs on a separate disk
189. Throttling
• Maintain a steady pace
• Count requests
• If the limit is reached, back off (drop, raise error)
• Queue requests
• Used in, for example, Staged Event-Driven Architecture (SEDA)
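A minimal throttling sketch (names are illustrative): a Semaphore counts in-flight requests and, once the limit is reached, the caller backs off by rejecting; a variant would queue the request instead.

    import java.util.concurrent.Semaphore;

    // Throttling: count in-flight requests, back off when the limit is reached.
    public class Throttle {
        private final Semaphore permits;

        public Throttle(int maxConcurrent) { this.permits = new Semaphore(maxConcurrent); }

        public boolean tryHandle(Runnable request) {
            if (!permits.tryAcquire()) {
                return false;             // limit reached: drop or raise an error
            }
            try {
                request.run();
                return true;
            } finally {
                permits.release();
            }
        }
    }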
195. Server-side consistency
• N = the number of nodes that store replicas of the data
• W = the number of replicas that need to acknowledge the receipt of the update before the update completes
• R = the number of replicas that are contacted when a data object is accessed through a read operation
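A commonly cited consequence of these parameters (from Werner Vogels' description of quorum systems): if W + R > N, the write set and the read set always overlap, so a read sees at least one replica with the latest acknowledged update; with W + R <= N, reads can return stale data. For example, N=3, W=2, R=2 guarantees that overlap, while N=3, W=1, R=1 trades consistency for lower latency and higher availability.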