This document provides an overview of patterns for scalability, availability, and stability in distributed systems. It discusses general recommendations like immutability and referential transparency. It covers scalability trade-offs around performance vs scalability, latency vs throughput, and availability vs consistency. It then describes various patterns for scalability including managing state through partitioning, caching, sharding databases, and using distributed caching. It also covers patterns for managing behavior through event-driven architecture, compute grids, load balancing, and parallel computing. Availability patterns like fail-over, replication, and fault tolerance are discussed. The document provides examples of popular technologies that implement many of these patterns.
11. General recommendations
• Immutability as the default
• Referential Transparency (FP)
• Laziness
• Think about your data:
• Different data need different guarantees
24. You can only pick 2 (at a given point in time):
• Consistency
• Availability
• Partition tolerance
25. Centralized system
• In a centralized system (RDBMS etc.) we don’t have network partitions, i.e. no P in CAP
• So you get both:
  • Availability
  • Consistency
27. Distributed system
• In a distributed system we (will) have network partitions, i.e. the P in CAP
• So you only get to pick one of:
  • Availability
  • Consistency
28. CAP in practice:
• ...there are only two types of systems:
  1. CP
  2. AP
• ...there is only one choice to make: in case of a network partition, what do you sacrifice?
  1. C: Consistency
  2. A: Availability
42. Replication
• Active replication - Push
• Passive replication - Pull
  • Data not available locally: read from a peer, then store it locally (see the sketch below)
  • Works well with timeout-based caches
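A minimal sketch of the pull (passive replication) approach, assuming a hypothetical peerLookup function that fetches the value from another replica; entries are stored locally with a timeout so stale copies eventually expire.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    // Pull-based (passive) replication sketch: on a local miss, read from a peer
    // replica and store the value locally with a time-to-live.
    public class PullThroughCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final long expiresAt;
            Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
        }

        private final Map<K, Entry<V>> local = new ConcurrentHashMap<>();
        private final Function<K, V> peerLookup;   // hypothetical: asks a peer replica
        private final long ttlMillis;

        public PullThroughCache(Function<K, V> peerLookup, long ttlMillis) {
            this.peerLookup = peerLookup;
            this.ttlMillis = ttlMillis;
        }

        public V get(K key) {
            Entry<V> e = local.get(key);
            if (e != null && e.expiresAt > System.currentTimeMillis()) {
                return e.value;                     // fresh local copy
            }
            V value = peerLookup.apply(key);        // data not available locally: pull from peer
            local.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
            return value;
        }
    }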
64. ORM + rich domain model anti-pattern
• Attempt:
  • Read an object from DB
• Result:
  • You sit with your whole database in your lap
65. Think about your data
• When do you need ACID?
• When is Eventually Consistent a better fit?
• Different kinds of data have different needs
Think again
80. Node ring with Consistent Hashing
Find data in log(N) jumps
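A minimal in-process sketch of a consistent-hashing ring (node lookup for a key). It does not show the O(log N) finger-table routing used by systems like Chord, and the hash function and virtual-node count are only illustrative.

    import java.util.SortedMap;
    import java.util.TreeMap;

    // Consistent hashing: nodes and keys are hashed onto the same ring;
    // a key is owned by the first node found clockwise from the key's hash.
    public class ConsistentHashRing {
        private final SortedMap<Integer, String> ring = new TreeMap<>();
        private final int virtualNodes;

        public ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

        public void addNode(String node) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);   // virtual nodes even out the distribution
            }
        }

        public void removeNode(String node) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.remove(hash(node + "#" + i));
            }
        }

        public String nodeFor(String key) {
            if (ring.isEmpty()) throw new IllegalStateException("no nodes");
            SortedMap<Integer, String> tail = ring.tailMap(hash(key));
            return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
        }

        private int hash(String s) {
            // Toy hash; a real ring would use a stronger hash such as MD5 or Murmur.
            return s.hashCode() & 0x7fffffff;
        }
    }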
81. “How can we build a DB on top of Google File System?”
• Paper: Bigtable: A distributed storage system for structured data, 2006
• Rich data-model, structured storage
• Clones:
  • HBase
  • Hypertable
  • Neptune
Bigtable
82. “How can we build a distributed hash table for the data center?”
• Paper: Dynamo: Amazon’s highly available key-value store, 2007
• Focus: partitioning, replication and availability
• Eventually Consistent
• Clones:
  • Voldemort
  • Dynomite
Dynamo
91. memcached
• Very fast
• Simple
• Key-Value (string -> binary)
• Clients for most languages
• Distributed
• Not replicated - so 1/N chance for local access in cluster
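A short usage sketch, assuming the spymemcached Java client and two local memcached servers; each key hashes to exactly one server, since memcached itself does not replicate.

    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    // Typical memcached usage from Java (spymemcached client assumed).
    public class MemcachedExample {
        public static void main(String[] args) throws Exception {
            MemcachedClient client = new MemcachedClient(
                    new InetSocketAddress("127.0.0.1", 11211),
                    new InetSocketAddress("127.0.0.1", 11212));

            client.set("user:42", 3600, "Jonas");       // cache for one hour
            String name = (String) client.get("user:42");
            System.out.println(name);

            client.shutdown();
        }
    }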
93. Data Grids/Clustering
Parallel data storage
• Data replication
• Data partitioning
• Continuous availability
• Data invalidation
• Fail-over
• C + P in CAP
98. Shared-State Concurrency
• Everyone can access anything anytime
• Totally indeterministic
• Introduce determinism at well-defined places...
• ...using locks
99. Shared-State Concurrency
• Problems with locks:
  • Locks do not compose
  • Taking too few locks
  • Taking too many locks
  • Taking the wrong locks
  • Taking locks in the wrong order
  • Error recovery is hard
100. Shared-State Concurrency
Please use java.util.concurrent.*
• ConcurrentHashMap
• BlockingQueue
• ConcurrentLinkedQueue
• ExecutorService
• ReentrantReadWriteLock
• CountDownLatch
• ParallelArray
• and much, much more...
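A small example of leaning on java.util.concurrent instead of hand-rolled locks: an ExecutorService runs the tasks, a ConcurrentHashMap holds the shared result, and a CountDownLatch signals completion. The word-counting scenario is made up for illustration.

    import java.util.concurrent.*;

    // Shared-state concurrency with java.util.concurrent: no explicit locks.
    public class SharedCounts {
        public static void main(String[] args) throws InterruptedException {
            String[] words = { "foo", "bar", "foo", "baz", "bar", "foo" };
            ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
            ExecutorService pool = Executors.newFixedThreadPool(4);
            CountDownLatch done = new CountDownLatch(words.length);

            for (String word : words) {
                pool.execute(() -> {
                    counts.merge(word, 1, Integer::sum);   // atomic update of shared state
                    done.countDown();
                });
            }

            done.await();
            pool.shutdown();
            System.out.println(counts);   // e.g. {bar=2, foo=3, baz=1}
        }
    }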
102. Actors
• Originates in a 1973 paper by Carl Hewitt
• Implemented in Erlang, Occam, Oz
• Encapsulates state and behavior
• Closer to the definition of OO than classes
103. Actors
• Share NOTHING
• Isolated lightweight processes
• Communicate through messages
• Asynchronous and non-blocking
• No shared state
… hence, nothing to synchronize.
• Each actor has a mailbox (message queue)
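Erlang, or a library such as Akka, would normally provide actors; the sketch below is only a toy illustration of the mailbox idea in plain Java (class and message names are made up): private state, a mailbox, and a single thread that processes one message at a time, so there is nothing to synchronize.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy actor: state is only ever touched by the actor's own thread.
    public class CounterActor {
        private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
        private int count = 0;

        public CounterActor() {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        String msg = mailbox.take();          // block until a message arrives
                        if (msg.equals("stop")) break;
                        if (msg.equals("increment")) count++;
                        if (msg.equals("print")) System.out.println("count = " + count);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }

        // Sending is asynchronous and non-blocking for the caller.
        public void send(String message) {
            mailbox.offer(message);
        }
    }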
104. Actors
• Easier to reason about
• Raised abstraction level
• Easier to avoid:
  • Race conditions
  • Deadlocks
  • Starvation
  • Livelocks
109. STM: overview
• See the memory (heap and stack) as a transactional dataset
• Similar to a database:
  • begin
  • commit
  • abort/rollback
• Transactions are retried automatically upon collision
• Rolls back the memory on abort
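Java has no built-in STM, so the sketch below is not one; it only illustrates the "retried automatically upon collision" idea using an AtomicReference and compare-and-set, which is the same optimistic approach applied to a single reference.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    // Optimistic update on one reference: read a snapshot, compute a new value,
    // retry if another writer committed in between (compareAndSet fails on collision).
    public class OptimisticRef<T> {
        private final AtomicReference<T> ref;

        public OptimisticRef(T initial) { this.ref = new AtomicReference<>(initial); }

        public T get() { return ref.get(); }

        public T update(UnaryOperator<T> transaction) {
            while (true) {
                T snapshot = ref.get();                 // "begin": take a snapshot
                T updated = transaction.apply(snapshot);
                if (ref.compareAndSet(snapshot, updated)) {
                    return updated;                     // "commit" succeeded
                }
                // collision: another writer got there first, retry the transaction
            }
        }
    }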
115. Event-Driven Architecture
“Four years from now, ‘mere mortals’ will begin to adopt an event-driven architecture (EDA) for the sort of complex event processing that has been attempted only by software gurus [until now]”
-- Roy Schulte (Gartner), 2003
117. Domain Events
“It's really become clear to me in the last
couple of years that we need a new building
block and that is the Domain Events”
-- Eric Evans, 2009
118. Domain Events
“Domain Events represent the state of entities
at a given time when an important event
occurred and decouple subsystems with event
streams. Domain Events give us clearer, more
expressive models in those cases.”
-- Eric Evans, 2009
119. Domain Events
“State transitions are an important part of
our problem space and should be modeled
within our domain.”
-- Greg Young, 2008
120. Event Sourcing
• Every state change is materialized in an Event
• All Events are sent to an EventProcessor
• EventProcessor stores all events in an Event Log
• System can be reset and Event Log replayed
• No need for ORM, just persist the Events
• Many different EventListeners can be added to
EventProcessor (or listen directly on the Event log)
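A toy, in-memory sketch of the points above; the Event interface, EventProcessor and the account events are made-up names, and a real system would persist the event log durably rather than hold it in a list.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Event sourcing sketch: every state change is an Event, all events go to an
    // event log, listeners react to them, and state is rebuilt by replaying the log.
    public class EventSourcingExample {
        interface Event {}
        record Deposited(String account, long amount) implements Event {}
        record Withdrawn(String account, long amount) implements Event {}

        static class EventProcessor {
            private final List<Event> eventLog = new ArrayList<>();
            private final List<Consumer<Event>> listeners = new ArrayList<>();

            void subscribe(Consumer<Event> listener) { listeners.add(listener); }

            void publish(Event event) {
                eventLog.add(event);                       // persist the event, not the state
                listeners.forEach(l -> l.accept(event));
            }

            void replay(Consumer<Event> listener) {        // reset + replay the log
                eventLog.forEach(listener);
            }
        }

        public static void main(String[] args) {
            EventProcessor processor = new EventProcessor();
            processor.subscribe(e -> System.out.println("audit: " + e));

            processor.publish(new Deposited("acc-1", 100));
            processor.publish(new Withdrawn("acc-1", 30));

            long[] balance = { 0 };                        // rebuild state from the log
            processor.replay(e -> {
                if (e instanceof Deposited d) balance[0] += d.amount();
                if (e instanceof Withdrawn w) balance[0] -= w.amount();
            });
            System.out.println("balance = " + balance[0]); // 70
        }
    }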
122. Command and Query Responsibility Segregation (CQRS) pattern
“A single model cannot be appropriate for reporting, searching and transactional behavior.”
-- Greg Young, 2008
129. CQRS in a nutshell
• All state changes are represented by Domain Events
• Aggregate roots receive Commands and publish Events
• Reporting (query database) is updated as a result of the published Events
• All Queries from Presentation go directly to Reporting and the Domain is not involved
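A matching toy sketch of the split (all names are illustrative): commands go to the write side, which validates them and publishes events; the reporting view is updated only from those events; queries read the view and never touch the domain model.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // CQRS sketch: commands on the write side, queries against a reporting view.
    public class CqrsExample {
        record DepositCommand(String account, long amount) {}
        record AmountDeposited(String account, long amount) {}

        // Reporting / query side: a denormalized view, updated only by events.
        static final Map<String, Long> balanceView = new ConcurrentHashMap<>();

        // Write side: validate the command and emit a Domain Event.
        static void handle(DepositCommand cmd) {
            if (cmd.amount() <= 0) throw new IllegalArgumentException("amount must be positive");
            apply(new AmountDeposited(cmd.account(), cmd.amount()));
        }

        static void apply(AmountDeposited event) {
            balanceView.merge(event.account(), event.amount(), Long::sum);
        }

        // Queries go directly to Reporting; the domain is not involved.
        static long queryBalance(String account) {
            return balanceView.getOrDefault(account, 0L);
        }

        public static void main(String[] args) {
            handle(new DepositCommand("acc-1", 100));
            handle(new DepositCommand("acc-1", 50));
            System.out.println(queryBalance("acc-1"));   // 150
        }
    }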
131. CQRS: Benefits
• Fully encapsulated domain that only exposes behavior
• Queries do not use the domain model
• No object-relational impedance mismatch
• Bullet-proof auditing and historical tracing
• Easy integration with external systems
• Performance and scalability
146. Compute Grids
Parallel execution
• Divide and conquer
1. Split up the job into independent tasks
2. Execute tasks in parallel
3. Aggregate and return result
• MapReduce - Master/Worker
150. Load balancing
• Random allocation
• Round robin allocation
• Weighted allocation
• Dynamic load balancing:
  • Least connections
  • Least server CPU
  • etc.
151. Load balancing
• DNS Round Robin (simplest)
• Ask DNS for IP for host
• Get a new IP every time
• Reverse Proxy (better)
• Hardware Load Balancing
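A minimal sketch of round-robin allocation at the application level (class name and backend addresses are made up); a weighted or least-connections strategy would change only choose().

    import java.util.List;
    import java.util.concurrent.atomic.AtomicInteger;

    // Round-robin allocation: each request is handed to the next backend in turn.
    public class RoundRobinBalancer {
        private final List<String> backends;
        private final AtomicInteger next = new AtomicInteger();

        public RoundRobinBalancer(List<String> backends) { this.backends = backends; }

        public String choose() {
            int i = Math.floorMod(next.getAndIncrement(), backends.size());
            return backends.get(i);
        }

        public static void main(String[] args) {
            RoundRobinBalancer lb = new RoundRobinBalancer(
                    List.of("10.0.0.1", "10.0.0.2", "10.0.0.3"));
            for (int i = 0; i < 6; i++) {
                System.out.println(lb.choose());   // cycles through the three backends
            }
        }
    }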
154. Parallel Computing
• UE: Unit of Execution
  • Process
  • Thread
  • Coroutine
  • Actor
• SPMD Pattern
• Master/Worker Pattern
• Loop Parallelism Pattern
• Fork/Join Pattern
• MapReduce Pattern
155. SPMD Pattern
• Single Program Multiple Data
• Very generic pattern, used in many other patterns
• Use a single program for all the UEs
• Use the UE’s ID to select different pathways through the program, e.g.:
  • Branching on ID
  • Use ID in loop index to split loops (see the sketch below)
• Keep interactions between UEs explicit
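A small SPMD sketch with threads as the UEs: every thread runs the same code and uses its own ID in the loop index to pick its slice of the work (the summing example is made up).

    // SPMD: same program for all UEs, the ID selects the pathway through it.
    public class SpmdSum {
        public static void main(String[] args) throws InterruptedException {
            int numUEs = 4;
            double[] data = new double[1_000_000];
            java.util.Arrays.fill(data, 1.0);
            double[] partial = new double[numUEs];

            Thread[] threads = new Thread[numUEs];
            for (int id = 0; id < numUEs; id++) {
                final int myId = id;                       // the UE's ID
                threads[id] = new Thread(() -> {
                    double sum = 0;
                    // Use the ID in the loop index to split the loop across UEs.
                    for (int i = myId; i < data.length; i += numUEs) {
                        sum += data[i];
                    }
                    partial[myId] = sum;                   // interaction kept explicit
                });
                threads[id].start();
            }

            double total = 0;
            for (int id = 0; id < numUEs; id++) {
                threads[id].join();
                total += partial[id];
            }
            System.out.println(total);                     // 1000000.0
        }
    }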
157. Master/Worker
• Good scalability
• Automatic load-balancing
• How to detect termination?
  • Bag of tasks is empty
  • Poison pill (see the sketch below)
• What if we bottleneck on a single queue?
  • Use multiple work queues
  • Work stealing
• What about fault tolerance?
  • Use “in-progress” queue
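A minimal Master/Worker sketch with a poison pill for termination (names are illustrative): the master fills the shared "bag of tasks", then adds one pill per worker to signal that the work is done.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Master/Worker: workers pull from a shared queue; a poison pill stops them.
    public class MasterWorker {
        private static final String POISON_PILL = "__STOP__";

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<String> tasks = new LinkedBlockingQueue<>();
            int workers = 3;

            for (int w = 0; w < workers; w++) {
                new Thread(() -> {
                    try {
                        while (true) {
                            String task = tasks.take();
                            if (task.equals(POISON_PILL)) return;   // termination detected
                            System.out.println(Thread.currentThread().getName() + " -> " + task);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }

            // The master fills the bag of tasks, then one pill per worker.
            for (int i = 0; i < 10; i++) tasks.put("task-" + i);
            for (int w = 0; w < workers; w++) tasks.put(POISON_PILL);
        }
    }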
158. Loop Parallelism
• Workflow:
  1. Find the loops that are bottlenecks
  2. Eliminate coupling between loop iterations
  3. Parallelize the loop
• If too few iterations to pull its weight:
  • Merge loops
  • Coalesce nested loops
• OpenMP: omp parallel for
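Java has no OpenMP; a parallel stream (Java 8+) is used here only as an analogue of "omp parallel for", and it is safe precisely because the iterations have no coupling.

    import java.util.stream.IntStream;

    // Loop parallelism: independent iterations, so the loop can run in parallel.
    public class ParallelLoop {
        public static void main(String[] args) {
            double[] result = new double[1_000_000];
            IntStream.range(0, result.length)
                     .parallel()
                     .forEach(i -> result[i] = Math.sqrt(i));   // no cross-iteration coupling
            System.out.println(result[42]);
        }
    }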
160. What if task creation can’t be handled by:
• parallelizing loops (Loop Parallelism)
• putting them on work queues (Master/Worker)
Enter Fork/Join
161. Fork/Join
• Use when the relationship between tasks is simple
• Good for recursive data processing
• Can use work-stealing
1. Fork: tasks are dynamically created
2. Join: tasks are later terminated and data aggregated
162. Fork/Join
• Direct task/UE mapping
  • 1-1 mapping between Task/UE
  • Problem: dynamic UE creation is expensive
• Indirect task/UE mapping
  • Pool the UEs
  • Control (constrain) the resource allocation
  • Automatic load balancing
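A small RecursiveTask example using Java 7's ForkJoinPool (the summing task is made up for illustration): tasks are forked recursively until the slice is small enough, joined to aggregate results, and the pool's work-stealing scheduler maps the dynamically created tasks onto a fixed set of worker threads.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    // Fork/Join: recursive decomposition with work-stealing.
    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000;
        private final long[] data;
        private final int from, to;

        public SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) / 2;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                               // fork: create a subtask
            return right.compute() + left.join();      // join: aggregate the results
        }

        public static void main(String[] args) {
            long[] data = new long[1_000_000];
            java.util.Arrays.fill(data, 1L);
            long total = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
            System.out.println(total);                 // 1000000
        }
    }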
164. Java 7 ParallelArray (Fork/Join DSL)

    ParallelArray students = new ParallelArray(fjPool, data);

    double bestGpa = students.withFilter(isSenior)
                             .withMapping(selectGpa)
                             .max();
165. MapReduce
• Originates from a Google paper, 2004
• Used internally @ Google
• Variation of Fork/Join
• Work divided upfront, not dynamically
• Usually distributed
• Normally used for massive data crunching
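A toy, single-process map/reduce to show the shape of the model only; the real thing (Hadoop and friends) splits the work up front and distributes the map and reduce phases across many machines.

    import java.util.*;

    // Toy MapReduce: map each record to (key, value) pairs, group by key, reduce each group.
    public class MiniMapReduce {
        public static void main(String[] args) {
            List<String> lines = List.of("the quick brown fox", "the lazy dog", "the fox");

            // Map phase: emit (word, 1) for every word.
            List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
            for (String line : lines) {
                for (String word : line.split(" ")) {
                    mapped.add(Map.entry(word, 1));
                }
            }

            // Shuffle phase: group the intermediate values by key.
            Map<String, List<Integer>> grouped = new HashMap<>();
            for (Map.Entry<String, Integer> kv : mapped) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }

            // Reduce phase: sum the values for each key.
            Map<String, Integer> counts = new HashMap<>();
            grouped.forEach((word, ones) ->
                    counts.put(word, ones.stream().mapToInt(Integer::intValue).sum()));

            System.out.println(counts);   // e.g. {the=3, fox=2, quick=1, ...}
        }
    }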
166. MapReduce Products
• Hadoop (OSS), used @ Yahoo
• Amazon Elastic MapReduce
• Many NOSQL DBs utilize it for searching/querying
173. Let it crash
• Embrace failure as a natural state in the life-cycle of the application
• Instead of trying to prevent failure, manage it
• Process supervision
• Supervisor hierarchies (from Erlang)
187. Bulkheads
• Partition the system and tolerate failure in one part
• Redundancy
• Applies to threads as well:
  • A separate pool for admin tasks, so they can still run even when all other threads are blocked (see the sketch below)
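A minimal sketch of the thread-pool bulkhead mentioned above (names and pool sizes are illustrative): separate pools per concern, so a flood of slow user requests cannot exhaust the threads needed for admin tasks.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Bulkheads for threads: one pool per concern, failures stay contained.
    public class Bulkheads {
        private final ExecutorService userPool = Executors.newFixedThreadPool(20);
        private final ExecutorService adminPool = Executors.newFixedThreadPool(2);

        public void handleUserRequest(Runnable work) {
            userPool.execute(work);      // may saturate under load...
        }

        public void handleAdminTask(Runnable work) {
            adminPool.execute(work);     // ...while admin tasks still get a thread
        }
    }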
188. Steady State
• Clean up after yourself
• Logging:
  • RollingFileAppender (log4j)
  • logrotate (Unix)
  • Scribe - server for aggregating streaming log data
• Always put logs on a separate disk
189. Throttling
• Maintain a steady pace
• Count requests
• If the limit is reached, back off (drop, raise error)
• Queue requests
• Used in, for example, Staged Event-Driven Architecture (SEDA)
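A minimal throttling sketch (names are illustrative): a Semaphore counts in-flight requests and, once the limit is reached, the caller backs off by rejecting; a variant would queue the request instead.

    import java.util.concurrent.Semaphore;

    // Throttling: count in-flight requests, back off when the limit is reached.
    public class Throttle {
        private final Semaphore permits;

        public Throttle(int maxConcurrent) { this.permits = new Semaphore(maxConcurrent); }

        public boolean tryHandle(Runnable request) {
            if (!permits.tryAcquire()) {
                return false;             // limit reached: drop or raise an error
            }
            try {
                request.run();
                return true;
            } finally {
                permits.release();
            }
        }
    }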
195. Server-side consistency
• N = the number of nodes that store replicas of the data
• W = the number of replicas that need to acknowledge the receipt of the update before the update completes
• R = the number of replicas that are contacted when a data object is accessed through a read operation
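A commonly cited consequence of these parameters (from Werner Vogels' description of quorum systems): if W + R > N, the write set and the read set always overlap, so a read sees at least one replica with the latest acknowledged update; with W + R <= N, reads can return stale data. For example, N=3, W=2, R=2 guarantees that overlap, while N=3, W=1, R=1 trades consistency for lower latency and higher availability.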