This document summarizes a talk on high performance network programming on the JVM. The talk discusses choosing between synchronous and asynchronous I/O, with examples of when each approach is best. It also covers how to optimize synchronous I/O on the JVM to maximize throughput. The document provides benchmarks comparing the performance of a simple synchronous memcache client versus an asynchronous one.
High Performance Network Programming on the JVM (OSCON 2012)
2. About Me
• Director of Architecture and Delivery at Urban Airship
• Most of my career biased towards performance and scale
• Java, Python, C++ in service oriented architectures
3. In this Talk
• WTF is an “Urban Airship”?
• Networked Systems on the JVM
• Choosing a framework
• Critical learnings
• Q&A
9. About This Talk
You probably won’t like this talk if you:
• Are willing to give up orders of magnitude in performance
for a slower runtime or language
• Enjoy spending money on virtualized servers (e.g. ec2)
• Think that a startup shouldn’t worry about COGS
• Think that writing code is the hardest part of a developer’s
job
• Think async for all the things
11. Lexicon
What makes something “High Performance”?
• Low Latency - I’m doing an operation that includes a
request/reply
• Throughput - how many operations can I drive through my
architecture at one time?
• Productivity - how quickly can I create a new operation? A
new service?
• Sustainability - when a service breaks, what’s the time to
RCA
• Fault tolerance
12. WTF is an Urban Airship?
• Fundamentally, an engagement platform
• Buzzword compliant - Cloud Service providing an API for
Mobile
• Unified API for services across platforms for messaging,
location, content entitlements, in-app purchase
• SLAs for throughput, latency
• Heavy users and contributors to HBase, ZooKeeper,
Cassandra
14. What is Push?
• Cost
• Throughput and immediacy
• The platform makes it compelling
• Push can be intelligent
• Push can be precisely targeted
• Deeper measurement of user engagement
15. How does this relate to the JVM?
• We deal with lots of heterogeneous connections from the
public network, the vast majority of them are handled by a
JVM
• We perform millions of operations per second across our
LAN
• Billions and billions of discrete system events a day
• Most of those operations are JVM-JVM
18. Distributed Systems on the JDK
• Platform has several tools baked in
• HTTP Client and Server
• RMI (Remote Method Invocation), or better, Jini
• CORBA/IIOP
• JDBC
• Lower level
• Sockets + streams, channels + buffers
• Java 1.4 brought NIO, which included non-blocking I/O
• High performance, high productivity platform when used correctly
• Missing some low-level capabilities
20. Synchronous vs. Async I/O
• Synchronous Network I/O on the JRE
• Sockets (InputStream, OutputStream)
• Channels and Buffers
• Asynchronous Network I/O on the JRE
• Selectors (async)
• Buffers fed to Channels which are asynchronous
• Almost all asynchronous APIs are for Socket I/O
• Can operate on direct, off heap buffers
• Offer decent low-level configuration options
21. Synchronous vs. Async I/O
• Synchronous I/O has many upsides on the JVM
• Clean streaming - good for moving around really large
things
• Sendfile support for MMap’d files
(FileChannel::transferTo)
• Vectored I/O support
• No need for additional SSL abstractions (except for
maybe Keystore cruft)
• No idiomatic impedance for RPC
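A minimal sketch of the FileChannel::transferTo bullet above: handing a file to a socket with sendfile-style zero copy, looping because transferTo may move fewer bytes than asked (file name, host, and port are placeholders):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SendFileExample {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Paths.get("payload.bin"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0;
            long remaining = file.size();
            // transferTo may transfer fewer bytes than requested, so loop until done.
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```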
30. Synchronous vs. Async I/O
• Synchronous I/O - doing it well
• Buffers all the way down (streams, readers, channels)
• Minimize trips across the system boundary
• Minimize copies of data
• Vector I/O if possible
• MMap if possible
• Favor direct ByteBuffers and NIO Channels
• Netty does support sync. I/O but it feels tedious on that
abstraction
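A minimal sketch of two of the bullets above, direct ByteBuffers plus a vectored (gathering) write, so a length prefix and body leave in a single write call (the framing is illustrative, not a specific UA protocol):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

public class VectoredWriteExample {
    public static void main(String[] args) throws IOException {
        byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

        // Direct buffers live off the Java heap, avoiding an extra copy at write time.
        ByteBuffer header = ByteBuffer.allocateDirect(4);
        ByteBuffer body = ByteBuffer.allocateDirect(payload.length);
        header.putInt(payload.length).flip();
        body.put(payload).flip();

        try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            ByteBuffer[] frame = {header, body};
            // Gathering write: both buffers go down in one write call.
            while (body.hasRemaining()) {
                ch.write(frame);
            }
        }
    }
}
```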
32. Synchronous vs. Async I/O
• Async I/O
• On Linux, implemented via epoll as the “Selector”
abstraction with async Channels
• Async Channels are fed buffers; you have to tend to fully reading/writing them
• Async I/O - doing it well
• Again, favor direct ByteBuffers, especially for large data
• Consider the application - what do you gain by not
waiting for a response?
• Avoid manual TLS operations
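A minimal sketch of the Selector pattern described above: one thread multiplexing many non-blocking channels, with the handler responsible for partial reads (port and buffer size are arbitrary):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class SelectorLoop {
    public static void main(String[] args) throws IOException {
        Selector selector = Selector.open(); // epoll-backed on Linux
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(9000));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buf = ByteBuffer.allocateDirect(8192);
        while (true) {
            selector.select();
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {
                    SocketChannel client = (SocketChannel) key.channel();
                    buf.clear();
                    int read = client.read(buf); // may be a partial read
                    if (read == -1) {            // peer closed the socket
                        key.cancel();
                        client.close();
                    }
                    // A real handler would accumulate bytes until a full frame arrives.
                }
            }
        }
    }
}
```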
37. Sync vs. Async - FIGHT!
Async I/O Wins:
• Large numbers of clients
• Only way to be notified if a socket is
closed without trying to read it
• Large number of open sockets
• Lightweight proxying of traffic
41. Sync vs. Async - FIGHT!
Async I/O Loses:
• Context switching, CPU cache
pipeline loss can be substantial
overhead for simple protocols
• Not always the best option for raw,
full bore throughput
• Complexity, ability to reason about
code diminished
42. Sync vs. Async - FIGHT!
Async I/O Loses:
http://www.youtube.com/watch?v=bzkRVzciAZg&feature=player_detailpage#t=133s
46. Sync vs. Async - FIGHT!
Sync I/O Wins:
• Simplicity, readability
• Better fit for dumb protocols, less
impedance for request/reply
• Squeezing every bit of throughput
out of a single host, small number of
threads
47. Sync vs. Async - Memcache
• UA uses memcached heavily
• memcached is an awesome example of why choosing
Sync vs. Async is hard
• Puts should always be completely asynchronous
• Reads are fairly useless when done asynchronously
• Protocol doesn’t lend itself well to Async I/O
• For Java clients, we experimented with Xmemcached but
didn’t like its complexity, I/O approach
• Created FSMC (freakin’ simple memcache client)
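FSMC’s source isn’t shown in the deck; as an illustration of why the protocol fits blocking I/O so naturally, here is a hedged sketch of a synchronous memcached text-protocol get over a plain socket (host, key, and the response parsing are simplified):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SyncMemcacheGet {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("localhost", 11211)) {
            OutputStream out = s.getOutputStream();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));

            // Request: "get <key>\r\n"
            out.write("get somekey\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Response: "VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n" or just "END\r\n"
            String header = in.readLine();
            if (header != null && header.startsWith("VALUE")) {
                String data = in.readLine(); // simplified: assumes no CRLF inside the value
                in.readLine();               // consume the trailing END
                System.out.println("value = " + data);
            } else {
                System.out.println("miss");
            }
        }
    }
}
```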
48. FSMC vs. Xmemcached
[Chart: Sync vs. Async Memcache Client Throughput - SET/GET operations per second (0-60,000) by thread count (1, 2, 4, 8, 16, 32, 56, 128) for FSMC (no Nagle), FSMC, and Xmemcached]
58. A Word on Garbage Collection
• Any JVM service on most hardware has to live with GC
• A good citizen will create lots of ParNew garbage and
nothing more
• Allocation is near free
• Collection also near free if you don’t copy anything
• Don’t buffer large things, stream or chunk
• When you must cache:
• Cache early and don’t touch
• Better, cache off heap or use memcache
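A minimal sketch of the "cache off heap" bullet: the cached bytes sit in a direct ByteBuffer outside the Java heap, so the collector never copies them between survivor spaces (the single fixed slot is a deliberate simplification):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapSlot {
    private final ByteBuffer slot;

    OffHeapSlot(int capacity) {
        // Direct buffer: the payload lives outside the heap the collector manages.
        this.slot = ByteBuffer.allocateDirect(capacity);
    }

    synchronized void put(byte[] value) {
        slot.clear();
        slot.putInt(value.length);
        slot.put(value);
    }

    synchronized byte[] get() {
        ByteBuffer view = slot.duplicate(); // independent position/limit, same memory
        view.flip();
        byte[] out = new byte[view.getInt()];
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        OffHeapSlot slot = new OffHeapSlot(1024);
        slot.put("cached off heap".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(slot.get(), StandardCharsets.UTF_8));
    }
}
```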
62. A Word on Garbage Collection
When you care about throughput, the virtualization tax is high
[Chart: ParNew GC Effectiveness - MB collected per collection (0-300), Bare Metal vs. EC2 XL]
63. About EC2...
When you care about throughput, the virtualization tax is high
[Chart: Mean Time ParNew GC - collection time in seconds (0-0.04), Bare Metal vs. EC2 XL]
64. How we do at UA
• Originally our codebase was mostly one giant monolithic application and, over time, several databases
• Difficult to scale, technically and operationally
• Wanted to break off large pieces of functionality into coarse
grained services encapsulating their capability and function
• Most message exchange was done using beanstalkd after
migrating off RabbitMQ
• Fundamentally, our business is message passing
72. Choosing A Framework
• All frameworks are a form of concession
• Nobody would use Spring if people called it “Concessions
to the horrors of EJB”
• Understand concessions when choosing, look for:
• Configuration options - how do I configure Nagle
behavior?
• Metrics - what does the framework tell me about its
internals?
• Intelligent logging - next level down from metrics
• How does the framework play with peers?
81. Frameworks - DO IT LIVE!
• Our requirements:
• Capable of > 100K requests per second in aggregate
across multiple threads
• Simple protocol - easy to reason about, inspect
• Efficient, flexible message format - Google Protocol
Buffers
• Composable - easily create new services
• Support both sync and async operations
• Support for multiple languages (Python, Java, C++)
• Simple configuration
86. Frameworks - DO IT LIVE!
• Desirable:
• Discovery mechanism
• Predictable fault handling
• Adaptive load balancing
87. Frameworks - Akka
• Predominantly a Scala platform for sending messages; a distributed incarnation of the Actor pattern
• Message abstraction tolerates distribution well
• If you like OTP, you’ll probably like Akka
90. Frameworks - Akka
• Cons:
• We don’t like reading other people’s Scala
• Some pretty strong assertions in the docs that aren’t
substantiated
• Bulky wire protocol, especially for primitives
• Configuration felt complicated
• Sheer surface area of the framework is daunting
• Unclear integration story with Python
91. Frameworks - Aleph
• Clojure framework based on Netty, Lamina
• Conceptually, functions are applied to channels to move messages around
• Channels are refs that you realize when you want data
• Operations with channels very easy
• Concise format for standing up clients and services using
text protocols
94. Frameworks - Aleph
• Cons:
• Very high level abstraction, knobs are buried if they exist
• Channel concept leaky for large messages
• Documentation, tests
95. Frameworks - Netty
• The preeminent framework for doing Async Network I/O
on the JVM
• Netty Channels are backed by pipelines on top of NIO Channels
• Pros:
• Abstraction doesn’t hide the important pieces
• The only sane way to do TLS with Async I/O on the JVM
• Protocols well abstracted into pipeline steps
• Clean callback model for events of interest but optional in
simple cases - no death by callback
• Many implementations of interesting protocols
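A hedged sketch of the pipeline idea described above, written against the Netty 4.x API (the talk predates Netty 4, so class names differ from the 3.x code UA would have used): framing is one pipeline step, the application handler another, and the handler only sees whole messages:

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.handler.codec.LengthFieldBasedFrameDecoder;

public class NettyPipelineSketch {
    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup boss = new NioEventLoopGroup(1);
        EventLoopGroup workers = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap()
                .group(boss, workers)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel ch) {
                        ch.pipeline()
                          // Protocol framing as a pipeline step: 4-byte length prefix, stripped.
                          .addLast(new LengthFieldBasedFrameDecoder(1 << 20, 0, 4, 0, 4))
                          // Application handler: one callback per complete frame.
                          .addLast(new SimpleChannelInboundHandler<ByteBuf>() {
                              @Override
                              protected void channelRead0(ChannelHandlerContext ctx, ByteBuf frame) {
                                  ctx.writeAndFlush(frame.retain()); // echo the frame back
                              }
                          });
                    }
                });
            b.bind(9000).sync().channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            workers.shutdownGracefully();
        }
    }
}
```

Note the echoed frame has had its 4-byte prefix stripped by the decoder; a real service would re-encode a length prefix on the way out.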
96. Frameworks - Netty
• Cons:
• Easy to make too many copies of the data
• Some old school bootstrap idioms
• Writes can occasionally be reordered
• Failure conditions can be numerous, difficult to reason
about
• Simple things can feel difficult - UDP, simple request/reply
113. Frameworks - DO IT LIVE!
• Ultimately implemented our own using combination of
Netty and Google Protocol Buffers called Reactor
• Discovery (optional) using a defined tree of services in
ZooKeeper
• Service instances periodically publish load factor to
ZooKeeper for clients to inform routing decisions
• Rich metrics using Yammer Metrics
• Core service traits are part of the framework
• Service instances quiesce gracefully
• Netty made UDP, sync, and async easy
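Reactor itself isn’t open source in this deck, so the following is only a hedged illustration of the discovery bullets: a service instance registers an ephemeral node under an assumed /services tree and writes its load factor as the node data (connect string, paths, and the load metric are invented):

```java
import java.nio.charset.StandardCharsets;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ServiceRegistration {
    public static void main(String[] args) throws Exception {
        // A no-op watcher is enough for this illustration; real code handles session events.
        ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181", 15000, event -> { });

        double loadFactor = 0.42; // hypothetical load metric published by this instance
        byte[] data = Double.toString(loadFactor).getBytes(StandardCharsets.UTF_8);

        // Ephemeral + sequential: the node vanishes when the session dies, which is
        // how clients notice a dead instance. Assumes /services/push already exists.
        String path = zk.create("/services/push/instance-", data,
                                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
        System.out.println("registered as " + path);

        // A real instance would periodically refresh its load factor for client routing.
        zk.setData(path, data, -1);
    }
}
```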
114. Frameworks - DO IT LIVE!
• All operations are Callables, services define a mapping b/t
a request type and a Callable
• Client API always returns a Future; sometimes it’s already materialized
• Precise tuning from config files
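A hedged sketch of the shape the first two bullets above describe (not Reactor’s actual API): request types map to Callables on the service side, and the client always gets a Future back, sometimes one that is already materialized:

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableDispatch {
    // Request type -> the Callable that services it (names are illustrative).
    private final Map<String, Callable<String>> handlers = new ConcurrentHashMap<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    void register(String requestType, Callable<String> handler) {
        handlers.put(requestType, handler);
    }

    // Always returns a Future; an unknown request comes back already materialized.
    Future<String> dispatch(String requestType) {
        Callable<String> handler = handlers.get(requestType);
        if (handler == null) {
            return CompletableFuture.completedFuture("UNKNOWN_REQUEST");
        }
        return pool.submit(handler);
    }

    public static void main(String[] args) throws Exception {
        CallableDispatch dispatch = new CallableDispatch();
        dispatch.register("PING", () -> "PONG");
        System.out.println(dispatch.dispatch("PING").get());
        dispatch.pool.shutdown();
    }
}
```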
123. What We Learned - In General
• Straight through RPC was fairly easy, edge cases were
hard
• ZooKeeper is brutal to program with and to recover errors from
• Discovery is also difficult - clients need to defend
themselves, consider partitions
• RPC is great for latency, but upstream pushback is
important
• Save RPC for latency-sensitive operations; use Kafka otherwise
• RPC less than ideal for fan-out
131. What We Learned - TCP
• RTO (retransmission timeout) and Karn and Jacobson’s
Algorithms
• Linux defaults to 15 retry attempts, 3 seconds between
• With no ACKs, congestion control kicks in and widens that 3 second window exponentially, thinking it’s congested
• Connection timeout can take up to 30 minutes
• Devices, Carriers and EC2 at scale eat FIN/RST
• Our systems think a device is still online at the time of a
push
141. What We Learned - TCP
• Efficiency means understanding your traffic
• Size send/recv buffers appropriately (defaults way too low
for edge tier services)
• Nagle! Non-duplex protocols can benefit significantly
• Example: 19K message deliveries per second vs. 2K
• Example: our protocol has a size frame; without Nagle it went in its own packet
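A minimal sketch of those knobs on a plain JDK socket: buffers sized up from the defaults and Nagle left on (setTcpNoDelay(false)) for a non-duplex, one-direction-at-a-time exchange (the sizes and timeout are placeholders, not recommendations):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class EdgeSocketTuning {
    public static Socket open(String host, int port) throws IOException {
        Socket s = new Socket();
        // Defaults are often far too small for an edge-tier service; size to your
        // bandwidth-delay product, not to a magic number.
        s.setSendBufferSize(256 * 1024);
        s.setReceiveBufferSize(256 * 1024); // set before connect so window scaling applies
        // Leave Nagle enabled: small writes (a size frame followed by a body)
        // get coalesced into one packet instead of two.
        s.setTcpNoDelay(false);
        s.connect(new InetSocketAddress(host, port), 5000);
        return s;
    }
}
```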
156. What We Learned - TCP
• Don’t Nagle!
• Again, understand what your traffic is doing
• Buffer and make one syscall instead of multiple
• High-throughput RPC mechanisms disable it explicitly
• See also:
• http://www.evanjones.ca/software/java-bytebuffers.html
• http://blog.boundary.com/2012/05/02/know-a-delay-nagles-algorithm-and-you/
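And the opposite sketch for a request/reply path: disable Nagle (setTcpNoDelay(true)) and buffer the whole frame yourself so it still leaves in one write (the length-prefixed framing is illustrative):

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;

public class OneWritePerRequest {
    // Send a length-prefixed request as a single flush rather than two small writes.
    static void send(Socket socket, byte[] payload) throws IOException {
        socket.setTcpNoDelay(true); // no Nagle delay waiting on the peer's ACK
        DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(socket.getOutputStream(), 64 * 1024));
        out.writeInt(payload.length); // size frame
        out.write(payload);           // body
        out.flush();                  // the whole frame goes down in one write
    }
}
```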
162. About UDP...
• Generally to be avoided
• Great for small unimportant data like memcache operations
at extreme scale
• Bad for RPC when you care about knowing if your request
was handled
• Conditions where you most want your data are also the
most likely to cause your data to be dropped
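A minimal sketch of the fire-and-forget use case the slide allows: a UDP datagram sent with no handshake and no acknowledgement, acceptable only when losing it is fine (the stats-style target address and payload are placeholders):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

public class FireAndForget {
    public static void main(String[] args) throws IOException {
        try (DatagramChannel ch = DatagramChannel.open()) {
            ByteBuffer stat = ByteBuffer.wrap("cache.hit:1".getBytes(StandardCharsets.UTF_8));
            // No retry, no acknowledgement: if the datagram is dropped, it is simply gone.
            ch.send(stat, new InetSocketAddress("stats.example.com", 8125));
        }
    }
}
```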
168. About TLS
• Try to avoid it - complex, slow and expensive, especially for
internal services
• ~6.5K and 4 hops to secure the channel
• 40 bytes overhead per frame
• 38.1MB overhead for every keep-alive sent to 1M devices
TLS source: http://netsekure.org/2010/03/tls-overhead/
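For reference, the 38.1MB figure is just the per-frame overhead multiplied out: 40 bytes × 1,000,000 devices = 40,000,000 bytes ≈ 38.1 MiB for a single keep-alive round.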
177. We Learned About HTTPS
• Thought we could ignore - basic plumbing of the internet
• 100s of millions of devices, performing 100s of millions of
tiny request/reply cycles:
• TLS Handshake
• HTTP Request
• HTTP Response
• TLS End
• Server TIME_WAIT
• Higher grade crypto eats more cycles
185. We Learned About HTTPS
• Corrective measures:
• Reduce TIME_WAIT - 60 seconds too long for an HTTPS
connection
• Reduce non-critical HTTPS operations to lower-grade ciphers
• Offload TLS handshake to EC2
• Deployed Akamai for SSL/TCP offload and to pipeline
device requests into our infrastructure
• Implement adaptive backoff at the client layer
• Aggressive batching
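The deck doesn’t spell out the backoff policy, so the following is only a hedged sketch of one common shape for the "adaptive backoff at the client layer" bullet, exponential backoff with full jitter (all constants are invented):

```java
import java.util.concurrent.ThreadLocalRandom;

public class ReconnectBackoff {
    private final long baseMillis = 1_000;  // first retry delay (illustrative)
    private final long capMillis = 300_000; // never wait longer than five minutes
    private int attempt = 0;

    // Exponential backoff with full jitter: spreads reconnect storms out
    // instead of letting millions of devices retry in lockstep.
    long nextDelayMillis() {
        long exp = Math.min(capMillis, baseMillis << Math.min(attempt, 20));
        attempt++;
        return ThreadLocalRandom.current().nextLong(exp + 1);
    }

    void reset() { // call after a successful request
        attempt = 0;
    }
}
```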
192. We Learned About Carriers
• Data plans are like gym memberships
• Aggressively cull idle stream connections
• Don’t like TCP keepalives
• Don’t like UDP
• Like to batch, delay or just drop FIN/FIN ACK/RST
• Move data through aggregators
198. About Devices...
• Small compute units that do exactly what you tell them to
• Like to phone home when you push to them...
• 10M at a time...
• Causing...
211. About Devices...
• By virtue of being a mobile device, they move around a lot
• When they move, they often change IP addresses
• New cell tower
• Change connectivity - 4G -> 3G, 3G -> WiFi, etc.
• When they change IP addresses, they need to reconnect
TCP sockets
• Sometimes they are kind enough to let us know
• Those reconnections are expensive for us and the devices
218. We Learned About EC2
• EC2 is a great jumping-off point
• Scaling vertically is very expensive
• Like Carriers, EC2 networking is fond of holding on to TCP
teardown sequence packets
• vNICs obfuscate important data when you care about 1M
connections
• Great for surge capacity
• Don’t split services into the virtual domain
219. About EC2...
• When you care about throughput, the virtualization tax is
high
225. About EC2...
• Limited applicability for testing
• Egress port limitations kick in at ~63K egress
connections - 16 XLs to test 1M connections
• Can’t create vNIC in an EC2 guest
• Killing a client doesn’t disconnect immediately
• Pragmatically, smalls are useless for our purposes: not enough RAM, %steal too high
226. Lessons Learned - Failing Well
• Scale vertically and horizontally
• Scale vertically but remember...
• We can reliably take one Java process up to 990K open
connections
• What happens when that one process fails?
• What happens when you need to do maintenance?
227. Thanks!
• Urban Airship http://urbanairship.com/
• Me @eonnen on Twitter or [email protected]
• We’re hiring! http://urbanairship.com/company/jobs/