SlideShare a Scribd company logo
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Developing the fastest
HTTP/2 server
DeNA Co., Ltd.
Kazuho Oku
1
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Who am I?
n  Kazuho Oku
n  Major works:
⁃  Palmscape / Xiino (web browser for Palm OS)
•  awarded M.I.T. TR 100/2002
⁃  Mitoh project 2004 super creator
⁃  Q4M (message queue plugin for MySQL)
•  MySQL Conference Community Awards 2011
⁃  H2O (HTTP/2 server)
•  Japan OSS Contribution Award 2015
2	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Background
3	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Responsiveness is important
4	Developing the fastest HTTP/2 server
source:	h@p://radar.oreilly.com/2009/06/bing-and-google-agree-slow-pag.html
n  500ms increase → -1.2% revenue
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Increasing size and # of requests
5	Developing the fastest HTTP/2 server
source:	h@p://h@parchive.org/trends.php?s=All&minlabel=Aug+1+2011&maxlabel=Aug+1+2015#bytesTotal&reqTotal
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Bandwidth is also increasing
n  end-usersʼ B/W increase 50% every year (Nielsenʼs
Law)
6	Developing the fastest HTTP/2 server
source:	h@p://www.nngroup.com/arRcles/law-of-bandwidth/
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
More bandwidth doesnʼt matter
7	Developing the fastest HTTP/2 server
source:	More	Bandwidth	Doesn't	Ma@er	-	2011	Mike	Belshe	(Google)
* effective B/W reaches ceiling at around 1.6Mbps
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Latency is the new bottleneck
8	Developing the fastest HTTP/2 server
source:	More	Bandwidth	Doesn't	Ma@er	-	2011	Mike	Belshe	(Google)
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Latency cannot be optimized
n  latency = speed of light
⁃  round-trip bet. Japan and US: 80ms
n  mobile carriers have huge latency
⁃  LTE ~ 50ms
n  the Web is becoming more and more complex
9	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Web is becoming slower ... unless we do something.
10	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Solution: new protocol
11	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
HTTP/2!
12	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
The reasons HTTP/1.1 is slow
n  concurrency is too small
⁃  multiple round-trips required when issuing many
requests
n  no prioritization between. requests
⁃  can suspend HTML / image streams in favor of
CSS / JS
n  big request / response headers
⁃  typically hundreds of octets
⁃  becomes an overhead when issuing many reqs.
13	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
HTTP/2
n  RFC 7540 (2015/5)
⁃  based on SPDY by Google
n  key features:
⁃  binary protocol
⁃  header compression
⁃  multiplexing
⁃  prioritization
14	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Benchmark
n  red bar: time spent until first-paint
n  big difference bet. server implementations
n  reason: quality of prioritization logic
n  H2O shows the true potential of HTTP/2
15	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Have we reached the limit?
16	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Letʼs consider what would be the ideal HTTP flow.
17	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
TCP slow start
n  Initial Congestion Window (IW)=10
⁃  only 10 packets can be sent in first RTT
⁃  used to be IW=3
n  window increase: 1.5x/RTT
18	Developing the fastest HTTP/2 server
0	
100,000	
200,000	
300,000	
400,000	
500,000	
600,000	
700,000	
800,000	
1	 2	 3	 4	 5	 6	 7	 8	
bytes	transmi,ed
RTT
TCP	slow	start	(IW10,	MSS1460)
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Flow of the ideal HTTP
n  fastest within the limits of TCP/IP
n  receive a request 0-RTT, and:
⁃  first send CSS/JS*
⁃  then send the HTML
⁃  then send the images*
*: but only the ones not cached by the browser
19	Developing the fastest HTTP/2 server
client server
1	RTT
request
response
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
The reality in HTTP/2
n  TCP establishment: +1 RTT
n  TLS handshake: +2 RTT*
n  HTML fetch: +1 RTT
n  JS,CSS fetch: +2 RTT**
n  Total: 6 RTT
*: 1 RTT on reconnection
**: servers often cannot switch to sending JS,CSS
instantly, due to the output buffered in TCP send buffer
20	Developing the fastest HTTP/2 server
client server
1	RTT
TCP	SYN
TCP	SYNACK
TLS	Handshake
TLS	Handshake
TLS	Handshake
TLS	Handshake
GET	/
HTML
GET	css,js
css,	js
〜〜
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Ongoing optimizations
n  TCP Fast Open
⁃  connection establishment in 0 RTT
n  TLS 1.3
⁃  initial handshake complete in 1 RTT
⁃  resumption in 0 RTT
n  what can be done in the HTTP/2 layer?
21	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Further optimizations in HTTP/2 layer
n  optimize TCP for responsiveness
n  Cache-aware server push
22	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Optimizing TCP for responsiveness
23	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Typical sequence of HTTP/2
24	Developing the fastest HTTP/2 server
HTTP/2 200 OK
<!DOCTYPE HTML>
…
<SCRIPT SRC=”jquery.js”>
…
client server
GET /
GET /jquery.js
need	to	switch	sending	from	HTML	
to	JS	at	this	very	moment	
(means	that	amount	of	data	sent	in	
*	must	be	smaller	than	IW)
1	RTT
*
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Buffering in TCP and TLS layer
25	Developing the fastest HTTP/2 server
TCP	send	buffer
CWND	
unacked	 poll	threshold	
BIO	buf.
// ordinary code (non-blocking)
while (SSL_write(…) != SSL_ERR_WANT_WRITE)
;
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Why do we have buffers?
26	Developing the fastest HTTP/2 server
n  TCP send buffer:
⁃  reduce ping-pong bet. kernel and application
n  BIO buffer:
⁃  for data that couldnʼt be stored in TCP send buffer
TCP	send	buffer
CWND	
unacked	 poll	threshold	
BIO	buf.
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Improvement: poll-then-write
27	Developing the fastest HTTP/2 server
TCP	send	buffer
CWND	
unacked	 poll	threshold	
// only call SSL_write when polls notifies the app.
while (poll_for_write(fd) == SOCKET_IS_READY)
SSL_write(…);
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Adjust poll threshold
28	Developing the fastest HTTP/2 server
TCP	send	buffer
CWND	
unacked	 poll	threshold	
n  set poll threshold to the end of CWND?
⁃  setsockopt(TCP_NOTSENT_LOWAT)
⁃  in linux, the minimum is CWND + 1 octet
•  becomes unstable when set to CWND + 0
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Adjust poll threshold
29	Developing the fastest HTTP/2 server
CWND	
unacked	 poll	threshold	
// only call SSL_write when polls notifies the app.
while (poll_for_write(fd) == SOCKET_IS_READY)
SSL_write(…);
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
TCP	send	buffer
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Further improvement: read TCP states
30	Developing the fastest HTTP/2 server
CWND	
unacked	 poll	threshold	
// calc size of data to send by calling getsockopt(TCP_INFO)
if (poll_for_write(fd) == SOCKET_IS_READY) {
capacity = CWND + unacked + ONE_MSS - TLS_overhead;
SSL_write(prepare_http2_frames(capacity));
}
TLS	Records
sent	immediately	 not	immediately	sent	
HTTP/2	frames
TCP	send	buffer
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Issues in the proposed approach
n  increased delay bet. ACK recv. → data send
⁃  leads to slower peak speed
⁃  reason:
•  traditional approach: completes within kernel
•  this approach: application needs to be notified to
generate new data
n  solution:
⁃  use the approach only when necessary
•  i.e. when RTT is big and CWND is small
•  increased delay can be ignored if: delay << RTT
31	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Code for calculating size of data to send
size_t get_suggested_write_size() {
getsockopt(fd, IPPROTO_TCP, TCP_INFO, &tcp_info, sizeof(tcp_info));
if (tcp_info.tcpi_rtt < min_rtt || tcp_info.tcpi_snd_cwnd > max_cwnd)
return UNKNOWN;
switch (SSL_get_current_cipher(ssl)->id) {
case TLS1_CK_RSA_WITH_AES_128_GCM_SHA256:
case …:
tls_overhead = 5 + 8 + 16;
break;
default:
return UNKNOWN;
}
packets_sendable = tcp_info.tcpi_snd_cwnd > tcp_info.tcpi_unacked ?
tcp_info.tcpi_snd_cwnd - tcp_info.tcpi_unacked : 0;
return (packets_sendable + 1) * (tcp_info.tcpi_snd_mss - tls_overhead);
}
32	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Benchmark
33	Developing the fastest HTTP/2 server
n  conditions:
⁃  server in Ireland, client in Japan (RTT 250ms)
⁃  load tiny js at the top of a large HTML
n  result: delay decreased from 511ms to 250ms
⁃  i.e. JS fetch latency was 2RTT, became 1 RTT
•  similar results in other environments
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Conclusion
n  near-optimal result can be achieved
⁃  by adjusting poll threshold and reading TCP
states
⁃  1-packet overhead due to restriction in Linux
kernel
n  1-RTT improvement in H2O
⁃  estimated 1-RTT improvement per the depth of
the load graph
34	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Same problem exists with load balancers
n  L4 L/B or TLS terminator also act as buffers
⁃  impact bigger than that of TCP send buffer of
httpd
n  solution:
⁃  best: donʼt use L/B
⁃  next to best: implement mitigations in L/B
⁃  long-term: TCP migration + L3 NAT or DSR
•  i.e. accept in L/B, then transfer the connection to
HTTP/2 server
35	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Cache-aware Server Push
36	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
What is server-push?
n  start the delivery of CSS / JS when receiving a
request for HTML
n  effect:
⁃  1 RTT reduction, or more
37	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Use-case: conceal request process time
n  ex. RTT=50ms, process time=200ms
38	Developing the fastest HTTP/2 server
req.
process	request
push-asset
HTML
push-asset
push-asset
push-asset
req.
process	request
asset
HTML
asset
asset
asset
req.
450ms	(5	RTT	+	processing	=me)
250ms	(1	RTT	+	processing	=me)
without	push with	push
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Use-case: conceal network distance
n  CDNsʼ use-case
⁃  utilize the conn. while waiting for app. response
⁃  side-effect: reduce the number of app DCs
39	Developing the fastest HTTP/2 server
req.
push-asset
HTML
push-asset
push-asset
push-asset
client edge	server	(CDN) app.	server
req.
HTML
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Issues of server-push
n  how to determine if a resource is already cached
⁃  shouldnʼt push a resource already in cache
•  waste of bandwidth (and time)
⁃  canʼt issue a request to identify the cache state
•  since it would waste 1 RTT we are trying to reduce!
40	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Cache-aware server push
n  experimental feature since H2O 1.5
n  create a digest of URLs found in browser cache
⁃  uses Golomb coded sets
•  space-efficient variant of bloom filter
n  server uses the digest to determine whether or not
to push
41	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Memo: fresh vs. stale
n  two states of a cached resource
n  fresh:
⁃  resource that can be used
⁃  example: Expires: Jan 1 2030
n  stale:
⁃  needs revalidation before use
•  i.e. issue GET with if-modified-since
42	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Generating a digest
1.  calc hashcode of URLs of every fresh cache
⁃  range: 0 .. #-of-URL / false-positive-rate
2.  sort the hashcodes, remove duplicates
3.  emit the first element using the following encoding:
1.  “value * FPR” using unary coding
2.  “value mod (1/false-positive-rate)” using binary
coding
4.  for every other element, emit the delta from
preceding element subtracted by one using the
encoding
5.  pad 1 up to the byte boundary
43	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Generating a digest
n  scenario:
⁃  FPR: 1/256
⁃  URLs of fresh resources in cache:
•  https://example.com/ecma.js
•  https://example.com/style.css
n  calc hash modulo 512: 0x3d, 0x16b
n  sort, remove dupes, and emit the delta:
⁃  0x3d → 0 00111101
⁃  0x16b - 0x3d - 1 → 0x12d → 10 00101101
⁃  padding → 111111
44	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Overhead of sending the digest
n  size: #-of-URLs * (1/log2(FPR) + 1.x) bits
n  1,400 URLs can be stored in 1 packet
⁃  when false-positive-rate set to 1/128
n  can raise FPR to cram more URLs
⁃  false-positive means the resource is not pushed,
browser can just pull it
⁃  pushing some of the required resources is better
than none
45	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Where to store the digest?
n  cookie
⁃  pros: runs on any browser, anytime
⁃  cons: digest becomes inaccurate
•  only the browser knows whatʼs in the browser cache
n  ServiceWorker (+ServiceWorker Cache)
⁃  pros: runs on Chrome, Firefox
⁃  cons: doesnʼt start until leaving the landing page
n  HTTP/2 frame
⁃  pros: minimal octets transferred
•  thanks to the knowledge of HTTP/2 connection
⁃  cons: needs to be implemented by browser developer
46	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Discussion at IETF
n  IETF 95 (April)
⁃  initial submission of the internet draft
•  co-author: Mark Nottingham (HTTP WG Chair)
⁃  defines the HTTP/2 frame
•  since itʼs the best way in the long-term
•  store the frame in headers / cookies for the short-
term
n  IETF 96, HTTP Workshop (July)
⁃  to define digest calculation of stale resources
47	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Handling stale resources
n  hash key changed to URL + Etag
⁃  anyone needs support for last-modified?
n  server uses URL + Etag of the resource to check the
digest
⁃  push the resource in case a match is not found
⁃  push 304 Not Modified in case a match is found
48	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Difficulties in pushing 304
n  Etag cannot always be obtained immediately
⁃  cannot build If-Match request header without
etag
⁃  the “request*” of a pushed resource SHOULD be
sent before the main response
n  proposed solution:
⁃  allow 304 against a non-conditional GET
*: in case of server-push, the server generates both request and response, sends
them to the client.
49	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Using server-push from Ruby
n  Link: rel=preload header
⁃  web server pushes the specified URL
HTTP/1.1 200 OK
Content-Type: text/html
Link: </style.css>; rel=preload # this header!!!
⁃  supported by:
•  H2O, nghttpx (nghttp2), mod_h2 (Apache)
⁃  patch for nginx exists
50	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
The issue with Link: rel=preload
n  cannot initiate push while processing the request
51	Developing the fastest HTTP/2 server
client HTTP/2	server Web	app.
GET	/
can’t	push	at	
this	moment
GET	/
200	OK	
Link:	…200	OK
process	request
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
1xx Early Metadata
52	Developing the fastest HTTP/2 server
n  send Link: rel=preload as interim response
⁃  application sends 1xx then processes the request
n  supported in H2O 2.1
n  might propose for standardization in IETF
GET / HTTP/1.1
Host: example.com
HTTP/1.1 1xx Early Metadata
Link: </style.css>; rel=preload
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
<!DOCTYPE HTML>
...
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Sending 1xx from Rack
n  in case of Unicorn:
Proc.new do |env|
env[”unicorn.socket”].write(
”HTTP/1.1 1xx Early Metadatarn” +
”Link: </style.js>; rel=preloadrn” +
”rn”);
# time-consuming operation ...
[ 200, [ ... ], [ ... ] ]
end
...we need to define the formal API
53	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Conclusion
54	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Conclusion
n  the Web has become faster with HTTP/2
n  HTTP/2 becomes fast as to the limit of TCP/IP with:
⁃  optimizing TCP for responsiveness
⁃  Cache Digest
⁃  1xx Early Metadata
55	Developing the fastest HTTP/2 server
Copyright	(C)	2016	DeNA	Co.,Ltd.	All	Rights	Reserved.	
Q&A
n  Q. Can it be made faster than the limits o TCP/IP?
n  A. Yes!
⁃  shorten the RTT!
•  CDNsʼ approach
⁃  make DNS query part of TLS handshake
•  was part of TLS 1.3 draft (removed as too
premature)
⁃  fairness isnʼt a issue for a private network!
•  TCP optimizer for mobile carriers
56	Developing the fastest HTTP/2 server

More Related Content

Developing the fastest HTTP/2 server