H2O
the optimized HTTP server
DeNA Co., Ltd.
Kazuho Oku
Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved.
Who am I?
■ long experience in network-related / high-performance programming
■ works in the field:
  ⁃ Palmscape / Xiino
    • world's first web browser for Palm OS, bundled by Sony, IBM, NTT DoCoMo
  ⁃ MySQL extensions: Q4M, mycached, …
    • MySQL Conference Community Awards (as DeNA)
  ⁃ JSX
    • altJS with an optimizing compiler
Agenda
■ Introduction of H2O
■ The motives behind
■ Writing a fast server
■ Writing H2O modules
■ Current status & the future
■ Questions regarding HTTP/2
Introducing H2O
H2O – the umbrella project
■ h2o – the standalone HTTP server
  ⁃ libh2o – can be used as a library as well
■ picohttpparser – the HTTP/1 parser
■ picotest – TAP-compatible testing library
■ qrintf – C preprocessor for optimizing s(n)printf
■ yoml – DOM-like wrapper for libyaml
github.com/h2o
h2o
■ the standalone HTTP server
■ protocols:
  ⁃ HTTP/1.x
  ⁃ HTTP/2
    • via Upgrade, NPN, ALPN, direct
  ⁃ WebSocket (uses wslay)
  ⁃ with SSL support (uses OpenSSL)
■ modules:
  ⁃ file (static files), reverse-proxy, reproxy, deflate
■ configuration using yaml
libh2o
■ h2o is also available as a library
■ event loop can be selected
  ⁃ libuv
  ⁃ h2o's embedded event loop
■ configurable via API and/or yaml
  ⁃ dependency on libyaml is optional
Modular design
■ library layer:
  ⁃ memory, string, socket, timeout, event-loop, http1client, …
■ protocol layer:
  ⁃ http1, http2, websocket, loopback
■ handlers:
  ⁃ file, reverse-proxy
■ output filters:
  ⁃ chunked-encoder, deflate, reproxy
■ loggers:
  ⁃ access-log
Testing
■ two levels of testing for better quality
  ⁃ essential for keeping the protocol implementations and module-level API apart
■ unit-testing
  ⁃ every module has (can have) its own unit-test
  ⁃ tests run using the loopback protocol handler
    • module-level unit-tests do not depend on the protocol
■ end-to-end testing
  ⁃ spawns the server and connects via network
  ⁃ uses nghttp2
Internals
■ h2o_buf_t (pair of [char*, size_t]) is used to represent data
  ⁃ common header names are interned into tokens
    • those defined in HPACK static_table + α
■ mostly zero-copy
■ incoming data allocated using: malloc, realloc, mmap
  ⁃ requires 64-bit arch for heavy use
■ uses writev for sending data
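A minimal sketch of two of the ideas above, assuming a POSIX system: a (pointer, length) pair that refers to bytes without copying them, and writev gathering several such pairs into a single system call. The names buf_t and send_bufs are hypothetical and are not part of the H2O API.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* hypothetical stand-in for a (char*, size_t) pair; not H2O's actual type */
typedef struct {
    char *base;
    size_t len;
} buf_t;

/* Hands up to 16 buffers to a single writev call, without first copying
 * them into one contiguous block. Returns the number of bytes written,
 * or -1 on error (as writev does). */
static ssize_t send_bufs(int fd, const buf_t *bufs, size_t bufcnt)
{
    struct iovec vecs[16];
    size_t i;

    assert(bufcnt <= sizeof(vecs) / sizeof(vecs[0]));
    for (i = 0; i != bufcnt; ++i) {
        vecs[i].iov_base = bufs[i].base;
        vecs[i].iov_len = bufs[i].len;
    }
    return writev(fd, vecs, (int)bufcnt);
}
```

A caller would typically put the response headers in one buffer and the body in another, and pass both in one call; a short write (fewer bytes than requested) still has to be handled by the caller.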
Fast
[Chart: requests / second / core (0 to 120,000) for nginx-1.7.7 and h2o, serving 6-byte, 1,024-byte and 10,240-byte responses over HTTP/1 (local; osx), HTTP/1 (local; linux), HTTP/1 (remote; linux) and HTTPS/1 (remote; linux)]
Note: used MacBook Pro Early 2014 (Core i7@2.4GHz), Amazon EC2 cc2.8xlarge, no logging
Why is it fast?
Why should it be fast?
It all started with PSGI/Plack
■ PSGI/Plack is the WSGI/Rack for Perl
■ on Sep 7th 2010:
  ⁃ first commit to github.com/plack/Plack
  ⁃ I asked: why ever use FastCGI?
    • at the time, HTTP was believed to be slow, and FastCGI to be necessary
  ⁃ the other choice was to use Apache+mod_perl
  ⁃ I proposed:
    • write a fast HTTP parser in C, and use it from Perl
    • get rid of specialized protocols / tightly-coupled legacy servers
      ⁃ for ease of dev., deploy., admin.
So I wrote HTTP::Parser::XS and picohttpparser.
How fast is picohttpparser?
■ 10x faster than http-parser according to a third-party benchmark
  ⁃ github.com/fukamachi/fast-http
HTTP Parser Performance Comparison (requests / second)
  http-parser@5fd51fd       329,033
  picohttpparser@56975cd  3,162,745
HTTP::Parser::XS
■ the de-facto HTTP parser used by PSGI/Plack
  ⁃ PSGI/Plack is the WSGI/Rack for Perl
■ modern Perl-based services rarely use FastCGI or mod_perl
■ the application servers used (Starlet, Starman, etc.) speak HTTP using HTTP::Parser::XS
  ⁃ application servers can be and in fact are written in Perl, since the slow part is handled by HTTP::Parser::XS
■ picohttpparser is the C-based backend of HTTP::Parser::XS
The lessons learned
■ using one protocol (HTTP) everywhere reduces the TCO
  ⁃ easier to develop, debug, test, monitor, administer
  ⁃ popular protocols tend to be better designed & implemented thanks to the competition
■ similar transition happens everywhere
  ⁃ WAP has been driven out by HTTP & HTML
  ⁃ we rarely use FTP these days
but HTTP is not yet used everywhere
■ web browser
  ⁃ HTTP/1 is used now, transitioning to HTTP/2
■ SOA / microservices
  ⁃ HTTP/1 is used now
    • harder to transition to HTTP/2 since many programming languages use blocking I/O
  ⁃ other protocols coexist: RDBMS, memcached, …
    • are they the next target of HTTP (like FastCGI)?
■ IoT
  ⁃ MQTT is emerging
So I decided to write H2O
■ in July 2014
■ life of the developers becomes easier if all the services use HTTP
■ but for that purpose, it seems like we need to raise the bar (of performance)
  ⁃ or other protocols may emerge / continue to be used
■ now (at the time of transition to HTTP/2) might be a good moment to start a performance race between HTTP implementers
Writing a fast server
Two things to be aware of
■ characteristics of a fast program
  1. executes fewer instructions
    • speed is a result of simplicity, not complexity
  2. causes fewer pipeline hazards
    • minimum number of conditional branches / indirect calls
    • use branch-predictor-friendly logic
      ⁃ e.g. a conditional branch exists, but it is taken 95% of the time
H2O - design principles
■ do it right
  ⁃ local bottlenecks can be fixed afterwards
  ⁃ large-scale design issues are hard to notice / fix
■ do it simple
  ⁃ as explained
  ⁃ provide / use hooks only at high-level
    • hooks exist for: protocol, generator, filter, logger
The performance pitfalls
■ many server implementations spend CPU cycles in the following areas:
  ⁃ memory allocation
  ⁃ parsing input
  ⁃ stringifying output and logs
  ⁃ timeout handling
Memory allocation
Memory allocation in H2O
■ uses region-based memory management
  ⁃ like the memory pool of Apache
■ strategy:
  ⁃ a memory block is assigned to the Request object
  ⁃ small allocations return portions of the block
  ⁃ memory is never returned to the block
  ⁃ the entire memory block gets freed when the Request object is destroyed
Memory allocation in H2O (cont'd)
■ malloc (of small chunks)
    void *h2o_mempool_alloc(h2o_mempool_t *pool, size_t sz)
    {
        (snip)
        void *ret = pool->chunks->bytes + pool->chunks->offset;
        pool->chunks->offset += sz;
        return ret;
    }
■ free
  ⁃ no code (as explained)
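The strategy described above can be sketched as a small bump-pointer allocator. This is an illustration under stated assumptions, not H2O's implementation: the names pool_t, chunk_t, pool_alloc and pool_clear, the 4,096-byte chunk size and the 16-byte rounding are all hypothetical choices made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096 /* assumed chunk size, chosen for illustration */

typedef struct chunk_t {
    struct chunk_t *next;
    size_t offset;          /* number of bytes already handed out */
    char bytes[CHUNK_SIZE];
} chunk_t;

typedef struct {
    chunk_t *chunks;        /* most recently allocated chunk comes first */
} pool_t;

static void pool_init(pool_t *pool)
{
    pool->chunks = NULL;
}

/* Bump-pointer allocation. Returns NULL if sz does not fit in a chunk
 * or if malloc fails. */
static void *pool_alloc(pool_t *pool, size_t sz)
{
    void *ret;

    if (sz > CHUNK_SIZE)
        return NULL;
    /* round up so that offsets within a chunk stay multiples of 16 */
    sz = (sz + 15) & ~(size_t)15;
    if (pool->chunks == NULL || CHUNK_SIZE - pool->chunks->offset < sz) {
        chunk_t *c = malloc(sizeof(*c));
        if (c == NULL)
            return NULL;
        c->next = pool->chunks;
        c->offset = 0;
        pool->chunks = c;
    }
    ret = pool->chunks->bytes + pool->chunks->offset;
    pool->chunks->offset += sz;
    return ret;
}

/* Frees every chunk at once; there is no per-allocation free. */
static void pool_clear(pool_t *pool)
{
    while (pool->chunks != NULL) {
        chunk_t *next = pool->chunks->next;
        free(pool->chunks);
        pool->chunks = next;
    }
}
```

In this sketch a pool would be created per request and cleared when the request is destroyed, so the common path of an allocation is one comparison and one addition.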
Parsing input
■ HTTP/1 request parser may or may not be a bottleneck, depending on its performance
  ⁃ if the parser is capable of handling 1M reqs/sec, then parsing takes 10% of the time when the server handles 100K reqs/sec.
HTTP/1 Parser Performance Comparison (requests / second)
  http-parser@5fd51fd       329,033
  picohttpparser@56975cd  3,162,745
Parsing input (cont'd)
■ it's good to know the logical upper-bound
  ⁃ or we might try to optimize something that cannot be made any faster
■ Q. How fast could a text parser be?
Q. How fast could a text parser be?
Answer: around 1GB/sec. is a good target
  ⁃ since any parser needs to read every byte and execute a conditional branch depending on the value
    • # of instructions: 1 load + 1 inc + 1 test + 1 conditional branch
    • would likely take several CPU cycles (even if superscalar)
    • unless we use SIMD instructions
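A back-of-the-envelope check of that target, assuming a 2.4 GHz clock (the MacBook Pro mentioned in the benchmark note) and taking 1 GB as 10^9 bytes:

```latex
\[
\text{cycle budget per byte}
  = \frac{\text{clock frequency}}{\text{throughput}}
  = \frac{2.4 \times 10^{9}\ \text{cycles/s}}{1 \times 10^{9}\ \text{bytes/s}}
  = 2.4\ \text{cycles/byte}
\]
```

Under these assumptions, parsing at 1GB/sec. leaves roughly 2.4 cycles for the four instructions spent on each byte, which is why the figure is a target rather than a floor.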
Parsing input
■ What's wrong with this parser?
    for (; s != end; ++s) {
        int ch = *s;
        switch (ctx.state) {
        case AAA:
            if (ch == ' ')
                ctx.state = BBB;
            break;
        case BBB:
            ...
        }
    }
Parsing input (cont'd)
■ never write a character-level state machine if performance matters
    for (; s != end; ++s) {
        int ch = *s;
        switch (ctx.state) { // <- executed for every char
        case AAA:
            if (ch == ' ')
                ctx.state = BBB;
            break;
        case BBB:
            ...
        }
    }
Parsing input fast
■ each state should consume a sequence of bytes
    while (s != end) {
        switch (ctx.state) {
        case AAA:
            do {
                if (*s++ == ' ') {
                    ctx.state = BBB;
                    break;
                }
            } while (s != end);
            break;
        case BBB:
            ...
Stateless parsing
■ stateless in the sense that no state value exists
  ⁃ stateless parsers are generally faster than stateful parsers, since they do not have a state, i.e. a variable used for a conditional branch
■ HTTP/1 parsing can be stateless since the request-line and the headers arrive in a single packet (in most cases)
  ⁃ and even if they did not, it is easy to check if the end-of-headers has arrived (by looking for CR-LF-CR-LF) and then parse the input
    • this countermeasure is essential to handle the Slowloris attack
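The end-of-headers check mentioned above can be sketched as follows. The helper name find_end_of_headers is hypothetical and is not part of picohttpparser's API; it only illustrates the idea of deferring the parse until CR-LF-CR-LF has been seen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the number of bytes up to and including the first CR-LF-CR-LF,
 * or 0 if the end of the headers has not arrived yet.
 * Hypothetical helper for illustration only. */
static size_t find_end_of_headers(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; ++i) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}
```

A server using this approach would keep appending received bytes to a buffer and run the real parser only once the function returns non-zero, with a timeout or a size limit covering clients that never complete their headers.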
picohttpparser is stateless
■ states are the execution contexts (instead of being a variable)
    const char* parse_request(const char* buf, const char* buf_end, …)
    {
        /* parse request line */
        ADVANCE_TOKEN(*method, *method_len);
        ++buf;
        ADVANCE_TOKEN(*path, *path_len);
        ++buf;
        if ((buf = parse_http_version(buf, buf_end, minor_version, ret)) == NULL)
            return NULL;
        EXPECT_CHAR('\015');
        EXPECT_CHAR('\012');
        return parse_headers(buf, buf_end, headers, num_headers, max_headers, …);
    }
loop exists within a function (≒state)
■ the code looks for the end of the header value
    #define IS_PRINTABLE(c) ((unsigned char)(c) - 040u < 0137u)

    static const char* get_token_to_eol(const char* buf, const char* buf_end, …
    {
        while (likely(buf_end - buf >= 8)) {
    #define DOIT() if (unlikely(! IS_PRINTABLE(*buf))) goto NonPrintable; ++buf
            DOIT(); DOIT(); DOIT(); DOIT();
            DOIT(); DOIT(); DOIT(); DOIT();
    #undef DOIT
            continue;
        NonPrintable:
            if ((likely((uchar)*buf < '\040') && likely(*buf != '\011'))
                || unlikely(*buf == '\177'))
                goto FOUND_CTL;
            ++buf; /* not a control character; keep scanning */
        }
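The IS_PRINTABLE macro folds a two-sided range check into one unsigned comparison: subtracting 040 (octal, 32) makes every value below the range wrap around to a large unsigned number, so a single "less than 0137 (95)" test accepts exactly the bytes 0x20 to 0x7e. The sketch below checks that equivalence for every byte value; is_printable_two_compares and forms_agree are hypothetical helper names used only for this illustration.

```c
#include <assert.h>

/* the single-comparison form, as on the slide */
#define IS_PRINTABLE(c) ((unsigned char)(c) - 040u < 0137u)

/* the same test written as two comparisons, for reference */
static int is_printable_two_compares(int c)
{
    return 0x20 <= c && c <= 0x7e;
}

/* returns 1 if both forms agree for every byte value, 0 otherwise */
static int forms_agree(void)
{
    int c;

    for (c = 0; c != 256; ++c) {
        if ((IS_PRINTABLE(c) != 0) != (is_printable_two_compares(c) != 0))
            return 0;
    }
    return 1;
}
```

The benefit is one conditional branch per character instead of two, which fits the earlier point about minimizing branches in the hot loop.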
The hottest loop of picohttpparser (cont'd)
■ after compilation, uses 4 instructions per char
    movzbl (%r9), %r11d
    movl %r11d, %eax
    addl $-32, %eax
    cmpl $94, %eax
    ja LBB5_5
    movzbl 1(%r9), %r11d // load char
    leal -32(%r11), %eax // subtract
    cmpl $94, %eax // and check if is printable
    ja LBB5_4 // if not, break
    movzbl 2(%r9), %r11d // load next char
    leal -32(%r11), %eax // subtract
    cmpl $94, %eax // and check if is printable
    ja LBB5_15 // if not, break
    movzbl 3(%r9), %r11d // load next char
    …
strlen vs. picohttpparser
■ not as fast as strlen, but close
    size_t strlen(const char *s) {
        const char *p = s;
        for (; *p != '\0'; ++p)
            ;
        return p - s;
    }
■ not much room left for further optimization (w/o using SIMD insns.)
[Chart: strlen vs. picohttpparser, in bytes / clock (0.00 to 0.90), comparing strlen (simple) and picohttpparser@56975cd]
picohttpparser is small and simple
    $ wc picohttpparser.?
     376 1376 10900 picohttpparser.c
      62  333  2225 picohttpparser.h
     438 1709 13125 total
    $
■ good example of do-it-simple-for-speed approach
  ⁃ H2O (incl. the HTTP/2 parser) is designed using the approach
Stringification
■ HTTP/1 responses are in strings
    sprintf(buf, "HTTP/1.%d %d %s\r\n", …)
■ s(n)printf is known to be slow
  ⁃ but the interface is great
  ⁃ it's tiresome to write like:
    p = strappend_s(p, "HTTP/1.");
    p = strappend_n(p, minor_version);
    *p++ = ' ';
    p = strappend_n(p, status);
    *p++ = ' ';
    p = strappend_s(p, reason);
    p = strappend_s(p, "\r\n");
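The slide does not show the strappend helpers themselves. One possible shape for them, as a sketch: the signatures below are assumptions inferred from the calls above, and build_status_line is a hypothetical wrapper added for the example. No bounds checking is done, so the caller must supply a large enough buffer.

```c
#include <assert.h>
#include <string.h>

/* appends a NUL-terminated string; returns the new write position */
static char *strappend_s(char *p, const char *s)
{
    size_t len = strlen(s);

    memcpy(p, s, len);
    return p + len;
}

/* appends a non-negative integer in decimal; returns the new write position */
static char *strappend_n(char *p, unsigned n)
{
    char tmp[sizeof(unsigned) * 3]; /* enough digits for any unsigned value */
    size_t len = 0;

    do { /* digits are produced least-significant first */
        tmp[len++] = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    while (len != 0)
        *p++ = tmp[--len];
    return p;
}

/* builds "HTTP/1.<minor> <status> <reason>\r\n" and NUL-terminates it;
 * returns a pointer to the terminating NUL */
static char *build_status_line(char *p, unsigned minor_version, unsigned status,
                               const char *reason)
{
    p = strappend_s(p, "HTTP/1.");
    p = strappend_n(p, minor_version);
    *p++ = ' ';
    p = strappend_n(p, status);
    *p++ = ' ';
    p = strappend_s(p, reason);
    p = strappend_s(p, "\r\n");
    *p = '\0';
    return p;
}
```

Nothing here interprets a format string at runtime, which is the property the hand-written version buys at the cost of readability.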
Stringification (cont'd)
■ stringification is important for HTTP/2 servers too
  ⁃ many elements still need to be stringified
    • headers (status, date, last-modified, etag, …)
    • access log (IP address, date, # of bytes, …)
Why is s(n)printf slow?
■ it's a state machine
  ⁃ interprets the format string (e.g. "hello: %s") at runtime
■ it uses the locale
  ⁃ not for all types of variables, but…
■ it uses varargs
■ it's complicated
  ⁃ sprintf may parse a number when used for stringifying a number
    sprintf(buf, "%11d", status)
How should we optimize s(n)printf?
■ by compiling the format string at compile-time
  ⁃ instead of interpreting it at runtime
  ⁃ possible since the supplied format string is almost always a string literal
■ and that's qrintf
qrintf
■ qrintf is a preprocessor that rewrites s(n)printf invocations into a set of function calls specialized for each format string
■ qrintf-gcc is a wrapper of GCC that
  ⁃ first applies the GCC preprocessor
  ⁃ then applies the qrintf preprocessor
  ⁃ then calls the GCC compiler
■ similar wrapper could be implemented for Clang
  ⁃ but it's a bit harder
  ⁃ help wanted!
Example
    // original code (248 nanoseconds)
    snprintf(buf, sizeof(buf), "%u.%u.%u.%u",
        (addr >> 24) & 0xff, (addr >> 16) & 0xff, (addr >> 8) & 0xff, addr & 0xff);

    // after preprocessed by qrintf (21.5 nanoseconds)
    _qrintf_chk_finalize(
        _qrintf_chk_u(_qrintf_chk_c(
            _qrintf_chk_u(_qrintf_chk_c(
                _qrintf_chk_u(_qrintf_chk_c(
                    _qrintf_chk_u(
                        _qrintf_chk_init(buf, sizeof(buf)), (addr >> 24) & 0xff),
                    '.'), (addr >> 16) & 0xff),
                '.'), (addr >> 8) & 0xff),
            '.'), addr & 0xff));
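To make the idea concrete, here is a hand-specialized equivalent of the snprintf call above, written as a sketch. It is not qrintf's generated code: append_octet and format_ipv4 are hypothetical names, and unlike the _qrintf_chk_ functions this version does no bounds checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* writes one octet (0..255) in decimal; returns the new write position */
static char *append_octet(char *p, unsigned v)
{
    if (v >= 100) {
        *p++ = (char)('0' + v / 100);
        v %= 100;
        *p++ = (char)('0' + v / 10);
        v %= 10;
    } else if (v >= 10) {
        *p++ = (char)('0' + v / 10);
        v %= 10;
    }
    *p++ = (char)('0' + v);
    return p;
}

/* Formats addr as a dotted quad. buf must hold at least 16 bytes
 * ("255.255.255.255" plus the terminating NUL). Returns the length. */
static size_t format_ipv4(char *buf, unsigned long addr)
{
    char *p = buf;

    p = append_octet(p, (unsigned)((addr >> 24) & 0xff));
    *p++ = '.';
    p = append_octet(p, (unsigned)((addr >> 16) & 0xff));
    *p++ = '.';
    p = append_octet(p, (unsigned)((addr >> 8) & 0xff));
    *p++ = '.';
    p = append_octet(p, (unsigned)(addr & 0xff));
    *p = '\0';
    return (size_t)(p - buf);
}
```

The sequence of calls mirrors the format string "%u.%u.%u.%u" one element at a time, which is the kind of code a format-string compiler can emit because the format is known at compile time.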
Performance impact on H2O
■ 20% performance gain
  ⁃ gcc: 82,900 reqs/sec.
  ⁃ qrintf-gcc: 99,200 reqs/sec.
■ benchmark condition:
  ⁃ 6-byte file GET over HTTP/1.1
  ⁃ access logging to /dev/null
Timeout handling
Timeout handling by the event loops
■ most event loops use balanced trees to handle timeouts
  ⁃ so that timeout events can be triggered fast
  ⁃ the downside is that it takes time to set the timeouts
■ in case of HTTP, timeout should be set at least once per request
  ⁃ otherwise the server cannot close a stale connection
Timeout requirements of an HTTP server
■ much more set than triggered
  ⁃ is set more than once per request
  ⁃ most requests succeed before timeout
■ the timeout values are uniform
  ⁃ e.g. request timeout for every connection would be the same (or i/o timeout or whatever)
■ balanced-tree does not seem like a good approach
  ⁃ any other choice?
Use pre-sorted linked-list
■ H2O maintains a linked-list for each timeout configuration
  ⁃ request timeout has its own linked-list, i/o timeout has its own, …
■ how to set the timeout:
  ⁃ timeout entry is inserted at the end of the linked-list
    • thus the list is naturally sorted
■ how the timeouts get triggered:
  ⁃ H2O iterates from the start of each linked-list, and triggers those that have timed-out
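The scheme above can be sketched with a circular doubly-linked list per timeout configuration. This is an illustration, not H2O's code: every name below (timeout_entry_t, timeout_list_t, timeout_set, timeout_clear, timeout_run) is hypothetical, time is a plain counter, and the list only stays sorted as long as the clock passed in never goes backwards.

```c
#include <assert.h>
#include <stddef.h>

typedef struct timeout_entry_t {
    struct timeout_entry_t *prev, *next;
    unsigned long fire_at; /* absolute time at which the entry expires */
    int fired;             /* set once the entry has been triggered */
} timeout_entry_t;

/* one list per timeout configuration; all entries share the same duration */
typedef struct {
    timeout_entry_t head;  /* sentinel of a circular doubly-linked list */
    unsigned long duration;
} timeout_list_t;

static void timeout_init(timeout_list_t *list, unsigned long duration)
{
    list->head.prev = list->head.next = &list->head;
    list->duration = duration;
}

/* O(1): append at the tail; with a shared duration and a non-decreasing
 * clock, the list stays sorted by fire_at */
static void timeout_set(timeout_list_t *list, timeout_entry_t *e, unsigned long now)
{
    e->fire_at = now + list->duration;
    e->fired = 0;
    e->prev = list->head.prev;
    e->next = &list->head;
    e->prev->next = e;
    list->head.prev = e;
}

/* O(1): unlink the entry from whichever list it is on */
static void timeout_clear(timeout_entry_t *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = e;
}

/* Triggers expired entries from the front and stops at the first live one.
 * Returns the number of entries triggered. */
static size_t timeout_run(timeout_list_t *list, unsigned long now)
{
    size_t n = 0;

    while (list->head.next != &list->head && list->head.next->fire_at <= now) {
        timeout_entry_t *e = list->head.next;
        timeout_clear(e);
        e->fired = 1;
        ++n;
    }
    return n;
}
```

Because only the head of each list has to be inspected to learn whether anything has expired, the per-tick cost depends on the number of lists rather than on the number of pending entries.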
Comparison Chart
  Operation (frequency in HTTPD)   Balanced-tree   List of linked-list
  set (high)                       O(log N)        O(1)
  clear (high)                     O(log N)        O(1)
  trigger (low)                    O(1)            O(M)
note: N: number of timeout entries, M: number of timeout configurations; trigger performance of list of linked-list can be reduced to O(1)
Miscellaneous
■ the entire stack of H2O is carefully designed (for simplicity and for performance)
  ⁃ for example, the built-in event loop of H2O (which is the default for h2o), is faster than libuv
[Chart: Benchmark: libuv vs. internal, in requests / sec. / core (0 to 80,000) by size of content (6 bytes, 4,096 bytes), comparing libuv-network-and-file@7876f53, libuv-network-only@da85742 and internal (master@a5d1105)]
Writing H2O modules
Module types of H2O
■ handler
  ⁃ generates the contents
    • e.g. file handler, proxy handler
■ filter
  ⁃ modifies the content
    • e.g. chunked encoder, deflate
  ⁃ can be chained
■ logger
Writing a hello world handler
    static int on_req(h2o_handler_t *self, h2o_req_t *req) {
        static h2o_generator_t generator = {};
        static h2o_buf_t body = H2O_STRLIT("hello world\n");
        if (! h2o_memis(req->method.base, req->method.len, H2O_STRLIT("GET")))
            return -1;
        req->res.status = 200;
        req->res.reason = "OK";
        h2o_add_header(&req->pool, &req->res.headers, H2O_TOKEN_CONTENT_TYPE,
                       H2O_STRLIT("text/plain"));
        h2o_start_response(req, &generator);
        h2o_send(req, &body, 1, 1);
        return 0;
    }

    h2o_handler_t *handler = h2o_create_handler(host_config, sizeof(*handler));
    handler->on_req = on_req;
The handler API
    /**
     * called by handlers to set the generator
     * @param req the request
     * @param generator the generator
     */
    void h2o_start_response(h2o_req_t *req, h2o_generator_t *generator);
    /**
     * called by the generators to send output
     * note: generator should close the resources opened by itself after sending the final chunk (i.e. calling the function with is_final set to true)
     * @param req the request
     * @param bufs an array of buffers
     * @param bufcnt length of the buffers array
     * @param is_final if the output is final
     */
    void h2o_send(h2o_req_t *req, h2o_buf_t *bufs, size_t bufcnt, int is_final);
The handler API (cont'd)
    /**
     * an object that generates a response.
     * The object is typically constructed by handlers that call h2o_start_response.
     */
    typedef struct st_h2o_generator_t {
        /**
         * called by the core to request new data to be pushed via h2o_send
         */
        void (*proceed)(struct st_h2o_generator_t *self, h2o_req_t *req);
        /**
         * called by the core when there is a need to terminate the response
         */
        void (*stop)(struct st_h2o_generator_t *self, h2o_req_t *req);
    } h2o_generator_t;
Module examples
■ Simple examples exist in the examples/ dir
■ lib/chunked.c is a good example of the filter API
Current Status & the Future
Development Status
■ core
  ⁃ mostly feature complete
■ protocol
  ⁃ http/1 – mostly feature complete
  ⁃ http/2 – interoperable
■ modules
  ⁃ file – complete
  ⁃ proxy – interoperable
    • name resolution is blocking
    • does not support keep-alive
HTTP/2 status of H2O
■ interoperable, but some parts are missing
  ⁃ HPACK resize
  ⁃ priority handling
■ priority handling is essential for HTTP/2
  ⁃ without it, HTTP/2 is slower than HTTP/1 :(
■ need to tweak performance
  ⁃ SSL-related code is not yet optimized
    • first benchmark was taken last Saturday :)
HTTP/2 over TLS benchmark
■ need to fix the performance drop, likely caused by:
  ⁃ H2O uses writev to gather data into a single socket op., but OpenSSL does not provide scatter-gather I/O
  ⁃ in H2O, every file handler has its own buffer and pushes content to the protocol layer
    • nghttpd pulls instead, which is more memory-efficient / no need
[Chart: HTTPS/2 (remote; linux), nghttpd vs. h2o, for 6-byte, 1,024-byte and 10,240-byte responses (0 to 120,000)]
Goal of the project
■ to become the best HTTP/2 server
  ⁃ with excellent performance in serving static files / as a reverse proxy
    • note: picohttpparser and other libraries are also used in the reverse proxy implementation
■ to become the favored HTTP server library
  ⁃ esp. for server products
  ⁃ to widen the acceptance of HTTP protocol even more
Help wanted
■ looking for contributors in all areas
  ⁃ addition of modules might be the easiest, since it would not interfere with the development of the core / protocol layer
  ⁃ examples, docs, tests are also welcome
■ it's easy to start
  ⁃ since the code-base is young and simple
  Subsystem                       wc -l (incl. unit-tests)
  Core                            2,334
  Library                         1,856
  Socket & event loop             1,771
  HTTP/1 (incl. picohttpparser)     886
  HTTP/2                          2,507
  Modules                         1,906
  Server                            573
Questions regarding HTTP/2
Sorry, I do not have much to talk about
■ since it is a well-designed protocol
■ and in terms of performance, apparently binary protocols are easier to implement than a text protocol :)
  ⁃ there's an efficient algorithm for the static Huffman decoder
    • @tatsuhiro-t implemented it, I copied
■ OTOH I have some questions re HTTP/2
Q. would there be a max-open-files issue?
■ according to the draft, recommended value of MAX_CONCURRENT_STREAMS is >= 100
■ if max-connections is 1024, it would mean that the max fd would be above 100k (1,024 × 100 = 102,400)
  ⁃ on linux, the default (NR_OPEN) is 1,048,576 and is adjustable
  ⁃ but on other OS?
■ H2O by default limits the number of in-flight requests internally to 16
  ⁃ the value is configurable
Q. good way to determine the window size?
■ initial window size (64k) might be too small to saturate the available bandwidth depending on the latency
  ⁃ but for responsiveness we would not want the value to be too high
  ⁃ is there any recommendation on how we should tune the variable?
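A worked example of the concern, under assumed numbers: with at most one window of data in flight per round trip, a single stream cannot exceed window size / RTT. The 100 ms round-trip time is an assumption chosen for illustration; 65,535 bytes is the initial window size defined by HTTP/2 (the "64k" above).

```latex
\[
\text{max throughput per stream}
  \le \frac{\text{window size}}{\text{RTT}}
  = \frac{65{,}535\ \text{bytes}}{0.1\ \text{s}}
  \approx 655\ \text{KB/s}
  \approx 5.2\ \text{Mbit/s}
\]
```

On a link faster than that, the default window would be the limiting factor at this latency, while a much larger window lets more data queue up ahead of a newly prioritized response.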
Q. should we continue to use CDN?
■ HTTP/2 has priority control
  ⁃ CDN and primary website would use different TCP connections
    • means that priority control would not work between the CDN and the primary website
■ would it be better to serve all the asset files from the primary website?
Never hide the Server header
■ name and version info. is essential for interoperability
  ⁃ many (if not all) webapps use the User-Agent value to evade bugs
  ⁃ used to be the same at the HTTP/1 layer in the early days
■ there will be interoperability problems between HTTP/2 impls.
  ⁃ the Server header is essential for implementing workarounds
■ some believe that hiding the header improves security
  ⁃ we should tell them that they are wrong; that security-by-obscurity does not work on the Net, and hiding the value harms interoperability and the adoption of HTTP/2
Summary
■ H2O is an optimized HTTP server implementation
  ⁃ with neat design to support both HTTP/1 and HTTP/2
  ⁃ is still very young
    • lots of areas to work on!
    • incl. improving the HTTP/2 support
■ help wanted! Let's write the HTTPD of the future!

More Related Content

H2O - the optimized HTTP server

  • 1. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. H2O the optimized HTTP server DeNA Co., Ltd. Kazuho Oku 1
  • 2. Who am I? n long experience in network-‐‑‒related / high-‐‑‒ performance programming n works in the field: ⁃ Palmscape / Xiino • world's first web browser for Palm OS, bundled by Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. Sony, IBM, NTT DoCoMo ⁃ MySQL extensions: Q4M, mycached, … • MySQL Conference Community Awards (as DeNA) ⁃ JSX • altJS with an optimizing compiler H2O -‐‑‒ the optimized HTTP server2
  • 3. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. Agenda n Introduction of H2O n The motives behind n Writing a fast server n Writing H2O modules n Current status the future n Questions regarding HTTP/2 H2O -‐‑‒ the optimized HTTP server3
  • 4. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. Introducing H2O H2O -‐‑‒ the optimized HTTP server4
  • 5. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. H2O – the umbrella project n h2o – the standalone HTTP server ⁃ libh2o – can be used as a library as well n picohttpparser – the HTTP/1 parser n picotest – TAP-‐‑‒compatible testing library n qrintf – C preprocessor for optimizing s(n)printf n yoml – DOM-‐‑‒like wrapper for libyaml github.com/h2o H2O -‐‑‒ the optimized HTTP server5
  • 6. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. h2o n the standalone HTTP server n protocols: ⁃ HTTP/1.x ⁃ HTTP/2 • via Upgrade, NPN, ALPN, direct ⁃ WebSocket (uses wslay) ⁃ with SSL support (uses OpenSSL) n modules: ⁃ file (static files), reverse-‐‑‒proxy, reproxy, deflate n configuration using yaml H2O -‐‑‒ the optimized HTTP server6
  • 7. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. libh2o n h2o is also available as a library n event loop can be selected ⁃ libuv ⁃ h2o's embedded event loop n configurable via API and/or yaml ⁃ dependency to libyaml is optional H2O -‐‑‒ the optimized HTTP server7
  • 8. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. Modular design n library layer: ⁃ memory, string, socket, timeout, event-‐‑‒loop, http1client, … n protocol layer: ⁃ http1, http2, websocket, loopback n handlers: ⁃ file, reverse-‐‑‒proxy n output filters: ⁃ chunked-‐‑‒encoder, deflate, reproxy n loggers: ⁃ access-‐‑‒log H2O -‐‑‒ the optimized HTTP server8
  • 9. Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. Testing n two levels of testing for better quality ⁃ essential for keeping the protocol implementations and module-‐‑‒level API apart n unit-‐‑‒testing ⁃ every module has (can have) it's own unit-‐‑‒test ⁃ tests run using the loopback protocol handler • module-‐‑‒level unit-‐‑‒tests do not depend on the protocol n end-‐‑‒to-‐‑‒end testing ⁃ spawns the server and connect via network ⁃ uses nghttp2 H2O -‐‑‒ the optimized HTTP server9
  • 10. Internals n uses h2o_buf_t (pair of [char*, size_̲t]) is used to represent data ⁃ common header names are interned into tokens • those defined in HPACK static_̲table + α n mostly zero-‐‑‒copy n incoming data allocated using: malloc, realloc, mmap Copyright (C) 2014 DeNA Co.,Ltd. All Rights Reserved. ⁃ requires 64-‐‑‒bit arch for heavy use n uses writev for sending data H2O -‐‑‒ the optimized HTTP server10
  • 11. Fast
      [Chart: requests / second.core, nginx-1.7.7 vs. h2o, for 6-byte, 1,024-byte and 10,240-byte responses; panels: HTTP/1 (local; osx), HTTP/1 (local; linux), HTTP/1 (remote; linux), HTTPS/1 (remote; linux); axis 0 to 120,000]
      Note: used MacBook Pro Early 2014 (Core [email protected]), Amazon EC2 cc2.8xlarge, no logging
  • 12. Why is it fast? Why should it be fast?
  • 13. It all started with PSGI/Plack
      ⁃ PSGI/Plack is the WSGI/Rack for Perl
      ⁃ on Sep 7th 2010:
          ⁃ first commit to github.com/plack/Plack
          ⁃ I asked: why ever use FastCGI?
              ⁃ at the time, HTTP was believed to be slow, and FastCGI was believed to be necessary
          ⁃ the other choice was to use Apache+mod_perl
          ⁃ I proposed:
              ⁃ write a fast HTTP parser in C, and use it from Perl
              ⁃ get rid of specialized protocols / tightly-coupled legacy servers
                  ⁃ for ease of dev., deploy., admin.
  • 14. So I wrote HTTP::Parser::XS and picohttpparser.
  • 15. How fast is picohttpparser?
      ⁃ 10x faster than http-parser according to a third-party benchmark
          ⁃ github.com/fukamachi/fast-http
      [Chart: HTTP Parser Performance Comparison (requests / second): http-parser@5fd51fd 329,033; picohttpparser@56975cd 3,162,745]
  • 16. HTTP::Parser::XS
      ⁃ the de-facto HTTP parser used by PSGI/Plack
          ⁃ PSGI/Plack is the WSGI/Rack for Perl
      ⁃ modern Perl-based services rarely use FastCGI or mod_perl
      ⁃ the application servers used (Starlet, Starman, etc.) speak HTTP using HTTP::Parser::XS
          ⁃ application servers can be and in fact are written in Perl, since the slow part is handled by HTTP::Parser::XS
      ⁃ picohttpparser is the C-based backend of HTTP::Parser::XS
  • 17. The lessons learned
      ⁃ using one protocol (HTTP) everywhere reduces the TCO
          ⁃ easier to develop, debug, test, monitor, administer
          ⁃ popular protocols tend to be better designed & implemented thanks to the competition
      ⁃ a similar transition happens everywhere
          ⁃ WAP has been driven out by HTTP & HTML
          ⁃ we rarely use FTP these days
  • 18. but HTTP is not yet used everywhere
      ⁃ web browser
          ⁃ HTTP/1 is used now, transitioning to HTTP/2
      ⁃ SOA / microservices
          ⁃ HTTP/1 is used now
              ⁃ harder to transition to HTTP/2 since many programming languages use blocking I/O
          ⁃ other protocols coexist: RDBMS, memcached, …
              ⁃ are they the next target of HTTP (like FastCGI)?
      ⁃ IoT
          ⁃ MQTT is emerging
  • 19. So I decided to write H2O
      ⁃ in July 2014
      ⁃ the life of developers becomes easier if all the services use HTTP
      ⁃ but for that purpose, it seems like we need to raise the bar (of performance)
          ⁃ or other protocols may emerge / continue to be used
      ⁃ now (at the time of the transition to HTTP/2) might be a good moment to start a performance race between HTTP implementers
  • 20. Writing a fast server
  • 21. Two things to be aware of
      ⁃ characteristics of a fast program
          1. executes fewer instructions
              ⁃ speed is a result of simplicity, not complexity
          2. causes fewer pipeline hazards
              ⁃ minimum number of conditional branches / indirect calls
              ⁃ use branch-predictor-friendly logic
                  ⁃ e.g. a conditional branch exists, but it is taken 95% of the time
  • 22. H2O - design principles
      ⁃ do it right
          ⁃ local bottlenecks can be fixed afterwards
          ⁃ large-scale design issues are hard to notice / fix
      ⁃ do it simple
          ⁃ as explained
          ⁃ provide / use hooks only at high-level
              ⁃ hooks exist for: protocol, generator, filter, logger
  • 23. The performance pitfalls
      ⁃ many server implementations spend CPU cycles in the following areas:
          ⁃ memory allocation
          ⁃ parsing input
          ⁃ stringifying output and logs
          ⁃ timeout handling
  • 24. Memory allocation
  • 25. Memory allocation in H2O
      ⁃ uses region-based memory management
          ⁃ cf. the memory pool of Apache
      ⁃ strategy:
          ⁃ a memory block is assigned to the Request object
          ⁃ small allocations return portions of the block
          ⁃ memory is never returned to the block
          ⁃ the entire memory block gets freed when the Request object is destroyed
  • 26. Memory allocation in H2O (cont'd)
      ⁃ malloc (of small chunks)
          void *h2o_mempool_alloc(h2o_mempool_t *pool, size_t sz)
          {
              (snip)
              void *ret = pool->chunks->bytes + pool->chunks->offset;
              pool->chunks->offset += sz;
              return ret;
          }
      ⁃ free
          ⁃ no code (as explained)
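The strategy on the two slides above can be sketched as a small bump allocator. This is a minimal illustration only: `pool_t`, `pool_chunk_t`, `POOL_CHUNK_SIZE` and the three functions are hypothetical names, not the layout of the real `h2o_mempool_t`, and allocations larger than one chunk are simply out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of a region allocator; not the actual h2o_mempool_t. */
#define POOL_CHUNK_SIZE 4096

typedef struct pool_chunk {
    struct pool_chunk *next; /* previously filled chunk, if any */
    size_t offset;           /* number of bytes already handed out */
    char bytes[POOL_CHUNK_SIZE];
} pool_chunk_t;

typedef struct {
    pool_chunk_t *chunks; /* most recent chunk first */
} pool_t;

static void pool_init(pool_t *pool)
{
    pool->chunks = NULL;
}

/* Hands out a portion of the current chunk; there is no per-allocation free. */
static void *pool_alloc(pool_t *pool, size_t sz)
{
    sz = (sz + 15) & ~(size_t)15;  /* round the size up to a multiple of 16 */
    assert(sz <= POOL_CHUNK_SIZE); /* large allocations are not handled in this sketch */
    if (pool->chunks == NULL || POOL_CHUNK_SIZE - pool->chunks->offset < sz) {
        pool_chunk_t *c = malloc(sizeof(*c));
        if (c == NULL)
            return NULL;
        c->next = pool->chunks;
        c->offset = 0;
        pool->chunks = c;
    }
    void *ret = pool->chunks->bytes + pool->chunks->offset;
    pool->chunks->offset += sz;
    return ret;
}

/* Releases every chunk at once, e.g. when the owning request is destroyed. */
static void pool_clear(pool_t *pool)
{
    while (pool->chunks != NULL) {
        pool_chunk_t *next = pool->chunks->next;
        free(pool->chunks);
        pool->chunks = next;
    }
}
```

The fast path of `pool_alloc` is a comparison, an addition and a store, and the cost of freeing is paid once per request rather than once per allocation.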
  • 27. Parsing input
  • 28. Parsing input
      ⁃ the HTTP/1 request parser may or may not be a bottleneck, depending on its performance
          ⁃ if the parser is capable of handling 1M reqs/sec, then it accounts for 10% of the time when the server handles 100K reqs/sec.
      [Chart: HTTP/1 Parser Performance Comparison (requests / second): http-parser@5fd51fd 329,033; picohttpparser@56975cd 3,162,745]
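The 10% figure follows from dividing the server's request rate by the parser's stand-alone rate. Applying the same arithmetic to the two chart values is only a rough estimate, since it assumes the benchmark requests resemble what the server actually receives and that the server runs at 100K reqs/sec:

```latex
\[
\text{parser share} \approx \frac{\text{server rate}}{\text{parser rate}}
  = \frac{100{,}000}{1{,}000{,}000} = 10\%
\]
\[
\frac{100{,}000}{329{,}033} \approx 30\% \ \text{(http-parser)}, \qquad
\frac{100{,}000}{3{,}162{,}745} \approx 3.2\% \ \text{(picohttpparser)}
\]
```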
  • 29. Parsing input (cont'd)
      ⁃ it's good to know the logical upper bound
          ⁃ or we might try to optimize something that cannot be made any faster
      ⁃ Q. How fast could a text parser be?
  • 30. Q. How fast could a text parser be?
      Answer: around 1GB/sec. is a good target
          ⁃ since any parser needs to read every byte and execute a conditional branch depending on the value
              ⁃ # of instructions: 1 load + 1 inc + 1 test + 1 conditional branch
              ⁃ would likely take several CPU cycles (even if superscalar)
              ⁃ unless we use SIMD instructions
  • 31. Parsing input
      ⁃ What's wrong with this parser?
          for (; s != end; ++s) {
              int ch = *s;
              switch (ctx.state) {
              case AAA:
                  if (ch == ' ')
                      ctx.state = BBB;
                  break;
              case BBB:
                  ...
              }
  • 32. Parsing input (cont'd)
      ⁃ never write a character-level state machine if performance matters
          for (; s != end; ++s) {
              int ch = *s;
              switch (ctx.state) { // <- executed for every char
              case AAA:
                  if (ch == ' ')
                      ctx.state = BBB;
                  break;
              case BBB:
                  ...
              }
  • 33. Parsing input fast
      ⁃ each state should consume a sequence of bytes
          while (s != end) {
              switch (ctx.state) {
              case AAA:
                  do {
                      if (*s++ == ' ') {
                          ctx.state = BBB;
                          break;
                      }
                  } while (s != end);
                  break;
              case BBB:
                  ...
  • 34. Stateless parsing
      ⁃ stateless in the sense that no state value exists
          ⁃ stateless parsers are generally faster than stateful parsers, since they do not have a state: a variable used for a conditional branch
      ⁃ HTTP/1 parsing can be stateless since the request-line and the headers arrive in a single packet (in most cases)
          ⁃ and even if they did not, it is easy to check if the end-of-headers has arrived (by looking for CR-LF-CR-LF) and then parse the input
              ⁃ this countermeasure is essential to handle the Slowloris attack
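The end-of-headers check mentioned above can be sketched as a plain scan for CR-LF-CR-LF that runs before any parsing is attempted. `find_end_of_headers` is a hypothetical helper written for this illustration, not a function taken from picohttpparser, and it rescans the buffer from the start on every call, which a production server would want to avoid.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: returns the offset just past the first CR-LF-CR-LF
 * in buf[0..len), or -1 if the header block is still incomplete.
 */
static long find_end_of_headers(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; ++i) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

A caller would keep appending received bytes to its buffer and only invoke the stateless parser once this returns a non-negative offset, so a client that trickles in one byte at a time never triggers repeated full parses.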
  • 35. picohttpparser is stateless
      ⁃ states are the execution contexts (instead of being a variable)
          const char* parse_request(const char* buf, const char* buf_end, …)
          {
              /* parse request line */
              ADVANCE_TOKEN(*method, *method_len);
              ++buf;
              ADVANCE_TOKEN(*path, *path_len);
              ++buf;
              if ((buf = parse_http_version(buf, buf_end, minor_version, ret)) == NULL)
                  return NULL;
              EXPECT_CHAR('\015');
              EXPECT_CHAR('\012');
              return parse_headers(buf, buf_end, headers, num_headers, max_headers, …);
          }
  • 36. loop exists within a function (≒state)
      ⁃ the code looks for the end of the header value
          #define IS_PRINTABLE(c) ((unsigned char)(c) - 040u < 0137u)

          static const char* get_token_to_eol(const char* buf, const char* buf_end, …
          {
              while (likely(buf_end - buf >= 8)) {
          #define DOIT() if (unlikely(! IS_PRINTABLE(*buf))) goto NonPrintable; ++buf
                  DOIT(); DOIT(); DOIT(); DOIT();
                  DOIT(); DOIT(); DOIT(); DOIT();
          #undef DOIT
                  continue;
              NonPrintable:
                  if ((likely((uchar)*buf < '\040') && likely(*buf != '\011'))
                      || unlikely(*buf == '\177'))
                      goto FOUND_CTL;
              }
  • 37. The hottest loop of picohttpparser (cont'd)
      ⁃ after compilation, uses 4 instructions per char
          movzbl (%r9), %r11d
          movl %r11d, %eax
          addl $-32, %eax
          cmpl $94, %eax
          ja LBB5_5
          movzbl 1(%r9), %r11d  // load char
          leal -32(%r11), %eax  // subtract
          cmpl $94, %eax        // and check if is printable
          ja LBB5_4             // if not, break
          movzbl 2(%r9), %r11d  // load next char
          leal -32(%r11), %eax  // subtract
          cmpl $94, %eax        // and check if is printable
          ja LBB5_15            // if not, break
          movzbl 3(%r9), %r11d  // load next char
          …
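The subtract-and-compare pair in the listing comes from the range check in `IS_PRINTABLE`: subtracting 040 (32) as an unsigned value makes every byte below the space character wrap around to a huge number, so one unsigned comparison against 0137 (95) covers both ends of the printable range. The sketch below puts that check next to the obvious two-comparison form; the two function names are made up for this illustration, and a modern compiler may well perform the same transformation on the naive version by itself.

```c
/* Two comparisons: is c within 0x20..0x7e (printable ASCII)? */
static int is_printable_naive(int c)
{
    return c >= 0x20 && c <= 0x7e;
}

/* One subtraction and one unsigned comparison, as in the IS_PRINTABLE macro. */
static int is_printable_fast(int c)
{
    return (unsigned char)c - 040u < 0137u;
}
```

Both functions agree for every byte value, which is what makes the single-branch form a drop-in replacement in the hot loop.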
  • 38. strlen vs. picohttpparser
      ⁃ not as fast as strlen, but close
          size_t strlen(const char *s) {
              const char *p = s;
              for (; *p != '\0'; ++p)
                  ;
              return p - s;
          }
      ⁃ not much room left for further optimization (w/o using SIMD insns.)
      [Chart: strlen vs. picohttpparser (bytes / clock): strlen (simple) vs. picohttpparser@56975cd; axis 0.00 to 0.90]
  • 39. picohttpparser is small and simple
          $ wc picohttpparser.?
               376    1376   10900 picohttpparser.c
                62     333    2225 picohttpparser.h
               438    1709   13125 total
          $
      ⁃ good example of the do-it-simple-for-speed approach
          ⁃ H2O (incl. the HTTP/2 parser) is designed using the approach
  • 40. Stringification
  • 41. Stringification
      ⁃ HTTP/1 responses are in strings
          sprintf(buf, "HTTP/1.%d %d %s\r\n", …)
      ⁃ s(n)printf is known to be slow
          ⁃ but the interface is great
          ⁃ it's tiresome to write like:
              p = strappend_s(p, "HTTP/1.");
              p = strappend_n(p, minor_version);
              *p++ = ' ';
              p = strappend_n(p, status);
              *p++ = ' ';
              p = strappend_s(p, reason);
              p = strappend_s(p, "\r\n");
  • 42. Stringification (cont'd)
      ⁃ stringification is important for HTTP/2 servers too
          ⁃ many elements still need to be stringified
              ⁃ headers (status, date, last-modified, etag, …)
              ⁃ access log (IP address, date, # of bytes, …)
  • 43. Why is s(n)printf slow?
      ⁃ it's a state machine
          ⁃ interprets the format string (e.g. "hello: %s") at runtime
      ⁃ it uses the locale
          ⁃ not for all types of variables, but…
      ⁃ it uses varargs
      ⁃ it's complicated
          ⁃ sprintf may parse a number when used for stringifying a number
              sprintf(buf, "%11d", status)
  • 44. How should we optimize s(n)printf?
      ⁃ by compiling the format string at compile-time
          ⁃ instead of interpreting it at runtime
          ⁃ possible since the supplied format string is almost always a string literal
      ⁃ and that's qrintf
  • 45. qrintf
      ⁃ qrintf is a preprocessor that rewrites s(n)printf invocations into a set of function calls specialized to each format string
      ⁃ qrintf-gcc is a wrapper of GCC that
          ⁃ first applies the GCC preprocessor
          ⁃ then applies the qrintf preprocessor
          ⁃ then calls the GCC compiler
      ⁃ a similar wrapper could be implemented for Clang
          ⁃ but it's a bit harder
          ⁃ help wanted!
  • 46. Example
          // original code (248 nanoseconds)
          snprintf(buf, sizeof(buf), "%u.%u.%u.%u",
                   (addr >> 24) & 0xff, (addr >> 16) & 0xff, (addr >> 8) & 0xff, addr & 0xff);

          // after preprocessed by qrintf (21.5 nanoseconds)
          _qrintf_chk_finalize(
              _qrintf_chk_u(_qrintf_chk_c(
                  _qrintf_chk_u(_qrintf_chk_c(
                      _qrintf_chk_u(_qrintf_chk_c(
                          _qrintf_chk_u(
                              _qrintf_chk_init(buf, sizeof(buf)), (addr >> 24) & 0xff),
                          '.'), (addr >> 16) & 0xff),
                      '.'), (addr >> 8) & 0xff),
                  '.'), addr & 0xff));
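To make the idea concrete without guessing at qrintf's internals, here is a hand-written equivalent of what "specializing to the format string" means for "%u.%u.%u.%u". The helpers `append_uint` and `format_ipv4` are hypothetical names invented for this sketch; they are not the `_qrintf_chk_*` functions, and the buffer check is reduced to a single assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes v in decimal at p and returns the new end. */
static char *append_uint(char *p, uint32_t v)
{
    char tmp[10]; /* a 32-bit value has at most 10 decimal digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    while (n != 0)
        *p++ = tmp[--n];
    return p;
}

/*
 * Hypothetical specialization of snprintf(buf, cap, "%u.%u.%u.%u", ...):
 * no format string is interpreted at runtime, and no varargs are used.
 * Returns the length of the string written (excluding the terminator).
 */
static size_t format_ipv4(char *buf, size_t cap, uint32_t addr)
{
    assert(cap >= 16); /* "255.255.255.255" plus the terminating NUL */
    char *p = buf;
    p = append_uint(p, (addr >> 24) & 0xff);
    *p++ = '.';
    p = append_uint(p, (addr >> 16) & 0xff);
    *p++ = '.';
    p = append_uint(p, (addr >> 8) & 0xff);
    *p++ = '.';
    p = append_uint(p, addr & 0xff);
    *p = '\0';
    return (size_t)(p - buf);
}
```

Writing this by hand for every format string is exactly the tiresome work that a preprocessor can take over, while the call sites keep the familiar s(n)printf interface.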
  • 47. Performance impact on H2O
      ⁃ 20% performance gain
          ⁃ gcc: 82,900 reqs/sec.
          ⁃ qrintf-gcc: 99,200 reqs/sec.
      ⁃ benchmark condition:
          ⁃ 6-byte file GET over HTTP/1.1
          ⁃ access logging to /dev/null
  • 48. Timeout handling
  • 49. Timeout handling by the event loops
      ⁃ most event loops use balanced trees to handle timeouts
          ⁃ so that timeout events can be triggered fast
          ⁃ the downside is that it takes time to set the timeouts
      ⁃ in case of HTTP, a timeout should be set at least once per request
          ⁃ otherwise the server cannot close a stale connection
  • 50. Timeout requirements of an HTTP server
      ⁃ timeouts are set far more often than they are triggered
          ⁃ a timeout is set more than once per request
          ⁃ most requests succeed before the timeout
      ⁃ the timeout values are uniform
          ⁃ e.g. the request timeout for every connection would be the same (or i/o timeout or whatever)
      ⁃ a balanced tree does not seem like a good approach
          ⁃ any other choice?
  • 51. Use pre-sorted linked lists
      ⁃ H2O maintains a linked list for each timeout configuration
          ⁃ request timeout has its own linked list, i/o timeout has its own, …
      ⁃ how to set the timeout:
          ⁃ the timeout entry is inserted at the end of the linked list
              ⁃ thus the list is naturally sorted
      ⁃ how the timeouts get triggered:
          ⁃ H2O iterates from the start of each linked list, and triggers those that have timed out
  • 52. Comparison Chart
      Operation (frequency in HTTPD)    Balanced tree    List of linked lists
      set (high)                        O(log N)         O(1)
      clear (high)                      O(log N)         O(1)
      trigger (low)                     O(1)             O(M)
      note: N: number of timeout entries, M: number of timeout configurations; trigger performance of the list of linked lists can be reduced to O(1)
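One such per-configuration list can be sketched as a circular doubly-linked list with a sentinel node. All type and function names below are hypothetical and chosen for this illustration; they are not H2O's timeout API. The sketch relies on the assumption stated on the slides: every entry in a list shares the same timeout value and the clock never goes backwards, so appending at the tail keeps the list sorted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of one "timeout configuration" list. */
typedef struct timeout_entry {
    uint64_t expires_at;
    struct timeout_entry *prev, *next;
    void (*cb)(struct timeout_entry *self); /* called on expiry; may be NULL */
} timeout_entry_t;

typedef struct {
    uint64_t timeout;     /* shared by every entry linked into this list */
    timeout_entry_t head; /* sentinel of a circular doubly-linked list */
} timeout_list_t;

static void timeout_list_init(timeout_list_t *l, uint64_t timeout)
{
    l->timeout = timeout;
    l->head.expires_at = 0;
    l->head.cb = NULL;
    l->head.prev = l->head.next = &l->head;
}

/* set: O(1) append at the tail; sorted as long as `now` never decreases */
static void timeout_set(timeout_list_t *l, timeout_entry_t *e, uint64_t now)
{
    e->expires_at = now + l->timeout;
    e->prev = l->head.prev;
    e->next = &l->head;
    l->head.prev->next = e;
    l->head.prev = e;
}

/* clear: O(1) unlink, no search needed */
static void timeout_clear(timeout_entry_t *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->prev = e->next = NULL;
}

/* trigger: fire expired entries from the head; stops at the first live one */
static size_t timeout_run(timeout_list_t *l, uint64_t now)
{
    size_t fired = 0;
    while (l->head.next != &l->head && l->head.next->expires_at <= now) {
        timeout_entry_t *e = l->head.next;
        timeout_clear(e);
        if (e->cb != NULL)
            e->cb(e);
        ++fired;
    }
    return fired;
}
```

With M such lists, the event loop calls `timeout_run` once per list on each tick, which gives the O(M) trigger cost in the table, while set and clear stay O(1).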
  • 53. Miscellaneous
  • 54. Miscellaneous
      ⁃ the entire stack of H2O is carefully designed (for simplicity and for performance)
          ⁃ for example, the built-in event loop of H2O (which is the default for h2o) is faster than libuv
      [Chart: Benchmark: libuv vs. internal (requests / sec.core by size of content: 6 bytes, 4,096 bytes); series: libuv-network-and-file@7876f53, libuv-network-only@da85742, internal (master@a5d1105); axis 0 to 80,000]
  • 55. Writing H2O modules
  • 56. Module types of H2O
      ⁃ handler
          ⁃ generates the contents
              ⁃ e.g. file handler, proxy handler
      ⁃ filter
          ⁃ modifies the content
              ⁃ e.g. chunked encoder, deflate
          ⁃ can be chained
      ⁃ logger
  • 57. Writing a hello world handler
          static int on_req(h2o_handler_t *self, h2o_req_t *req) {
              static h2o_generator_t generator = {};
              static h2o_buf_t body = H2O_STRLIT("hello world\n");
              if (! h2o_memis(req->method.base, req->method.len, H2O_STRLIT("GET")))
                  return -1;
              req->res.status = 200;
              req->res.reason = "OK";
              h2o_add_header(&req->pool, &req->res.headers, H2O_TOKEN_CONTENT_TYPE,
                             H2O_STRLIT("text/plain"));
              h2o_start_response(req, &generator);
              h2o_send(req, &body, 1, 1);
              return 0;
          }

          h2o_handler_t *handler = h2o_create_handler(host_config, sizeof(*handler));
          handler->on_req = on_req;
  • 58. The handler API
          /**
           * called by handlers to set the generator
           * @param req the request
           * @param generator the generator
           */
          void h2o_start_response(h2o_req_t *req, h2o_generator_t *generator);
          /**
           * called by the generators to send output
           * note: generator should close the resources opened by itself after sending the final chunk (i.e. calling the function with is_final set to true)
           * @param req the request
           * @param bufs an array of buffers
           * @param bufcnt length of the buffers array
           * @param is_final if the output is final
           */
          void h2o_send(h2o_req_t *req, h2o_buf_t *bufs, size_t bufcnt, int is_final);
  • 59. The handler API (cont'd)
          /**
           * an object that generates a response.
           * The object is typically constructed by handlers that call h2o_start_response.
           */
          typedef struct st_h2o_generator_t {
              /**
               * called by the core to request new data to be pushed via h2o_send
               */
              void (*proceed)(struct st_h2o_generator_t *self, h2o_req_t *req);
              /**
               * called by the core when there is a need to terminate the response
               */
              void (*stop)(struct st_h2o_generator_t *self, h2o_req_t *req);
          } h2o_generator_t;
  • 60. Module examples
      ⁃ simple examples exist in the examples/ dir
      ⁃ lib/chunked.c is a good example of the filter API
  • 61. Current Status & the Future
  • 62. Development Status
      ⁃ core
          ⁃ mostly feature complete
      ⁃ protocol
          ⁃ http/1 – mostly feature complete
          ⁃ http/2 – interoperable
      ⁃ modules
          ⁃ file – complete
          ⁃ proxy – interoperable
              ⁃ name resolution is blocking
              ⁃ does not support keep-alive
  • 63. HTTP/2 status of H2O
      ⁃ interoperable, but some parts are missing
          ⁃ HPACK resize
          ⁃ priority handling
      ⁃ priority handling is essential for HTTP/2
          ⁃ without it, HTTP/2 is slower than HTTP/1 :(
      ⁃ need to tweak performance
          ⁃ SSL-related code is not yet optimized
              ⁃ the first benchmark was taken last Saturday :)
  • 64. HTTP/2 over TLS benchmark
      [Chart: HTTPS/2 (remote; linux), nghttpd vs. h2o, for 6-byte, 1,024-byte and 10,240-byte responses; axis 0 to 120,000]
      ⁃ need to fix the dropdown, likely caused by:
          ⁃ H2O uses writev to gather data into a single socket op., but OpenSSL does not provide scatter-gather I/O
          ⁃ in H2O, every file handler has its own buffer and pushes content to the protocol layer
              ⁃ nghttpd pulls instead, which is more memory-efficient / no need
  • 65. Goal of the project
      ⁃ to become the best HTTP/2 server
          ⁃ with excellent performance in serving static files / as a reverse proxy
              ⁃ note: picohttpparser and other libraries are also used in the reverse proxy implementation
      ⁃ to become the favored HTTP server library
          ⁃ esp. for server products
          ⁃ to widen the acceptance of the HTTP protocol even more
  • 66. Help wanted
      ⁃ looking for contributors in all areas
          ⁃ addition of modules might be the easiest, since it would not interfere with the development of the core / protocol layer
          ⁃ examples, docs, tests are also welcome
      ⁃ it's easy to start
          ⁃ since the code-base is young and simple
      Subsystem                         wc -l (incl. unit-tests)
      Core                              2,334
      Library                           1,856
      Socket & event loop               1,771
      HTTP/1 (incl. picohttpparser)       886
      HTTP/2                            2,507
      Modules                           1,906
      Server                              573
  • 67. Questions regarding HTTP/2
  • 68. Sorry, I do not have much to talk about
      ⁃ since it is a well-designed protocol
      ⁃ and in terms of performance, apparently binary protocols are easier to implement than a text protocol :)
          ⁃ there's an efficient algorithm for the static Huffman decoder
              ⁃ @tatsuhiro-t implemented it, I copied
      ⁃ OTOH I have some questions re HTTP/2
  • 69. Q. would there be a max-open-files issue?
      ⁃ according to the draft, the recommended value of MAX_CONCURRENT_STREAMS is >= 100
      ⁃ if max-connections is 1024, it would mean that the max fd would be above 10k
          ⁃ on linux, the default (NR_OPEN) is 1,048,576 and is adjustable
          ⁃ but on other OS?
      ⁃ H2O by default limits the number of in-flight requests internally to 16
          ⁃ the value is configurable
  • 70. Q. good way to determine the window size?
      ⁃ the initial window size (64k) might be too small to saturate the available bandwidth depending on the latency
          ⁃ but for responsiveness we would not want the value to be too high
          ⁃ is there any recommendation on how we should tune the variable?
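The concern can be put in numbers with the usual window / round-trip-time bound: a sender cannot have more than one window of unacknowledged data in flight per round trip. The example below takes the 64k initial window as 65,535 bytes; the 100 ms round-trip time and the 100 Mbit/s link are assumptions chosen only for illustration.

```latex
\[
\text{throughput} \le \frac{W}{\mathrm{RTT}}
  = \frac{65{,}535 \times 8\ \text{bit}}{0.1\ \text{s}}
  \approx 5.2\ \text{Mbit/s}
\]
\[
W \ge \text{bandwidth} \times \mathrm{RTT}
  = \frac{100 \times 10^{6}\ \text{bit/s} \times 0.1\ \text{s}}{8}
  = 1.25\ \text{MB}
\]
```

So under these assumed conditions the window would need to be roughly 20 times the initial value to fill the link, which is the tension with responsiveness that the question raises.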
  • 71. Q. should we continue to use CDN?
      ⁃ HTTP/2 has priority control
          ⁃ the CDN and the primary website would use different TCP connections
              ⁃ means that priority control would not work between the CDN and the primary website
      ⁃ would it be better to serve all the asset files from the primary website?
  • 72. Never hide the Server header
      ⁃ name and version info. is essential for interoperability
          ⁃ many (if not all) webapps use the User-Agent value to evade bugs
          ⁃ used to be the same at the HTTP/1 layer in the early days
      ⁃ there will be interoperability problems between HTTP/2 impls.
          ⁃ the Server header is essential for implementing workarounds
      ⁃ some believe that hiding the header improves security
          ⁃ we should say that they are wrong; that security-by-obscurity does not work on the Net, and hiding the value harms interoperability and the adoption of HTTP/2
  • 73. Summary
  • 74. Summary
      ⁃ H2O is an optimized HTTP server implementation
          ⁃ with neat design to support both HTTP/1 and HTTP/2
          ⁃ is still very young
              ⁃ lots of areas to work on!
              ⁃ incl. improving the HTTP/2 support
      ⁃ help wanted! Let's write the HTTPD of the future!