Update 1 (Mar 16, 2011): Apache MPM-Event benchmark added
Update 2 (Mar 16, 2011): Second run of Varnish benchmark added
Update 3 (Mar 16, 2011): Cherokee benchmark added
Update 4 (Mar 25, 2011): New benchmark with the optimized settings is available
Introduction
Apache is the de facto web server on Unix systems. Nginx is nowadays a popular and performant web server for serving static files (e.g. static HTML pages, CSS files, JavaScript files, pictures, …). Varnish Cache, on the other hand, is increasingly used to make websites "fly" by caching static content in memory. Recently, I came across a new application server called G-WAN. I'm only interested here in serving static content, even though G-WAN is also able to serve dynamic content using ANSI C scripting. Finally, I also included Cherokee in the benchmark.
Setup
The following versions of the software are used for this benchmark:
- Apache MPM-worker: 2.2.16-1ubuntu3.1 (64 bit)
- Apache MPM-event: 2.2.16-1ubuntu3.1 (64 bit)
- Nginx: 0.7.67-3ubuntu1 (64 bit)
- Varnish: 2.1.3-7ubuntu0.1 (64 bit)
- G-WAN: 2.1.20 (32 bit)
- Cherokee: 1.2.1-1~maverick~ppa1 (64 bit)
All tests are performed on an ASUS U30JC (Intel Core i3-370M @ 2.4 GHz, 5400 rpm hard drive, 4 GB DDR3-1066 memory) running Ubuntu 10.10 64-bit (kernel 2.6.35).
Benchmark setup
- HTTP Keep-Alives: enabled
- TCP/IP settings: OS default
- Server settings: default
- Concurrency: from 0 to 1’000, step 10
- Requests: 1’000’000
The following file of 100 bytes is used as static content: /var/www/100.html
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
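For reproducibility, such a file can be generated with a one-liner like the following (a bash sketch; any method that produces exactly 100 "X" bytes works just as well):

# bash: write 100 'X' characters (no trailing newline) to the test file
printf 'X%.0s' {1..100} > /var/www/100.html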
Disclaimer
Doing a correct benchmark is clearly not an easy task. There are many pitfalls (TCP/IP stack, OS settings, the client, …) that may skew the results, and there is always the risk of comparing apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself).
In this benchmark, every server is tested with its default settings. The same applies to the OS. Of course, in a production environment, each setting would be optimized; this has been done in a second benchmark. If you have comments, improvements, or ideas, please feel free to contact me. I'm always open to improving and learning new things.
Client
The client (available here: http://gwan.ch/source/ab.c.txt) relies on ApacheBench (ab). The client and the web servers under test run on the same machine.
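The wrapper sweeps the concurrency range and shells out to ab at each step. Conceptually, it does something like the following (a simplified shell sketch, not the actual ab.c logic; the URL is an assumption matching the setup above, and ab cannot run with zero clients, so the sketch starts at 10):

# sweep concurrency from 10 to 1'000 in steps of 10, keep-alives enabled (-k)
for c in $(seq 10 10 1000); do
    ab -k -c "$c" -n 1000000 http://127.0.0.1/100.html
done

How the real client distributes the 1'000'000 requests across the concurrency steps may differ; the flags (-k, -c, -n) are standard ab options.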
Apache (MPM-worker)
Configuration
Relevant part of file /etc/apache2/apache2.conf
StartServers          2
MinSpareThreads      25
MaxSpareThreads      75
ThreadLimit          64
ThreadsPerChild      25
MaxClients          150
MaxRequestsPerChild   0
Benchmark results
The benchmark took 1174 seconds in total.
Apache (MPM-event)
Configuration
Relevant part of file /etc/apache2/apache2.conf
StartServers          2
MaxClients          150
MinSpareThreads      25
MaxSpareThreads      75
ThreadLimit          64
ThreadsPerChild      25
MaxRequestsPerChild   0
Benchmark results
The benchmark took 1904 seconds in total.
Nginx
Configuration
File /etc/nginx/nginx.conf
user www-data;
worker_processes 1;

error_log /var/log/nginx/error.log;
pid /var/run/nginx.pid;

events {
    worker_connections 1024;
    # multi_accept on;
}

http {
    include /etc/nginx/mime.types;
    access_log /var/log/nginx/access.log;

    sendfile on;
    #tcp_nopush on;

    #keepalive_timeout 0;
    keepalive_timeout 65;
    tcp_nodelay on;

    gzip on;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}
File /etc/nginx/sites-enabled/default
server {
    listen 80; ## listen for ipv4
    server_name localhost;
    access_log /var/log/nginx/localhost.access.log;

    location / {
        root /var/www;
        index index.html index.htm;
    }
}
Benchmark results
The benchmark took 1048 seconds in total.
Varnish
Varnish uses Nginx as its backend. However, only one request every 2 minutes actually hits Nginx; all other requests are served directly from Varnish's cache.
Configuration
File /etc/varnish/default.vcl
backend default {
    .host = "127.0.0.1";
    .port = "80";
}
File /etc/default/varnish
START=yes
NFILES=131072
MEMLOCK=82000
INSTANCE=$(uname -n)
DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"
Benchmark results
Run: 1
The benchmark took 1297 seconds in total.
Run: 2
The benchmark took 1313 seconds in total.
As some people requested more details regarding the benchmark of Varnish, here is the output of varnishstat -1:
client_conn            504664       281.31 Client connections accepted
client_drop                 0         0.00 Connection dropped, no sess/wrk
client_req           20245482     11285.11 Client requests received
cache_hit            20245471     11285.10 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss                 11         0.01 Cache misses
backend_conn               11         0.01 Backend conn. success
backend_unhealthy           0         0.00 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail                0         0.00 Backend conn. failures
backend_reuse               0         0.00 Backend conn. reuses
backend_toolate            10         0.01 Backend conn. was closed
backend_recycle            11         0.01 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                  0         0.00 Fetch head
fetch_length                0         0.00 Fetch with Length
fetch_chunked              11         0.01 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                 0         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed                0         0.00 Fetch failed
n_sess_mem               2963          .   N struct sess_mem
n_sess                   1980          .   N struct sess
n_object                    0          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore              393          .   N struct objectcore
n_objecthead              393          .   N struct objecthead
n_smf                       2          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 2          .   N large free smf
n_vbe_conn                  1          .   N struct vbe_conn
n_wrk                     396          .   N worker threads
n_wrk_create              500         0.28 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max              118979        66.32 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow         133755        74.56 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   1          .   N backends
n_expired                  11          .   N expired objects
n_lru_nuked                 0          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved               557          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                  7470         4.16 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite           20215571     11268.43 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                 504664       281.31 Total Sessions
s_req                20245482     11285.11 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                      0         0.00 Total pass
s_fetch                    11         0.01 Total fetch
s_hdrbytes         5913383706   3296200.51 Total header bytes
s_bodybytes         526382532    293412.78 Total body bytes
sess_closed            382711       213.33 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger          20245482     11285.11 Session Linger
sess_herd              124222        69.24 Session herd
shm_records         689986796    384608.02 SHM records
shm_writes           21885539     12199.30 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont               282730       157.60 SHM MTX contention
shm_cycles                200         0.11 SHM cycles through buffer
sm_nreq                    22         0.01 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree           1073741824          .   bytes free
sma_nreq                    0         0.00 SMA allocator requests
sma_nobj                    0          .   SMA outstanding allocations
sma_nbytes                  0          .   SMA outstanding bytes
sma_balloc                  0          .   SMA bytes allocated
sma_bfree                   0          .   SMA bytes free
sms_nreq                    0         0.00 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc                  0          .   SMS bytes allocated
sms_bfree                   0          .   SMS bytes freed
backend_req                11         0.01 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock           20219699     11270.74 HCB Lookups without lock
hcb_lock                    1         0.00 HCB Lookups with lock
hcb_insert                  1         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late            0         0.00 Connection dropped late
uptime                   1794         1.00 Client uptime
G-WAN
Configuration
G-WAN is configured through its file hierarchy. Therefore, unzipping the G-WAN archive is enough to get a fully working server.
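In practice, getting it running amounts to something like this (a sketch; the archive and directory names are illustrative, not the exact distribution file names):

# unpack the archive and start the server from the unpacked directory
unzip gwan.zip
cd gwan
sudo ./gwan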
Benchmark results
The benchmark took 607 seconds in total.
Cherokee
Configuration
Relevant part of file /etc/cherokee/cherokee.conf
# Server
#
server!bind!1!port = 80
server!timeout = 15
server!keepalive = 1
server!keepalive_max_requests = 500
server!server_tokens = full
server!panic_action = /usr/share/cherokee/cherokee-panic
server!pid_file = /var/run/cherokee.pid
server!user = www-data
server!group = www-data

# Default virtual server
#
vserver!1!nick = default
vserver!1!document_root = /var/www
vserver!1!directory_index = index.html
Benchmark results
The benchmark took 1068 seconds in total.
Discussion
Let's now compare the minimum, average, and maximum requests-per-second rates of each server.
Minimum RPS
Average RPS
Maximum RPS
Conclusion
G-WAN is the clear winner of this benchmark, while Nginx and Varnish show similar average performance. It's not a real surprise to see Apache in last position.
- G-WAN serves on average 2.25 times more requests per second than Cherokee, 4.25 to 6.5 times more than Nginx and Varnish, and 9 to 13.5 times more than Apache.
- Nginx / Varnish serve on average 2.1 times more requests per second than Apache.
- Nginx needs 1.73 times as long as G-WAN to serve the same number of requests.
- Varnish needs 2.14 times as long as G-WAN to serve the same number of requests.
- Apache needs 1.93 times as long as G-WAN to serve a similar number of requests (Apache sometimes replied with a 503 error and therefore didn't serve exactly the same number of requests).
Again, keep in mind that this benchmark only compares the servers with their out-of-the-box settings, locally (no network is involved), and therefore the results might be misleading.
Comments
What about Cherokee?
http://www.cherokee-project.com/
I will include it in my next benchmark. Thanks !
(EDIT) done !
Try Apache’s MPM Event. It’s production-ready (read the mailing lists) and closer to Nginx.
Yes, it was already planned. In this first benchmark, I considered only the "out of the box" choices. On Ubuntu, installing Apache2 gives you the MPM-worker version by default. Thanks for your comment.
(EDIT) done !
Are you actually sure there were no (set-)cookies involved ?
Your numbers looked like varnish didn’t cache at all…
(By default, out of the box, varnish won’t cache anything with cookies, since we cannot possibly know what they mean or do)
Poul-Henning
The upstream server (i.e. Nginx) received only 1 request every 120 seconds (which is the default_ttl of Varnish). Moreover, I checked with varnishlog that the requests hit the cache. So I can safely assume that Varnish actually served the requests.
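A quick way to double-check this is to compare the relevant counters before and after a run (a sketch using the counters shown in the varnishstat output above):

varnishstat -1 | egrep 'cache_hit|cache_miss|backend_req'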
As I'm about to do a second series of benchmarks where the servers are optimized, can you please give me some advice or pointers to optimize the configuration of Varnish? Thanks a lot!
This looks plain wrong. I have tested Varnish to great extent — and nginx — and this does not match any reality I'm familiar with. Given the tiny difference between nginx, Apache and Varnish in this result, it's very hard to believe that this isn't a bogus test, and that some elementary mistake hasn't been made in the testing process.
Given the claims made, I’m inclined to ask for output of varnishstat -1, top and netstat -n.
Take a look at http://kristianlyng.wordpress.com/2010/10/23/275k-req/ for comparison. The limit at that point was the bandwidth. This does not at all match what you are presenting.
Hi, thanks for your comment. I’ve redone the Varnish benchmark and provided the output of “varnishstat -1” after the benchmark.
The output of "top" looks like this throughout the benchmark:
PID   USER   PR NI VIRT  RES  SHR  S %CPU %MEM TIME+    COMMAND
13060 nobody 20 0  5365m 113m 80m  S  153  3.0 25:42.53 varnishd
15962 nico   20 0  43320 8648 1668 R   12  0.2  0:00.06 ab
Regarding “netstat -n”:
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 127.0.0.1:47469 127.0.0.1:6081 ESTABLISHED
tcp 318 0 127.0.0.1:47720 127.0.0.1:6081 ESTABLISHED
tcp 0 318 127.0.0.1:6081 127.0.0.1:47666 ESTABLISHED
tcp 0 318 127.0.0.1:6081 127.0.0.1:47694 ESTABLISHED
tcp 318 0 127.0.0.1:47734 127.0.0.1:6081 ESTABLISHED
[…]
The count reported by "netstat -n | grep ESTABLISHED | wc -l" is always at least twice the number of concurrent clients throughout the benchmark.
Keep in mind that I'm testing the default settings of each server locally (without any optimization, such as tuning the thread_pools in the case of Varnish) on a laptop with a relatively slow CPU (Intel Core i3-370M @ 2.4 GHz).
Can you please point out the elementary mistake? I'll fix it and redo all the tests. Thanks again for your help!
I'm certainly not a Varnish expert, but maybe try benchmarking with memory/malloc storage instead of a purely disk-based cache.
Also, you are testing on the same computer the servers are running on? Generally it's best to test from a separate machine, so that your benchmark tools don't constrain the software you are testing.
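For reference, switching Varnish from the file storage backend to malloc is a one-line change to DAEMON_OPTS in /etc/default/varnish (a sketch; the 1G size is arbitrary and should fit in RAM):

DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s malloc,1G"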
I ended up trying to replicate your test on a virtual server we have set up to benchmark Varnish. I created the same 100.html file on our production web site (Varnish is configured to pass through to our normal site for load-time comparison). This virtual server is configured with a 1G malloc store, and I ran the same benchmark script with a slight change to shorten the run:
#define FROM 800
#define TO 1000
#define STEP 10
#define ITER 10
From a geographically distant data center: min:102648 avg:145234 max:170035 Time:218 second(s) [00:03:38]
From the same data center: min:175157 avg:187689 max:194601 Time:232 second(s) [00:03:52]
Based on my results, I think something is happening in your Varnish benchmark that is invalidating the results. Maybe your hard drive is the limiting factor.
Hi, thanks for your input !
Please have a look at my second benchmark, where Varnish is configured with malloc and tmpfs:
Actually, I was also a bit disappointed by the poor performance of Varnish. I've tried many different setups, and I still get the same behavior. However, I cannot exclude that something is biased in my settings.
Hate to be a killjoy, but: http://kristianlyng.wordpress.com/2011/03/16/the-many-pitfalls-of-benchmarking/
Essentially: after testing this myself, I was able to generate pretty much any result I wanted. This test needs a lot of work if it's presented as a comparison. The only way I got results similar to yours was if the test tool was failing due to the different performance patterns.
Thanks for sharing 🙂 I'm always happy to learn new things. So, as you are an expert in benchmarking, could you please come up with a "benchmark suite" (i.e. reproducible test steps) or a recipe to compare those servers in a fair manner? (Again, I'm only interested in serving static content.) It would be interesting for many people to have a proper and "unified" way to benchmark web servers…
Great write-up. Have you had a look at benchmarking another lightweight web server called Monkey HTTP Daemon? -> http://monkey-project.com/
Thanks for posting the varnish stats. From those it’s quite clear that you are running out of threads. Also, since this is a pure memory workload you should use the malloc allocator rather than the file one.
As for testing tools, I'd recommend using httperf over ab. Taking a look at http://www.web-polygraph.org/ might also be useful. It also looks like you're sending a massive number of requests per connection (about 40); if you want something a bit more realistic, you should limit the number of resources fetched over a single connection to about five (which is what we see in live traffic).
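For the thread shortage, varnishd accepts run-time parameters via -p; a hedged sketch of options one might append to DAEMON_OPTS (the values are illustrative, not recommendations):

-p thread_pools=4 -p thread_pool_min=100 -p thread_pool_max=4000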
Hi,
As explained in the setup, I didn’t tune the settings of the different servers. And clearly, Varnish can do much better than that !! This is left for a second benchmark.
So, regarding the tuning of Varnish, apart from the thread pools and the allocator, what else can I optimize?
I will also consider using httperf, as you mentioned.
Thanks a lot for your helpful comment !
It should be a good start, at least. While you can certainly tune it further, that requires a bit of experience and isn't merely adding a go-faster option; it depends on how your test is set up. For the next test, it'd be useful if you ran the settings and the numbers you're getting past the developers of the various projects, as that'll help catch any errors or weird configuration settings that impact performance.
One thing you should probably do is make /var/lib/varnish a tmpfs, since your disk is quite slow and varnish logs a fair bit to the shared memory log.
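That can be done with an fstab entry along these lines (a sketch; the size is an assumption and must be large enough for the shared memory log):

tmpfs /var/lib/varnish tmpfs defaults,size=150M 0 0

Or, for a one-off test:

mount -t tmpfs -o size=150M tmpfs /var/lib/varnish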
Hi, nice benchmark. I have some suggestions:
1. How about showing the response time in a line chart and comparing it too? Then we could compare the response time at every concurrency level.
2. How about measuring CPU usage (kernel and server CPU usage) at every concurrency level? This would be useful for someone who uses a VPS to deploy their web server.
3. How about measuring memory usage at every concurrency level? The motivation is the same as #2.
I'm doing some tests with Funkload, which provides all the interesting metrics you mentioned (response time, CPU/memory/network usage, …). However, Funkload saturates way before the tested server does. So, I need to put in place a distributed setup with several clients.
What about memory usage?
In the past I’ve found Nginx to be much more memory efficient than Varnish.
Or how about a test that runs for 1 hour, so failures can be counted?
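With ab, a fixed-duration run can be approximated with the -t flag (a sketch; note that -t implies an internal request cap, so a large -n is also given to avoid stopping early):

# run for one hour at a fixed concurrency of 100
ab -k -t 3600 -n 100000000 -c 100 http://127.0.0.1/100.html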
Nginx (a Web server) and Varnish (a proxy server) serve different needs, so it is no surprise that their designs differ – and this leads to different choices (Varnish relies heavily on virtual memory because, as a "Web server accelerator", its goal is to store a HUGE cache).
As I understand it, the point of this benchmark was to focus on the respective ability of those different server technologies to serve a small static file.
As small static files (< 100 KB) account for 90% of all the traffic served by today's Web infrastructure, I am thankful that someone had the idea of checking which solution works best for this specific need.
Hi Nicolas,
Great post! Did you test Apache Traffic Server? It's the cache server open-sourced by Yahoo…
Hi,
Great idea ! Thanks, I didn't think about that one. I will also probably include Lighttpd in the next series, as it seems to be even faster than Nginx.
Pingback: Serving small static files: which server to use ? « Spoot!
Interesting comparison. Wish list: response times, IIS on Windows 2003, Apache on Mac OS X (this is highly unlikely to be fulfilled using your current hardware).
It would be interesting to test the _tuned_ settings.
In fact, no one is going to run these servers at 35'000 requests per second on the default settings, you know.
People who are just starting their own web server won't need more than 20 req/s at best.
What we're interested in is more the complete picture when the servers are optimized. It's a difficult task of course (in fact, probably more of a process that you fine-tune as you go), but it would be a LOT more interesting.
All fair points, zob, but the fact that certain software is amazing out of the box usually implies that it's good stuff, and when *that* gets tuned it can really perform.
Also, having fast and reliable servers does matter even if your traffic is low because it enhances the user experience.
Pingback: printf(" SaltwaterC "); » Blog Archive » Why I don’t benchmark HTTP static object serving
How about Lighttpd? I'm wondering how well it would perform against the others. It's my personal go-to choice anyway, but I expect it to be slightly slower than Nginx. Actually, simple feedback on people's experience would be fine as well.
Hum, I just found this: http://superjared.com/entry/benching-lighttpd-vs-nginx-static-files/
This is admittedly dated, but it tends to indicate that Lighttpd actually outperforms Nginx for serving static files. It looks like we have a new potential contender, don't we? 🙂
Hi,
Please have a look at my second benchmark where Lighttpd is included:
These results are interesting, but practically useless.
While it is true we don't want to end up "testing the TCP stack", one cannot benchmark a bevy of web servers from the same machine and expect meaningful results.
First and foremost, your benchmarking scripts/tools compete for resources with the web server processes. This skews the results and causes weird "bumps" in the graphs.
Secondly, in practical applications, users will ALWAYS come over the network, so it is essential to test from one or several satellite servers that hit the benchmarked server over the network.
I would like to see this benchmark redone from satellite servers, with a note on the differences from local testing.
It would be great if you included node.js (with connect and static serving + caching) in the benchmark.