Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN

Update 1 (Mar 16, 2011): Apache MPM-Event benchmark added
Update 2 (Mar 16, 2011): Second run of Varnish benchmark added
Update 3 (Mar 16, 2011): Cherokee benchmark added
Update 4 (Mar 25, 2011): New benchmark with the optimized settings is available

Introduction

Apache is the de facto web server on Unix systems. Nginx is nowadays a popular and performant web server for serving static files (i.e. static HTML pages, CSS files, JavaScript files, pictures, …). On the other hand, Varnish Cache is increasingly used to make websites “fly” by caching static content in memory. Recently, I came across a new application server called G-WAN. Here I’m only interested in serving static content, even though G-WAN can also serve dynamic content using ANSI C scripting. Finally, I also included Cherokee in the benchmark.

Setup

The following versions of the software were used for this benchmark:

  • Apache MPM-worker: 2.2.16-1ubuntu3.1 (64 bit)
  • Apache MPM-event: 2.2.16-1ubuntu3.1 (64 bit)
  • Nginx: 0.7.67-3ubuntu1 (64 bit)
  • Varnish:  2.1.3-7ubuntu0.1 (64 bit)
  • G-WAN: 2.1.20 (32 bit)
  • Cherokee: 1.2.1-1~maverick~ppa1 (64 bit)

All tests were performed on an ASUS U30JC (Intel Core i3 – 370M @ 2.4 GHz, 5400 rpm hard drive, 4 GB DDR3 1066 MHz memory) running Ubuntu 10.10 64 bit (kernel 2.6.35).

Benchmark setup

  • HTTP Keep-Alives: enabled
  • TCP/IP settings: OS default
  • Server settings: default
  • Concurrency: from 0 to 1’000, step 10
  • Requests: 1’000’000
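The sweep described by these settings can be sketched as follows. This is only an illustration, not the actual client (the C wrapper around ab linked in the Client section): the helper name build_ab_commands, the URL and the choice to treat a concurrency of 0 as 1 are assumptions. The -n, -c and -k options are the standard ApacheBench options for the number of requests, the concurrency level and HTTP keep-alive.

```python
def build_ab_commands(url, start=0, stop=1000, step=10, requests=1_000_000):
    """Return one ab command line (as an argument list) per concurrency level."""
    commands = []
    for concurrency in range(start, stop + 1, step):
        # Assumption: ab needs at least one client, so 0 is treated as 1.
        clients = max(concurrency, 1)
        commands.append(
            ["ab", "-n", str(requests), "-c", str(clients), "-k", url]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical URL; the benchmark serves /var/www/100.html locally.
    cmds = build_ab_commands("http://127.0.0.1/100.html")
    print(len(cmds), "runs")
    print(" ".join(cmds[0]))
    print(" ".join(cmds[-1]))
```

Each argument list could be passed to subprocess.run to execute the run and collect the requests-per-second figure from the output of ab.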

The following 100-byte file is used as static content: /var/www/100.html

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Disclaimer

Doing a correct benchmark is clearly not an easy task. There are many pitfalls (TCP/IP stack, OS settings, the client, …) that may corrupt the results, and there is always the risk of comparing apples with oranges (e.g. benchmarking the TCP/IP stack instead of the server itself).

In this benchmark, every server is tested using its default settings. The same applies to the OS. Of course, in a production environment, each setting would be optimized. This has been done in a second benchmark. If you have comments, improvements or ideas, please feel free to contact me; I’m always open to improving and to learning new things.

Client

The client (available here: http://gwan.ch/source/ab.c.txt) relies on ApacheBench (ab). The client and the tested web server are hosted on the same computer.

Apache (MPM-worker)

Configuration

Relevant part of file /etc/apache2/apache2.conf


    StartServers          2
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxClients          150
    MaxRequestsPerChild   0

Benchmark results

The benchmark took 1174 seconds in total.

Apache (MPM-event)

Configuration

Relevant part of file /etc/apache2/apache2.conf


    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit          64
    ThreadsPerChild      25
    MaxRequestsPerChild   0

Benchmark results

The benchmark took 1904 seconds in total.

Nginx

Configuration

File /etc/nginx/nginx.conf

user www-data;
worker_processes  1;
error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;
events {
    worker_connections  1024;
    # multi_accept on;
}
http {
    include       /etc/nginx/mime.types;
    access_log  /var/log/nginx/access.log;
    sendfile        on;
    #tcp_nopush     on;
    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;
    gzip  on;
    gzip_disable "MSIE [1-6]\.(?!.*SV1)";
    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

File /etc/nginx/sites-enabled/default

server {
        listen   80; ## listen for ipv4
        server_name  localhost;
        access_log  /var/log/nginx/localhost.access.log;
        location / {
                root   /var/www;
                index  index.html index.htm;
        }
}

Benchmark results

The benchmark took 1048 seconds in total.

Varnish

Varnish uses Nginx as its backend. However, only one request every 2 minutes hits Nginx; the other requests are served directly by Varnish.
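As a rough sanity check, the number of requests reaching Nginx can be estimated from the run time. The sketch below assumes a single cached object under continuous load and a TTL of 120 seconds (the Varnish default_ttl mentioned in the comments); the function name is made up for illustration.

```python
def expected_backend_fetches(duration_s, ttl_s=120):
    """Estimate cache misses for one object refetched each time its TTL expires."""
    # The first request is always a miss; then one more miss per elapsed TTL.
    return duration_s // ttl_s + 1


if __name__ == "__main__":
    for run, seconds in (("run 1", 1297), ("run 2", 1313)):
        print(run, expected_backend_fetches(seconds))
```

Under these assumptions both Varnish runs reported below would cause about 11 backend fetches, which is in line with the cache_miss counter in the varnishstat output.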

Configuration

File /etc/varnish/default.vcl

backend default {
   .host = "127.0.0.1";
   .port = "80";
}

File /etc/default/varnish

START=yes
NFILES=131072
MEMLOCK=82000
INSTANCE=$(uname -n)
DAEMON_OPTS="-a :6081 \
             -T localhost:6082 \
             -f /etc/varnish/default.vcl \
             -S /etc/varnish/secret \
             -s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

Benchmark results

Run: 1
The benchmark took 1297 seconds in total.


Run: 2
The benchmark took 1313 seconds in total.

As some people requested more details regarding the benchmark of Varnish, here is the output of varnishstat -1:

client_conn            504664       281.31 Client connections accepted
client_drop                 0         0.00 Connection dropped, no sess/wrk
client_req           20245482     11285.11 Client requests received
cache_hit            20245471     11285.10 Cache hits
cache_hitpass               0         0.00 Cache hits for pass
cache_miss                 11         0.01 Cache misses
backend_conn               11         0.01 Backend conn. success
backend_unhealthy            0         0.00 Backend conn. not attempted
backend_busy                0         0.00 Backend conn. too many
backend_fail                0         0.00 Backend conn. failures
backend_reuse               0         0.00 Backend conn. reuses
backend_toolate            10         0.01 Backend conn. was closed
backend_recycle            11         0.01 Backend conn. recycles
backend_unused              0         0.00 Backend conn. unused
fetch_head                  0         0.00 Fetch head
fetch_length                0         0.00 Fetch with Length
fetch_chunked              11         0.01 Fetch chunked
fetch_eof                   0         0.00 Fetch EOF
fetch_bad                   0         0.00 Fetch had bad headers
fetch_close                 0         0.00 Fetch wanted close
fetch_oldhttp               0         0.00 Fetch pre HTTP/1.1 closed
fetch_zero                  0         0.00 Fetch zero len
fetch_failed                0         0.00 Fetch failed
n_sess_mem               2963          .   N struct sess_mem
n_sess                   1980          .   N struct sess
n_object                    0          .   N struct object
n_vampireobject             0          .   N unresurrected objects
n_objectcore              393          .   N struct objectcore
n_objecthead              393          .   N struct objecthead
n_smf                       2          .   N struct smf
n_smf_frag                  0          .   N small free smf
n_smf_large                 2          .   N large free smf
n_vbe_conn                  1          .   N struct vbe_conn
n_wrk                     396          .   N worker threads
n_wrk_create              500         0.28 N worker threads created
n_wrk_failed                0         0.00 N worker threads not created
n_wrk_max              118979        66.32 N worker threads limited
n_wrk_queue                 0         0.00 N queued work requests
n_wrk_overflow         133755        74.56 N overflowed work requests
n_wrk_drop                  0         0.00 N dropped work requests
n_backend                   1          .   N backends
n_expired                  11          .   N expired objects
n_lru_nuked                 0          .   N LRU nuked objects
n_lru_saved                 0          .   N LRU saved objects
n_lru_moved               557          .   N LRU moved objects
n_deathrow                  0          .   N objects on deathrow
losthdr                  7470         4.16 HTTP header overflows
n_objsendfile               0         0.00 Objects sent with sendfile
n_objwrite           20215571     11268.43 Objects sent with write
n_objoverflow               0         0.00 Objects overflowing workspace
s_sess                 504664       281.31 Total Sessions
s_req                20245482     11285.11 Total Requests
s_pipe                      0         0.00 Total pipe
s_pass                      0         0.00 Total pass
s_fetch                    11         0.01 Total fetch
s_hdrbytes         5913383706   3296200.51 Total header bytes
s_bodybytes         526382532    293412.78 Total body bytes
sess_closed            382711       213.33 Session Closed
sess_pipeline               0         0.00 Session Pipeline
sess_readahead              0         0.00 Session Read Ahead
sess_linger          20245482     11285.11 Session Linger
sess_herd              124222        69.24 Session herd
shm_records         689986796    384608.02 SHM records
shm_writes           21885539     12199.30 SHM writes
shm_flushes                 0         0.00 SHM flushes due to overflow
shm_cont               282730       157.60 SHM MTX contention
shm_cycles                200         0.11 SHM cycles through buffer
sm_nreq                    22         0.01 allocator requests
sm_nobj                     0          .   outstanding allocations
sm_balloc                   0          .   bytes allocated
sm_bfree           1073741824          .   bytes free
sma_nreq                    0         0.00 SMA allocator requests
sma_nobj                    0          .   SMA outstanding allocations
sma_nbytes                  0          .   SMA outstanding bytes
sma_balloc                  0          .   SMA bytes allocated
sma_bfree                   0          .   SMA bytes free
sms_nreq                    0         0.00 SMS allocator requests
sms_nobj                    0          .   SMS outstanding allocations
sms_nbytes                  0          .   SMS outstanding bytes
sms_balloc                  0          .   SMS bytes allocated
sms_bfree                   0          .   SMS bytes freed
backend_req                11         0.01 Backend requests made
n_vcl                       1         0.00 N vcl total
n_vcl_avail                 1         0.00 N vcl available
n_vcl_discard               0         0.00 N vcl discarded
n_purge                     1          .   N total active purges
n_purge_add                 1         0.00 N new purges added
n_purge_retire              0         0.00 N old purges deleted
n_purge_obj_test            0         0.00 N objects tested
n_purge_re_test             0         0.00 N regexps tested against
n_purge_dups                0         0.00 N duplicate purges removed
hcb_nolock           20219699     11270.74 HCB Lookups without lock
hcb_lock                    1         0.00 HCB Lookups with lock
hcb_insert                  1         0.00 HCB Inserts
esi_parse                   0         0.00 Objects ESI parsed (unlock)
esi_errors                  0         0.00 ESI parse errors (unlock)
accept_fail                 0         0.00 Accept failures
client_drop_late            0         0.00 Connection dropped late
uptime                   1794         1.00 Client uptime
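A few derived figures make this output easier to read. The following is a minimal sketch, not part of the original benchmark: the function name parse_varnishstat is made up, and SAMPLE simply repeats a handful of the lines above.

```python
def parse_varnishstat(text):
    """Parse `varnishstat -1` output into a {counter_name: int_value} dict."""
    counters = {}
    for line in text.splitlines():
        fields = line.split(None, 3)  # name, value, rate, description
        if len(fields) >= 2 and fields[1].isdigit():
            counters[fields[0]] = int(fields[1])
    return counters


SAMPLE = """\
client_conn            504664       281.31 Client connections accepted
client_req           20245482     11285.11 Client requests received
cache_hit            20245471     11285.10 Cache hits
cache_miss                 11         0.01 Cache misses
n_wrk_max              118979        66.32 N worker threads limited
n_wrk_overflow         133755        74.56 N overflowed work requests
"""

if __name__ == "__main__":
    stats = parse_varnishstat(SAMPLE)
    hit_ratio = stats["cache_hit"] / stats["client_req"]
    req_per_conn = stats["client_req"] / stats["client_conn"]
    print(f"hit ratio: {hit_ratio:.6f}")
    print(f"requests per connection: {req_per_conn:.1f}")
    print("worker thread limit reached:", stats["n_wrk_max"] > 0)
```

Virtually every request is a cache hit, each connection carries about 40 requests, and the non-zero n_wrk_max and n_wrk_overflow counters suggest that the default worker thread limit was reached during the run.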

G-WAN

Configuration

The configuration of G-WAN is done through the file hierarchy. Therefore, unzipping the G-WAN archive was enough to have a fully working server.

Benchmark results

The benchmark took 607 seconds in total.

Cherokee

Configuration

Relevant part of file /etc/cherokee/cherokee.conf

# Server
#
server!bind!1!port = 80
server!timeout = 15
server!keepalive = 1
server!keepalive_max_requests = 500
server!server_tokens = full
server!panic_action = /usr/share/cherokee/cherokee-panic
server!pid_file = /var/run/cherokee.pid
server!user = www-data
server!group = www-data

# Default virtual server
#
vserver!1!nick = default
vserver!1!document_root = /var/www
vserver!1!directory_index = index.html

Benchmark results

The benchmark took 1068 seconds in total.

Discussion

Let’s now compare the minimum, the average and the maximum requests per second rate of each server.

Minimum RPS

Average RPS

Maximum RPS

Conclusion

G-WAN is the clear winner of this benchmark, while Nginx and Varnish have similar average performance. It’s not a real surprise to see Apache in last position.

  • On average, G-WAN serves 2.25 times more requests per second than Cherokee, 4.25 to 6.5 times more than Nginx and Varnish, and 9 to 13.5 times more than Apache.
  • On average, Nginx / Varnish serve 2.1 times more requests per second than Apache.
  • Nginx needs 1.73 times as long as G-WAN to serve the same number of requests.
  • Varnish needs 2.14 times as long as G-WAN to serve the same number of requests.
  • Apache (MPM-worker) needs 1.93 times as long as G-WAN to serve a similar number of requests (Apache sometimes replied with an error 503, so it didn’t serve exactly the same number of requests).
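The time ratios in this list appear to follow directly from the total run times reported in each section. The sketch below recomputes them; the dictionary and function names are chosen for illustration only.

```python
# Total run times in seconds, as reported in the "Benchmark results" sections.
TOTAL_SECONDS = {
    "Apache (MPM-worker)": 1174,
    "Apache (MPM-event)": 1904,
    "Nginx": 1048,
    "Varnish (run 1)": 1297,
    "Varnish (run 2)": 1313,
    "G-WAN": 607,
    "Cherokee": 1068,
}


def time_ratios(totals, baseline="G-WAN"):
    """Return each server's total time divided by the baseline's total time."""
    base = totals[baseline]
    return {name: round(seconds / base, 2) for name, seconds in totals.items()}


if __name__ == "__main__":
    for name, ratio in time_ratios(TOTAL_SECONDS).items():
        print(f"{name}: {ratio}x the time of G-WAN")
```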

Again, keep in mind that this benchmark only compares the servers locally with their out-of-the-box settings (no networking is involved), and therefore the results might be misleading.

34 thoughts on “Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN”

  1. Try Apache’s MPM Event. It’s production-ready (read the mailing lists) and closer to Nginx.

    • Yes, it was already planned. In this first benchmark, I considered only the “out of the box” choices. On Ubuntu, if you install Apache2, it installs the MPM worker version by default. Thanks for your comment.

      (EDIT) done !

  2. Are you actually sure there were no (set-)cookies involved ?

    Your numbers looked like varnish didn’t cache at all…

    (By default, out of the box, varnish won’t cache anything with cookies, since we cannot possibly know what they mean or do)

    Poul-Henning

    • The upstream server (i.e Nginx) received only 1 request every 120 seconds (which is the default_ttl of Varnish). Moreover, I checked with varnishlog that the requests hit the cache. So, I can safely assume that Varnish actually served the requests.

      As I’m about to do a second series of benchmarks where the servers are optimized, can you please give me some advice or pointers to optimize the configuration of Varnish ? Thanks a lot !

  3. This looks plain wrong. I have tested Varnish to great extent — and nginx — and this does not match any reality I’m familiar with. Given the tiny difference between nginx, Apache and Varnish in that result, it’s very hard to believe that this isn’t a bogus test and that some elementary mistake in the testing process has been made.

    Given the claims made, I’m inclined to ask for output of varnishstat -1, top and netstat -n.

    Take a look at http://kristianlyng.wordpress.com/2010/10/23/275k-req/ for comparison. The limit at that point was the bandwidth. This does not at all match what you are presenting.

    • Hi, thanks for your comment. I’ve redone the Varnish benchmark and provided the output of “varnishstat -1” after the benchmark.

      The output of “top” looks like this throughout the benchmark:
      13060 nobody 20 0 5365m 113m 80m S 153 3.0 25:42.53 varnishd
      15962 nico 20 0 43320 8648 1668 R 12 0.2 0:00.06 ab

      Regarding “netstat -n”:
      Active Internet connections (w/o servers)
      Proto Recv-Q Send-Q Local Address Foreign Address State
      tcp 0 0 127.0.0.1:47469 127.0.0.1:6081 ESTABLISHED
      tcp 318 0 127.0.0.1:47720 127.0.0.1:6081 ESTABLISHED
      tcp 0 318 127.0.0.1:6081 127.0.0.1:47666 ESTABLISHED
      tcp 0 318 127.0.0.1:6081 127.0.0.1:47694 ESTABLISHED
      tcp 318 0 127.0.0.1:47734 127.0.0.1:6081 ESTABLISHED
      […]
      The output of “netstat -n | grep ESTABLISHED | wc -l” is always at least 2 times greater than the number of concurrent clients throughout the benchmark.

      Keep in mind that I’m testing the default settings of each server locally (without any optimization, such as tuning the thread_pools in the case of Varnish) on a laptop with a relatively slow CPU (Intel Core i3 – 370M @ 2.4 GHz).

      Can you please point out the elementary mistake ? I’ll fix it, and redo all the tests. Thanks again for your help !

      • I’m certainly not a varnish expert but maybe try benchmarking with memory/malloc caches instead of purely disk based caches.

        Also you are testing on the same computer as the servers are running? Generally it’s best to test on a separate machine so that your benchmarks are not restricting the software you are testing.

          • I ended up trying to replicate your test on a virtual server we have set up to benchmark Varnish. I created the same 100.html file on our production web site (Varnish is configured to pass through to our normal site for load time comparison). This virtual server is configured with a 1G malloc store and I ran the same benchmark script with a slight change to shorten the run:
          #define FROM 800
          #define TO 1000
          #define STEP 10
          #define ITER 10 /
          From a geographically distant data center: min:102648 avg:145234 max:170035 Time:218 second(s) [00:03:38]
          From the same data center: min:175157 avg:187689 max:194601 Time:232 second(s) [00:03:52]

          Based on my results I am thinking something is happening on your Varnish benchmark that is invalidating the results. Maybe your hard drive is the limiting factor.

          • Hi, thanks for your input !

            Please have a look at my second benchmark, where Varnish is configured with malloc and tmpfs:

            Serving small static files: which server to use ?

            Actually, I was also a bit disappointed with the poor performance of Varnish. I’ve tried many different setups, and I still get the same behavior. However, I cannot exclude that something is biased in my settings.

    • Thanks for sharing 🙂 I’m always happy to learn new things. So, as you are an expert in benchmarking, could you please come up with a “benchmark suite” (i.e. reproducible test steps) or a recipe to compare those servers in a fair manner (again, I’m only interested in serving static content) ? It would be interesting for many people to have a proper and “unified” way to benchmark web servers …

  4. Thanks for posting the varnish stats. From those it’s quite clear that you are running out of threads. Also, since this is a pure memory workload you should use the malloc allocator rather than the file one.

    As for testing tools, I’d recommend using httperf over ab. Taking a look at http://www.web-polygraph.org/ might also be useful. It also looks like you’re sending a massive amount of requests per connection (about 40), if you want something a bit more realistic you should limit the number of resources fetched over a single connection to about five (which is what we’re seeing on live traffic).

    • Hi,

      As explained in the setup, I didn’t tune the settings of the different servers. And clearly, Varnish can do much better than that !! This is left for a second benchmark.
      So, regarding the tuning of Varnish, except the threadpools and the allocator, what else can I optimize ?
      I will also consider using httperf as you mentioned.
      Thanks a lot for your helpful comment !

  5. It should be a good start, at least. While you can certainly tune it further, that requires a bit of experience and isn’t merely adding a go-faster option, it depends on how your test is set up. For the next test, it’d be useful if you ran the settings and the numbers you’re getting past the developers of the various projects, as that’ll help pick up any errors or weird configuration settings that impact performance.

    One thing you should probably do is make /var/lib/varnish a tmpfs, since your disk is quite slow and varnish logs a fair bit to the shared memory log.

  6. Hi, nice benchmark. I have some suggestions, they are:
    1. How about showing the response time in a line chart and comparing it too? That way we can compare the response time at every concurrency level.
    2. How about measuring CPU usage (kernel CPU usage and server CPU usage) at every concurrency level? This would be useful for people who deploy their web server on a VPS.
    3. How about measuring memory usage at every concurrency level? The motivation is the same as for #2.

    • I’m doing some tests with Funkload, which provides all the interesting metrics you mentioned (response time, CPU/memory/network usage, …). However, Funkload saturates way before the tested server. So, I need to put in place a distributed setup with several clients.

  7. What about memory usage?

    In the past I’ve found Nginx to be much more memory efficient than Varnish.

    Or how about a test that runs for 1 hour to count failures?

    • Nginx (a Web server) and Varnish (a Proxy server) serve different needs, so there is no surprise that their design is different – and this leads to different choices (Varnish is heavily relying on virtual memory because – as a “Web server accelerator” – it has the goal of storing a HUGE cache).

      As I understand it, the point of this benchmark was to focus on the respective ability of those different server technologies to serve a small static file.

      As small static files (< 100 KB) account for 90% of all the traffic served by today's Web infrastructure, I am thankful that someone had the idea of checking what solution works best for this specific need.

    • Hi,
      Great idea ! Thanks, I didn’t think about this one. I will also probably include Lighttpd in the next series, as it seems to be even faster than Nginx.

  8. Pingback: Serving small static files: which server to use ? « Spoot!

  9. It would be interesting to test the _tuned_ settings
    In fact, no one is going to run them at 35 000 requests per second on the base settings, you know.

    People who just start their own web server won’t need more than 20 req/s at best.

    What we’re interested in, is more the complete picture when servers are optimized. It’s a difficult task of course, in fact, probably more of a process that you fine tune as you go, but it would be a LOT more interesting

    • All fair points zob but the fact that certain software is amazing out-of-the-box usually implies that it’s good stuff and when *that* gets tuned it can really perform.

      Also, having fast and reliable servers does matter even if your traffic is low because it enhances the user experience.

  10. Pingback: printf(" SaltwaterC "); » Blog Archive » Why I don’t benchmark HTTP static object serving

  11. Pingback: Anonymous

  12. How about Lighttpd? I’m wondering how well it would perform against the others. It’s my personal go-to choice anyway but I expect it to be slightly slower than Nginx. Actually, simple feedback on people’s experience would be fine as well.

  13. These results are interesting, but practically useless.

    While it is true we don’t want to end up “testing the TCP stack”, one cannot benchmark a bevy of webservers from the same server and expect meaningful results.

    First and foremost, your benchmarking scripts/tools are competing for resources with said webserver processes. This skews the results and causes weird “bumps” in the graphs.

    Secondly, in practical applications, users will ALWAYS come over the network, so it is completely necessary to test from one or several satellite servers which hit the server being benchmarked over the network.

    I would like to see this benchmark redone from satellite servers and a notation made of the differences between local testing.

  14. It would be great if you included node.js (with connect and static serving + caching) in the benchmark.
