Wednesday, January 19, 2011

How can I optimize nginx? From benchmarking it seems Apache2 is faster for static delivery

On one of my VPS servers I've set up both Apache2 and nginx (nginx on port 8080, Apache2 on port 80) and created a static HTML file.

static HTML/Apache2:

meder@meder-desktop:~$ sudo ab -n 1000 -c 5 http://medero.org/index.html
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking medero.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        Apache/2.2.9
Server Hostname:        medero.org
Server Port:            80

Document Path:          /index.html
Document Length:        1014 bytes

Concurrency Level:      5
Time taken for tests:   6.186 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1334000 bytes
HTML transferred:       1014000 bytes
Requests per second:    161.67 [#/sec] (mean)
Time per request:       30.928 [ms] (mean)
Time per request:       6.186 [ms] (mean, across all concurrent requests)
Transfer rate:          210.61 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       12   15   2.1     14      35
Processing:    12   16   3.4     15      48
Waiting:       12   16   2.8     15      37
Total:         25   31   4.3     29      63

Percentage of the requests served within a certain time (ms)
  50%     29
  66%     30
  75%     31
  80%     32
  90%     35
  95%     39
  98%     47
  99%     51
 100%     63 (longest request)

static HTML/Nginx:

meder@meder-desktop:~$ sudo ab -n 1000 -c 5 http://medero.org:8080/index.html
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking medero.org (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests


Server Software:        nginx/0.6.32
Server Hostname:        medero.org
Server Port:            8080

Document Path:          /index.html
Document Length:        1014 bytes

Concurrency Level:      5
Time taken for tests:   6.424 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      1226000 bytes
HTML transferred:       1014000 bytes
Requests per second:    155.67 [#/sec] (mean)
Time per request:       32.119 [ms] (mean)
Time per request:       6.424 [ms] (mean, across all concurrent requests)
Transfer rate:          186.38 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:       13   15   2.5     14      36
Processing:    12   17  11.0     15     184
Waiting:       11   16   9.6     14     171
Total:         25   32  11.4     30     200

Percentage of the requests served within a certain time (ms)
  50%     30
  66%     31
  75%     33
  80%     33
  90%     35
  95%     38
  98%     45
  99%     50
 100%    200 (longest request)

I've run this numerous times and the results are pretty much the same, with Apache2 taking slightly less time to process the requests than nginx.
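
To put the two runs side by side, here is a minimal Python sketch that pulls the headline figures out of `ab` output. The function name `parse_ab_summary` and the regular expressions are illustrative and not part of any tool; the sample text is trimmed from the two runs above.

```python
import re

# Trimmed from the two ab runs above; only the lines the sketch reads.
APACHE_RUN = """\
Requests per second:    161.67 [#/sec] (mean)
Connect:       12   15   2.1     14      35
Total:         25   31   4.3     29      63
"""

NGINX_RUN = """\
Requests per second:    155.67 [#/sec] (mean)
Connect:       13   15   2.5     14      36
Total:         25   32  11.4     30     200
"""

def parse_ab_summary(text):
    """Return requests/sec plus mean connect and total times (ms) from ab output."""
    rps = float(re.search(r"Requests per second:\s+([\d.]+)", text).group(1))
    # Connection Times rows are: min, mean, sd, median, max.
    connect = re.search(r"^Connect:\s+\d+\s+(\d+)", text, re.MULTILINE)
    total = re.search(r"^Total:\s+\d+\s+(\d+)", text, re.MULTILINE)
    return {
        "rps": rps,
        "connect_mean_ms": int(connect.group(1)),
        "total_mean_ms": int(total.group(1)),
    }

apache = parse_ab_summary(APACHE_RUN)
nginx = parse_ab_summary(NGINX_RUN)

# Relative throughput gap between the two servers, in percent.
gap_pct = (apache["rps"] - nginx["rps"]) / apache["rps"] * 100
# Share of the mean request time spent just opening the TCP connection.
connect_share = apache["connect_mean_ms"] / apache["total_mean_ms"] * 100

print(f"throughput gap: {gap_pct:.1f}%")
print(f"connect share of total (Apache run): {connect_share:.0f}%")
```

The gap works out to under 4%, and roughly half of the mean request time is connection setup. That suggests the numbers mostly reflect network latency between the desktop and the VPS rather than a difference between the two servers.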

Here's the config for nginx:

user www-data;
worker_processes  4;

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
}

http {
    include       /etc/nginx/mime.types;
    default_type  application/octet-stream;

    access_log  /var/log/nginx/access.log;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  on;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}

And Apache 2.2.9-10 (prefork - nonthreaded):

MaxKeepAliveRequests 100
KeepAliveTimeout 15
<IfModule mpm_prefork_module>
    StartServers          5
    MinSpareServers       5
    MaxSpareServers      10
    MaxClients          150
    MaxRequestsPerChild   0
</IfModule>

Loaded modules:

meder@host:~$ sudo apache2ctl -t -D DUMP_MODULES
Loaded Modules:
 core_module (static)
 log_config_module (static)
 logio_module (static)
 mpm_prefork_module (static)
 http_module (static)
 so_module (static)
 alias_module (shared)
 auth_basic_module (shared)
 authn_file_module (shared)
 authz_default_module (shared)
 authz_groupfile_module (shared)
 authz_host_module (shared)
 authz_user_module (shared)
 autoindex_module (shared)
 cgi_module (shared)
 dir_module (shared)
 env_module (shared)
 mime_module (shared)
 negotiation_module (shared)
 php5_module (shared)
 rewrite_module (shared)
 setenvif_module (shared)
 status_module (shared)
 wsgi_module (shared)
Syntax OK

Server details:

Debian Lenny 5.0.3
32-bit Unmanaged VPS
384MB Ram

processor   : 7
vendor_id   : GenuineIntel
cpu family  : 6
model       : 23
model name  : Intel(R) Xeon(R) CPU           E5405  @ 2.00GHz
stepping    : 6
cpu MHz     : 1995.006
cache size  : 6144 KB
physical id : 1
siblings    : 4
core id     : 3
cpu cores   : 4
apicid      : 7
fpu     : yes
fpu_exception   : yes
cpuid level : 10
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
bogomips    : 3990.03
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

Memory info:

cat /proc/meminfo 
MemTotal:       393216 kB
MemFree:        304828 kB
Buffers:             0 kB
Cached:              0 kB
SwapCached:          0 kB
Active:              0 kB
Inactive:            0 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       393216 kB
LowFree:        304828 kB
SwapTotal:           0 kB
SwapFree:            0 kB
Dirty:               0 kB
Writeback:           0 kB
AnonPages:           0 kB
Mapped:          88388 kB
Slab:                0 kB
PageTables:          0 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:         0 kB
Committed_AS:   354892 kB
VmallocTotal:        0 kB
VmallocUsed:         0 kB
VmallocChunk:        0 kB
HugePages_Total:     0
HugePages_Free:      0
HugePages_Rsvd:      0
Hugepagesize:     2048 kB

It appears that mod_deflate isn't even enabled, so I'm not using gzip on Apache2 at all, yet it serves the static HTML faster than nginx. I'm a bit puzzled. Could it be that I just need to reconfigure nginx? Any advice appreciated.

Update #1 - I installed apache2-utils and ran dstat. I also changed the test file, so it's now a 9.7 MB HTML file; Apache2 and nginx are still pretty consistent. Perhaps I need to limit the amount of memory available or something to bottleneck it.

Here is the dstat output while I requested the 9.7 MB file several consecutive times:

sudo dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw 
  0   0 100   0   0   0|   0     0 |   0     0 |   0     0 |   0  7230 
  0   0 100   0   0   0|   0     0 |6071B   20k|   0     0 |   0  5534 
  0   0 100   0   0   0|   0     0 | 720B   21k|   0     0 |   0  4749 
  0   0 100   0   0   0|   0     0 | 822B 4788B|   0     0 |   0  5487 
  0   0 100   0   0   0|   0     0 | 288B  408B|   0     0 |   0  4625 
  0   0 100   0   0   0|   0     0 |5595B 4057B|   0     0 |   0  5966 
  0   0 100   0   0   0|   0     0 | 957B 3710B|   0     0 |   0  4904 
  0   0 100   0   0   0|   0     0 | 986B 5013B|   0     0 |   0  6906 
  0   0 100   0   0   0|   0     0 | 872B 3636B|   0     0 |   0  5614 
  0   0 100   0   0   0|   0     0 |  80B  368B|   0     0 |   0  5506 
  0   0 100   0   0   0|   0     0 |  80B  660B|   0     0 |   0  4883 
  0   0 100   0   0   0|   0     0 |1604B 5105B|   0     0 |   0  5087 
  0   0 100   0   0   0|   0     0 | 860B 3708B|   0     0 |   0    13k
  0   0 100   0   0   0|   0     0 | 909B 3619B|   0     0 |   0    11k
  0   0 100   0   0   0|   0     0 |  16k   44k|   0     0 |   0  5920 
  0   0 100   0   0   0|   0     0 | 132B 3256B|   0     0 |   0  6946 
  0   0 100   0   0   0|   0     0 | 184B 3589B|   0     0 |   0  5083 
  0   0 100   0   0   0|   0     0 | 869B 3637B|   0     0 |   0  5528 
  0   0 100   0   0   0|   0     0 | 917B 3576B|   0     0 |   0  5638 
  0   0 100   0   0   0|   0     0 | 9.8k 2299B|   0     0 |   0  5255 
  0   0 100   0   0   0|   0     0 |6205B   11k|   0     0 |   0  7230 
  0   0 100   0   0   0|   0     0 |1712B   35k|   0     0 |   0  4863 
  0   1  99   0   0   0|   0     0 | 243k   25M|   0     0 |   0  7432 
  0   1  99   0   0   0|   0     0 | 337k   33M|   0     0 |   0  8716 
  0   1  99   0   0   0|   0     0 | 297k   35M|   0     0 |   0  6786 
  0   1  99   0   0   0|   0     0 | 349k   33M|   0     0 |   0  7655 
  0   1  99   0   0   0|   0     0 | 338k   33M|   0     0 |   0  7605 
  0   1  99   0   0   0|   0     0 | 324k   34M|   0     0 |   0  7967 
  0   1  99   0   0   0|   0     0 | 320k   35M|   0     0 |   0  7235 
  0   1  99   0   0   0|   0     0 | 333k   35M|   0     0 |   0  7062 
  0   1  99   0   0   0|   0     0 | 355k   35M|   0     0 |   0  6209 
  0   1  99   0   0   0|   0     0 | 299k   33M|   0     0 |   0  8732 
  0   1  99   0   0   0|   0     0 | 369k   34M|   0     0 |   0  8610 
  0   0 100   0   0   0|   0     0 | 352k   34M|   0     0 |   0  7635 
  0   1  99   0   0   0|   0     0 | 331k   34M|   0     0 |   0  8087 
  0   1  99   0   0   0|   0     0 | 312k   35M|   0     0 |   0  6445 
  0   0 100   0   0   0|   0     0 |  81k 7879k|   0     0 |   0  6131 
  0   0 100   0   0   0|   0     0 |  80B 1848B|   0     0 |   0  5124 
  0   0 100   0   0   0|   0     0 | 120B 6216B|   0     0 |   0  5426 
  0   0 100   0   0   0|   0     0 | 120B 3256B|   0     0 |   0  4947 
  0   0 100   0   0   0|   0     0 |  15k   43k|   0     0 |   0  5632 
  0   0 100   0   0   0|   0     0 | 829B 8504B|   0     0 |   0  5913 
  0   0 100   0   0   0|   0     0 |  92B  384B|   0     0 |   0  8680 
  0   0 100   0   0   0|   0     0 | 926B  571B|   0     0 |   0  4843 
  0   0 100   0   0   0|   0     0 | 795B  675B|   0     0 |   0  5479 
  0   0 100   0   0   0|   0     0 | 280B 2048B|   0     0 |   0  4536 
  0   0 100   0   0   0|   0     0 | 172B 1760B|   0     0 |   0  6334 
  0   0 100   0   0   0|   0     0 | 120B  456B|   0     0 |   0  5710 
  0   0 100   0   0   0|   0     0 |  80B  408B|   0     0 |   0  6225 
  0   0 100   0   0   0|   0     0 | 120B  368B|   0     0 |   0  6639 
  0   0 100   0   0   0|   0     0 | 140B  328B|   0     0 |   0  5507 
  0   0 100   0   0   0|   0     0 |7487B 9697B|   0     0 |   0  7201 
  0   0 100   0   0   0|   0     0 | 920B   37k|   0     0 |   0  6086 
  0   0 100   0   0   0|   0     0 | 320B  536B|   0     0 |   0  5756 
  0   0 100   0   0   0|   0     0 |  40B  384B|   0     0 |   0  7153 
  0   0 100   0   0   0|   0     0 |  80B  368B|   0     0 |   0  5227 
  0   0 100   0   0   0|   0     0 |  80B  408B|   0     0 |   0  6042 
  0   0 100   0   0   0|   0     0 | 160B  368B|   0     0 |   0  6730 
  0   0 100   0   0   0|   0     0 |  80B  280B|   0     0 |   0  5424 
  0   0 100   0   0   0|   0     0 |  80B  336B|   0     0 |   0  8042 
  0   0 100   0   0   0|   0     0 |  40B  384B|   0     0 |   0  5559 
  0   0 100   0   0   0|   0     0 |  80B  280B|   0     0 |   0  6266 
  0   0 100   0   0   0|   0     0 |  80B  296B|   0     0 |   0  6198 
  0   0 100   0   0   0|   0     0 |  80B  456B|   0     0 |   0  6499 
  0   0 100   0   0   0|   0     0 |  80B  368B|   0     0 |   0  7143 
  • On a connection that isn't bandwidth-constrained (as I suspect is the case for your tiny transfers here), gzip-compressed content will be slower to deliver than uncompressed content, because of the extra CPU involved. Compression usually helps because less data has to cross the network, but with a test this small it probably won't. Try comparing apples with apples and see what you get.

    meder : Ok. I turned gzip off on nginx, ran `ab` again and got 7.766s, 6.270s, then 6.5s for those 1000 requests, which is still slower than Apache2's 6.1s (the second test was 6.0s). I did remember to restart nginx, and I'm querying the exact same static HTML content, which is 1014 bytes.
    From womble
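
    One way to see why compression is unlikely to matter at this size: a response of about 1 KB already fits in a single TCP segment, so shrinking it saves no round trips. The sketch below is a rough illustration only; the 1460-byte segment size assumes a typical 1500-byte Ethernet MTU, and the sample HTML is a made-up stand-in for the real file. Note also that both `ab` runs in the question report a Document Length of 1014 bytes, which suggests the nginx responses were not actually compressed for this client.

```python
import gzip
import math

MSS_BYTES = 1460  # assumed TCP payload per segment on a 1500-byte MTU link

def segments_needed(size_bytes, mss=MSS_BYTES):
    """Number of TCP segments needed to carry size_bytes of payload."""
    return max(1, math.ceil(size_bytes / mss))

# Hypothetical page trimmed to 1014 bytes, the size reported by ab.
html = b"<html><body>" + b"<p>hello world</p>" * 60
html = html[:1014]

compressed = gzip.compress(html)

for label, body in (("plain", html), ("gzip", compressed)):
    print(f"{label}: {len(body)} bytes, {segments_needed(len(body))} segment(s)")
```

    Even with headers included, `ab` reports 1334 bytes per Apache2 response and 1226 bytes per nginx response, both still under one segment at the assumed size.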
  • Have you tried optimizing the performance by serving the files from a ramdisk? Some VPSes are notoriously bad for IOwait time, caused by contended access to the disk.

    Try running dstat on the server while the ab process is running, see whether the disks are taking a massive hit.

    meder : I installed `dstat`, edited my original post with some stats while doing the `ab` testing.
  • In order to get realistic results you must have realistic tests. It's entirely plausible that Apache is faster for your test scenario, but are you really serving just a single one-kilobyte file?

    As you're using mpm-prefork, it's safe to say nginx will consume significantly less memory when there are several concurrent transfers. Concurrent transfers pile up easily if you have large files or your clients have slow internet connections. Nginx will win hands down when you have enough concurrent transfers for Apache to eat up all your memory.

    One can argue this is not really an issue as long as there's enough memory for Apache. However, this is not the whole truth. When the HTTP server consumes less memory, more file system content can stay cached, and every disk seek eliminated is a small performance victory.
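
    To make the memory argument concrete, here is a back-of-the-envelope sketch using figures from the question (`MaxClients 150`, 393216 kB of RAM). The per-child size of 20 MB is purely an assumed figure for illustration; the real number depends on the loaded modules and should be measured on the server.

```python
MEM_TOTAL_KB = 393216         # MemTotal from /proc/meminfo in the question
MAX_CLIENTS = 150             # MaxClients from the prefork config in the question
ASSUMED_CHILD_KB = 20 * 1024  # hypothetical resident size of one Apache child

def max_children_in_ram(total_kb, child_kb):
    """How many prefork children fit in RAM, ignoring everything else on the box."""
    return total_kb // child_kb

def ram_needed_kb(children, child_kb):
    """RAM needed if this many children are alive at once."""
    return children * child_kb

fit = max_children_in_ram(MEM_TOTAL_KB, ASSUMED_CHILD_KB)
worst_case_kb = ram_needed_kb(MAX_CLIENTS, ASSUMED_CHILD_KB)

print(f"children that fit in RAM: {fit}")
print(f"RAM needed at MaxClients: {worst_case_kb // 1024} MB")
```

    Under that assumption only 19 children fit, far short of MaxClients, and the meminfo output shows no swap to fall back on. A server that does not need one process per connection avoids that ceiling.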

    meder : My server is already being used as a production server and hosts 4-5 sites, one of which gets ~1000ish unique hits per day, but I guess that isn't really enough because it always has 300MB of RAM available. So what you're saying is Apache2 will always win when there's enough available RAM and fewer concurrent connections?
    womble : Whoa, a whole *1000* hits a day? Look out Twitter!
    meder : Not 1000 hits, 1000 uniques, but probably several thousand *hits*. Of course I know that's nothing compared to the sites with millions/billions of hits per day, but I was just trying to say it isn't *completely* underutilized.
    af : Well, if you get 100k hits a day, every request takes three seconds to serve and they are distributed evenly throughout the day, you'll have 10+X apache processes running with mpm-prefork, where X is your MinSpareServers setting. You'll have plenty of safety margin, so there's not much to optimize.
    From af
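
    The estimate in the last comment is an application of Little's law: average requests in flight = arrival rate × time per request. A quick sketch with the comment's own numbers (100k hits a day, three seconds each, evenly spread) follows; the function name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_in_flight(hits_per_day, seconds_per_request):
    """Little's law: mean concurrent requests = arrival rate * service time."""
    arrival_rate = hits_per_day / SECONDS_PER_DAY  # requests per second
    return arrival_rate * seconds_per_request

busy = average_in_flight(100_000, 3)
print(f"average busy workers: {busy:.2f}")
```

    That comes to about 3.5 busy processes on average, so the 10 in the comment reads as a rounded-up figure with headroom for bursts. Either way it is far below the configured MaxClients of 150.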
  • Your test is flawed. -c 5 does not properly test either server. An event-based server like nginx is at its best handling thousands of concurrent, and possibly slow, downloads at once. You tested 5 concurrent downloads. -n 20000 -c 1000 might start to show nginx performing better.

    Try running this tool against both servers, and see which one falls over first

    I bet it won't be nginx :-)

    meder : thanks, I was waiting for a response like this - I'll do more benchmarking and will update!
    From Justin
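
    A sketch of how such a comparison might be scripted: it builds `ab` command lines for a range of concurrency levels against both ports from the question. Only the `-n` and `-c` flags already used in the question are relied on; the concurrency levels and helper names are assumptions, and nothing is sent over the network until `run_sweep` is called.

```python
import subprocess

# The two endpoints from the question: Apache2 on port 80, nginx on 8080.
TARGETS = {
    "apache2": "http://medero.org/index.html",
    "nginx": "http://medero.org:8080/index.html",
}

# Assumed sweep; ab needs the request count to be at least the concurrency.
CONCURRENCY_LEVELS = [5, 50, 200, 1000]

def build_ab_command(url, requests, concurrency):
    """Return the argv list for one ab run."""
    if requests < concurrency:
        raise ValueError("requests must be >= concurrency")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]

def run_sweep(requests=20000):
    """Run ab for every target and level, printing the throughput line."""
    for name, url in TARGETS.items():
        for level in CONCURRENCY_LEVELS:
            cmd = build_ab_command(url, requests, level)
            result = subprocess.run(cmd, capture_output=True, text=True)
            for line in result.stdout.splitlines():
                if line.startswith("Requests per second"):
                    print(f"{name} c={level}: {line}")

# Example: the run suggested in the answer above, against nginx.
print(" ".join(build_ab_command(TARGETS["nginx"], 20000, 1000)))
```

    At the higher levels the benchmarking machine itself may become the limit (open file descriptors, for instance), so it is worth checking that `ab` reports no failed requests before comparing the numbers.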
