If using weights, upsync does not properly distribute requests between a large number of backends #289

@Valeriyy

Hello, guys!
I have 20 nginx upstreams with 60 php-fpm backends each. After I started using upsync, CPU utilization of the php-fpm processes on the servers became high. I tested nginx before and after enabling upsync, and the difference was striking.
With the standard upstream configuration:

upstream test {
        server 10.0.134.116:8080 weight=15 max_fails=0;
        server 10.0.134.121:8080 weight=13 max_fails=0;
        server 10.0.137.52:8080 weight=14 max_fails=0;
        server 10.0.137.51:8080 weight=14 max_fails=0;
        server 10.0.135.159:8080 weight=14 max_fails=0;
        server 10.0.136.45:8080 weight=14 max_fails=0;
        server 10.0.137.57:8080 weight=15 max_fails=0;
        server 10.0.137.58:8080 weight=15 max_fails=0;
        server 10.0.136.28:8080 weight=14 max_fails=0;
        server 10.0.134.192:8080 weight=15 max_fails=0;
        server 10.0.134.179:8080 weight=13 max_fails=0;
        server 10.0.137.60:8080 weight=15 max_fails=0;
        server 10.0.135.139:8080 weight=14 max_fails=0;
        server 10.0.137.36:8080 weight=14 max_fails=0;
        server 10.0.136.65:8080 weight=14 max_fails=0;
        server 10.0.134.212:8080 weight=13 max_fails=0;
        server 10.0.137.92:8080 weight=14 max_fails=0;
        server 10.0.134.118:8080 weight=15 max_fails=0;
        server 10.0.137.61:8080 weight=14 max_fails=0;
        server 10.0.137.122:8080 weight=14 max_fails=0;
        server 10.0.137.243:8080 weight=13 max_fails=0;
        server 10.0.136.39:8080 weight=14 max_fails=0;
        server 10.0.137.195:8080 weight=13 max_fails=0;
        server 10.0.134.122:8080 weight=13 max_fails=0;
        server 10.0.137.171:8080 weight=13 max_fails=0;
        server 10.0.134.123:8080 weight=13 max_fails=0;
        server 10.0.137.54:8080 weight=15 max_fails=0;
        server 10.0.137.168:8080 weight=13 max_fails=0;
        server 10.0.136.51:8080 weight=14 max_fails=0;
        server 10.0.137.31:8080 weight=14 max_fails=0;
        server 10.0.137.156:8080 weight=14 max_fails=0;
        server 10.0.135.158:8080 weight=14 max_fails=0;
        server 10.0.137.23:8080 weight=14 max_fails=0;
        server 10.0.134.127:8080 weight=13 max_fails=0;
        server 10.0.137.170:8080 weight=13 max_fails=0;
        server 10.0.137.173:8080 weight=13 max_fails=0;
        server 10.0.134.98:8080 weight=14 max_fails=0;
        server 10.0.137.71:8080 weight=14 max_fails=0;
        server 10.0.135.140:8080 weight=14 max_fails=0;
        server 10.0.137.77:8080 weight=14 max_fails=0;
        server 10.0.136.49:8080 weight=14 max_fails=0;
        server 10.0.137.73:8080 weight=14 max_fails=0;
        server 10.0.136.38:8080 weight=14 max_fails=0;
        server 10.0.137.35:8080 weight=14 max_fails=0;
        server 10.0.137.138:8080 weight=14 max_fails=0;
        server 10.0.137.162:8080 weight=14 max_fails=0;
        server 10.0.136.43:8080 weight=14 max_fails=0;
        server 10.0.137.144:8080 weight=14 max_fails=0;
        server 10.0.134.124:8080 weight=13 max_fails=0;
        server 10.0.134.128:8080 weight=13 max_fails=0;
        server 10.0.136.48:8080 weight=14 max_fails=0;
        server 10.0.137.32:8080 weight=14 max_fails=0;
        server 10.0.137.169:8080 weight=13 max_fails=0;
        server 10.0.136.26:8080 weight=14 max_fails=0;
        server 10.0.136.68:8080 weight=14 max_fails=0;
        server 10.0.137.74:8080 weight=14 max_fails=0;
        server 10.0.137.81:8080 weight=14 max_fails=0;
        server 10.0.137.254:8080 weight=14 max_fails=0;
        server 10.0.137.172:8080 weight=13 max_fails=0;
        server 10.0.136.64:8080 weight=14 max_fails=0;
}

server {
        listen 80;

        location / {
                include fastcgi_params;
                fastcgi_pass  test;
                fastcgi_param SCRIPT_FILENAME $document_root/index.php;
        }
}

With this configuration, requests are distributed between the backends as follows:

for t in $(for i in {1..10}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     10 10.0.137.60
     10 10.0.137.58
     10 10.0.137.57
      9 10.0.137.54
      9 10.0.134.192
      9 10.0.134.118
      9 10.0.134.116
      8 10.0.137.77
      8 10.0.137.74
      8 10.0.137.73
      8 10.0.137.71
      8 10.0.137.61
      8 10.0.137.52
      8 10.0.137.51
      8 10.0.137.35
      8 10.0.137.32
      8 10.0.137.23
      8 10.0.135.159
      8 10.0.135.158
      8 10.0.135.140
      8 10.0.135.139
      8 10.0.134.98
      7 10.0.137.92
      7 10.0.137.81
      7 10.0.137.36
      7 10.0.137.31
      7 10.0.137.254
      7 10.0.136.43
      7 10.0.136.28
      6 10.0.137.162
      6 10.0.137.156
      6 10.0.137.144
      6 10.0.137.138
      6 10.0.137.122
      6 10.0.136.68
      6 10.0.136.64
      6 10.0.136.51
      6 10.0.136.49
      6 10.0.136.48
      6 10.0.136.39
      6 10.0.136.38
      6 10.0.136.26
      6 10.0.134.124
      6 10.0.134.123
      5 10.0.137.243
      5 10.0.137.195
      5 10.0.137.173
      5 10.0.137.172
      5 10.0.137.171
      5 10.0.137.170
      5 10.0.137.169
      5 10.0.137.168
      5 10.0.136.45
      5 10.0.134.212
      5 10.0.134.179
      5 10.0.134.128
      5 10.0.134.127
      5 10.0.134.122
      5 10.0.134.121

Now I enable upsync:

upstream test {
    upsync 127.0.0.1:2379/v2/keys/upsync/test upsync_interval=5s upsync_timeout=5m upsync_type=etcd strong_dependency=off;
    upsync_dump_path /etc/nginx/conf.d/upsync/test.inc;
    include /etc/nginx/conf.d/upsync/test.inc;
}

server {
        listen 80;

        location / {
                include fastcgi_params;
                fastcgi_pass  test;
                fastcgi_param SCRIPT_FILENAME $document_root/index.php;
        }
}

After adding the upstream entries to etcd, I ran my test again and got the following result:

for t in $(for i in {1..10}; do date "+%d/%b/%Y:%H:%M" --date "-$i min"; done); do grep -r "*$t.*GET \/ " /var/log/nginx/access.log | sed -r 's/.*upstream_addr:\s(.*):8080.*/\1/g'; done | sort | uniq -c | sort -nr
     45 10.0.137.54
     30 10.0.137.60
     29 10.0.137.57
     23 10.0.134.192
     21 10.0.134.118
     19 10.0.134.116
     17 10.0.137.58
     12 10.0.137.36
     12 10.0.137.35
     10 10.0.137.61
      8 10.0.137.32
      8 10.0.137.23
      7 10.0.137.81
      7 10.0.137.77
      7 10.0.137.74
      7 10.0.137.162
      7 10.0.136.28
      6 10.0.137.92
      6 10.0.135.159
      6 10.0.135.139
      5 10.0.137.144
      5 10.0.135.158
      4 10.0.137.73
      4 10.0.137.71
      4 10.0.137.254
      4 10.0.137.156
      4 10.0.137.122
      4 10.0.136.51
      4 10.0.136.43
      4 10.0.136.26
      3 10.0.137.138
      3 10.0.136.68
      3 10.0.136.48
      3 10.0.136.39
      3 10.0.136.38
      2 10.0.137.52
      2 10.0.137.51
      2 10.0.137.31
      2 10.0.137.173
      2 10.0.137.170
      2 10.0.137.168
      2 10.0.136.65
      2 10.0.136.64
      2 10.0.136.49
      2 10.0.136.45
      2 10.0.135.140
      2 10.0.134.98
      2 10.0.134.212
      2 10.0.134.127
      2 10.0.134.124
      1 10.0.137.171
      1 10.0.137.169
      1 10.0.134.179
      1 10.0.134.123
      1 10.0.134.122
      1 10.0.134.121
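
One way to quantify the skew in these counts is to divide each backend's request count by its configured weight: with weight-proportional balancing the quotient should be roughly the same for every backend. The sketch below does this for four backends taken from the configuration and the upsync run above. The helper name `requests_per_weight` and the choice of sample rows are illustrative assumptions, not part of nginx or upsync.

```python
# (address, configured weight, requests observed with upsync enabled)
# Values are copied from the upstream block and the upsync test output above.
sample = [
    ("10.0.137.54", 15, 45),
    ("10.0.137.60", 15, 30),
    ("10.0.137.52", 14, 2),
    ("10.0.134.121", 13, 1),
]


def requests_per_weight(rows):
    """Return requests per unit of weight for each backend (hypothetical helper)."""
    return {addr: hits / weight for addr, weight, hits in rows}


ratios = requests_per_weight(sample)

# With proportional balancing this spread should be close to 1.
spread = max(ratios.values()) / min(ratios.values())

for addr, ratio in ratios.items():
    print(f"{addr}: {ratio:.2f} requests per unit of weight")
print(f"max/min spread: {spread:.1f}x")
```

The same calculation can be run over all 60 backends by extending `sample`; a large spread points at the balancer rather than at small differences between weights 13, 14 and 15.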

As you can see, nginx with upsync forwards far more requests to a few servers than to the others. If I set weight=1 for every backend, the load is approximately equal. But that does not suit me, because my servers have different CPU and RAM configurations and run under high load. I need exactly the weights that I had without upsync. I suspect that upsync does not handle weights correctly and needs a fix.
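
For comparison, here is a minimal sketch of smooth weighted round-robin, the selection scheme stock nginx uses for weighted upstreams. It is a simplified model under stated assumptions: no failures, no `max_fails` handling, a single worker, and only three of the backends from the configuration above. The function name `smooth_wrr` is made up for illustration and this is not the upsync or nginx source code.

```python
from collections import Counter


def smooth_wrr(weights, n_requests):
    """Simulate smooth weighted round-robin and return per-backend pick counts."""
    current = {addr: 0 for addr in weights}  # running "current weight" per backend
    total = sum(weights.values())
    counts = Counter()
    for _ in range(n_requests):
        # Every backend gains its configured weight on each request.
        for addr, weight in weights.items():
            current[addr] += weight
        # The backend with the highest current weight is chosen...
        chosen = max(current, key=current.get)
        # ...and is pushed back by the total weight so the others catch up.
        current[chosen] -= total
        counts[chosen] += 1
    return counts


# Three backends and their weights, taken from the upstream block above.
weights = {
    "10.0.134.116": 15,
    "10.0.137.52": 14,
    "10.0.134.121": 13,
}

# One full cycle is sum(weights) requests.
counts = smooth_wrr(weights, sum(weights.values()))
for addr, hits in counts.most_common():
    print(hits, addr)
```

Over one full cycle each backend is picked exactly as many times as its weight, so weights of 13 to 15 should produce nearly equal counts. That matches the result without upsync and is the behavior I expect with upsync as well.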
