Go: Load balance between multiple connection pools #1164
Conversation
Instead of using a single connection pool for all requests, this change uses as many pools as CPUs available. This will reduce contention when borrowing a connection for a request. I see an improvement of around 16% on my quad-core laptop. The improvement may be significantly higher on Peak hardware.
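For readers unfamiliar with the approach, here is a minimal sketch of the idea, not the repo's exact code: the DSN, connection budget, and helper names are illustrative.

```go
package main

import (
	"database/sql"
	"log"
	"runtime"
	"sync/atomic"

	_ "github.com/go-sql-driver/mysql"
)

var (
	npool   = runtime.NumCPU()
	pools   = make([]*sql.DB, npool)
	counter uint64 // round-robin selector for pool()
)

// initPools opens one *sql.DB (each is itself a connection pool) per CPU,
// splitting the idle-connection budget across them so request goroutines
// contend on npool pool mutexes instead of a single shared one.
func initPools(dsn string, maxConnectionCount int) error {
	for i := range pools {
		db, err := sql.Open("mysql", dsn)
		if err != nil {
			return err
		}
		db.SetMaxIdleConns(maxConnectionCount/npool + 1)
		pools[i] = db
	}
	return nil
}

// pool returns the next pool in round-robin order.
func pool() *sql.DB {
	return pools[atomic.AddUint64(&counter, 1)%uint64(npool)]
}

func main() {
	// DSN and connection budget are illustrative values.
	if err := initPools("user:pass@tcp(localhost:3306)/hello_world", 256); err != nil {
		log.Fatal(err)
	}
	var one int
	if err := pool().QueryRow("SELECT 1").Scan(&one); err != nil {
		log.Fatal(err)
	}
	log.Println("ok:", one)
}
```

A request handler would then call pool().Query(...) (or a per-pool prepared statement) instead of a single package-level db.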
I feel this tuning is ugly... |
The lack of warm-up would affect other hardware platforms in the benchmarks too. Go performs well on other hardware with less parallelism, but falls behind on Peak, which supports higher parallelism. Hence, reducing contention can help. |
@tuxychandru So a higher-parallelism warm-up is required before the higher-parallelism benchmark to fill the pools? |
Each test gets a warm-up before a benchmark is run: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/toolset/benchmark/framework_test.py#L41 I assume that this would be sufficient to fill the pools, but it would need to be tested. |
@tuxychandru thanks for the PR. Some questions before we can bring this in though - is this a common approach used on production servers? I ask because 1) I've never heard of using multiple connection pools, and 2) at the moment the go test is marked realistic, so it should reflect what production code would actually do. Also, I second the call for some hard numbers backing up that this will really help the benchmark performance, or minimally getting a second opinion from someone else in the go community, just because I have never heard of multiple connection pools before, so I want at least some hard numbers or a second voice saying that it will improve performance. EDIT: oops, I see you provided a number of 16%. That's pretty great - can you find a second person to test this and confirm that it provides a benefit to them too? Or a second voice saying that this is reasonable |
@hamiltont I don't know if this is a common practice. I remember reading/watching something which suggested sharding to reduce contention. That's where I got this idea from, but I cannot recollect the source. This approach provides significant enough gain only in micro-benchmarks. In larger applications, running with lesser parallelism, an optimization like this may be pointless. Even in these benchmarks, Go performs very well without this optimization, but falls far behind on peak. I don't have access to hardware with large parallelism to confirm that this will improve performance on peak. It did make things faster in the quad-core machines I tested, but only by under 25%. |
I cannot find hardware with higher parallelism to try this on (even among my friends), if that is what this pull request is waiting for. Even if this is not a common approach in real applications, I don't know why one would not do it when running their application on a machine like Peak with a local database. |
I want to see the results on Go 1.4. |
I've tested it on EC2 c3.8xlarge *2 (db, wrk+go).

Without your patch:

    [ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
    Running 30s test @ http://localhost:8080/db
      8 threads and 128 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.94ms    2.15ms  48.34ms   91.62%
        Req/Sec     8.92k     1.05k   13.22k    70.97%
      2026243 requests in 30.00s, 270.10MB read
    Requests/sec:  67543.76
    Transfer/sec:      9.00MB
    [ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
    Running 30s test @ http://localhost:8080/db
      8 threads and 128 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.94ms    2.11ms  44.61ms   91.42%
        Req/Sec     8.90k     1.04k   13.22k    70.54%
      2024159 requests in 30.00s, 269.82MB read
    Requests/sec:  67474.29
    Transfer/sec:      8.99MB

With your patch:

    [ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
    Running 30s test @ http://localhost:8080/db
      8 threads and 128 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     2.04ms   39.90ms    1.60s    99.88%
        Req/Sec    21.24k     4.22k   36.33k    68.87%
      4820727 requests in 30.00s, 642.60MB read
    Requests/sec: 160695.37
    Transfer/sec:     21.42MB
    [ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
    Running 30s test @ http://localhost:8080/db
      8 threads and 128 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   782.19us  722.92us  14.02ms   87.14%
        Req/Sec    21.29k     4.11k   39.56k    66.84%
      4840233 requests in 30.00s, 645.20MB read
    Requests/sec: 161346.18
    Transfer/sec:     21.51MB

Huge improvements! |
The problem is that concurrent access to a single prepared statement is serialized by its internal mutex. Go 1.4 did not do anything to change it, and with the API exposed by the database/sql package there is no way around it from the application side. I'm not sure this is a tricky hack, given it can be completely isolated in a wrapper in a real application. |
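To make the contention concrete, here is a minimal sketch of the pattern being discussed: a single shared *sql.Stmt serving every request goroutine. The DSN, table, and type names are illustrative, not the repo's exact code.

```go
package main

import (
	"database/sql"
	"encoding/json"
	"log"
	"math/rand"
	"net/http"

	_ "github.com/go-sql-driver/mysql"
)

type World struct {
	Id           uint16 `json:"id"`
	RandomNumber uint16 `json:"randomNumber"`
}

func main() {
	// DSN is illustrative.
	db, err := sql.Open("mysql", "user:pass@tcp(localhost:3306)/hello_world")
	if err != nil {
		log.Fatal(err)
	}

	// One statement shared by every request goroutine. Each call must take
	// the statement's internal mutex to find (or create) a per-connection
	// prepared statement before executing, so under high parallelism that
	// mutex becomes the hot lock.
	stmt, err := db.Prepare("SELECT id, randomNumber FROM World WHERE id = ?")
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/db", func(w http.ResponseWriter, r *http.Request) {
		var world World
		if err := stmt.QueryRow(rand.Intn(10000)+1).Scan(&world.Id, &world.RandomNumber); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(&world)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```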
@tuxychandru I see the stackdump and recognize what the problem is. I'll report it to golang. Mutex contention is only a problem for prepared statements. Since go-mysql-driver doesn't support client-side parameter binding (aka client-side prepare), queries with placeholders always go through a server-side prepared statement, which costs an extra roundtrip per query. |
In this benchmark the database is local, so roundtrips will not be expensive. Also, I'd not use fmt.Sprintf to build queries. Reducing contention by using multiple smaller pools instead of one large pool seems like a clean solution to this problem, one that can be sufficiently isolated and adopted in real applications with very little drawback. |
@tuxychandru In this benchmark the db is not local, but on the same LAN. I've tested on a placement group with c3.8xlarge instances, so this benchmark result is very similar to the Peak environment (low latency, 10Gbit Ethernet). As you can see in this stackdump, the lock contention happens on Stmt.mu, not DB.mu. I'll make another pull request using interpolation instead. |
Benchmark updated: https://gist.github.com/methane/1341a204256e5dab52ac |
When writing a Go program without an ORM, I use prepared statements in most cases, but sometimes I use fmt.Sprintf. Since fmt.Sprintf with "%d" is a safe placeholder replacement for integers, I use it in the real world without fear of SQL injection. But I'm not sure it is a common practice.
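A short sketch of the practice being described, assuming the World type and the database/sql and fmt imports from the earlier sketch (the query itself is illustrative):

```go
// queryWorldInterpolated splices the id into the SQL text instead of using a
// prepared statement. This is safe only because %d can emit nothing but digits
// (and a minus sign); never do this with %s or untrusted strings.
func queryWorldInterpolated(db *sql.DB, id int) (World, error) {
	var w World
	q := fmt.Sprintf("SELECT id, randomNumber FROM World WHERE id = %d", id)
	err := db.QueryRow(q).Scan(&w.Id, &w.RandomNumber)
	return w, err
}
```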
OK, I agree. I'm +1 now, since I will be able to see the difference between Go 1.4 and 1.5 on one large pool in other benchmarks. Could you add a comment and use smaller pools, like: db.SetMaxIdleConns(maxConnectionCount/npool + 1) |
@tuxychandru I've added client-side parameter interpolation (interpolateParams) to go-sql-driver/mysql. So now we have another option: stop using prepared statements and let the driver interpolate the parameters instead. |
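For reference, the option being discussed is go-sql-driver/mysql's interpolateParams DSN parameter. A rough sketch of enabling it, reusing the World type and imports from the earlier sketches (credentials and query are illustrative):

```go
// With interpolateParams=true the driver substitutes arguments into the query
// text on the client side, so a placeholder query needs no server-side prepare:
// no extra roundtrip and no shared Stmt (and no Stmt mutex) at all.
db, err := sql.Open("mysql",
	"user:pass@tcp(localhost:3306)/hello_world?interpolateParams=true")
if err != nil {
	log.Fatal(err)
}

var w World
err = db.QueryRow("SELECT id, randomNumber FROM World WHERE id = ?", rand.Intn(10000)+1).
	Scan(&w.Id, &w.RandomNumber)
if err != nil {
	log.Fatal(err)
}
```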
Hey @methane and @tuxychandru - thanks for the progress so far, this appears to be yielding large speedups! If I'm reading the thread correctly, this PR is indeed something we are interested in merging, perhaps with a nice comment in the codebase summarizing the main points covered in this thread (and mentioning that this may be repaired by Go 1.5). @methane - you're my trusted expert for understanding Go's ecosystem, so please comment if you disagree with my short summary above, or think of something I've missed |
I'm not very familiar with the Go community, but I saw some articles about prepared statements vs. interpolation recently. On the other hand, I feel multiple pools and Sprintf are neither common practice nor the preferred way. I'll create another pull request to use it. |
Great. Given the speedups I'd be comfortable merging this, but I think it absolutely needs a big comment in the source code explaining what's happening, especially if the reason for multiple connection pools is to work around limitations of some part of the Go ecosystem, e.g. the database/sql prepared statement locking. |
I've created #1353 |
Just re-reviewed this and I'm confused - are @tuxychandru and @methane in agreement on the direction the go framework should be taking? I'm eager to get any fixes necessary included in R10, but I'm no Go expert so I need to hear from you. It looks like #1353 uses the interpolation approach instead. |
I intended #1353 to replace this PR. |