Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: Load balance between multiple connection pools #1164

Closed
wants to merge 1 commit into from
Closed

Go: Load balance between multiple connection pools #1164

wants to merge 1 commit into from

Conversation

cskr
Copy link

@cskr cskr commented Nov 2, 2014

Instead of using a single connection pool for all requests, this change
uses as many pools as CPUs available. This will reduce contention when
borrowing a connection for request.

I see an improvement of around 16% on my quad core laptop. The
improvement may be significantly higher on Peak hardware.

Instead of using a single connection pool for all requests, this change
uses as many pools as CPUs available. This will reduce contention when
borrowing a connection for request.

I see an improvement of around 16% on my quad core laptop. The
improvement may be significantly higher on Peak hardware.
@methane
Copy link
Contributor

methane commented Nov 2, 2014

I feel this tuning is ugly...
Could you show the concrete number of improvement?
Have you do warmup before measure to fill pool?

@cskr
Copy link
Author

cskr commented Nov 3, 2014

The lack of warm-up would affect other hardware platforms in the benchmarks too. Go performs well on other hardware with less parallelism, but falls behind on peak which has higher parallelism support. Hence, reducing contention can help.

@methane
Copy link
Contributor

methane commented Nov 3, 2014

@tuxychandru
When there are no idle connection, pool creates new connection.
But prepared statements are not prepared for new connection.
When using stmt for new connection, prepare() is called internally.

So higher parallelism warmup is required before higher parallelism benchmark to
pool prepared connection.

@msmith-techempower
Copy link
Member

Each test gets a warm-up before a benchmark is run: https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/toolset/benchmark/framework_test.py#L41

I assume that this would be sufficient to fill the pools, but it would need to be tested.

@hamiltont
Copy link
Contributor

@tuxychandru thanks for the PR. Some questions before we can bring this in though - is this a common approach used on production servers? I ask because 1) I've never heard of using multiple connection pools, and 2) At the moment the go test is marked realistic (e.g. "approach": "Realistic"). If this is an optimization only for the purpose of gaining a few thousand req/sec on the benchmark that's fine, but we need to know whether that is common in the go community (e.g. it's realistic) or if this is an uncommon trick (e.g. it's "stripped")

Also, I second the call for some hard numbers backing up that this will really help the benchmark performance, or minimally getting a second opinion from someone else in the go community, just because I have never heard of multiple connection pools before so I want at least some hard numbers or a second voice saying that it will improve performance.

EDIT: oops, I see you provided a number of 16%. THat's pretty great - can you find a second person to test this and confirm that it provides a benefit to them too? Or a second voice saying that this is reasonable

@cskr
Copy link
Author

cskr commented Nov 4, 2014

@hamiltont I don't know if this is a common practice. I remember reading/watching something which suggested sharding to reduce contention. That's where I got this idea from, but I cannot recollect the source.

This approach provides significant enough gain only in micro-benchmarks. In larger applications, running with lesser parallelism, an optimization like this may be pointless. Even in these benchmarks, Go performs very well without this optimization, but falls far behind on peak.

I don't have access to hardware with large parallelism to confirm that this will improve performance on peak. It did make things faster in the quad-core machines I tested, but only by under 25%.

@cskr
Copy link
Author

cskr commented Dec 31, 2014

I cannot find hardware with higher parallelism to try this on (even among my friends), if that is what this pull request if waiting for.

If you think this may not be a common approach in real applications, I don't know why one would not do this if they are going to run their application on a machine like peak with a local databse.

@methane
Copy link
Contributor

methane commented Dec 31, 2014

I want to see the results on Go 1.4.
If Go 1.4 have same problem, we can report it to go-nuts and make Go 1.5 more scalable.
I feel it is more productive than introducing tricky hack on this benchmark.

@methane
Copy link
Contributor

methane commented Dec 31, 2014

I've tested it on EC2 c3.8xlarge *2 (db, wrk+go).

Without your patch:

[ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
Running 30s test @ http://localhost:8080/db
  8 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.94ms    2.15ms  48.34ms   91.62%
    Req/Sec     8.92k     1.05k   13.22k    70.97%
  2026243 requests in 30.00s, 270.10MB read
Requests/sec:  67543.76
Transfer/sec:      9.00MB
[ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
Running 30s test @ http://localhost:8080/db
  8 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.94ms    2.11ms  44.61ms   91.42%
    Req/Sec     8.90k     1.04k   13.22k    70.54%
  2024159 requests in 30.00s, 269.82MB read
Requests/sec:  67474.29
Transfer/sec:      8.99MB

With your patch:

[ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
Running 30s test @ http://localhost:8080/db
  8 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.04ms   39.90ms   1.60s    99.88%
    Req/Sec    21.24k     4.22k   36.33k    68.87%
  4820727 requests in 30.00s, 642.60MB read
Requests/sec: 160695.37
Transfer/sec:     21.42MB
[ec2-user@ip-10-0-1-175 wrk]$ ./wrk -t8 -c128 -d30 http://localhost:8080/db
Running 30s test @ http://localhost:8080/db
  8 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   782.19us  722.92us  14.02ms   87.14%
    Req/Sec    21.29k     4.11k   39.56k    66.84%
  4840233 requests in 30.00s, 645.20MB read
Requests/sec: 161346.18
Transfer/sec:     21.51MB

Huge improvemnts!

@cskr
Copy link
Author

cskr commented Dec 31, 2014

The problem is that concurrent access to sql.DB is protected by a mutex and at high parallelism the contention will add significant overhead, especially when like in this benchmark the database server is local and hence there is no network delay involved.

Go 1.4 did not do anything to change it. With the API exposed by the sql package, I don't think much can be done to avoid this.

I'm not sure if this is a tricky hack given it can be completely isolated in a wrapper in a real application.

@methane
Copy link
Contributor

methane commented Dec 31, 2014

@tuxychandru I see stackdump and recognize what is problem. I'll report it to golang.

Mutex contention is only problem for prepared statements.
How about using db.Query() instead of Prepare?

Since go-mysql-driver doesn't support client side parameter binding (aka, client-side prepare),
we should use fmt.Sprintfto avoid additional roundtrip (prepare, exec and close instead of just exec).

@cskr
Copy link
Author

cskr commented Jan 1, 2015

sql.DB itself can be shared between goroutines and is protected by a mutex, so switch to db.Query is not likely to reduce contention.

In this benchmark the database is local, so roundtrips will not be expensive. Also, I'd not use fmt.Sprintf in my production applications for the fear of SQL injection and I am certain that using prepared statements is common practice in most Go applications. Thus, using Sprintf in the benchmark will not reflect real world usage.

Reducing contention by using multiple smaller pools instead of one large pool seems like a clean solution to this problem, that can be sufficiently isolated and adopted in real applications with very little drawback.

@methane
Copy link
Contributor

methane commented Jan 1, 2015

@tuxychandru This benchmark, db is not local, but on same LAN. I've tested on placement group and c3.8xlarge. So this benchmark result is very similar to PEAK environ. (low latency, 10Gbit Ethernet).

As you can see in this stackdump, lock contention is happen for Stmt.mu, not DB.mu.
db.Query() locks DB.mu as you said, but I think it should not be bottleneck since code between lock may not take long time.

I'll make another pull request using db.Query() with benchmark result in this month.
And I'll try to fix golang/go#9484 in Go 1.5.

@methane
Copy link
Contributor

methane commented Jan 2, 2015

Benchmark updated: https://gist.github.com/methane/1341a204256e5dab52ac
Go 1.5 will be nice if my patch is accepted. And Sprintf is fastest on -c128.

@methane
Copy link
Contributor

methane commented Jan 2, 2015

@tuxychandru

Also, I'd not use fmt.Sprintf in my production applications for the fear of SQL injection and I am certain that using prepared statements is common practice in most Go applications. Thus, using Sprintf in the benchmark will not reflect real world usage.

When writing Go program without ORM, I use prepared statement for most case. But sometimes uses db.Query(stmt, params...) for queries with variable length argument (e.g. WHERE id IN (?, ?, ?, ?)).

Since fmt.Sprintf with "%d" is safe placeholder replacement for integers, I use it in real world without fear of SQL injection. But I'm not sure it is a common practice.

Reducing contention by using multiple smaller pools instead of one large pool seems like a clean solution to this problem, that can be sufficiently isolated and adopted in real applications with very little drawback.

OK, I agree. I'm +1 now since I will be able to see difference between Go 1.4 and 1.5 on one large pool in other benchmarks like Go/gin.

Could you add comment and use smaller pools like:

db.SetMaxIdleConns(maxConnectionCount/npool + 1)

@methane
Copy link
Contributor

methane commented Feb 18, 2015

@tuxychandru I've added interpolateParams=true option to go-sql-driver/mysql.
See go-sql-driver/mysql#309 and go-sql-driver/mysql#318.

So now we have another option: Stop using DB.Prepare() and use interpolateParams=true.

@hamiltont
Copy link
Contributor

Hey @methane and @tuxychandru - thanks for the progress so far, this appears to be yielding large speedups! If I'm reading the thread correctly, this PR is indeed something we are interested in merging, perhaps with a nice comment in the codebase summarizing the main points covered in this thread (and mentioning that this may be repaired by Go 1.5).

@methane - you're my trusted expert for understanding Go's ecosystem, so please comment if you disagree with my short summary above, or think of somehting I've missed

@methane
Copy link
Contributor

methane commented Feb 25, 2015

I'm not familiar with Go community much. But I saw some article about prepared-statement vs interpolation recently.
So both of prepared statement and interpolation are common practice.

On the other hand, I feel multi pool and Sprintf are not common practice nor preferred way.
So I'm +1 on using interapolateParams=true option.

I'll create another pull request to use it.

@hamiltont
Copy link
Contributor

Great. Given the speedups I'd be comfortable merging this, but I think it absolutely needs a big comment in the source code explaining what's happening, especially if the reason for multiple connection pools is to work around limitations of some part of the Go ecosystem e.g. sql.DB. A link back to this thread would be good to

@methane
Copy link
Contributor

methane commented Feb 26, 2015

I've created #1353

@hamiltont
Copy link
Contributor

Just re-reviewed this and I'm confused - are @tuxychandru and @methane are in agreement on the direction the go framework should be taking? I'm eager to get any fixes necessary included in R10, but I'm no Go expert so I need to hear from you

It looks like #1353 uses this db.Query stuff, but in the thread above @tuxychandru says that won't create the performance benefits of using multiple connection pools. So, is #1353 intended to replace this PR as a stopgap until the inherent problem in Go is solved, or is #1353 intended to be merged in addition to this PR? The benchmark results above are excellent - has #1353 been tested similarly? If not, benchmarking #1353 would probably add valuable information to this debate - if it achieves similar results in a much more conventional manner, then by all means let's go with that, but if it doesn't achieve similar results then this PR has a lot of value IMO

@methane
Copy link
Contributor

methane commented Mar 6, 2015

I intended to #1353 replaces this PR.
While I hasn't benchmarked it yet, it may be similar to Sprintf in this benchmark.

@LadyMozzarella
Copy link
Contributor

Closing. I merged #1353 as replacement, as @methane intended. Please update and reopen if we should readdress this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants