-
Notifications
You must be signed in to change notification settings - Fork 21.7k
core/types, trie: reduce allocations in derivesha #30747
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
eccb604 to
82269e4
Compare
82269e4 to
01f0826
Compare
01f0826 to
6bf9153
Compare
|
|
|
I think you have 'master' and 'PR' reversed in your comment. |
I don't think he had. Not sure which comment you referred to, but, first one: Second one So |
|
Yeah I got confused with the |
|
Deployed on bench07 and 08 for snap sync EDIT: Snap sync finished correctly, with a complete state assembled locally. |
1dfbc99 to
c1930b3
Compare
Alternative to ethereum#30746, potential follow-up to ethereum#30743 . This PR makes the stacktrie always copy incoming value buffers, and reuse them internally. Improvement in ethereum#30743: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.2 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 430.0µ ± 12% -10.00% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.17Ki ± 0% 25.65Ki ± 0% -43.21% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.0 ± 0% 232.0 ± 0% -81.57% (p=0.000 n=10) ``` This PR further enhances that: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.2 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 430.0µ ± 12% 423.6µ ± 13% ~ (p=0.739 n=10) │ derivesha.2 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 25.654Ki ± 0% 4.960Ki ± 0% -80.67% (p=0.000 n=10) │ derivesha.2 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 232.00 ± 0% 37.00 ± 0% -84.05% (p=0.000 n=10) ``` So the total derivesha-improvement over *both PRS* is: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 423.6µ ± 13% -11.33% (p=0.015 n=10) │ derivesha.1 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.171Ki ± 0% 4.960Ki ± 0% -89.02% (p=0.000 n=10) │ derivesha.1 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.00 ± 0% 37.00 ± 0% -97.06% (p=0.000 n=10) ``` Since this PR always copies the incoming value, it adds a little bit of a penalty on the previous insert-benchmark, which copied nothing (always passed the same empty slice as input) : ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/trie cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ stacktrie.7 │ stacktrie.10 │ │ sec/op │ sec/op vs base │ Insert100K-8 88.21m ± 34% 92.37m ± 31% ~ (p=0.280 n=10) │ stacktrie.7 │ stacktrie.10 │ │ B/op │ B/op vs base │ Insert100K-8 3.424Ki ± 3% 4.581Ki ± 3% +33.80% (p=0.000 n=10) │ stacktrie.7 │ stacktrie.10 │ │ allocs/op │ allocs/op vs base │ Insert100K-8 22.00 ± 5% 26.00 ± 4% +18.18% (p=0.000 n=10) ``` --------- Co-authored-by: Gary Rong <[email protected]> Co-authored-by: Felix Lange <[email protected]>
Alternative to ethereum#30746, potential follow-up to ethereum#30743 . This PR makes the stacktrie always copy incoming value buffers, and reuse them internally. Improvement in ethereum#30743: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.2 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 430.0µ ± 12% -10.00% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.17Ki ± 0% 25.65Ki ± 0% -43.21% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.0 ± 0% 232.0 ± 0% -81.57% (p=0.000 n=10) ``` This PR further enhances that: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.2 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 430.0µ ± 12% 423.6µ ± 13% ~ (p=0.739 n=10) │ derivesha.2 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 25.654Ki ± 0% 4.960Ki ± 0% -80.67% (p=0.000 n=10) │ derivesha.2 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 232.00 ± 0% 37.00 ± 0% -84.05% (p=0.000 n=10) ``` So the total derivesha-improvement over *both PRS* is: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 423.6µ ± 13% -11.33% (p=0.015 n=10) │ derivesha.1 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.171Ki ± 0% 4.960Ki ± 0% -89.02% (p=0.000 n=10) │ derivesha.1 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.00 ± 0% 37.00 ± 0% -97.06% (p=0.000 n=10) ``` Since this PR always copies the incoming value, it adds a little bit of a penalty on the previous insert-benchmark, which copied nothing (always passed the same empty slice as input) : ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/trie cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ stacktrie.7 │ stacktrie.10 │ │ sec/op │ sec/op vs base │ Insert100K-8 88.21m ± 34% 92.37m ± 31% ~ (p=0.280 n=10) │ stacktrie.7 │ stacktrie.10 │ │ B/op │ B/op vs base │ Insert100K-8 3.424Ki ± 3% 4.581Ki ± 3% +33.80% (p=0.000 n=10) │ stacktrie.7 │ stacktrie.10 │ │ allocs/op │ allocs/op vs base │ Insert100K-8 22.00 ± 5% 26.00 ± 4% +18.18% (p=0.000 n=10) ``` --------- Co-authored-by: Gary Rong <[email protected]> Co-authored-by: Felix Lange <[email protected]>
Alternative to ethereum#30746, potential follow-up to ethereum#30743 . This PR makes the stacktrie always copy incoming value buffers, and reuse them internally. Improvement in ethereum#30743: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.2 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 430.0µ ± 12% -10.00% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.17Ki ± 0% 25.65Ki ± 0% -43.21% (p=0.000 n=10) │ derivesha.1 │ derivesha.2 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.0 ± 0% 232.0 ± 0% -81.57% (p=0.000 n=10) ``` This PR further enhances that: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.2 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 430.0µ ± 12% 423.6µ ± 13% ~ (p=0.739 n=10) │ derivesha.2 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 25.654Ki ± 0% 4.960Ki ± 0% -80.67% (p=0.000 n=10) │ derivesha.2 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 232.00 ± 0% 37.00 ± 0% -84.05% (p=0.000 n=10) ``` So the total derivesha-improvement over *both PRS* is: ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/core/types cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ derivesha.1 │ derivesha.3 │ │ sec/op │ sec/op vs base │ DeriveSha200/stack_trie-8 477.8µ ± 2% 423.6µ ± 13% -11.33% (p=0.015 n=10) │ derivesha.1 │ derivesha.3 │ │ B/op │ B/op vs base │ DeriveSha200/stack_trie-8 45.171Ki ± 0% 4.960Ki ± 0% -89.02% (p=0.000 n=10) │ derivesha.1 │ derivesha.3 │ │ allocs/op │ allocs/op vs base │ DeriveSha200/stack_trie-8 1259.00 ± 0% 37.00 ± 0% -97.06% (p=0.000 n=10) ``` Since this PR always copies the incoming value, it adds a little bit of a penalty on the previous insert-benchmark, which copied nothing (always passed the same empty slice as input) : ``` goos: linux goarch: amd64 pkg: github.com/ethereum/go-ethereum/trie cpu: 12th Gen Intel(R) Core(TM) i7-1270P │ stacktrie.7 │ stacktrie.10 │ │ sec/op │ sec/op vs base │ Insert100K-8 88.21m ± 34% 92.37m ± 31% ~ (p=0.280 n=10) │ stacktrie.7 │ stacktrie.10 │ │ B/op │ B/op vs base │ Insert100K-8 3.424Ki ± 3% 4.581Ki ± 3% +33.80% (p=0.000 n=10) │ stacktrie.7 │ stacktrie.10 │ │ allocs/op │ allocs/op vs base │ Insert100K-8 22.00 ± 5% 26.00 ± 4% +18.18% (p=0.000 n=10) ``` --------- Co-authored-by: Gary Rong <[email protected]> Co-authored-by: Felix Lange <[email protected]>
Alternative to #30746, potential follow-up to #30743 . This PR makes the stacktrie always copy incoming value buffers, and reuse them internally.
Improvement in #30743:
This PR further enhances that:
So the total derivesha-improvement over both PRS is:
Since this PR always copies the incoming value, it adds a little bit of a penalty on the previous insert-benchmark, which copied nothing (always passed the same empty slice as input) :