core/types, trie: reduce allocations in derivesha #30747

holiman · 2024-11-12T09:11:01Z

Alternative to #30746, potential follow-up to #30743. This PR makes the stacktrie always copy incoming value buffers and reuse them internally.
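As a rough illustration of the copy-then-reuse idea described above, here is a minimal sketch. It is not the stacktrie code: `bytesPool`, `store`, `insert` and `reset` are hypothetical names, and the real pool in this PR may be organized quite differently. The point is only that once the callee copies each value into a buffer it owns, the caller can keep overwriting a single scratch buffer, and the callee can recycle its copies after a reset.

```go
package main

import "fmt"

// bytesPool is a hypothetical free list of byte slices. The name and layout
// are illustrative only and are not taken from the PR.
type bytesPool struct {
	free [][]byte
}

// get returns a zero-length slice, reusing a released buffer when one exists.
func (p *bytesPool) get() []byte {
	if n := len(p.free); n > 0 {
		b := p.free[n-1]
		p.free = p.free[:n-1]
		return b[:0]
	}
	return nil
}

// put hands a buffer back to the pool for later reuse.
func (p *bytesPool) put(b []byte) {
	p.free = append(p.free, b)
}

// store stands in for a structure that holds values until it is reset.
type store struct {
	pool   bytesPool
	values [][]byte
}

// insert copies value into a pooled buffer, so the caller may overwrite its
// own buffer as soon as the call returns.
func (s *store) insert(value []byte) {
	buf := append(s.pool.get(), value...)
	s.values = append(s.values, buf)
}

// reset releases every held buffer back to the pool.
func (s *store) reset() {
	for _, v := range s.values {
		s.pool.put(v)
	}
	s.values = s.values[:0]
}

func main() {
	var s store
	scratch := []byte("first")
	s.insert(scratch)
	copy(scratch, "XXXXX")          // the caller reuses its scratch buffer
	fmt.Printf("%s\n", s.values[0]) // the stored copy is unaffected

	s.reset()
	s.insert([]byte("next")) // fits in the buffer released by reset
	fmt.Printf("%s\n", s.values[0])
}
```

In this sketch the second `insert` fits into the buffer released by `reset`, so steady-state inserts need no fresh allocation as long as values do not outgrow the pooled capacity.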

Improvement in #30743:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.1 │             derivesha.2              │
                          │   sec/op    │    sec/op     vs base                │
DeriveSha200/stack_trie-8   477.8µ ± 2%   430.0µ ± 12%  -10.00% (p=0.000 n=10)

                          │ derivesha.1  │             derivesha.2              │
                          │     B/op     │     B/op      vs base                │
DeriveSha200/stack_trie-8   45.17Ki ± 0%   25.65Ki ± 0%  -43.21% (p=0.000 n=10)

                          │ derivesha.1 │            derivesha.2             │
                          │  allocs/op  │ allocs/op   vs base                │
DeriveSha200/stack_trie-8   1259.0 ± 0%   232.0 ± 0%  -81.57% (p=0.000 n=10)

This PR further enhances that:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.2  │          derivesha.3           │
                          │    sec/op    │    sec/op     vs base          │
DeriveSha200/stack_trie-8   430.0µ ± 12%   423.6µ ± 13%  ~ (p=0.739 n=10)

                          │  derivesha.2  │             derivesha.3              │
                          │     B/op      │     B/op      vs base                │
DeriveSha200/stack_trie-8   25.654Ki ± 0%   4.960Ki ± 0%  -80.67% (p=0.000 n=10)

                          │ derivesha.2 │            derivesha.3             │
                          │  allocs/op  │ allocs/op   vs base                │
DeriveSha200/stack_trie-8   232.00 ± 0%   37.00 ± 0%  -84.05% (p=0.000 n=10)

So the total derivesha improvement over both PRs is:

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
                          │ derivesha.1 │             derivesha.3              │
                          │   sec/op    │    sec/op     vs base                │
DeriveSha200/stack_trie-8   477.8µ ± 2%   423.6µ ± 13%  -11.33% (p=0.015 n=10)

                          │  derivesha.1  │             derivesha.3              │
                          │     B/op      │     B/op      vs base                │
DeriveSha200/stack_trie-8   45.171Ki ± 0%   4.960Ki ± 0%  -89.02% (p=0.000 n=10)

                          │ derivesha.1  │            derivesha.3             │
                          │  allocs/op   │ allocs/op   vs base                │
DeriveSha200/stack_trie-8   1259.00 ± 0%   37.00 ± 0%  -97.06% (p=0.000 n=10)

Since this PR always copies the incoming value, it adds a small penalty on the previous insert benchmark, which copied nothing (it always passed the same empty slice as input):

goos: linux
goarch: amd64
pkg: github.com/ethereum/go-ethereum/trie
cpu: 12th Gen Intel(R) Core(TM) i7-1270P
             │ stacktrie.7  │          stacktrie.10          │
             │    sec/op    │    sec/op     vs base          │
Insert100K-8   88.21m ± 34%   92.37m ± 31%  ~ (p=0.280 n=10)

             │ stacktrie.7  │             stacktrie.10             │
             │     B/op     │     B/op      vs base                │
Insert100K-8   3.424Ki ± 3%   4.581Ki ± 3%  +33.80% (p=0.000 n=10)

             │ stacktrie.7 │            stacktrie.10            │
             │  allocs/op  │ allocs/op   vs base                │
Insert100K-8    22.00 ± 5%   26.00 ± 4%  +18.18% (p=0.000 n=10)
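To make the trade-off concrete, the following sketch measures the per-call allocation cost of storing a caller's slice directly versus storing an unpooled private copy. It is not the geth benchmark: `insertAlias`, `insertCopy` and `sink` are hypothetical names, and the copy here is deliberately unpooled, so it shows the worst case that buffer reuse is meant to avoid.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the compiler cannot drop the work.
var sink [][]byte

// insertAlias stores the caller's slice directly, without copying.
func insertAlias(dst [][]byte, v []byte) [][]byte {
	return append(dst, v)
}

// insertCopy stores a private, unpooled copy of the value.
func insertCopy(dst [][]byte, v []byte) [][]byte {
	return append(dst, append([]byte(nil), v...))
}

func main() {
	value := make([]byte, 32)
	dst := make([][]byte, 0, 1) // preallocated so the outer append never grows

	alias := testing.AllocsPerRun(1000, func() { sink = insertAlias(dst[:0], value) })
	copied := testing.AllocsPerRun(1000, func() { sink = insertCopy(dst[:0], value) })

	fmt.Println("allocs per insert, alias:", alias)
	fmt.Println("allocs per insert, copy: ", copied)
}
```

A benchmark that never copied before will show some extra cost once copying is mandatory, even when pooled buffers keep the absolute numbers small, as in the Insert100K figures above.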

rjl493456442 · 2025-09-02T03:59:37Z


[[ Master ]]
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: Apple M1 Pro
BenchmarkDeriveSha200
BenchmarkDeriveSha200/std_trie
BenchmarkDeriveSha200/std_trie-8                6921        167578 ns/op       77865 B/op       1731 allocs/op
BenchmarkDeriveSha200/stack_trie
BenchmarkDeriveSha200/stack_trie-8              7146        168260 ns/op       26098 B/op        232 allocs/op

[[ PR ]]
goos: darwin
goarch: arm64
pkg: github.com/ethereum/go-ethereum/core/types
cpu: Apple M1 Pro
BenchmarkDeriveSha200
BenchmarkDeriveSha200/std_trie
BenchmarkDeriveSha200/std_trie-8                7058        172988 ns/op       80054 B/op       1926 allocs/op
BenchmarkDeriveSha200/stack_trie
BenchmarkDeriveSha200/stack_trie-8              7122        166626 ns/op         745 B/op         19 allocs/op

rjl493456442 · 2025-09-02T07:25:12Z

[[ Master ]]

(pprof) focus=DeriveSha
(pprof) top
Active filters:
   focus=DeriveSha
Showing nodes accounting for 75616.73MB, 1.69% of 4479364.35MB total
Dropped 36 nodes (cum <= 22396.82MB)
Showing top 10 nodes out of 49
      flat  flat%   sum%        cum   cum%
49936.95MB  1.11%  1.11% 68792.73MB  1.54%  github.com/ethereum/go-ethereum/core/types.encodeForDerive
   13106MB  0.29%  1.41% 17499.76MB  0.39%  github.com/ethereum/go-ethereum/core/types.Receipts.EncodeIndex
 6319.21MB  0.14%  1.55%  6372.15MB  0.14%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).writeBytes
 4116.95MB 0.092%  1.64%  4116.95MB 0.092%  github.com/ethereum/go-ethereum/rlp.(*EncoderBuffer).AppendToBytes
     988MB 0.022%  1.66%      988MB 0.022%  bytes.growSlice
  560.10MB 0.013%  1.67%   560.10MB 0.013%  github.com/ethereum/go-ethereum/trie.init.func3
  464.51MB  0.01%  1.69%   464.51MB  0.01%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).list (inline)
     113MB 0.0025%  1.69%  6600.56MB  0.15%  github.com/ethereum/go-ethereum/trie.(*StackTrie).hash
      11MB 0.00025%  1.69%  6989.98MB  0.16%  github.com/ethereum/go-ethereum/trie.(*StackTrie).Update
       1MB 2.2e-05%  1.69%      989MB 0.022%  bytes.(*Buffer).grow
(pprof)

[[ PR ]]

(pprof) top
Active filters:
   focus=DeriveSha
Showing nodes accounting for 41593.68MB, 0.93% of 4490562.58MB total
Dropped 37 nodes (cum <= 22452.81MB)
Showing top 10 nodes out of 50
      flat  flat%   sum%        cum   cum%
14087.04MB  0.31%  0.31% 23043.54MB  0.51%  github.com/ethereum/go-ethereum/trie.(*StackTrie).Update
12947.95MB  0.29%   0.6% 17368.17MB  0.39%  github.com/ethereum/go-ethereum/core/types.Receipts.EncodeIndex
 6340.27MB  0.14%  0.74%  6395.78MB  0.14%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).writeBytes
 4156.57MB 0.093%  0.84%  4156.57MB 0.093%  github.com/ethereum/go-ethereum/rlp.(*EncoderBuffer).AppendToBytes
 1662.01MB 0.037%  0.87%  1662.01MB 0.037%  github.com/ethereum/go-ethereum/trie.(*unsafeBytesPool).get (inline)
 1003.13MB 0.022%   0.9%  1003.13MB 0.022%  bytes.growSlice
  836.15MB 0.019%  0.91%   836.15MB 0.019%  github.com/ethereum/go-ethereum/trie.init.func3
  448.04MB  0.01%  0.92%   448.04MB  0.01%  github.com/ethereum/go-ethereum/rlp.(*encBuffer).list (inline)
     111MB 0.0025%  0.93%  6641.22MB  0.15%  github.com/ethereum/go-ethereum/trie.(*StackTrie).hash
    1.50MB 3.3e-05%  0.93%  1004.63MB 0.022%  bytes.(*Buffer).grow
(pprof)

fjl · 2025-09-02T11:08:39Z

I think you have 'master' and 'PR' reversed in your comment.

holiman · 2025-09-05T17:28:40Z

> I think you have 'master' and 'PR' reversed in your comment.

I don't think he did. I'm not sure which comment you were referring to, but here is the first one:

Master
BenchmarkDeriveSha200/stack_trie-8              7146        168260 ns/op       26098 B/op        232 allocs/op
PR
BenchmarkDeriveSha200/stack_trie-8              7122        166626 ns/op         745 B/op         19 allocs/op

Second one

Master
49936.95MB  1.11%  1.11% 68792.73MB  1.54%  github.com/ethereum/go-ethereum/core/types.encodeForDerive
   13106MB  0.29%  1.41% 17499.76MB  0.39%  github.com/ethereum/go-ethereum/core/types.Receipts.EncodeIndex

PR
14087.04MB  0.31%  0.31% 23043.54MB  0.51%  github.com/ethereum/go-ethereum/trie.(*StackTrie).Update
12947.95MB  0.29%   0.6% 17368.17MB  0.39%  github.com/ethereum/go-ethereum/core/types.Receipts.EncodeIndex

So EncodeIndex is comparable, whereas the PR's largest offender is Update at 14G and master's is encodeForDerive at 50G.

fjl · 2025-09-08T08:06:20Z

Yeah, I got confused by std_trie vs stack_trie.

rjl493456442 · 2025-09-08T12:22:57Z

Deployed on bench07 and 08 for snap sync

EDIT: Snap sync finished correctly, with a complete state assembled locally.

Co-authored-by: Gary Rong
Co-authored-by: Felix Lange

holiman mentioned this pull request Nov 21, 2024

trie: reduce allocations in stacktrie #30743

Merged

holiman commented Jan 14, 2025

holiman force-pushed the stacktrie_allocs_3 branch from eccb604 to 82269e4 Compare January 14, 2025 12:54

rjl493456442 assigned rjl493456442 and fjl Aug 28, 2025

rjl493456442 force-pushed the stacktrie_allocs_3 branch from 82269e4 to 01f0826 Compare September 2, 2025 03:40

rjl493456442 changed the title ~~trie: [wip] reduce allocations in derivesha~~ core/types, trie: reduce allocations in derivesha Sep 2, 2025

rjl493456442 force-pushed the stacktrie_allocs_3 branch from 01f0826 to 6bf9153 Compare September 2, 2025 03:54

rjl493456442 marked this pull request as ready for review September 2, 2025 06:46

rjl493456442 requested review from gballet and rjl493456442 as code owners September 2, 2025 06:46

holiman and others added 6 commits September 29, 2025 20:55

core/types, trie: reduce allocs in derivesha

e645cdd

trie, core: use unsafe pool internally in stacktrie

f8412c2

core/types, internal, trie: redefine ListHasher interface

89546e1

trie, core: polish

8b1071f

trie: remove unused func

e578f61

Update list_hasher.go

c1930b3

fjl force-pushed the stacktrie_allocs_3 branch from 1dfbc99 to c1930b3 Compare September 29, 2025 19:00

fjl approved these changes Sep 29, 2025

fjl added this to the 1.16.5 milestone Sep 29, 2025

fjl merged commit 0576671 into ethereum:master Oct 1, 2025
5 of 6 checks passed

buddh0 mentioned this pull request Dec 31, 2025

upstream: merge geth-v1.16.2 ～ geth-v1.16.7 bnb-chain/bsc#3505

Open
