core/state: lock-free subfetcher #30782

Closed · wants to merge 6 commits

Conversation

ARR4N
Contributor

@ARR4N ARR4N commented Nov 21, 2024

Trie prefetching with a state.subfetcher previously relied on a pattern of {lock, set tasks, unlock, send void on wake channel}, which I've collapsed into a single channel send. This allows the lock to be removed entirely.
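To make the shape of the change concrete, here is a minimal, self-contained sketch of both patterns. This is not the actual geth code or diff; the struct, field names, and key types are simplified stand-ins.

```go
package main

import "sync"

// subfetcher is a simplified stand-in for the real state.subfetcher; the
// fields only loosely mirror the geth code.
type subfetcher struct {
	lock  sync.Mutex
	queue [][]byte      // pending task keys, shared state (old pattern)
	wake  chan struct{} // buffered with capacity 1 (old pattern)
	tasks chan [][]byte // new pattern: batches travel over the channel itself
}

// scheduleOld shows the master pattern: lock, set tasks, unlock, wake.
func (sf *subfetcher) scheduleOld(keys [][]byte) {
	sf.lock.Lock()
	sf.queue = append(sf.queue, keys...)
	sf.lock.Unlock()

	select {
	case sf.wake <- struct{}{}:
		// Wake signal sent.
	default:
		// Wake signal dropped: one is already queued, so loop() will run anyway.
	}
}

// scheduleNew shows the collapsed pattern: a single send both transfers the
// batch and wakes the receiver, so no mutex is required.
func (sf *subfetcher) scheduleNew(keys [][]byte) {
	sf.tasks <- keys
}

func main() {
	sf := &subfetcher{
		wake:  make(chan struct{}, 1),
		tasks: make(chan [][]byte),
	}
	go func() {
		for keys := range sf.tasks { // the loop() side of the new pattern
			_ = keys // prefetch the trie nodes for these keys
		}
	}()
	sf.scheduleOld([][]byte{[]byte("key")})
	sf.scheduleNew([][]byte{[]byte("key")})
}
```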

The old implementation had a subtle race condition between schedule() and terminate() (demonstrated in a test), which this new pattern fixes. Task handoff was guaranteed, but receipt by loop() was not, because the wake channel is buffered and excess sends are deliberately dropped.

The new tasks channel is never closed, just like the old wake one wasn't. If that needs to be addressed I think it's best done in another PR.

I know that #30629 made significant allocation improvements and I don't want to inadvertently make things worse, but I can't find the benchmarks that were used there. Happy to run them if someone can please point me in the right direction.

Note

The code changed significantly after the first review.

See Go Blog: Share Memory By Communicating for background.

Contributor

@holiman holiman left a comment

I'm leaning towards rejecting this PR.

Aside from the pretty large change in semantics, I also don't like the nested selects. IMO it's unintuitive and hard to reason about.

```go
	// Wake signal sent
default:
	// Wake signal not sent as a previous one is already queued
case sf.tasks <- tasks:
```
Contributor

This is now blocking. Previously, any number of threads could append to sf.tasks and then all exit immediately, since the select was non-blocking.

Now, instead, they each allocate their own slice of tasks, and each thread needs to wait until it has been delivered over the sf.tasks channel.

This is quite a big change in semantics, yet not something you mentioned in the PR description. I'm guessing it was unintentional. Unless I have misunderstood something, I think this highlights why we should reject this PR: it touches some very intricate parts of geth, which currently work fine. It's very easy to slip when making changes here.

Contributor Author

I've responded to this in the main conversation thread.

@ARR4N
Contributor Author

ARR4N commented Nov 27, 2024

I'm gonna put this back into draft while I try to address your concerns, @holiman. If I don't think I can get to something cleaner, then I'll close it.

@ARR4N ARR4N marked this pull request as draft November 27, 2024 09:47
@ARR4N
Contributor Author

ARR4N commented Nov 27, 2024

Thank you for the thorough review and for the pushback @holiman—both are appreciated and made me completely rethink this. I won't object if you still want to reject the PR based on the slippery terrain, but I at least wanted to offer something better.

There's now only a small (non-blocking) "fan-in" goroutine and a channel range over the fanned-in work. Each element of the pipeline is now self-contained, so it's easy to reason about. To me, the wake channel suggested an anti-pattern, and it's now unnecessary thanks to native channel semantics.
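Very roughly, the shape I have in mind is the sketch below. The names and types are invented for illustration (plain strings instead of state keys), and this is not the actual diff, just the general fan-in-plus-channel-range pattern.

```go
package main

import "fmt"

// fanIn buffers batches internally so that senders on in never wait for the
// processor; the processor simply ranges over work.
func fanIn(in <-chan []string, work chan<- []string) {
	defer close(work)
	var pending []string
	for {
		// Only enable the send case while there is pending work; a nil
		// channel is never ready in a select, so that case is skipped
		// otherwise.
		var out chan<- []string
		if len(pending) > 0 {
			out = work
		}
		select {
		case batch, ok := <-in:
			if !ok { // terminate: flush the remainder and stop
				if len(pending) > 0 {
					work <- pending
				}
				return
			}
			pending = append(pending, batch...)
		case out <- pending:
			pending = nil
		}
	}
}

func main() {
	in := make(chan []string)   // schedule() sends here
	work := make(chan []string) // the single background processor ranges here
	go fanIn(in, work)

	done := make(chan struct{})
	go func() {
		defer close(done)
		for batch := range work { // exactly one background processor
			fmt.Println("prefetching", batch)
		}
	}()

	in <- []string{"key1", "key2"}
	in <- []string{"key3"}
	close(in) // no more scheduling
	<-done    // blocks until everything handed over has been processed
}
```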

I'll move this out of draft once I've thought it through a little more, but if you still want to close the PR then please do, so I don't waste time.

I also don't like the nested selects. IMO it's unintuitive and hard to reason about.

Removed

This is now blocking. ... I'm guessing it was unintentional.

Yes, it was.

This is effectively the same as having both wait on `<-sf.term`, but it makes for a clearer test that demonstrates the bug, because `terminate(false)` is documented as blocking until "all disk loads finish".
@ARR4N
Contributor Author

ARR4N commented Nov 28, 2024

While redesigning this after holiman's review, I found a race condition in the old implementation that this PR also fixes. It can be demonstrated by running TestSchdeulerTerminationRaceCondition against master.

@ARR4N ARR4N marked this pull request as ready for review November 28, 2024 12:59
@holiman
Contributor

holiman commented Dec 3, 2024

I found a race condition in the old implementation that this also fixes.

IIUC, this race happens if new tasks are added while we're closing the thing. However, as far as I know, we add tasks while executing transactions in the block, and only close it after we're done with the execution.

So, from a unit-testing perspective, it's a race, but given the context it's being used in, it's fine. Does this sound correct?

@rjl493456442
Member

@holiman I have rewritten the implementation a bit and the nested select has been removed. https://github.com/rjl493456442/go-ethereum/tree/arr4n/lockless-trie-subfetcher

I am a bit hesitant. You are right that we will never invoke terminate and schedule at the same time, so the existing data race is not a real problem for us. But, as Felix said, using a channel and a lock together is a bit wonky.

I would like to get some preliminary feedback about my changes first, and then decide whether changing the code is worthwhile.

@ARR4N
Contributor Author

ARR4N commented Dec 4, 2024

So, from a unit-testing perspective, it's a race, but given the context it's being used in, it's fine. Does this sound correct?

Yup, since you both think that scheduling and termination won't coincide, I agree that the race is theoretical. The test can definitely be excluded; however, I do think the rest of the changes benefit the code w.r.t. clarity and general correctness of implementation vs. expectation (that scheduled work is performed and sf.terminate(false) blocks accordingly).

But as Felix said that using channel and lock is a bit wonky.

This is what motivated me to try to find something cleaner. My first attempt wasn't great, but I think the latest version (sans test) removes as much complexity as possible.

@holiman I have rewritten the implementation a bit and the nested select has been removed. https://github.com/rjl493456442/go-ethereum/tree/arr4n/lockless-trie-subfetcher

@rjl493456442 I see that your changes were on top of my rewrite that also removed the nested select. Was there something in particular that you wanted to improve on?

I would like to get some preliminary feedback about my changes first and then to decide whether changing the code is worthwhile.

I find it a little difficult to reason about the replacement and checking of the done channel plus the multiple spawning points of sf.process(). IIUC the goal is to have only one background processor at a time, which I think is simpler with a channel range because it allows the done channel (and its case) to be removed.

@rjl493456442
Member

Was there something in particular that you wanted to improve on?

Yep, nested select looks a bit hacky to me.

IIUC the goal is to have only one background processor at a time

Yes, it's true. At most one process is spawned at a time.

The done channel is used so that task buffering is not blocked even while the process is running.
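If I understand the intent correctly, the shape is roughly the hypothetical sketch below. This is not the code in the linked branch; the names loop, process, and the string task type are made up for illustration.

```go
package main

// loop buffers incoming tasks without blocking the sender, and uses the done
// channel to ensure at most one process goroutine runs at a time.
func loop(tasks <-chan []string, quit <-chan struct{}) {
	var (
		pending []string
		done    chan struct{} // non-nil while a process goroutine is running
	)
	for {
		select {
		case batch := <-tasks:
			pending = append(pending, batch...)
			if done == nil { // no processor running: start one
				done = make(chan struct{})
				go process(pending, done)
				pending = nil
			}
		case <-done: // processor finished; nil channel case is never ready
			done = nil
			if len(pending) > 0 { // more work arrived while processing
				done = make(chan struct{})
				go process(pending, done)
				pending = nil
			}
		case <-quit:
			// A real terminate() would also drain pending work first.
			return
		}
	}
}

func process(batch []string, done chan<- struct{}) {
	defer close(done)
	_ = batch // prefetch the keys here
}

func main() {
	tasks := make(chan []string)
	quit := make(chan struct{})
	go loop(tasks, quit)
	tasks <- []string{"key1"}
	tasks <- []string{"key2"} // buffered by loop if process is still running
	close(quit)
}
```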

@holiman
Contributor

holiman commented Dec 5, 2024

So, let's imagine we invoke schedule 1000 times very quickly. On master,

  • All tasks are appended to the sf.tasks. The main loop is woken once, then maybe a couple of times more to "re-arm" the tasks.
  • All tasks, once they are appended to the sf.tasks, are handled sequentially. So, if the 1000 invocations add one task each, 1 to 1000, then they get executed in strict order (just mentioning this attribute, but I will admit that I don't think strict ordering of tasks is necessary in this case).

As far as I can tell of the current state of this PR, in @ARR4N 's version,

  • The main loop will greedily pop items off the sf.tasks and spin up goroutines. So it will have 1000 goroutines trying to deliver their tasks on work.
  • Once the goroutines are trying to deliver their tasks, they lose ordering (between batches, not internally; but in the example of 1000 invocations with 1 task each, we lose all ordering).

So IMO that exhibits two behavioural changes that seem to be unintentional side-effects. Neither of those behavioural changes is good, IMO.
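For what it's worth, the ordering point is easy to show in isolation: when each batch gets its own delivery goroutine, the receiver sees batches in whatever order the scheduler happens to run the senders. A standalone illustration, not geth code:

```go
package main

import "fmt"

func main() {
	work := make(chan int)

	// One delivery goroutine per "batch", as in the greedy-pop approach.
	for i := 0; i < 10; i++ {
		go func(task int) {
			work <- task // delivery order is up to the Go scheduler
		}(i)
	}

	// The receiver sees the tasks in nondeterministic order.
	for j := 0; j < 10; j++ {
		fmt.Println(<-work)
	}
}
```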

As for @rjl493456442 's changes

  • The tasks will be appended to, just like on master; the difference is that the appending happens within the main loop instead of at the call site, avoiding the need for a lock.
  • Ordering is maintained.
  • I do think this version is slightly cleaner than current master: without the lock, and with the processing moved to a standalone method.
  • On master, it's trivially obvious that adding a subfetcher task cannot block. Not so with this version: adding a task requires the main loop to receive it. Which should be a fairly quick operation, though.
  • This version (well, both) will have a lot more short-lived goroutines than master. Which might be fine.

IMO the changes in this PR are not acceptable, so I'll close this PR.

The changes in @rjl493456442 's version are, I think, possibly acceptable. I'd rather discuss those in a separate PR, if you want to pursue it.

@holiman holiman closed this Dec 5, 2024
@ARR4N
Contributor Author

ARR4N commented Dec 5, 2024

IMO the changes in this PR are not acceptable, so I'll close this PR.

Thanks for considering it and for the very thorough reviews.

Yep, nested select looks a bit hacky to me.

I think you might have mixed up the original nested select with the updated, non-nested version. I'm happy with @holiman's decision given the rationale, so I'm sharing this only in case you want to see an alternate approach, @rjl493456442.

@holiman
Contributor

holiman commented Dec 5, 2024

I think you might have mixed up the original nested select with the updated, non-nested version.

I think he referred to this:

[Screenshot of core/state/trie_prefetcher.go at commit e15cb3e in ARR4N/go-ethereum]

Such towers are pretty non-intuitive, IMO.
