core/state: lock-free subfetcher #30782

Closed · wants to merge 6 commits

Conversation

ARR4N
Contributor

@ARR4N ARR4N commented Nov 21, 2024

Trie prefetching with a state.subfetcher previously relied on a pattern of {lock, set tasks, unlock, send void on wake channel}, which I've collapsed into a single channel send. This allows the lock to be removed entirely.
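To make the shape of the change concrete, here is a minimal, self-contained sketch of both patterns. This is not the actual geth code or diff; the struct, field names, and key types are simplified stand-ins.

```go
package main

import "sync"

// subfetcher is a simplified stand-in for the real state.subfetcher; the
// fields only loosely mirror the geth code.
type subfetcher struct {
	lock  sync.Mutex
	queue [][]byte      // pending task keys, shared state (old pattern)
	wake  chan struct{} // buffered with capacity 1 (old pattern)
	tasks chan [][]byte // new pattern: batches travel over the channel itself
}

// scheduleOld shows the master pattern: lock, set tasks, unlock, wake.
func (sf *subfetcher) scheduleOld(keys [][]byte) {
	sf.lock.Lock()
	sf.queue = append(sf.queue, keys...)
	sf.lock.Unlock()

	select {
	case sf.wake <- struct{}{}:
		// Wake signal sent.
	default:
		// Wake signal dropped: one is already queued, so loop() will run anyway.
	}
}

// scheduleNew shows the collapsed pattern: a single send both transfers the
// batch and wakes the receiver, so no mutex is required.
func (sf *subfetcher) scheduleNew(keys [][]byte) {
	sf.tasks <- keys
}

func main() {
	sf := &subfetcher{
		wake:  make(chan struct{}, 1),
		tasks: make(chan [][]byte),
	}
	go func() {
		for keys := range sf.tasks { // the loop() side of the new pattern
			_ = keys // prefetch the trie nodes for these keys
		}
	}()
	sf.scheduleOld([][]byte{[]byte("key")})
	sf.scheduleNew([][]byte{[]byte("key")})
}
```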

The old implementation had a subtle race condition between schedule() and terminate() (demonstrated in a test), which this new pattern fixes. Task handoff was guaranteed, but receipt by loop() was not, because the wake channel is buffered and excess sends are deliberately dropped.

The new tasks channel is never closed, just like the old wake one wasn't. If that needs to be addressed I think it's best done in another PR.

I know that #30629 made significant allocation improvements and I don't want to inadvertently make things worse, but I can't find the benchmarks that were used there. Happy to run them if someone can please point me in the right direction.

Note

The code changed significantly after the first review.

See Go Blog: Share Memory By Communicating for background.

Contributor

@holiman holiman left a comment

I'm leaning towards rejecting this PR.

Aside from the pretty large change in semantics, I also don't like the nested selects. IMO it's unintuitive and hard to reason about.

```go
	// Wake signal sent
default:
	// Wake signal not sent as a previous one is already queued
case sf.tasks <- tasks:
```
Contributor

This is now blocking. Previously, any number of threads could append to sf.tasks and then all exit immediately, since the select was non-blocking.

Now, instead, they each allocate their own slice of tasks, and each thread needs to wait until it has been delivered over the sf.tasks channel.

This is quite a big change in semantics, yet not something you mentioned in the PR description. I'm guessing it was unintentional. Unless I have misunderstood something, I think this highlights why we should reject this PR: it touches some very intricate parts of geth, which currently work fine. It's very easy to slip when making changes here.

Contributor Author

I've responded to this in the main conversation thread.

@ARR4N
Contributor Author

ARR4N commented Nov 27, 2024

I'm gonna put this back into draft while I try to address your concerns, @holiman. If I don't think I can get to something cleaner, then I'll close it.

@ARR4N ARR4N marked this pull request as draft November 27, 2024 09:47
@ARR4N
Contributor Author

ARR4N commented Nov 27, 2024

Thank you for the thorough review and for the pushback @holiman—both are appreciated and made me completely rethink this. I won't object if you still want to reject the PR based on the slippery terrain, but I at least wanted to offer something better.

There's now only a small (non-blocking) "fan-in" goroutine and a channel range over the fanned-in work. Each element of the pipeline is now self-contained, so it's easy to reason about. To me, the wake channel suggested an anti-pattern, and it's now unnecessary thanks to native channel semantics.
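Very roughly, the shape I have in mind is the sketch below. The names and types are invented for illustration (plain strings instead of state keys), and this is not the actual diff, just the general fan-in-plus-channel-range pattern.

```go
package main

import "fmt"

// fanIn buffers batches internally so that senders on in never wait for the
// processor; the processor simply ranges over work.
func fanIn(in <-chan []string, work chan<- []string) {
	defer close(work)
	var pending []string
	for {
		// Only enable the send case while there is pending work; a nil
		// channel is never ready in a select, so that case is skipped
		// otherwise.
		var out chan<- []string
		if len(pending) > 0 {
			out = work
		}
		select {
		case batch, ok := <-in:
			if !ok { // terminate: flush the remainder and stop
				if len(pending) > 0 {
					work <- pending
				}
				return
			}
			pending = append(pending, batch...)
		case out <- pending:
			pending = nil
		}
	}
}

func main() {
	in := make(chan []string)   // schedule() sends here
	work := make(chan []string) // the single background processor ranges here
	go fanIn(in, work)

	done := make(chan struct{})
	go func() {
		defer close(done)
		for batch := range work { // exactly one background processor
			fmt.Println("prefetching", batch)
		}
	}()

	in <- []string{"key1", "key2"}
	in <- []string{"key3"}
	close(in) // no more scheduling
	<-done    // blocks until everything handed over has been processed
}
```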

I'll move this out of draft once I've thought it through a little more, but if you still want to close the PR then please do, so I don't waste time.

I also don't like the nested selects. IMO it's unintuitive and hard to reason about.

Removed

This is now blocking. ... I'm guessing it was unintentional.

Yes, it was.

This is effectively the same as having both wait on `<-sf.term`, but it makes for a clearer test that demonstrates the bug, because `terminate(false)` is documented as blocking until "all disk loads finish".
@ARR4N
Contributor Author

ARR4N commented Nov 28, 2024

While redesigning this after holiman's review, I found a race condition in the old implementation that this PR also fixes. It can be demonstrated by running TestSchdeulerTerminationRaceCondition against master.

@ARR4N ARR4N marked this pull request as ready for review November 28, 2024 12:59
@holiman
Contributor

holiman commented Dec 3, 2024

I found a race condition in the old implementation that this also fixes.

IIUC, this race happens if new tasks are added while we're closing the thing. However, as far as I know, we add tasks while executing transactions in the block, and only close it after we're done with the execution.

So, from a unit-testing perspective, it's a race, but given the context it's being used in, it's fine. Does this sound correct?

@rjl493456442
Member

@holiman I have rewritten the implementation a bit and the nested select has been removed. https://github.com/rjl493456442/go-ethereum/tree/arr4n/lockless-trie-subfetcher

I am a bit hesitant. You are right that we will never invoke terminate and schedule at the same time, so the existing data race is not a real problem for us. But, as Felix said, using a channel and a lock together is a bit wonky.

I would like to get some preliminary feedback about my changes first, and then decide whether changing the code is worthwhile.

@ARR4N
Contributor Author

ARR4N commented Dec 4, 2024

So, from a unit-testing perspective, it's a race, but given the context it's being used in, it's fine. Does this sound correct?

Yup, since you both think that scheduling and termination won't coincide, I agree that the race is theoretical. The test can definitely be excluded; however, I do think the rest of the changes benefit the code w.r.t. clarity and general correctness of implementation vs. expectation (that scheduled work is performed and sf.terminate(false) blocks accordingly).

But as Felix said that using channel and lock is a bit wonky.

This is what motivated me to try to find something cleaner. My first attempt wasn't great, but I think the latest version (sans test) removes as much complexity as possible.

@holiman I have rewritten the implementation a bit and the nested select has been removed. https://github.com/rjl493456442/go-ethereum/tree/arr4n/lockless-trie-subfetcher

@rjl493456442 I see that your changes were on top of my rewrite that also removed the nested select. Was there something in particular that you wanted to improve on?

I would like to get some preliminary feedback about my changes first and then to decide whether changing the code is worthwhile.

I find it a little difficult to reason about the replacement and checking of the done channel plus the multiple spawning points of sf.process(). IIUC the goal is to have only one background processor at a time, which I think is simpler with a channel range because it allows the done channel (and its case) to be removed.

@rjl493456442
Member

Was there something in particular that you wanted to improve on?

Yep, nested select looks a bit hacky to me.

IIUC the goal is to have only one background processor at a time

Yes, it's true. At most one process is spawned at a time.

The done channel is used so that task buffering is not blocked even while the process is running.
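If I understand the intent correctly, the shape is roughly the hypothetical sketch below. This is not the code in the linked branch; the names loop, process, and the string task type are made up for illustration.

```go
package main

// loop buffers incoming tasks without blocking the sender, and uses the done
// channel to ensure at most one process goroutine runs at a time.
func loop(tasks <-chan []string, quit <-chan struct{}) {
	var (
		pending []string
		done    chan struct{} // non-nil while a process goroutine is running
	)
	for {
		select {
		case batch := <-tasks:
			pending = append(pending, batch...)
			if done == nil { // no processor running: start one
				done = make(chan struct{})
				go process(pending, done)
				pending = nil
			}
		case <-done: // processor finished; nil channel case is never ready
			done = nil
			if len(pending) > 0 { // more work arrived while processing
				done = make(chan struct{})
				go process(pending, done)
				pending = nil
			}
		case <-quit:
			// A real terminate() would also drain pending work first.
			return
		}
	}
}

func process(batch []string, done chan<- struct{}) {
	defer close(done)
	_ = batch // prefetch the keys here
}

func main() {
	tasks := make(chan []string)
	quit := make(chan struct{})
	go loop(tasks, quit)
	tasks <- []string{"key1"}
	tasks <- []string{"key2"} // buffered by loop if process is still running
	close(quit)
}
```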

@holiman
Contributor

holiman commented Dec 5, 2024

So, let's imagine we invoke schedule 1000 times very quickly. On master,

  • All tasks are appended to the sf.tasks. The main loop is woken once, then maybe a couple of times more to "re-arm" the tasks.
  • All tasks, once they are appended to the sf.tasks, are handled sequentially. So, if the 1000 invocations add one task each, 1 to 1000, then they get executed in strict order (just mentioning this attribute, but I will admit that I don't think strict ordering of tasks is necessary in this case).

As far as I can tell of the current state of this PR, in @ARR4N 's version,

  • The main loop will greedily pop items off the sf.tasks and spin up goroutines. So it will have 1000 goroutines trying to deliver their tasks on work.
  • Once the goroutines are trying to deliver their tasks, they lose ordering (between batches, not internally; but in the example of 1000 invocations with 1 task each, we lose all ordering).

So IMO that exhibits two behavioural changes that seem to be unintentional side-effects. Neither of those behavioural changes is good, IMO.
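For what it's worth, the ordering point is easy to show in isolation: when each batch gets its own delivery goroutine, the receiver sees batches in whatever order the scheduler happens to run the senders. A standalone illustration, not geth code:

```go
package main

import "fmt"

func main() {
	work := make(chan int)

	// One delivery goroutine per "batch", as in the greedy-pop approach.
	for i := 0; i < 10; i++ {
		go func(task int) {
			work <- task // delivery order is up to the Go scheduler
		}(i)
	}

	// The receiver sees the tasks in nondeterministic order.
	for j := 0; j < 10; j++ {
		fmt.Println(<-work)
	}
}
```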

As for @rjl493456442 's changes

  • The tasks will be appended to, just like on master; the difference is that the appending happens within the main loop instead of at the call site, avoiding the need for a lock.
  • Ordering is maintained.
  • I do think this version is slightly cleaner than current master: without the lock, and with the processing moved to a standalone method.
  • On master, it's trivially obvious that adding a subfetcher task cannot block. Not so with this version: adding a task requires the main loop to receive it. Which should be a fairly quick operation, though.
  • This version (well, both) will have a lot more short-lived goroutines than master. Which might be fine.

IMO the changes in this PR are not acceptable, so I'll close this PR.

The changes in @rjl493456442 's version are, I think, possibly acceptable. I'd rather discuss those in a separate PR, if you want to pursue it.

@holiman holiman closed this Dec 5, 2024
@ARR4N
Contributor Author

ARR4N commented Dec 5, 2024

IMO the changes in this PR are not acceptable, so I'll close this PR.

Thanks for considering it and for the very thorough reviews.

Yep, nested select looks a bit hacky to me.

I think you might have mixed up the original nested select with the updated, non-nested version. I'm happy with @holiman's decision given the rationale, so I'm sharing this only in case you want to see an alternate approach, @rjl493456442.

@holiman
Contributor

holiman commented Dec 5, 2024

I think you might have mixed up the original nested select with the updated, non-nested version.

I think he referred to this:

[Screenshot of core/state/trie_prefetcher.go at commit e15cb3e in ARR4N/go-ethereum]

Such towers are pretty non-intuitive, IMO.
