-
Notifications
You must be signed in to change notification settings - Fork 36.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
wallet: Ensure best block matches wallet scan state #30221
base: master
Are you sure you want to change the base?
Conversation
The following sections might be updated with supplementary metadata relevant to reviewers and maintainers. Code Coverage & BenchmarksFor details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30221. ReviewsSee the guideline for information on the review process.
If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update. ConflictsReviewers, this pull request conflicts with the following ones:
If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first. |
I'm not sure, but it seems like a potentially good thing to me that The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk). The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded. It seems like the first commit ae3b828 could lead to a performance regression if it is writing the flushing the BESTBLOCK record on every single blockConnected event from the node. For example during reindexing it could write and sync the database each time a block is connected, even if the block is before the wallets birthday, or doesn't have any relevant transactions? I think it probably makes sense to write BESTBLOCK in a smarter way, and probably write it more frequently, but it seems unnecessary to have to write it every block, and it might be bad for performance now or make optimizations harder in the future. I'm also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn't really seem like the height is used for much yet. Was this intended to be used by something else later? |
But none of that state is relevant to the last block processed. Any state related to blocks (confirmed or conflicted) is written to disk.
The point is that this discrepancy can result in skipping blocks. It doesn't make sense to me that we wouldn't store the block the wallet's state has been synced to, and it doesn't make sense that we should rely on a field that means something else to determine what point to start rescanning from on the next load. If BESTBLOCK doesn't match the chainstate, that's fine because it's a locator and we will just find the fork and sync from there. I don't think it's necessary to record which block we think the chainstate is synced to. |
You may be right this is the better approach. But I think the previous approach does make some sense, too. You might not want to write to the wallet database each time a block is connected if the block doesn't contain any relevant transactions, especially during reindexing. |
I think you would since not writing it would result in possibly rescanning that block at the next loading which takes a little bit of time, regardless of whether any relevant transactions are in the block.
Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?
This was mainly done to avoid looking up the height every time we load since that was being problematic and causing a tests to fail. But it definitely could be more useful in the future. |
The idea of saving heights is interesting to me because wallet code assumes it know block heights many places, but it doesn't actually store heights anywhere. So for example, if the wallet stored a mapping of block hashes to heights (or other block information) it might be useful for allowing the wallet to work in an offline mode or letting the wallet CLI tool have more capabilities. This is all pretty abstract though, so I'm not suggesting something specific.
Yes but I think that just adds back the complexity you were trying to get rid of, in a form that seems worse than the status quo. If you implement that, the wallet will be back to tracking the last block processed separately from the best block written to the database. And now, instead of the node determining when data should be flushed to disk and having all wallets flush that data simultaneously, each wallet will have internal logic deciding when to flush. This would be more complicated, and could be worse for power consumption and performance if there are multiple wallets and flushes are happening more frequently at different times. I think overall this PR is doing some good things, but the goals seem either not clear, or not clearly good, so I wonder if maybe you could take these changes and make more targeted PRs for the things you want to achieve? Or, if this PR is definitely the direction you want to go, I'm happy to review it. I don't like some things it is doing, but overall I think it is a reasonable change. |
e5c9876
to
56c71b8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Concept ACK
I can confirm that best block locator and last_processed_block
being different is confusing, see also #30455 (comment)
Currently, this needs a rebase and I'm curious if @achow101 plans to make further changes based on @ryanofsky 's comments.
test/functional/wallet_assumeutxo.py
Outdated
@@ -84,8 +85,6 @@ def run_test(self): | |||
assert_equal(n.getblockchaininfo()[ | |||
"headers"], SNAPSHOT_BASE_HEIGHT) | |||
|
|||
w.backupwallet("backup_w.dat") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that it might make sense to keep this backup where it is and add second backup created at 199. Then this could test that both cases work as expected, i.e. 199 errors below and 299 doesn't. 299 is an interesting edge case in general (snapshot height == backup height).
I wonder if something else could be done to resolve this confusion other than writing to every wallet file every time a new block is connected, even if the block doesn't contain any relevant transactions. To me, "last block processed" and "last block flushed" seem like different concepts, and we could force them to be the same but only if we:
|
Maybe mark as draft while CI is red? |
0967195
to
fd3dad7
Compare
fd3dad7
to
41e0ad0
Compare
-BEGIN VERIFY SCRIPT- sed -i "s/SetLastBlockProcessed/SetBestBlock/" $(git grep -l "SetLastBlockProcessed") sed -i "s/GetLastBlock/GetBestBlock/g" $(git grep -l "GetLastBlock") -END VERIFY SCRIPT-
m_last_block_processed and m_last_block_processed_height are combined into a new struct BestBlock.
The only reason to call chainStateFlushed during wallet loading is to ensure that the best block is written. Do these writes explicitly to prepare for removing chainStateFlushed.
The BestBlock struct now contains the block locator, and is also serialized with the block height. For backwards compatibility, the height is optional. The best block hash is retrieved from the block locator when deserializing. Since the best block record has to store the height, additional changes to AttachChain are made to faciliate auto upgrade as well as setting the in-memory best block.
Instead of setting the best block info later during AttachChain, read it during LoadWallet so it is set (if it exists) before we get there.
Instead of calling ReadBestBlock to get the best block locator, use the m_best_block.m_locator that we already have in CWallet.
Instead of writing the best block record directly during a rescan, use SetBestBlock. This is safe because progress is only saved in a rescan when doing a rescan on loading. cs_wallet is held the entire time so new blocks or reorgs will not cause the best block record to be changed unexpectedly.
chainStateFlushed is no longer needed since the best block is updated after a block is scanned. Since the chainstate being flushed does not necessarily coincide with the wallet having processed said block, it does not entirely make sense for the wallet to be recording that block as its best block, and this can cause race conditions where some blocks are not processed. Thus, remove this notification.
41e0ad0
to
d4c7013
Compare
Implements the idea discussed in #29652 (comment)
Currently,
m_last_block_processed
andm_last_block_processed_height
are not guaranteed to match the block locator stored in the wallet, nor do either of those fields actually represent the last block that the wallet is synced up to. This is confusing and unintuitive.This PR changes the those last block fields to actually be in sync with the record stored on disk. This requires adding the block height to the
BESTBLOCK_NOMERKLE
record, which was done in a backwards compatible manner. Additionally, the issue withchainStateFlushed
fixed in #29652 is also fixed here by removingchainStateFlushed
entirely. The last block fields, now simply the best block, is updated for eachblockConnected
andblockDisconnected
so that it does actually represent the block for which the wallet is synced up to.To make it easier to deal with, the best block fields are consolidated into a
BestBlock
struct which also has the serialization code with backwards compatibility.