wallet: Ensure best block matches wallet scan state #30221

achow101 · 2024-06-03T21:58:09Z

Implements the idea discussed in #29652 (comment)

Currently, m_last_block_processed and m_last_block_processed_height are not guaranteed to match the block locator stored in the wallet, nor does either of those fields actually represent the last block that the wallet is synced up to. This is confusing and unintuitive.

This PR changes those last block fields to actually be in sync with the record stored on disk. This requires adding the block height to the BESTBLOCK_NOMERKLE record, which was done in a backwards compatible manner. Additionally, the issue with chainStateFlushed fixed in #29652 is also fixed here by removing chainStateFlushed entirely. The last block fields, now simply the best block, are updated for each blockConnected and blockDisconnected so that they actually represent the block that the wallet is synced up to.

To make it easier to deal with, the best block fields are consolidated into a BestBlock struct which also has the serialization code with backwards compatibility.
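
As a rough illustration of what "optional height with backwards compatibility" can look like, here is a minimal Python sketch. It is not the PR's C++ code, and the byte layout is hypothetical (the real record uses Bitcoin Core's own serialization). The names `BestBlock`, `serialize` and `deserialize` are chosen only for this example. The idea shown is that an old-style record ends after the locator, a new-style record carries a trailing height, and the best block hash is read from the front of the locator.

```python
import struct
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BestBlock:
    """Toy model: a block locator plus an optional height."""
    locator: List[bytes]          # 32-byte block hashes, tip first
    height: Optional[int] = None  # None when read from an old-style record

    @property
    def block_hash(self) -> Optional[bytes]:
        # The best block hash is the first entry of the locator.
        return self.locator[0] if self.locator else None


def serialize(bb: BestBlock) -> bytes:
    # Hypothetical layout: count (u32) | hashes | optional height (i32).
    out = struct.pack("<I", len(bb.locator)) + b"".join(bb.locator)
    if bb.height is not None:
        out += struct.pack("<i", bb.height)
    return out


def deserialize(data: bytes) -> BestBlock:
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    locator = [data[offset + 32 * i: offset + 32 * (i + 1)] for i in range(count)]
    offset += 32 * count
    # Old records end here; new records have four more bytes for the height.
    height = None
    if len(data) >= offset + 4:
        (height,) = struct.unpack_from("<i", data, offset)
    return BestBlock(locator, height)
```

In this model a reader that finds no height knows it has loaded an old record and can look the height up once and rewrite the record, which is the kind of automatic upgrade the description refers to.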

DrahtBot · 2024-06-03T21:58:12Z

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage & Benchmarks

For details see: https://corecheck.dev/bitcoin/bitcoin/pulls/30221.

Reviews

See the guideline for information on the review process.

Type	Reviewers
Concept ACK	fjahr

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

#31296 (wallet: Translate [default wallet] string in progress messages by ryanofsky)
#31250 (wallet: Disable creating and loading legacy wallets by achow101)
#31061 (refactor: Check translatable format strings at compile-time by maflcko)
#30909 (wallet, assumeutxo: Don't Assume m_chain_tx_count, Improve wallet RPC errors by fjahr)
#30343 (wallet, logging: Replace WalletLogPrintf() with LogInfo() by ryanofsky)
#29652 (wallet: Avoid potentially writing incorrect best block locator by ryanofsky)
#28710 (Remove the legacy wallet and BDB dependency by achow101)
#28616 (Show transactions as not fully confirmed during background validation by Sjors)
#27865 (wallet: Track no-longer-spendable TXOs separately by achow101)
#27286 (wallet: Keep track of the wallet's own transaction outputs in memory by achow101)
#25722 (refactor: Use util::Result class for wallet loading by ryanofsky)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

ryanofsky · 2024-06-04T15:34:54Z

I'm not sure, but it seems like a potentially good thing to me that last_block_processed and BESTBLOCK are two distinct concepts.

The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk).

The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded.

It seems like the first commit ae3b828 could lead to a performance regression if it is writing and flushing the BESTBLOCK record on every single blockConnected event from the node. For example, during reindexing it could write and sync the database each time a block is connected, even if the block is before the wallet's birthday or doesn't have any relevant transactions?

I think it probably makes sense to write BESTBLOCK in a smarter way, and probably write it more frequently, but it seems unnecessary to have to write it every block, and it might be bad for performance now or make optimizations harder in the future.

I'm also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn't really seem like the height is used for much yet. Was this intended to be used by something else later?

achow101 · 2024-06-04T15:48:43Z

> The last block processed is the block that CWallet in-memory state has been synced to (particularly CWalletTx::m_state which includes mempool / abandoned information not serialized to disk).

But none of that state is relevant to the last block processed. Any state related to blocks (confirmed or conflicted) is written to disk.

> The BESTBLOCK record is the last block whose data has been flushed to disk, that the wallet should begin syncing from next time it is reloaded.

The point is that this discrepancy can result in skipping blocks. It doesn't make sense to me that we wouldn't store the block the wallet's state has been synced to, and it doesn't make sense that we should rely on a field that means something else to determine what point to start rescanning from on the next load.

If BESTBLOCK doesn't match the chainstate, that's fine because it's a locator and we will just find the fork and sync from there. I don't think it's necessary to record which block we think the chainstate is synced to.
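
To make the "it's a locator and we will just find the fork" point concrete, here is a small Python sketch of the general technique. It is a simplified model, not Bitcoin Core's implementation: block hashes are plain strings, and `build_locator` and `find_fork_height` are hypothetical names. The locator lists hashes from the tip backwards with growing gaps, and the fork point is the first locator entry that is still on the active chain.

```python
from typing import Dict, List, Optional


def build_locator(chain: List[str]) -> List[str]:
    """Return hashes from the tip backwards, with exponentially growing gaps."""
    locator: List[str] = []
    index = len(chain) - 1
    step = 1
    while index >= 0:
        locator.append(chain[index])
        # After the first entries, skip further back each time.
        if len(locator) > 10:
            step *= 2
        index -= step
    # Always end with genesis so a common ancestor can be found.
    if locator and locator[-1] != chain[0]:
        locator.append(chain[0])
    return locator


def find_fork_height(locator: List[str], active_chain: List[str]) -> Optional[int]:
    """Height of the first locator entry that is also on the active chain."""
    heights: Dict[str, int] = {h: i for i, h in enumerate(active_chain)}
    for block_hash in locator:
        if block_hash in heights:
            return heights[block_hash]
    return None
```

With a wallet that last saw a chain ending in two blocks that were later reorged out, `find_fork_height` returns the height of the last shared block, and syncing resumes from the block after it.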

ryanofsky · 2024-06-04T15:56:56Z

> It doesn't make sense to me that we wouldn't store the block the wallet's state has been synced to

You may be right this is the better approach. But I think the previous approach does make some sense, too. You might not want to write to the wallet database each time a block is connected if the block doesn't contain any relevant transactions, especially during reindexing.

achow101 · 2024-06-04T21:03:52Z

> You might not want to write to the wallet database each time a block is connected if the block doesn't contain any relevant transactions

I think you would since not writing it would result in possibly rescanning that block at the next loading which takes a little bit of time, regardless of whether any relevant transactions are in the block.

> especially during reindexing.

Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?

> I'm also curious about the idea of saving height with the best block record. I could imagine this being useful, since IIRC parts of wallet loading are complicated by not knowing heights of transactions before the chain is attached, but it doesn't really seem like the height is used for much yet. Was this intended to be used by something else later?

This was mainly done to avoid looking up the height every time we load, since that was being problematic and causing tests to fail. But it definitely could be more useful in the future.

ryanofsky · 2024-06-05T15:16:32Z

The idea of saving heights is interesting to me because wallet code assumes it knows block heights in many places, but it doesn't actually store heights anywhere. So for example, if the wallet stored a mapping of block hashes to heights (or other block information) it might be useful for allowing the wallet to work in an offline mode or letting the wallet CLI tool have more capabilities. This is all pretty abstract though, so I'm not suggesting something specific.

> Perhaps an easy solution to this is to just write the best block every 1000 blocks (or some other interval) when we are in IBD?

Yes but I think that just adds back the complexity you were trying to get rid of, in a form that seems worse than the status quo. If you implement that, the wallet will be back to tracking the last block processed separately from the best block written to the database. And now, instead of the node determining when data should be flushed to disk and having all wallets flush that data simultaneously, each wallet will have internal logic deciding when to flush. This would be more complicated, and could be worse for power consumption and performance if there are multiple wallets and flushes are happening more frequently at different times.
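
A toy Python model of the interval idea shows why it brings back two separately tracked values. This is only a sketch under assumed names (`ToyWallet`, `block_connected`, `write_interval`), not wallet code: the in-memory last processed height advances on every block, while the persisted best block only advances on interval boundaries during IBD.

```python
class ToyWallet:
    """Hypothetical wallet that throttles best block writes during IBD."""

    def __init__(self, write_interval: int) -> None:
        self.write_interval = write_interval
        self.last_processed = 0      # in-memory sync state
        self.best_block_on_disk = 0  # what a reload would start from
        self.writes = 0              # number of database writes performed

    def block_connected(self, height: int, in_ibd: bool) -> None:
        # The in-memory state always advances.
        self.last_processed = height
        # The on-disk record is written every block outside IBD,
        # but only every `write_interval` blocks during IBD.
        if not in_ibd or height % self.write_interval == 0:
            self.best_block_on_disk = height
            self.writes += 1
```

After 2500 blocks in IBD with an interval of 1000, this model has done two writes and its on-disk record trails the in-memory state by 500 blocks, so each wallet is again deciding on its own when to flush.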

I think overall this PR is doing some good things, but the goals seem either not clear, or not clearly good, so I wonder if maybe you could take these changes and make more targeted PRs for the things you want to achieve?

Or, if this PR is definitely the direction you want to go, I'm happy to review it. I don't like some things it is doing, but overall I think it is a reasonable change.

fjahr

Concept ACK

I can confirm that best block locator and last_processed_block being different is confusing, see also #30455 (comment)

Currently, this needs a rebase, and I'm curious whether @achow101 plans to make further changes based on @ryanofsky's comments.

fjahr · 2024-08-20T11:18:28Z

test/functional/wallet_assumeutxo.py

@@ -84,8 +85,6 @@ def run_test(self):
            assert_equal(n.getblockchaininfo()[
                         "headers"], SNAPSHOT_BASE_HEIGHT)

-        w.backupwallet("backup_w.dat")


I'm thinking that it might make sense to keep this backup where it is and add a second backup created at 199. Then this could test that both cases work as expected, i.e. 199 errors below and 299 doesn't. 299 is an interesting edge case in general (snapshot height == backup height).

ryanofsky · 2024-08-20T14:28:07Z

> I can confirm that best block locator and last_processed_block being different is confusing

I wonder if something else could be done to resolve this confusion other than writing to every wallet file every time a new block is connected, even if the block doesn't contain any relevant transactions. To me, "last block processed" and "last block flushed" seem like different concepts, and we could force them to be the same but only if we:

Give up the flexibility of not needing to flush each wallet each time a block is connected (which seems inefficient during sync)
Give up the ability to do coordinated flushes across the chainstate database, index databases and wallet databases at the same time (which seems like it could waste resources and hurt system performance)

maflcko · 2024-10-16T15:35:56Z

Maybe mark as draft while CI is red?

-BEGIN VERIFY SCRIPT-
sed -i "s/SetLastBlockProcessed/SetBestBlock/" $(git grep -l "SetLastBlockProcessed")
sed -i "s/GetLastBlock/GetBestBlock/g" $(git grep -l "GetLastBlock")
-END VERIFY SCRIPT-

m_last_block_processed and m_last_block_processed_height are combined into a new struct BestBlock.

The only reason to call chainStateFlushed during wallet loading is to ensure that the best block is written. Do these writes explicitly to prepare for removing chainStateFlushed.

The BestBlock struct now contains the block locator, and is also serialized with the block height. For backwards compatibility, the height is optional. The best block hash is retrieved from the block locator when deserializing. Since the best block record has to store the height, additional changes to AttachChain are made to facilitate auto upgrade as well as setting the in-memory best block.

Instead of setting the best block info later during AttachChain, read it during LoadWallet so it is set (if it exists) before we get there.

Instead of calling ReadBestBlock to get the best block locator, use the m_best_block.m_locator that we already have in CWallet.

Instead of writing the best block record directly during a rescan, use SetBestBlock. This is safe because progress is only saved in a rescan when doing a rescan on loading. cs_wallet is held the entire time so new blocks or reorgs will not cause the best block record to be changed unexpectedly.

chainStateFlushed is no longer needed since the best block is updated after a block is scanned. Since the chainstate being flushed does not necessarily coincide with the wallet having processed said block, it does not entirely make sense for the wallet to be recording that block as its best block, and this can cause race conditions where some blocks are not processed. Thus, remove this notification.
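
The skipped-block scenario can be pictured with a very small Python model. It is a sketch under stated assumptions, not a reproduction of the actual race: it assumes the wallet resumes scanning from the block after whatever best block was recorded, and `missed_blocks` is a hypothetical helper. If the recorded block is ahead of what the wallet had really processed when it stopped, the blocks in between are never scanned.

```python
from typing import List


def missed_blocks(processed_height: int, recorded_height: int, tip_height: int) -> List[int]:
    """Heights never scanned if the wallet resumes after `recorded_height`."""
    # Blocks the wallet had actually processed before it stopped.
    scanned_before = set(range(1, processed_height + 1))
    # Blocks scanned after reload, starting after the recorded best block.
    scanned_after = set(range(recorded_height + 1, tip_height + 1))
    return [
        h for h in range(1, tip_height + 1)
        if h not in scanned_before and h not in scanned_after
    ]
```

In this model, recording a best block at or behind the processed height only costs some rescanning, while recording one ahead of it loses blocks. That asymmetry is why the record is tied to block scanning rather than to chainstate flushes.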

DrahtBot added the Wallet label Jun 3, 2024

achow101 mentioned this pull request Jun 3, 2024

wallet: Avoid potentially writing incorrect best block locator #29652

Draft

Thompson1985 approved these changes Jun 4, 2024

This was referenced Jun 4, 2024

refactor: Improve assumeutxo state representation #30214

Open

refactor, wallet: get serialized size of CRecipients directly #30050

Merged

fuzz: wallet: add target for CreateTransaction #29936

Merged

DrahtBot mentioned this pull request Jun 8, 2024

Remove the legacy wallet and BDB dependency #28710

Draft

DrahtBot mentioned this pull request Jun 18, 2024

refactor: Use util::Result class for wallet loading #25722

Draft

DrahtBot mentioned this pull request Jun 27, 2024

wallet, logging: Replace WalletLogPrintf() with LogInfo() #30343

Draft

DrahtBot mentioned this pull request Jul 20, 2024

fuzz: reduce keypool size in scriptpubkeyman target #30494

Merged

DrahtBot added the Needs rebase label Jul 22, 2024

achow101 force-pushed the wallet-no-chainstateflushed branch from e5c9876 to 56c71b8 Compare July 22, 2024 21:13

DrahtBot removed the Needs rebase label Jul 22, 2024

DrahtBot mentioned this pull request Aug 1, 2024

fuzz: improve scriptpubkeyman target #30563

Merged

DrahtBot added the Needs rebase label Aug 12, 2024

furszy mentioned this pull request Aug 20, 2024

wallet: Write best block to disk before backup #30678

Merged

fjahr reviewed Aug 20, 2024

DrahtBot added the Needs rebase label Oct 24, 2024

achow101 force-pushed the wallet-no-chainstateflushed branch from 0967195 to fd3dad7 Compare October 24, 2024 22:36

DrahtBot removed the Needs rebase label Oct 24, 2024

DrahtBot mentioned this pull request Oct 25, 2024

tinyformat: enforce compile-time checks for format string literals #31149

Closed

DrahtBot added the Needs rebase label Oct 25, 2024

achow101 force-pushed the wallet-no-chainstateflushed branch from fd3dad7 to 41e0ad0 Compare October 25, 2024 19:20

DrahtBot removed the Needs rebase and CI failed labels Oct 25, 2024

This was referenced Nov 7, 2024

wallet: Disable creating and loading legacy wallets #31250

Draft

scripted-diff: Type-safe settings retrieval #31260

Open

DrahtBot added the Needs rebase label Nov 15, 2024

achow101 added 10 commits November 19, 2024 12:33

wallet: Update best block record after block dis/connect

2490986

scripted-diff: Rename LastBlock to BestBlock

39bfbc1

-BEGIN VERIFY SCRIPT-
sed -i "s/SetLastBlockProcessed/SetBestBlock/" $(git grep -l "SetLastBlockProcessed")
sed -i "s/GetLastBlock/GetBestBlock/g" $(git grep -l "GetLastBlock")
-END VERIFY SCRIPT-

wallet: Combine last block processed hash and height into a struct

e17cc1f

m_last_block_processed and m_last_block_processed_height are combined into a new struct BestBlock.

wallet: Replace chainStateFlushed in loading with WriteBestBlock

3c951f7

The only reason to call chainStateFlushed during wallet loading is to ensure that the best block is written. Do these writes explicitly to prepare for removing chainStateFlushed.

wallet: Remove WriteBestBlock from chainStateFlushed

bab0d89

wallet: Load best block record in LoadWallet

64facd5

Instead of setting the best block info later during AttachChain, read it during LoadWallet so it is set (if it exists) before we get there.

wallet: Use best block locator in migrate instead of re-reading it

1f209f9

Instead of calling ReadBestBlock to get the best block locator, use the m_best_block.m_locator that we already have in CWallet.

achow101 force-pushed the wallet-no-chainstateflushed branch from 41e0ad0 to d4c7013 Compare November 19, 2024 17:34

DrahtBot removed the Needs rebase label Nov 19, 2024

DrahtBot mentioned this pull request Nov 20, 2024

wallet: Translate [default wallet] string in progress messages #31296

Open

psgreco mentioned this pull request Dec 12, 2024

Potential crash (assert) rescanning wallet #31474

Open

1 task
