fix: Improve disk usage in HA by eagerly deleting received WAL file on the replica #3300
Merged
andrejtonev approved these changes Sep 30, 2025
as51340 added a commit that referenced this pull request Oct 24, 2025
…n the replica (#3300)

For a brief period there will be `snapshot_retention_count + 1` snapshots in the system, because a new snapshot is created first and only then is the oldest one deleted. This matters for capacity planning on the K8s side.

The replica doesn't manage disk space efficiently:
- Its WAL files, and possibly received snapshots, never get deleted under normal circumstances (replica up, main up), because WAL files are cleaned only when a snapshot is created and replicas don't create snapshots. On the replica they get moved to .old files if SnapshotRpc or WalFilesRpc is received as part of the force reset operation.

WAL files get cleared on the current main only when there are exactly `storage_snapshot_retention_count` snapshots in the system, including the current one.

The PR adds the following:
- The replica deletes the WAL file it received from the current main.
- A snapshot found to be corrupted is no longer deleted.
- Tested and refactored the retention code for WAL files and snapshots.
- Refactored the `GetRecoverySteps` code.
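To make the capacity-planning point concrete, here is a minimal Python sketch of the create-then-prune behaviour described above. It is a toy model, not the project's code: `SnapshotStore`, `create_snapshot`, and `peak_snapshot_bytes` are hypothetical names, and the assumption that all snapshots have the same size is mine.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """Toy model of create-then-prune snapshot retention (hypothetical, not project code)."""
    retention_count: int  # stands in for storage_snapshot_retention_count
    snapshots: list = field(default_factory=list)
    peak: int = 0         # highest number of snapshots present at the same time
    next_id: int = 0

    def create_snapshot(self) -> bool:
        """Create a snapshot, prune the oldest ones, and report whether
        WAL cleanup would be allowed afterwards."""
        # Step 1: the new snapshot is written before anything is deleted.
        self.snapshots.append(self.next_id)
        self.next_id += 1
        self.peak = max(self.peak, len(self.snapshots))
        # Step 2: only now are the oldest snapshots removed.
        while len(self.snapshots) > self.retention_count:
            self.snapshots.pop(0)
        # Per the description, WAL files are cleared only when exactly
        # retention_count snapshots exist, including the current one.
        return len(self.snapshots) == self.retention_count


def peak_snapshot_bytes(retention_count: int, snapshot_bytes: int) -> int:
    """Disk space to budget for snapshots, assuming equally sized snapshots."""
    return (retention_count + 1) * snapshot_bytes


store = SnapshotStore(retention_count=3)
wal_cleanup = [store.create_snapshot() for _ in range(5)]
```

With a retention count of 3, the model briefly holds 4 snapshots, so a volume sized for exactly 3 snapshots could run out of space during snapshot creation. In this model, WAL cleanup also becomes possible only from the third snapshot onward.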