fix: Improve disk usage in HA by eagerly deleting received WAL file on the replica by as51340 · Pull Request #3300 · memgraph/memgraph

as51340 · 2025-09-25T08:35:52Z

For a brief period there will be storage_snapshot_retention_count + 1 snapshots in the system, because a new snapshot is created first and only then is the oldest one deleted. This is important for capacity planning on the K8s side.
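For capacity planning, the temporary extra snapshot can be turned into a rough upper bound. The sketch below assumes all snapshots are about the same size; the function name and parameters are hypothetical and are not part of Memgraph.

```python
def peak_snapshot_bytes(snapshot_size_bytes: int, retention_count: int) -> int:
    """Worst-case bytes held by snapshots while a new snapshot is being written.

    Assumption: every snapshot is roughly snapshot_size_bytes in size.
    """
    if snapshot_size_bytes < 0 or retention_count < 1:
        raise ValueError("size must be >= 0 and retention_count must be >= 1")
    # The new snapshot exists on disk before the oldest one is removed, hence the +1.
    return snapshot_size_bytes * (retention_count + 1)
```

With a retention count of 3 and snapshots of about 10 GiB each, the volume should be sized for at least 40 GiB of snapshots, plus whatever WAL files need.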

The replica doesn't manage disk space efficiently:

- Its WAL files and possibly received snapshots never get deleted under normal circumstances (replica up, main up), because WAL files are cleaned up only when a snapshot is created, and replicas don't create snapshots. On the replica they get moved to .old files if SnapshotRpc or WalFilesRpc is received as part of the force reset operation.

WAL files get cleared on the current main only when there are exactly storage_snapshot_retention_count snapshots in the system, including the current one.
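The condition above can be illustrated with a toy model. This is a minimal sketch and not Memgraph's implementation: the class and method names are hypothetical, and it only reports whether WAL cleanup would be triggered, without modelling which WAL files are eligible for deletion.

```python
from collections import deque


class MainRetentionModel:
    """Toy model of snapshot retention on the main (hypothetical, not Memgraph code)."""

    def __init__(self, retention_count):
        if retention_count < 1:
            raise ValueError("retention_count must be >= 1")
        self.retention_count = retention_count
        self.snapshots = deque()  # oldest snapshot on the left

    def create_snapshot(self, name):
        """Add a snapshot, prune the oldest ones, and report whether WAL cleanup runs."""
        # The new snapshot is created first, so the count can briefly exceed the limit.
        self.snapshots.append(name)
        while len(self.snapshots) > self.retention_count:
            self.snapshots.popleft()
        # WAL cleanup is triggered only when exactly retention_count snapshots exist,
        # counting the one that was just created.
        return len(self.snapshots) == self.retention_count
```

In this model, with a retention count of 3, the first two snapshots do not trigger WAL cleanup, and every snapshot from the third onward does.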

The PR adds the following:

- The replica deletes the WAL file it received from the current main.
- A snapshot found to be corrupted is no longer deleted.
- Retention code for WAL files and snapshots is tested and refactored.
- `GetRecoverySteps` code is refactored.
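The first item can be sketched as follows. This is an illustration only, under two assumptions that the PR description does not spell out: the received file is removed after its contents have been applied, and it is kept if applying fails. All names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def apply_and_delete_received_wal(wal_path: Path, apply: Callable[[Path], None]) -> None:
    """Apply a WAL file received from the main, then eagerly remove it from disk."""
    # Assumption: if apply() raises, the file stays on disk for inspection or retry.
    apply(wal_path)
    # The deltas are now applied on the replica, so the file only takes up space.
    wal_path.unlink(missing_ok=True)


def demo():
    """Write a fake WAL file, apply it, and report what was read and whether it remains."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        wal_path = Path(tmp_dir) / "received.wal"
        wal_path.write_bytes(b"delta")
        applied = []
        apply_and_delete_received_wal(wal_path, lambda p: applied.append(p.read_bytes()))
        return applied, wal_path.exists()
```

Deleting right after a successful apply keeps the replica's disk usage independent of snapshot creation, which replicas never perform.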

as51340 · 2025-09-25T08:36:02Z

sonarqubecloud · 2025-09-29T09:14:28Z

Please retry analysis of this Pull-Request directly on SonarQube Cloud


as51340 self-assigned this Sep 25, 2025

as51340 force-pushed the improve-durability-retention branch from 7b2e023 to f642567 Compare September 26, 2025 08:36

as51340 changed the title ~~refactor: Snapshot retention code~~ feat: Improve disk usage in HA Sep 26, 2025

as51340 force-pushed the improve-durability-retention branch from f642567 to 59a3241 Compare September 26, 2025 08:39

as51340 added the feature, Capability - high-availability, and Docs needed labels Sep 26, 2025

as51340 mentioned this pull request Sep 29, 2025

docs: HA system configuration memgraph/documentation#1416

Merged

11 tasks

as51340 added this to the mg-v3.6.0 milestone Sep 29, 2025

as51340 added the bug label and removed the feature label Sep 29, 2025

as51340 requested a review from andrejtonev September 29, 2025 09:05

as51340 marked this pull request as ready for review September 29, 2025 09:05

as51340 changed the title ~~feat: Improve disk usage in HA~~ bugfix: Improve disk usage in HA by eagerly deleting received WAL file on the replica Sep 29, 2025

as51340 added 5 commits September 29, 2025 11:08

refactor: Retention of snapshots on the main

34c53d8

refactor: Retention of WAL files on the main

611a3d0

refactor: Improve GetRecoverySteps

a85e5ef

refactor: Avoid using optional

455f9dd

feat: Delete received durability file on replica

71e5fd6

as51340 force-pushed the improve-durability-retention branch from b4f5e4f to 71e5fd6 Compare September 29, 2025 09:08

matea16 mentioned this pull request Sep 29, 2025

Memgraph 3.6 memgraph/documentation#1382

Merged

50 tasks

as51340 changed the title ~~bugfix: Improve disk usage in HA by eagerly deleting received WAL file on the replica~~ fix: Improve disk usage in HA by eagerly deleting received WAL file on the replica Sep 30, 2025

andrejtonev approved these changes Sep 30, 2025


as51340 added this pull request to the merge queue Sep 30, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Sep 30, 2025

as51340 added this pull request to the merge queue Oct 1, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 1, 2025

as51340 added this pull request to the merge queue Oct 1, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 1, 2025

as51340 added this pull request to the merge queue Oct 1, 2025

Merged via the queue into master with commit 1293ccf Oct 1, 2025
36 checks passed

as51340 deleted the improve-durability-retention branch October 1, 2025 11:58

as51340 merged 5 commits into master from improve-durability-retention
