WIP: Allow zero repo-archive-retention#1926
WIP: Allow zero repo-archive-retention#1926pgstef wants to merge 11 commits intopgbackrest:integrationfrom
Conversation
… when backup is running
# Conflicts: # src/common/lock.c
dwsteele
left a comment
There was a problem hiding this comment.
Did an initial review of this and there are some issues we'll need to address, see comments.
src/command/archive/push/file.c
Outdated
| cfgOptionIdxName(cfgOptRepoRetentionArchive, repoData->repoIdx)); | ||
| destinationCopy[repoListIdx] = false; | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Do we want a warning here? Since this is an intentional setting I don't think we want to fill the log with warnings.
There was a problem hiding this comment.
We currently not report any missing wal file in the info command (nor in the verify command the last time I tried). So perhaps not warning but perhaps detail (just like we have for the copied files in the backup command)?
There was a problem hiding this comment.
Changed to detail in 3b806c6.
This seems important to note that the .backup files aren't pushed either 🤔
There was a problem hiding this comment.
.backup files are pretty useless so I'm OK with that. One problem: this detail messages is not going to make it into the main log. It will only show up when logging with --log-subprocess. This is why the warnings are passed back to the main process to be output there.
There was a problem hiding this comment.
Hm, that's weird because it's not what I saw during the tests (explicitly setting --no-log-subprocess) 🤔
P00 INFO: archive-push command begin 2.44dev: [pg_wal/000000010000000000000016] --exec-id=37899-96f2d7d8 --log-level-console=detail --log-level-file=detail --no-log-subprocess --pg1-path=/var/lib/pgsql/15/main/data --process-max=2 --repo1-cipher-pass=<redacted> --repo2-cipher-pass=<redacted> --repo1-cipher-type=aes-256-cbc --repo2-cipher-type=aes-256-cbc --repo1-host=172.18.0.2 --repo2-host=172.18.0.2 --repo1-host-user=pgbackrest --repo2-host-user=pgbackrest --repo1-path=/repo1 --repo2-path=/repo2 --repo1-retention-archive=0 --stanza=ro8pg
P00 DETAIL: '000000010000000000000016' skipped for repo1, because option 'repo1-retention-archive' is set to 0
There was a problem hiding this comment.
It has to do with the way local processes are forked off in unit tests. It works differently when running pgbackrest for real, i.e. you won't get the logs for the subprocess in the main log.
|
|
||
| // Store the repo idx inside the lock file | ||
| lockWriteDataP(lockTypeBackup, .repoIdx = VARUINT(cfgOptionGroupIdxDefault(cfgOptGrpRepo))); | ||
|
|
There was a problem hiding this comment.
This is not going to work as expected for remote backups since updates to the lock file are only written locally.
There was a problem hiding this comment.
Hm, that's annoying 😢
I admit I didn't tested on repo-hosts yet. But the all point of this is to only push to the repo where the backup is running (so offline repos don't affect backups on online repos) 🤔
There was a problem hiding this comment.
We need to push the lock file to the pg host where archiving is happening or it will never archive while archive-retention=0.
| { | ||
| // For each possible repo, check and adjust the settings as appropriate | ||
| for (unsigned int optionIdx = 0; optionIdx < cfgOptionGroupIdxTotal(cfgOptGrpRepo); optionIdx++) | ||
| // !!! NEED CHECK HERE THAT IF ARCHIVE RETENTION IS 0 THEN ARCHIVE-CHECK MUST BE ENABLED WHEN VALID |
There was a problem hiding this comment.
There is a data loss scenario where repo-archive-retention=0 on the pg host and archive-check=n on the repo host. In this case the backup lock will be released too early and WAL segments will probably be lost.
This will be tricky because we somehow need to figure out what the settings for the archive-push process are, not the settings that the backup remote gets. I have thought about this for a bit and I don't see a way to solve it except "don't do that."
There was a problem hiding this comment.
In that case, it makes sense to require archive-check to be enabled imo 👍
There was a problem hiding this comment.
Yes, but this check needs to be done across hosts so that makes it pretty complicated. In other words we need to make sure that archive-check is enabled on the repo host when repo-retention-archive is enabled on the pg host. I don't see any obvious way to do this.
|
Hi, can i please get an update on this patch . |
@vkaplan40 This patch requires a new locking scheme and I have not come up with anything I'm happy with. I'm planning to look at locking again in the next release or two, but no guarantees. |
|
@dwsteele - if I can add my vote here, we REALLY need this feature!!! |
There are some valid use-cases where someone might want to disable archiving for a specific repository.
With
repo-retention-archiveset to 0, the archives are pushed only when the check or backup commands are running. That required to add a check lock.For the multi-repo context, I added the repoIdx into the backup lock, using that info to skip archiving on disabled repos (except the one where the backup is running). Choosing this path could let us add more info to the lock file later if we want.
Imho, since backupStart happens after the lock update inside the backup command, archive-push wouldn't skip archiving the starting wal but I might not be imaginative enough 🤔
Previously the expire command kept all archives after the newest backup. If i.e. the check command is executed, we'd have a broken chain of archives in the disabled repo. So I've updated the expire command to be able to remove those.
If this seems like a good approach, I'll gladly add the tests 😉
Many thanks in advance for the review and advice 😊