Skip to content

Conversation

@ikus060
Copy link
Contributor

@ikus060 ikus060 commented Apr 25, 2025

Changes done and why

  • Add implementation for a new selector to ignore unreadable files.
  • Make use of this new selector in fs_abilities to avoid raising permissions error

Self-Checklist

  • changes to the code have been reflected in the documentation
  • changes to the code have been covered by new/modified tests
  • commit contains a description of changes relevant to users prefixed by DOC:, FIX:, NEW: and/or CHG:

@ikus060
Copy link
Contributor Author

ikus060 commented Apr 25, 2025

@ericzolf If you agree with the principle, I will write a unit test to cover the new "--exclude-unreadable" selector.

@ikus060
Copy link
Contributor Author

ikus060 commented Apr 25, 2025

@ericzolf Sorry for the long list of small commits. At least everything is split into manageable chunks.

The general idea here is to avoid calling os.lstat() on files that are excluded by --exclude ... and not re-included by any other pattern.

  1. This reduces the number of I/O calls.
  2. When done properly, it prevents permission errors or access denied issues when trying to retrieve metadata from excluded folders.

Your input is appreciated, as always.

Just a heads-up: I’m heading off on vacation and will be back in late May.

@ikus060 ikus060 force-pushed the patrik-fix-access-error branch from dc6d9f5 to ab69d88 Compare April 30, 2025 18:23
@ericzolf
Copy link
Member

ericzolf commented Jun 5, 2025

Before I review this PR, can you clarify why the PR merges into a branch which isn't master?

@ikus060
Copy link
Contributor Author

ikus060 commented Jun 6, 2025

Before I review this PR, can you clarify why the PR merges into a branch which isn't master?

I wanted to chain the pull requests because one change depends on the other. However, the changes are unrelated in terms of functionality. I'm not sure how to properly manage this kind of situation in GitHub.

@ericzolf
Copy link
Member

ericzolf commented Jun 7, 2025

I generally keep it against master, put a note of the "chaining" for the reviewer, and potentially put it in draft mode until the base PR is merged.

Returns (rpath, num) where num == 0 means rpath should be
generated normally, num == 1 means the rpath is a directory
and should be included iff something inside is included.
and should be included if something inside is included.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and should be included if something inside is included.
and should be included iff something inside is included.

Ha, I remember also as I struggled with this iff, but it's legit and means "if and only if", see https://en.wikipedia.org/wiki/If_and_only_if

@ericzolf
Copy link
Member

ericzolf commented Jun 7, 2025

I need to take a deeper look at the changes you've made, they're quite extensive, but one thought goes through my mind: why do we need an exclude-unreadable option? If a file is unreadable, we won't anyway be able to backup it (and rdiff-backup wouldn't choke on it, just send a warning). Is it only about avoiding warnings? Then I'd like another option and even approach, but we can talk about once you've clarified the purpose.

@ikus060
Copy link
Contributor Author

ikus060 commented Jun 7, 2025

I need to take a deeper look at the changes you've made, they're quite extensive, but one thought goes through my mind: why do we need an exclude-unreadable option? If a file is unreadable, we won't anyway be able to backup it (and rdiff-backup wouldn't choke on it, just send a warning). Is it only about avoiding warnings? Then I'd like another option and even approach, but we can talk about once you've clarified the purpose.

Thanks, Eric, for taking the time to review.

To answer your question—yes, the main goal is to avoid raising unnecessary warnings during the backup process. That was the initial motivation for this change.

However, while working on it, I noticed that rdiff-backup calls lstat() even on files that are excluded. This shouldn't be necessary when the exclusion is based on filename patterns. In my scenario, this has a negative impact on performance because lstat() is still called on every file, even those that are excluded.

@ericzolf
Copy link
Member

ericzolf commented Jun 7, 2025

On avoiding lstat, fine with me on the principle, I just need to find a bit more time to review it properly.

On the exclude option, I don't like it so much:

  1. it gives the false impression that something is excluded which isn't normally
  2. for avoiding warnings, looking at [ENH] don't return a warning when there are no increments to remove #1071, I was thinking about an option like --ignore-warnings all resp --ignore-warnings unreadable,no-increment and/or --ignore-warnings unreadable --ignore-warnings no-increment, with of course the capability to add further warnings to ignore.

Would it be possible to split both things and add the ignore option in another PR?

Base automatically changed from patrik-tidy-up to master June 7, 2025 12:43
@ikus060
Copy link
Contributor Author

ikus060 commented Jun 7, 2025

I will split the PR in multiple pieces to make everything clear. That I agree.

I like the idea of --ignore-warnings unreadable,no-increment, but the main reason why I wanned to implement an "exclude-unreadable" is to exclude unreadable files from fs_availabilities check. On line 600, fs_availabilities uses a selection.Select to search for a regular file. While doing this on a root filesystem where multiple files are not readable, it raises warnings.

I'm open to suggestions.

ikus060 added 2 commits June 8, 2025 19:36
To reduce io calls and to avoid calling os.lstat on excluded files, run the selection functions in two pass. First pass only with filenames selection.

Signed-off-by: Patrik Dufresne <[email protected]>
- Add implementation for a new selector to ignore unreadable file.
- Make use of this new selector in fs_abilities to avoid raising permissions error

Signed-off-by: Patrik Dufresne <[email protected]>
@ikus060 ikus060 force-pushed the patrik-fix-access-error branch from ab69d88 to abe8355 Compare June 9, 2025 00:24
@ikus060
Copy link
Contributor Author

ikus060 commented Jun 9, 2025

@ericzolf Let me recap everything regarding these changes.

The main issue I’m trying to resolve is the following:

On Windows, when configured to back up "C:/", rdiff-backup runs its fs_abilities checks starting from "C:/". This causes several warnings and errors to be logged, especially during the _detect_resource_fork_readonly check.

This particular check attempts to determine if the filesystem supports resource forks. To do so, it needs to find a readable regular file on the filesystem. The current implementation relies on the default selection function to locate such a file, using:

selection.Select(dir_rp).get_select_iter()

However, this traverses the directory tree and, as a side effect, triggers warnings and errors for unreadable or locked files:

WARNING: ListError: 'C:/Users/REDACTED/NTUSER.DAT' [Errno 13] Permission denied: b'C:/Users/REDACTED/NTUSER.DAT'
* Unable to read ACL from path C:/hiberfil.sys due to exception '(32, 'GetNamedSecurityInfo', 'The process cannot access the file because it is being used by another process.')'
* Unable to read ACL from path C:/swapfile.sys due to exception '(32, 'GetNamedSecurityInfo', 'The process cannot access the file because it is being used by another process.')'

This behavior is both annoying and confusing for users reviewing the logs.

To mitigate this, my proposed solution in this pull request was to configure the selection function with the following options: exclude-special-files, exclude-unreadable, and exclude-other-filesystems. This reduces noise and avoids triggering these errors during capability checks.

That said, I acknowledge that this might not be the ideal fix. In hindsight, relying on the selection function inside fs_abilities was probably a design oversight. However, reimplementing a separate mechanism to walk the directory hierarchy just to find a single readable file also seems excessive and redundant. That explain, why I implemented this "exclude-unreadable"...

I hope this make thing more clear.

P.S.: This Pull Request depends on #1074

@ericzolf
Copy link
Member

Could we close this PR and wait for the implementation of #1078 ?

@ikus060
Copy link
Contributor Author

ikus060 commented Jul 3, 2025

Give me some time to test it end to end in Minarca.

I will mark this PR as draft for now.

@ikus060 ikus060 marked this pull request as draft July 3, 2025 12:24
@ikus060
Copy link
Contributor Author

ikus060 commented Jul 11, 2025

This is link to #1064

It all depends how we want to solve the error raised by resource_forks detection.. My initial idea was to create this new exclude and use it in resources forks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants