-
Notifications
You must be signed in to change notification settings - Fork 247
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
abort on filesystem loop #273
Conversation
Linked issue: #259 |
Sounds fair to me, go for it! |
Great! This has been long in the coming. I don't follow the reasoning here, though:
So I thought I'd add my two computer science cents. Finding duplicates or even cycles in a directed graph (which a file system with symlinks is) is (almost) as fast as exploring that graph; in fact, you can do it along the way. All you need is a boolean flag per (real) file which indicates whether you've been there before. So this part of the argument is not quite accurate. Is usability really such a concern? Are there real disadvantages to using -- per convention -- always the real path or the shortest path or the "most local" path or ... of a given watched directory/file? You can even collect and store all non-cyclic paths for reporting (not for watching!) and duplicate reports; the overhead incurred is the user's responsibility. What I'm trying to say is: algorithmically, there is no big issue at all with duplicate symlinks and cycles. It's easy (and efficient) to detect either and filter and/or normalise paths. (In fact, you implicitly do this already with all the paths containing |
The problem isn't detecting duplicates - it's deciding how to easily watch directories recursively (technical constraint) and continously (regardless of dirs being added while scanning) in a way to avoid watching the same directory more than once.
That's what the patch already does (to know when to abort with an error).
Yes, because e.g. Example: technically users expect that changing How do you know whether the user is editing Well, you don't because the filesystem doesn't provide that information (except maybe when polling - which creates multiple events, so problem type is OS-specific).
Yes. Human beings don't like using redundant and lengthy paths, e.g. I run Those are user expectations - not meeting them means frustration, issues, etc. The problem is: system events use full paths, so currently I just try to find how the full path is relative to the current one (not ideal, but better than a full path, because at least it's tied to a specific watched directory). This patch, though, opens up the possibility of transforming a real path back into the original symlink (sic!).
This is exactly what I was aiming for - it's the users responsibility to keep their filesystem and directory structure reasonable, and the error makes them aware they are watching more files and directories than necessary.
Algorithmically, the challenge is doing the reverse (having a resolved file and asynchronously discovering/guessing which symlink the event originated from).
Trust me, in Listen it isn't efficient:
And, it isn't easy:
In short, Listen already allows too much and does too much (just presenting my view based on many hours of debugging issues in Listen). Handling more than one "reference" (dir or symlink) in a watched directory is like expecting an OS to continue working reliable on unreliable, unpredictable and faulty hardware (getting more reliable hardware will almost be a faster/cheaper option). If only Listen could get away with doing a lot less... ;) |
044ef28
to
c7b9a75
Compare
One thought: With this commit listen assume that the folder listened to is under the control of the person initiating listen. Eg. Listen is now useless for watching a users directory where the developer has no control of what is going on. Idea: How about adding a "no symlink" mode which simply ignores symlinks? |
I assume the user can choose which folders can be watched - in that case, it's just a matter of making a "unique list" of directories - and pass them to Listen. Of course, this is slightly troublesome to build a "correct" list of directories - and listen could do this for you automatically. Which is exactly what this feature request is for: #274 (Pull requests are welcome - I'll help answer questions, etc.). |
Just ran into this. It would be nice if Listen somehow resolved this instead of just refusing to continue. Is there no way to just not continue following symlinks if it finds a loop? I'd also be fine with it not following symlinks at all, if that's easier. |
@e2 Ah, hadn't seen that one. Any suggestions on where I should look first if I wanted to poke around? |
Yes! I've added 4 points in the description here: #274 |
I feel that guard/listen is a bit overcomplicated, so I made a minimalistic replacement for my use case. It has way less features, and is not as accurate, but it gets the job done at a significantly reduced cpu cost, at least on osx. It might be worth having a look: https://github.com/stephan-nordnes-eriksen/feather_watch |
Yes, that's because of lots of trial and error getting everything working: https://github.com/guard/listen/issues?page=3&q=is%3Aissue+is%3Aclosed I tried to make it simpler, but unfortunately every bit of code has a strong reason to stay there. Also, there are many people already relying on how listen works (also very, very old versions), so API compatibility is very important as well - especially since it's used by Guard. Listen 3.x is still in the pipeline, and one planned feature is to give users a way of getting a "lightweight" setup (using the adapters more directly) and where you could choose the features you're interested in. Personally, I'd just use rb-inotify directly and never have a single care for OSX or Windows users - because rb-inotify needs the least amount of resources and provides enough functionality to be used alone (WDM comes second place). As for the file events you're using: https://github.com/thibaudgg/rb-fsevent#fileevents You may want to checkout vagrant-gatling-rsync, which kind of implements what you need.
This is a deal breaker because of editor support, even with rb-inotify: https://github.com/guard/guard/wiki/Analysis-of-inotify-events-for-different-editors. E.g. with some Vim settings, you'll get a file move and a file delete and ... nothing else (you won't detect changes on the actual file saved). I think nothing frustrates users more than saving a file in an editor which either doesn't trigger a file event, or triggers multiple events.
OSX is one area I can't help much with - so if you can get editor support working flawlessly on it, I could help adapt Guard to use your library, etc. |
No doubt there is a lot of work behind listen. I would be an honour if you decided to use my library within listen. It is really just as close to pure file system events as possible by using existing libraries (rb-inotify, wdm, and rb-fsevent). Not much more. As for editor support. I am not 100% sure what that implies. Maybe there is some magic that should be added, but I honestly don't know what that should be because you get (as far as I know) all the file system events created, but sometimes you get only 2 if there is multiple in quick succession, the first and the last. On osx, well, I might dig around in rb-fsevent to see if I am amble to append the actual events, as it is the missing link for both listen and feather_watch. Only issue is that it is about 5 lines of ruby, and the rest is magically conjured into existence from some c code. |
Compatibility is more important for me (since that's where more of the problems and challenges are). If you are willing to actively support feather_watch, deal with issues, etc., and you strongly believe many people would benefit from using it, I'd suggest you first integrate it with listen-compat. Ideally, users could just use an environment variable like Basically, listen-compat is supposed to be an "wrapper" for any available/installed listen version, but that also means it could easily support any library providing the same interface (see README). Feel free to send pull requests to have it integrated.
If I were you, I'd definiately go through all the closed issues in Listen - because there's too much reinventing the wheel in terms of potential problems. Also, the lack of Polling mode is pretty much a deal breaker on virtual or special filesystems or with complex trees with symlinks.
It means I save a file, and e.g. Guard, or Sass or Jekyll causes no more and no less than one "changed" event, regardless if the editor uses move_to, atomic save, rename, temporary files, backup files, etc. This also means ignoring editor backup files (Listen has a list of builtin defaults). Sometimes and editor uses multiple events to "change" a file - this should be reported as one change within the given time interval (see
E.g. On Linux, events have a "cookie" option, which says which operations are related. Listen uses this information to "cancel out" events logically, e.g. removing a file and adding it again is translated into a "changed" event, because although the file was removed it exists, and although it was added, it isn't a new file (because it was deleted).
The problem with Listen is not so much performance, as people expect "1 saved file" to generate "1 event". Personally, when I need performance, I just use the rb-inotify library directly. |
- Move Gemfile, Gemfile.lock, deployment.rb out of project root via symlink (cf. guard/listen#273)
@rymai, @thibaudgg - I think this is pretty drastic, so I wanted some feedback before thinking of releasing this.
TL;DR With this patch - Listen aborts (!) if you symlink a directory to another directory within the project
There are 3 scenarios:
normal symlink - when there's a symlink to a directory outside the watched dir, e.g.
foo/bar -> /baz/bar # outside watched dir
duplicate watched dir - when a symlink points to a directory WITHIN the watched dir
Bad, i.e. filesystem loop (and e.g.
find -L
also halts on this)foo/bar/baz -> ../..
The patch
This patch makes Listen abort on
3
(obviously), but also aborts on2
, which I think is ok, because either the user resolves the ambiguity (see below), or complex resolving would have to be implemented in Listen (pointless waste of time).Best solution
The "best" solution is for users to set watchdirs in
Listen
in such a way, that2
is avoided.Biggest challenge with symlinks:
Intuitively, users "expect" Listen to do a "REVERSE symlink resolution" and ... return ONLY one path/event!
E.g., given the user edits:
project/foo/bar/hello.rb
, andproject/foo
links to-> project/vendor/baz
, and listen watchesproject
then what should Listen report as modified?a) what the user edited? (
foo/bar/hello.rb
)b) or what physically changed relative to the watched dir ? (
vendor/baz/bar/hello.rb
)Of course, option a) is most "user friendly" and "expected", but it gets hairy if you have multiple symlinks pointing to
vendor/baz/bar
.My stance on this
I'm in favor of not letting users watch the same physical directory multiple times (files are ok) because of:
Of course it would be up to the user to make sure there are no loops or symlinks - and I believe that's how it should be, because Listen cannot "lock" the filesystem for itself or find out what symlink was used when a file changed.
Semver
At the very least I can release a 3.0 of Listen.
What do you think?