Glob complexity is Quadratic on directory depth #106
Comments
@nh2 Perhaps you'd be interested to take a look and see if you can understand why the performance is quadratic.
The complexity appears to be coming from the call to the zipfile namelist. If I add this patch:

zipp main @ git diff
diff --git a/tests/test_complexity.py b/tests/test_complexity.py
index 67e9c17..7b91505 100644
--- a/tests/test_complexity.py
+++ b/tests/test_complexity.py
@@ -39,7 +39,9 @@ class TestComplexity(unittest.TestCase):
         for path, name in pairs:
             zf.writestr(f"{path}{name}.txt", b'')
         zf.filename = "big un.zip"
-        return zipp.Path(zf)
+        res = zipp.Path(zf)
+        res._saved_namelist = res.root.namelist()
+        return res
 
     @classmethod
     def make_names(cls, width, letters=string.ascii_lowercase):
@@ -81,6 +83,7 @@ class TestComplexity(unittest.TestCase):
             max_n=100,
             min_n=1,
         )
+        breakpoint()
         assert best <= big_o.complexities.Quadratic
 
     @pytest.mark.flaky
diff --git a/zipp/__init__.py b/zipp/__init__.py
index a1b9884..e62dc05 100644
--- a/zipp/__init__.py
+++ b/zipp/__init__.py
@@ -399,7 +399,7 @@ class Path:
         prefix = re.escape(self.at)
         tr = Translator(seps='/')
         matches = re.compile(prefix + tr.translate(pattern)).fullmatch
-        return map(self._next, filter(matches, self.root.namelist()))
+        return map(self._next, filter(matches, self._saved_namelist))
 
     def rglob(self, pattern):
         return self.glob(f'**/{pattern}')

The result comes back as Constant time (in one test; it's probably Linear).
The problem is that ZipFile.namelist constructs a new list, which is apparently quadratic in the length of the filelist. Bypassing that list construction restores the expectation of linear or better performance.
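To make the "constructs a new list" point concrete, here is a minimal sketch using only the standard library's zipfile.ZipFile. The build_archive helper and the saved_namelist variable are hypothetical names for illustration. zipp.Path may wrap the archive in its own ZipFile subclass that does more work per call, which this sketch does not exercise.

```python
import io
import zipfile


def build_archive(depth, width=1):
    """Build an in-memory zip with `width` empty files at each of `depth` nested levels."""
    zf = zipfile.ZipFile(io.BytesIO(), mode="w")
    for level in range(1, depth + 1):
        prefix = "".join(f"d{i}/" for i in range(level))
        for n in range(width):
            zf.writestr(f"{prefix}f{n}.txt", b"")
    return zf


zf = build_archive(depth=5)

# Each call to namelist() returns a newly built list with the same contents.
first = zf.namelist()
second = zf.namelist()
fresh_list_each_call = first is not second and first == second

# Hypothetical workaround in the spirit of the patch: snapshot once, then reuse.
saved_namelist = zf.namelist()
matches = [name for name in saved_namelist if name.endswith(".txt")]
```

Reusing the snapshot only helps if the archive is not modified afterwards, so it is a diagnostic aid here rather than a proposed fix.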
In #105, this project re-worked the glob functionality. In that effort, I found that in test_glob_depth, the best complexity was never better than Quadratic. That's why I wrote test_baseline_regex_complexity to show that the regex is Constant on the length of the path, which means it should be linear in the number of paths.

It's probably not important, but I'd like to get a good answer for why the test performance isn't better than Quadratic.
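The expectation of linear behavior can be sketched as follows: if a single compiled regex is applied once per name, the number of regex evaluations grows with the number of names. This is an illustrative stand-in, not the project's implementation; glob_names is a hypothetical helper, and the regex is hand-written rather than produced by zipp's Translator.

```python
import re


def glob_names(names, prefix, pattern_regex):
    """Filter names with one compiled regex; return (matches, number of regex calls)."""
    fullmatch = re.compile(re.escape(prefix) + pattern_regex).fullmatch
    calls = 0
    matched = []
    for name in names:
        calls += 1  # exactly one regex evaluation per name
        if fullmatch(name):
            matched.append(name)
    return matched, calls


names = [f"dir{i}/file.txt" for i in range(100)]
names += [f"dir{i}/notes.md" for i in range(100)]

# Hand-written regex standing in for a translated '*/*.txt' glob (an assumption).
matched, calls = glob_names(names, prefix="", pattern_regex=r"[^/]*/[^/]*\.txt")
```

Counting calls rather than timing them keeps the sketch deterministic. It shows only that the filtering step itself does one match per name, so any extra cost would have to come from elsewhere, such as producing the list of names.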