numpy.testing accounts for almost 30 % of import time #11457
Interesting. I quickly looked through the main `__init__.py`. There would seem to be two options:
The latter would seem a much lighter change (and one could obviously leave an import in its current location), but let me cc @charris (who can perhaps ping others who understand this part better). [1] from
[2] almost everything from
|
IIRC we wanted to deprecate the automatic import of `numpy.testing`. |
Can we fix this by defining it lazily?
That way, only users and subclassers will incur the import cost. |
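Read one way, the suggestion could look like the sketch below; the `Tester` entry point mirrors numpy's old nose-based `numpy.testing.Tester`, and the details are illustrative rather than numpy's actual fix:

```python
# hypothetical numpy/__init__.py fragment: instead of eagerly doing
#   from numpy.testing import Tester; test = Tester().test
# defer the heavy import into the function body, so only callers of
# numpy.test() (and subclassers who need Tester) pay for it
def test(*args, **kwargs):
    from numpy.testing import Tester  # deferred until first call
    return Tester().test(*args, **kwargs)
```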
There are two issues: (1) there's lots of code that assumes it can do a bare `import numpy` and then use `numpy.testing` directly. For the lazy import, the portable way is to use https://github.com/njsmith/metamodule/; this lets you basically define lazily computed attributes on the module object. metamodule is very clever about trying to avoid adding extra overhead to regular attribute lookup; I think it adds only a little bit. |
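For concreteness, the `__class__`-swap trick that metamodule automates looks roughly like this; a sketch assuming `numpy.testing` is the attribute being deferred, not metamodule's actual code:

```python
# sketch: replace this module's class with a ModuleType subclass whose
# property performs the import on access
import sys
import types

class _LazyAttrModule(types.ModuleType):
    @property
    def testing(self):
        # a property is a data descriptor, so this body runs on each lookup;
        # after the first time, `import` is just a cheap sys.modules hit
        import numpy.testing
        return numpy.testing

# reassigning a module's __class__ works on CPython 3.5+; this machinery is
# also where extra per-attribute-lookup overhead can creep in
sys.modules[__name__].__class__ = _LazyAttrModule
```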
Interesting. How much is "a little bit"? |
On my machine/configuration, running `python3.7 -X importtime -c 'import numpy'`: [importtime transcript omitted]
Numpy version: 1.14.5 |
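`-X importtime` writes lines of the form `import time: <self us> | <cumulative us> | <module>` to stderr, so the worst offenders can be ranked with a few lines of Python (a hedged helper script, not part of any numpy tooling):

```python
# sort_importtime.py: rank modules by self-import-time from an importtime log
# usage: python3.7 -X importtime -c 'import numpy' 2> log && python sort_importtime.py log
import sys

rows = []
with open(sys.argv[1]) as fh:
    for line in fh:
        if not line.startswith("import time:"):
            continue
        self_us, cumulative_us, module = line[len("import time:"):].split("|")
        try:
            rows.append((int(self_us), module.strip()))
        except ValueError:
            pass  # skips the header line ("self [us] | cumulative | ...")

for self_us, module in sorted(rows, reverse=True)[:15]:
    print(f"{self_us:>10} us  {module}")
```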
Of which almost 100 % is from |
Graphically: https://transfer.sh/tUOuP/time.pdf |
Another graph using tuna. Reproduce with the commands sketched below.
|
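tuna's documented workflow is along these lines (the log filename is arbitrary; `-X importtime` writes to stderr, hence the redirect):

```sh
python -X importtime -c "import numpy" 2> numpy_imports.log
tuna numpy_imports.log   # opens an interactive visualization in the browser
```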
Tab completion is fine – all the mechanisms that let you override module attribute lookup also let you customize what `dir()` reports.
On Python 3, it's ~100 ns per attribute lookup, and on Python 2 it's only ~10 ns per attribute lookup. [timeit transcripts for Python 3.6 and Python 2.7 omitted]
Unfortunately, when I was originally testing this it was on Python 2, so I didn't notice the issue. The underlying issue is probably not too hard to fix in CPython, but that wouldn't help until 3.8 comes out. I haven't tried timing the new module-level `__getattr__`. |
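A rough way to get numbers of that flavor is to compare attribute lookup on a plain module object against lookups on a ModuleType subclass with a lazy property; an illustrative Python 3 microbenchmark (not the original one, and absolute figures depend on interpreter and machine):

```python
# compare: plain module attribute, plain attribute on a subclassed module,
# and an attribute routed through a lazy property
import timeit

setup = """
import types

plain = types.ModuleType("plain")
plain.x = 1

class Lazy(types.ModuleType):
    @property
    def x(self):
        return 1

lazy = Lazy("lazy")
lazy.y = 1  # a regular attribute living next to the property
"""

n = 10**6
for stmt in ("plain.x", "lazy.y", "lazy.x"):
    total = timeit.timeit(stmt, setup=setup, number=n)
    print(f"{stmt}: {total / n * 1e9:.0f} ns per lookup")
```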
Wouldn't one use the |
@mhvk I assume you meant |
Yes. But perhaps we can also consider the even less intrusive option of moving |
I would vote for creating a clean path for importing only the minimum of NumPy needed to use the multiarray container, without any of testing, polynomial, ma, financial, fft, ... Users can always |
I think that's a non-starter. And all that for a quite minor problem: both a ~80 ms import time and PyInstaller packaging are relevant for only a very small subset of users. And once those users use anything more than bare numpy (e.g. scipy, pandas, matplotlib, all with larger import times), it still doesn't matter to them. Same for PyInstaller: there are issues like getting a 500 MB minimum .exe size if you do it from an Anaconda install; adding in pytest yes/no hardly matters. |
I think this is a very good idea. Right now this creates all sorts of nasty circular imports, because every single submodule imports |
FWIW, for my case of a library that uses Numpy, import time is not a primary issue, although it does add up -- with |
I believe that is a major problem: 120 ms solely for the numpy import on a high-end laptop makes it unusable for CLI tools. And scipy/pandas/matplotlib, along with every other package commonly used for scientific purposes, all depend on numpy; there is no point in those packages speeding up their own imports while their common dependency remains an unavoidable bottleneck. |
…ecessary import dependencies: pytesttester is used by every single subpackage, so making it depend on np.testing just creates cyclic dependencies that can lead to circular imports. Relates to numpy#11457
Yeah that's because the
@eric-wieser made a nice start in gh-11473 with moving |
With gh-11473, the only remaining bit in
@eric-wieser already had a suggestion on how to fix it. Somebody want to take this up? |
Several related PRs have now gone in; someone want to take a stab at adding something to the 1.16.0 release notes? |
You might already know this, but for the avoidance of doubt: we also have as part of our public API that you can write things like:

```python
import numpy
numpy.testing.assert_equal(1, 1)
```

We can't break that, so even if |
Perhaps we just fix this on Python 3.7, where it's easy, and tell people who care to upgrade? |
As of #11474, the import tree is much more balanced - |
@eric-wieser does the balancedness of that tree matter? (honest question) |
@njsmith: Not beyond making the tree easier to analyze - having `add_newdocs` look like 90% of numpy was just confusing. I think a more balanced tree is less indicative of (allowable) circular imports, though. |
The point is, you can choose what you want (up to inter-dependencies), and then pay the price of the import. Some of the packages are actually not that heavy. Numbers from another (more crappy) machine (Numpy is around 160 ms):
... and so on. |
Does anybody know what the drawbacks of lazy-loading modules are? https://pypi.org/project/lazy-import/ |
Maybe lazy loading for py37 only is a good idea: https://snarky.ca/lazy-importing-in-python-3-7/ |
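For reference, the 3.7-only mechanism that post describes is PEP 562's module-level `__getattr__`; a minimal sketch (package layout and names are illustrative, not numpy's implementation):

```python
# hypothetical mypkg/__init__.py: defer a heavy submodule (Python 3.7+)
import importlib

def __getattr__(name):
    # only called when normal lookup on the module fails
    if name == "testing":
        module = importlib.import_module(__name__ + ".testing")
        globals()["testing"] = module  # cache: later lookups skip __getattr__
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

def __dir__():
    # keep tab completion honest about the lazy attribute
    return sorted([*globals(), "testing"])
```

Notably, this keeps the `import numpy; numpy.testing.assert_equal(1, 1)` pattern from above working: the first failed lookup of `testing` triggers the real import.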
Lazy loading has two main downsides. The intrinsic one is that it can be surprising: it can hide import errors until the lazy attribute is first touched. The other one is that implementing it is pretty complicated, so most implementations have quirks and subtle tradeoffs; you have to look carefully at things like speed impact, and I don't trust that every implementation gets this right. It's doable, some projects have done it for a long time, and it's probably the right thing for |
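To make the "surprising" part concrete, a toy example (all names hypothetical): with a lazy attribute, a missing optional dependency no longer fails at import time but at first use:

```python
# hypothetical somepkg/__init__.py with a lazy, plotting-heavy submodule
def __getattr__(name):
    if name == "viz":
        from somepkg import viz  # may raise ImportError *here*, not at `import somepkg`
        return viz
    raise AttributeError(name)

# user code: `import somepkg` succeeds even with the plotting stack broken;
# the ImportError surfaces much later, in what looks like unrelated code:
#     somepkg.viz.plot(data)   # <- ImportError raised at this line
```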
Looking at the import times more carefully, 125 ms of 195 ms (65 %) on my machine is spent importing builtin modules - so it's perhaps unfair to blame numpy for a large import time, especially if your code goes on to import the same builtins anyway. |
I suggest we close this and, if desired, open a new issue reflecting the discussion here; alternatively, we could retitle this one appropriately, since the current title is misleading. |
Not sure opening a new issue here is helpful, since I don't think there's anything left to do, except perhaps a decision on lazy loading of `numpy.testing`. If you have a more descriptive title, please go ahead and edit it; that's always useful. |
Rereading the discussion I agree with @rgommers that, essentially, there is nothing more to do, so I'm closing this issue. |
As stated, I think blaming numpy.testing alone is probably misleading. There are quite a few other modules that take time. One of those "innocent" looking ones is:
While some of these might be unavoidable, I'm going to refer to numpy's own benchmarks regarding the amount of time it takes for numpy to import. According to those, on some computer somewhere, it might take about 900 ms to import numpy, up from 700 ms in recent versions. While the benchmarks run on my laptop are not that slow, it also isn't the cheapest laptop around. Here is a PR made to my own branch showing the changes, in case anybody wants to glance at them: #14083. And an image of the improvements as I made the changes. I'm happy to clean up the changes as required. I get that in many applications the caller will likely import |
I just ran the brand new `python -X importtime -c "import numpy"`. Turns out that `numpy.testing` takes roughly 29 % (23 ms) of the total import time (80 ms), most of which (62 %, 14 ms) is the `unittest` module (no `nose` in my environment). In view of #10856, this is only going to get worse -- `pytest` weighs in with about 35 ms on my machine.

So there's an opportunity for an easy win in import time by making the import of `numpy.testing` lazy. That is, if such a change doesn't cause huge amounts of downstream breakage. Alternatively, the main culprits could be imported lazily, with a much smaller downstream impact (likely none).