numpy.testing accounts for almost 30 % import time #11457

kohr-h · 2018-06-29T18:49:59Z

I just ran the brand new python -X importtime -c "import numpy". It turns out that numpy.testing takes roughly 29 % (23 ms) of the total import time (80 ms), most of which (62 %, 14 ms) is the unittest module (no nose in my environment). In view of #10856, this is only going to get worse -- pytest weighs in with about 35 ms on my machine.

So there's an opportunity for an easy win in import time by making the import of numpy.testing lazy. That is, if such a change doesn't cause huge amounts of downstream breakage.
Alternatively, the main culprits could be imported lazily, with a much smaller downstream impact (likely none).
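
To reproduce this kind of breakdown, the output of -X importtime can be parsed directly. The following is a minimal sketch, not part of NumPy: the helper names parse_importtime and import_times are made up for illustration, and it assumes Python 3.7+ and the line layout "import time: <self us> | <cumulative us> | <module>" that the interpreter writes to stderr.

```python
import re
import subprocess
import sys

# One line of `-X importtime` output: self time and cumulative time (both in
# microseconds), then the module name, indented according to nesting depth.
_LINE = re.compile(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S+)")


def parse_importtime(stderr_text):
    """Map module name -> (self_us, cumulative_us); the header line is skipped."""
    times = {}
    for line in stderr_text.splitlines():
        match = _LINE.match(line)
        if match:
            self_us, cumulative_us, name = match.groups()
            times[name] = (int(self_us), int(cumulative_us))
    return times


def import_times(module):
    """Import `module` in a fresh interpreter and return its parsed timings."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stderr=subprocess.PIPE,
        text=True,
        check=True,
    )
    return parse_importtime(proc.stderr)
```

With NumPy installed, dividing the cumulative entry for "numpy.testing" by the one for "numpy" in import_times("numpy") would give the share quoted above, provided the installed version still imports numpy.testing eagerly.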

mhvk · 2018-06-30T22:07:34Z

Interesting. I quickly looked through the main __init__.py, and that really refers to testing only in a few lines [1]. The important one, repeated elsewhere, is the import of testing._private.pytesttester.PytestTester. Here, PytestTester itself does not do any imports until it is used, but the problem is that along its path, testing.__init__ imports a lot of stuff. If I comment out the from .testing import Tester line in [1], and also remove all but the PytestTester lines from testing.__init__ [2], the import time does indeed reduce by ~25%.

There would seem to be two options:

1. As you say, lazily initializing in testing.__init__.py (in Python 3.7, the module-level __getattr__ could be used; are there other tricks?);
2. Move PytestTester to a different path.

The latter would seem a much lighter change (and one could obviously leave an import in its current location), but let me cc @charris (who can perhaps ping others who understand this part better).
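
The first option can be sketched with the module-level __getattr__ hook from PEP 562 (Python 3.7+). This is a minimal illustration, not NumPy's code: the package name fakepkg is hypothetical, a types.ModuleType object stands in for a real package __init__.py, and the stdlib unittest module stands in for the expensive submodule.

```python
import importlib
import types

# Stand-in for a package's __init__ module; in a real package the hook
# below would simply be defined at the top level of __init__.py.
pkg = types.ModuleType("fakepkg")

# Attribute name -> fully qualified module that is imported only on demand.
_LAZY_SUBMODULES = {"testing": "unittest"}


def _module_getattr(name):
    """Called only when normal attribute lookup on the module fails."""
    if name in _LAZY_SUBMODULES:
        module = importlib.import_module(_LAZY_SUBMODULES[name])
        # Cache on the module so later lookups never reach this hook again.
        setattr(pkg, name, module)
        return module
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


pkg.__getattr__ = _module_getattr

assert "testing" not in vars(pkg)  # nothing has been loaded yet
pkg.testing.TestCase               # first access triggers the import
assert "testing" in vars(pkg)      # the submodule is now cached
```

Caching the result with setattr means the hook is only paid for once per name; whether merely defining the hook slows down other lookups is a separate question that would need measuring.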

[1] from numpy.__init__.py:

    # We don't actually use this ourselves anymore, but I'm not 100% sure that
    # no-one else in the world is using it (though I hope not)
    from .testing import Tester

    # Pytest testing
    from numpy.testing._private.pytesttester import PytestTester
    test = PytestTester(__name__)
    del PytestTester

[2] almost everything from numpy.testing.__init__.py:

from unittest import TestCase

from ._private.utils import *
from ._private import decorators as dec
from ._private.nosetester import (
    run_module_suite, NoseTester as Tester
    )

__all__ = _private.utils.__all__ + ['TestCase', 'run_module_suite']

from ._private.pytesttester import PytestTester
test = PytestTester(__name__)
del PytestTester

seberg · 2018-06-30T22:11:38Z

IIRC we wanted to deprecate the automatic import of testing before, but didn't because there was no deprecation mechanism possible. Nathaniel probably put that comment there...

eric-wieser · 2018-06-30T22:16:13Z

Can we fix this by defining Tester as:

class TesterMeta(type):
    def mro(cls):
        # expensive import
        from Testing import TheOldTester
        return TheOldTester.mro()
    def __call__(cls, *args, **kwargs):
        from Testing import TheOldTester
        return TheOldTester(*args, **kwargs)

Tester = TesterMeta('Tester', (), {})

That way, only users and subclassers will incur the import

njsmith · 2018-06-30T23:20:26Z

There are two issues: (1) there's lots of code that assumes it can do a bare import numpy and then refer to numpy.testing, so we need a way to automatically load it on first reference, (2) we have to somehow handle the re-exports like np.Tester and np.test. We have to solve all of them if we want to reduce import time.

For the lazy import, the portable way is to use https://github.com/njsmith/metamodule/; this lets you basically define a __getattr__ on module objects. There's some old PR to add this to numpy that I've been feeling guilty about not getting back to for like 4 years at this point... though that PR actually had a different motivation, which was to deprecate np.int and friends. (If anyone wants to revive that PR, I believe the main blocker did get fixed in metamodule's v1.2 release.)

metamodule is very clever about trying to avoid adding extra overhead to regular attribute lookup (e.g. np.array), but for obscure reasons it does add a little bit, at least on some versions of python. (It turns out there's a fast path inside the CPython interpreter that gets disabled for objects that have a __getattr__ fallback, even if you never actually call it?!) Because of this, using 3.7's new module-level __getattr__ hack might be faster, so we might want to use that where possible. People are pretty sensitive about the speed of attribute lookup on the numpy module object, because unless you're doing from numpy import * then most code does a lot of attribute lookups.

I think np.test is probably trivial, it can become a function or something. For Tester – remember that to support lazy loading of np.testing, we need to hook things up so that we can run arbitrary code when an attribute is accessed on the np object. I suspect the simplest thing to do is to re-use this for np.Tester. (And we can deprecate it at the same time!)
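
A sketch of that idea, resolving a legacy name on first access and warning at the same time. It assumes Python 3.7+ (PEP 562); fakepkg is a hypothetical stand-in for numpy/__init__.py, and unittest.TestCase merely plays the role of the old tester class.

```python
import types
import warnings

pkg = types.ModuleType("fakepkg")  # hypothetical stand-in for a package module


def _load_tester():
    # The expensive import happens here, not when the package is imported.
    from unittest import TestCase  # stand-in for the legacy Tester class

    return TestCase


# Deprecated attribute name -> function that produces its value on demand.
_DEPRECATED = {"Tester": _load_tester}


def _module_getattr(name):
    if name in _DEPRECATED:
        warnings.warn(
            f"{pkg.__name__}.{name} is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
        # Deliberately not cached on the module, so every access warns.
        return _DEPRECATED[name]()
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


pkg.__getattr__ = _module_getattr
```

Not caching the value keeps the warning visible on every access, at the price of going through the hook each time; for a deprecated and rarely used name that trade-off seems acceptable.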

rgommers · 2018-07-01T00:08:23Z

This isn't really a new discussion, would be good to at least browse https://mail.python.org/pipermail/numpy-discussion/2012-July/063093.html
From that thread, tab completion is also important, it needs to stay working.
+1 for @mhvk's suggestion of pytest on a different import path, nicer than messing with lazy imports.

metamodule is very clever about trying to avoid adding extra overhead to regular attribute lookup (e.g. np.array), but for obscure reasons it does add a little bit, at least on some versions of python.

interesting. how much is "a little bit"?

kohr-h · 2018-07-01T00:13:04Z

Here's @njsmith's PR for reference: #6103

horta · 2018-07-01T12:10:59Z

On my machine/configuration, numpy.add_newdocs is the most expensive import: 82% of the time. In total, 180ms.

python3.7 -X importtime -c 'import numpy'

import time: self [us] | cumulative | imported package
import time:       136 |        136 | zipimport
import time:       547 |        547 | _frozen_importlib_external
import time:        60 |         60 |     _codecs
import time:       527 |        586 |   codecs
import time:       509 |        509 |   encodings.aliases
import time:       923 |       2018 | encodings
import time:       240 |        240 | encodings.utf_8
import time:       100 |        100 | _signal
import time:       320 |        320 | encodings.latin_1
import time:        48 |         48 |     _abc
import time:       299 |        346 |   abc
import time:       292 |        637 | io
import time:        61 |         61 |       _stat
import time:       267 |        328 |     stat
import time:       279 |        279 |       genericpath
import time:       263 |        542 |     posixpath
import time:       976 |        976 |     _collections_abc
import time:       612 |       2456 |   os
import time:       252 |        252 |   _sitebuiltins
import time:       322 |        322 |   sitecustomize
import time:        70 |         70 |   usercustomize
import time:       609 |       3707 | site
import time:       361 |        361 |   __future__
import time:       611 |        611 |   warnings
import time:       377 |        377 |   numpy._globals
import time:       199 |        199 |   numpy.__config__
import time:       185 |        185 |   numpy.version
import time:       328 |        328 |   numpy._import_tools
import time:       683 |        683 |       math
import time:       365 |        365 |       numpy.lib.info
import time:       407 |        407 |             numpy.core.info
import time:      1045 |       1045 |                 time
import time:       769 |        769 |                 _datetime
import time:       975 |       2788 |               datetime
import time:      4321 |       7109 |             numpy.core.multiarray
import time:      1856 |       1856 |             numpy.core.umath
import time:       774 |        774 |                   types
import time:        83 |         83 |                   _collections
import time:      1029 |       1884 |                 enum
import time:        69 |         69 |                   _sre
import time:       559 |        559 |                     sre_constants
import time:       475 |       1033 |                   sre_parse
import time:       384 |       1485 |                 sre_compile
import time:        75 |         75 |                   _functools
import time:       152 |        152 |                       _operator
import time:       458 |        610 |                     operator
import time:       281 |        281 |                     keyword
import time:       443 |        443 |                       _heapq
import time:       311 |        754 |                     heapq
import time:       124 |        124 |                     itertools
import time:       339 |        339 |                     reprlib
import time:       980 |       3085 |                   collections
import time:       757 |       3915 |                 functools
import time:       143 |        143 |                 _locale
import time:       346 |        346 |                 copyreg
import time:     21133 |      28905 |               re
import time:       396 |        396 |                 numpy.compat._inspect
import time:       551 |        551 |                     fnmatch
import time:       191 |        191 |                       nt
import time:       166 |        166 |                       nt
import time:       161 |        161 |                       nt
import time:       982 |       1499 |                     ntpath
import time:       115 |        115 |                     errno
import time:       317 |        317 |                       urllib
import time:      1553 |       1869 |                     urllib.parse
import time:      1408 |       5440 |                   pathlib
import time:       332 |       5772 |                 numpy.compat.py3k
import time:       338 |       6505 |               numpy.compat
import time:      1232 |       1232 |                 _ctypes
import time:       620 |        620 |                   _struct
import time:       347 |        967 |                 struct
import time:       401 |        401 |                 ctypes._endian
import time:     21597 |      24195 |               ctypes
import time:       457 |        457 |                 numbers
import time:       908 |       1365 |               numpy.core.numerictypes
import time:      1260 |      62227 |             numpy.core._internal
import time:       568 |        568 |                 _compat_pickle
import time:       175 |        175 |                     org
import time:        27 |        201 |                   org.python
import time:        49 |        250 |                 org.python.core
import time:       870 |        870 |                 _pickle
import time:      1992 |       3678 |               pickle
import time:       356 |        356 |                 numpy.core._methods
import time:       659 |       1015 |               numpy.core.fromnumeric
import time:       744 |        744 |               numpy.core.arrayprint
import time:      1074 |       6509 |             numpy.core.numeric
import time:       521 |        521 |             numpy.core.defchararray
import time:       377 |        377 |             numpy.core.records
import time:       261 |        261 |             numpy.core.memmap
import time:       345 |        345 |             numpy.core.function_base
import time:       337 |        337 |             numpy.core.machar
import time:      1107 |       1107 |             numpy.core.getlimits
import time:       455 |        455 |             numpy.core.shape_base
import time:       351 |        351 |             numpy.core.einsumfunc
import time:       305 |        305 |                         token
import time:      1104 |       1408 |                       tokenize
import time:       255 |       1663 |                     linecache
import time:       444 |       2106 |                   traceback
import time:       397 |        397 |                   unittest.util
import time:       485 |       2988 |                 unittest.result
import time:       748 |        748 |                   difflib
import time:       350 |        350 |                       _weakrefset
import time:       728 |       1077 |                     weakref
import time:       385 |        385 |                     collections.abc
import time:        45 |         45 |                       _string
import time:      1186 |       1230 |                     string
import time:       960 |        960 |                     threading
import time:        71 |         71 |                     atexit
import time:      1546 |       5266 |                   logging
import time:       482 |        482 |                   pprint
import time:       717 |        717 |                   contextlib
import time:      1019 |       8232 |                 unittest.case
import time:       345 |        345 |                 unittest.suite
import time:       682 |        682 |                 unittest.loader
import time:       790 |        790 |                       locale
import time:      1075 |       1865 |                     gettext
import time:      1025 |       2889 |                   argparse
import time:      1318 |       1318 |                       signal
import time:       219 |       1536 |                     unittest.signals
import time:       338 |       1874 |                   unittest.runner
import time:       364 |       5126 |                 unittest.main
import time:       521 |      17890 |               unittest
import time:       183 |        183 |                   numpy.testing.nose_tools
import time:       806 |        806 |                       zlib
import time:       327 |        327 |                         _compression
import time:       517 |        517 |                         _bz2
import time:       447 |       1290 |                       bz2
import time:       843 |        843 |                         _lzma
import time:       388 |       1230 |                       lzma
import time:        62 |         62 |                       pwd
import time:       438 |        438 |                       grp
import time:       834 |       4658 |                     shutil
import time:      2276 |       2276 |                           _hashlib
import time:       501 |        501 |                           _blake2
import time:       486 |        486 |                           _sha3
import time:       417 |       3679 |                         hashlib
import time:       371 |        371 |                           _bisect
import time:       260 |        631 |                         bisect
import time:       383 |        383 |                         _random
import time:       787 |       5478 |                       random
import time:       646 |       6124 |                     tempfile
import time:       842 |        842 |                     numpy.lib.utils
import time:       700 |      12322 |                   numpy.testing.nose_tools.utils
import time:       310 |      12814 |                 numpy.testing.nose_tools.decorators
import time:       292 |      13106 |               numpy.testing.decorators
import time:       327 |        327 |                 numpy.testing.nose_tools.nosetester
import time:     14285 |      14611 |               numpy.testing.nosetester
import time:       330 |        330 |               numpy.testing.utils
import time:       481 |      46417 |             numpy.testing
import time:      1002 |     129274 |           numpy.core
import time:        31 |     129305 |         numpy.core.numeric
import time:       311 |        311 |         numpy.lib.ufunclike
import time:       415 |     130030 |       numpy.lib.type_check
import time:       304 |        304 |           numpy.lib.twodim_base
import time:       748 |       1052 |         numpy.lib.function_base
import time:       916 |        916 |               _ast
import time:       410 |       1326 |             ast
import time:       449 |       1774 |           numpy.matrixlib.defmatrix
import time:       292 |       2065 |         numpy.matrixlib
import time:       273 |        273 |         numpy.lib.stride_tricks
import time:       599 |       3986 |       numpy.lib.index_tricks
import time:       336 |        336 |       numpy.lib.mixins
import time:       364 |        364 |       numpy.lib.nanfunctions
import time:       321 |        321 |       numpy.lib.shape_base
import time:       283 |        283 |       numpy.lib.scimath
import time:       254 |        254 |           numpy.linalg.info
import time:       427 |        427 |             numpy.linalg.lapack_lite
import time:       953 |        953 |             numpy.linalg._umath_linalg
import time:       585 |       1964 |           numpy.linalg.linalg
import time:       315 |       2532 |         numpy.linalg
import time:       551 |       3083 |       numpy.lib.polynomial
import time:       299 |        299 |       numpy.lib.arraysetops
import time:       280 |        280 |         numpy.lib.format
import time:       407 |        407 |         numpy.lib._datasource
import time:       508 |        508 |         numpy.lib._iotools
import time:       730 |       1923 |       numpy.lib.npyio
import time:      1694 |       1694 |           _decimal
import time:       414 |       2107 |         decimal
import time:       445 |       2552 |       numpy.lib.financial
import time:       447 |        447 |       numpy.lib.arrayterator
import time:       582 |        582 |       numpy.lib.arraypad
import time:       362 |        362 |       numpy.lib._version
import time:       597 |     146207 |     numpy.lib
import time:        75 |         75 |     numpy.core.c_einsum
import time:      1993 |     148275 |   numpy.add_newdocs
import time:       228 |        228 |   numpy._distributor_init
import time:       254 |        254 |     numpy.fft.info
import time:       437 |        437 |       numpy.fft.fftpack_lite
import time:       284 |        284 |       numpy.fft.helper
import time:       326 |       1046 |     numpy.fft.fftpack
import time:       270 |       1568 |   numpy.fft
import time:       284 |        284 |       numpy.polynomial.polyutils
import time:       568 |        568 |       numpy.polynomial._polybase
import time:       527 |       1379 |     numpy.polynomial.polynomial
import time:       431 |        431 |     numpy.polynomial.chebyshev
import time:       768 |        768 |     numpy.polynomial.legendre
import time:       436 |        436 |     numpy.polynomial.hermite
import time:       968 |        968 |     numpy.polynomial.hermite_e
import time:       492 |        492 |     numpy.polynomial.laguerre
import time:       321 |       4791 |   numpy.polynomial
import time:       261 |        261 |     numpy.random.info
import time:      5080 |       5080 |     numpy.random.mtrand
import time:       437 |       5777 |   numpy.random
import time:       478 |        478 |   numpy.ctypeslib
import time:      1338 |       1338 |       textwrap
import time:      2203 |       3540 |     numpy.ma.core
import time:       699 |        699 |     numpy.ma.extras
import time:       352 |       4591 |   numpy.ma
import time:     13171 |     180934 | numpy

Numpy version: 1.14.5
Python version: 3.7.0
Conda environment, on macos 10.13.5

kohr-h · 2018-07-01T12:47:51Z

On my machine/configuration, numpy.add_newdocs is the most expensive import: 82% of the time. In total, 180ms.

Of which almost 100 % is from numpy.lib, of which north of 90 % is from numpy.core. At that point it spreads out more, but numpy.testing makes up about 30 % of the time spent importing numpy.core.

horta · 2018-07-01T12:49:33Z

Graphically: https://transfer.sh/tUOuP/time.pdf

nschloe · 2018-07-01T15:08:20Z

Another graph using tuna. Reproduce with

python3.7 -X importtime -c "import numpy" 2> numpy.log
tuna numpy.log

njsmith · 2018-07-01T20:15:32Z

From that thread, tab completion is also important, it needs to stay working.

Tab completion is fine – all the mechanisms that let you override __getattr__ also let you override __dir__, so you can make sure dir(np) reports all the attributes you want to be tab-completable.
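
A sketch of how a module-level __dir__ (PEP 562, Python 3.7+) keeps lazily resolved names visible to dir(), and hence to tab completion. The names fakepkg and _LAZY are hypothetical, and the stdlib unittest module stands in for the lazily loaded subpackage.

```python
import importlib
import types

pkg = types.ModuleType("fakepkg")  # hypothetical stand-in for a package module
pkg.array = list                   # an ordinary, eagerly defined attribute

_LAZY = {"testing": "unittest"}    # names that are resolved on first access


def _module_getattr(name):
    if name in _LAZY:
        return importlib.import_module(_LAZY[name])
    raise AttributeError(f"module {pkg.__name__!r} has no attribute {name!r}")


def _module_dir():
    # Report the eager attributes plus every name __getattr__ can resolve.
    return sorted(set(vars(pkg)) | set(_LAZY))


pkg.__getattr__ = _module_getattr
pkg.__dir__ = _module_dir

# The lazy name is listed even though it has not been loaded.
print("testing" in dir(pkg))
```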

interesting. how much is "a little bit" [of overhead added by metamodule]?

On Python 3, it's ~100 ns per attribute lookup, and on Python 2 it's only ~10 ns per attribute lookup.

Python 3.6:

In [1]: # Attribute lookup alone
   ...: %timeit np.array
58.3 ns ± 8.48 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [2]: # Doing something simple
   ...: %timeit np.array([])
830 ns ± 41.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [3]: import metamodule

In [4]: metamodule.install("numpy")

In [5]: # Attribute lookup alone
   ...: %timeit np.array
165 ns ± 11.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [6]: # Doing something simple
   ...: %timeit np.array([])
970 ns ± 98.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Python 2.7:

In [1]: %timeit np.array
10000000 loops, best of 3: 38.7 ns per loop

In [2]: %timeit np.array([])
The slowest run took 18.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 590 ns per loop

In [3]: import metamodule

In [4]: metamodule.install("numpy")

In [5]: import numpy as np  # required on py2 b/c we're hackily installing this from outside numpy/__init__.py

In [6]: %timeit np.array
10000000 loops, best of 3: 57.1 ns per loop

In [7]: %timeit np.array([])
The slowest run took 16.42 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 610 ns per loop

Unfortunately when I was originally testing this it was on Python 2 so I didn't notice the issue. The underlying issue is probably not too hard to fix in CPython, but that wouldn't help until 3.8 comes out. I haven't tried timing the new module-level __getattr__ in 3.7, but someone should :-).
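
One way such a timing could be set up with the stdlib timeit module is sketched below. The two module objects (hypothetically named plain and hooked) are synthetic stand-ins, so the numbers would only indicate whether defining a PEP 562 __getattr__ affects lookups of attributes that already exist; no results are claimed here.

```python
import timeit
import types

plain = types.ModuleType("plain")
plain.array = object()

hooked = types.ModuleType("hooked")
hooked.array = object()


def _fallback(name):
    # PEP 562 hook; never reached below because `array` exists on the module.
    raise AttributeError(name)


hooked.__getattr__ = _fallback


def ns_per_lookup(module, number=1_000_000, repeat=5):
    """Best-of-`repeat` time for one `module.array` lookup, in nanoseconds."""
    timer = timeit.Timer("m.array", globals={"m": module})
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


if __name__ == "__main__":
    print(f"plain : {ns_per_lookup(plain):.1f} ns")
    print(f"hooked: {ns_per_lookup(hooked):.1f} ns")
```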

mhvk · 2018-07-01T20:41:40Z

Wouldn't one use the __getattr__ in timing, though? In that case it wouldn't slow down anything in the main namespace (of course, that would not help for deprecating int).

njsmith · 2018-07-01T20:58:38Z

@mhvk I assume you meant testing, not timing? And the answer is ... well, maybe :-). The simplest way to do this is to make it so that the testing subpackage doesn't get loaded at all until the first time someone accesses the numpy.testing attribute. For this, we'd need to add __getattr__ to the main numpy module. I guess it would also be possible to go through numpy.testing and make it so that loading the package itself is cheap, but then every attribute numpy.testing.<whatever> is auto-loaded on first access. But setting this up would be significantly more complex and error-prone though, just because there are a lot more attributes to sort out.

mhvk · 2018-07-01T21:12:14Z

Yes, 'testing' I meant. Its __init__ is fairly simple, with an __all__ that just contains _private.utils and two other items, so writing a __getattr__ for that should be nearly trivial.

But perhaps we can also consider the even less intrusive option of moving PytestTester elsewhere? (maybe just move the _private directory one level up?)

mattip · 2018-07-01T21:25:54Z

I would vote for creating a clean path for importing only the minimum of NumPy needed to use the multiarray container, without any of testing, polynomial, ma, financial, fft, ... Users can always import numpy.testing to get the desired submodule. Yes I know this is a breaking change, but I think the wider community would welcome it, since it would significantly simplify packaging numpy as part of a PyInstaller/nuitka package.

rgommers · 2018-07-01T21:36:50Z

Yes I know this is a breaking change,

I think that's a non-starter. And that for a quite minor problem. Both a ~80 ms import time and PyInstaller packaging are relevant for only a very small subset of users. And once those users use anything more than bare numpy, (e.g. scipy, pandas, matplotlib - with all larger import times), it still doesn't matter to them. Same for PyInstaller - there are issues like getting a 500 MB minimum .exe size if you do it from an Anaconda install. Adding in pytest yes/no hardly matters.

eric-wieser · 2018-07-01T22:09:23Z

But perhaps we can also consider the even less intrusive option of moving PytestTester elsewhere?

I think this is a very good idea. Right now this creates all sorts of nasty circular imports, because every single submodule imports np.testing, which in turn imports np.lib.

kohr-h · 2018-07-01T22:19:57Z

FWIW, scipy itself without any submodule only adds 3 ms import time to numpy's on my machine. All of their heavier functionality in submodules has to be imported explicitly, which I think is a good thing.

For my case of a library that uses Numpy, import time is not a primary issue, although it does add up -- with pytest and some (suboptimal) own code it's not far to 300 ms import time, which is noticeable. If 3 packages we depend on scrape off 30 ms import time each, it makes quite a difference.
People who write CLIs and depend on Numpy are probably more concerned about this stuff.

horta · 2018-07-02T02:25:32Z

And that for a quite minor problem.

I believe that is a major problem: 120 ms solely for numpy import on a high-end laptop makes it unusable for CLI. And scipy/pandas/matplotlib along with any other package commonly used for scientific purposes all depend on numpy. There is no reason for those packages to speed up their imports because their common package is an unavoidable bottleneck.

…ecessary import dependencies pytesttester is used by every single subpackage, so making it depend on np.testing just creates cyclic dependencies that can lead to circular imports Relates to numpy#11457

rgommers · 2018-07-02T16:53:02Z

FWIW, scipy itself without any submodule only adds 3 ms import time to numpy's on my machine.

Yeah that's because the scipy namespace is empty except for numpy functions. You never do import scipy, you always do from scipy import some_submodule. scipy.stats import time alone weighs in at 3x more than all of numpy.

I believe that is a major problem: 120 ms solely for numpy import on a high-end laptop makes it unusable for CLI.

everyone thinks their problem is major. there's a reason this only comes up every couple of years. it's simply not relevant for >99% of users.
100ms doesn't mean unusable for CLI, and shaving 25-30% off of that time isn't going to make it then that much more usable all of a sudden.

@eric-wieser made a nice start in gh-11473 with moving pytest imports. Further improvements are of course very welcome. Large breaking changes however are simply a no go.

mhvk · 2018-07-02T23:52:47Z

With gh-11473, the only remaining bit in numpy.__init__.py bringing in testing is

    # We don't actually use this ourselves anymore, but I'm not 100% sure that
    # no-one else in the world is using it (though I hope not)
    from .testing import Tester

@eric-wieser already had a suggestion on how to fix it. Somebody want to take this up?

charris · 2018-07-03T00:01:16Z

Several related PRs have now gone in, someone want to take a stab at adding something to the 1.16.0 release notes?

njsmith · 2018-07-03T00:02:39Z

You might already know this, but for the avoidance of doubt: we also have as part of our public API that you can write things like:

import numpy
numpy.testing.assert_equal(1, 1)

We can't break that, so even if Tester stops needing testing, we'll still need to somehow export testing from numpy/__init__.py.

eric-wieser · 2018-07-03T00:31:20Z

Perhaps we just fix this on python 3.7 where it's easy, and tell people who care to upgrade?

eric-wieser · 2018-07-03T06:05:29Z

As of #11474, the import tree is much more balanced - testing now only accounts for 18% of the total import time (and unittest is 12% of the total)

njsmith · 2018-07-03T06:22:15Z

@eric-wieser does the balancedness of that tree matter? (honest question)

eric-wieser · 2018-07-03T06:29:03Z

@njsmith: Not beyond making the tree easier to analyze - having add_newdocs look like 90% of numpy was just confusing.

I think a more balanced tree is less indicative of (allowable) circular imports though

kohr-h · 2018-07-03T12:48:51Z

Yeah that's because the scipy namespace is empty except for numpy functions. You never do import scipy, you always do from scipy import some_submodule. scipy.stats import time alone weighs in at 3x more than all of numpy.

The point is, you can choose what you want (up to inter-dependencies), and then pay the price of the import. Some of the packages are actually not that heavy. Numbers from another (more crappy) machine (Numpy is around 160 ms):

| package | time [ms] | Numpy share | Comment |
|---|---|---|---|
| `scipy.sparse` | 194 | 82 % | |
| `scipy.linalg` | 205 | 78 % | |
| `scipy.special` | 218 | 73 % | imports `scipy.linalg` |
| `scipy.optimize` | 351 | 46 % | imports `scipy.special`, `scipy.linalg` and `scipy.sparse` |
| `scipy.stats` | 466 | 34 % | imports kind of everything |

... and so on. stats and signal are heavy, others less so.

horta · 2018-07-03T13:05:09Z

Does anybody know what the drawbacks of lazy loading modules are? https://pypi.org/project/lazy-import/

horta · 2018-07-03T13:11:53Z

Maybe lazy loading for py37 only is a good idea: https://snarky.ca/lazy-importing-in-python-3-7/

njsmith · 2018-07-03T15:52:03Z

Lazy loading has two main downsides. The intrinsic one is that it can be surprising: it can hide ImportErrors or make them appear at strange places, performance is less predictable, etc. Also, since it's just moving the cost around, it doesn't help for modules that you do eventually end up loading anyway. For numpy.testing in particular I don't think these are an issue.

The other one is that implementing it is pretty complicated, so most implementations have quirks and subtle tradeoffs. You have to look carefully at things like speed impact, and I don't trust that lazy-import package without reading its source. (Also it's GPL, which isn't something numpy can use by policy.) Or there are also methods that involve hooking the import system itself...

It's doable, some projects have done it for a long time, it's probably the right thing for numpy.testing, but it's just non-trivial enough that no one has ever put all the pieces together and gotten something merged.
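
For reference, the standard library ships its own lazy-loading mechanism, importlib.util.LazyLoader. The sketch below follows the recipe given in the importlib documentation; the helper name lazy_import is illustrative, and colorsys is an arbitrary small stdlib module used only as a demonstration target. It also illustrates the first downside: the module body only runs at first attribute access, so an error raised while executing it would surface there rather than at the import site.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return a module object whose body is executed on first attribute access."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        # A missing module is still reported eagerly, at this point.
        raise ImportError(f"no module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    # With LazyLoader this only arranges for deferred execution.
    loader.exec_module(module)
    return module


colorsys = lazy_import("colorsys")  # the module body has not run yet
converter = colorsys.rgb_to_hsv     # first attribute access runs it
```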

eric-wieser · 2018-07-06T19:46:16Z

Looking at the import times more carefully, 125ms / 195ms (65%) on my machine is spent importing builtin modules - so it's perhaps unfair to blame numpy for a large import time, especially if your code goes on to import the same builtins anyway.

mattip · 2018-07-18T21:53:13Z

I suggest we close this and if desired open a new issue reflecting the discussion here, alternatively we could retitle this one appropriately since the title is misleading

rgommers · 2018-07-21T18:56:33Z

I suggest we close this and if desired open a new issue reflecting the discussion here, alternatively we could retitle this one appropriately since the title is misleading

Not sure opening a new issue here is helpful, since I don't think there's anything left to do. Except perhaps a decision on lazy loading of numpy/testing yes/no, which we can do in this issue. Once we have that, we can close this.

If you have a more descriptive title, please go ahead and edit it, that's always useful.

mhvk · 2019-03-22T21:45:36Z

Rereading the discussion I agree with @rgommers that, essentially, there is nothing more to do, so I'm closing this issue.

hmaarrfk · 2019-07-23T02:41:15Z

As stated, I think blaming numpy.testing alone is probably misleading.

There are quite a few other modules that take time.

One of those "innocent" looking ones is:

platform. The only location where it is truly necessary is to create the variable IS_PYPY. That said, it seems to import all of threading, which accounts for a large chunk of the import time. If detecting PyPy were hard and inconvenient, it might be justified, but in fact, it is as easy as "PyPy" in sys.version.
threading is also imported for a Lock in the random module, which is a Cython module. I made a proof of concept where I imported fastrlock (ok, I know that we don't need reentrant locks, but I wanted something that looked API compatible easily). The random modules are already Cython, and thus this is a "small micro optimization" that doesn't add any cost. We can use the same locking primitives that fastrlock uses to speed up the whole module.
secrets is quite slow to import. Since we only need it for a few random bits, we can import what we need ourselves. https://github.com/numpy/numpy/pull/14083/files#diff-89944aec176617da993c6de4d9529348R251
As stated, by other UnitTest does take quite a bit of time. The warning in the comments indicates that it is likely only used by packages, that can find the relevant documentation to test numpy as needed.
pickle is quite slow too. pickle is a strange one, since many other libraries will import it, but from what I found by removing it, almost everywhere it was used except for numpy.core._methods, it is associated with a warning. Not sure if the omission of the warning there was an honest mistake or an API decision. But avoiding the import of pickle can speed things up for those that don't need it.
https://github.com/numpy/numpy/blob/master/numpy/core/_methods.py#L241
textwrap is not a trivial import. It is only used in 2 location where it makes the code "indented to according to a certain style". It doesn't seem worthwhile to use it to sanitize static strings. https://github.com/numpy/numpy/blob/master/numpy/core/overrides.py#L166 https://github.com/numpy/numpy/blob/master/numpy/ma/core.py#L2448
Decimal takes time, but there isn't much you can do other than ruining code style.
pathlib is also not a trivial import. In fact the one location where it is imported directly has a comment stating that it should not be the prefered method. https://github.com/numpy/numpy/blob/master/numpy/compat/py3k.py#L105
shutils can be lazy imported in the two locations where it is used.
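
Two of the cheaper ideas in this list can be sketched in a few lines: detecting PyPy from sys.version instead of importing platform, and moving a rarely needed import (shutil here) into the function that uses it. The helper `terminal_width` is a made-up example and does not correspond to any numpy function.

```python
import sys

# Checking sys.version avoids importing platform just to detect PyPy.
IS_PYPY = "PyPy" in sys.version


def terminal_width():
    """Hypothetical helper showing a function-level (deferred) import."""
    # shutil is imported on the first call instead of at module import time;
    # later calls find it in sys.modules, so the repeated statement is cheap.
    import shutil

    return shutil.get_terminal_size().columns
```

The deferred import only helps callers that never reach the function; anyone who imports shutil elsewhere pays the cost regardless.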

While some of these might be micro-optimizations, I think many are well justified to help improve numpy's import time in the near term, especially since they hit code that is considered soft-deprecated or that exists for compatibility reasons that no longer apply (i.e. Python 2 has been dropped).

I'm going to refer to numpy's own benchmarks regarding the amount of time it takes for numpy to import:
https://pv.github.io/numpy-bench/#bench_import.Import.time_numpy

According to those, on some computer somewhere, it might take about 900 ms to import numpy in recent versions, up from 700 ms. While the benchmarks running on my laptop are not that slow, it also isn't the cheapest laptop around.

Here is a PR made to my own branch showing the changes in case anybody wanted to glance at them: #14083

And an image of the improvements as I made the changes.

I'm happy to clean up the changes as required.
Other relevant discussion here:
https://news.ycombinator.com/item?id=16978932 linking to a post where python core devs are worried about import time as well.

I get that in many applications the caller will likely import threading, pathlib, or platform themselves, and thus their application will not see the overall benefit of removing all three imports. But they might see a slight improvement from removing one of the many dependencies that aren't critical, or, at the very least, they might have a nice way of lazily importing them themselves.
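
For callers who want to defer such imports on their own side, the standard library offers `importlib.util.LazyLoader`. The sketch below follows the recipe in the importlib documentation; the wrapper name `lazy_import` and the choice of textwrap as the example module are assumptions made for illustration.

```python
import importlib.util
import sys


def lazy_import(name):
    """Return module `name`, deferring its execution until first attribute access."""
    if name in sys.modules:
        return sys.modules[name]  # already imported, nothing to defer
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError("no module named {!r}".format(name))
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up the lazy module; its body runs later
    return module


# The module body is executed only when an attribute is first touched.
textwrap = lazy_import("textwrap")
```

Accessing `textwrap.dedent` for the first time triggers the real import. Errors in the deferred module then surface at that first access instead of at the import site, which is one of the quirks this approach brings with it.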

mhvk mentioned this issue Jul 1, 2018

importing astropy takes very long astropy/astropy#4598

Closed

eric-wieser mentioned this issue Jul 1, 2018

MAINT: Move pytesttester outside of np.testing, to avoid creating unnecessary import dependencies #11473

Merged

mattip added the 01 - Enhancement and 57 - Close? labels Jul 19, 2018

rgommers added the 54 - Needs decision label and removed the 57 - Close? label Jul 21, 2018

rht mentioned this issue Dec 13, 2018

quad: Import sympy only when necessary QuantEcon/QuantEcon.py#459

Merged

mhvk closed this as completed Mar 22, 2019

hmaarrfk mentioned this issue Jul 23, 2019

WIP, MAINT: Improve import time #14083

Closed

mattip mentioned this issue Aug 27, 2019

It seemed that NumPy spend a lot of time on loading packages #14374

Closed

eric-wieser mentioned this issue May 31, 2020

DOC: Fix np.ma.core.doc_note #16311

Merged
