Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PERF: Replace range by prange when applicable #13213

Open
jeremiedbb opened this issue Feb 21, 2019 · 7 comments
Open

PERF: Replace range by prange when applicable #13213

jeremiedbb opened this issue Feb 21, 2019 · 7 comments
Labels
help wanted Moderate Anything that requires some knowledge of conventions and best practices Performance

Comments

@jeremiedbb
Copy link
Member

We have recently enabled OpenMP support in sklearn, so let's use it !

It would be worth to check if there are place in sklearn cython code where we can use a prange loop instead of a range loop for instance.

@rth
Copy link
Member

rth commented Feb 21, 2019

Do we have a way to avoid thread over-subscription with BLAS and joblib while doing this?

@jeremiedbb
Copy link
Member Author

Absolutely ! not on master however :/
Currently I put it in #11950 for now because there was no use elsewhere yet. But I can make a separate PR for that.

It's adapted from some code of loky. Essentially loop over dynamically shared libs to check which blas / openmp is loaded and then use the appropriate method to dynamically set the max number of threads they are allowed to use.

@rth
Copy link
Member

rth commented Feb 21, 2019

Nice! Yes, maybe splitting it out of #11950 would make review of that one easier as well.

@jnothman
Copy link
Member

Does the user need a global way to control how many processors are available beyond explicit use of n_jobs?

@jeremiedbb
Copy link
Member Author

I'm not sure to understand. Here's my thought.

When inside code parallelized using joblib, there's no need to add a second level of parallelism with prange. At least for now, while cython only supports OpenMP. Maybe when cython supports TBB it could be worth.

Inside code not parallelized, there may be some places where we could use a prange. However, there might be some BLAS calls inside these loops. In that case, we need to control the number of threads of the BLAS to avoid oversubscription. I made a contextmanager for that, to locally control BLAS (it can be used to control openmp also but no need for now).

@thomasjpfan
Copy link
Member

By default, joblib's 'loky' backend forces workers to use only one thread: https://joblib.readthedocs.io/en/latest/parallel.html#avoiding-over-subscription-of-cpu-ressources

To cope with this problem, joblib forces by default supported third-party libraries to use only one thread in workers with the 'loky' backend.

@rmenuet
Copy link
Contributor

rmenuet commented Feb 28, 2019

FYI, tested switching Joblib for prange in nearest neighbors tree-based queries: it did not improve performance but deactivating some debugging stats did

See #13330 and #13331

@cmarmo cmarmo added Moderate Anything that requires some knowledge of conventions and best practices Performance help wanted and removed Sprint labels Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Moderate Anything that requires some knowledge of conventions and best practices Performance
Projects
None yet
Development

No branches or pull requests

7 participants