PERF: Replace range by prange when applicable #13213
Comments
Do we have a way to avoid thread over-subscription with BLAS and joblib while doing this?
Absolutely! Not on master, however :/ It's adapted from some code in loky. Essentially, it loops over the dynamically loaded shared libraries to check which BLAS / OpenMP runtime is loaded, then uses the appropriate method to dynamically set the maximum number of threads each one is allowed to use.
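For readers following along, here is a rough sketch of the approach described in the comment above. It is not the loky code or the code from the PR. It assumes Linux (it reads `/proc/self/maps` rather than walking the loader's library list), and the names `KNOWN_RUNTIMES`, `classify_library`, `loaded_libraries` and `limit_threads` are hypothetical. The setter symbols listed are the usual public ones, but some builds export them under different or suffixed names, in which case the library is simply skipped.

```python
import ctypes
import os

# Hypothetical lookup table: filename prefix -> (family, thread-count setter).
# Some builds export differently named symbols; those are skipped below.
KNOWN_RUNTIMES = {
    "libopenblas": ("blas", "openblas_set_num_threads"),
    "libmkl_rt": ("blas", "MKL_Set_Num_Threads"),
    "libgomp": ("openmp", "omp_set_num_threads"),
    "libiomp": ("openmp", "omp_set_num_threads"),
    "libomp": ("openmp", "omp_set_num_threads"),
}


def classify_library(path):
    """Return (family, setter_name) for a known runtime, else None."""
    name = os.path.basename(path)
    for prefix, info in KNOWN_RUNTIMES.items():
        if name.startswith(prefix):
            return info
    return None


def loaded_libraries(maps_path="/proc/self/maps"):
    """List shared objects mapped into this process (Linux only).

    Returns an empty list when the maps file is not available.
    """
    try:
        with open(maps_path) as f:
            paths = {line.split()[-1] for line in f if ".so" in line}
    except OSError:
        return []
    return sorted(paths)


def limit_threads(n, family="blas"):
    """Ask every loaded runtime of the given family to use at most n threads.

    Returns the paths of the libraries whose setter was actually called.
    """
    limited = []
    for path in loaded_libraries():
        info = classify_library(path)
        if info is None or info[0] != family:
            continue
        lib = ctypes.CDLL(path)  # handle to the library that is already loaded
        setter = getattr(lib, info[1], None)
        if setter is not None:
            setter(n)
            limited.append(path)
    return limited
```

Calling `limit_threads(1)` after NumPy has been imported would then cap whichever BLAS it is linked against, provided that BLAS is dynamically loaded and exports one of the setters above.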
Nice! Yes, maybe splitting it out of #11950 would make review of that one easier as well.
Does the user need a global way to control how many processors are available beyond explicit use of n_jobs?
I'm not sure I understand. Here's my thinking. Inside code already parallelized with joblib, there's no need to add a second level of parallelism with prange, at least for now, while Cython only supports OpenMP. Maybe it will be worth it once Cython supports TBB. In code that is not parallelized, there may be places where we could use a prange. However, there might be BLAS calls inside those loops, in which case we need to control the number of BLAS threads to avoid oversubscription. I made a context manager for that, to locally control BLAS (it can also be used to control OpenMP, but there's no need for now).
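The context manager mentioned in the comment above is not shown in this thread. As a minimal sketch of the pattern only, assuming the backend exposes some getter and setter for its thread count: `limit_num_threads` and `FakeThreadPool` are hypothetical names, and the fake pool stands in for a real BLAS or OpenMP runtime so the example runs anywhere.

```python
from contextlib import contextmanager


@contextmanager
def limit_num_threads(get_num_threads, set_num_threads, limit):
    """Temporarily cap a thread pool, restoring the old value on exit.

    get_num_threads / set_num_threads are callables supplied by the caller
    (assumed here; for a real BLAS they would wrap the library's own calls).
    """
    previous = get_num_threads()
    set_num_threads(limit)
    try:
        yield previous
    finally:
        # Restore even if the body raises, so the cap never leaks out.
        set_num_threads(previous)


class FakeThreadPool:
    """Stand-in for a BLAS/OpenMP runtime, for illustration only."""

    def __init__(self, num_threads):
        self.num_threads = num_threads

    def get(self):
        return self.num_threads

    def set(self, n):
        self.num_threads = n


pool = FakeThreadPool(8)
with limit_num_threads(pool.get, pool.set, 1):
    # A prange loop containing BLAS calls would run here with BLAS capped.
    inside = pool.get()
after = pool.get()
```

Restoring the previous value in `finally` is what keeps the limit local: code outside the `with` block sees the thread count it had before.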
By default, joblib's
We have recently enabled OpenMP support in sklearn, so let's use it!
It would be worth checking whether there are places in sklearn's Cython code where we can use a prange loop instead of a range loop, for instance.