Threads awaiting for GIL in Forest estimators #20666
From #20651 (reply in thread), two small optimizations could improve the scalability when using threads:
For reference, here is the extract of the sequential execution profiling:
I think threading is still the most appropriate backend by default. In #20651 the individual trees are very fast to train (4ms) and many trees are fitted (1000), which is not representative of the typical use case. If fitting each tree took 1s or more, the GIL-holding segments of the code would be negligible and the thread-based scalability would be fine.
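To make the argument above concrete, here is a rough sketch of a simplified model: it assumes that the GIL-holding part of each tree fit is fully serialized across threads and that the rest runs perfectly in parallel. The function name `thread_speedup_bound` and the 1 ms GIL-held overhead are hypothetical values chosen for illustration; they were not measured in this thread.

```python
def thread_speedup_bound(gil_held, gil_released, n_threads):
    """Upper bound on the speedup for many identical tasks run on threads.

    Simplified model (an assumption, not a measurement): the GIL-held part
    of every task is serialized, the GIL-released part scales perfectly.
    Both durations must use the same time unit.
    """
    if gil_held <= 0:
        # Nothing is serialized, so the bound is the number of threads.
        return float(n_threads)
    total = gil_held + gil_released
    # Per-task wall time is max(gil_held, total / n_threads), hence:
    return min(float(n_threads), total / gil_held)


# Hypothetical 1 ms of GIL-held overhead per tree, 8 threads (times in ms).
fast_tree = thread_speedup_bound(gil_held=1.0, gil_released=3.0, n_threads=8)
slow_tree = thread_speedup_bound(gil_held=1.0, gil_released=999.0, n_threads=8)
print(f"4 ms tree: at most {fast_tree:.1f}x, 1 s tree: at most {slow_tree:.1f}x")
```

Under these assumptions the 4 ms trees cannot scale beyond 4x regardless of the number of threads, while the 1 s trees are limited only by the thread count. Real contention also depends on how the GIL is handed between threads, so the actual numbers should come from profiling.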
I agree, it's dataset dependent. Switch out
Hey @ogrisel @brianbien, could you please help me resolve this issue? I would like to contribute to it.
@Bhavay192 Feel free to have a look at the code of the random forests and the decision trees and try to open a PR that solves one item of #20666 (comment) at a time, to keep the PR focused. Feel free to ask specific questions on gitter if you need more help: https://gitter.im/scikit-learn/scikit-learn
Before opening a PR for this issue, we should reevaluate with a free-threading build of scikit-learn on CPython 3.13. Here are some resources to get started: https://py-free-threading.github.io/ Scikit-learn's CI also automatically publishes nightly wheels for https://scikit-learn.org/stable/developers/advanced_installation.html#installing-nightly-builds
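Before re-running any benchmark, it may help to confirm that the interpreter really is a free-threading build and that the GIL is disabled at runtime. A minimal sketch, assuming CPython 3.13 or later for the runtime check (the helper name `gil_status` is made up for this example):

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is set to 1 in free-threading builds of CPython.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from CPython 3.13 on; older interpreters
    # always run with the GIL, so fall back to True there.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = is_gil_enabled() if is_gil_enabled is not None else True
    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = gil_status()
print(f"free-threading build: {free_threaded_build}, GIL enabled: {gil_enabled}")
```

Note that even on a free-threading build the GIL can be re-enabled at runtime, for instance when an extension module that does not declare free-threading support is imported, so checking after importing scikit-learn is the safer order.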
Discussed in #20651
In forest algorithms, the preferred parallelization backend is threading. However, it appears that it is no longer the most appropriate backend. As discussed here, the GIL may not be explicitly released in some parts of the code, which blocks the execution of the other threads. We need to investigate further to solve this issue.