`prange` in trees #14037
Comments
Hard to tell without benchmarks, but I doubt that this approach would be faster. Parallelizing the outermost loop is usually the fastest option (unless that loop has only a few steps, in which case it won't use all cores). Moreover, if there is an inherently sequential part in the trees, it would hurt performance.
The most obvious parallelization would be to run the split-finding procedure over the features in parallel, since they're independent. But as Jeremie noted, that could lead to oversubscription, typically for forests. I feel like the only benefit would be when growing single trees.
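To make the per-feature idea concrete, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: `gini`, `best_split_for_feature`, and `best_split` are hypothetical helper names, and a thread pool stands in for what would be a `prange` loop in Cython. Because of the GIL, the threads here only illustrate the structure (independent per-feature scans followed by a reduction), not an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_for_feature(X, y, j):
    """Scan every threshold on feature j; return (impurity, j, threshold)."""
    order = sorted(range(len(y)), key=lambda i: X[i][j])
    best = (float("inf"), j, None)
    for k in range(1, len(order)):
        lo, hi = X[order[k - 1]][j], X[order[k]][j]
        if lo == hi:
            continue  # no threshold can separate equal values
        left = [y[i] for i in order[:k]]
        right = [y[i] for i in order[k:]]
        # Weighted impurity of the two children.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, j, (lo + hi) / 2.0)
    return best


def best_split(X, y, n_workers=2):
    """Search features independently, then reduce to the overall best split."""
    n_features = len(X[0])
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(
            pool.map(lambda j: best_split_for_feature(X, y, j), range(n_features))
        )
    return min(results, key=lambda r: r[0])


# Toy data: feature 0 separates the classes perfectly, feature 1 does not.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [4.0, 6.0]]
y = [0, 0, 1, 1]
score, feature, threshold = best_split(X, y)
print(score, feature, threshold)
```

The oversubscription concern shows up as soon as `best_split` is called from workers that are themselves running in parallel (one per tree in a forest): the total number of threads becomes the product of the two levels.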
I was meeting with people at JHU this week who are working on this:
I would tag this as hard. It requires a pretty solid knowledge of the trees, and some Cython code. (BTW, "intermediate" and "moderate" are synonyms to me, assuming an "easy / something_in_between / hard" scale.)
Can I work on the benchmark?
Related to #13213 and triggered by #12887 (comment).
We train trees in parallel in tree ensembles (IIRC), so I'm not sure how much we'd benefit from using `prange` in the tree code base. Still, some benchmarks would be nice before we decide how to proceed.
The benchmark would compare master with a version of the trees where some of the for loops over `range` use `prange` instead, and maybe run the ensembles with only one job.
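A rough sketch of what such a benchmark harness could look like, using only the standard library. The names `best_time`, `compare`, and `stand_in_fit` are hypothetical, and the two timed callables are placeholder workloads rather than real tree fits; they only show the shape of the comparison.

```python
import time


def best_time(fn, n_repeats=3):
    """Return the fastest wall-clock time over n_repeats calls to fn()."""
    timings = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(candidates, n_repeats=3):
    """Time each named callable; report (seconds, speedup vs. first entry)."""
    times = {name: best_time(fn, n_repeats) for name, fn in candidates.items()}
    baseline = next(iter(times.values()))
    return {name: (t, baseline / t) for name, t in times.items()}


def stand_in_fit(n):
    """Placeholder workload; swap in a real estimator's fit call."""
    return sum(i * i for i in range(n))


report = compare({
    "master": lambda: stand_in_fit(200_000),
    "prange branch": lambda: stand_in_fit(100_000),
})
for name, (seconds, speedup) in report.items():
    print(f"{name}: {seconds:.4f}s (x{speedup:.2f} vs. first entry)")
```

In a real run, each callable would fit an estimator, for example `RandomForestClassifier(n_estimators=100, n_jobs=1).fit(X, y)`, so that any gain from `prange` is not masked by the ensemble-level parallelism. Since master and the modified trees are separate builds, the same script would be run once per build and the timings compared afterwards.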