prange in trees #14037

Open
adrinjalali opened this issue Jun 7, 2019 · 5 comments
Labels
Hard Hard level of difficulty help wanted module:tree Needs Benchmarks A tag for the issues and PRs which require some benchmarks

Comments

@adrinjalali
Member

Related to #13213 and triggered by #12887 (comment).

We train trees in parallel in tree ensembles (IIRC), so I'm not sure how much we'd benefit from using prange in the tree code base. Still, some benchmarks would be nice before we decide how to proceed.

The benchmark would compare master against a version of the tree code in which some of the for loops over range use prange instead, possibly running the ensembles with only one job so that the two levels of parallelism don't overlap.
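
A minimal sketch of what such a benchmark harness could look like, using only the Python standard library. The names `benchmark` and `compare` are hypothetical, and the two callables are placeholders for "fit a tree on master" and "fit a tree on the prange branch"; nothing here is taken from scikit-learn itself.

```python
import statistics
import timeit


def benchmark(fit, repeats=5):
    """Return the median wall-clock time (seconds) of calling fit() once."""
    # number=1: each timing is a single call; repeat=repeats: take several timings.
    timings = timeit.repeat(fit, number=1, repeat=repeats)
    return statistics.median(timings)


def compare(fit_baseline, fit_candidate, repeats=5):
    """Time both callables and report the speedup of candidate over baseline."""
    baseline = benchmark(fit_baseline, repeats)
    candidate = benchmark(fit_candidate, repeats)
    return {
        "baseline_s": baseline,
        "candidate_s": candidate,
        # speedup > 1 means the candidate is faster than the baseline.
        "speedup": baseline / candidate if candidate > 0 else float("inf"),
    }


if __name__ == "__main__":
    # Placeholder workloads (assumption): swap in real tree fits here.
    def baseline_fit():
        sum(i * i for i in range(200_000))

    def candidate_fit():
        sum(i * i for i in range(50_000))

    print(compare(baseline_fit, candidate_fit))
```

In a real run, each callable would fit the same estimator on the same data, once per build, with the ensemble restricted to a single job so that only the parallelism inside a tree is measured.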

@jeremiedbb
Member

> some of the for loops on range are done with prange instead, and maybe run the ensembles with only one job.

Hard to tell without benchmarks, but I doubt this approach would be faster. Parallelizing the outermost loop is usually the fastest option (unless that loop has only a few steps, in which case it won't use all cores). Moreover, if there's an inherently sequential part in the trees, it would hurt performance.

@NicolasHug
Member

The most obvious parallelization would be to parallelize the split-finding procedure over the features, since the search on each feature is independent of the others.
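
To illustrate the idea, here is a pure-Python sketch that searches each feature independently and then reduces to the best candidate. It is not scikit-learn's splitter: the function names are hypothetical, binary labels and Gini impurity are assumptions, and a thread pool stands in for what would be a `prange` loop in Cython (in pure Python the GIL prevents any real speedup).

```python
from concurrent.futures import ThreadPoolExecutor


def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split_for_feature(X, y, feature):
    """Scan all thresholds of one feature; return (impurity, feature, threshold)."""
    best = (float("inf"), feature, None)
    for threshold in sorted({row[feature] for row in X}):
        left = [label for row, label in zip(X, y) if row[feature] <= threshold]
        right = [label for row, label in zip(X, y) if row[feature] > threshold]
        if not left or not right:
            continue  # all samples on one side: not a real split
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, feature, threshold)
    return best


def best_split(X, y, n_threads=2):
    """Evaluate every feature independently, then keep the lowest impurity."""
    n_features = len(X[0])
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # One task per feature; no task reads another task's result.
        candidates = list(
            pool.map(lambda f: best_split_for_feature(X, y, f), range(n_features))
        )
    return min(candidates, key=lambda c: c[0])
```

The only shared step is the final `min` over per-feature results, which is why the per-feature loop is the natural candidate for parallelization.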

But as Jeremie noted, that could lead to oversubscription, typically for forests.
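
As a rough back-of-the-envelope check, assuming each forest worker grows one tree at a time and each tree opens its own pool of threads for split finding (the helper names below are hypothetical):

```python
import os


def total_threads(n_jobs, threads_per_tree):
    """Threads runnable at once when each of n_jobs workers has its own inner pool."""
    return n_jobs * threads_per_tree


def is_oversubscribed(n_jobs, threads_per_tree, n_cores=None):
    """True if more threads would be runnable than there are cores."""
    if n_cores is None:
        n_cores = os.cpu_count() or 1  # cpu_count() may return None
    return total_threads(n_jobs, threads_per_tree) > n_cores
```

For example, on an 8-core machine a forest run with 8 jobs whose trees each use 8 threads would have 64 threads competing for 8 cores, whereas a single tree using 8 threads would not be oversubscribed.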

I feel like the only benefit would be when growing single trees.

@amueller
Member

amueller commented Jun 7, 2019

I was meeting with people at JHU this week who are working on this:
https://github.com/neurodata/RerF
and they build several trees in parallel on the same core, I think, and said it was super fast. I haven't looked into it, though.

@NicolasHug
Member

I would tag this as hard. It requires a pretty solid knowledge of the trees, and some Cython code.

(BTW, "intermediate" and "moderate" are synonyms to me, assuming an "easy / something_in_between / hard" scale.)

@adrinjalali adrinjalali added Hard Hard level of difficulty and removed Intermediate labels Jun 14, 2019
@venkyyuvy
Contributor

Can I work on the benchmark?
The idea is to measure the time taken by master and by a new version that replaces range with prange?
