`prange` in trees #14037

adrinjalali · 2019-06-07T07:38:14Z

related to #13213 and triggered by #12887 (comment)

We train trees in parallel in tree ensembles (IIRC), so I'm not sure how much we'd benefit from using `prange` in the tree code base. Still, some benchmarks would be nice before we decide how to proceed.

The benchmark would compare master with a version of the trees in which some of the `for` loops over `range` use `prange` instead, possibly running the ensembles with only one job.
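A minimal sketch of the timing side of such a benchmark, using only the standard library. The helper name `time_fit` and the placeholder workload `stand_in_fit` are hypothetical and not part of scikit-learn; in a real run the callable would fit a tree or an ensemble with a single job on a fixed dataset, and the script would be run once per build (master and the `prange` branch).

```python
import statistics
import time


def time_fit(fit, repeats=5):
    """Call the zero-argument callable `fit` `repeats` times.

    Returns (best, median) wall-clock time in seconds.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fit()
        timings.append(time.perf_counter() - start)
    return min(timings), statistics.median(timings)


def stand_in_fit():
    # Placeholder workload (an assumption for illustration); a real benchmark
    # would fit a tree ensemble with one job on a fixed dataset here.
    sum(i * i for i in range(100_000))


if __name__ == "__main__":
    best, median = time_fit(stand_in_fit)
    print(f"best={best:.4f}s median={median:.4f}s")
```

Reporting both the best and the median time is a design choice: the best run approximates the noise-free cost, while the median shows how stable the measurement is.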


jeremiedbb · 2019-06-07T08:43:51Z

> some of the for loops on range are done with prange instead, and maybe run the ensembles with only one job.

Hard to tell without benchmarks, but I doubt this approach would be faster. Parallelizing the outermost loop is usually fastest (unless that loop has only a few iterations, in which case it won't use all cores). Moreover, if there is an inherently sequential part in the trees, it would hurt performance.

NicolasHug · 2019-06-07T15:46:43Z

The most obvious parallelization would be to parallelize the split finding procedure over each feature since they're independent.

But as Jeremie noted, that could lead to oversubscription, typically for forests.

I feel like the only benefit would be when growing single trees.
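To make the per-feature idea concrete, here is a rough pure-Python sketch, not the scikit-learn implementation. Each feature is searched independently for its best threshold, and the results are reduced to a single best split. The function names, the use of Gini impurity, and the thread pool are assumptions for illustration. Python threads will not give a real speedup here because of the GIL; in Cython the loop over features is the part that would become a `prange` in a `nogil` section.

```python
from concurrent.futures import ThreadPoolExecutor


def gini(counts, total):
    """Gini impurity of a node given per-class counts."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_split_for_feature(column, labels):
    """Return (weighted_impurity, threshold) of the best split on one feature."""
    n = len(column)
    order = sorted(range(n), key=column.__getitem__)
    left, right = {}, {}
    for label in labels:
        right[label] = right.get(label, 0) + 1
    best = (float("inf"), None)
    for pos in range(n - 1):
        i = order[pos]
        # Move sample i from the right child to the left child.
        left[labels[i]] = left.get(labels[i], 0) + 1
        right[labels[i]] -= 1
        nxt = column[order[pos + 1]]
        if column[i] == nxt:
            continue  # cannot place a threshold between equal values
        n_left = pos + 1
        n_right = n - n_left
        score = (n_left * gini(left, n_left) + n_right * gini(right, n_right)) / n
        if score < best[0]:
            best = (score, (column[i] + nxt) / 2)
    return best


def best_split(columns, labels, max_workers=None):
    """Search every feature independently, then keep the overall best.

    Returns (weighted_impurity, feature_index, threshold), or None if no
    feature admits a split.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(
            pool.map(lambda col: best_split_for_feature(col, labels), columns)
        )
    candidates = [
        (score, j, thr) for j, (score, thr) in enumerate(results) if thr is not None
    ]
    return min(candidates) if candidates else None
```

If something like `best_split` were called from inside an ensemble that already fits its trees in parallel, the inner pool would multiply the number of workers, which is the oversubscription concern raised above.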

amueller · 2019-06-07T15:48:28Z

I was meeting with people at JHU this week who are working on this:
https://github.com/neurodata/RerF
and they build several trees in parallel on the same core, I think, and said it was super fast. I haven't looked into it, though.

NicolasHug · 2019-06-07T19:35:50Z

I would tag this as hard. It requires a pretty solid knowledge of the trees and some Cython code.

(BTW, "intermediate" and "moderate" are synonyms to me, assuming an "easy / something_in_between / hard" scale.)

venkyyuvy · 2021-09-18T08:18:55Z

Can I work on the benchmark?
Is the idea to measure the time taken by master and by a new version that replaces `range` with `prange`?

adrinjalali added the help wanted, Intermediate, and Needs Benchmarks labels Jun 7, 2019

adrinjalali mentioned this issue Jun 7, 2019

[MRG] Adds Minimal Cost-Complexity Pruning to Decision Trees #12887

Merged

adrinjalali added the Hard label and removed the Intermediate label Jun 14, 2019

cmarmo added the module:tree label Mar 4, 2022
