-
As you can see in the following snippet of code:

# %%
from sklearn.datasets import load_boston

X, y = load_boston(return_X_y=True)

# %%
from joblib import parallel_backend
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

n_estimators = 1_000
bagged_decision_trees = BaggingRegressor(
    DecisionTreeRegressor(), n_estimators=n_estimators, n_jobs=-1
)
random_forest = RandomForestRegressor(n_estimators=n_estimators, n_jobs=-1)

# %%
%%timeit
with parallel_backend('threading', n_jobs=2):
    bagged_decision_trees.fit(X, y)

# %%
%%timeit
with parallel_backend('loky', n_jobs=2):
    bagged_decision_trees.fit(X, y)

# %%
%%timeit
with parallel_backend('threading', n_jobs=2):
    random_forest.fit(X, y)

# %%
%%timeit
with parallel_backend('loky', n_jobs=2):
    random_forest.fit(X, y)
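The %%timeit cells above require an IPython session; as a minimal standalone sketch of the same comparison (an addition, not part of the original snippet, timing a single fit per configuration with timeit.timeit), one could run:

# Standalone sketch (an assumption, not from the original post): the same
# backend comparison without IPython cell magics, one fit per configuration.
import timeit

from joblib import parallel_backend
from sklearn.datasets import load_boston
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_boston(return_X_y=True)
estimators = {
    "bagged trees": BaggingRegressor(
        DecisionTreeRegressor(), n_estimators=1_000, n_jobs=-1
    ),
    "random forest": RandomForestRegressor(n_estimators=1_000, n_jobs=-1),
}

for backend in ("threading", "loky"):
    for name, est in estimators.items():
        with parallel_backend(backend, n_jobs=2):
            elapsed = timeit.timeit(lambda: est.fit(X, y), number=1)
        print(f"{name} / {backend}: {elapsed:.1f} s")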
-
On GNU/Linux, one can have a look at the way threads are scheduled and identify whether the GIL gets locked, using giltracer. Using this setup on this script:

# rf_debug.py
from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor


def main(args=None):
    X, y = load_boston(return_X_y=True)
    rf = RandomForestRegressor(n_estimators=1000, n_jobs=4)
    rf.fit(X, y)


if __name__ == "__main__":
    main()

by running:

giltracer --state-detect rf_debug.py

one gets several reports, which can be visualized with Perfetto.

Summary of the threads' interaction with the GIL: I have not been able to get results for every configuration. Zooming in on the kernel's scheduler profiling, it looks like some Python code prior to the Cython implementation takes the GIL.
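As a rough cross-check that does not require giltracer, one can compare sequential and thread-parallel fits of single-threaded estimators: if fit releases the GIL while building trees, the threaded run should scale. This is only a sketch (assuming scikit-learn and the standard library), not part of the measurements above:

# Sketch (an assumption, not from the original post): if RandomForestRegressor.fit
# releases the GIL while building trees, four fits in four threads should take
# roughly a quarter of the sequential wall time; if it is GIL-bound, the two
# timings will be similar.
import time
from concurrent.futures import ThreadPoolExecutor

from sklearn.datasets import load_boston
from sklearn.ensemble import RandomForestRegressor

X, y = load_boston(return_X_y=True)

def fit_one(_):
    RandomForestRegressor(n_estimators=250, n_jobs=1).fit(X, y)

start = time.perf_counter()
for i in range(4):
    fit_one(i)
sequential = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fit_one, range(4)))
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, 4 threads: {threaded:.2f} s")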
-
I'm going to convert this discussion back to an issue so that we can investigate the core reason for the GIL not being released.
-
What kind of speedup should we expect from more CPUs with RandomForestRegressor? I seem to get a max of ~2x across 3 environments I've tested, up to 64 cores. I think it relates to the preference for threads:
scikit-learn/sklearn/ensemble/_forest.py, lines 381 to 393 in 2beed55
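The permalinked block is not reproduced here; as I understand it, the relevant point is that the trees are dispatched through joblib with prefer="threads", roughly like the following self-contained sketch (an approximation, not the permalinked source; build_tree stands in for the private tree-building helper in _forest.py):

# Self-contained sketch of the joblib pattern used for tree building
# (an approximation, not the permalinked source): the dispatch uses
# prefer="threads", so a thread-based backend is used unless overridden.
from joblib import Parallel, delayed
from sklearn.datasets import load_boston
from sklearn.tree import DecisionTreeRegressor

X, y = load_boston(return_X_y=True)

def build_tree(seed):
    # Stand-in for the private tree-building helper in _forest.py.
    return DecisionTreeRegressor(random_state=seed).fit(X, y)

trees = Parallel(n_jobs=4, prefer="threads")(
    delayed(build_tree)(seed) for seed in range(100)
)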
RandomForestRegressor (or ExtraTreesRegressor) shows a much smaller multi-core speedup than I would expect if CPU parallelization were being fully exploited.
For comparison, in every environment I get the expected multi-core speedup with other sklearn multiprocess calls (cross_val_score or BaggingRegressor both scale up ~proportionally in speed with n_jobs=-1 to the number of physical cores).

Mac:
Windows:
Test script
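The original test script is collapsed above; as a hypothetical reconstruction of what such a benchmark might look like (synthetic data via make_regression, so timings will not match the ones reported):

# Hypothetical benchmark sketch (not the collapsed original script): compare how
# RandomForestRegressor and BaggingRegressor scale with n_jobs.
import time

from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=20_000, n_features=50, random_state=0)

for n_jobs in (1, 2, 4, 8, -1):
    estimators = {
        "RandomForestRegressor": RandomForestRegressor(n_estimators=200, n_jobs=n_jobs),
        "BaggingRegressor": BaggingRegressor(
            DecisionTreeRegressor(), n_estimators=200, n_jobs=n_jobs
        ),
    }
    for name, est in estimators.items():
        start = time.perf_counter()
        est.fit(X, y)
        print(f"{name:24s} n_jobs={n_jobs:>2}: {time.perf_counter() - start:.1f} s")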