Open
Labels
- Breaking Change: Issue resolution would not be easily handled by the usual deprecation cycle.
- Needs Benchmarks: A tag for the issues and PRs which require some benchmarks
- Needs Decision: Requires decision
- New Feature
- Performance
- module:ensemble
Description
Related issues: #25210
Current State
`HistGradientBoostingClassifier` and `HistGradientBoostingRegressor` both:
- Calculate the sample size `count` in histograms
- Use `count` for splitting (mostly excluding split candidates)
- Save the `count` in the final trees and use it in partial dependence computations.
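To make the role of `count` concrete, here is a minimal pure-Python sketch of a per-feature histogram that accumulates gradient sums, hessian sums, and optionally the sample count per bin. It is not the scikit-learn implementation; the names `Bin`, `build_histogram`, and `approx_counts` are hypothetical. The `approx_counts` helper shows one plausible way to estimate counts from hessian sums when `count` is not tracked, which is an assumption about how such an approximation could work, not verified library code.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    """Statistics accumulated for one histogram bin (hypothetical layout)."""
    sum_gradients: float = 0.0
    sum_hessians: float = 0.0
    count: int = 0


def build_histogram(binned_feature, gradients, hessians, n_bins, with_count=True):
    """Accumulate per-bin statistics for a single, already binned feature."""
    hist = [Bin() for _ in range(n_bins)]
    for b, g, h in zip(binned_feature, gradients, hessians):
        hist[b].sum_gradients += g
        hist[b].sum_hessians += h
        if with_count:
            # This extra accumulation is what the issue proposes to evaluate.
            hist[b].count += 1
    return hist


def approx_counts(hist, n_samples):
    """Estimate per-bin counts from hessian sums (assumed scheme).

    Each bin gets a share of n_samples proportional to its share of the
    total hessian. This is exact only when all hessians are equal.
    """
    total_hessian = sum(b.sum_hessians for b in hist)
    factor = n_samples / total_hessian
    return [round(b.sum_hessians * factor) for b in hist]


# Toy data: six samples, one feature binned into three bins.
binned_feature = [0, 0, 1, 2, 2, 2]
gradients = [0.5, -0.5, 1.0, -1.0, 0.25, 0.25]
hessians = [1.0] * 6  # constant hessians, as with squared error loss

hist = build_histogram(binned_feature, gradients, hessians, n_bins=3)
exact = [b.count for b in hist]

lean = build_histogram(binned_feature, gradients, hessians, n_bins=3,
                       with_count=False)
approx = approx_counts(lean, n_samples=len(binned_feature))
```

With constant hessians the estimated counts match the exact ones. With non-constant hessians (for example log loss) they generally would not, which is why a minimum-samples check based on such an estimate may behave differently from an exact count.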
Proposition
- Evaluate if removing `count` from the histograms (LightGBM only sums gradient and hessian in histograms, no count) gives a good speed-up. (I measured a roughly 10-20% speed-up.)
  Edit: LightGBM uses an approximate count based on the hessian to check for min sample size. So this might not be what we want.
- Add an option to save counts and sample weights to final trees at the very end of `fit` (where the binned training `X` is still available).
- Use partial dependence `method='recursion'` if the above option was set, else use `method='brute'`.
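The following sketch illustrates why the recursion method needs per-node counts while the brute method does not. It uses a hand-built toy tree in pure Python rather than scikit-learn's tree structures; `Node`, `predict`, `pd_brute`, and `pd_recursion` are hypothetical names. The brute variant overwrites the feature in every row and averages predictions; the recursion variant walks the tree once and weights the branches of splits on other features by the stored training counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy tree node; a leaf has feature=None (hypothetical structure)."""
    value: float = 0.0
    count: int = 0  # number of training samples that reached this node
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node, x):
    """Route one sample down the tree and return the leaf value."""
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def pd_brute(tree, X, feature, grid_value):
    """Brute force: set the feature to grid_value in every row, then average."""
    total = 0.0
    for row in X:
        modified = list(row)
        modified[feature] = grid_value
        total += predict(tree, modified)
    return total / len(X)


def pd_recursion(node, feature, grid_value):
    """Recursion: needs no data, but relies on the counts stored in the tree."""
    if node.feature is None:
        return node.value
    if node.feature == feature:
        # Split on the feature of interest: follow the branch for grid_value.
        child = node.left if grid_value <= node.threshold else node.right
        return pd_recursion(child, feature, grid_value)
    # Split on another feature: average both branches, weighted by counts.
    w_left = node.left.count / node.count
    w_right = node.right.count / node.count
    return (w_left * pd_recursion(node.left, feature, grid_value)
            + w_right * pd_recursion(node.right, feature, grid_value))


# Training data and a tree whose counts are consistent with it.
X = [[0, 0], [0, 0], [1, 0], [0, 1]]
tree = Node(
    count=4, feature=1, threshold=0.5,
    left=Node(
        count=3, feature=0, threshold=0.5,
        left=Node(value=1.0, count=2),
        right=Node(value=3.0, count=1),
    ),
    right=Node(value=10.0, count=1),
)

brute = [pd_brute(tree, X, feature=0, grid_value=v) for v in (0, 1)]
recursion = [pd_recursion(tree, feature=0, grid_value=v) for v in (0, 1)]
```

In this toy case both methods give the same values. The sketch is only meant to show that the recursion variant cannot run without counts (or weights) in the nodes, which is the trade-off the proposed option would expose. In scikit-learn the choice is made through the `method` parameter of `sklearn.inspection.partial_dependence`.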
Why?
#25431 concluded that adding weights to the trees is too expensive. The above proposition gives the user a clear choice: faster training, or faster partial dependence computation afterwards.