-
-
Notifications
You must be signed in to change notification settings - Fork 25.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEA Categorical split support for DecisionTree*
, ExtraTree*
, RandomForest*
and `ExtraTrees*
#29437
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Adam Li <[email protected]>
❌ Linting issuesThis PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling You can see the details of the linting issues under the
|
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
…into partition
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Signed-off-by: Adam Li <[email protected]>
Reference Issues/PRs
Supersedes #12866 #4899 and #7068 , #3346
What does this implement/fix? Explain your changes.
Implements splitting rules when categorical data is given in a decision tree classifier/regressor.
ParentInfo
,SplitRecord
,Node
) ->_utils.pxd
Open Questions / TODO
breiman_shortcut
be part ofSplitter
? It is completely unused byRandomSplitter
, so it seems weird to pass in an argument that is unnecessary. Otoh, if we specialize thebreiman_shortcut
only forBestSplitter
, then we need to also specialize howbreiman_shortcut
is passed inBaseDecisionTree
.Any other comments?
May be related to #24967
Would be nice to merge in #29458
Notes:
n_categories
, so we can trim down the number of categories. I think this can be implemented last and then benchmarked to determine if this optimization is useful or not. I suspect it is since for categorical splits, there is time involved to "count the unique categories", which seems unnecessary. This is conceptually similar to tracking constant features