categorical predictors in randomForest #21932

markusloecher · 2021-12-09T16:31:51Z

Describe the workflow you want to enable

I would like to pass categorical predictors to scikit-learn's RandomForestRegressor and RandomForestClassifier without having to dummy-code them.
The potential gain in predictive power is significant, and this feature is already implemented in most competing random forest packages in R (e.g. ranger or randomForest) as well as in the H2O library.

Describe your proposed solution

Following the documentation in ranger, I propose to add the parameter respect.unordered.factors:

Unordered categorical predictors can be handled in three different ways via respect.unordered.factors: with 'ignore', all factors are treated as ordered; with 'partition', all possible 2-partitions are considered for splitting; with 'order', the factor levels are ordered by the proportion falling in the second class (for two-class classification) or by their mean response (for regression), as described in Hastie et al. (2009), chapter 9.2.4. The use of 'order' is recommended, as it is computationally fast and can handle an unlimited number of factor levels. Note that the factors are reordered only once and not again in each split.
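To make the 'order' option concrete, here is a minimal sketch in plain Python of the reordering step for regression. The helper name `order_levels_by_mean_response` and the toy data are made up for illustration; this is neither ranger's nor scikit-learn's implementation. With 0/1 labels, the same mean is the proportion falling in the second class.

```python
from collections import defaultdict


def order_levels_by_mean_response(categories, y):
    """Map each level to an integer rank, ordered by ascending mean of y.

    Hypothetical helper for illustration only.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, target in zip(categories, y):
        sums[level] += target
        counts[level] += 1
    means = {level: sums[level] / counts[level] for level in sums}
    # Sort by mean response; break ties by level name so the result is deterministic.
    ordered = sorted(means, key=lambda level: (means[level], level))
    return {level: rank for rank, level in enumerate(ordered)}


# Toy data: mean responses are green = 2.0, blue = 4.5, red = 11.0.
colors = ["red", "green", "blue", "green", "red", "blue"]
y = [10.0, 1.0, 5.0, 3.0, 12.0, 4.0]

mapping = order_levels_by_mean_response(colors, y)
encoded = [mapping[c] for c in colors]
print(mapping)  # {'green': 0, 'blue': 1, 'red': 2}
print(encoded)  # [2, 0, 1, 0, 2, 1]
```

The encoded column can then be treated like any ordered numeric feature. Because the mapping is computed once up front, it mirrors the note above that levels are not reordered again in each split.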

Describe alternatives you've considered, if relevant

No response

Additional context

The combinatorial search can be avoided in the case of binary classification or regression as shown by Breiman in his original work.
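As a rough check of this claim for regression, the sketch below compares an exhaustive search over all 2-partitions of the levels with a search restricted to splits along the mean-response ordering, using the sum of squared errors as the split cost. All function names and the toy data are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_cost(groups, left_levels):
    """Total SSE when the levels in left_levels go left and all others go right."""
    left = [v for level in groups if level in left_levels for v in groups[level]]
    right = [v for level in groups if level not in left_levels for v in groups[level]]
    return sse(left) + sse(right)


def best_cost_exhaustive(groups):
    """Try every non-trivial 2-partition of the levels (exponential in their number)."""
    levels = sorted(groups)
    return min(
        split_cost(groups, set(left))
        for size in range(1, len(levels))
        for left in combinations(levels, size)
    )


def best_cost_ordered(groups):
    """Try only the k - 1 splits along the ordering by mean response."""
    ordered = sorted(groups, key=lambda k: sum(groups[k]) / len(groups[k]))
    return min(split_cost(groups, set(ordered[:i])) for i in range(1, len(ordered)))


# Toy data: responses observed for each level of one categorical predictor.
groups = {
    "a": [1.0, 2.0],
    "b": [8.0, 9.0, 10.0],
    "c": [4.0],
    "d": [5.0, 7.0],
    "e": [0.5, 1.5, 2.5],
}

exhaustive = best_cost_exhaustive(groups)
ordered = best_cost_ordered(groups)
print(abs(exhaustive - ordered) < 1e-9)
```

For k levels the exhaustive search evaluates on the order of 2^k candidate partitions, while the ordered search evaluates only k - 1, which is why the ordering shortcut matters for predictors with many levels.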

thomasjpfan · 2021-12-09T17:09:56Z

Thank you for opening this issue! The most recent work on categories in trees is at #12866. Currently the tree code is hard to maintain, and I am working on a tree refactor/redesign to make it easier to add tree-based features, including categorical features.

markusloecher · 2021-12-09T17:29:47Z

Thanks a lot for this pointer! I had searched the issues only for randomForest but should have broadened the search to trees.

markusloecher added the New Feature label Dec 9, 2021

thomasjpfan added module:ensemble module:tree labels Dec 9, 2021

glemaitre added this to Support for categorical variable May 17, 2024

glemaitre moved this to Discussion in Support for categorical variable May 17, 2024
