Description
Summary
While missing-value support for decision trees has been added recently, it only works when the data is encoded in a dense array. Since RandomForest* and ExtraTrees* both support sparse X, a user who encodes np.nan inside a sparse X should be able to expect it to work as well.
Solution
Add missing-value logic to SparsePartitioner in _partitioner.pyx, and to BestSparseSplitter and RandomSparseSplitter in _splitter.pyx.
The logic is the same as in the dense case, except that it must account for X being stored in sparse CSC format.
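To make the CSC point concrete, here is a minimal pure-Python sketch of what separating missing values from one sparse feature column could look like. It is not the scikit-learn implementation: the function name `column_values_with_missing` and the plain-list CSC layout (`data`, `indices`, `indptr`) are assumptions made for illustration only. The key difference from the dense case is that only explicitly stored entries can be NaN, while rows absent from the column are implicit zeros.

```python
import math


def column_values_with_missing(data, indices, indptr, col, n_rows):
    """Hypothetical helper: split one CSC column into sorted non-missing
    (value, row) pairs and a list of rows whose stored value is NaN."""
    start, end = indptr[col], indptr[col + 1]
    stored = {}        # row -> explicitly stored, non-NaN value
    missing_rows = []  # rows whose stored value is NaN
    for k in range(start, end):
        row, value = indices[k], data[k]
        if math.isnan(value):
            missing_rows.append(row)
        else:
            stored[row] = value
    missing = set(missing_rows)
    # Rows that are neither stored nor missing are implicit zeros.
    non_missing = [
        (stored.get(row, 0.0), row) for row in range(n_rows) if row not in missing
    ]
    non_missing.sort()
    return non_missing, sorted(missing_rows)


# A 4x2 matrix in CSC form (illustrative data):
#   column 0: [1.0, 0, nan, -2.0]
#   column 1: [0, nan, 0, 3.0]
data = [1.0, float("nan"), -2.0, float("nan"), 3.0]
indices = [0, 2, 3, 1, 3]
indptr = [0, 3, 5]

values, missing = column_values_with_missing(data, indices, indptr, 0, 4)
```

A splitter could then evaluate thresholds over the sorted non-missing values and try sending the missing rows to either child, as in the dense case. A real partitioner would work in place on index buffers rather than building Python lists.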
Misc.
FYI https://github.com/scikit-learn/scikit-learn/pull/27966 will introduce native support for missing values in the `ExtraTree*` models (i.e. random splitter).
One thing I noticed though as I went through the PR is that the current codebase still does not support missing values in the sparse splitter. I think this might be pretty easy to add, but should we re-open this issue technically?
Xref: #5870 (comment)
Originally posted by @adam2392 in #5870 (comment)