Description
Describe the workflow you want to enable
Suppose you want to impute a bool array. Because it contains NaNs, NumPy gives it a float dtype and imputation works fine:
>>> np.asarray([[True, False, np.nan]]).dtype
dtype('float64')
However, now suppose that the value you pass actually has no NaNs (e.g., because the current data happens to have no NaNs), so there's nothing to impute. In this case, the array is of type bool:
>>> np.asarray([[True, False]]).dtype
dtype('bool')
This makes the imputation fail because bool isn't supported:
import numpy as np
from sklearn.impute import SimpleImputer
SimpleImputer(strategy="most_frequent").fit(np.asarray([[True, False]]))
This code raises an exception:
Traceback (most recent call last):
# ...
File ".../lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File ".../lib/python3.10/site-packages/sklearn/base.py", line 878, in fit_transform
return self.fit(X, **fit_params).transform(X)
File ".../lib/python3.10/site-packages/sklearn/impute/_base.py", line 390, in fit
X = self._validate_input(X, in_fit=True)
File ".../lib/python3.10/site-packages/sklearn/impute/_base.py", line 352, in _validate_input
raise ValueError(
ValueError: SimpleImputer does not support data with dtype bool. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object type.
This is because np.asarray([[True, False]]).dtype.kind is "b", which is not in {"i", "u", "f", "O"}, so input validation rejects the array.
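To make the dtype difference concrete, here is a minimal NumPy-only sketch. The set `SUPPORTED_KINDS` and the helper `dtype_kind` are hypothetical names used for illustration; the set is copied from the description above, not from scikit-learn's source.

```python
import numpy as np

# Kinds listed in this issue as accepted; copied from the text above,
# not read from scikit-learn's source (assumption).
SUPPORTED_KINDS = {"i", "u", "f", "O"}


def dtype_kind(values):
    """Return the one-character dtype kind NumPy infers for `values`."""
    return np.asarray(values).dtype.kind


# With a NaN present, NumPy upcasts the booleans to float.
with_nan = dtype_kind([[True, False, np.nan]])
# Without a NaN, the array stays boolean.
without_nan = dtype_kind([[True, False]])

print(with_nan, with_nan in SUPPORTED_KINDS)
print(without_nan, without_nan in SUPPORTED_KINDS)
```

The same logical column therefore passes or fails the check depending only on whether a NaN happens to be present in the current batch.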
Describe your proposed solution
My solution is to support bool. I believe it shouldn't take much effort.
Describe alternatives you've considered, if relevant
An alternative solution is for me to create a custom imputer class that copy-pastes the behavior I want from SimpleImputer and makes it work for bools.
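A lighter workaround than a custom imputer class might be to cast boolean input to float before passing it to the imputer. The sketch below is not the copy-paste approach described above; `to_imputable` is a hypothetical helper, and it assumes that float output (1.0 / 0.0 instead of True / False) is acceptable downstream.

```python
import numpy as np


def to_imputable(X):
    """Hypothetical helper: cast bool arrays to float64, leave others as-is.

    This makes the dtype independent of whether NaNs happen to be present.
    """
    X = np.asarray(X)
    if X.dtype.kind == "b":
        return X.astype(np.float64)
    return X


no_nans = to_imputable([[True, False]])
has_nans = to_imputable([[True, False, np.nan]])

# Both arrays now share a float dtype.
print(no_nans.dtype, has_nans.dtype)
```

The result can then be passed to SimpleImputer(strategy="most_frequent").fit(...) as usual, since float input is among the supported dtypes named in the error message.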
Additional context
No response