Add support for bools in SimpleImputer #26292

@bryant1410

Describe the workflow you want to enable

Suppose you want to impute a bool array. Because it contains NaNs, NumPy gives it a float dtype, and imputation works fine:

>>> np.asarray([[True, False, np.nan]]).dtype
dtype('float64')

However, now suppose that the value you pass actually has no NaNs (e.g., because the current data happens to have no NaNs), so there's nothing to impute. In this case, the array is of type bool:

>>> np.asarray([[True, False]]).dtype
dtype('bool')

This makes the imputation fail because bool isn't supported:

import numpy as np
from sklearn.impute import SimpleImputer

SimpleImputer(strategy="most_frequent").fit(np.asarray([[True, False]]))

The code above raises an exception:

Traceback (most recent call last):
  # ...
  File ".../lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File ".../lib/python3.10/site-packages/sklearn/base.py", line 878, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File ".../lib/python3.10/site-packages/sklearn/impute/_base.py", line 390, in fit
    X = self._validate_input(X, in_fit=True)
  File ".../lib/python3.10/site-packages/sklearn/impute/_base.py", line 352, in _validate_input
    raise ValueError(
ValueError: SimpleImputer does not support data with dtype bool. Please provide either a numeric array (with a floating point or integer dtype) or categorical data represented either as an array with integer dtype or an array of string values with an object type.

This happens because np.asarray([[True, False]]).dtype.kind is "b", which is not in {"i", "u", "f", "O"}, so input validation fails.
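To make the dtype difference concrete, here is a minimal sketch that mirrors the membership test described above. The set of kinds is copied from the text; the name SUPPORTED_KINDS and the helper is_supported are hypothetical and are not part of scikit-learn.

```python
import numpy as np

# Kinds listed in the text above; the constant name is hypothetical.
SUPPORTED_KINDS = {"i", "u", "f", "O"}


def is_supported(X):
    """Return True if the array's dtype kind is in the supported set."""
    return np.asarray(X).dtype.kind in SUPPORTED_KINDS


# With a NaN present, NumPy upcasts the bools to float64 (kind "f").
with_nan = np.asarray([[True, False, np.nan]])
# Without a NaN, the array stays bool (kind "b").
without_nan = np.asarray([[True, False]])

print(with_nan.dtype.kind, is_supported(with_nan))
print(without_nan.dtype.kind, is_supported(without_nan))
```

The same logical column can therefore pass or fail depending only on whether the current batch happens to contain a NaN.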

Describe your proposed solution

My solution is to support bool. I believe it shouldn't take much effort.

Describe alternatives you've considered, if relevant

An alternative solution is for me to create a custom imputer class that copy-pastes the behavior I want from SimpleImputer and makes it work for bools.
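A lighter user-side workaround than a full custom class could be to cast bool input to float before it reaches the imputer. The sketch below assumes a float result is acceptable; the helper name as_imputable is hypothetical and not part of scikit-learn.

```python
import numpy as np


def as_imputable(X):
    """Cast bool arrays to float64; return any other array unchanged."""
    X = np.asarray(X)
    if X.dtype.kind == "b":
        # True/False become 1.0/0.0, matching what NumPy does when a NaN is present.
        return X.astype(np.float64)
    return X


converted = as_imputable([[True, False]])
print(converted.dtype)
```

The converted array can then be passed to SimpleImputer(strategy="most_frequent").fit(...) like any other float input. The trade-off is that the transformed output is float rather than bool, so it may need to be cast back afterwards.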

Additional context

No response

Metadata


Labels: Moderate, New Feature, help wanted
