
SequentialFeatureSelector is not working with ColumnTransformer #25711

@Crispy13


Describe the bug

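Fitting a SequentialFeatureSelector whose estimator is a Pipeline containing a ColumnTransformer (with integer column indices) raises a ValueError. Please see the code below.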

Steps/Code to Reproduce

from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer

import pandas as pd
import numpy as np


# dummy data
N = 100
dummy_x = pd.DataFrame(
    np.random.randn(N, 3),
    columns=list('abc'),
)

dummy_y = pd.DataFrame(
    np.random.choice([0, 1], size=(N, 1)),
    columns=['label'],
)


num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('std_scaler', StandardScaler()),
])

ct_parts = [
    ('num', num_pipeline, [0, 1, 2]),
]

data_preparation_pipe = ColumnTransformer(ct_parts, remainder='passthrough')


from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import GradientBoostingClassifier

model = Pipeline(
    [
        # ('data_prep', num_pipeline),
        ('data_prep', data_preparation_pipe),
        ('ML', GradientBoostingClassifier()),
    ]
)

sfs = SequentialFeatureSelector(
    model,
)

sfs.fit(dummy_x, dummy_y)

Expected Results

No error
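
For comparison, here is a minimal sketch of an arrangement in which the same components are expected to fit without error: the selector wraps only the classifier, and the ColumnTransformer runs before it on the full set of columns. This is not the configuration from the report above. The step name `select`, the seeded data, and the values `n_features_to_select=2`, `cv=3` and `n_estimators=10` are assumptions chosen only to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Seeded dummy data with the same shape as in the report (assumption: seed 0).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.standard_normal((100, 3)), columns=list("abc"))
y = rng.integers(0, 2, size=100)

num_pipeline = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
prep = ColumnTransformer([("num", num_pipeline, [0, 1, 2])], remainder="passthrough")

# The ColumnTransformer always sees all three columns; the selector only
# receives its (already transformed) output and wraps the bare classifier.
model = Pipeline([
    ("data_prep", prep),
    ("select", SequentialFeatureSelector(
        GradientBoostingClassifier(n_estimators=10, random_state=0),
        n_features_to_select=2,
        cv=3,
    )),
    ("ML", GradientBoostingClassifier(n_estimators=10, random_state=0)),
])
model.fit(X, y)

# Boolean mask over the transformed columns kept by the selector.
selected_mask = model.named_steps["select"].get_support()
print(selected_mask)
```

Note that in this layout the selection happens over the transformed columns rather than the raw input columns, so it is not a drop-in equivalent of the code in "Steps/Code to Reproduce".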

Actual Results

Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3460, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_3433582/3663298101.py", line 1, in <module>
    sfs.fit(dummy_x, dummy_y)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py", line 268, in fit
    new_feature_idx, new_score = self._get_best_new_feature_score(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py", line 299, in _get_best_new_feature_score
    scores[feature_idx] = cross_val_score(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 515, in cross_val_score
    cv_results = cross_validate(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 285, in cross_validate
    _warn_or_raise_about_fit_failures(results, error_score)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 367, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError: 
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.

Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 416, in _get_column_indices
    idx = _safe_indexing(np.arange(n_columns), key)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 356, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 185, in _array_indexing
    return array[key] if axis == 0 else array[:, key]
IndexError: index 1 is out of bounds for axis 0 with size 1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 401, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 359, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 724, in fit_transform
    self._validate_column_callables(X)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 426, in _validate_column_callables
    transformer_to_input_indices[name] = _get_column_indices(X, columns)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 418, in _get_column_indices
    raise ValueError(
ValueError: all features must be in [0, 0] or [-1, 0]
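
The inner error (index 1 out of bounds for an axis of size 1) suggests that the estimator is being fitted on a single-column subset of X while the ColumnTransformer still asks for columns [0, 1, 2]. A minimal sketch that appears to isolate this, without SequentialFeatureSelector at all, is below. It is an illustration under that assumption, not a confirmed diagnosis; the variable names and the seeded data are hypothetical.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

X = np.random.default_rng(0).standard_normal((100, 3))
ct = ColumnTransformer([("num", StandardScaler(), [0, 1, 2])], remainder="passthrough")

# With all three columns present, the hard-coded indices are valid.
full_output = ct.fit_transform(X)

# With a single-column slice (what a one-feature candidate subset would
# look like), indices 1 and 2 no longer exist.
X_single = X[:, [0]]
try:
    ct.fit_transform(X_single)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this reading is right, any ColumnTransformer with fixed column indices or names would fail the same way when used as part of the estimator passed to SequentialFeatureSelector.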

Versions

System:
    python: 3.10.9 | packaged by conda-forge | (main, Feb  2 2023, 20:20:04) [GCC 11.3.0]
executable: /software/anaconda3/envs/TOSC_ML/bin/python
   machine: Linux-4.18.0-305.3.1.el8.x86_64-x86_64-with-glibc2.28

Python dependencies:
      sklearn: 1.2.1
          pip: 23.0
   setuptools: 67.3.2
        numpy: 1.23.5
        scipy: 1.10.0
       Cython: None
       pandas: 1.5.3
   matplotlib: 3.7.0
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /software/anaconda3/envs/TOSC_ML/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 128

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /software/anaconda3/envs/TOSC_ML/lib/libgomp.so.1.0.0
        version: None
    num_threads: 128


Labels: Bug, Moderate (anything that requires some knowledge of conventions and best practices), help wanted
