Open
Labels
Bug, Moderate (anything that requires some knowledge of conventions and best practices), help wanted
Description
Describe the bug
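Fitting a SequentialFeatureSelector whose estimator is a Pipeline that starts with a ColumnTransformer (using integer column indices) fails with a ValueError. Please see the code.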
Steps/Code to Reproduce
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.compose import ColumnTransformer
import pandas as pd
import numpy as np
# dummy data
N = 100
dummy_x = pd.DataFrame(
    np.random.randn(N, 3),
    columns=list('abc'),
)
dummy_y = pd.DataFrame(
    np.random.choice([0, 1], size=(N, 1)),
    columns=['label'],
)
num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy="median")),
    ('std_scaler', StandardScaler()),
])
ct_parts = [
    ('num', num_pipeline, [0, 1, 2]),
]
data_preparation_pipe = ColumnTransformer(ct_parts, remainder='passthrough')
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.ensemble import GradientBoostingClassifier
model = Pipeline(
    [
        # ('data_prep', num_pipeline),
        ('data_prep', data_preparation_pipe),
        ('ML', GradientBoostingClassifier()),
    ]
)
sfs = SequentialFeatureSelector(
    model,
)
sfs.fit(dummy_x, dummy_y)
Expected Results
No error
Actual Results
Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3460, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/tmp/ipykernel_3433582/3663298101.py", line 1, in <module>
    sfs.fit(dummy_x, dummy_y)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py", line 268, in fit
    new_feature_idx, new_score = self._get_best_new_feature_score(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/feature_selection/_sequential.py", line 299, in _get_best_new_feature_score
    scores[feature_idx] = cross_val_score(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 515, in cross_val_score
    cv_results = cross_validate(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 285, in cross_validate
    _warn_or_raise_about_fit_failures(results, error_score)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 367, in _warn_or_raise_about_fit_failures
    raise ValueError(all_fits_failed_message)
ValueError:
All the 5 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
5 fits failed with the following error:
Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 416, in _get_column_indices
    idx = _safe_indexing(np.arange(n_columns), key)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 356, in _safe_indexing
    return _array_indexing(X, indices, indices_dtype, axis=axis)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 185, in _array_indexing
    return array[key] if axis == 0 else array[:, key]
IndexError: index 1 is out of bounds for axis 0 with size 1
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 401, in fit
    Xt = self._fit(X, y, **fit_params_steps)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 359, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/joblib/memory.py", line 349, in __call__
    return self.func(*args, **kwargs)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 724, in fit_transform
    self._validate_column_callables(X)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 426, in _validate_column_callables
    transformer_to_input_indices[name] = _get_column_indices(X, columns)
  File "/software/anaconda3/envs/TOSC_ML/lib/python3.10/site-packages/sklearn/utils/__init__.py", line 418, in _get_column_indices
    raise ValueError(
ValueError: all features must be in [0, 0] or [-1, 0]
Versions
System:
    python: 3.10.9 | packaged by conda-forge | (main, Feb 2 2023, 20:20:04) [GCC 11.3.0]
    executable: /software/anaconda3/envs/TOSC_ML/bin/python
    machine: Linux-4.18.0-305.3.1.el8.x86_64-x86_64-with-glibc2.28

Python dependencies:
    sklearn: 1.2.1
    pip: 23.0
    setuptools: 67.3.2
    numpy: 1.23.5
    scipy: 1.10.0
    Cython: None
    pandas: 1.5.3
    matplotlib: 3.7.0
    joblib: 1.2.0
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /software/anaconda3/envs/TOSC_ML/lib/libopenblasp-r0.3.21.so
    version: 0.3.21
    threading_layer: pthreads
    architecture: Zen
    num_threads: 128

    user_api: openmp
    internal_api: openmp
    prefix: libgomp
    filepath: /software/anaconda3/envs/TOSC_ML/lib/libgomp.so.1.0.0
    version: None
    num_threads: 128
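
The inner error in the traceback (index 1 out of bounds for an axis of size 1) suggests that the ColumnTransformer is being fit on a single-column array. A minimal sketch that isolates this step follows. It assumes that SequentialFeatureSelector passes only the candidate feature columns to the wrapped estimator during cross-validation. The names `X_candidate` and `error_message` are illustrative and not part of the original report, and the imputer and classifier are left out to keep the example short.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))

# Same column specification as in the report: fixed integer indices 0, 1, 2.
ct = ColumnTransformer(
    [("num", StandardScaler(), [0, 1, 2])],
    remainder="passthrough",
)

# Fitting on all three columns works.
full_shape = ct.fit_transform(X).shape

# Assumption: during forward selection the estimator only sees the candidate
# subset, which is a single column in the first round.
X_candidate = X[:, [0]]
try:
    ct.fit_transform(X_candidate)
    error_message = None
except ValueError as exc:
    # Indices 1 and 2 do not exist in a one-column array.
    error_message = str(exc)

print(full_shape)
print(error_message)
```

If this reading is right, the hard-coded indices `[0, 1, 2]` cannot be resolved once the selector has reduced the input to fewer columns. One possible workaround, not verified against this report, would be to place the selector inside the pipeline after the preprocessing step so that the ColumnTransformer always receives the full input.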