RegressorChain support for Pipelines including ColumnTransformer #20557
Comments
Currently, RegressorChain needs to slice the data, which means it converts the data into NumPy arrays before passing it to the base estimator (scikit-learn/sklearn/multioutput.py, line 521 at commit a5a9d17).
At a glance, I think the implementation can be adjusted to preserve the DataFrame and ultimately pass a pandas DataFrame through to the base estimator.
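To illustrate the conversion described above, here is a minimal sketch using scikit-learn's public check_array validation helper. It assumes the chain's input validation behaves like check_array for a plain DataFrame; the data and column names are made up. Validating a DataFrame returns a plain NumPy array, so the column names a ColumnTransformer would look up are no longer available.

```python
import numpy as np
import pandas as pd
from sklearn.utils import check_array

# A small DataFrame with named columns (illustrative data only).
df = pd.DataFrame({"age": [25, 32, 47], "income": [40.0, 52.5, 61.0]})

# Input validation converts the DataFrame into a NumPy array.
X_checked = check_array(df)

print(type(X_checked).__name__)       # ndarray
print(hasattr(X_checked, "columns"))  # False: the column names are gone
```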
This is quite closely linked to the validation in meta-estimators that we are dealing with in …
So can we say that we could remove the validation here and just use …?
It's a bit more involved because of scikit-learn/sklearn/multioutput.py, lines 607 to 613 at commit 4ee3fdd, whose result the chain later slices (scikit-learn/sklearn/multioutput.py, line 633 at commit 4ee3fdd).
To pass a DataFrame to …
With that in mind, I'm labeling this as hard.
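The comments above describe the chain combining the original features with the earlier targets and then slicing the combined data. The sketch below is a simplified, hypothetical illustration of that pattern (the helpers chain_inputs_numpy and chain_inputs_frame and the chain_target_i column names are invented here, not scikit-learn API, and details such as the chain order and the cv option are ignored), next to one possible DataFrame-preserving variant that uses pandas.concat and iloc so the original column names survive.

```python
import numpy as np
import pandas as pd


def chain_inputs_numpy(X, Y, chain_idx):
    """Hypothetical: inputs for link `chain_idx` of a chain, NumPy style.

    Features and targets are stacked into one array, then sliced so that
    link k sees the original features plus the first k targets.
    """
    X_aug = np.hstack((np.asarray(X), np.asarray(Y)))
    return X_aug[:, : X.shape[1] + chain_idx]


def chain_inputs_frame(X, Y, chain_idx):
    """Hypothetical DataFrame-preserving variant of the same slicing."""
    # Assumption: X does not already have columns with these names.
    Y_df = pd.DataFrame(
        np.asarray(Y),
        index=X.index,
        columns=[f"chain_target_{i}" for i in range(Y.shape[1])],
    )
    X_aug = pd.concat([X, Y_df], axis=1)
    return X_aug.iloc[:, : X.shape[1] + chain_idx]


X = pd.DataFrame({"age": [25, 32, 47], "city": ["a", "b", "a"]})
Y = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

print(type(chain_inputs_numpy(X, Y, 1)).__name__)  # ndarray
print(list(chain_inputs_frame(X, Y, 1).columns))   # ['age', 'city', 'chain_target_0']
```

Naming the appended target columns is part of what makes this harder than it looks: a ColumnTransformer that selects columns by name with the default remainder="drop" would silently discard those extra columns in later links.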
Describe the bug
I can't get RegressorChain to work with pipelines that include a ColumnTransformer. I posted a question on StackOverflow with more details (https://stackoverflow.com/questions/68430993/sklearn-using-regressorchain-with-columntransformer-in-pipelines). Somewhere in __init__.py, inside _get_column_indices(X, key), the call all_columns = X.columns fails with 'numpy.ndarray' object has no attribute 'columns'. Because this is a known limitation of ColumnTransformer (selecting columns by name requires a DataFrame), I suspect RegressorChain can't be used with it.

I'm not sure whether this is a supported scenario, but the documentation for RegressorChain (https://scikit-learn.org/stable/modules/generated/sklearn.multioutput.RegressorChain.html) says this about set_params:
"The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object."
So I was led to assume it would also work with Pipelines that include a ColumnTransformer.
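A minimal sketch of the underlying ColumnTransformer behavior, with made-up data and column names: selecting columns by name works on a DataFrame but not on the NumPy array that the base estimator receives once RegressorChain has converted the input. The exact exception type and message vary between scikit-learn versions, so the sketch catches both AttributeError and ValueError.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({"age": [25.0, 32.0, 47.0], "city": ["a", "b", "a"]})

# Columns are selected by name, which requires the input to be a DataFrame.
ct = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["city"]),
])
print(ct.fit_transform(df).shape)  # (3, 3): 1 scaled column + 2 one-hot columns

# The same transformer cannot resolve names on a plain NumPy array.
failed_on_numpy = False
try:
    ct.fit_transform(df.to_numpy())
except (AttributeError, ValueError) as exc:
    failed_on_numpy = True
    print(type(exc).__name__, exc)
```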
Steps/Code to Reproduce
Any example with a Pipeline containing a ColumnTransformer and a regressor reproduces it; my code is in the StackOverflow question linked above.
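The following is a hypothetical minimal reproduction in the spirit of the report. It is not the reporter's StackOverflow code: the random data, the column names a and b, and the helper make_chain are invented for illustration. It also tries one possible workaround that is an assumption, not something proposed in this thread: selecting columns by integer position, which also works on NumPy arrays. Whether the name-based attempt fails depends on the scikit-learn version, so the sketch only records the outcome.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.multioutput import RegressorChain
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = pd.DataFrame({"a": rng.normal(size=20), "b": rng.normal(size=20)})
Y = np.column_stack([X["a"] + X["b"], X["a"] - X["b"]])


def make_chain(columns):
    """Hypothetical helper: scale `columns`, pass any other columns through."""
    # remainder="passthrough" keeps the earlier targets the chain appends.
    ct = ColumnTransformer([("scale", StandardScaler(), columns)],
                           remainder="passthrough")
    return RegressorChain(Pipeline([("prep", ct), ("ridge", Ridge())]))


# Selecting by name fails wherever RegressorChain converts X to NumPy first.
try:
    make_chain(["a", "b"]).fit(X, Y)
    by_name_ok = True
except (AttributeError, ValueError):
    by_name_ok = False
print("selecting columns by name worked:", by_name_ok)

# Assumed workaround: integer positions also resolve on NumPy arrays.
chain = make_chain([0, 1]).fit(X, Y)
print(chain.predict(X).shape)  # (20, 2)
```

The remainder="passthrough" setting matters for the workaround: with the default remainder="drop", the ColumnTransformer in later links would drop the previous targets that the chain appends, so the chain would no longer use them.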
Expected Results
Fitted pipeline.
Actual Results
Versions
System:
python: 3.8.10 (default, May 19 2021, 13:12:57) [MSC v.1916 64 bit (AMD64)]
executable: C:\ProgramData\Anaconda3\envs\py38aml\python.exe
machine: Windows-10-10.0.22000-SP0
Python dependencies:
pip: 21.1.3
setuptools: 52.0.0.post20210125
sklearn: 0.24.2
numpy: 1.20.2
scipy: 1.6.2
Cython: None
pandas: 1.2.5
matplotlib: 3.3.4
joblib: 1.0.1
threadpoolctl: 2.2.0
Built with OpenMP: True