DOC improve documentation of copy=False in preprocessing functions #27691

Merged
merged 11 commits into from
Dec 7, 2023
52 changes: 32 additions & 20 deletions sklearn/preprocessing/_data.py
@@ -155,9 +155,10 @@ def scale(X, *, axis=0, with_mean=True, with_std=True, copy=True):
unit standard deviation).

copy : bool, default=True
- Set to False to perform inplace row normalization and avoid a
- copy (if the input is already a numpy array or a scipy.sparse
- CSC matrix and if axis is 1).
+ If False, try to avoid a copy and scale in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

Returns
-------
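
The new wording for `scale` can be checked with a small experiment. The sketch below assumes a recent scikit-learn and NumPy; the array values and variable names are made up for illustration. It uses `np.shares_memory` to see whether the returned array reuses the input buffer.

```python
import numpy as np
from sklearn.preprocessing import scale

# Illustrative inputs (not from the PR): same values, float vs. int dtype.
X_float = np.array([[1.0, 2.0], [3.0, 6.0]])
X_int = np.array([[1, 2], [3, 6]])
X_int_before = X_int.copy()

out_float = scale(X_float, copy=False)
out_int = scale(X_int, copy=False)

# Float input: the result should reuse the input buffer (scaled in place).
float_in_place = np.shares_memory(out_float, X_float)
# Int input: a float conversion is needed, so a new array is returned
# and the original integer array is left untouched.
int_in_place = np.shares_memory(out_int, X_int)
int_unchanged = np.array_equal(X_int, X_int_before)

print(float_in_place, int_in_place, int_unchanged)
```

Checking shared memory rather than object identity keeps the example independent of whether the function returns the very same Python object or a view on it.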
@@ -613,8 +614,10 @@ def minmax_scale(X, feature_range=(0, 1), *, axis=0, copy=True):
otherwise (if 1) scale each sample.

copy : bool, default=True
- Set to False to perform inplace scaling and avoid a copy (if the input
- is already a numpy array).
+ If False, try to avoid a copy and scale in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

Returns
-------
@@ -1336,8 +1339,10 @@ def maxabs_scale(X, *, axis=0, copy=True):
otherwise (if 1) scale each sample.

copy : bool, default=True
- Set to False to perform inplace scaling and avoid a copy (if the input
- is already a numpy array).
+ If False, try to avoid a copy and scale in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

Returns
-------
@@ -1713,9 +1718,10 @@ def robust_scale(
.. versionadded:: 0.18

copy : bool, default=True
- Set to `False` to perform inplace row normalization and avoid a
- copy (if the input is already a numpy array or a scipy.sparse
- CSR matrix and if axis is 1).
+ If False, try to avoid a copy and scale in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

unit_variance : bool, default=False
If `True`, scale data so that normally distributed features have a
@@ -1826,9 +1832,10 @@ def normalize(X, norm="l2", *, axis=1, copy=True, return_norm=False):
normalize each sample, otherwise (if 0) normalize each feature.

copy : bool, default=True
- Set to False to perform inplace row normalization and avoid a
- copy (if the input is already a numpy array or a scipy.sparse
- CSR matrix and if axis is 1).
+ If False, try to avoid a copy and normalize in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

return_norm : bool, default=False
Whether to return the computed norms.
@@ -2059,9 +2066,10 @@ def binarize(X, *, threshold=0.0, copy=True):
Threshold may not be less than 0 for operations on sparse matrices.

copy : bool, default=True
- Set to False to perform inplace binarization and avoid a copy
- (if the input is already a numpy array or a scipy.sparse CSR / CSC
- matrix and if axis is 1).
+ If False, try to avoid a copy and binarize in place.
+ This is not guaranteed to always work in place; e.g. if the data is

Member

I think in this case (and maybe others) the previous wording is accurate in saying that scipy CSR or CSC matrices are not copied, so you cannot reuse the same doc here.

In general, whether a copy is made depends mostly on the check_array arguments, but the code that follows it may also need a closer look.
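
To illustrate how the `check_array` arguments decide whether a copy is made, here is a minimal sketch. It assumes a recent scikit-learn; the input array and the explicit `dtype` list are chosen for illustration and are not taken from any particular preprocessing function.

```python
import numpy as np
from sklearn.utils import check_array

# Illustrative integer input (not from the PR).
X_int = np.array([[1, 2], [3, 4]])

# Default dtype="numeric": an int array is accepted as-is, so with
# copy=False no new buffer should be allocated.
kept = check_array(X_int, copy=False)

# Requesting float dtypes forces a conversion, which necessarily
# allocates a new array even though copy=False was passed.
converted = check_array(X_int, dtype=[np.float64, np.float32], copy=False)

kept_shares = np.shares_memory(kept, X_int)
converted_shares = np.shares_memory(converted, X_int)

print(kept.dtype, kept_shares)
print(converted.dtype, converted_shares)
```

Functions that validate with a float-only `dtype` therefore cannot operate in place on integer input, whatever value `copy` has.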

Member @lesteve, Nov 14, 2023

I still think the previous docstring was accurate, whereas the new one is slightly misleading.

The function uses the code below, which means that with copy=False it accepts a CSR/CSC sparse matrix without copying it:

X = check_array(X, accept_sparse=["csr", "csc"], copy=copy)
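
The reviewer's point about sparse input can be tried out as follows. This is a sketch assuming recent scikit-learn and SciPy; the matrix values and the threshold are illustrative.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import binarize

# Illustrative data (not from the PR).
dense = np.array([[0.2, 0.0, 1.5], [0.0, 0.7, 0.0]])

# copy=False on a CSR matrix: the input object itself is expected
# to be binarized and returned.
X_csr = sparse.csr_matrix(dense)
out = binarize(X_csr, threshold=0.5, copy=False)
csr_in_place = out is X_csr
csr_after = X_csr.toarray()

# copy=True on an identical matrix: the original should stay unchanged.
X_csr_copy = sparse.csr_matrix(dense)
out_copy = binarize(X_csr_copy, threshold=0.5, copy=True)
original_kept = np.array_equal(X_csr_copy.toarray(), dense)

print(csr_in_place, original_kept)
print(csr_after)
```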

+ a numpy array with an object dtype, a copy will be returned even with
+ copy=False.

Returns
-------
@@ -2945,9 +2953,10 @@ def quantile_transform(
See :term:`Glossary <random_state>`.

copy : bool, default=True
- Set to False to perform inplace transformation and avoid a copy (if the
- input is already a numpy array). If True, a copy of `X` is transformed,
- leaving the original `X` unchanged.
+ If False, try to avoid a copy and transform in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

.. versionchanged:: 0.23
The default value of `copy` changed from False to True in 0.23.
@@ -3481,7 +3490,10 @@ def power_transform(X, method="yeo-johnson", *, standardize=True, copy=True):
transformed output.

copy : bool, default=True
- Set to False to perform inplace computation during transformation.
+ If False, try to avoid a copy and transform in place.
+ This is not guaranteed to always work in place; e.g. if the data is
+ a numpy array with an int dtype, a copy will be returned even with
+ copy=False.

Returns
-------