Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect invalid device error introduced in #25956 #29107

Open
jph00 opened this issue May 25, 2024 · 9 comments · Fixed by #29119 · May be fixed by #30454
Open

Incorrect invalid device error introduced in #25956 #29107

jph00 opened this issue May 25, 2024 · 9 comments · Fixed by #29119 · May be fixed by #30454

Comments

@jph00
Copy link

jph00 commented May 25, 2024

Describe the bug

#25956 introduced a new sklearn.utils._array_api._check_device_cpu function to test whether a tensor is on CPU. However, the implementation of the test, which is device not in {"cpu", None}, is incorrect -- the device will actually not be a string, but device(type='cpu'). Therefore, you should attempt to get the type attr, and use that if available.

Steps/Code to Reproduce

You can view a sample error here:
https://github.com/fastai/fastai/actions/runs/9232979440/job/25404873935

Expected Results

ValueError: Unsupported device for NumPy: device(type='cpu') should not be thrown.

Actual Results

File /opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/site-packages/sklearn/utils/_array_api.py:308, in _check_device_cpu(device)
    306 def _check_device_cpu(device):  # noqa
    307     if device not in {"cpu", None}:
--> 308         raise ValueError(f"Unsupported device for NumPy: {device!r}")

ValueError: Unsupported device for NumPy: device(type='cpu')

Versions

I've seen this on multiple Linux and Mac versions. My current Mac version:


System:
    python: 3.11.8 (main, Feb 26 2024, 15:36:12) [Clang 14.0.6 ]
executable: /Users/jhoward/miniconda3/bin/python
   machine: macOS-14.3.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.4.2
          pip: 23.3.1
   setuptools: 68.2.2
        numpy: 1.26.4
        scipy: 1.13.0
       Cython: None
       pandas: 2.2.1
   matplotlib: 3.8.4
       joblib: 1.4.0
threadpoolctl: 2.2.0

Built with OpenMP: True

threadpoolctl info:
       filepath: /Users/jhoward/miniconda3/lib/libopenblasp-r0.3.21.dylib
         prefix: libopenblas
       user_api: blas
   internal_api: openblas
        version: 0.3.21
    num_threads: 8
threading_layer: pthreads
   architecture: armv8

       filepath: /Users/jhoward/miniconda3/lib/libomp.dylib
         prefix: libomp
       user_api: openmp
   internal_api: openmp
        version: None
    num_threads: 8
@jph00 jph00 added Bug Needs Triage Issue requires triage labels May 25, 2024
jph00 added a commit to jph00/scikit-learn that referenced this issue May 25, 2024
Avoids incorrect `ValueError: Unsupported device for NumPy: device(type='cpu')` by using `type` attr if available.
@glemaitre
Copy link
Member

Do you have a minimal reproducer.

Looking at our code, I think that we expect to have a call to the sklearn.utils._array_api.device that is going to return the string "cpu". So it would be interesting to understand what is generating the device object to have a proper regression test.

I'm not the most familiar with the array API work so, ping @betatim and @ogrisel.

@ogrisel
Copy link
Member

ogrisel commented May 27, 2024

Looking at the code, this function _check_device_cpu seems to only be used for _NumPyAPIWrapper which is not meant to be called in torch inputs I think. So we need to understand how you ended up with an xp module that is an instance of the _NumPyAPIWrapper but with a device variable that holds a torch device (instead of None).

@ogrisel
Copy link
Member

ogrisel commented May 27, 2024

I tried to craft a minimal reproducer by mixing numpy and torch inputs as follows:

>>> import numpy as np
>>> import array_api_compat.torch as xp
>>> import sklearn
>>> sklearn.set_config(array_api_dispatch=True)
>>> from sklearn.metrics import r2_score
>>> r2_score(xp.arange(10, device="mps"), xp.arange(10, device="mps"), sample_weight=xp.asarray(range(10), device="mps"))
1.0
>>> r2_score(np.arange(10), xp.arange(10, device="mps"), sample_weight=xp.arange(10, device="mps"))
Traceback (most recent call last):
  Cell In[16], line 1
    r2_score(np.arange(10), xp.arange(10, device="mps"), sample_weight=xp.arange(10, device="mps"))
  File ~/code/scikit-learn/sklearn/utils/_param_validation.py:213 in wrapper
    return func(*args, **kwargs)
  File ~/code/scikit-learn/sklearn/metrics/_regression.py:1214 in r2_score
    xp, _, device_ = get_namespace_and_device(
  File ~/code/scikit-learn/sklearn/utils/_array_api.py:572 in get_namespace_and_device
    *get_namespace(*array_list, **skip_remove_kwargs),
  File ~/code/scikit-learn/sklearn/utils/_array_api.py:553 in get_namespace
    namespace, is_array_api_compliant = array_api_compat.get_namespace(*arrays), True
  File ~/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_compat/common/_helpers.py:372 in array_namespace
    raise TypeError(f"Multiple namespaces for array inputs: {namespaces}")
TypeError: Multiple namespaces for array inputs: {<module 'array_api_compat.numpy' from '/Users/ogrisel/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_compat/numpy/__init__.py'>, <module 'array_api_compat.torch' from '/Users/ogrisel/miniforge3/envs/dev/lib/python3.11/site-packages/array_api_compat/torch/__init__.py'>}

but it's raising the expected error message that states that r2_score should not be called with mixed-type inputs (as of now, we might relax this in the future if we have a good reason to do so).

@ogrisel
Copy link
Member

ogrisel commented May 27, 2024

From the traceback I also tried:

>>> import numpy as np
>>> import sklearn
>>> sklearn.set_config(array_api_dispatch=True)
>>> from sklearn.metrics import r2_score
>>> import torch
>>> x1, x2 = torch.randn(20, 5), torch.randn(20, 5)
>>> r2_score(x2.view(-1), x1.view(-1))
-0.8340672254562378

but I cannot reproduce either.

@ogrisel
Copy link
Member

ogrisel commented May 27, 2024

Ok I understand, it's happening when array API dispatch is disabled and regular numpy code is expected:

>>> import numpy as np
>>> import sklearn
>>> sklearn.set_config(array_api_dispatch=False)
>>> from sklearn.metrics import r2_score
>>> import torch
>>> x1, x2 = torch.randn(20, 5), torch.randn(20, 5)
>>> r2_score(x2.view(-1), x1.view(-1))
Traceback (most recent call last):
  Cell In[27], line 7
    r2_score(x2.view(-1), x1.view(-1))
  File ~/code/scikit-learn/sklearn/utils/_param_validation.py:213 in wrapper
    return func(*args, **kwargs)
  File ~/code/scikit-learn/sklearn/metrics/_regression.py:1242 in r2_score
    return _assemble_r2_explained_variance(
  File ~/code/scikit-learn/sklearn/metrics/_regression.py:897 in _assemble_r2_explained_variance
    output_scores = xp.ones([n_outputs], device=device, dtype=dtype)
  File ~/code/scikit-learn/sklearn/utils/_array_api.py:314 in wrapped_func
    _check_device_cpu(kwargs.pop("device", None))
  File ~/code/scikit-learn/sklearn/utils/_array_api.py:308 in _check_device_cpu
    raise ValueError(f"Unsupported device for NumPy: {device!r}")
ValueError: Unsupported device for NumPy: device(type='cpu')

so this is indeed a regression: when array API dispatch is disabled, the usual implicit conversion from torch CPU tensors to NumPy arrays should still happen and for some reason, we get a non-None device where we don't expect it. I need to investigate.

@ogrisel ogrisel added Regression Array API and removed Needs Triage Issue requires triage labels May 27, 2024
@ogrisel
Copy link
Member

ogrisel commented May 27, 2024

I think I get it. It's get_namespace_and_device that is wrong when array API dispatch is disabled. I will open a PR with a non-regression test.

@jph00
Copy link
Author

jph00 commented Dec 10, 2024

@ogrisel is it possible that this bug has reappeared? I'm getting the same error again here:
https://github.com/AnswerDotAI/fastcore/actions/runs/12251169586/job/34175356676

@lesteve
Copy link
Member

lesteve commented Dec 10, 2024

@jph00, indeed it seems like a regression in scikit-learn 1.6. I am taking a closer look.

@ogrisel ogrisel reopened this Dec 10, 2024
@ogrisel
Copy link
Member

ogrisel commented Dec 10, 2024

Reopening. Indeed, we made some changes in #29476 (including in the non-regression test named test_get_namespace_and_device) without realizing that it would break calls like r2_score(torch.ones(3), torch.ones(3)) when array API dispatch is disabled (default). In this case want to rely on the regular np.asarray implicit conversion instead.

We need to fix it (either in the sklearn.utils._array_api.device function or in the sklearn.utils._array_api.get_namespace_and_device function.

This time, add a non-regression test that directly calls r2_score(torch.ones(3), torch.ones(3)) (or other metrics functions) with array API dispatch disabled and check that it works. The intent of the test will be clearer and we won't risk changing it without understanding the consequence...

@ogrisel ogrisel modified the milestones: 1.5.1, 1.6.1 Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment