-
-
Notifications
You must be signed in to change notification settings - Fork 25.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Finding indexes with np.where(condition)
or np.asarray(condition).nonzero()
#30400
Comments
np.where(cond)
or np.asarray(condition).nonzero()
np.where(condition)
or np.asarray(condition).nonzero()
The reason why
I don't recall any issue reported by users using scikit-learn with subclasses of NumPy arrays and having problems as a result. I am not even sure what "behaves correctly" means in that context: returning a NumPy array of integer indices or a subclass instance storing the same indices? For our use cases, I think returning a NumPy array (not subclassing) is perfectly fine, but any option might work. Let's try: >>> import numpy as np
>>> class my_array(np.ndarray):
... pass
...
>>> condition = np.asarray([True, False, False, True, True]).view(my_array)
>>> condition
my_array([ True, False, False, True, True])
>>> np.where(condition)[0]
array([0, 3, 4])
>>> condition.nonzero()[0]
array([0, 3, 4])
>>> np.nonzero(condition)[0]
array([0, 3, 4]) So I don't see any difference: neither returns an instance of the I traced this recommendation in the doc to this PR by @eric-wieser: numpy/numpy#11425 However, whenever we convert a scikit-learn estimator to add array API support, the https://data-apis.org/array-api/latest/API_specification/generated/array_api.where.html#where Since https://data-apis.org/array-api/latest/API_specification/generated/array_api.nonzero.html#nonzero Apart from adding array API support, I would rather not change the code unless we identify a good reason to do it (in particular if we identify a user-facing impact). Personally, I am not convinced that one is particularly more readable than the other. That being said, I would be curious to understand better why |
Maybe the note about the issue with subclasses is actually a reference to using |
@eric-wieser @seberg friendly ping. We would like to better understand the numpy recommendation of using Personally, I would follow numpy‘s recommendation without questioning it too much: their realm, their expertise. |
|
Throughout the repo, we use
np.where(condition)
for getting indexes, for instance in SelectorMixin.get_support(), in SimpleImputer.transform() and in several of our examples (example).The numpy documentation discourages the use of
np.where
with just passing a condition and recommendsnp.asarray(condition).nonzero()
instead.For cleanliness of code, should we adopt this recommendation, at least in the examples? Or are there good reasons why we do that?
The text was updated successfully, but these errors were encountered: