DOC: Add clarifications np.argpartition by JuliaPoo · Pull Request #26716 · numpy/numpy

JuliaPoo · 2024-06-17T09:31:38Z

Linking the PR for np.top_k #26666, since it is helpful for the docs for np.top_k to specify explicitly the treatment of np.nan and the stability of the returned indices, and the behaviour of np.top_k relies on the implementation of np.argpartition, I suggest adding similar clarifications to np.argpartition docs.

With regards to the treatment of np.nan, currently np.argpartition does not explicitly handle np.nan and by the nature of its implementation, np.nan is treated like np.inf. This seems like something that can change in the future so I wrote that the treatment of np.nan is undefined. If this is not desirable, perhaps one can consider adding a np.nanargpartition and np.nanpartition variant.

Making this change also allows np.top_k to justly say that it follows np.argpartition's specifications.

seberg · 2024-06-17T09:48:44Z

+    is unstable, and hence the returned indices are not guaranteed
+    to be the earliest/latest occurrence of the element.
+
+    The treatment of ``np.nan`` in the input array is undefined.


Going to comment on the other issue also, but I don't think this is right. We sort them to the end and I am not aware of any plan to change that.

Are you sure? Doesn't look like it to me:

>>> x = np.array([1, 3, 2, np.nan, 4, 0, np.nan, 3.5]) >>> np.partition(x, 1) array([0. , 1. , 2. , nan, 4. , 3. , nan, 3.5]) >>> x[np.argpartition(x, 1)] array([0. , 1. , 2. , nan, 4. , 3. , nan, 3.5])

Without having looked at how it's implemented under the hood, it looks to me like there are no guarantees at least the unsorted elements to the right of the k-th element.

In addtion, this looks like a bug:

>>> np.partition(x, 5) array([0. , 1. , 2. , 3. , 3.5, 4. , nan, nan]) >>> np.partition(x, 6) array([0. , 1. , 2. , 3. , 3.5, 4. , nan, nan]) >>> np.partition(x, 7) # Incorrect result when `kth == x.size-1` array([1. , 3. , 2. , nan, 4. , 0. , 3.5, nan])

Yes, I am sure as your last examples proof: The NaNs are all after or at the partition position.

And the last one is perfectly correct, since partition(x, 7) means that only the last entry (position 7) has any defined order at all.

Maybe we're talking past each other - "after the partition position" isn't the same as "we sort them to the end". The former looks correct indeed.

So the documented behavior should be something like "nan are treated as the largest possible values, larger than inf even - they always end up to the right of the kth partition element (unless there are more nan's in the array than fit to the right of kth)."

Well, "unspecified" would mean that they could occur completely randomly in the result.
The fact that they are random within each half is already clearly specified by partition itself! So yes, I think saying that their sort-order sorts them to the end is sufficient (which should be defined in sort, but maybe is not very clearly defined there).

But, if you prefer you can also add the longer explanation, it might just become incorrect conceptually when thinking about sorting in reverse order.
(I.e. I still think that the sort-order being defined to sort them to the end is the better specification, although maybe not the easier explanation)

JuliaPoo · 2024-06-28T06:55:35Z

I've added a note about the sort order of np.nan (that its greater than np.inf). Sometimes np.partition will default to sorting which has sort order of np.nan defined so I was a bit suspicious it is indeed defined for np.partition, but from my own testing it does seem to be.

I've also added some tests to verify this sort order.

mattip · 2024-06-30T06:33:47Z

+
+    The sort order of ``np.nan`` is bigger than ``np.inf``.
+
    See `partition` for notes on the different selection algorithms.


I don't understand this comment. Aren't we in the partition function docstring? Does this docstring show up in a different function as well?

The sort order is defined in the docstring for np.sort, but since np.partition uses a different implementation, I thought it would be helpful to also define the sort order of np.partition.

I've removed notes regarding the sort order from np.argpartition and instead linked it to the docs for np.partition in the latest commit, this follows what np.sort and np.argsort did.

Co-authored-by: Matti Picus <[email protected]>

mattip · 2024-07-01T07:53:53Z

Thanks @JuliaPoo

DOC: Add clarifications np.argpartition

72b943c

github-actions Bot added the 04 - Documentation label Jun 17, 2024

seberg reviewed Jun 17, 2024

View reviewed changes

DOC+TST: Specify np.nan sort order in partition functions

2398d9c

mattip reviewed Jun 30, 2024

View reviewed changes

Comment thread numpy/_core/tests/test_multiarray.py Outdated

mattip reviewed Jun 30, 2024

View reviewed changes

Comment thread numpy/_core/fromnumeric.py Outdated

mattip reviewed Jun 30, 2024

View reviewed changes

JuliaPoo and others added 2 commits July 1, 2024 14:37

Update numpy/_core/tests/test_multiarray.py

05a3b82

Co-authored-by: Matti Picus <[email protected]>

DOC: Clean docs for argpartition

456c4a9

JuliaPoo mentioned this pull request Jul 1, 2024

WIP: top_k draft implementation #26666

Closed

mattip merged commit c21ac10 into numpy:main Jul 1, 2024

rgommers added this to the 2.1.0 release milestone Jul 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

DOC: Add clarifications np.argpartition#26716

DOC: Add clarifications np.argpartition#26716
mattip merged 4 commits into
numpy:mainfrom
JuliaPoo:cont-15128-topk-feat

JuliaPoo commented Jun 17, 2024 •

edited

Loading

Uh oh!

seberg Jun 17, 2024

Uh oh!

rgommers Jun 18, 2024

Uh oh!

seberg Jun 18, 2024

Uh oh!

rgommers Jun 18, 2024 •

edited

Loading

Uh oh!

seberg Jun 18, 2024

Uh oh!

JuliaPoo commented Jun 28, 2024

Uh oh!

Uh oh!

Uh oh!

mattip Jun 30, 2024

Uh oh!

JuliaPoo Jul 1, 2024 •

edited

Loading

Uh oh!

mattip commented Jul 1, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants


		The sort order of ``np.nan`` is bigger than ``np.inf``.

		See `partition` for notes on the different selection algorithms.

Uh oh!

Conversation

JuliaPoo commented Jun 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg Jun 17, 2024

Choose a reason for hiding this comment

Uh oh!

rgommers Jun 18, 2024

Choose a reason for hiding this comment

Uh oh!

seberg Jun 18, 2024

Choose a reason for hiding this comment

Uh oh!

rgommers Jun 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

seberg Jun 18, 2024

Choose a reason for hiding this comment

Uh oh!

JuliaPoo commented Jun 28, 2024

Uh oh!

Uh oh!

Uh oh!

mattip Jun 30, 2024

Choose a reason for hiding this comment

Uh oh!

JuliaPoo Jul 1, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mattip commented Jul 1, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

JuliaPoo commented Jun 17, 2024 •

edited

Loading

rgommers Jun 18, 2024 •

edited

Loading

JuliaPoo Jul 1, 2024 •

edited

Loading