ENH: medcouple n log n (see #9570)#9571
Open
hmustafamail wants to merge 9 commits intostatsmodels:mainfrom
Open
ENH: medcouple n log n (see #9570)#9571hmustafamail wants to merge 9 commits intostatsmodels:mainfrom
hmustafamail wants to merge 9 commits intostatsmodels:mainfrom
Conversation
- added fast algorithm - added legacy multiplexing option - revised tests - suggested draft release note
also STY: more style fixes
also STY: multi line warning again
bashtage
requested changes
May 27, 2025
Member
bashtage
left a comment
There was a problem hiding this comment.
Overall looking pretty good.
statsmodels/stats/stattools.py
Outdated
| ----- | ||
| This is a helper function for the O(N log N) medcouple algorithm. | ||
| """ | ||
| AW = sorted(zip(A, W), key=lambda x: x[0]) |
Member
There was a problem hiding this comment.
Should these be ndarray at this point, in which case we could avoid the python zip and sorted, which would normally be much slower for even modest sized data?
statsmodels/stats/stattools.py
Outdated
| mid = (beg + end) // 2 | ||
| trial = AW[mid][0] | ||
|
|
||
| wleft = sum(w for a, w in AW if a < trial) |
Member
There was a problem hiding this comment.
These might also be slow since using Python objects.
statsmodels/stats/stattools.py
Outdated
| wleft = sum(w for a, w in AW if a < trial) | ||
| wright = sum(w for a, w in AW if a >= trial) | ||
|
|
||
| if 2 * wleft > wtot: |
Member
There was a problem hiding this comment.
Do we need some nan protection somwhere? Usually need to ensure that all values are non-nan which using boolean comp.
ENH: other numpy library usage. STY: docstring improvements.
Author
|
Thank you for your code critique. I made the following main changes:
Please let me know what else may be needed. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I am following up on issue #9570 (link to issue) by creating this pull request, which implements Medcouple in O(N log N) time.
Legacy functionality with O(N**2) time is preserved with the use of a flag
use_fast. This flag defaults to the new behavior.Please let me know if you have any questions, or if you need anything else.
Overview:
added fast algorithm
added legacy multiplexing option
revised tests (passed locally)
suggested draft release note
closes ENH: Medcouple in O(N Log N) time #9570
tests added / passed.
code/documentation is well formatted.
properly formatted commit message. See
NumPy's guide.