BUG: fix negative samples generated by Wald distribution (#29609)#29926
Merged
charris merged 1 commit intonumpy:maintenance/2.3.xfrom Oct 10, 2025
Merged
BUG: fix negative samples generated by Wald distribution (#29609)#29926charris merged 1 commit intonumpy:maintenance/2.3.xfrom
charris merged 1 commit intonumpy:maintenance/2.3.xfrom
Conversation
In numpy.random, when generating Wald samples, if the mean and scale have a large discrepancy, the current implementation suffers from catastrophic cancellation. An explicit example of this has been added to test_generator_mt19937.py. The key line implicated in distributions.c: `X = mean + mu_2l * (Y - sqrt(4 * scale * Y + Y * Y));` which has numerical issues when Y >> scale. I have replaced this with the equivalent rationalized form: ``` d = Y + sqrt(Y) * sqrt(Y + 4 * scale); X = mean - (2 * mean * Y) / d; ``` And now the test passes.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Backport of #29603.
In
numpy.random, when generating Wald samples, if the mean and scale have a large discrepancy, the current implementation sometimes generates negative samples. An explicit example of this has been added totest_generator_mt19937.py,mean=1e8, scale=2.25(which is specific to the seed used in that test file).The key line implicated in
distributions.c:which has numerical issues when
Y >> scale.I have replaced this with the equivalent rationalized form:
This can be seen my noting the following algebraic identity:
As
mu_2l = mean / (2 * scale), we then have:Introduce a new variable for the denominator and factoring gives the final expressions:
And now the test passes, i.e., all samples are non-negative.
I have not made this change to the legacy implementation (nor the legacy test) as I figure they are deprecated, although it also suffers from the same issue.
Disclaimer: I found this failing property during a research project I am working on using LLM agents to write property-based tests (i.e., in Hypothesis). This property, that all Wald samples should be non-negative, was proposed and written "autonomously" by the agent. I spent several hours analyzing the failure and becoming acquainted with the NumPy codebase myself in order to verify it was a real issue, running it with the seed in the test suite, etc. I also pinpointed the cause of the problem and wrote this fix and the PR myself. I take full responsibility for it.
The Hypothesis test that caught the issue was simply:
Note that even after this PR, this property fails, albeit the minimal example is now
mean=280432326968266.0, scale=0.25(i.e., even more extreme values). But, at such values, it seems like we hit the limits of floating point computations.