-
Notifications
You must be signed in to change notification settings - Fork 13.8k
[FLINK-38740][table] Introduce Welford's online algorithm for variance related functions to avoid catastrophic cancellation in naive algorithm #27325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
The first commit will be removed once it is merged. |
|
@flinkbot run azure |
1 similar comment
|
@flinkbot run azure |
…e related functions to avoid catastrophic cancellation in naive algorithm
|
The |
lincoln-lil
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dylanhz Thank you for fixing this! This pr is of high code quality with solid test coverage, and it effectively resolves the long-standing numerical stability issues in these functions.
I checked:
The core implementation in WelfordM2AggFunction (accumulate, merge, retract, and numerical safeguards in getValue).
The related changes in AggregateReduceFunctionsRule, with corresponding plan tests.
Overall, LGTM! Only minor comments.
...e/src/main/java/org/apache/flink/table/runtime/functions/aggregate/WelfordM2AggFunction.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public Double getValue(WelfordM2Accumulator acc) { | ||
| // Theoretically, acc.m2 should always be non-negative. | ||
| // But in practice it may be negative if records are out of order, which is different from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: IIUC, perhaps we should clarify here that out-of-order itself doesn't affect the result mathematically, the issue mainly stems from accumulated precision loss caused by floating-point arithmetic in combination with retractions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your input. Precision is irrelevant here actually. The negative value is simply a mathematical result of the algorithm when processing unmatched retractions.
Regarding the comment, I've updated it to be more explicit and focused on the current logic. What do you think of this version?
// Theoretically, acc.m2 should always be non-negative.
// But in practice it may become negative due to unmatched retractions.
// (e.g., [+I, 1], [+I, 2] followed by [-D, 3], which results in acc.m2 = -4)
// Therefore, return null in such cases to indicate an invalid result.
What is the purpose of the change
Fix catastrophic cancellation in naive algorithm of variance related functions.
Brief change log
Verifying this change
New test class:
MathAggFunctionITCase.Update some existing tests using variance related functions.
Does this pull request potentially affect one of the following parts:
@Public(Evolving): (yes)Documentation