-
Notifications
You must be signed in to change notification settings - Fork 503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix at_least_one
test when group_by_columns
is configured
#922
Fix at_least_one
test when group_by_columns
is configured
#922
Conversation
@dbeatty10, would you mind to review when you have a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A crucial step before merging this would be the addition of test(s) as described in #922 (comment)
at_least_one
test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for raising this PR @katieclaiborne-duet 🤩
I used #934 to quickly check that the new test you added correctly surfaces the error case discovered in #838. Looks good ✅
It would be nice to keep the optimization and also fix the bug 🤞
When group_by_columns
is an empty list, is there a way to keep the optimizations in #776 but avoid the bug reported in #838?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome @katieclaiborne-duet 🤩
Thank you for creating a relevant test for this bug and implementing a targeted fix 🧠
at_least_one
testat_least_one
test when group_by_columns
is configured
Thank you so much for the fix, @katieclaiborne-duet! 🙌 |
resolves #838
Problem
The pruning logic introduced in #776 disrupted the grouping behavior in the
at_least_one
test.Specifically, the initial
pruned_rows
CTE includes alimit 1
statement, which selects a single record where the target column is not null.As a result, the test passes inappropriately when the
group_by_columns
parameter is configured, as long as the target column has at least one not-null value.Essentially, the
group_by_columns
parameter currently has no effect on the test. It evaluates target column values as a single group, regardless of whether or not the parameter is present.Solution
Restore the
at_least_one
test to its prior state.Since the goal of #776 was to improve the performance of the test, I also considered a set-logic approach to evaluating a column at the group level:
In BigQuery, however, this version of the test took twice as long as the simple reverted version.
Checklist