Interpolate counter values between adjacent tbuckets #139384
base: main
Conversation
…/rate-interpolate
ℹ️ Important: Docs version tagging

👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative: all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes.

When to use applies_to tags:
✅ At the page level to indicate which products/deployments the content applies to (mandatory)

What NOT to do:
❌ Don't remove or replace information that applies to an older version
Pinging @elastic/es-storage-engine (Team:StorageEngine)
.../generated-src/org/elasticsearch/compute/aggregation/RateLongGroupingAggregatorFunction.java
BlockFactory blockFactory = driverContext.blockFactory();
int positionCount = selected.getPositionCount();
try (var rates = blockFactory.newDoubleBlockBuilder(positionCount)) {
    Map<Integer, ReducedState> flushedStates = new HashMap<>(positionCount);
- What do we get in the `selected` int vector? Does it include all the ts buckets in this shard for a query?
- If that is the case, this `Map` can grow significantly. Should we use something else? `LongObjectPagedHashMap`?
I think this is the map for the final results, across all shards? I was also worried about this; `LongObjectPagedHashMap` looks like a good option. Will give it a try.
} else if (evalContext instanceof TimeSeriesGroupingAggregatorEvaluationContext tsContext) {
    rate = computeRate(flushedStates, group, tsContext, isRateOverTime, dateFactor);
} else {
    rate = computeRateWithoutExtrapolate(state, isRateOverTime);
In what cases would `computeRateWithoutExtrapolate(...)` execute?
I see some tests where the context is passed as a simple grouping context (e.g. RateTests), not sure why; I want to check with Nhat on this. But this is existing logic.
I believe this is for aggregations that are not time-series aggregations, e.g. rate over logs (?), but we can also check with Nhat.
 * Computes the rate for a given group by interpolating boundary values with adjacent groups,
 * or extrapolating values at the time bucket boundaries.
 */
private static double computeRate(
+1 I think this rate is much more useful than the current rate.
The current rate logic uses extrapolation within each tbucket, ignoring counter values before and after it. This leaves room for missing counter resets that happen across tbucket boundaries, produces more null buckets because it requires at least two data points in each bucket, and leads to rate inaccuracies since the extrapolation is somewhat arbitrary.
This can be improved by using the last data point of a time bucket and the first data point of the next one to interpolate the counter value at the boundary. This yields more accurate results, catches resets across tbuckets, and reduces null buckets, since only one data point per tbucket is needed to produce a rate.
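To make the idea concrete, here is a minimal standalone sketch of boundary interpolation (hypothetical helper names, not the actual Elasticsearch implementation): the counter value at each bucket edge is linearly interpolated between the last sample of one bucket and the first sample of the next, and the rate is the interpolated increase divided by the bucket duration. Counter-reset handling is omitted for brevity.

```java
import java.util.List;

// Sketch of counter rate via boundary interpolation. A bucket's rate uses
// its own samples plus the nearest samples of the adjacent buckets.
final class BoundaryInterpolation {

    /** A (timestampMillis, counterValue) sample. */
    record Sample(long ts, double value) {}

    /**
     * Linearly interpolate the counter value at time t between samples a and b,
     * assuming no counter reset occurred between them.
     */
    static double interpolateAt(long t, Sample a, Sample b) {
        double slope = (b.value - a.value) / (b.ts - a.ts);
        return a.value + slope * (t - a.ts);
    }

    /**
     * Per-second rate over [bucketStart, bucketEnd), interpolating the counter
     * value at each boundary using the neighboring buckets' closest samples.
     * Needs only one sample inside the bucket when both neighbors exist.
     */
    static double rate(List<Sample> prev, List<Sample> cur, List<Sample> next,
                       long bucketStart, long bucketEnd) {
        Sample first = cur.get(0);
        Sample last = cur.get(cur.size() - 1);
        double startValue = prev.isEmpty()
            ? first.value // no left neighbor: fall back to the first in-bucket sample
            : interpolateAt(bucketStart, prev.get(prev.size() - 1), first);
        double endValue = next.isEmpty()
            ? last.value // no right neighbor: fall back to the last in-bucket sample
            : interpolateAt(bucketEnd, last, next.get(0));
        double increase = endValue - startValue; // reset handling omitted
        return increase / ((bucketEnd - bucketStart) / 1000.0);
    }
}
```

With a counter growing steadily at 10/s and one sample per bucket, the interpolated rate for the middle bucket comes out as 10/s, where the current per-bucket extrapolation would return null for lack of a second in-bucket data point.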