Title: OpenTelemetry stats reports histograms incorrectly
Description:
Sending Envoy OpenTelemetry metrics to an OpenTelemetry collector and using the logging exporter, I observed a histogram whose Count did not match the sum of the bucket counts (see below). From the OTLP proto definition:
// bucket_counts is an optional field contains the count values of histogram
// for each bucket.
//
// The sum of the bucket_counts must equal the value in the count field.
//
// The number of elements in bucket_counts array must be by one greater than
// the number of elements in explicit_bounds array.
repeated fixed64 bucket_counts = 6;
The number of bucket_counts also appears to be the same as the number of explicit bounds, rather than one greater.
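The two invariants quoted from the proto can be checked mechanically. The following is a minimal Python sketch, not part of Envoy or the collector; the function name check_histogram_point is hypothetical, and the input values are copied from the collector output in this report.

```python
def check_histogram_point(count, explicit_bounds, bucket_counts):
    """Return a list of violated OTLP histogram invariants (empty if none)."""
    problems = []
    # Invariant 1: one more bucket than there are explicit bounds.
    if len(bucket_counts) != len(explicit_bounds) + 1:
        problems.append(
            f"expected {len(explicit_bounds) + 1} bucket_counts, got {len(bucket_counts)}"
        )
    # Invariant 2: the bucket counts add up to the count field.
    if sum(bucket_counts) != count:
        problems.append(
            f"sum(bucket_counts) is {sum(bucket_counts)}, count is {count}"
        )
    return problems


# Values copied from the collector output in this report.
bounds = [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000,
          10000, 30000, 60000, 300000, 600000, 1800000, 3600000]
reported = [0] * 8 + [1] * 11  # Buckets #0-#7 are 0, #8-#18 are 1

for problem in check_histogram_point(1, bounds, reported):
    print(problem)
```

For the data point in this report, both checks fail: there are 19 bucket counts where 20 are expected, and the bucket counts do not add up to the Count of 1.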
Reading through the implementation, it looks like we are using computedBuckets() (lines 65 to 70 in 2de016d), which appears to return, for each bound, the cumulative count of samples below that threshold.
computeDisjointBuckets() (lines 72 to 76 in 2de016d) seems like it potentially does what we are looking for.
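To illustrate the difference between cumulative and disjoint bucket counts, here is a minimal Python sketch. It is not Envoy's code: the function name cumulative_to_disjoint is hypothetical, and it assumes (based only on the output observed in this report) that the exported values are cumulative per-bound counts. It also appends the extra overflow bucket that OTLP requires for samples above the last bound.

```python
def cumulative_to_disjoint(cumulative, total_count):
    """Convert cumulative per-bound counts to OTLP-style disjoint bucket counts.

    Assumption: cumulative[i] is the number of samples that fall at or under
    bound i. The result has one extra trailing bucket for samples above the
    last bound.
    """
    disjoint = []
    previous = 0
    for running_total in cumulative:
        # Samples that landed in this bucket only, not in earlier ones.
        disjoint.append(running_total - previous)
        previous = running_total
    # Overflow bucket: samples above the last explicit bound.
    disjoint.append(total_count - previous)
    return disjoint


# Bucket counts as reported by the collector in this issue.
reported = [0] * 8 + [1] * 11
fixed = cumulative_to_disjoint(reported, total_count=1)
print(len(fixed), sum(fixed), fixed.index(1))  # prints "20 1 8"
```

With the reported values, the single sample (Sum: 375) ends up only in bucket #8, which covers the range up to the 500 bound, and the result has 20 entries that add up to the Count of 1.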
Collector logging exporter output:
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2023-11-22 00:54:12.184643877 +0000 UTC
Count: 1
Sum: 375.000000
ExplicitBounds #0: 0.500000
ExplicitBounds #1: 1.000000
ExplicitBounds #2: 5.000000
ExplicitBounds #3: 10.000000
ExplicitBounds #4: 25.000000
ExplicitBounds #5: 50.000000
ExplicitBounds #6: 100.000000
ExplicitBounds #7: 250.000000
ExplicitBounds #8: 500.000000
ExplicitBounds #9: 1000.000000
ExplicitBounds #10: 2500.000000
ExplicitBounds #11: 5000.000000
ExplicitBounds #12: 10000.000000
ExplicitBounds #13: 30000.000000
ExplicitBounds #14: 60000.000000
ExplicitBounds #15: 300000.000000
ExplicitBounds #16: 600000.000000
ExplicitBounds #17: 1800000.000000
ExplicitBounds #18: 3600000.000000
Buckets #0, Count: 0
Buckets #1, Count: 0
Buckets #2, Count: 0
Buckets #3, Count: 0
Buckets #4, Count: 0
Buckets #5, Count: 0
Buckets #6, Count: 0
Buckets #7, Count: 0
Buckets #8, Count: 1
Buckets #9, Count: 1
Buckets #10, Count: 1
Buckets #11, Count: 1
Buckets #12, Count: 1
Buckets #13, Count: 1
Buckets #14, Count: 1
Buckets #15, Count: 1
Buckets #16, Count: 1
Buckets #17, Count: 1
Buckets #18, Count: 1
The sum of the bucket counts is 11 (buckets #8 through #18), but the count is 1.
Repro steps:
Run Envoy configured with the OpenTelemetry stats sink and send to an OpenTelemetry collector with the logging exporter, with logLevel: debug to print out the OTLP.
cc @ohadvano