-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Improve packet lag calculation #3899
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
src/zeek.bif
Outdated
## For ``get_current_packet_ts()`` the same limitations as for | ||
## :zeek:see:`get_current_packet` apply. In particular, the return value | ||
## should be considered undefined when called within event handlers raised | ||
## via :zeek:see:`event`, :zeek:see:`schedule` or by recipient of Broker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yet, the user added is in check_stats()
- that's a scheduled event, and the baselines verify the output of 0.0.
After looking a bit: get_current_packet()
returns pcap_packet
including timestamp information that should allow to build the timestamp in script land without a new BiF? I forget if there was another reason to have a separate BiF?
type pcap_packet: record {
ts_sec: count; ##< The non-fractional part of the packet's timestamp (i.e., full seconds since the epoch).
ts_usec: count; ##< The fractional part of the packet's timestamp.
I think this comment could be relaxed, actually:
## The return value of ``get_current_packet()`` further should be considered
## undefined when called within event handlers raised via :zeek:see:`event`,
## :zeek:see:`schedule` or by recipient of Broker messages.
Packet::ToVal()
actually returns a zero-initialized structure if there wasn't a packet, I had missed that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternative proposal in DM was to add a BiF get_packet_lag()
that handles it all internally and possibly without caveats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summing up out-of-band communication:
yet, the user added is in check_stats() - that's a scheduled event
Correct. During normal operation, this shouldn't be an issue because the event will be triggered when we update network time based on a packet. However, #3947 will be an issue. But, that is a fundamental issue, which affects stats.log
as a whole. Eventually, packet lag is a metric that should be scraped using the reworked Telemetry::sync()
hook.
get_current_packet()
returnspcap_packet
including timestamp information that should allow to build the timestamp in script land without a new BiF?
Correct as well. However, the idea here is to avoid the unnecessary overhead introduced building the record that is returned by get_current_packet()
(see https://github.com/zeek/zeek/blob/master/src/iosource/Packet.cc#L156-L178).
Now we have two options:
- Add a
get_packet_lag()
BiF that handles it all internally. - Add a
get_current_packet_ts()
BiF to obtain only the timestamp and calculate the lag in script-land.
I went with option 2, introducing a more general BiF, because the packet timestamp seemed to be a value that's likely of interest in other contexts as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorrry for the delay here — do we actually document anywhere why & when Zeek updates network time to current time? It's mentioned in the NEWS, but without doing so, I think users will have a hard time understanding network_time()
vs get_current_packet_ts()
.
If you're on board with adding that, I'm (mildly) in favor of get_current_packet_ts()
, mainly because we already have get_current_packet_header()
(i.e., variations of retrieving packet info), and because it strikes me as potentially more useful to have in other settings than only having get_packet_lag()
.
I'd then just add get_packet_lag()
as a script-layer function — i.e., I see no harm in providing both. Hardly a lot of code. :-)
Using network_time to calculate packet lag will produce wrong results when there is no packet available but network time does not (yet) fall back to wall clock.
1a9aedc
to
c2b17f9
Compare
I've added |
Using network_time to calculate packet lag will produce wrong results when there is no packet available but network time does not (yet) fall back to wall clock. This introduces
get_current_packet_ts()
. For the BiF I thought about returning a record with an optional timestamp field that isn't set in case there is no packet. However, I wasn't sure it would be wort the effort.