Skip to content

Improve packet lag calculation #3899

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 10, 2024
Merged

Conversation

J-Gras
Copy link
Contributor

@J-Gras J-Gras commented Aug 26, 2024

Using network_time to calculate packet lag will produce wrong results when there is no packet available but network time does not (yet) fall back to wall clock. This introduces get_current_packet_ts(). For the BiF I thought about returning a record with an optional timestamp field that isn't set in case there is no packet. However, I wasn't sure it would be wort the effort.

src/zeek.bif Outdated
## For ``get_current_packet_ts()`` the same limitations as for
## :zeek:see:`get_current_packet` apply. In particular, the return value
## should be considered undefined when called within event handlers raised
## via :zeek:see:`event`, :zeek:see:`schedule` or by recipient of Broker
Copy link
Contributor

@awelzel awelzel Aug 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yet, the user added is in check_stats() - that's a scheduled event, and the baselines verify the output of 0.0.


After looking a bit: get_current_packet() returns pcap_packet including timestamp information that should allow to build the timestamp in script land without a new BiF? I forget if there was another reason to have a separate BiF?

type pcap_packet: record {
        ts_sec: count;  ##< The non-fractional part of the packet's timestamp (i.e., full seconds since the epoch).
        ts_usec: count; ##< The fractional part of the packet's timestamp.

I think this comment could be relaxed, actually:

 ##      The return value of ``get_current_packet()`` further should be considered
 ##      undefined when called within event handlers raised via :zeek:see:`event`,
 ##      :zeek:see:`schedule` or by recipient of Broker messages.

Packet::ToVal() actually returns a zero-initialized structure if there wasn't a packet, I had missed that.

Copy link
Contributor

@awelzel awelzel Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative proposal in DM was to add a BiF get_packet_lag() that handles it all internally and possibly without caveats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summing up out-of-band communication:

yet, the user added is in check_stats() - that's a scheduled event

Correct. During normal operation, this shouldn't be an issue because the event will be triggered when we update network time based on a packet. However, #3947 will be an issue. But, that is a fundamental issue, which affects stats.log as a whole. Eventually, packet lag is a metric that should be scraped using the reworked Telemetry::sync() hook.

get_current_packet() returns pcap_packet including timestamp information that should allow to build the timestamp in script land without a new BiF?

Correct as well. However, the idea here is to avoid the unnecessary overhead introduced building the record that is returned by get_current_packet() (see https://github.com/zeek/zeek/blob/master/src/iosource/Packet.cc#L156-L178).
Now we have two options:

  1. Add a get_packet_lag() BiF that handles it all internally.
  2. Add a get_current_packet_ts() BiF to obtain only the timestamp and calculate the lag in script-land.

I went with option 2, introducing a more general BiF, because the packet timestamp seemed to be a value that's likely of interest in other contexts as well.

@ckreibich ckreibich self-requested a review October 30, 2024 16:09
Copy link
Member

@ckreibich ckreibich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorrry for the delay here — do we actually document anywhere why & when Zeek updates network time to current time? It's mentioned in the NEWS, but without doing so, I think users will have a hard time understanding network_time() vs get_current_packet_ts().

If you're on board with adding that, I'm (mildly) in favor of get_current_packet_ts(), mainly because we already have get_current_packet_header() (i.e., variations of retrieving packet info), and because it strikes me as potentially more useful to have in other settings than only having get_packet_lag().

I'd then just add get_packet_lag() as a script-layer function — i.e., I see no harm in providing both. Hardly a lot of code. :-)

Using network_time to calculate packet lag will produce wrong results
when there is no packet available but network time does not (yet) fall
back to wall clock.
@J-Gras J-Gras force-pushed the topic/jgras/fix-packet-lag branch from 1a9aedc to c2b17f9 Compare December 9, 2024 18:20
@J-Gras J-Gras requested a review from ckreibich December 10, 2024 11:33
@J-Gras
Copy link
Contributor Author

J-Gras commented Dec 10, 2024

I've added get_packet_lag() as a script-layer function and provided some more details in the doc-strings of the new functions 🙂

@awelzel awelzel merged commit 77465a9 into zeek:master Dec 10, 2024
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants