Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support symbolizing enums #3515

Merged
merged 14 commits into from
Oct 25, 2024
Merged

Support symbolizing enums #3515

merged 14 commits into from
Oct 25, 2024

Conversation

danobi
Copy link
Member

@danobi danobi commented Oct 13, 2024

This PR allows the user to symbolize enums in userspace by
using %s printf() specifier. This is a fully backwards compatible
change, as trying to use %s on an enum was a compile error
before.

Example:

# bpftrace -e 'tracepoint:skb:kfree_skb { $r = args->reason; printf("%d, %s\n", $r, $r); exit() }'
Attaching 1 probe...
2, SKB_DROP_REASON_NOT_SPECIFIED

Note this PR does not support casting integers to enums. While
this is technically possible (and described in the linked ticket), it's
probably not necessary as users can simply perform arithmetic
operations on an enum and get the same result. There's currently
a limitation but it's being fixed in #3517.

This is easier to review per-commit.

This closes #2772.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

src/printf.cpp Fixed Show fixed Hide fixed
src/printf.cpp Fixed Show fixed Hide fixed
@danobi danobi force-pushed the enum_symbolize branch 2 times, most recently from c6cf5ec to 51a8e3d Compare October 13, 2024 06:32
case Type::integer:
return PrintableInt(value_).print(buf, size, fmt, token, expected_type);
case Type::string:
return snprintf(buf, size, fmt, name_.c_str());

Check failure

Code scanning / CodeQL

Non-constant format string

The format string argument to [snprintf](1) has a source which cannot be verified to originate from a string literal.
@danobi danobi force-pushed the enum_symbolize branch 6 times, most recently from ad19ef8 to d1576d4 Compare October 13, 2024 22:51
@danobi danobi marked this pull request as ready for review October 13, 2024 23:05
Copy link
Member

@ajor ajor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going back to our discussions in the issue: I'm not sure that this PR solves enough of the problem to be useful, and by defining enums as integers, it risks locking us into behaviour that we won't be able to change to fix it in the future.

We need to be able to symbolicate enums everywhere that they can appear:

  • as a field in a struct: { .myenum = ABC }
  • in tuples: (SKB_CONSUMED, 123)
  • as map keys or values: @[SKB_CONSUMED] = ABC

@danobi
Copy link
Member Author

danobi commented Oct 14, 2024

Going back to our discussions in the issue: I'm not sure that this PR solves enough of the problem to be useful,

I think it's a good start. A few users have asked me for this already. Let me follow up and collect their feedback on this approach.

and by defining enums as integers, it risks locking us into behaviour that we won't be able to change to fix it in the future.

I think it might be too late for enums as they exist today. Plenty of people are using today's enums as ints. But the good news is I haven't seen it be a problem yet in the ~5 years they've been in: e4cb6ce

We need to be able to symbolicate enums everywhere that they can appear:

For sure, agreed. But I think we need to start somewhere and printf() is the highest impact for now. We'll need the metadata collected in commits 1-6 to keep building on top.

@danobi
Copy link
Member Author

danobi commented Oct 15, 2024

@ajor and I chatted offline. Decision is that this approach is ok. We'll decide how to extend enum symbolication to struct fields, tuples, and map key/value next (or even at some other later point). Gut feeling is map key is next most useful.

It won't be hard to do when the infra in this PR is landed. Just a question of how much backwards compat we want. For example, changing human readable output is probably fine. But we probably want to keep JSON output as-is so tooling doesn't silently break.

@ajor ajor self-requested a review October 17, 2024 15:16
Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few small things but otherwise LGTM. Good stuff!

src/format_string.cpp Show resolved Hide resolved
src/format_string.h Show resolved Hide resolved
src/format_string.h Outdated Show resolved Hide resolved
tests/runtime/call Show resolved Hide resolved
man/adoc/bpftrace.adoc Outdated Show resolved Hide resolved
@jordalgo
Copy link
Contributor

But we probably want to keep JSON output as-is so tooling doesn't silently break.

Would be nice if we could provide a way to print the symbolized enums in JSON output though - env variable?

@danobi
Copy link
Member Author

danobi commented Oct 24, 2024

But we probably want to keep JSON output as-is so tooling doesn't silently break.

Would be nice if we could provide a way to print the symbolized enums in JSON output though - env variable?

That was referring to changing the output for map keys/values, tuples, and structs. I think changing the human output is fine but JSON is dicey. As folks may be parsing it in prod already.

For regular printf() (added in this PR), JSON output changes "correctly":

$ sudo ./build/src/bpftrace -e 'BEGIN { printf("%s\n", SKB_DROP_REASON_NOT_SPECIFIED); exit() }' -f json
{"type": "attached_probes", "data": {"probes": 1}}
{"type": "printf", "data": "SKB_DROP_REASON_NOT_SPECIFIED\n"}

@jordalgo
Copy link
Contributor

I think changing the human output is fine but JSON is dicey. As folks may be parsing it in prod already.

I guess we'll see if we get requests from users to make this configurable.

src/bpftrace.h Outdated Show resolved Hide resolved
src/types.h Outdated Show resolved Hide resolved
Copy link
Contributor

@jordalgo jordalgo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One super nit and one question but otherwise LGTM.

Enum type definitions will appear (generated from BTF) in c_definitions
part of program if bpftrace detects an enum is being used. So when we
parse the c_definitions, we need to keep track of all the enum metadata
we see.

The newly added map is structured for lookup efficiency on the
upcoming symbolization codepath.
Enums in C, C++, and bpftrace are treated as regular integers in the
backend. Thus, rather than create a new type in bpftrace (which comes
with great complexity and maintenance overhead), encode enums as an
integer with a name. That is all an enum is, after all. This way all
existing codepaths which deal with integers also work with enums without
code change.

CreateEnum() and IsEnumTy() are added to work with the new pseudo-type.
danobi added 12 commits October 25, 2024 07:59
tracepoint and kfunc args are laid out in a custom struct definition and
then parsed by clang_parser. So we need to remember in the type
system which struct fields are enums so that if the user accesses the
enum field, bpftrace knows it can be symbolized.
We'll need this information when processing enum variants the user
directly referenced. For example: `print(MY_VARIANT_NAME)`.
Add tests to ensure we are parsing the data we expect in clang_parser.
Teach semantic analyser to track enum type information in the type
system. We need this type information to propagate to format string
validation and handling so we can both allow printing enums symbolically
as well as symbolize correctly at runtime.
Just for convenience when working with strings.
Previously, semantic analysis used this logic to validate that for
printf() the format string matched the provided arguments. It was also
the only user.

In the next commit we want to change formatting of a single given
argument based on the type specified in the format string. To do this
reliably and consistenly, we'll need to reuse this logic during runtime.

This commit factors out token extraction so it can be reused.
We'll need the token information for enum symbolization in order to
reliably choose whether to format the opaque format string with an
integer or string.

Note that token extraction is done on first use and cached. This
amortizes the performance penalty on the output fast path.
This commit allows the user to print enums with %s format string
specifier. At this point, the type system can determine if an expression
is an enum and which enum it is.

So at output time, the machinery can now do lookups to map from enum
type and value to enum variant.

This is super useful for users who want an efficient way to symbolize
enums. Previously if they wanted to do this they'd have to manually
symbolize in the script. And that's bad b/c it's both inconvenient and
slows down the kernel fast path.
Before, we were removing duplicated whitespace in all directives.
Somehow this actually worked. But the next test I want to add has
whitespace sensitive output. So fix the whitespace disturbance.
Add a smattering of semantic analyser and runtime tests to check that
enum symbolization indeed works.
Toss it into printf(). Hopefully it's not too obscure of a feature and
people can still find this.
@danobi danobi merged commit 22bce0d into bpftrace:master Oct 25, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support symbolizing enum values
3 participants