-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support symbolizing enums #3515
Conversation
c6cf5ec
to
51a8e3d
Compare
case Type::integer: | ||
return PrintableInt(value_).print(buf, size, fmt, token, expected_type); | ||
case Type::string: | ||
return snprintf(buf, size, fmt, name_.c_str()); |
Check failure
Code scanning / CodeQL
Non-constant format string
ad19ef8
to
d1576d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going back to our discussions in the issue: I'm not sure that this PR solves enough of the problem to be useful, and by defining enums as integers, it risks locking us into behaviour that we won't be able to change to fix it in the future.
We need to be able to symbolicate enums everywhere that they can appear:
- as a field in a struct:
{ .myenum = ABC }
- in tuples:
(SKB_CONSUMED, 123)
- as map keys or values:
@[SKB_CONSUMED] = ABC
I think it's a good start. A few users have asked me for this already. Let me follow up and collect their feedback on this approach.
I think it might be too late for enums as they exist today. Plenty of people are using today's enums as ints. But the good news is I haven't seen it be a problem yet in the ~5 years they've been in: e4cb6ce
For sure, agreed. But I think we need to start somewhere and printf() is the highest impact for now. We'll need the metadata collected in commits 1-6 to keep building on top. |
d1576d4
to
ccaaa08
Compare
@ajor and I chatted offline. Decision is that this approach is ok. We'll decide how to extend enum symbolication to struct fields, tuples, and map key/value next (or even at some other later point). Gut feeling is map key is next most useful. It won't be hard to do when the infra in this PR is landed. Just a question of how much backwards compat we want. For example, changing human readable output is probably fine. But we probably want to keep JSON output as-is so tooling doesn't silently break. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few small things but otherwise LGTM. Good stuff!
Would be nice if we could provide a way to print the symbolized enums in JSON output though - env variable? |
That was referring to changing the output for map keys/values, tuples, and structs. I think changing the human output is fine but JSON is dicey. As folks may be parsing it in prod already. For regular printf() (added in this PR), JSON output changes "correctly":
|
ccaaa08
to
89d02c6
Compare
I guess we'll see if we get requests from users to make this configurable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One super nit and one question but otherwise LGTM.
Enum type definitions will appear (generated from BTF) in c_definitions part of program if bpftrace detects an enum is being used. So when we parse the c_definitions, we need to keep track of all the enum metadata we see. The newly added map is structured for lookup efficiency on the upcoming symbolization codepath.
Enums in C, C++, and bpftrace are treated as regular integers in the backend. Thus, rather than create a new type in bpftrace (which comes with great complexity and maintenance overhead), encode enums as an integer with a name. That is all an enum is, after all. This way all existing codepaths which deal with integers also work with enums without code change. CreateEnum() and IsEnumTy() are added to work with the new pseudo-type.
tracepoint and kfunc args are laid out in a custom struct definition and then parsed by clang_parser. So we need to remember in the type system which struct fields are enums so that if the user accesses the enum field, bpftrace knows it can be symbolized.
We'll need this information when processing enum variants the user directly referenced. For example: `print(MY_VARIANT_NAME)`.
Add tests to ensure we are parsing the data we expect in clang_parser.
Teach semantic analyser to track enum type information in the type system. We need this type information to propagate to format string validation and handling so we can both allow printing enums symbolically as well as symbolize correctly at runtime.
Just for convenience when working with strings.
Previously, semantic analysis used this logic to validate that for printf() the format string matched the provided arguments. It was also the only user. In the next commit we want to change formatting of a single given argument based on the type specified in the format string. To do this reliably and consistenly, we'll need to reuse this logic during runtime. This commit factors out token extraction so it can be reused.
We'll need the token information for enum symbolization in order to reliably choose whether to format the opaque format string with an integer or string. Note that token extraction is done on first use and cached. This amortizes the performance penalty on the output fast path.
This commit allows the user to print enums with %s format string specifier. At this point, the type system can determine if an expression is an enum and which enum it is. So at output time, the machinery can now do lookups to map from enum type and value to enum variant. This is super useful for users who want an efficient way to symbolize enums. Previously if they wanted to do this they'd have to manually symbolize in the script. And that's bad b/c it's both inconvenient and slows down the kernel fast path.
Before, we were removing duplicated whitespace in all directives. Somehow this actually worked. But the next test I want to add has whitespace sensitive output. So fix the whitespace disturbance.
Add a smattering of semantic analyser and runtime tests to check that enum symbolization indeed works.
Toss it into printf(). Hopefully it's not too obscure of a feature and people can still find this.
89d02c6
to
decef6d
Compare
This PR allows the user to symbolize enums in userspace by
using
%s
printf() specifier. This is a fully backwards compatiblechange, as trying to use
%s
on an enum was a compile errorbefore.
Example:
Note this PR does not support casting integers to enums. While
this is technically possible (and described in the linked ticket), it's
probably not necessary as users can simply perform arithmetic
operations on an enum and get the same result. There's currently
a limitation but it's being fixed in #3517.
This is easier to review per-commit.
This closes #2772.
Checklist
man/adoc/bpftrace.adoc
CHANGELOG.md