Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix error in dereferencing kernel double pointers #3024

Merged
merged 2 commits into from
May 20, 2024

Conversation

viktormalik
Copy link
Contributor

@viktormalik viktormalik commented Feb 28, 2024

BPF verifier can detect safety of pointer accesses for BTF-based probes (k(ret)func, iter) and therefore it is not necessary to use bpf_probe_read_kernel inside such probes. This feature was enabled in bpftrace by commit c2c3ab9 ("Support identifying btf type").

Unfortunately, the verifier is not able to track BTF information for dereferences and array accesses on double pointers so, e.g. the following script fails to load:

# bpftrace -e 'kfunc:__module_get { print(args.module->trace_events[0]->flags);' } -v
INFO: node count: 13
Attaching 1 probe...

Error log:
reg type unsupported for arg#0 function kfunc_vmlinux___module_get#22
0: R1=ctx(off=0,imm=0) R10=fp0
0: (79) r1 = *(u64 *)(r1 +0)
func '__module_get' arg0 has btf_id 250 type STRUCT 'module'
1: R1_w=ptr_module(off=0,imm=0)
1: (79) r1 = *(u64 *)(r1 +1128)       ; R1_w=scalar()
2: (79) r1 = *(u64 *)(r1 +0)
R1 invalid mem access 'scalar'
processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

ERROR: Error loading program: kfunc:vmlinux:__module_get

A similar error happens when dereferencing the double pointer with *

# bpftrace -e 'kfunc:__module_get { print((*args.module->trace_events)->flags);' } -v

An analogous program fails to load even when written using libbpf.

We need to use bpf_probe_read_kernel for such cases so do not propagate the SizedType::is_btftype flag when observing a dereference or array access of a double pointer in semantic analyser.

This also fixes another bug related to array access - field analyser was not doing field resolution on array accesses on pointers to structs.

Overall, this resolves #3016.

Checklist
  • Language changes are updated in man/adoc/bpftrace.adoc
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice find! did you report this upstream? wonder if this should be supported by kernel

@viktormalik
Copy link
Contributor Author

nice find! did you report this upstream? wonder if this should be supported by kernel

Not yet, but I'll report it. I wanted to check first if it is a bug or a feature :)

@danobi
Copy link
Member

danobi commented Feb 28, 2024

Seems more like a lack of a feature to me

@viktormalik
Copy link
Contributor Author

Turns out the same problem exists when dereferencing via array accesses, i.e. args.trace_events[0]->flags. I'll try to fix that, too, in this PR.

@viktormalik viktormalik added the do-not-merge Changes are not ready to be merged into master yet label Feb 29, 2024
@viktormalik
Copy link
Contributor Author

viktormalik commented Feb 29, 2024

Fixed the array access on our side, too. I'll report/fix kernel upstream.

Originally, I wanted to separate the dereference fix from the array access one b/c the latter is one of two issues causing #3016. It turned out that it makes more sense to have them together so I added the other fix for #3016 to this PR. We now have two changelog entries with the same PR number but I think that it makes sense to merge everything together.

I'll also update the PR description.

@viktormalik viktormalik removed the do-not-merge Changes are not ready to be merged into master yet label Feb 29, 2024
@viktormalik viktormalik requested a review from danobi February 29, 2024 08:59
@ajor
Copy link
Member

ajor commented Feb 29, 2024

Why is it that we need to reset is_btftype on the type contained within the array, but for pointers only the outer type gets reset?

@viktormalik
Copy link
Contributor Author

Why is it that we need to reset is_btftype on the type contained within the array, but for pointers only the outer type gets reset?

It's actually the other way around - for pointers, we just reset the pointed type, for arrays we reset both the array and the element types.

The reason is that there's a slight difference in codegen for ptr->x and ptr[0]:

  • For ptr->x, we have two operations - dereference and field access. The load/probe_read is only done during the field access so the check for is_btftype is done on the type of *ptr.
  • For ptr[0], we have just a single operation so the check must be done on the type of ptr. We must also reset is_btftype on the element type to avoid further propagation.

Copy link
Member

@ajor ajor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks!

Is it possible to add a unit test for the FieldAnalyser change?

@viktormalik
Copy link
Contributor Author

Is it possible to add a unit test for the FieldAnalyser change?

Sure! I added a test (and confirmed that it failed prior to this PR).

tests/data/data_source.c Dismissed Show dismissed Hide dismissed
tests/data/data_source.c Fixed Show fixed Hide fixed
@viktormalik
Copy link
Contributor Author

While adding the test, I updated data_source.c for BTF and DWARF which changed one codegen test.

src/ast/passes/semantic_analyser.cpp Outdated Show resolved Hide resolved
@viktormalik
Copy link
Contributor Author

This slipped off my radar. I rebased onto the current master and will merge today afternoon if there are no objections.

@danobi
Copy link
Member

danobi commented May 1, 2024

Looks like a legit unit test failure

@viktormalik viktormalik force-pushed the btf-ptr-access-fix branch from 28f49bc to 580e791 Compare May 3, 2024 06:39
@viktormalik
Copy link
Contributor Author

Looks like a legit unit test failure

This is a tricky one. It doesn't reproduce on a local build (at least not on my setup), only in Nix. It's caused by this (seemingly trivial) change in data_source.c:

diff --git a/tests/data/data_source.c b/tests/data/data_source.c
index c6a7f713..eaecbc45 100644
--- a/tests/data/data_source.c
+++ b/tests/data/data_source.c
@@ -21,7 +21,7 @@ struct Foo3 {
 
 struct Foo3 foo3;
 
-struct Foo3 *func_1(int a, struct Foo1 *foo1, struct Foo2 *foo2)
+struct Foo3 *func_1(int a, struct Foo1 *foo1, struct Foo2 *foo2, struct Foo3 *foo3)
 {
   return 0;
 }
@@ -115,7 +115,7 @@ int main(void)
   struct bpf_iter__task_file iter_task_file;
   struct bpf_iter__task_vma iter_task_vma;
 
-  func_1(0, 0, 0);
+  func_1(0, 0, 0, 0);
 
   bpf_iter_task();
   bpf_iter_task_file();

With this, the field_analyser_dwarf.uprobe_args unit test fails because liblldb willl fail to get function parameters for func_3. I was able to narrow it down to dwarf_parser.cpp:80 where func_3 is found but, for some reason, it is not identified as a function and therefore fn will be null. Maybe @ttreyer could give some help on this?

@ttreyer
Copy link
Contributor

ttreyer commented May 8, 2024

With this, the field_analyser_dwarf.uprobe_args unit test fails because liblldb willl fail to get function parameters for func_3. I was able to narrow it down to dwarf_parser.cpp:80 where func_3 is found but, for some reason, it is not identified as a function and therefore fn will be null. Maybe @ttreyer could give some help on this?

I have reproduced the environment with the error and will have a look at it 🙂

@ttreyer
Copy link
Contributor

ttreyer commented May 8, 2024

This look like an issue with LLDB and/or GCC 12.3.0, but not with GCC 11.4.1. I'm investigating further...
In the meantime, as a workaround to unblock this PR, you can add __attribute__((noinline)) to func_3, which force GCC to emit enough DWARF for LLDB to correctly locate the function and its information.

@viktormalik viktormalik force-pushed the btf-ptr-access-fix branch from 580e791 to 143adae Compare May 9, 2024 16:40
tests/data/data_source.c Dismissed Show dismissed Hide dismissed
@viktormalik viktormalik force-pushed the btf-ptr-access-fix branch from 143adae to 817f6bb Compare May 9, 2024 17:00
@viktormalik
Copy link
Contributor Author

In the meantime, as a workaround to unblock this PR, you can add __attribute__((noinline)) to func_3, which force GCC to emit enough DWARF for LLDB to correctly locate the function and its information.

Works great, thanks! I added it to the PR, should be now ready to merge.

@viktormalik viktormalik force-pushed the btf-ptr-access-fix branch from 817f6bb to 3bd2f52 Compare May 20, 2024 09:08
@viktormalik viktormalik added this to the v0.21.0 milestone May 20, 2024
We did not resolve structs for array accesses on pointer types in
FieldAnalyser which causes failures when applying the array access
operator on pointers to structs:

    # bpftrace -e 'kfunc:vfs_read { print(args.file[0].f_inode); }'
      stdin:1:18-37: ERROR: Struct/union of type 'struct file' does not contain a field named 'f_inode'
      kfunc:vfs_read { print(args.file[0].f_inode); }
                       ~~~~~~~~~~~~~~~~~~~

This fixes the issue by adding the field resolution.
BPF verifier can detect safety of pointer accesses for BTF-based probes
(k(ret)func, iter) and therefore it is not necessary to use
bpf_probe_read_kernel inside such probes. This feature was enabled in
bpftrace by commit c2c3ab9 ("Support identifying btf type").

Unfortunately, the verifier is not able to track BTF information for
dereferences and array accesses on double pointers so, e.g. the
following script fails to load:

    # bpftrace -e 'kfunc:__module_get { print(args.module->trace_events[0]->flags);' } -v
    INFO: node count: 13
    Attaching 1 probe...

    Error log:
    reg type unsupported for arg#0 function kfunc_vmlinux___module_get#22
    0: R1=ctx(off=0,imm=0) R10=fp0
    0: (79) r1 = *(u64 *)(r1 +0)
    func '__module_get' arg0 has btf_id 250 type STRUCT 'module'
    1: R1_w=ptr_module(off=0,imm=0)
    1: (79) r1 = *(u64 *)(r1 +1128)       ; R1_w=scalar()
    2: (79) r1 = *(u64 *)(r1 +0)
    R1 invalid mem access 'scalar'
    processed 3 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

    ERROR: Error loading program: kfunc:vmlinux:__module_get

A similar error happens when dereferencing the double pointer with `*`

    # bpftrace -e 'kfunc:__module_get { print((*args.module->trace_events)->flags);' } -v

An analogous program fails to load even when written using libbpf.

We need to use bpf_probe_read_kernel for such cases so do not propagate
the SizedType::is_btftype flag when observing a dereference or array
access of a double pointer in semantic analyser.
@viktormalik viktormalik force-pushed the btf-ptr-access-fix branch from 3bd2f52 to 0d0ddc9 Compare May 20, 2024 10:19
@viktormalik viktormalik merged commit 67629c1 into bpftrace:master May 20, 2024
18 checks passed
@viktormalik viktormalik deleted the btf-ptr-access-fix branch May 22, 2024 11:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dereferencing pointes via [0] fails to load the program
4 participants