
Unleashing the Kernel with eBPF


Summary

Liz Rice uses demos and examples to explain how eBPF works, and why the ability to customize the kernel’s behavior leads to powerful and efficient capabilities.

Bio

Liz Rice is Chief Open Source Officer with eBPF specialists Isovalent. She is the author of Container Security and Learning eBPF, both published by O'Reilly. She sits on the CNCF Governing Board and on the Board of OpenUK. She was Chair of the CNCF's Technical Oversight Committee from 2019 to 2022, and Co-Chair of KubeCon + CloudNativeCon in 2018.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Rice: My name is Liz Rice. I work for a company called Isovalent, which originally created the Cilium project. I'm also pretty involved with the Cloud Native Computing Foundation and with OpenUK. Cilium is based on this technology called eBPF. I'm going to try and talk a bit about what eBPF is, and since we're in a track about efficient programming languages, I'm going to talk about why eBPF is really great for performance and for building high performance infrastructure tooling, particularly in the cloud native world. eBPF sounds like an acronym. It used to stand for extended Berkeley Packet Filter, but you can forget that now, because it does so much more than packet filtering. It's much broader than that. Forget the acronym.

Think of it as a standalone term for a kernel technology that allows us to customize the way the kernel behaves. With that, we can build these incredibly high performance and low overhead tools that can help us with networking and security and observability. I'll try and touch on all those three things in this talk. eBPF allows us to make the kernel programmable. We can dynamically load programs into the kernel and change the way that the kernel is behaving. The kernel is the part of the operating system that interfaces with hardware.

As application developers, we're usually pretty protected from this concept. We have abstractions in programming languages that let us do things like open files and receive network messages. In practice, those abstractions use the system call interface to ask the kernel for help. The kernel is involved whenever we do anything that interfaces with hardware: anything that involves accessing a file, sending or receiving network messages, or allocating memory. The kernel is also coordinating all the different processes that might be running at the same time. With eBPF, we can take events that happen in the kernel and attach our eBPF programs to those events. Whenever that event happens, our program gets triggered and runs.

Let me do a little demo, a little hello world. Let's change this program to say Hello QCon. What I have here, this bit, is the actual eBPF program. In this example, it's written in C, and all it's going to do is write out a line of trace. Then I have some user space code, written in Python, around it that's going to take that code, compile it, load it into the kernel, and attach it to a system call called execve. That system call is used to run new executables, so whenever a new executable runs on this virtual machine, it's going to trigger my eBPF program to run.
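The exact demo code isn't reproduced in the transcript, but a minimal sketch of the eBPF half, written in the bcc framework's C dialect, looks something like this (the function name and the greeting are my assumptions). The Python side would compile it with bcc's BPF(text=...) and attach it with something like attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="hello").

```c
// Hedged sketch of the hello-world demo program (not the exact code shown
// on screen). Attached to the execve syscall, it emits one line of trace
// every time a new executable is run.
int hello(void *ctx) {
    // bcc rewrites this into a call that writes to the kernel trace pipe
    bpf_trace_printk("Hello QCon!\n");
    return 0;
}
```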

This is a virtual machine running in the cloud with VS Code Remote, so once it's compiled and loaded, we should see loads of events, because it's using Node, and there's tons of stuff happening. It takes a little while to compile that program and load it, but then you can see the lines of trace coming out. If I run something like ps, you can see that's 443352 as a process ID. If I go back here, here's an event triggered from a bash executable with a process ID that matches what we just saw. Every time that event occurs, we trigger this little, tiny, and useless eBPF program to run.

Packet of Death Mitigation

That's not very useful, but let's think about a useful case. This particular example is going to mitigate a vulnerability where the kernel can't handle a certain format of packet. There have been a couple of cases of this, where a particularly crafted network packet can crash the kernel. If you're an attacker, you craft a packet that will trigger the vulnerability, send it to somebody whose machine you want to bring down, and they have a very bad day.

Normally, the way to fix this would be a kernel patch that you'd have to roll out to all your machines. With eBPF, we can dynamically load a program that inspects the network packet, sees whether or not it matches this packet of death format, and just throws it away if it does. I'm going to pretend that ping packets are packets of death. Obviously, they're not, but we can pretend. I'm going to have a little ping eBPF program that's going to essentially look at a packet and see if it's a ping request.

At the moment, it's just going to send out a line of trace. I'm going to ping localhost, and I'm going to attach this program to the loopback interface here. You can see ping is currently just running happily, with that sequence number ticking up every second. When I load my ping program (again, it takes a little moment to compile and load into the kernel), we're now seeing a line of trace being generated for every ping packet that we receive, and those sequence numbers are continuing to tick up. Now I'm going to change the program to drop packets instead of passing them up the stack, and run that again. What we're going to see is we still have a line of trace for every packet, but we're dropping each one with that return code, XDP_DROP.
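For reference, here is a hedged reconstruction of what such an XDP program can look like; this is my sketch of the technique, not the exact demo code. It parses the Ethernet, IP, and ICMP headers and drops ICMP echo requests:

```c
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/icmp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_ping(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)        // bounds check required by the verifier
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;
    if (ip->protocol != IPPROTO_ICMP)
        return XDP_PASS;

    struct icmphdr *icmp = (void *)(ip + 1); // assumes no IP options, for brevity
    if ((void *)(icmp + 1) > data_end)
        return XDP_PASS;

    if (icmp->type == ICMP_ECHO) {
        bpf_printk("Dropping ping packet");
        return XDP_DROP;                     // switch to XDP_PASS to let pings through
    }
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```

Attaching it to the loopback interface with something like ip link set dev lo xdpgeneric obj ping.o sec xdp reproduces the demo's behavior.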

We're dropping the packet so the sequence numbers have stopped incrementing on that right-hand side. We've been able to dynamically change the way that this machine is handling ping packets. If I change this back to pass again, and run it one last time, we should start seeing those sequence numbers. It skipped a few, as we'd expect, because those packets got dropped, but now the sequence numbers are incrementing again.

Dynamically Changing Kernel Behavior

We've effectively shown this concept of changing the way that the networking stack on this machine behaves, and we've done a little example of dropping packets. We can change the way that the kernel behaves. Of course, another way we could have done this would be to actually change the kernel: we could write kernel code, compile it ourselves, and load it. That sounds really hard. If you want to change the kernel in a universal way, you've got to persuade Linus Torvalds and the entire kernel community that your change is a really good idea, get it into the upstream code, and familiarize yourself with the 30 million lines of code in the kernel to understand how to make that change.

Even if you get your change into the upstream kernel, it will take literally years for most production distributions to pick up that version of the kernel. Instead, with eBPF, we can just write the program and load it dynamically. We didn't even need to reboot the machine to get our change rolled out onto that machine. If, for example, you had a kernel vulnerability to mitigate, you could just load that eBPF program across your fleet of machines, and away you go. Subsequently, you might want to patch your machines when there is a kernel patch, but you don't have to rush to do it, because eBPF can protect you. You might be thinking, could I do this with kernel modules? I don't necessarily have to change the upstream kernel; maybe I can just write myself a kernel module. You could, but a lot of organizations avoid kernel modules because of the risk that a bug in the module can crash the kernel.

If the kernel crashes, it brings down your machine. It's not like an application crashing. With eBPF, the program, as it's loaded into the kernel, goes through a verification process, which analyzes all the possible paths that the program can take and makes sure that it can't crash the kernel. It's going to check that all memory access is safe. It's going to check that the program will run to completion. It's going to check that it's only accessing helper functions that it's allowed to access in the context it's operating in.

For example, if you're looking at a network packet, you're allowed to look at that network packet, but you won't have any process information. When we're thinking about the performance of this, it's important to note that after the program has been verified, it gets JIT compiled to native machine code instructions. The kernel is executing some instructions as it normally does, the event happens, and it calls the eBPF program, which means it's just executing some more native machine code instructions, but without any context switching. It's just kernel code that's getting called, like a function in the kernel.
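To make the memory-safety point concrete, here is a small illustration of my own (not from the talk) of the kind of check the verifier insists on before it will let a program read packet data:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int verified_read(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    // __u8 first = *(__u8 *)data;  // rejected at load time: unproven access

    if (data + 1 > data_end)        // this comparison is what convinces the
        return XDP_PASS;            // verifier the read below stays in bounds
    __u8 first = *(__u8 *)data;     // accepted: provably safe

    bpf_printk("first byte: %u", first);
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```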

Custom Behavior Without Transitions

That means we get to customize the behavior of our machine without any kernel to user space transitions. It's surprisingly hard to find any data about how expensive it is to transition between the kernel and user space, but anybody who's done performance-aware tuning will know that it can be really expensive. With something like that XDP program I just showed you, which dropped the ping packets, the processing happens as a network packet arrives on the machine, before the kernel has even started to process that packet; it's literally getting the packet as it arrives on the wire. In fact, some network cards support what's called hardware offload, where you're allowed to run the XDP program on the network card itself.

The program can run, and you can customize the behavior of the networking stack without even touching the networking stack, by doing it on the network card. We're talking about running custom code on network packets, and it doesn't just have to be XDP; there are all sorts of places within the networking stack where we can attach eBPF programs. The program is triggered immediately by the packet, so we don't have to wait for anything like polling from user space. We literally receive the packet at this event and start executing native machine code instructions. There's no polling, no CPU wasted checking to see if there's an event that we want to deal with.

The network packet is sitting in kernel memory, and we can process it using eBPF where it is, in kernel memory. There's no need to copy it into user space. Alan mentioned how memory copies can be really expensive; eBPF allows us to access kernel memory without copying it. Another interesting thing in eBPF is that we have data structures called maps, and some of those maps are designed to be per CPU, essentially per core. If you've got a core executing machine code instructions, and it has access to one of these per CPU maps, you haven't got any locking to worry about, because the kernel is the thing that manages locking.
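As a rough illustration (my sketch, assuming an XDP packet counter, not code from the talk), a per-CPU array map lets each core bump its own counter with no atomics and no lock contention; user space sums the per-core values when it reads the map:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx)
{
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        (*count)++;  // no atomic needed: this slot is private to the current CPU
    return XDP_PASS;
}

char LICENSE[] SEC("license") = "GPL";
```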

With a per-CPU map, if your core is just executing instructions, there's nothing else that can get in the way and access that per CPU eBPF map. This can be super-efficient for network processing. In fact, there are lots of examples of eBPF based load balancers. This is an example from a load balancer project called Polycube, where they've compared a couple of different network hooks: XDP, and something called TC, which is part of the network stack. They've also included in this graph Katran, which was one of the really early uses of eBPF; it's a load balancer created by Meta (it was Facebook at the time).

Interestingly, Facebook/Meta have been using eBPF to handle every single network packet. Whether you've been using Instagram or WhatsApp or Facebook itself, every single packet has been processed by eBPF since, I think, 2017. Another example of improved network performance: this is a company called Seznam, who replaced their IPVS based load balancer with an XDP load balancer. You can see this is where they did a performance test, then switched the IPVS one back on again; they got much better throughput with XDP processing.

Improved Performance for Container Networking

eBPF also allows us to improve performance specifically in the container networking world, because of the way that container networking works. In a container environment like Kubernetes (I'll use Kubernetes as the example), we're running our applications in containers in pods, and we usually give our pods their own network namespace. Essentially, we're isolating the networking in the pod from the networking on the host machine. The network namespaces in the host and in the pod are connected by a virtual Ethernet connection. A packet destined for that application arrives through the network interface on the host, traverses the whole networking stack in the host network namespace, crosses the virtual Ethernet connection into the pod namespace, and then goes through another round of network stack processing before it finally reaches the application.

With eBPF, one of the things we did in Cilium was to realize we can bypass a whole lot of that by looking at packets as they arrive, working out what pod they're destined for, and routing them straight into the pod network namespace. That results in significant improvements; not just in Cilium, but also in Calico's eBPF mode, you can see significant improvements using eBPF versus the legacy iptables based routing. Another performance graph here: with Cilium, we have XDP mode and TC mode forwarding, and you can see that the available CPU capacity is much higher when we're using XDP or TC eBPF than when we're using kube-proxy.

In fact, if you're running Kubernetes, you can use Cilium to replace kube-proxy altogether and get much higher throughput or free up some CPU. This is borne out by real-life examples. This is a Cilium user called Trendyol, who published a blog post about the amazing performance improvements that they saw, and particularly much less CPU usage. If we're not using the CPU for networking, we can either use it for something else or just be more cost effective.

eBPF for Performance Tracing

If we think about performance, one of the ways we can use eBPF is to measure performance. Brendan Gregg was one of the people who really popularized using eBPF for performance tracing. He was at Netflix at the time, dealing with massively scaled deployments and doing a lot of performance tuning. If you're dealing with thousands of machines, then even a small per-machine improvement in performance can massively reduce your costs or your latency, or bring all kinds of performance benefits. Brendan was really involved with a project called bcc, which created a huge array of BPF based performance tracing tools. When I first saw this diagram, I thought, what is this? It's frightening. You don't need to know what all the things are.

The point is that for every part of the system, there are performance measurement tools in the bcc project that you can use to measure performance and potentially spot bottlenecks and performance issues. Whether we're talking about the way the file system is working, the way our applications are working, OOM kills, you name it, there is a performance measurement tool written in eBPF to let you measure it. Brendan did a talk at eBPF Summit last year; his proposition is that you should be able to spot a performance issue on a Monday and fix it by Friday.

His real pitch here, I think, is that if you can spot performance issues and address them, it has a ton of benefits, including environmental benefits, from simply using less energy. One of the ways he really emphasized we can find performance bottlenecks is with flame graphs; eBPF is probably the most efficient way we have today to gather this kind of flame graph and spot the parts of our applications that are really causing us to burn CPU. I think a really important quote from him is that eBPF is essential for this kind of performance testing. He's someone who knows; he's done this at scale.

Programmable Kernel in Kubernetes

I want to come back to thinking about how eBPF is powerful in a container environment for reasons other than just networking. When we run in a cloud native environment (again, I'll assume we're using Kubernetes, but this applies however we're running our containers), it doesn't matter how many containers you're running on a given virtual machine or bare metal machine: there is only one kernel. Your containers are only doing the user space part; they're all sharing the kernel from the host. Whenever our applications in their containers are doing interesting things like networking or accessing files or allocating memory, or whenever Kubernetes is creating new containers for new pods, the kernel on that host is involved.

The kernel is aware of everything that's happening in all our applications, in all our pods. That means if we instrument the kernel using eBPF programs, that instrumentation can be aware of everything that's happening across all of our pods and all of our applications. We only need to instrument each node to get visibility and influence over what's happening in all of our applications, and we don't have to change or reconfigure the applications themselves.

As you saw with the ping example, the ping carried on running, but we could change the way the underlying kernel behaved without having to change the application. The application can continue running. That process can continue running. We change the kernel underneath it with eBPF. We don't have to change anything about the app. We don't have to reconfigure it. We don't even have to restart it. It is visible to, and affected by, eBPF.

This leads us to this great cartoon by Nathan. I think it's a really nice summary of why eBPF is a really great way of instrumenting containerized applications. The alternative is sidecars. With a sidecar, we take our pod and inject a container into that pod to do whatever instrumentation we want, whether it's tracing, like in this example, or security tooling, or service mesh. For quite a while now, we've had lots of tools that use the sidecar model to instrument applications, and you need a sidecar inside every pod. One of the things we did with cloud native, with containers, was to deliberately isolate our applications from each other. We put them into pods so that they don't have visibility over each other, which means you have to put a sidecar inside every pod if you want a user space application to have visibility and influence over that pod.

To get it there, we have to have some YAML. That YAML could be automated; maybe it's created during CI/CD, maybe it's created by a webhook, but something has to create the YAML to reconfigure the application, and we have to restart the pod to get the sidecar into each of our applications. Whereas if we use eBPF, we don't have to change the running pod; we can just instrument the kernel underneath, and we get visibility over all of the pods. That means if there is some malicious activity, whether it's running on the host or inside a pod, it's still going to be visible to the kernel, to those eBPF programs. If I am a malicious actor and I find some way of running a pod or running an executable on your host machine, I'm probably not going to turn on your security tooling on my malicious pod, but if you're using eBPF based tools, you will see my malicious activity. Avoiding sidecars also allows us to reduce resource usage.

If you have a sidecar in every pod, they probably have some configuration information. Think, for example, about a network proxy for a service mesh. Every sidecar has got proxy routing information, and you need a copy of that in every single pod. If we're allowed to share that information, if we only have one instance of that configuration per node rather than it sitting in every single pod, that reduces resource usage, and we don't have to keep updating it. Remember that pods are isolated from each other, so if we want to change the configuration inside sidecar-based tooling, we typically have to go and update it inside every single pod.

eBPF Enables Efficient Sidecarless Service Mesh

eBPF is one of the ways that we can enable service meshes that don't need sidecars. This can be really efficient. If you have a proxy inside a pod, then every single packet has to flow through that proxy. Remember, earlier I showed the diagram where we saw a network packet going through the networking stack in the host and then through the networking stack in the pod namespace. If we add a proxy in there, a packet has to go through the pod network namespace three times, and every single packet has to go through that flow. Whereas if we can use something like eBPF to route packets: A, we don't have a copy of the network stack for the proxy inside every pod, and, B, we can make decisions about whether a packet needs to go through the layer 7 proxy at all. We only need to go through a proxy where layer 7 termination is required.

For example, if we were doing things like mTLS termination. Not every packet needs to go through a layer 7 proxy. We've seen some really good performance improvements through avoiding sidecars. eBPF is not the only way to do this; Istio has their Ambient Mesh option. Cilium's eBPF based service mesh allows us to do this using eBPF, and we can see that just removing those sidecars can reduce costs by up to 90%, and it makes things a lot easier to manage.

eBPF Enables High Performance Security Tools

I mentioned a little bit about security tools and how we can use eBPF for high performance security. As you know, we're talking about being able to observe everything that involves hardware: files, networking, memory. The kernel is also managing processes and things like permissions and privileges. Those are really the things that are interesting for security purposes. If we want to spot malicious activity, it probably involves malicious access to a file, malicious network activity, some kind of memory overwrite, or privilege escalation, and all of those things involve the kernel. If we can spot those things using eBPF, that's a really great basis for runtime security tooling. If you think about the difference between just observing events and security observability, the difference is that we have a policy.

We might use eBPF to spot a set of events, and then we need to compare those events against a policy to decide whether they're legitimate or suspicious. Then if we think something looks suspicious or malicious, we do something with that event. We might log it. We might send an alert. We might have some metrics. We might want a user or an operator to come along, look at those events, and ask, why is this malicious activity happening? One example of an eBPF based tool for security observability is Falco. This is another CNCF project. It has two modes: a kernel module mode and an eBPF mode, either of which can detect security relevant events and pass them to user space, where they're compared against a policy.

The actual filtering of events happens in user space with Falco. We collect the events in the kernel, compare them against a policy using some kind of user space rules engine, and then decide what to do with that information: create an alert, log it, or send it to a SIEM.

I want to talk a little bit about Tetragon, which is a sub-project in Cilium. What we're doing differently from Falco is filtering those events, comparing them against the policy, inside the kernel, so we can massively reduce the number of events that actually get sent from the kernel to user space. Let's take a look at an example. I've got a few pods running in my namespace, far away. I'm running Tetragon in this top screen.

By default, Tetragon will always log process start and exit events. If I exec into my xwing pod and do something like list files, I get the output, but I've also got a couple of logs here for the process start and exit. I've also got a policy here called file monitoring. This is quite a complicated policy; essentially, it's monitoring access to a set of sensitive files and directories, and saying, if I spot this kind of access, I want some additional events emitted. If I go back to my output and cat the password file, we can see some read events. What I'm showing here is a very compact view of what Tetragon is gathering.

It actually gathers a huge amount of data: what the process was, what the container was, what the Kubernetes identities were, when it happened, when the executable that created it was started, what the process hierarchy is. There's absolutely tons of information that could then be sent to a SIEM to analyze later: why was somebody reading the password file? Being able to exercise that policy and filter events inside the kernel dramatically improves performance. On the left-hand side, we're basically doing a benchmark comparing Tetragon with another eBPF based solution, and we can see that we're able to use significantly less CPU to monitor access to sensitive files.
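To illustrate the idea of filtering in the kernel (this is my own simplified sketch, not Tetragon's implementation), an eBPF program can apply a policy check before an event is ever copied to user space, so non-matching events cost almost nothing:

```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct event {
    __u32 pid;
    char comm[16];
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 16);
} events SEC(".maps");

const volatile __u32 watched_uid = 0;  // toy policy: only report root (assumption)

SEC("tracepoint/syscalls/sys_enter_openat")
int watch_open(void *ctx)
{
    __u32 uid = bpf_get_current_uid_gid() & 0xFFFFFFFF;
    if (uid != watched_uid)
        return 0;  // filtered in the kernel: nothing crosses to user space

    struct event *e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;
    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(e->comm, sizeof(e->comm));
    bpf_ringbuf_submit(e, 0);  // only policy-relevant events are sent up
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```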

Runtime Security Enforcement in eBPF

That's all well and good, spotting these security events, but can we do high performance security enforcement with eBPF? Can we stop these malicious activities from actually happening? Traditionally, with kernel-based security tools, you might spot these events happening in the kernel and send the event information to user space. User space looks at it, compares it against a policy, realizes it looks suspicious, and in an enforcement mode, it would do something like kill the process.

Unfortunately, there is a window between spotting that the malicious event occurred and actually killing the process. That window might be enough for damage to be done, for malicious activity to have happened, maybe data to be exfiltrated. Certainly, if the thing you were looking at was malicious network activity, it already happened by the time you killed the process. With Tetragon, because we have a team of incredibly smart kernel engineers who know how to do this stuff, we can synchronously kill the process from within the kernel in enforcement mode. This is optional, but it means you can prevent that malicious activity from happening at all, while still getting this incredibly low overhead ability to monitor these events.
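The kernel helper that makes this kind of synchronous enforcement possible is bpf_send_signal(), available since Linux 5.3. A heavily simplified sketch of the idea (my illustration, not Tetragon's actual code, with the policy check stubbed out) looks like this:

```c
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

#define SIGKILL 9

SEC("kprobe/tcp_connect")
int enforce_connect(struct pt_regs *ctx)
{
    // A real policy would read the socket's destination address here and
    // return early if it falls inside the allowed set.
    bpf_send_signal(SIGKILL);  // kill the calling process, synchronously,
                               // before the connection proceeds
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```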

For example, I've got another security policy in this cluster, and this one's a little bit easier to read. It's looking at TCP connections. Those IP addresses are all inside my cluster, and the policy is basically saying, if a connection is made to an address outside of those, then we're going to kill the process. Let's bring my Tetragon logs back up again, and first of all try curling to the deathstar service. Let me show you the services: I have a deathstar service inside my cluster. If I curl from my xwing to the deathstar's API, then we should see that activity happen.

In fact, we're also seeing that that caused a read to the password file. We saw the curl activity happening, but because it was inside the cluster, that's fine; it's not considered malicious. When I try to curl to something outside the cluster, that's prevented. We didn't see any response, because we never got any response. We saw the connect attempt to an external address, and that attempt was killed. The process was killed synchronously within the attempt to make that TCP connect call, which is super powerful and awesome.

Wrap-Up, and Resources

I hope I've shown a variety of examples of how eBPF lets us do some incredibly powerful things, in a lot of cases dramatically improving performance and giving us a level of control and customization that just hasn't been possible before without eBPF. There are some books, two of which I wrote, that you can download from the Isovalent website if you want to learn more. There are also some nice labs if you want to play with some of these tools, at isovalent.com/labs. Of course, you can find out lots more about eBPF and the Cilium project from their respective websites. Everything I've shown you is open source.

Questions and Answers

Participant 1: As eBPF programs are in the kernel, if there are multiple user space programs that try to load an eBPF program, can this be a problem of interactions? I know there are some hooks, like XDP, that do not like being used by multiple programs at the same time.

Rice: It depends. It depends on the attachment type.

Participant 1: Because, for example, if in my Kubernetes cluster I'm using Cilium, maybe I want to debug a service, so I'm going to use bpftrace, and maybe that can create some interactions. Also, for example, I'm using Cilium, but I also want to use an operator to do something specific that creates services that also use eBPF for their own stuff. If they are not designed together, it may create bad interactions.

Rice: Really, this is about whether, if you have multiple user space programs or agents loading BPF programs, they can get in each other's way. Yes, they can, and it depends on the attachment point. It's also an area that the kernel community is looking at. The problem is that at some attachment points, you can't have more than one program attached. There is some work being looked at around prioritizing different programs, having some ordering. At the moment, in the places where multiple programs are allowed to be attached, it's either first in, first executed, or last in, first executed. Unless you know which order they were loaded in, you wouldn't know which order they'll be processed in.

It's something that has become more of an issue recently because of the popularity of eBPF and the fact that there are more eBPF based tools. I wouldn't say it's something that we see a lot in the field yet. Let's say people are using Cilium for networking; they'll be using Cilium based tools for diagnosing networking problems, and then maybe they're using bpftrace or Pixie for some other performance measurements, and they're attached to different hook points, so it's not really a problem. We have occasionally seen concerns where you can't have more than one thing attached to a particular point.

Participant 2: If I was to think about an example: right now, [inaudible 00:42:24] announced their main firewall solution is iptables, which is per node, but not per application. With eBPF, with cgroups and all, it became possible to make a per-application firewall. If such applications become more common, then they will completely use the network stack. That's also a concern.

Right now, we are using iptables, but in the long term, if eBPF solutions become more popular as a replacement for iptables rules, maybe for specific application rules, as is technically already done by Cilium network-wise: if there are eBPF based firewalls in the long term, can they get in the way of other eBPF?

Rice: The example here is, instead of using iptables rules for application specific firewalls, you might want eBPF based application specific firewall rules, which you can already do with Cilium. I think the answer is largely going to be: don't try to use multiple different tools that stand on top of each other, but use a tool that can give you the solution that you need. For example, if you were using Cilium for network policies per application, I'm not sure why you would have something else simultaneously trying to do network firewalling at the same attachment points, but maybe there's a use case.

Participant 2: eBPF makes messing with the kernel so easy. Is there a danger that by dropping ping packets, you're breaking something else on the node? Would you recommend that developers write eBPF code as they write their application code?

Rice: How do you know, if you're doing something like dropping network packets, that that's a good thing? Could you not be inadvertently messing something else up?

Yes, you could. Also, importantly, the eBPF verifier can tell you that your program is safe to run and won't crash the kernel; it cannot tell you whether the intent is legitimate or malicious. For example, I could be dropping network packets because I think they're packets of death and I think it would be a good idea to drop them. Or maybe I'm doing it for denial of service, or to prevent denial of service attacks. I could also be a malicious actor dropping packets just to mess with your system, and there is no way for eBPF as a platform to know the difference. It can't tell you what the intent is. Importantly, you need to treat eBPF as privileged. It is privileged.

There is no greater privilege than being able to change the way your kernel behaves. Don't randomly run eBPF programs that you downloaded off the internet from some rando. You do want to know where your eBPF programs came from. There's quite a lot of work going on around validating the eBPF programs that you run to make sure that they have come from a legitimate source. You might think, can't you just sign them? Just signing them is not as simple as you might think.

Participant 3: In the previous slides, we saw how we run it in Python. Can we say that in order to launch the program, we need admin privileges on the Python script itself to change the code in the kernel?

Rice: eBPF programs are essentially classed into two groups: there are network related programs, and the others are sometimes called perf or tracing programs. From my perspective, it's networking and everything else. You need the CAP_NET_ADMIN capability in the Linux kernel to attach network related eBPF programs, and CAP_SYS_ADMIN for everything else. Plus, there's a CAP_BPF capability that was introduced probably long enough ago now that everybody has a kernel that supports those capabilities. You get those by being root. If I had attempted to run those examples as a non-root user, without those capabilities, then, yes, it would have just rejected loading the program.

Participant 1: Say I'm a malicious actor and I create an eBPF rootkit and just have it there. What are the ways that other eBPF programs can observe that? Or does Cilium have it on the roadmap to observe which programs are loaded? Because in my experience, it's super hard to get insight into what's running.

Rice: What if I have created an eBPF based rootkit and I've loaded something malicious into the kernel, can I have BPF tooling to observe other BPF programs? Yes and no. Yes, there are tools like bpftool, which is a command line tool that you can use to see what BPF programs are loaded and what maps are loaded, and get all kinds of interesting information about them. But I believe it is possible to obfuscate the results, because if you can influence the syscalls that return the results that give you this information, then, yes, it becomes an arms race.

If you've got there first with your eBPF program loaded, first thing after boot, and it manages to obfuscate itself or hide itself from tools like bpftool, then, yes. I think it again comes down to trust: making sure that the boot image you're running is something that you trust, and the continued arms race that no doubt we will have with interesting security attack vectors. eBPF enables a lot of really cool things, and it also enables some new attack vectors. Like any other bit of software, people will find some bad things to do with it.


Recorded at:

Dec 19, 2024
