Accessing Kernel struct Data in SystemTap
December 30, 2017
SystemTap (https://sourceware.org/systemtap) is an instrumentation framework which lets you write scripts to measure
and inspect code and data on a live Linux system. It is incredibly useful for gathering data and metrics
when diagnosing problems too complex to be understood using the standard kernel metrics available via
netlink endpoints. Scripts are built around the concept of probes, which attach to probe points, primarily
functions in the kernel or in userspace. Probes can then be used to extract a huge amount of insight: how many
times was this specific function called? Which kernel module is flushing the disk cache? This is the kind of data you
can’t get by pulling generic counters and hoping to infer what’s going on. Either probe points, which are
compiled into modern kernels, or a full debug symbol table (usually installed separately via your package manager) can be
used as a source of information for SystemTap to map system calls and memory locations back to something useful.
Using SystemTap we can gain very deep introspection into the kernel: how it works, the program flow, and the in-memory structures behind the specific behaviour you want to diagnose. You can also get access to the variables passed to kernel functions as they are called, along with their contents, which allows for all sorts of very powerful analysis. Let’s dig into that by examining dropped network packets. Dropped packets are typically difficult to diagnose, and examining them requires quite a lot of kernel introspection, so they make a good topic to step through when talking about breaking down kernel structures whilst tracing.
But first, some context on what exactly we are analysing when we talk about kernel introspection.
Network Structures in Linux
Most things in the Linux kernel are represented by purpose-built in-memory structures. These are often just referred to
as structs, as this is the low-level primitive used to build representative structures in memory in C, and they are
widely used to represent key kernel concepts.
We are going to refer to two key struct types in the Linux kernel. These are just examples, though; you can use the approach
I’m describing to get introspection into anything defined in the kernel and passed as an argument to a probed function.
Firstly, network devices. In the Linux kernel, every interface is described by a struct net_device (see
include/linux/netdevice.h if you are interested; I won’t reproduce it here because it’s hefty). Network drivers
will allocate and update a struct net_device for each interface, and these are often passed around as pointers when other
kernel structures need to keep track of a device they are interacting with. For example, when a device is added to a bond
or bridge, the child interfaces are referred to by their net_device structs. Essentially, each interface you see on the
system has a corresponding net_device struct floating around in kernel memory.
(Side note: this is tracked, somewhat loosely as a few kernel bugs in recent memory have shown, by a reference count
which has to fall to zero before a device can be freed, or not, as the case was…).
Secondly, packets. Every packet which passes through the Linux networking stack is stored in a structure called the SKB (“Socket Buffer”), defined in the kernel as struct sk_buff. It keeps track of a packet as it is received, handled, and transmitted, along with details such as the devices which handled it, its priority, and its TTL, all the way until it is handled and removed from the kernel’s socket buffer.
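To make the pointer relationship between these two structures concrete, here is a minimal user-space sketch in C. The toy_net_device and toy_sk_buff types are simplified stand-ins invented for illustration, not the kernel’s definitions (the real structs carry far more fields), and the helper functions are hypothetical too. Only the shape matters: a packet holds a pointer to a device, and the device holds its name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ 16 /* assumption: mirrors the kernel's 16-byte interface name limit */

/* Simplified stand-in for struct net_device: just the interface name. */
struct toy_net_device {
    char name[TOY_IFNAMSIZ];
};

/* Simplified stand-in for struct sk_buff: a few fields plus a device pointer. */
struct toy_sk_buff {
    struct toy_net_device *dev; /* device handling the packet; may be NULL */
    unsigned int len;           /* packet length in bytes */
    unsigned int priority;      /* queueing priority */
};

/* Copy an interface name into the device, always NUL-terminating it. */
void toy_dev_set_name(struct toy_net_device *dev, const char *name)
{
    assert(dev != NULL && name != NULL);
    strncpy(dev->name, name, TOY_IFNAMSIZ - 1);
    dev->name[TOY_IFNAMSIZ - 1] = '\0';
}

/* Follow skb -> dev -> name, guarding against a packet with no device.
 * Returns the number of characters snprintf wanted to write. */
int describe_packet(const struct toy_sk_buff *skb, char *out, size_t out_len)
{
    assert(skb != NULL && out != NULL);
    const char *ifname = skb->dev ? skb->dev->name : "<no device>";
    return snprintf(out, out_len, "dev=%s len=%u prio=%u",
                    ifname, skb->len, skb->priority);
}
```

Inside a SystemTap probe the same chain is followed with the $variable->field syntax, for example $skb->dev->name, with a helper such as kernel_string() to turn the name into a printable string. The NULL check matters there as well, since not every SKB has a device attached.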
Whenever a packet is dropped in the Linux kernel, it is the result of a module, or the net code in the kernel itself, deciding to drop it rather than send it on somewhere else. Sometimes, as with QoS or active/active bonding, dropping a packet is desirable or at least expected: maybe you’ve hit the bandwidth limit in the qdisc the traffic is assigned to, or maybe a packet comes in on the wrong leg of the bond, so the packet is dropped and the protocol compensates.
Other times, it is a move made out of panic, or because no other code path presents itself. Every time it happens, however, there is a
call to kfree_skb. kfree_skb is the function in the kernel which sends a packet into the void, never to be seen again.
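Because every drop funnels through that one function, a single hook there can tally drops by where they came from. The sketch below models that idea in plain user-space C; it is not kernel code. The record_drop helper, the drop_table type, and the string labels standing in for kernel call sites are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SITES 8 /* assumption: a small fixed table is enough for a demo */

/* One entry per place that decided to drop a packet. */
struct drop_site {
    const char *where;   /* label standing in for a kernel call site */
    unsigned long count; /* drops seen from that site so far */
};

struct drop_table {
    struct drop_site sites[MAX_SITES];
    size_t used;
};

/* Record one drop attributed to `where`.
 * Returns the updated count for that site, or 0 if the table is full. */
unsigned long record_drop(struct drop_table *t, const char *where)
{
    assert(t != NULL && where != NULL);

    /* Known site: bump its counter. */
    for (size_t i = 0; i < t->used; i++) {
        if (strcmp(t->sites[i].where, where) == 0)
            return ++t->sites[i].count;
    }

    /* New site: add it if there is room. */
    if (t->used == MAX_SITES)
        return 0;
    t->sites[t->used].where = where;
    t->sites[t->used].count = 1;
    t->used++;
    return 1;
}
```

Calling record_drop once per dropped packet leaves a table of sites and counts, which is exactly the kind of answer the interface counters cannot give you.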
The standard tooling tells us when packets are dropped, and when packets are not being transmitted/received, per interface,
by way of the interface statistics present in ip link. But this doesn’t help us understand the state of the kernel,
where in the kernel the decision to drop happened, the contents of a dropped packet, or the code path leading up to the
decision to drop it. These can be key to understanding why a packet is dropped, and whether you’re dealing with
a bug or just plain old configuration issues.
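For comparison, those per-interface counters are easy to read yourself, which also shows how little they say. This is a minimal sketch, assuming a Linux system which exposes the counters as files under /sys/class/net/<interface>/statistics/ (rx_dropped is one of them). The function names are mine, chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single unsigned decimal counter from a file.
 * Returns 0 on success, -1 if the file can't be opened or parsed. */
int read_counter_file(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int matched = fscanf(f, "%llu", out);
    fclose(f);
    return matched == 1 ? 0 : -1;
}

/* Read one statistic (e.g. "rx_dropped") for one interface (e.g. "eth0").
 * Assumption: the counters live under /sys/class/net/<iface>/statistics/. */
int read_iface_counter(const char *iface, const char *counter,
                       unsigned long long *out)
{
    assert(iface != NULL && counter != NULL);
    char path[256];
    int n = snprintf(path, sizeof path,
                     "/sys/class/net/%s/statistics/%s", iface, counter);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1; /* name too long for the buffer */
    return read_counter_file(path, out);
}
```

All you get back is a number per interface: no packet contents, no call site, no code path.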
Anatomy of a SystemTap Script
The easiest way to tie all of the above together is with an example script. The script below has been constructed to identify
where kfree_skb is being called in the kernel, and to use the passed structures to identify details about the packet
passed to the call. This allows us to identify useful information such as the interface and kernel module referenced when
a packet was dropped.
Et voilà. Comments inline. Any questions, feel free to get in touch!