Revolutionizing Kernel Development: Testing With Rump


August 19, 2010 posted by Antti Kantee

There are numerous good tools which do an excellent job of testing kernel features and help to catch bugs. The more frequently they are run as part of the regular development cycle, the more bugs they expose before those bugs ship and are discovered by end users. However, before kernel tests can be executed, configuration is required. Examples of configuration steps include mounting the file system under test, setting up an NFS server, selecting a network interface and configuring an IP address, or setting up a test network. This makes adopting a kernel test suite unnecessarily complicated and reduces the likelihood that any single kernel developer runs all tests as part of the development process, thereby reducing the number of bugs which are caught early on.

This article explains how rump is the enabling technology for a safe, fast and run-anywhere kernel test suite which requires absolutely no configuration from the person running the tests. It is a logical continuation of the article about automated testing of NetBSD, which described the tools used to run the NetBSD test suite periodically. We will look at various kernel tests, such as those related to file systems, IP routing and kernel data structures, and point out the advantages of using rump compared to conventional testing approaches.

Features of rump

Figure 1: An example of a rump kernel hosted in a userspace process

While the rump kernel architecture entails more than can be explained in this space, the key thing to know for understanding this article is that rump enables running a fully virtual kernel in a userspace process. In a sense it is similar to usermode operating systems, but with two critical distinctions: the kernel is highly componentized and there is no separate userland. This means:

  • bootstrap times are very fast (on the order of 0.01s; see the sketch after this list)
  • tests can be written as self-contained packages which include all configuration steps necessary for the test
  • result gathering is automatic and easy even when the test leads to a kernel panic
  • testing uses only the minimum resources necessary
Leaving out irrelevant baggage such as the virtual file system code in networking tests dramatically reduces hardware resources required for testing, and can result in significant savings in test farm hardware costs, regardless of whether testing was previously done on raw hardware or in a full (para)virtual machine.
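
To make the first point concrete, the following minimal sketch (not taken from the article's test suite) bootstraps a rump kernel inside an ordinary process, times the bootstrap, and issues one system call against it. It assumes a NetBSD system with the rump libraries installed; link roughly with "cc boot.c -lrump -lrumpuser -lpthread" (the exact library set depends on which kernel components are used).

    #include <sys/time.h>

    #include <rump/rump.h>
    #include <rump/rump_syscalls.h>

    #include <err.h>
    #include <stdio.h>

    int
    main(void)
    {
        struct timeval start, end, diff;

        gettimeofday(&start, NULL);
        if (rump_init() != 0)           /* boot the in-process kernel */
            errx(1, "rump kernel bootstrap failed");
        gettimeofday(&end, NULL);
        timersub(&end, &start, &diff);

        /* calls prefixed with rump_sys_ are served by the rump kernel */
        printf("rump kernel booted in %lld.%06lds, pid inside it: %d\n",
            (long long)diff.tv_sec, (long)diff.tv_usec,
            (int)rump_sys_getpid());
        return 0;
    }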

Virtual

Rump provides a virtual instance of the kernel. This means both that it is in a separate namespace from the host and that, if the test requires it, an arbitrary number of kernels can be spawned by calling fork().

First, let us identify what an isolated virtual namespace means: for example, memory counters, received network packet counters, sysctl switch values, and buffer cache size limits are private to the rump kernel. The same privacy applies to other system resources such as the file system namespace, IP ports and driver instance identifiers (e.g. raid2). This gives the test author the freedom to use and modify resources without affecting the host, and also gives confidence that irrelevant events caused by other processes running on the same system do not lead to incorrect results. Tests are also protected from each other: it is possible to run multiple tests in parallel without risk of them affecting each other and causing hard-to-duplicate test failures.
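
The namespace isolation can be illustrated with a hedged sketch (assuming a NetBSD host and the rump file system faction, i.e. linking against something like -lrumpvfs -lrump -lrumpuser -lpthread). A directory created through the rump kernel exists only in the rump kernel's private file system namespace and is invisible to the host kernel:

    #include <rump/rump.h>
    #include <rump/rump_syscalls.h>

    #include <assert.h>
    #include <errno.h>
    #include <unistd.h>

    int
    main(void)
    {

        if (rump_init() != 0)
            return 1;

        /* created in the rump kernel's in-memory root file system ... */
        assert(rump_sys_mkdir("/only-in-rump", 0777) == 0);

        /*
         * ... so the host kernel knows nothing about it
         * (assuming no such directory exists on the host)
         */
        assert(rmdir("/only-in-rump") == -1 && errno == ENOENT);

        /* while inside the rump kernel it really is there */
        assert(rump_sys_rmdir("/only-in-rump") == 0);
        return 0;
    }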

Figure 2: NFS test setup

Second, virtualization can be used to start multiple instances of a kernel. This is important in cases like networking, where testing complex routing schemes requires numerous hosts to be available. Another example is testing NFS by providing the NFS server and client in separate kernels; this case is illustrated in Figure 2. The network interface depicted in the figure deserves special mention: as opposed to the normal approach of using the host's Ethernet tap driver and bridges for creating virtual networks, rump tests executed within one host use a special interface with a memory-mapped file as the interprocess bus. This means that as long as tests are executed in a private directory (and they are), networking is fully private instead of depending on a shared host resource.

As an example, we briefly discuss the test for NetBSD PR kern/43548. The bug, in essence, is a kernel panic in the IP forwarding routine when sending an ICMP TTL exceeded message. In a virtual kernel the value of net.inet.icmp.returndatabytes necessary to tickle the bug can be set without disturbing network traffic on the host. The second advantage of rump virtualization in this test is the fast bootstrapping of two kernels: one to send a packet and one to act as the router which catches and processes the packet in ip_forward(), thus triggering the bug.
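
The multi-kernel pattern behind tests like the two above can be sketched as follows. This is a hedged illustration: attaching both kernels to a shared memory-mapped bus file and the actual routing or NFS configuration are elided, with comments only marking where they would go. Forking before rump_init() gives each process its own completely independent kernel:

    #include <sys/wait.h>

    #include <rump/rump.h>

    #include <err.h>
    #include <signal.h>
    #include <unistd.h>

    int
    main(void)
    {
        pid_t child;

        switch ((child = fork())) {
        case -1:
            err(1, "fork");
        case 0:
            /* child: its own rump kernel, e.g. the router or NFS server */
            if (rump_init() != 0)
                errx(1, "server kernel bootstrap");
            /* ... configure interfaces/routes here and serve ... */
            pause();
            _exit(0);
        default:
            /* parent: another independent kernel, e.g. the client */
            if (rump_init() != 0)
                errx(1, "client kernel bootstrap");
            /* ... run the actual test traffic against the other kernel ... */
            kill(child, SIGTERM);
            (void)waitpid(child, NULL, 0);
        }
        return 0;
    }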

Note that for networking, other solutions for lightweight virtualization exist, such as Solaris Zones. However, they make poor candidates for the above test, since they use the same kernel as the host and are therefore unsuitable for tests which aim to cause a kernel crash. Acceptable alternatives would be fully virtual OSs such as ones running on top of Xen or QEMU, or a usermode OS, but those have very high setup and per-test bootstrap costs.

Crashproof

The rump kernel runs entirely as part of a normal userspace process. This means that no bug in the kernel under test can bring the test host down. One of the goals of the NetBSD test suite is to make it easy to run without special setups, and that requires that there is no increased risk of crashing the test host.

However, it is reasonable to desire to test experimental features which are not yet production stable. This not only makes it possible to monitor the progress of the subsystem, but also ensures that no regressions in stability happen due to other changes. For example, LFS (log-structured file system) is considered experimental. While it works, some operations carry a high risk of kernel panic. The NetBSD test suite contains numerous tests for LFS with associated Problem Report (PR) numbers where the test causes a kernel panic or otherwise fails. These tests can also be used as a guideline to repeat and fix LFS problems.

In addition to making it possible to test experimental kernel features, this also means a test case for a bug causing a kernel crash can be added and enabled immediately. Test suites which run against the host kernel must wait until the bug is fixed; otherwise other test suite users would suffer disruptive crashes during test suite execution.

Furthermore, for a limited number of tests the correct outcome is a kernel panic or reboot. For example, one of the test cases for the kernel watchdog expects a reboot in case the watchdog is not tickled within the timeout period. The automated rump test forks a new virtual kernel and gives a verdict based on the status reported by wait().
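
A hedged sketch of that pattern: the child process boots its own rump kernel and provokes the condition that should make the kernel reboot, while the parent derives the verdict from the wait status. The trigger_watchdog_timeout() helper is a hypothetical stand-in for the real test logic (arming the watchdog and then simply not tickling it).

    #include <sys/wait.h>

    #include <rump/rump.h>

    #include <stdlib.h>
    #include <unistd.h>

    static void
    trigger_watchdog_timeout(void)
    {
        /* hypothetical: arm the watchdog via rump system calls, then
         * sleep past the timeout without tickling it */
    }

    int
    main(void)
    {
        pid_t child;
        int status;

        switch ((child = fork())) {
        case -1:
            return 2;       /* test error */
        case 0:
            if (rump_init() != 0)
                _exit(2);
            trigger_watchdog_timeout();
            /* reached only if the rump kernel did NOT reboot as expected */
            _exit(1);
        default:
            if (waitpid(child, &status, 0) == -1)
                return 2;
            /* pass if the child never reached _exit(1), i.e. the expected
             * reboot terminated the process some other way */
            return (WIFEXITED(status) && WEXITSTATUS(status) == 1) ? 1 : 0;
        }
    }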

Being crashproof also benefits fuzzing and fault injection. These two types of tests, respectively, feed seemingly random input to the unit under test or cause a backend used by the unit under test to fail. As these tests are designed to exercise error paths, where undetected bugs most often lurk, it is not uncommon to see them cause a kernel crash.

Finally, tests for kernel high availability features such as CARP can be executed in an authentic fashion by simply killing off the master and observing if handover to the backup is done according to specification.

Complete

It is normal practice for kernel features to be unit tested in userspace by compiling the source module(s) with a special testing #define. This straightforward method has two problems. First, it must be planned for when the code is first written and can be difficult to retrofit as an afterthought. Second, it is limited by the inherent problems of unit testing: only a limited fraction of the full picture is exercised, and the test may miss subtle interactions between kernel components.

Rump, on the other hand, provides a complete software stack from the system call entry point to the kernel feature under test. Kernel code runs in a rump kernel without having to be written in a specific way. On i386 and amd64 it is even possible to use standard binary kernel modules as drivers in a rump kernel -- though kernel module binaries usually lack debugging information and are of limited use when a problem is discovered.

Debugging

If testing uncovers a problem, figuring out what went wrong is a prerequisite for fixing it. Let us look at how rump helps with debugging. For discussion, we divide issues into three broad categories:

  1. system call returns an incorrect value
  2. system call causes a deterministic kernel crash
  3. a set of timing-related events causes a kernel crash

1: An all too common error value returned from a system call is EINVAL. Sometimes the location of the error is extremely difficult to determine by educated guesses. For example, a problem with mounting small FFS file systems was reported recently. The easy way to locate the source of the error was to single-step the mount system call into the rump kernel.

2: Deterministic kernel crashes are nice to debug since they can be reliably repeated. In the normal scenario the choices are to debug the situation directly with the in-kernel debugger, attach an external debugger, or write a kernel core image to swap space, reboot, and examine the status post-mortem. The first option provides instant debugging, but works at the machine code level instead of the source level. The second option requires additional setup, and the third option introduces a delay between the crash and being able to debug it. In contrast, a rump kernel panic produces an application-level core dump which can be debugged immediately with a source level debugger.

3: Timing problems are generally the hardest to debug, since they depend on a seemingly random order of events which in the best case happens frequently and in the worst case extremely rarely. Usually the only way forward is to make educated guesses and put print calls into suitable places to gather more data and narrow down the suspects. However, the problem with calling the kernel print routine is that it runs inside the kernel, takes locks and generally influences kernel execution. Rump provides a print routine which executes on the host. It requires no special setup or parameters, just calling rumpuser_dprintf() instead of printf(). While the call still takes wall time, it does not use rump kernel resources and has generally been found better for tracking down timing problems. This, combined with the fast iteration cycle that rump offers, makes for a powerful tool in debugging failed test cases.
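
For illustration only, this is the shape of such a debug print dropped into kernel code running inside a rump kernel; the surrounding function and structure are hypothetical, and the header name assumes the installed rump headers:

    #include <rump/rumpuser.h>

    struct frob_state {             /* hypothetical kernel data structure */
        int fs_refcnt;
    };

    static void
    frob_debug_hook(struct frob_state *fs)
    {
        /*
         * Executes on the host: takes no rump kernel locks and does not
         * otherwise influence the kernel's execution, unlike an in-kernel
         * printf() would.
         */
        rumpuser_dprintf("frob: fs=%p refcnt=%d\n", fs, fs->fs_refcnt);
    }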

As a real-world example, a faulty invariant due to a lockless algorithm in the file descriptor code was discovered as part of unrelated testing, quickly narrowed down, isolated, and fixed. A test which reliably triggers the invariant panic within around 100000 iterations was also added to make sure the bug does not resurface. This issue is described in PR kern/43694.

Timing problems are sometimes exposed more readily on a multiprocessor setup where multiple threads enter the same areas of code simultaneously. Since the CPU count provided by rump is virtual, SMP can be simulated on uniprocessor machines. For example, the race condition described in PR kern/36681 triggers on a uniprocessor system when rump is configured to provide two virtual CPUs. As opposed to some other technologies, a rump kernel can use all of the host processors and does not experience exponential slowdown as the number of virtual CPUs increases.
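
A hedged sketch of simulating SMP on a uniprocessor host follows. It assumes the RUMP_NCPU environment variable documented for rump, which tells the bootstrapping kernel how many virtual CPUs to provide:

    #include <rump/rump.h>

    #include <stdlib.h>

    int
    main(void)
    {

        /* two virtual CPUs regardless of how many the host has */
        setenv("RUMP_NCPU", "2", 1);
        if (rump_init() != 0)
            return 1;

        /* ... start threads racing through the code path under test ... */
        return 0;
    }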

In summary, the advantages provided by rump for the cases mentioned at the beginning of this section are:

  1. seamless single-stepping into the kernel
  2. instant source level debugging
  3. external print routine, quick iteration, virtual SMP

Limitations

Although applicable to a major part of the NetBSD kernel, rump does not support all subsystems. This is by design, for reasons which are beyond the scope of this article. The subsystems beyond the reach of rump are the virtual memory subsystem (UVM) and the thread scheduler. Even so, it is important not to confuse the lack of the standard VM subsystem with an inability to stress the kernel under resource shortage. As hinted earlier, a resource shortage such as a lack of free memory can be trivially simulated with rump without affecting the test host or the test program.
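
For example, a memory shortage can be simulated along these lines. This sketch assumes the RUMP_MEMLIMIT environment variable, which caps how much memory the rump kernel allocates from the host before it starts reclaiming, and which may not be available in all NetBSD versions:

    #include <rump/rump.h>

    #include <stdlib.h>

    int
    main(void)
    {

        /* cap the rump kernel at roughly 1MB of memory */
        setenv("RUMP_MEMLIMIT", "1048576", 1);
        if (rump_init() != 0)
            return 1;

        /* ... exercise code paths that must cope with allocation failure ... */
        return 0;
    }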

Hardware testing is another area where tests cannot be fully automated with rump. This is not so much due to rump -- rump does support USB hardware drivers -- but more because of the nature of hardware: the test operator is required to plug in the hardware under test and supply its location to the test.

Conclusions

We presented a revolutionary method for automated kernel testing built upon the unique rump kernel architecture of NetBSD. Rump provides a crashproof and lightweight fully virtual kernel featuring a complete software stack and is second to none for kernel testing.

The result of using rump for kernel testing is a test suite which anyone can safely run anywhere, and which is not limited to kernel developers with highly specialized knowledge of how to set up the test environment. The author strongly believes that the superior test suite will solidify NetBSD's reputation as the most stable and bug-free operating system.

Further Resources

A standard NetBSD installation contains the test suite in /usr/tests. All of the tests described in this article are available in the NetBSD source tree under src/tests.

While some features mentioned in this article are available in the NetBSD 5.0 release, most are present only in the development branch of NetBSD and will be new in NetBSD 6.0 when it is eventually released.




Comments:

Very interesting article. One question - is it possible to run an entire NetBSD userspace instance on top of a RUMP kernel or is this limited to a single process ?

Posted by PaulC on August 20, 2010 at 07:02 AM UTC #

How does this compare to DragonFly BSD's "vkernel" feature? It's also a virtualized kernel that runs as a userland process on a normal kernel. In contrast to rump, you can attach a full userland to a vkernel and boot it into multi-user mode if desired. The DF developers use it extensively for debugging kernel facilities (such as the HAMMER file system, the scheduler etc.) which was the main motivation for developing this feature. They have it since 2006.

Posted by Thomas on August 20, 2010 at 09:38 AM UTC #

PaulC: it's not limited to one address space, but cross-process calls are currently done through a socket (even remote ones via TCP!). This is slow, but it is by design: allow some freedom, but optimize for running in one address space. If you want to run an entire virtual operating system, there are many well-known technologies available (such as Xen or qemu). This is not to say there is no synergy between rump and a usermode OS, just that they do not match 1:1 either in purpose or in implementation.

Posted by Antti Kantee on August 20, 2010 at 02:07 PM UTC #

Thomas: The Dragonfly vkernel is what the article refers to as a "usermode operating system", and you can find some comparisons for it. Already the manual page you linked contains a long list of steps necessary to configure the vkernel. Although the same setup may be applicable for a group of kernel tests (such as all HAMMER tests), it is unlikely to apply to all. From the perspective of someone who writes a lot of automated kernel tests, I do not find a full OS as convenient as rump, be it usermode or not. If you wish to challenge specific points the article makes against the usermode OS approach for testing, I'd be happy to respond.

Posted by Antti Kantee on August 20, 2010 at 02:18 PM UTC #

Thanks for the quick response. It's now a bit clearer to me. It wasn't my intention to challenge anything. I guess both approaches have advantages and disadvantages, depending on how you look at it. By the way, regarding the "long list of steps": Most of the steps are just required to create the file system image, which has to be done only once. Then, starting vkernels is quite a simple and quick thing.

Posted by Thomas on August 20, 2010 at 08:32 PM UTC #

No problem, I welcome challenges ;-). Yes, it is simple in trivial cases and manual testing, but what if you start pushing it? What if a test case needs 128 kernels? Do you set up 128 different images, or do you attempt some sharing scheme? How long does the OS boot until it's ready? To pick a probably generous number, let's say 3s. That doesn't sound long until you need to start 10000 of them during a test suite run: then it's over 8.5 hours just for overhead. Can you cache bootstrapped instances? Then which ones are available? What if some test corrupts your root file system? How do you /automatically/ extract the result from a kernel panic? The list goes on and on. I am not claiming a usermode OS is not an option for testing, but I am claiming it is not the optimal tool for most automated kernel tests (in a limited number of cases it is, as you say, better). You can always deal with issues, but energy is better spent on the tests instead of working around the wrong tool.

Posted by Antti Kantee on August 21, 2010 at 12:54 AM UTC #
