NetBSD Blog

An Internet-Ready OS From Scratch in a Week — Rump Kernels on Bare Metal

August 08, 2014 posted by Antti Kantee

The most time-consuming part of operating system development is obtaining enough drivers to enable the OS to run real applications which interact with the real world. NetBSD's rump kernels reduce that time to almost zero, for example when developing special-purpose operating systems for the cloud and embedded IoT devices. This article describes an experiment in creating an OS by using a rump kernel for drivers. It does not go into full detail on the principles of rump kernels; interested readers can find those at rumpkernel.org. We start by defining the terms in the title:

OS: operating system, i.e. the overhead that enables applications to run
internet-ready: supports POSIX applications and talks TCP/IP
a week: 7 days, in this case the period between Wednesday night last week and Wednesday night this week
from scratch: began by writing the assembly instructions for the kernel entry point
rump kernel: partial kernel consisting of unmodified NetBSD kernel drivers
bare metal: what you get from the BIOS/firmware

Why would anyone want to write a new OS? If you look at our definition of "OS", you notice that you want to keep the OS as small as possible. Sometimes you might not care, e.g. in the case of a desktop PC, but when hardware resources are limited or your security concerns are high enough, you actually might care. For example, NetBSD itself is not able to run on systems without an MMU, but the OS described in this article does not use virtual memory at all, and yet it can run most of the same applications as NetBSD can. Another example: if you want to fine-tune the OS to suit your application, it is easier to tune a simple OS than a very complicated general-purpose OS. The motivation for this work in fact came from someone who was looking to provision applications as services on top of VMware, but found that no existing solution supported the system interfaces his applications needed without dragging an entire classic OS along for the ride.

Let's move on to discussing what an OS needs to support to be able to host, for example, a web server written for a regular OS such as Linux or the BSDs. The list gets quite long. You need a file system from which the web server reads the served pages, you need a TCP/IP stack to communicate with the clients, and you need a network interface driver to be able to send and receive packets. Furthermore, you need the often overlooked, yet surprisingly complicated, system call handlers. For example, opening a socket is not really very complicated to handle. Neither is reading and writing data. However, when you start piling things like fcntl(O_NONBLOCK) and poll() on top, things get trickier. By a rough estimate, if you run an httpd on NetBSD, approximately 100k lines of code from the kernel are used just to service the requests that the httpd makes. If you do the math (and bc did), there are 86400 seconds in a week. The OS we are discussing is able to run an off-the-shelf httpd, but I definitely did not write >1 line of code per second 24/7 during the past week.

Smoke and Mirrors, CGI Edition

The key to happiness is not to write 100k lines of code from scratch, nor to port it from another OS, as both are time-consuming and error-prone techniques, and error-proneness leads to even more consumption of time. Rump kernels come into the picture as the key to happiness and provide the necessary drivers.

As the old saying goes: "rump kernels do not an OS make", and we need the rest of the bits that make up the OS side of the software stack from somewhere. These bits need to make it seem like the drivers in a rump kernel are running inside the NetBSD kernel, hence "smoke and mirrors". What is surprising is how little code needs to exist between the drivers and the hardware: just a few hundred lines. More specifically, in the bare metal scenario we need support for:

low level machine dependent code
thread support and a scheduler
rump kernel hypercall layer
additionally: bundling the application into a bootable image

The figure below illustrates the rump kernel software stack. The arrows correspond to the above list (in reverse order). We go over the list starting from the top of the list (bottom of the figure).

Low level machine dependent code is what the OS uses to get the CPU and devices on talking terms with the rest of the OS. Before we can do anything useful, we need to bootstrap. Bootstrapping x86-32 is less work than one would expect, which incidentally is also why the OS runs only in 32-bit mode (adding 64-bit support would likely not take many hours of work, and patches are welcome). Thanks to the Multiboot specification, the bootstrap code is more or less just a question of setting the stack pointer and jumping to C code. In C code we need to parse the amount of physical memory available and initialize the console. Since NetBSD device drivers mainly use interrupts, we also need interrupt support for the drivers to function correctly. On x86, interrupt support means setting up the CPU's interrupt descriptor table and programming the interrupt controller. Since rump kernels do not support interrupts, we additionally need a small interrupt stub that transfers the interrupt request to a thread context, which then calls the rump kernel. In total, the machine dependent code is only a few hundred lines. The OSDev.org wiki contains a lot of information which was useful when hammering the hardware into shape. The other source of x86 hardware knowledge was the x86 support in NetBSD.
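As an illustration of the "parse the amount of physical memory" step, the following is a minimal sketch of reading the memory size from the information a Multiboot loader hands over. The magic value, the flag bit and the first three fields follow the Multiboot specification; the names struct mbinfo_head and upper_memory_bytes are made up for this example and are not taken from the actual source code of the OS.

```c
#include <stdint.h>

/* Value a Multiboot-compliant loader leaves in EAX at kernel entry. */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u
/* Bit 0 of flags: the mem_lower and mem_upper fields are valid. */
#define MULTIBOOT_INFO_MEMORY      0x00000001u

/*
 * Leading fields of the Multiboot information structure (the loader
 * passes its address in EBX).  The real structure continues with
 * further fields that this sketch does not need.
 */
struct mbinfo_head {
	uint32_t flags;
	uint32_t mem_lower;	/* KiB of memory starting at address 0 */
	uint32_t mem_upper;	/* KiB of memory starting at 1 MiB */
};

/*
 * Return the number of bytes of memory above 1 MiB, or 0 if the
 * loader was not Multiboot-compliant or did not report memory sizes.
 */
uint64_t upper_memory_bytes(uint32_t magic, const struct mbinfo_head *mbi)
{
	if (magic != MULTIBOOT_BOOTLOADER_MAGIC)
		return 0;
	if ((mbi->flags & MULTIBOOT_INFO_MEMORY) == 0)
		return 0;
	return (uint64_t)mbi->mem_upper * 1024;
}
```

On real hardware the two arguments would come from the registers saved by the assembly entry point before jumping to C; here they are ordinary parameters so that the logic can be exercised anywhere.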

Threads and scheduling might sound intimidating, but they are not. First, rump kernels can run on top of any kind of threads you throw at them, so we can just use the ones which are the simplest to implement: cooperative threads. Note that simple does not mean poorly performing; in fact, the predictability of cooperative threads makes them, at least in my opinion, likely to perform better than preemptive threads in cases where you are honing an OS for a single application. Second, I already had access to an implementation which served as the basis: Justin Cormack's work on userspace fibers, which in turn has its roots in the Xen MiniOS that we use for running rump kernels on the Xen hypervisor. That work could be repurposed as the threads+scheduler implementation, with the context switch code kindly borrowed from MiniOS.
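To show how little a cooperative scheduler needs, here is a minimal sketch of round-robin cooperative threads. It is not the MiniOS-derived code described above: for the sake of a self-contained example it swaps in the getcontext/makecontext/swapcontext routines provided by common libcs such as glibc, whereas a bare metal implementation would use its own context switch code. All names (cothread_create, cothread_yield, cothread_run, worker) are made up for this illustration.

```c
#include <stddef.h>
#include <ucontext.h>

#define MAXTHREADS 4
#define STACKSIZE  (64 * 1024)

struct cothread {
	ucontext_t ctx;
	void (*func)(void *);
	void *arg;
	int runnable;
	char stack[STACKSIZE];
};

static struct cothread threads[MAXTHREADS];
static ucontext_t sched_ctx;	/* context of the scheduler loop */
static int nthreads, current = -1;

/* Voluntarily give up the CPU; control returns to the scheduler loop. */
void cothread_yield(void)
{
	swapcontext(&threads[current].ctx, &sched_ctx);
}

/* Trampoline: run the thread body, then mark the thread as finished. */
static void cothread_entry(void)
{
	struct cothread *t = &threads[current];

	t->func(t->arg);
	t->runnable = 0;
	/* returning from here resumes uc_link, i.e. the scheduler */
}

/* Create a thread; returns its index, or -1 on failure. */
int cothread_create(void (*func)(void *), void *arg)
{
	struct cothread *t;

	if (nthreads == MAXTHREADS)
		return -1;
	t = &threads[nthreads];
	if (getcontext(&t->ctx) == -1)
		return -1;
	t->ctx.uc_stack.ss_sp = t->stack;
	t->ctx.uc_stack.ss_size = sizeof(t->stack);
	t->ctx.uc_link = &sched_ctx;
	t->func = func;
	t->arg = arg;
	t->runnable = 1;
	makecontext(&t->ctx, cothread_entry, 0);
	return nthreads++;
}

/* Round-robin over the threads until none of them is runnable. */
void cothread_run(void)
{
	int ran;

	do {
		ran = 0;
		for (current = 0; current < nthreads; current++) {
			if (!threads[current].runnable)
				continue;
			swapcontext(&sched_ctx, &threads[current].ctx);
			ran = 1;
		}
	} while (ran);
	current = -1;
}

/* Example thread body: record our tag twice, yielding after each step. */
static char trace[16];
static int tracelen;

void worker(void *arg)
{
	for (int i = 0; i < 2; i++) {
		trace[tracelen++] = *(const char *)arg;
		cothread_yield();
	}
}
```

Creating two workers tagged "A" and "B" and calling cothread_run() leaves "ABAB" in trace: each thread runs until it yields, and nothing ever interrupts it in between. That is the predictability referred to above.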

The rump kernel hypercall interface is what rump kernels themselves run on. While the implementation is platform-specific, our bare metal OS shares many of its characteristics with the Xen platform, which was already supported. Therefore, most of the Xen implementation applied more or less directly. One notable exception is that Xen paravirtualized devices are not available on bare metal, and therefore we access all I/O devices via the PCI bus.
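To give a rough idea of what a hypercall implementation looks like on a platform with no host OS underneath, here is a sketch of a memory allocation hypercall backed by a simple bump allocator over a static region. This is an assumption-laden illustration only: the name hyp_malloc, its signature and the fixed 1 MiB heap are hypothetical and are not the actual rump kernel hypercall interface, which is documented at rumpkernel.org.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define HEAPSIZE (1024 * 1024)	/* hypothetical fixed heap size */

static unsigned char heap[HEAPSIZE];
static size_t heap_used;

/*
 * Hypothetical allocation hypercall: hand out memory from a static
 * region, never freeing it.  align must be a power of two (>= 1).
 * Returns 0 and stores the pointer in *memp, or ENOMEM if the
 * request does not fit.
 */
int hyp_malloc(size_t len, size_t align, void **memp)
{
	uintptr_t base = (uintptr_t)heap;
	uintptr_t start;

	/* round the next free address up to the requested alignment */
	start = (base + heap_used + (align - 1)) & ~(uintptr_t)(align - 1);
	if (len > HEAPSIZE || start - base > HEAPSIZE - len)
		return ENOMEM;

	*memp = (void *)start;
	heap_used = (start - base) + len;
	return 0;
}
```

A real implementation would also need to release memory and would size its heap from the physical memory reported at boot, but the shape is the same: the rump kernel asks for a service, and a few lines of platform code provide it.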

All we need now is the application, a.k.a. "userspace". Support for application interfaces (POSIX syscalls, libc, etc.) already exists for rump kernels, so we just use what is available. The only remaining issue is building the bundle that we bootstrap. For that, we can repurpose Ian Jackson's app-tools, which were originally written for the rump kernel Xen platform. Using app-tools, we could build a bootable image containing thttpd simply by running the app-tools wrappers for ./configure and make. The image below illustrates part of the build output, along with booting the image in QEMU and testing that the httpd really works. QEMU, i.e. software-emulated bare metal, is used purely for convenience.

Conclusions

You probably noticed that the whole thing is just bolting a lot of working components together while writing a minimal amount of necessary glue. That is exactly the point: never write or port or hack what you can reuse without modification. Code reusability has always been the strength of NetBSD, and rump kernels add another dimension to that quality.

The source code for the OS discussed in this post is available under a 2-clause BSD license from repo.rumpkernel.org/rumpuser-baremetal.

Comments:

You either meant to write 604800 seconds, or 86400 minutes in the opening section of the article.

Posted by Kar on August 09, 2014 at 02:52 PM UTC #

Hello! Sorry, but I don't understand everything. Can I launch two "Rump Kernels on Bare Metal" on the one hardware with the same IP address for every instance? Or I need IP support inside "Low level machine dependent code" ? Thank you.

Posted by Ilia on August 12, 2014 at 08:17 AM UTC #

Ilia: In the above example, each rump kernel hosts a full TCP/IP stack, and configuring the same IP in multiple TCP/IP stacks (interfaces, really) will not work. However, there is something called "sockin" (see book.rumpkernel.org for details), the concept of which could be extended to provide what you want. You could boot n-1 rump kernels with sockin, one with a full TCP/IP stack, and share the same IP with all guests. The inherent limitation of that model, of course, is that you can use a TCP/UDP port only once per cluster.

Posted by Antti Kantee on August 12, 2014 at 12:44 PM UTC #

Kar: probably ...

Posted by Antti Kantee on August 12, 2014 at 12:45 PM UTC #

Hello, one more time! Thank you, for "sockin" I shall be reading this in any case. One little question: Were you doing some benchmark tests for this technique? For instance 10000 request for second utilize: thttpd on QEMU - XX% of host CPU. httpd on QEMU with full NetBSD image - YY% host CPU. Thank you.

Posted by Ilia on August 13, 2014 at 11:59 AM UTC #

"10000 requests per second"

Posted by Ilia on August 13, 2014 at 12:24 PM UTC #

Ilia: No, I did not run such benchmarks. I'd expect the default CPU utilization to be mostly the same, with the simple stack being easier to profile and tune. ... but only one way to find out ;)

Posted by Antti Kantee on August 18, 2014 at 01:39 PM UTC #

Antti, what about running automated test cases using ATS? Any idea of the effort included for that...?

Posted by ado on September 09, 2014 at 09:13 AM UTC #