December 17, 2013 posted by Antti Kantee
A cyclic trend in operating systems is moving things in and out of the
kernel for better performance. Currently, the pendulum is swinging
in the direction of userspace being the locus of high performance. The
anykernel architecture of NetBSD ensures that the same kernel drivers work in a
monolithic kernel, in userspace, and beyond. One of those driver stacks is
networking. In this article we assume that
the NetBSD networking stack is run outside of the monolithic kernel in
a rump kernel and survey
the open source interface layer options.
There are two sub-aspects to networking. The first facet is supporting
network protocols and suites such as IPv6, IPsec and MPLS. The second
facet is delivering packets to and from the protocol stack, commonly
referred to as the interface layer. While protocol support in a rump
kernel is unchanged from the networking stack running in a monolithic
NetBSD kernel, the interface layer supports a number of backends not
available in kernel mode.
DPDK
The Data Plane Development Kit (DPDK) is meant to be used for high-performance,
multiprocessor-aware networking. DPDK offers network access by attaching
to hardware and providing a hardware-independent API for sending and
receiving packets. The most common runtime environment for DPDK is
Linux userspace, where a UIO userspace driver framework kernel module
is used to enable access to PCI hardware. The NIC drivers
themselves are provided by DPDK and run in application processes.
For high performance, DPDK uses a run-to-completion scheduling model --
the same model is used by rump kernels. This scheduling model means
that NIC devices are accessed in polled mode without any interrupts
on the fast path. The only interrupts that are used by DPDK are for
slow-path operations such as notifications of link status change.
The rump kernel interface driver for DPDK is available here.
DPDK itself is described in significant detail in the documents available
from the Intel DPDK page.
netmap
Like DPDK, netmap offers user processes access to NIC hardware with
a high-performance userspace packet processing intent. Unlike DPDK,
netmap reuses NIC drivers from the host kernel and provides memory-mapped
buffer rings for accessing the device packet queues. In other words,
the device drivers still remain in the host kernel, but low-level and
low-overhead access to hardware is made available to userspace processes.
In addition to the memory-mapping of buffers, netmap uses other
performance optimization methods such as batch processing and buffer
reallocation, and can easily saturate a 10GigE link with minimum-size
frames. Another significant difference from DPDK is that netmap also
allows for a blocking mode of operation.
Netmap is coupled with a high-performance software virtual switch called
VALE. It can be used to interconnect networks between virtual machines
and processes such as rump kernels. Since VALE also uses the netmap API,
VALE switching works with the rump kernel driver for netmap.
The rump kernel interface driver for netmap is available here.
Multiple papers describing netmap and VALE are available from the netmap page.
tap
A tap device injects packets written into a device node, e.g.
/dev/tap, to a tap virtual network interface. Conversely,
packets received by the virtual tap network interface can be read from the device
node. The tap network interface can be bridged with other network
interfaces to provide further network access. While indirect access to
network hardware via the bridge is not maximally efficient, it is not
hideously slow either: a rump kernel backed by a tap device can saturate a
gigabit Ethernet. The advantage of the tap device is portability, as it
is widely available on Unix-type systems. Tap interfaces also virtualize
nicely, and most operating systems will allow unprivileged processes
to use a tap interface as long as the processes have the credentials to
access the respective device nodes.
The tap device was the original method for accessing the network with a rump kernel.
In fact, the in-kernel side of the rump kernel network driver was rather
short-sightedly named virt back in 2008. The virt driver and the
associated hypercalls are available in the NetBSD tree.
Fun fact: the tap driver is also the method for packet shovelling when running the NetBSD TCP/IP stack in the Linux kernel; the rationale
is provided in a comment here
and also by running wc -l.
Xen hypercalls
After a fashion, using Xen hypercalls is a variant of using the tap
device: a virtualized network resource is accessed using high-level
hypercalls. However, instead of accessing the network backend from a device
node, Xen hypercalls are used. The Xen driver is
limited to the Xen environment and is available here.
NetBSD PCI NIC drivers
The examples we have discussed so far use a high-level interface to
packet I/O functions. For example, to send a packet, the rump kernel
will issue a hypercall which essentially says "transmit these data",
and the network backend handles the request. When using NetBSD PCI
drivers, the hypercalls work at a low level, and deal with for example
reading/writing the PCI configuration space and mapping the device memory
space into the rump kernel. As a result, using NetBSD PCI device drivers
in a rump kernel works exactly like in a regular kernel: the PCI devices
are probed during rump kernel bootstrap, relevant drivers are attached,
and packet shovelling works by the drivers fiddling the relevant device
registers. The hypercall interfaces and necessary
kernel-side implementations are currently hosted in the repository
providing Xen support for rump kernels. Strictly speaking, there
is nothing specific to Xen in these bits, and they will most likely be
moved out of the Xen repository once PCI device driver support for other
planned platforms, such as Linux userspace, is completed. The hypercall
implementations, which are Xen-specific, are available here.
shmif
For testing networking, it is advantageous to have an interface which
can communicate with other networking stacks on the same host without
requiring elevated privileges, special kernel features or a priori setup
in the form of e.g. a daemon process. These requirements are met by
shmif, which uses file-backed shared memory as a bus for Ethernet frames.
Each interface attaches to a pathname, and interfaces attached to the
same pathname see the same traffic.
The shmif driver is available in
the NetBSD tree.
Conclusions
We presented a total of six open source network backends for networking
with rump kernels. These backends represent four different methodologies:
- DPDK and netmap provide high-performance network
hardware access using high-level hypercalls.
- TAP and Xen hypercall drivers provide access to
virtualized network resources using high-level hypercalls.
- NetBSD PCI drivers access hardware directly using
register-level device access to send and receive packets.
- shmif allows for unprivileged testing of the networking
stack without relying on any special kernel drivers or global
resources.
Choice is a good thing here, as the optimal backend ultimately depends
on the characteristics of the application.