<h1>NetBSD Blog</h1>
<h2><a href="https://blog.netbsd.org/tnf/entry/project_report_add_support_for">Project Report: Add support for chdir(2) in posix_spawn(3)</a></h2>
<p>By martin, 2021-11-22 (updated 2021-11-23).</p>
<p>Piyush Sachdeva finished the "add chdir support to posix_spawn(3)" project and reports on his work and experience.
His code is already in -current and will be part of NetBSD 10.</p>
<p>Originally submitted as a GSoC proposal, this project unfortunately did not make it into GSoC due to the low number of slots allocated.</p>
<p>The NetBSD Foundation nevertheless decided to run and fund the project.</p><h3>This post was written by Piyush Sachdeva:</h3>
<h2>Abstract</h2>
<p>The primary goal of the project was to extend posix_spawn(3) to support chdir(2)
for the newly created child process. Two functions were to be implemented,
namely posix_spawn_file_actions_addchdir() and
posix_spawn_file_actions_addfchdir(), supporting chdir(2) and
fchdir(2) respectively.
posix_spawn() is a POSIX standard function
responsible for creating and executing new child processes.</p>
<h2>Implementation</h2>
<p>The original code can be found at <a href="https://github.com/cosmologistPiyush/posix_spawn-chdir/tree/trunk">my github tree</a>.</p>
<p>The implementation plan was made with the guidance of both my
mentors, Martin Husemann and Joerg Sonnenberger. It was divided into
three phases, each corresponding to the part of the NetBSD code base to
be touched:</p>
<h3>User-Land</h3>
<p>The following actions were performed in user-land to set things up for the
kernel-space changes.</p>
<ul>
<li>Add another member to the posix_spawn_file_actions_t struct, i.e. a union
which holds the path to chdir to.</li>
<li>Implement the two functions posix_spawn_file_actions_addchdir()
and posix_spawn_file_actions_addfchdir(). These functions would:
<ol>
<li>allocate memory for another posix_spawn_file_actions_t object in the
posix_spawn_file_actions_t array.</li>
<li>take the path/file descriptor from the user as an argument and make the
relative field of the newly allocated file actions object, point to it.</li>
</ol>
</li>
<li>The final step was to add the prototypes for the two new functions to the
`src/include/spawn.h' header file.
</li>
</ul>
<p>
Once the aforementioned changes were made, the only thing left to do was to make the
kernel support these two new functions.</p>
<h3>Kernel-Space</h3>
<p>The following actions were performed inside the kernel space.</p>
<ul>
<li>The three functions in the `src/sys/kern/kern_exec.c' file which correspond to the
posix_spawn file actions were edited:
<ul>
<li>posix_spawn_fa_alloc() was adjusted to make sure that the path passed to
posix_spawn_file_actions_addchdir() gets copied from the user-land
to the kernel-space.
</li>
<li>Similarly posix_spawn_fa_free() was adjusted to make sure that the
memory allocated in case of FAE_CHDIR gets freed as well.
</li>
<li>Finally, two new cases, FAE_CHDIR and FAE_FCHDIR, were added to
handle_posix_spawn_file_actions(). Each case calls one of the two
newly created functions (discussed in the next point),
do_sys_chdir() and do_sys_fchdir() respectively.
</li>
</ul>
<p>Note: At the time of code integration, a helper function was written by
Christos Zoulas. This function reduces the amount of repeated code
in both posix_spawn_fa_free() and posix_spawn_fa_alloc().</p>
</li>
<li>Two new functions, similar to the already present sys_chdir() and sys_fchdir() in
`src/sys/vfs_syscalls.c', were created. Namely, do_sys_chdir() and do_sys_fchdir()
were written with two specific thoughts in mind:
<ul>
<li>By default sys_chdir() and sys_fchdir() took syscallargs as a parameter.
The purpose of the new functions was to replace this with
const char * and an int type parameter respectively.</li>
<li>The do_sys_chdir() also replaced UIO_USERSPACE with
UIO_SYSSPACE. This was done because the chdir path passed to this
function already resided in the Kernel-space due to the change made in
posix_spawn_fa_alloc().</li>
</ul></li>
<li>Finally, the prototypes for the newly written functions were added to the
`src/sys/sys/vfs_syscalls.h' file and this file was also included in the
'sys/kern/kern_exec.c'.</li>
</ul>
<p>Note: In addition to the user-land and kernel-space changes above, a few tweaks were also made to
`src/sys/compat/netbsd32/netbsd32.h' and `netbsd32_execve.c'. This was required to help COMPAT_NETBSD32
deal with the new file actions member. These changes were made at the time of integration
by Martin Husemann.</p>
<p>With most of the new functionality in place, all that remained was testing and documentation.</p>
<h3>Testing & Documentation</h3>
<ul>
<li>A total of ten new test cases have been added to the
`src/tests/lib/libc/gen/posix_spawn/t_spawn.c' file.</li>
<li>Three utility functions were also used to aid in testing. Of the three,
one was newly written and two existing functions (filesize()
and empty_outfile()) from `t_fileactions.c' were reused.
To share the two existing functions between both files, i.e. `t_spawn.c'
and `t_fileactions.c', a new header and C file were created, namely `fa_spawn_utils.h' and
`fa_spawn_utils.c'.
The bodies of both functions were then moved from
`t_fileactions.c' to `fa_spawn_utils.c' and their prototypes were added to the
corresponding header file.</li>
<li>The general approach taken in all test cases was to make
posix_spawn() execute ``/bin/pwd'' and write the output to a file,
then read the file back and do a string comparison. The third utility function,
check_success(), was written for just this purpose.</li>
<li>The ten test cases cover the following scenarios:
<ul>
<li>Absolute path test - for both chdir and fchdir.</li>
<li>Relative path test - for both chdir and fchdir.</li>
<li>Trying to open a file instead of directory - for both chdir and fchdir.</li>
<li>Invalid path/file descriptor (fd=-1) - for both chdir and fchdir.</li>
<li>Trying to open a directory without access permissions for chdir.</li>
<li>Opening a closed file descriptor for fchdir.</li>
</ul></li>
<li>The first 8 test cases had a lot of repetitive code. Therefore, at the time of integration,
another function, spawn_chdir(), was created.
This function contains a large chunk of the common code and does the heavy lifting
for those first 8 test cases.</li>
</ul>
<h4>Documentation:</h4>
<p>A complete man page was written which explains both
posix_spawn_file_actions_addchdir() and posix_spawn_file_actions_addfchdir() in great detail.
The content of the manual page is based on the POSIX documentation provided to us by Robert Elz.
</p>
<h2>Issues</h2>
<p>Since the project was well planned from the beginning, it resulted in few issues.</p>
<ul>
<li>The user-land was the most straightforward part of the project and I had no
trouble sailing through it.</li>
<li>Kernel space was where things got a bit complicated, as I had to add functionality
to pre-existing functions.</li>
<li>I was completely new to using atf(7) and groff(1). Therefore, it took me some time
to understand the respective man pages and become comfortable with the testing and
documentation parts.</li>
</ul>
<p>
Most of the issues I faced were logistical. As this was my first
kernel project, I was new to building from source, virtual machines, and tools like SSH.
Luckily, I had great help from my mentors and the entire NetBSD community.</p>
<h2>Thanks</h2>
<p>I would like to express my heartfelt gratitude to The NetBSD Foundation for
giving me this opportunity and sponsoring the Project.
This project would not have been possible without the constant support and
encouragement of both my mentors Martin Husemann and Joerg Sonnenberger.
My gratitude to Christos Zoulas who worked on the crucial part of integrating the code.
A special mention to all of the other esteemed NetBSD developers,
who have helped me navigate through the thick and thin of this project and
have answered even my most trivial questions.</p>
<h2><a href="https://blog.netbsd.org/tnf/entry/wifi_project_status_update">wifi project status update</a></h2>
<p>By martin, 2021-08-26.</p>
<p>About a year ago the <a href="/tnf/entry/wifi_renewal_restarted">wifi renewal project</a> was restarted. A lot has happened, but the high hopes of a quick breakthrough and a fast merge to mainline did not come true.</p>
<p>Here is where we are today, what needs to be done and how things are planned to move on...</p><p>After initial work on the <a href="https://wiki.NetBSD.org/Wifi_renewal_on_hg/">wifi renewal branch</a> went quite fast and smooth, things have slowed down a bit in the last few months.</p>
<p>Most of the slow down was due to me not being available for this type of work for unexpectedly long times - a problem that should be fixed now.</p>
<p>However, there were other obstacles and unexpected issues on the way:</p>
<ul>
<li>bpf taps are handled differently in the new stack, and some slightly obscure side conditions of this had been overlooked in the initial conversion. To make everything work, changes to our bpf framework were needed (and landed in -current some time ago now).</li>
<li>Many wifi drivers seem to be in a, let's say, slightly fragile state. When testing the random collection of wifi hardware I acquired during this project against -current, many drivers did not work on first try, and I was often able to provoke kernel panics quickly.
This is not a happy base to start converting drivers from.</li>
<li>After the great success of usbnet(9) for USB ethernet drivers, core and I agreed to do the same for wifi - the result is called usbwifi(9) and makes conversion of USB drivers a lot easier than that of other wifi drivers. See <a href="https://wiki.NetBSD.org/tutorials/converting_usb_drivers_to_usbwifi__40__9__41__/">the conversion instructions</a> for more details. usbwifi(9) is quite similar to, but also quite different from, usbnet(9), mostly for two reasons: it interfaces to a totally different stack, and many USB wlan chipsets are more complex than ethernet chipsets (e.g. they support multiple send queues with different priorities). Developing usbwifi cost quite some time (initially unplanned), but is expected to amortize over the next few drivers and quickly end up as a net win.</li>
<li>I have been hitting a bug in the urtwn(4) driver used for initial usbwifi(9) development and still have not found it (as of today). It seems to hit randomly and not be caused by the usbwifi(9) conversion - a fact I found out only recently. So for now I will put this driver aside (after spending *way* too much time on it) and instead work on other USB drivers, returning to the bug every now and then to see if I can spot it. Maybe I can borrow a USB analyzer and get more insight that way.</li>
</ul>
<p>The current state of driver conversion and what drivers are still open are listed in <a href="https://wiki.NetBSD.org/Driver_state_matrix/">the wifi driver conversion matrix</a>.</p>
<p>Next steps ahead are:</p>
<ul>
<li><s>make another pass over documentation and improve things / fixup for recent changes</s> (done before this blog post got published)</li>
<li>sync the branch with HEAD and keep tracking it more closely</li>
<li>convert run(4) to usbwifi</li>
<li>revisit rtwn(4) and decide if/how it should be merged with urtwn(4)</li>
<li>revisit iwm(4) and make it work fully</li>
<li>convert all other drivers, starting with the ones I have hardware for</li>
</ul>
<p>Currently it is not clear if this branch can be merged to HEAD before branching for netbsd-10. We will not delay the netbsd-10 branch for this.</p>
<h2><a href="https://blog.netbsd.org/tnf/entry/support_for_chdir_2_in">Support for chdir(2) in posix_spawn(3)</a></h2>
<p>By martin, 2021-06-10.</p>
<p>Piyush Sachdeva is working on an extension to NetBSD's posix_spawn system call implementation and library support.</p>
<p>He applied as a GSoC student, but unfortunately we only got a single slot from Google this year, so The NetBSD Foundation
offered Piyush the chance to work on it with TNF funding outside of the official GSoC.</p>
<p>In this post Piyush introduces himself and the project. He has already started the work...</p><h3>This post was written by Piyush Sachdeva:</h3>
<p>What really happens when you double click an icon on your desktop?</p>
<h1>Support for chdir(2) in posix_spawn(3)</h1>
<p>Processes are the bread and butter of your operating system. The moment
you double click an icon, that particular program gets loaded in your
Random Access Memory (RAM) and your operating system starts to run it. At
this moment the program becomes a process. Though you can only see the execution
of your process, the operating system (the Kernel) is always running a lot
of processes in the background to facilitate you.</p>
<p>From the moment you hit that power button, everything that happens on the
screen is the result of some process or other. In this post we are
going to talk about one such interface which helps in the creation of your
programs.</p>
<h2>The fork() & exec() shenanigans</h2>
<p>The moment a computer system comes alive, it launches a bunch of
processes. For the purpose of this blog let’s call them, ‘the master
processes’. These processes run in perpetuity, provided the computer is
switched on. One such process is <a href="#note1">init/systemd/launchd (depending on your OS)</a>.
This ‘init’ master process owns all the other processes in the computer,
either directly or indirectly.</p>
<p>Operating systems are elegant, majestic software that work
seamlessly under the hood. They do so much without even breaking a sweat
(unless it’s Windows). Let's consider a scenario where you have decided to
take a trip down memory lane and burst open those old photos. The ‘init
master process’ just can’t terminate itself and let you look at your
photos. What if you unknowingly open a malicious file, which corrupts all
your data? So init doesn’t just exit, rather it employs fork() and exec()
to start a new process. The fork() function is used to create child
processes which are an exact copy of their parents. Whichever process calls
fork, gets duplicated. The newly created process becomes the child of the
original running process and the original running process is called the
parent. Just how parents look after their kids, the parent process makes
sure that the child process doesn't do any mischief. So now you have two
exactly similar processes running in your computer.</p>
<img id="flowchar" src="//www.NetBSD.org/~martin/flowchart.jpg" alt="Flowchart of fork() + exec()" width="960" height="362" style="object-fit: contain">
<p>One might think that the newly created child process doesn’t really help
us. But actually, it does. Now exec() comes into the picture. What exec()
does is, it replaces any process which calls it. So what if we replace the child
process, the one we just thought to be useless, with our photos? That's
exactly what we are going to do indeed. This will result in replacement of
the fork() created child process with your photos. Therefore, the master
init process is still running and you can also enjoy your photos with no
threat to your data.</p>
<cite>
“Neither abstraction nor simplicity is a substitute for getting it right.
Sometimes, you just have to do the right thing, and when you do, it is way
better than the alternatives. There are lots of ways to design APIs for
process creation; however, the combination of fork() and exec() is simple
and immensely powerful. Here, the UNIX designers simply got it right.”
</cite> <a href="#note2">Lampson’s Law - Getting it Right</a>
<p>Now you could ask me, `But what about the title, some ‘posix_spawn()’
thing?´ Don’t worry, that’s next.</p>
<h2>posix_spawn()</h2>
<p>posix_spawn() is an alternative to the fork() + exec() routine. It
implements fork() and exec(), but not directly (as that would make it
slow, and we all need everything to be lightning fast). What actually
happens is that posix_spawn() only implements the functionality of the
fork() + exec() routines, but in one single call. However, because fork() +
exec() is a combination of two different calls, there is a lot of room for
customization. Whatever software you are running on your computer, calls
these routines on its own and does the necessary. Meanwhile a lot is
cooking in the background. Between the call to fork() and exec() there is
plenty of leeway for tweaking different aspects of the exec-ing process.
But posix_spawn() doesn’t offer this flexibility and therefore has a lot of
limitations. It does take a lot of parameters to give the caller some
flexibility, but it is not enough.</p>
<p>Now the question before us is,
“If fork() + exec() is so much more powerful, then why have,
or use the posix_spawn() routine?” The answer to that is, that
<a href="#note3">fork() and exec()</a> are UNIX system routines.
They are not present in operating systems that are not a derivative of UNIX.
E.g., Windows implements a family of spawn functions.<br/>
There is another issue with fork() (not exec()), which in reality is one
of the biggest reasons behind the growth of posix_spawn(). In outline,
the issue is that creating child processes in multi-threaded programs is a
whole other ball game altogether.</p>
<p>Concurrency is one of those disciplines in operating systems where the
order in which the cards are going to unravel is not always how you expect
them to. Multi-threading in a program is a way to do different and independent tasks of a
program simultaneously, to save time. No matter how jazzy or intelligent the above statement looks,
multi-threaded programs require an eagle’s eye as they can often have a lot
of holes. Though the “tasks” are different and independent, they often
share a few common attributes. When these different tasks start running in
parallel, a data race over those shared attributes can arise. To keep this
from wreaking havoc, there are mechanisms by which,
when modifying/accessing these common attributes (the critical section), we can
provide a sort of mutual exclusion (locks/condition variables) - only
letting one thread modify a shared attribute at a time. Things are
already intricate due to multithreading; now, to top it off,
we start creating child processes. Complications are bound to arise.
When one of the threads from the multi-threaded program calls fork() to
create a child process, fork() does clone everything (variables, their
states, functions, etc) but it fails to clone other threads (though this is not
required at all times).</p>
<p>The child process now knows only about that one thread which called fork().
But all the other attributes of the child that were inherited from
the parent (locks, mutexes) are set from the parent’s address space
(considering multiple threads). So there is no way for the child process to
know which attributes correspond to which parts of the parent. Also, those
mechanisms that we used to provide mutual exclusion, like locks and
condition variables, need to be reset. This reset step is essential
for letting the child access its attributes; failing to reset can cause deadlocks.
To put it simply, you can see how difficult things
have become all of a sudden. The posix_spawn() call is free from these
limitations of fork() encountered in multi-threaded programs. However, as
mentioned by me earlier, there needs to be enough rope to meet all the
requirements before posix_spawn() does the implicit exec().</p>
<h2>About my Project</h2>
<p>Hi, I am Piyush Sachdeva and I am going to start a project which will focus
on relaxing one limitation of posix_spawn - changing the current directory
of the child process, before the said call to exec() is made. This is not
going to restrict it to the parent’s current working directory. Just
passing the new directory as one of the parameters will do the trick.
Resolving all the impediments would definitely be marvelous. Alas! That is
not possible. Every attempt to resolve even a single hindrance can create
plenty of new challenges.</p>
<p>As already mentioned by me, posix_spawn() is a POSIX standard. Hence the
effect of my project will probably be reflected in the next POSIX release.
I came across this project through Google Summer of Code 2021. It was being
offered by The NetBSD Foundation Inc. However, as the slots for
Google Summer of Code were limited, my project didn’t make the selection.
Nevertheless, the Core Team at The NetBSD Foundation offered me to work on
the project and even extended a handsome stipend. I will forever be
grateful to The NetBSD Foundation for this opportunity.</p>
<h2>Notes</h2>
<ul>
<li>init, systemd &amp; launchd are system daemon processes. init is the
historical process, present since System III UNIX. systemd is a replacement
for the classic init, written for the Linux kernel.
launchd is the macOS alternative to init/systemd.</li>
<li>This is taken from Operating Systems: The Three Easy Pieces book by
Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau.</li>
<li>UNIX is the original AT&T UNIX operating system developed at the Bell
Labs research center, headed by Ken Thompson and Dennis Ritchie.</li>
</ul>
<h2>References</h2>
<ol>
<li> <a name="note1"> Operating Systems: Three Easy Pieces by Andrea C. Arpaci-Dusseau and
Remzi H. Arpaci-Dusseau.</a></li>
<li> <a name="note2"> Advanced Programming in the UNIX Environment by W. Richard Stevens
and Stephen A. Rago.</a></li>
<li> <a name="note3">UNIX and Linux System Administration Handbook by Evi Nemeth, Garth
Synder, Trent R. Hein, Ben Whaley and Dan Mackin.</a></li>
</ol>
<h2><a href="https://blog.netbsd.org/tnf/entry/aiomixer_x_open_curses_and">aiomixer, X/Open Curses and ncurses, and other news</a></h2>
<p>By Nia Alarie, 2021-05-12.</p>
<p>
aiomixer is an application that I've been maintaining outside of NetBSD for a
few years. It was available as a package, and was a "graphical" (curses,
terminal-based) mixer for NetBSD's audio API, inspired by programs like alsamixer.
For some time I've thought that it should be integrated into the NetBSD
base system - it's small and simple, very useful, and many developers
and users had it installed (some told me that they would install it on all
of their machines that needed audio output).
Besides my NetBSD laptop, I have some small
NetBSD machines around the house plugged into speakers that I play music from.
Sometimes I like to SSH into them to adjust the playback volume, and it's often
easier to do that visually than with
<a href="https://man.NetBSD.org/mixerctl.1">mixerctl(1)</a>.
</p>
<p>
However, there was one problem: when I first wrote aiomixer 2 years ago,
I was intimidated by the curses API, so opted to use the
<a href="https://invisible-island.net/cdk/">Curses Development Kit</a>
instead.
This turned out to be a mistake, as not only was CDK inflexible for an
application like aiomixer, it introduced a hard dependency on ncurses.
</p>
<h2>X/Open Curses and ncurses</h2>
<p>
Many people think ncurses is the canonical way to develop terminal-based
applications for Unix, but it's actually an implementation of the
<a href="https://pubs.opengroup.org/onlinepubs/7908799/xcurses/curses.h.html">X/Open Curses specification</a>.
There are a few other Curses implementations:
</p>
<ul>
<li><a href="https://man.netbsd.org/curses.3">NetBSD libcurses</a></li>
<li><a href="https://docs.oracle.com/cd/E36784_01/html/E36880/curses-3curses.html">Solaris libcurses</a></li>
<li><a href="https://en.wikipedia.org/wiki/PDCurses">PDCurses</a>, used on Windows</li>
</ul>
<p>
NetBSD curses is descended from the original BSD curses, but contains
many useful extensions from ncurses as well. We use it all over the
base system, and for most packages in pkgsrc.
It's also been
<a href="https://github.com/sabotage-linux/netbsd-curses">ported to other operating systems</a>,
including Linux.
As far as I'm aware, NetBSD is one of the last operating systems left
that doesn't primarily depend on ncurses.
</p>
<p>
There's one crucial incompatibility, however: ncurses exposes its
internal data structures, while NetBSD libcurses keeps them opaque.
Since CDK development is closely tied to ncurses development (they have
the same maintainer), CDK peeks into those structures and can't
be used with NetBSD libcurses.
There are also a few places where ncurses breaks with X/Open Curses,
like
<a href="https://github.com/irssi/irssi/pull/1305">this case I recently fixed in irssi</a>.
</p>
<h2>Rewriting aiomixer</h2>
<p>
I was able to rewrite aiomixer in a few days using only my free time
and NetBSD libcurses. It's now been imported to the base system.
It was a good lesson in why Curses isn't actually that intimidating -
while there are many functions, they're mostly variations on the
same thing. Using Curses directly resulted in a much lighter and
more usable application, and provided a much better fit for the
types of widgets I needed.
</p>
<p>
Many people also provided testing, and I learned a lot about
how different terminal attributes should be used in the process.
NetBSD is probably one of the few communities where you'll get
easy and direct feedback on how to not only make your software
work well in a variety of terminal emulators, but also old school
hardware terminals. During development, I was also able to find
a strange bug in the curses library's window resizing function.
</p>
<p>
The API support was also improved, and the new version of aiomixer
should work better with a wider variety of sound hardware drivers.
</p>
<a href="https://cdn.netbsd.org/pub/NetBSD/misc/nia/aiomixer2.png"><img src="https://cdn.netbsd.org/pub/NetBSD/misc/nia/aiomixer2.png" width="400" /></a>
<h2>Other happenings</h2>
<p>
Since I'm done plugging my own work, I thought I might talk
a bit about some other recent changes to CURRENT.
</p>
<ul>
<li>Most ports of NetBSD now build with GCC 10, thanks to work by mrg.
The new version of GCC introduced many new compiler warnings. Since
NetBSD is compiled with a fixed toolchain version,
we use <code>-Werror</code> by default. Many minor warnings and actual bugs
were uncovered and fixed with the new compiler.</li>
<li>On the ARM front, support for the Allwinner V3S system-on-a-chip
was introduced thanks to work by Rui-Xiang Guo. This is an
older model Allwinner core, primarily used on small embedded
devices. It's of likely interest to hardware hackers because it
comes in an easily soldered package. A development board is
available, the Lichee Pi Zero. Also in the Allwinner world,
support for the H3 SoC (including the NanoPi R1) was added
to the
<a href="https://man.NetBSD.org/sun8icrypto.4">sun8icrypto(4)</a>
driver by bad.</li>
<li>Support for RISC-V is progressing, including a UEFI
bootloader for 64-bit systems, and an in-kernel disassembler.
Some NetBSD developers have recently obtained Beagle-V development
boards.</li>
<li>On the SPARC64 front, support for sun4v is progressing thanks
to work by palle. The
sun4v architecture includes most newer SPARC servers that are
based on the
<a href="https://en.wikipedia.org/wiki/Oracle_VM_Server_for_SPARC">Logical Domains</a>
architecture - virtualization is
implemented at the hardware/firmware level, and operating systems
get an abstracted view of the underlying hardware. With other
operating systems discussing removing support for SPARC64, there's
an interest among NetBSD developers in adding and maintaining
support for this very interesting hardware from Oracle, Fujitsu,
and Sun in an open source operating system, not just Oracle
Solaris.</li>
<li>A kernel-wide audit and rework of the auto-configuration
APIs was completed by thorpej.</li>
<li>Various new additions and fixes have been made to the
networking stack's
<a href="https://man.NetBSD.org/pppoe.4">PPP over Ethernet</a>
support by yamaguchi.</li>
<li>A new API was introduced by macallan that allows adding
a <code>-l</code> option to the
<a href="https://man.NetBSD.org/wsfontload.8">wsfontload(8)</a>
command, allowing easy viewing of the tty fonts currently
loaded into the kernel.</li>
<li>... OK, I'm not done plugging my own work: I recently
wrote new documentation on
using
<a href="https://www.netbsd.org/docs/guide/en/chap-virt.html">NetBSD for virtualization</a>,
<a href="https://www.netbsd.org/docs/guide/en/chap-power.html">Power Management</a>,
and rewrote the
<a href="https://www.netbsd.org/docs/guide/en/index.html">NetBSD Guide</a>'s sections on
<a href="https://www.netbsd.org/docs/guide/en/chap-net-practice.html">Networking in Practice</a>
and
<a href="https://www.netbsd.org/docs/guide/en/chap-audio.html">Audio</a>.
I also recently added support for the
<a href="https://en.wikipedia.org/wiki/Neo_(keyboard_layout)">Neo 2 keyboard layout</a>
to NetBSD's console system - Neo 2 is a Dvorak-like
optimized layout for German and other languages based on
multiple layers for alphabetical characters, navigation,
and symbols.</li>
</ul>
<h2><a href="https://blog.netbsd.org/tnf/entry/the_gnu_gdb_debugger_and1">The GNU GDB Debugger and NetBSD (Part 2)</a></h2>
<p>By Kamil Rytarowski, 2020-05-04.</p>
The NetBSD team of developers maintains two copies of GDB:
<ul>
<li>One in the base-system with a stack of local patches.</li>
<li>One in pkgsrc with mostly build fix patches.</li>
</ul>
<p>
The base-system version of GDB (GPLv3) still relies on a set of local patches.
I set a goal to reduce the local patches to a bare minimum, ideally reaching no local modifications at all.
<p>
Over the past month I've reimplemented debugging support for multi-threaded programs and upstreamed the support.
It's interesting to note that the old support relied on GDB tracking only a single inferior process.
This made it necessary to reimplement the support so that it is agnostic to the number of traced processes.
Meanwhile the upstream developers introduced new features for multi-target tracing, and a lot of
preexisting code broke and needed resurrection. This also affected the code kept in the base-system version of GDB.
Additionally, over the past 30 days I've developed new CPU-independent GDB features that had been on
NetBSD's TODO list for a long time.
<p>
After the past month, NetBSD now has decent, functional GDB support in the mainline.
It's still not as full-featured as it could be, and CPU-specific handling will need dedicated treatment.
<p>
<h1>Signal conversions</h1>
<p>
GDB maintains an internal representation of signals and translates e.g. SIGTRAP to GDB_SIGNAL_TRAP.
This kernel-independent management of signal names is used by the GDB core and requires translation at the border of the kernel-specific implementation.
So far, the NetBSD support relied on an accidental overlap of signal names between the GDB core and the NetBSD definitions;
while this worked for some signals, it didn't match for others.
I've added a patch with appropriate NetBSD->GDB and GDB->NetBSD signal number conversions and enabled it in all NetBSD CPU-specific files.
<p>
Later, newly added code respects now these signal conversions in the management of processes.
<p>
<h1>Threading support</h1>
<p>
I've implemented the NetBSD-specific methods for dealing with threads.
Previously, the pristine version of GDB was unaware of threading on NetBSD,
and the base-system GDB relied on local patches that needed reimplementation
(especially a rewrite to C++) to meet the expectations of the upstream maintainers.
<p>
I have upstreamed this with the following commit:
<pre>
Implement basic threading support in the NetBSD target
Use sysctl(3) as the portable interface to prompt NetBSD threads on
all supported NetBSD versions. In future newer versions could switch
to PT_LWPSTATUS ptrace(2) API that will be supported on NetBSD 10.0
and newer.
Implement as part of nbsd_nat_target:
- thread_name() - read descriptive thread name
- thread_alive() - check whether a thread is alive
- post_attach() - updates the list of threads after attach
- update_thread_list() - updates the list of threads
- pid_to_str() - translates ptid to a descriptive string
There are two local static functions:
- nbsd_thread_lister() - generic LWP lister for a specified pid
- nbsd_add_threads() - utility to update the list of threads
Now, GDB on NetBSD can attach to a multithreaded process, spawn
a multithreaded process, list threads, print their LWP+PID numbers
and descriptive thread names.
</pre>
<p>
<h1>ELF symbol resolver</h1>
<p>
The NetBSD operating system relies on the ELF file format for native applications.
<p>
One of the features of GDB is skipping single-stepping over the internal ELF loader code.
When a GDB user instructs the debugger to step a line of source code,
and execution lands on an unresolved symbol in the GOT table, the debugger could step into the internal code of the ELF loader
on the first use of a public symbol. This is not necessarily wrong, but it can be confusing.
It is typically worked around with generic code that tries to detect such a scenario by examining,
among other things, the code sections being stepped, but the default fallback is not functional on every
NetBSD port, notably Alpha and ARM.
The new code merged into GDB uses the same logic for all NetBSD ports: it tries to detect the
<code>_rtld_bind_start</code> symbol and acts accordingly when it is found.
<p>
<h1>SVR4 psABI parsers of AUXV entries</h1>
<p>
The ELF format ships with a mechanism to transfer certain kernel-level information to the user process:
AUXV, a key-value array located on the stack and available to the ELF loader.
While Linux uses a different format than the one specified in the SVR4 psABI, NetBSD follows the standard
and always stores the key in a 32-bit integer.
This caused breakage on 64-bit CPUs, and the NetBSD developers used to patch the Linux AUXV code to be compatible with the
NetBSD behavior.
I've added a dedicated function for NetBSD AUXV handling and switched all NetBSD CPUs to it.
<p>
<h1>Process information (<code>info proc</code>)</h1>
<p>
As documented by the
<a href="https://sourceware.org/gdb/current/onlinedocs/gdb/Process-Information.html">GDB project</a>:
<blockquote>
Some operating systems provide interfaces to fetch additional
information about running processes beyond memory and per-thread
register state. If GDB is configured for an operating system with
a supported interface, the command <code>info proc</code> is available to report
information about the process running your program, or about any
process running on your system.
</blockquote>
<p>
Previously the <code>info proc</code> functionality was implemented only for Linux and FreeBSD.
I've implemented support for the following commands:
<ul>
<li><code>info proc</code> | <code>info proc process-id</code> - Summarize available information about a process.
<li><code>info proc cmdline</code> - Show the original command line of the process.
<li><code>info proc cwd</code> - Show the current working directory of the process.
<li><code>info proc exe</code> - Show the name of executable of the process.
<li><code>info proc mappings</code> - Report the memory address space ranges accessible in a process.
<li><code>info proc stat</code> | <code>info proc status</code> - Show additional process-related information.
<li><code>info proc all</code> - Show all the information about the process described under all of the above info proc subcommands.
</ul>
<p>
All of these pieces of information are retrieved via the sysctl(3) interface.
An example execution of the command is shown below:
<p>
<pre>
(gdb) info proc all
process 26015
cmdline = '/usr/bin/cal'
cwd = '/public/binutils-gdb-netbsd'
exe = '/usr/bin/cal'
Mapped address spaces:
Start Addr End Addr Size Offset Flags File
0x200000 0x204000 0x4000 0x0 r-x C-PD /usr/bin/cal
0x404000 0x405000 0x1000 0x4000 r-- C-PD /usr/bin/cal
0x405000 0x406000 0x1000 0x0 rw- C-PD
0x7f7ff6c00000 0x7f7ff6c10000 0x10000 0x0 rw- C-PD
0x7f7ff6c10000 0x7f7ff6db0000 0x1a0000 0x0 rw- CNPD
0x7f7ff6db0000 0x7f7ff6dc0000 0x10000 0x0 rw- C-PD
0x7f7ff6dc0000 0x7f7ff7000000 0x240000 0x0 rw- CNPD
0x7f7ff7000000 0x7f7ff7010000 0x10000 0x0 rw- C-PD
0x7f7ff7010000 0x7f7ff7200000 0x1f0000 0x0 rw- CNPD
0x7f7ff7200000 0x7f7ff7260000 0x60000 0x0 r-x CNPD /lib/libc.so.12.215
0x7f7ff7260000 0x7f7ff7270000 0x10000 0x60000 r-x C-PD /lib/libc.so.12.215
0x7f7ff7270000 0x7f7ff73c6000 0x156000 0x70000 r-x CNPD /lib/libc.so.12.215
0x7f7ff73c6000 0x7f7ff75c6000 0x200000 0x1c6000 --- CNPD /lib/libc.so.12.215
0x7f7ff75c6000 0x7f7ff75d1000 0xb000 0x1c6000 r-- C-PD /lib/libc.so.12.215
0x7f7ff75d1000 0x7f7ff75d7000 0x6000 0x1d1000 rw- C-PD /lib/libc.so.12.215
0x7f7ff75d7000 0x7f7ff75f0000 0x19000 0x0 rw- C-PD
0x7f7ff75f0000 0x7f7ff76e0000 0xf0000 0x0 rw- CNPD
0x7f7ff76e0000 0x7f7ff76f0000 0x10000 0x0 rw- C-PD
0x7f7ff76f0000 0x7f7ff77e0000 0xf0000 0x0 rw- CNPD
0x7f7ff77e0000 0x7f7ff77f8000 0x18000 0x0 rw- C-PD
0x7f7ff7800000 0x7f7ff780e000 0xe000 0x0 r-x CNPD /lib/libterminfo.so.2.0
0x7f7ff780e000 0x7f7ff7a0d000 0x1ff000 0xe000 --- CNPD /lib/libterminfo.so.2.0
0x7f7ff7a0d000 0x7f7ff7a0e000 0x1000 0xd000 r-- C-PD /lib/libterminfo.so.2.0
0x7f7ff7a0e000 0x7f7ff7a0f000 0x1000 0xe000 rw- C-PD /lib/libterminfo.so.2.0
0x7f7ff7c00000 0x7f7ff7c0f000 0xf000 0x0 r-x C-PD /libexec/ld.elf_so
0x7f7ff7c0f000 0x7f7ff7e0f000 0x200000 0x0 --- CNPD
0x7f7ff7e0f000 0x7f7ff7e10000 0x1000 0xf000 rw- C-PD /libexec/ld.elf_so
0x7f7ff7e10000 0x7f7ff7e11000 0x1000 0x0 rw- C-PD
0x7f7ff7eed000 0x7f7ff7eff000 0x12000 0x0 rw- C-PD
0x7f7ff7eff000 0x7f7fffbff000 0x7d00000 0x0 --- CNPD
0x7f7fffbff000 0x7f7fffff0000 0x3f1000 0x0 rw- CNPD
0x7f7fffff0000 0x7f7ffffff000 0xf000 0x0 rw- C-PD
Name: cal
State: STOP
Parent process: 11837
Process group: 26015
Session id: 15656
TTY: 1288
TTY owner process group: 11837
User IDs (real, effective, saved): 1000 1000 1000
Group IDs (real, effective, saved): 100 100 100
Groups: 100 0 5
Minor faults (no memory page): 292
Major faults (memory page faults): 0
utime: 0.000000
stime: 0.003510
utime+stime, children: 0.000000
'nice' value: 20
Start time: 1588600926.724211
Data size: 8 kB
Stack size: 8 kB
Text size: 16 kB
Resident set size: 1264 kB
Maximum RSS: 2000 kB
Pending Signals: 00000000 00000000 00000000 00000000
Ignored Signals: 98488000 00000000 00000000 00000000
Caught Signals: 00000000 00000000 00000000 00000000
</pre>
<p>
<h1>Event handling</h1>
<p>
I've implemented event handling for the following trap types:
<ul>
<li> single step (<code>TRAP_TRACE</code>)
<li> software breakpoint (<code>TRAP_BRKPT</code>)
<li> exec() (<code>TRAP_EXEC</code>)
<li> syscall entry/exit (<code>TRAP_SCE</code> / <code>TRAP_SCX</code>)
</ul>
<p>
While there, I have added proper support for <code>::wait ()</code> and <code>::resume ()</code> methods.
<p>
<h1>Syscall entry/exit tracing</h1>
<p>
I've added support for syscall entry/exit breakpoint traps.
There used to be some support for this mode in the base-system GDB, however we were missing a mapping
of syscall numbers to syscall names.
I've borrowed the script from FreeBSD to generate netbsd.xml with the mapping,
based on definitions in <code>/usr/include/sys/syscall.h</code>.
This approach to mapping syscalls for NetBSD is imperfect, as the internal names are not stable and change whenever
a syscall is versioned. There is no ideal way to handle this
(e.g. debugging programs on NetBSD 9.0 and NetBSD 10.0 can behave differently), and end-users will need to deal with it.
<p>
<h1>Threading events</h1>
<p>
As threading support is mandatory these days,
I have implemented and upstreamed support for thread creation and
thread exit events.
<p>
<h1>Other changes</h1>
<p>
In general,
I oppose treating all BSD operating systems under the same ifdef in existing software,
as it leads to conditionals like <code>#if defined(AllBSDs) && !defined(ThisBSD)</code> and/or
a spaghetti of <code>#define</code> symbol renames.
There was an attempt to define support for all BSDs in a shared file, inf-ptrace.c; however, in the
end the developers pushing for that reimplemented part of the support for their kernels in private per-OS
files.
This left a lot of shared, sometimes unneeded code in the common layer.
<p>
I went ahead and fixed the build of OpenBSD support in GDB (build-tested only) and moved OpenBSD-specific
code from inf-ptrace.c to obsd-nat.c. Code in inf-ptrace.c that was no longer needed for OpenBSD
(as it was reimplemented in obsd-nat.c) and in theory shared with NetBSD was removed.
<p>
My intention is to restrict the common shared code of GDB to really common parts and, wherever kernels differ,
implement their specific handling in dedicated files.
There are still some hacks left in the GDB shared code (even in inf-ptrace.c) for long-removed kernels
that obfuscate the support, especially the Gould NP1 support from the 1980s, which was definitively removed in 2000.
<p>
<h1>Plan for the next milestone</h1>
<p>
Finish and upstream operational support for the follow-fork, follow-vfork and follow-spawn events.
Rewrite the gdbserver support and submit upstream.https://blog.netbsd.org/tnf/entry/improving_libossaudio_and_the_futureImproving libossaudio, and the future of OSS in NetBSDNia Alarie2020-04-27T13:01:29+00:002020-04-27T17:43:03+00:00<p>Nia discusses recent fixes she's made to the Open Sound System compatibility layer, and explains some of the history behind OSS, and its future in NetBSD.</p><p>There are two ways user applications can communicate with the kernel audio layer in NetBSD:</p>
<ul>
<li><a href="//man.NetBSD.org/NetBSD-9.0/audio.4">audio(4)</a> –
the native API, based on the Sun API with a number of very useful NetBSD extensions</li>
<li><a href="//man.NetBSD.org/NetBSD-9.0/ossaudio.3">ossaudio(3)</a> –
a translation layer providing approximate compatibility with
<a href="http://manuals.opensound.com/developer/">OSSv4</a>'s ioctls, also supported in
FreeBSD and Solaris, and popular in the past on Linux.</li>
</ul>
<p>Linux drifted away from OSS and towards ALSA due to licensing disagreements.</p>
<p>Because of this drift, we're seeing increasing problems with OSS adoption today,
even if the licensing concerns are no longer relevant, and other implementations of
OSS have surpassed the original Linux OSSv3 implementation as far as their feature
set and usability are concerned.</p>
<p>So, in NetBSD, it's recommended to use the native API for new code and only rely on
the OSS layer for compatibility with existing code.</p>
<p>I spent a while working on third-party software to improve support for native
NetBSD audio. These included Firefox, SDL, PortAudio, ffmpeg (working with yhardy@),
and more.</p>
<p>However, I've turned my attention to the OSS translation layer. Since a lot of older
and less popular software still relies on it, I wanted to go over the OSSv4
specification and iron out surprising differences.</p>
<h2>Audacity/PortAudio's OSS usage is strange</h2>
<p>I should note that most of these fixes were to enable Audacity to work without
patching. Audacity is interesting because it hits a lot of edge cases as far as
OSS API usage is concerned. Once I fixed the most notable issues, I made sure
Audacity also supported the native API. Writing the necessary PortAudio
glue for Sun/NetBSD audio and implementing these fixes took approximately
two days.</p>
<h2>Incompatibility 1 – SNDCTL_DSP_SPEED</h2>
<p><em>[Out of range sample rates are now handled properly by the OSS layer in NetBSD-current.]</em></p>
<p>The NetBSD 9 kernel supports sample rates up to 192kHz. Specify anything higher,
and NetBSD's audio API returns an error code and keeps the sample rate at its
original value, or the legacy default of 8000 Hz (not particularly useful with
modern devices).</p>
<p>However, OSS applications expected setting the sample rate to always succeed.
The specification states that the actual set sample value may be an approximation
and will not always use the exact requested value. So, if the requested value
is out of range, NetBSD will now return as if the call succeeded, and set the
sample rate to the current configured hardware rate (usually some multiple of
48kHz).</p>
<p>During its startup process, Audacity requested an overly high sample rate of
384kHz. This is well above the maximum supported. I'm still not sure why it does
this, because it later configures the audio device to standard CD rate, but it
meant that Audacity couldn't properly start without our local patches.</p>
<h2>Incompatibility 2 – SNDCTL_DSP_CHANNELS</h2>
<p><em>[Out of range channel numbers are now handled properly by the OSS layer in NetBSD-current.]</em></p>
<p>This was a very simple fix, similar to that of <code>SNDCTL_DSP_SPEED</code>. The NetBSD
kernel supports between 1 and 12 channels for audio playback. Most commonly
1 is mono, 2 is stereo, and higher numbers are used with surround sound
systems. The limit of 12 comes from the USB audio device specification.</p>
<p>If an out of range number is specified, libossaudio will now set the
channel count to the currently configured number in use at the hardware
level.</p>
<p>However, we encounter a more difficult difference between OSS and NetBSD audio
when using the audio device in full duplex (recording from and playing back to
the same device simultaneously). If your mic is mono and your speakers aren't,
how do you set the channel counts to different numbers in OSS? You can't.
There is one ioctl for setting both the recording and playback channels.
In the native API, this is possible by setting <code>info.record.channels</code> and
<code>info.play.channels</code> separately. We should ensure that the recording channels
are always duplicated to be the same as the number of playback channels.</p>
<h2>Incompatibility 3 – SNDCTL_DSP_SETTRIGGER</h2>
<p><em>[NetBSD-current now implements <code>SNDCTL_DSP_SETTRIGGER</code>.]</em></p>
<p><code>SNDCTL_DSP_SETTRIGGER</code> is a somewhat more obscure part of the OSS API,
in that it's only really useful if you are using <code>poll()</code> or another
event notification mechanism on an audio device before performing
any I/O, or you're performing I/O via <code>mmap()</code>, neither being particularly
common in practice. It has the ability to force initialisation of
playback/recording for the device if this isn't already the case.</p>
<p>In terms of the native API, this means that playback/recording becomes
unpaused.</p>
<p>Previously in NetBSD, this part of the OSS API wasn't implemented and
simply did nothing. However, it became obviously needed due to an
incompatible change in NetBSD 9,
<a href="https://mail-index.netbsd.org/tech-kern/2020/03/20/msg026178.html">as discussed on tech-kern</a>.</p>
<p>Basically, we needed recording to be properly triggered without
a <code>read()</code> so a few applications using <code>poll()</code> without prior I/O wouldn't
block indefinitely.</p>
<h2>Incompatibility 4 – SNDCTL_DSP_SETPLAYVOL</h2>
<p>OSSv4 has special bits to manipulate the volume of an individual stream
in an application while doing all the maths for this inside the kernel.</p>
<p>We don't support this properly yet (but reasonably could)... so code
needs to be modified to do the volume manipulation in the application,
or the OSSv4 support disabled.</p>
<p>I've only found a couple of applications that try to use this feature
(audacious, qmmp). Currently, they're configured to avoid using OSSv4
and layer the audio through SDL or qt-multimedia instead.</p>
<p>I've at least fixed <code>SNDCTL_DSP_GETPLAYVOL</code> to return conforming values.
NetBSD audio uses 0-255 for the gain of all channels. OSSv4 uses
a range of 0-100 and encodes two channels into an integer, which
is very odd in my opinion, and also limits surround sound support.</p>
<h2>The future of libossaudio in NetBSD?</h2>
<p>Hopefully, after my changes, OSS compatibility is in a much better
shape when dealing with unusual parameters and uncommon API usage.
The quality of the code also improves – in the process of this work,
maxv@ pointed me towards a related information leak in the Linux OSSv3
compatibility layer in the kernel, and I was able to deal with it properly
after looking at the OSS specification and Linux headers. All the fixes
should be pulled up to 9-stable.</p>
<p>However, I'd personally like to eventually reach a point where we no longer
need libossaudio. I've been writing a lot of code towards this goal.</p>
<p>In many cases, the applications relying on it could be easily modified or
told to use libao/SDL2/PortAudio/OpenAL/etc instead, which all have native
NetBSD audio support.</p>
<h2>OSS aside...</h2>
<p>We probably need to start thinking about supporting 24-bit PCM in
the kernel, since I've found a few audio players that can't handle making
the samples 32-bit before writing them to the device. The Sun audio implementation
in Solaris has supported this for a long time now.</p>
https://blog.netbsd.org/tnf/entry/wifi_renewal_restartedWifi renewal restartedmartin2020-04-08T14:16:37+00:002020-04-08T14:16:37+00:00<p>I have started work on the phil-wifi branch, trying to modernize our net80211 and sync it with FreeBSD.</p><p>Back in 2018, Phil Nelson started a long needed WiFi-refresh, basically syncing our <a href="http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/net80211/">src/sys/net80211/</a> with the version from FreeBSD. He got a few things working, but then ran out of time and was unable to spend enough time on this to complete the task. I am now taking over this work, Phil hopes to join in again later this summer.</p>
<p>The main idea is to get better SMP (locking) support, support for newer (faster) WiFi standards, and support for virtual access points, while also making future updates (and import of drivers) from FreeBSD easier.</p>
<p>I have collected quite a few WiFi cards and USB dongles, but will ask for help with other hardware I don't have handy later.</p>
<p>I hope to have a working setup and a few working drivers by the end of this months.</p>
<p>Thanks to the NetBSD Foundation for funding this work! Please note that we are looking for donations again, see <a href="//blog.NetBSD.org/tnf/entry/fundraising_2020">Fundraising 2020</a>.</p>
<p>P.S.: for the curious: the first drivers I hope to have working are the ones we have already drivers for and I have hardware:
<ul>
<li><a href="//man.NetBSD.org/urtwn.4">urtwn(4)</a></li>
<li><a href="//man.NetBSD.org/run.4">run(4)</a></li>
<li><a href="//man.NetBSD.org/athn.4">athn(4)</a></li>
<li><a href="//man.NetBSD.org/ath.4">ath(4)</a></li>
<li><a href="//man.NetBSD.org/wi.4">wi(4)</a></li>
</ul>
followed by drivers other developers can convert and have hardware for (with me assisting where needed),
optionally followed by some devices where I have hardware but we have no driver yet:
<ul>
<li>Realtek (0x2357) 802.11ac WLAN Adapter (0x011e)</li>
<li>(old) pinebook sdio wifi</li>
<li>cubietruck a20 wifi (broadcom 40181?)</li>
<li>GuruPlug wifi: Marvell, 802.11 SDIO (manufacturer 0x2df, product 0x9103)</li>
</ul>
</p>https://blog.netbsd.org/tnf/entry/lldb_work_concludedLLDB work concludedMichał Górny2020-04-04T20:11:02+00:002020-04-04T20:11:41+00:00<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February 2019, I have started working on LLDB, as contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types and fix compat32 issues, fixing watchpoint and threading support,
porting to i386.</p>
<p>March 2020 was the last month of my contract. During it my primary
focus was to prepare integration of LLDB into NetBSD's src tree.</p>
<h2>LLDB integration</h2>
<p>The last important goal for the contract was to include LLDB
in the NetBSD src tree. This mainly involved porting LLDB build
into NetBSD src tree Makefiles. The resulting patches were sent
to the tech-toolchain mailing list: <a href="https://mail-index.netbsd.org/tech-toolchain/2020/03/28/msg003753.html">[PATCH 0/7] LLDB import to src</a>.</p>
<p>My proposed integration is based on LLDB tree from 2019-10-29. This
matches the LLVM/Clang version currently imported in NetBSD. Newer
version can not be used directly due to API incompatibility between
the projects, and it is easier to backport LLDB fixes than to fix LLVM
API missync.</p>
<p>The backports applied on top of this commit include all my contracted
work, plus Kamil Rytarowski's work on LLDB. This also includes necessary fixes
to make LLDB build against current NetBSD <code>ptrace()</code> API. Two source
files in liblldbHost are renamed to ensure unique filenames within that
library, as necessary to build from NetBSD Makefiles without resorting
to ugly hacks.</p>
<p>Upstream uses to build individual LLDB components and plugins into
split static libraries, then combine them all into a shared <code>liblldb.so</code>
library. Both <code>lldb</code> and <code>lldb-server</code> executables link to it. We
currently can not follow this model as LLVM and Clang sources are built
without <code>-fPIC</code> and therefore are not suitable for shared libraries.</p>
<p>Therefore, we build everything as static libraries instead. This causes
the logic that upstream uses to find lldb-server to fail, as it relies
on obtaining the library path from the dynamic loader and finding
executables relative to it. I have replaced it with hardcoded path
to make LLDB work.</p>
<p>The patches are currently waiting for Joerg Sonnenberger to finish
LLVM/Clang update that's in progress already.</p>
<h2>Pending tasks</h2>
<p>The exact list of pending tasks from my contract follows:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for aarch64 target.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
</ol>
<h2>Notes on backtracing through signal trampoline</h2>
<p>I have described the problem of backtracing through signal trampoline
in <a href="http://blog.netbsd.org/tnf/entry/towards_backtracing_through_signal_trampolines">February's report</a>.
I haven't managed to finish the work on the topic within the contract
but I will try to work on it in my free time.</p>
<p>Most likely, the solution would involve modifying the assembly in
<code>lib/libc/arch/*/sys/__sigtramp2.S</code>. As suggested by Andrew Cagney,
the CFI directives for amd64 would look like:</p>
<pre><code>NENTRY(__sigtramp_siginfo_2)
.cfi_startproc
.cfi_signal_frame
.cfi_def_cfa r15, 0
/* offsets from mcontext_t */
.cfi_offset rax, 0x70
.cfi_offset rbx, 0x68
.cfi_offset rcx, 0x18
.cfi_offset rdx, 0x10
/* ... */
.cfi_def_cfa rsp, 8
movq %r15,%rdi
movq $SYS_setcontext, %rax
syscall
movq $-1,%rdi /* if we return here, something is wrong */
movq $SYS_exit, %rax
syscall
.cfi_endproc
END(__sigtramp_siginfo_2)
</code></pre>
<h2>Addressing breaking tests</h2>
<p>While the most important functions of LLDB work on NetBSD, there are
still many test failures. At this moment, there are 80 instances
of <code>@expectedFailureNetBSD</code> decorator and 18 cases of <code>@skipIfNetBSD</code>.
The former generally indicates that the test reliably fails on NetBSD
and needs a fix, the latter is sometimes used to decorate tests specific
to other systems but also to indicate that the test crashes, hangs
or otherwise can not be reliably run.</p>
<p>Some tests are failing due to the concurrent signal kernel bug explained
in the previous post and covered by XFAIL-ing ATF tests.</p>
<p>New regressions both in LLDB and in LLVM in general appear every month.
Most of them are fixed by their authors once we report them. I will
continue fighting new bugs in my free time and trying to keep the build
bot green.</p>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/lldb_now_works_on_i386LLDB now works on i386Michał Górny2020-02-08T11:24:35+00:002020-02-08T13:34:00+00:00<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February 2019, I have started working on LLDB, as contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types and fix compat32 issues, fixing watchpoint and threading support.</p>
<p>The original NetBSD port of LLDB was focused on amd64 only. In January,
I have extended it to support i386 executables. This includes both
32-bit builds of LLDB (running natively on i386 kernel or via compat32)
and debugging 32-bit programs from 64-bit LLDB.</p>
<h2>Build bot failure report</h2>
<p>I have finished the previous report with indication that upstream <a href="https://reviews.llvm.org/rG61bd19206f61#884308">broke
libc++ builds with gcc</a>.
The change in question has been reverted afterwards and recommitted
with the necessary fixes.</p>
<p>A test breakage has been caused by <a href="https://reviews.llvm.org/D69825#1819445">adding a clang driver test using
env -u</a>. The problem has been
resolved by setting the variable to an empty value instead of unsetting
it. However, maybe it is time to implement <code>env -u</code> on NetBSD?</p>
<p>Yet another problem was <a href="https://reviews.llvm.org/D72160#1828903">basic_string copy constructor optimization</a> that broke programs at runtime,
in particular TableGen. The commit in question has been reverted.</p>
<p>Lastly, <a href="https://reviews.llvm.org/D73816#1856846">adding sigaltstack interception</a> in compiler-rt broke our
builds. Missing bits for NetBSD have been added afterwards.</p>
<p>I would like to thank all upstream contributors who are putting
an effort to fix their patches to work with NetBSD.</p>
<h2>LLDB i386 support</h2>
<h3>Onto the mysterious UserArea</h3>
<p>LLDB uses quite an interesting approach to support reading and writing
registers on Linux. It abstracts two register lists for i386 and amd64
respectively. Those lists contain offsets to appropriate fields
in data returned by ptrace. When debugging a 32-bit program on amd64,
it uses a hybrid. It takes the i386 register list and combines it
with offsets specific to amd64 structures. The offsets themselves are
not written explicitly but instead established from UserData structure
defined in the plugin.</p>
<p>The NetBSD plugin uses a different approach. Rather than using binary
offsets, it explicitly accesses appropriate fields in ptrace structures.
However, the plugin needs to declare UserData nevertheless in order
to fill the offsets in register lists. What are those offsets used for
then? That's the first problem I had to answer.</p>
<p>According to LLDB upstream developer Pavel Labath those offsets are
additionally used to serialize and deserialize register values in gdb
protocol packets. This opened a consideration of improving
the protocol-wise compatibility between LLDB and GDB. I'm going to
elaborate on this problem separately below. However, the immediate
implication was that the precise field order does not matter and can be
changed arbitrarily.</p>
<p>My first attempts at reordering the fields to improve GDB compatibility
have resulted in new test failures. The offsets must be used for
something else as well! After further research, I've realized that our
plugin has two register reading/writing interfaces: an interface
for operating on a single register, and an interface for reading/writing
<em>all</em> registers (in this case, just the general-purpose registers).
While the former uses explicit field names/indices, the latter just
passes the whole structure as an abstract blob — and apparently
the offsets are used to access data in this blob.</p>
<p>This meant that the initial portion of UserData must match the GPR
structure as returned by ptrace. However, the remaining registers
can be ordered and structured arbitrarily.</p>
<h3>Native i386 support</h3>
<p>I've decided to follow the ideas used in the Linux plugin. Most
importantly, this meant having a single plugin for both 32-bit
and 64-bit x86 variants. This is useful because, on one hand, both
ptrace interfaces are similar, and on the other, a 64-bit debugger
uses the 64-bit ptrace interface even on 32-bit programs. The resulting code
uses preprocessor conditions to distinguish between 32-bit and 64-bit
API whenever necessary, and debugged program ABI to switch between
appropriate register data.</p>
<p>Initially, I've started by implementing a minimal proof-of-concept
for debugging 32-bit programs with a 64-bit debugger. This way I've aimed
to ensure that I won't have to change the design in the future in order
to support both variants. Once this version started working, I've
stashed it and focused on getting native i386 working first.</p>
<p>The result was <a href="https://github.com/llvm/llvm-project/commit/5cc817be75011ac3a07a1a079cca10988555d519">introducing i386 support in NetBSD Process plugin</a>.
The patch added i386 register definitions, and the code to handle them.
To reduce code duplication, the functions operate almost exclusively
on amd64 constants, and map i386 constants to them when debugging 32-bit
programs.</p>
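<p>The mapping idea can be sketched like this (the register numberings
are invented for illustration; LLDB's actual constants come from its
generated register definitions):</p>
<pre><code>#include &lt;assert.h&gt;

/* Invented register numberings for illustration. */
enum amd64_reg { AMD64_RAX, AMD64_RBX, AMD64_RCX, AMD64_INVALID = -1 };
enum i386_reg  { I386_EAX, I386_EBX, I386_ECX };

/* Common code operates on amd64 constants only; when debugging a
 * 32-bit program, incoming i386 register numbers are translated
 * first, so the bulk of the logic is shared between both ABIs. */
static enum amd64_reg i386_to_amd64(int reg)
{
    switch (reg) {
    case I386_EAX: return AMD64_RAX;
    case I386_EBX: return AMD64_RBX;
    case I386_ECX: return AMD64_RCX;
    default:       return AMD64_INVALID;
    }
}
</code></pre>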
<p>The main differences are in GPR structure for i386/amd64, using
<code>PT_GETXMMREGS</code> on i386 (rather than <code>PT_GETFPREGS</code>) and abstracting
out debug register constants. The actual floating-point and debug
registers are handled via common code.</p>
<p>The second part is <a href="https://github.com/llvm/llvm-project/commit/1ff411295f92cddfce21521594d58cf407a15189">improving debugging 32-bit programs on amd64</a>.
It adds two features: explicitly
recognizing 32-bit executables, and providing 32-bit-alike register
context for them. The first part reuses existing LLDB routines
to read the header of the underlying executable and determine
whether it is a 32-bit or 64-bit ELF file. The second part reuses
the approach from Linux: takes 32-bit register context, and updates
it with 64-bit offsets.</p>
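<p>Distinguishing a 32-bit from a 64-bit executable comes down to a
single byte of the ELF identification header: <code>EI_CLASS</code> at offset 4
is 1 for <code>ELFCLASS32</code> and 2 for <code>ELFCLASS64</code>. A minimal standalone
check (LLDB's own routines do considerably more) might look like:</p>
<pre><code>#include &lt;assert.h&gt;

/* Returns 32, 64, or -1 if the buffer is not an ELF header.
 * ident must point at at least the first 5 bytes of the file. */
static int elf_bits(const unsigned char *ident)
{
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F')
        return -1;              /* not an ELF file */
    switch (ident[4]) {         /* EI_CLASS */
    case 1:  return 32;         /* ELFCLASS32 */
    case 2:  return 64;         /* ELFCLASS64 */
    default: return -1;
    }
}
</code></pre>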
<h3>More on LLDB/GDB packet compatibility</h3>
<p>I have mentioned the compatibility between LLDB and GDB protocols.
In fact, LLDB is using a modified version of the GDB protocol that is
only partially compatible with the original. This incompatibility
particularly applies to handling registers.</p>
<p>The register packets transmit register values as packet binary data.
On NetBSD, the layout used by LLDB is different than the one used
by GDB, rendering them incompatible. This incompatibility also means
that LLDB cannot be successfully used to connect to other
implementations of GDB protocol server, e.g. in qemu.</p>
<p>Both GDB and LLDB support an additional abstraction over register packet
layout, making it possible to work with a different layout than
the default. However, they implement different protocols for exposing
those abstractions. LLDB has an explicit register layout packet as JSON,
while GDB transmits the target definition as a series of XML files. Ideally,
LLDB should grow support for the latter in order to improve its
compatibility with different servers.</p>
<h3>i386 outside NetBSD</h3>
<p>While working on i386 support in NetBSD plugin, I have noticed a number
of failing tests that do not seem to be specific to NetBSD. Indeed,
upstream indicates that i386 is not actively tested on any platform
nowadays.</p>
<p>In order to improve its state a little, I have applied a few small fixes
that could be done quickly:</p>
<ul>
<li><a href="https://github.com/llvm/llvm-project/commit/6dea61215d2e3ee79733f845efca5b37e6b330fd">adding missing platform restriction for x86-64-write register test</a></li>
<li><a href="https://github.com/llvm/llvm-project/commit/83a7a4aaad83443f7b22bbc842bb67d4161a9a3f">skipping tests requiring <code>__int128_t</code></a></li>
<li><a href="https://github.com/llvm/llvm-project/commit/98594a44aaa8410492d25793e1f01fb6b8bef45c">fixing segment IDs when processing ELF files</a></li>
</ul>
<h2>Future plans</h2>
<p>I am currently trying to build minimal reproducers for remaining race
conditions in concurrent event handling (in particular signal delivery
to debugged program).</p>
<p>The remaining tasks in my contract are:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for aarch64 target.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
<li>
<p>Merge LLDB with the base system (under LLVM-style distribution).</p>
</li>
</ol>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/gsoc_2019_final_report_incorporatingGSoC 2019 Final Report: Incorporating the memory-hard Argon2 hashing scheme into NetBSDJason High2020-01-12T15:19:58+00:002020-01-12T15:19:58+00:00<h3>Introduction</h3>
We successfully incorporated the Argon2 reference implementation into NetBSD/amd64 for our 2019 Google Summer of Code project. We introduced our project <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the">here</a> and provided some hints on how to select parameters <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_update_incorporating">here</a>. For our final report, we will provide an overview of what changes were made to complete the project.
<p>
</p><h3>Incorporating the Argon2 Reference Implementation</h3>
The Argon2 reference implementation, available <a href="https://github.com/P-H-C/phc-winner-argon2">here</a>, is dual-licensed under the <a href="http://creativecommons.org/publicdomain/zero/1.0">Creative Commons CC0 1.0</a> and the <a href="http://www.apache.org/licenses/LICENSE-2.0">Apache License 2.0</a>. To import the reference implementation into src/external, we chose to use the Apache 2.0 license for this project.
<p>
During our initial <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the">phase 1</a>, we focused on building the libargon2 library and integrating the functionality into the existing password management framework via libcrypt. Toward this end, we imported the reference implementation and created the "glue" to incorporate the changes into /usr/src/external/apache2. The reference implementation is found in
</p><div style="background-color:lightgrey;">
<pre>m2$ ls /usr/src/external/apache2/argon2
Makefile dist lib usr.bin
</pre>
</div>
The Argon2 reference implementation provides both a library and a binary. We build the libargon2 library to support libcrypt integration, and the argon2(1) binary to provide a userland command-line tool for evaluation. To build the code, we add MKARGON2 to bsd.own.mk
<div style="background-color:lightgrey;">
<pre>_MKVARS.yes= \
...
MKARGON2 \
...
</pre>
</div>
and add the following conditional build to /usr/src/external/apache2/Makefile
<div style="background-color:lightgrey;">
<pre>.if (defined(MKARGON2) && ${MKARGON2} != "no")
SUBDIR+= argon2
.endif
</pre>
</div>
After a successful build and installation, we have the following new files and symlinks
<div style="background-color:lightgrey;">
<pre>/usr/bin/argon2
/usr/lib/libargon2.a
/usr/lib/libargon2.so
/usr/lib/libargon2.so.1
/usr/lib/libargon2.so.1.0
</pre>
</div>
To incorporate Argon2 into the password management framework of NetBSD, we focused on libcrypt. In /usr/src/lib/libcrypt/Makefile, we first check for MKARGON2
<div style="background-color:lightgrey;">
<pre>.if (defined(MKARGON2) && ${MKARGON2} != "no")
HAVE_ARGON2=1
.endif
</pre>
</div>
If HAVE_ARGON2 is defined and enabled, we append the following to the build flags
<div style="background-color:lightgrey;" >
<pre>.if defined(HAVE_ARGON2)
SRCS+= crypt-argon2.c
CFLAGS+= -DHAVE_ARGON2 -I../../external/apache2/argon2/dist/phc-winner-argon2/include/
LDADD+= -largon2
.endif
</pre>
</div>
As hinted above, our most significant addition to libcrypt is the file crypt-argon2.c. This file pulls in the functionality of libargon2 into libcrypt. Changes were also made to pw_gensalt.c to allow for parameter parsing and salt generation.
<p>
Having completed the backend support, we pull Argon2 into userland tools, such as pwhash(1), in the same way as above
</p><div style="background-color:lightgrey;">
<pre>.if ( defined(MKARGON2) && ${MKARGON2} != "no" )
CPPFLAGS+= -DHAVE_ARGON2
.endif
</pre>
</div>
Once built, we can specify Argon2 using the '-A' command-line argument to pwhash(1), followed by the Argon2 variant name, and any of the parameterized values specified in argon2(1). See our first blog <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the">post</a> for more details. As an example, to generate an argon2id encoding of the password <i>password</i> using default parameters, we can use the following
<div style="background-color:lightgrey;">
<pre>
m2# pwhash -A argon2id password
$argon2id$v=19$m=4096,t=3,p=1$.SJJCiU575MDnA8s$+pjT4JsF2eLNQuLPEyhRA5LCFG
QWAKsksIPl5ewTWNY
</pre>
</div>
To simplify Argon2 password management, we can utilize passwd.conf(5) to apply Argon2 to a specified user or all users. The same parameters are accepted as for argon2(1). For example, to specify argon2i with non-default parameters for user 'testuser', you can use the following in your passwd.conf
<div style="background-color:lightgrey;">
<pre>m1# grep -A1 testuser /etc/passwd.conf
testuser:
localcipher = argon2i,t=6,m=4096,p=1
</pre>
</div>
With the above configuration in place, we are able to support standard password management. For example
<div style="background-color:lightgrey;">
<pre>m1# passwd testuser
Changing password for testuser.
New Password:
Retype New Password:
m1# grep testuser /etc/master.passwd
testuser:$argon2i$v=19$m=4096,t=6,p=1$PDd65qr6JU0Pfnpr$8YOMYcwINuKHoxIV8Q0FJHG+
RP82xtmAuGep26brilU:1001:100::0:0::/home/testuser:/sbin/nologin
</pre>
</div>
<h3>Testing</h3>
The argon2(1) binary allows us to easily validate parameters and encoding. This is most useful during performance testing; see <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_update_incorporating">here</a>. With argon2(1), we can specify our parameterized values and evaluate both the resulting encoding and timing.
<div style="background-color:lightgrey;">
<pre>m2# echo -n password|argon2 somesalt -id -p 3 -m 8
Type: Argon2id
Iterations: 3
Memory: 256 KiB
Parallelism: 3
Hash: 97f773f68715d27272490d3d2e74a2a9b06a5bca759b71eab7c02be8a453bfb9
Encoded: $argon2id$v=19$m=256,t=3,p=3$c29tZXNhbHQ$l/dz9ocV0nJySQ09LnSiqb
BqW8p1m3Hqt8Ar6KRTv7k
0.000 seconds
Verification ok
</pre>
</div>
We provide one approach to evaluating Argon2 parameter tuning in our second <a href="https://userweb.cs.txstate.edu/~jh38107/DRAFT_gsoc_argon2_phase2.html">post</a>. In addition to manual testing, we also provide some ATF tests for pwhash, for both hashing and verification. These tests focus on encoding correctness, matching known encodings against test results during execution.
<div style="background-color:lightgrey;">
<pre>/usr/src/tests/usr.bin/argon2
tp: t_argon2_v10_hash
tp: t_argon2_v10_verify
tp: t_argon2_v13_hash
tp: t_argon2_v13_verify
cd /usr/src/tests/usr.bin/argon2
atf-run
info: atf.version, Automated Testing Framework 0.20 (atf-0.20)
info: tests.root, /usr/src/tests/usr.bin/argon2
..
tc-so:Executing command [ /bin/sh -c echo -n password | \
argon2 somesalt -v 13 -t 2 -m 8 -p 1 -r ]
tc-end: 1567497383.571791, argon2_v13_t2_m8_p1, passed
...
</pre>
</div>
<h3>Conclusion</h3>
We have successfully integrated Argon2 into NetBSD using the native build framework. We have extended existing functionality to support local password management using Argon2 encoding. We are able to tune Argon2 so that we can achieve reasonable performance on NetBSD. In this final post, we summarize the work done to incorporate the reference implementation into NetBSD and how to use it. We hope you can use the work completed during this project. Thank you for the opportunity to participate in the Google Summer of Code 2019 and the NetBSD project!
<br>
<br>
</body></html>https://blog.netbsd.org/tnf/entry/clang_build_bot_now_usageClang build bot now uses two-stage builds, and other LLVM/LLDB newsMichał Górny2019-12-12T13:50:07+00:002019-12-12T13:50:07+00:00<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I have started working on LLDB, as contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types, fixing compat32 issues, and fixing watchpoint support.
In <a href="https://blog.netbsd.org/tnf/entry/lldb_threading_support_now_ready">October 2019</a>,
I've finished my work on threading support (pending pushes) and fought
issues related to upgrade to NetBSD 9.</p>
<p>November was focused on finally pushing the aforementioned patches
and major buildbot changes. Notably, I was working on extending
the test runs to compiler-rt which required revisiting past driver
issues, as well as resolving new ones. More details on this below.</p>
<h2>LLDB changes</h2>
<h3>Test updates, minor fixes</h3>
<p>The previous month has left us with a few regressions caused
by the kernel upgrade. I've done my best to figure out those I could
reasonably fast; for the remaining ones Kamil suggested that I mark
them XFAIL for now and revisit them later while addressing broken tests.
This is what I did.</p>
<p>While implementing additional tests in the threading
patches, I've discovered that the subset of LLDB tests dedicated
to testing lldb-server behavior was disabled on NetBSD. I've <a href="https://github.com/llvm/llvm-project/commit/e8924d6403eba06438f669e434eee11016f20a67">reenabled
lldb-server tests</a>
and marked failing tests appropriately.</p>
<p>After enabling and fixing those tests, I've implemented missing support
in the NetBSD plugin for <a href="https://github.com/llvm/llvm-project/commit/23a766dcad47993f632ab22ab3a8f3dc977bd838">getting thread name</a>.</p>
<p>I've also switched our process plugin to <a href="https://github.com/llvm/llvm-project/commit/77cc246412ca40082c0902f1300f53d29dd98c02">use the newer PT_STOP request</a>
over calling <code>kill()</code>. The main advantage of <code>PT_STOP</code> is that it
reliably notifies about SIGSTOP via <code>wait()</code> even if the process is
stopped already.</p>
<p>I've been able to <a href="https://github.com/llvm/llvm-project/commit/d82dd6ac9a8500448ebfdc0c77e362e496b80e2f">reenable EOF detection test</a>
that was previously disabled due to bugs in the old versions of NetBSD 8
kernel.</p>
<h3>Threading support pushed</h3>
<p>After satisfying the last upstream requests, I was able to merge
the three threading support patches:</p>
<ol>
<li>
<p><a href="https://github.com/llvm/llvm-project/commit/8d9400b65b972cb50fe2266360443192ea107ec9">basic threading support</a>,</p>
</li>
<li>
<p><a href="https://github.com/llvm/llvm-project/commit/d970d4d4aa7345ebf8b7169b09f2775a93f86c33">watchpoint support in threaded programs</a>,</p>
</li>
<li>
<p><a href="https://github.com/llvm/llvm-project/commit/7644d8ba4dc4e06b2db2bfdb7f4d76b9356cb288">concurrent watchpoint fixes</a>.</p>
</li>
</ol>
<p>This fixed 43 tests. It also triggered some flaky tests and a known
regression, and I'm planning to address them as part of the final bug
cracking.</p>
<h2>Build bot redesign</h2>
<h3>Recap of the problems</h3>
<p>The tests of clang runtime components (compiler-rt, openmp) are
performed using freshly built clang. This version of clang attempts
to build and link C++ programs with libc++. However, our clang driver
naturally requires system installation of libc++ — after all, we
normally don't want the driver to include temporary build paths for
regular executables! For this reason, building against fresh libc++
in build tree requires appropriate <code>-cxx-isystem</code>, <code>-L</code> and <code>-Wl,-rpath</code>
flags.</p>
<p>So far, we managed to resolve this by using existing mechanisms to add
additional flags to the test compiler calls. However, the existing
solutions do not seem to suffice for compiler-rt. While technically
I could work on adding more support code for that, I've decided it's
better to look for a more general and permanent solution.</p>
<h3>Two-stage builds</h3>
<p>As part of the solution, I've proposed to switch our build bot
to a two-stage build model. That is, firstly we're using the system GCC
version to build a minimal functioning clang. Then, we're using this
newly-built clang to build the whole LLVM suite, including another copy
of clang.</p>
<p>The main advantage of this model is that we're verifying whether clang
is capable of building a working copy of itself. Additionally, it
insulates us against problems with host GCC. For example, we've
experienced issues with GCC 8 and the default <code>-O3</code>. On the negative
side, it increases build time significantly, especially since the second
stage needs to be rebuilt from scratch every time.</p>
<p>A common practice in compiler world is to actually do three stages.
In this case, it would mean building minimal clang with host compiler,
then second stage with first stage clang, then third stage using second
stage's clang. This would have the additional benefit of verifying
that clang is capable of building a compiler that's fully capable of
building itself. However, this seems to have little actual gain for us
while it would increase the build time even more.</p>
<h3>Compiler wrappers</h3>
<p>Another interesting side effect of using the two-stage build model
is that it provides an opportunity to inject wrappers over the <code>clang</code>
and <code>clang++</code> built in the first stage. Those wrappers allow us to add
the necessary <code>-I</code>, <code>-L</code> and <code>-Wl,-rpath</code> arguments without having to patch
the driver for this special case.</p>
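<p>The wrapper idea itself is simple: intercept the compiler invocation,
splice the stage-1 flags in front of the caller's arguments, and delegate
to the real clang++. A toy sketch of the argument assembly (all paths and
flag values below are invented):</p>
<pre><code>#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Assemble the argument vector for the real compiler: the wrapper's
 * extra flags first, then everything the caller passed (argv[1..]).
 * Returns a NULL-terminated vector suitable for execv(). */
static char **build_argv(int argc, char **argv,
                         const char **extra, size_t nextra)
{
    char **out = calloc(nextra + (size_t)argc + 1, sizeof(char *));
    size_t n = 0;
    if (out == NULL)
        return NULL;
    for (size_t i = 0; i &lt; nextra; i++)
        out[n++] = (char *)extra[i];
    for (int i = 1; i &lt; argc; i++)
        out[n++] = argv[i];
    out[n] = NULL;
    return out;
}
</code></pre>
<p>A real wrapper would then call <code>execv(out[0], out)</code>, where
<code>out[0]</code> is the hypothetical stage-1 clang++ path placed first
in the extra-flag list.</p>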
<p>Furthermore, I've used this opportunity to add experimental LLD usage
to the first stage, and use it instead of GNU ld for the second stage.
The LLVM linker has a significantly smaller memory footprint
and therefore allows us to improve build efficiency. Sadly, proper
LLD support for NetBSD still depends on patches that are waiting for
upstream review.</p>
<h3>Compiler-rt status and tests</h3>
<p>The builds of compiler-rt have been reenabled for the build bot.
I am planning to start enabling individual test groups (e.g. builtins,
ASAN, MSAN, etc.) as I get them to work. However, there are still other
problems to be resolved before that happens.</p>
<p>Firstly, there are new test regressions. Some of them seem to be
specifically related to build layout changes, or to use of LLD
as linker. I am currently investigating them.</p>
<p>Secondly, compiler-rt tests aim to test all supported multilib targets
by default. We are currently preparing to enable compat32 in the kernel
on the host running the build bot, and therefore achieve proper multilib
support for running them.</p>
<p>Thirdly, ASAN, MSAN and TSAN are incompatible with ASLR (address space
layout randomization) that is enabled by default on NetBSD.
Furthermore, XRay is incompatible with W^X restriction.</p>
<h3>Making tests work with PaX features</h3>
<p>Previously, we've already addressed the ASLR incompatibility by adding
an explicit check for it and bailing out if it's enabled. However,
while this somehow resolves the problem for regular users, it means that
the relevant tests can't be run on hosts having ASLR enabled.</p>
<p>Kamil suggested that we should use <code>paxctl</code> to disable ASLR
per-executable here. This has the obvious advantage that it enables
the tests to work on all hosts. However, it required injecting
the <code>paxctl</code> invocation between the build and run step in relevant
tests.</p>
<p>The ‘obvious’ solution to this problem would be to add a kind of
<code>%paxctl_aslr</code> substitution that evaluates to a <code>paxctl</code> call on NetBSD,
and to <code>:</code> (no-op) on other systems. However, this would have required updating
all the relevant tests and making sure that the invocation keeps being
included in new tests.</p>
<p>Instead, I've noticed that the <code>%run</code> substitution is already using
various kinds of wrappers for other targets, e.g. to run tests
via an emulator. I went for a more agreeable solution of <a href="https://github.com/llvm/llvm-project/commit/6c2b2b9e20abb27ab5c1ae255c1862785b793c1f">substituting
<code>%run</code> in appropriate test suites</a>
with a tiny wrapper calling <code>paxctl</code> before executing the test.</p>
<h2>Clang/LLD dependent libraries feature</h2>
<h3>Introduction to the feature</h3>
<p>Enabling the two-stage builds also had another side effect. Since
the stage 2 build is done via clang+LLD, a newly added feature of dependent
libraries got enabled and broke our build.</p>
<p><a href="http://lists.llvm.org/pipermail/llvm-dev/2019-March/131004.html">Dependent libraries</a>
are a feature permitting source files to specify additional libraries
that are afterwards injected into linker's invocation. This is done
via a <code>#pragma</code> originally used by MSVC. Consider the following
example:</p>
<pre><code>#include <stdio.h>
#include <math.h>
#pragma comment(lib, "m")
int main() {
printf("%f\n", pow(2, 4.3));
return 0;
}
</code></pre>
<p>When the source file is compiled using Clang on an ELF target, the lib
comments are converted into <code>.deplibs</code> object section:</p>
<pre><code>$ llvm-readobj -a --section-data test.o
[...]
Section {
Index: 6
Name: .deplibs (25)
Type: SHT_LLVM_DEPENDENT_LIBRARIES (0x6FFF4C04)
Flags [ (0x30)
SHF_MERGE (0x10)
SHF_STRINGS (0x20)
]
Address: 0x0
Offset: 0x94
Size: 2
Link: 0
Info: 0
AddressAlignment: 1
EntrySize: 1
SectionData (
0000: 6D00 |m.|
)
}
[...]
</code></pre>
<p>When the objects are linked into a final executable using LLD, it
collects all libraries from <code>.deplibs</code> sections and links
to the specified libraries.</p>
<p>The example program pasted above would have to be built on systems
requiring explicit <code>-lm</code> (e.g. Linux) via:</p>
<pre><code>$(CC) ... test.c -lm
</code></pre>
<p>However, when using Clang+LLD, it is sufficient to call:</p>
<pre><code>clang -fuse-ld=lld ... test.c
</code></pre>
<p>and the library is included automatically. Of course, this normally
makes little sense because you have to maintain compatibility with other
compilers and linkers, as well as old versions of Clang and LLD.</p>
<h3>Use of LLVM to approach static library dependency problem</h3>
<p>LLVM started using the deplibs feature internally in <a href="https://reviews.llvm.org/D62090">D62090</a> in order to specify linkage between
runtimes and their dependent libraries. Apparently, the goal was to
provide an in-house solution to the static library dependency problem.</p>
<p>The problem discussed is that static libraries on Unix-derived platforms
are primitive archives containing object files. Unlike shared
libraries, they do not contain lists of other libraries they depend on.
As a result, when linking against a static library, the user needs
to explicitly pass all the dependent libraries to the linker invocation.</p>
<p>Over years, a number of workarounds were proposed to relieve the user
(or build system) from having to know the exact dependencies of
the static libraries used. A few worth noting include:</p>
<ul>
<li>
<p>libtool archives (<code>.la</code>) used by libtool as generic wrappers over
shared and static libraries,</p>
</li>
<li>
<p>library-specific <code>*-config</code> programs and pkg-config files, providing
options for build systems to utilize,</p>
</li>
<li>
<p>GNU ld scripts that can be used in place of libraries to alter
linker's behavior.</p>
</li>
</ul>
<p>The first two solutions work at build system level, and therefore
are portable to different compilers and linkers. The third one requires
linker support but has been used successfully to some degree due to the
wide deployment of GNU binutils, as well as support in other linkers
(e.g. LLD).</p>
<p>Dependent libraries provide yet another attempt to solve the same
problem. Unlike the listed approaches, it is practically transparent
to the static library format — at the cost of requiring both compiler and linker
support. However, since the runtimes are normally supposed to be used
by Clang itself, at least the first of the points can be normally
assumed to be satisfied.</p>
<h3>Why did it break NetBSD?</h3>
<p>After all the lengthy introduction, let's get to the point. As a result
of my changes, the second stage is now built using Clang/LLD. However,
it seems that the original change making use of deplibs in runtimes
was tested only on Linux — and it caused failures for us since it
implicitly appended libraries not present on NetBSD.</p>
<p>Over time, users of a few other systems have added various <code>#ifdef</code>s
in order to exclude Linux-specific libraries from their systems.
However, this solution is hardly optimal. It requires us to maintain
two disjoint sets of rules for adding each library — one in CMake
for linking of shared libraries, and another one in the source files
for emitting dependent libraries.</p>
<p>Since dependent libraries pragmas are present only in source files
and not headers, I went for a different approach. Instead of using
a second set of rules to decide which libraries to link, I've exported
the results of CMake checks into <code>-D</code> flags, and made dependent
libraries conditional on CMake check results.</p>
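<p>The resulting pattern in a runtime source file looks roughly like
this. The macro name is made up for illustration; the actual
definitions live in the libunwind and libc++ patches linked below.
CMake probes for the library and exports the result as a <code>-D</code>
compile definition, and the pragma is only emitted when the probe
succeeded:</p>
<pre><code>#include &lt;assert.h&gt;

/* The build system would pass e.g. -DHAVE_LIBPTHREAD=1 when its
 * check for the library succeeded; otherwise no .deplibs entry is
 * emitted, and platforms lacking the library link cleanly. */
#if defined(__ELF__) &amp;&amp; defined(HAVE_LIBPTHREAD)
#pragma comment(lib, "pthread")   /* recorded into .deplibs by clang */
#endif

/* Ordinary code continues as usual; the pragma only affects the
 * object file's .deplibs section, not runtime behavior. */
int runtime_function(void)
{
    return 42;
}
</code></pre>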
<p>Firstly, I've <a href="https://github.com/llvm/llvm-project/commit/35bc5276ca31e3f0e8e87322153f410fa6224e59">fixed deplibs in libunwind</a>
in order to fix builds on NetBSD. Afterwards, per upstream's request
I've extended the <a href="https://github.com/llvm/llvm-project/commit/a9b5fff591d462f1f22e44ab1a269b82b8f2a664">deplibs fix to libc++ and libc++abi</a>.</p>
<h2>Future plans</h2>
<p>I am currently still working on fixing regressions after the switch
to two-stage build. As things develop, I am also planning to enable
further test suites there.</p>
<p>Furthermore, I am planning to continue with the items from the original
LLDB plan. Those are:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for i386 and aarch64 targets.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
<li>
<p>Merge LLDB with the base system (under LLVM-style distribution).</p>
</li>
</ol>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/debugging_ffs_mount_failuresDebugging FFS Mount FailuresKamil Rytarowski2019-11-27T18:17:02+00:002019-11-27T18:18:19+00:00This report was written by Maciej Grochowski as a part of developing the AFL+KCOV project.
<p>This report is a continuation of my previous work on Fuzzing Filesystems via AFL.
You can find previous posts where I described the fuzzing (<a href="https://blog.netbsd.org/tnf/entry/write_your_own_fuzzer_for" rel="nofollow">part1</a>, <a href="https://blog.netbsd.org/tnf/entry/fuzzing_netbsd_filesystems_via_afl" rel="nofollow">part2</a>) or my EuroBSDcon <a href="https://2019.eurobsdcon.org/slides/Fuzzing%20Filesystems%20on%20NetBSD%20via%20AFL+KCOV%20-%20Maciej%20Grochowski.pdf" rel="nofollow">presentation</a>.<br>
In this part, we won't talk too much about fuzzing itself but I want to describe the process of finding root causes of File system issues and my recent work trying to improve this process.<br>
This story begins with a mount issue that I found during my very first run of the AFL, and I presented it during my talk on EuroBSDcon in Lillehammer.</p>
<h2>Invisible Mount point</h2>
<p><code>afl-fuzz: /dev/vnd0: opendisk: Device busy</code> That was the first error that I saw on my setup after a couple of seconds of running AFL.<br>
I was not sure what exactly the problem was, and thought that the mount wrapper might be the cause.<br>
However, after a long troubleshooting session I realized that this might be my first discovered issue.<br>
To give the reader a better understanding of the problem without digging too deeply into fuzzer setup or mount process.<br>
Let's assume that we have some broken file system image exposed as a block device visible as a <code>/dev/wd1a</code>.</p>
<p>The device can be easily mounted on mount point <code>mnt1</code>, however when we try to unmount it we get an error: <code>error: ls: /mnt1: No such file or directory,</code> and if we try to use raw system call <code>unmount(2)</code> it also end up with the similar error.</p>
<p>However, we can see clearly that the mount point exists with the mount command:</p>
<pre><code># mount
/dev/wd0a on / type ffs (local)
...
tmpfs on /var/shm type tmpfs (local)
/dev/vnd0 on /mnt1 type ffs (local)
</code></pre>
<p>Thus, any <code>lstat(2)</code>-based command tries to convince us that no such directory exists.</p>
<pre><code># ls / | grep mnt
mnt
mnt1
# ls -alh /mnt1
ls: /mnt1: No such file or directory
# stat /mnt1
stat: /mnt1: lstat: No such file or directory
</code></pre>
<p>To understand what is happening, we need to dig a little deeper than standard shell tools allow.<br>
First of all, <code>mnt1</code> is a directory created on the root partition of a local filesystem, so getdents(2) or dirent(3) should show it as an entry inside the directory structure on the disk.<br>
The raw getdents syscall is a great tool for checking directory contents because it reads the data from the directory structure on disk.</p>
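<p>As an aside, a minimal portable approximation of such a directory lister can be written with <code>readdir(3)</code>, which is a libc wrapper around the raw <code>getdents(2)</code> syscall; the helper name and output format below are illustrative sketches, not the actual <code>./getdents</code> tool used in this post:</p>

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* List directory entries in the spirit of the ./getdents helper, and
 * report whether `name` appears among them.  readdir(3) is a libc
 * wrapper over the raw getdents(2) syscall used in the post. */
static int dir_contains(const char *dirpath, const char *name)
{
    DIR *d = opendir(dirpath);
    struct dirent *de;
    int found = 0;

    if (d == NULL)
        return -1;                      /* could not open directory */
    while ((de = readdir(d)) != NULL) {
        printf("#: %llu, %u (%s)\n",
            (unsigned long long)de->d_ino, (unsigned)de->d_type,
            de->d_name);
        if (strcmp(de->d_name, name) == 0)
            found = 1;
    }
    closedir(d);
    return found;
}
```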
<pre><code># ./getdents /
|inode_nr|rec_len|file_type|name_len(name)|
#: 2, 16, IFDIR, 1 (.)
#: 2, 16, IFDIR, 2 (..)
#: 5, 24, IFREG, 6 (.cshrc)
#: 6, 24, IFREG, 8 (.profile)
#: 7, 24, IFREG, 8 (boot.cfg)
#: 3574272, 24, IFDIR, 3 (etc)
...
#: 3872128, 24, IFDIR, 3 (mnt)
#: 5315584, 24, IFDIR, 4 (mnt1)
</code></pre>
<p>getdents(2) confirms that we have mnt1 as a directory inside the root of our system's filesystem.<br>
But we cannot execute lstat, unmount, or any other system call that requires a path to this file.<br>
A quick look at the definitions of these system calls shows their signatures:</p>
<div class="highlight highlight-source-c"><pre><span class="pl-en">unmount</span>(<span class="pl-k">const</span> <span class="pl-k">char</span> *dir, <span class="pl-k">int</span> flags);
<span class="pl-en">stat</span>(<span class="pl-k">const</span> <span class="pl-k">char</span> *path, <span class="pl-k">struct</span> stat *sb);
<span class="pl-en">lstat</span>(<span class="pl-k">const</span> <span class="pl-k">char</span> *path, <span class="pl-k">struct</span> stat *sb);
<span class="pl-en">open</span>(<span class="pl-k">const</span> <span class="pl-k">char</span> *path, <span class="pl-k">int</span> flags, ...);</pre></div>
<p>All of these functions take a path to the file as an argument, which, as we know, will end up in a VFS lookup.<br>
How about something that uses a file descriptor? Can we even obtain one?<br>
As we saw earlier, running <code>open(2)</code> on the path also returns <code>EACCES</code>.<br>
It looks like we will not be able to understand the issue without digging into the VFS lookup.</p>
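<p>The symptom is easy to probe from C; the helper below is a small sketch (not from the original post) that returns the errno produced by <code>lstat(2)</code> for a given path:</p>

```c
#include <errno.h>
#include <sys/stat.h>

/* Return 0 if lstat(2) succeeds on `path`, or the errno it sets.
 * On the broken mount point described above, this returns ENOENT
 * even though getdents(2) still shows the entry on disk. */
static int lstat_errno(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == 0)
        return 0;
    return errno;
}
```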
<h2><a id="user-content-get-filesystem-root" class="anchor" aria-hidden="true" href="#get-filesystem-root"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Get Filesystem Root</h2>
<p>After some debugging and a code walk, I found the place that caused the error.<br>
During name resolution, the VFS needs to check for and switch filesystems in the case of nested mount points.<br>
After the new filesystem is found, <code>VFS_ROOT</code> is issued on that particular mount point.<br>
In the case of FFS, <code>VFS_ROOT</code> translates to <code>ufs_root</code>, which calls the vnode cache with a fixed value equal to the inode number of the root inode, which is 2 for UFS.</p>
<div class="highlight highlight-source-c"><pre>#<span class="pl-k">define</span> <span class="pl-en">UFS_ROOTINO</span> ((<span class="pl-c1">ino_t</span>)<span class="pl-c1">2</span>) </pre></div>
<p>Below is a listing with the code of <code>ufs_root</code> from <code>ufs/ufs/ufs_vfsops.c</code>.</p>
<div class="highlight highlight-source-c"><pre><span class="pl-k">int</span>
<span class="pl-en">ufs_root</span>(<span class="pl-k">struct</span> mount *mp, <span class="pl-k">struct</span> vnode **vpp)
{
...
<span class="pl-k">if</span> ((error = <span class="pl-c1">VFS_VGET</span>(mp, (<span class="pl-c1">ino_t</span>)UFS_ROOTINO, &nvp)) != <span class="pl-c1">0</span>)
<span class="pl-k">return</span> (error);</pre></div>
<p>Using the debugger, I was able to confirm that the entry with number 2, after hashing, does not exist in the vcache.<br>
As a next step, I wanted to check the root inode on the given filesystem image.<br>
Filesystem debuggers are good tools for such checks. NetBSD comes with FSDB, which is a general-purpose filesystem debugger.<br>
Nonetheless, by default FSDB links against fsck_ffs, which ties it to FFS.</p>
<h2><a id="user-content-filesystem-debugger-for-the-help" class="anchor" aria-hidden="true" href="#filesystem-debugger-for-the-help"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Filesystem Debugger for the help!</h2>
<p>A filesystem debugger is a tool designed to browse on-disk structures and the values of particular entries.
It helps in understanding filesystem issues by showing the exact values that the system reads from the disk.
Unfortunately, the current fsdb_ffs is a bit limited in the amount of information that it exposes.<br>
Below is example output from trying to browse the damaged root inode on the corrupted FS.</p>
<pre><code># fsdb -dnF -f ./filesystem.out
** ./filesystem.out (NO WRITE)
superblock mismatches
...
BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
clean = 0
isappleufs = 0, dirblksiz = 512
Editing file system `./filesystem.out'
Last Mounted on /mnt
current inode 2: unallocated inode
fsdb (inum: 2)> print
command `print
'
current inode 2: unallocated inode
</code></pre>
<p><a target="_blank" rel="noopener noreferrer" href="https://camo.githubusercontent.com/be6fc9b47a20ba23a558c4a07fb85d6f52a8755d/68747470733a2f2f7265732e636c6f7564696e6172792e636f6d2f676f746f63636f2f696d6167652f75706c6f61642f635f7363616c652c775f3630302f76313537343831323731302f7566732d726f6f74696e672d30315f66776668756b2e706e67"><img src="https://camo.githubusercontent.com/be6fc9b47a20ba23a558c4a07fb85d6f52a8755d/68747470733a2f2f7265732e636c6f7564696e6172792e636f6d2f676f746f63636f2f696d6167652f75706c6f61642f635f7363616c652c775f3630302f76313537343831323731302f7566732d726f6f74696e672d30315f66776668756b2e706e67" data-canonical-src="https://res.cloudinary.com/gotocco/image/upload/c_scale,w_600/v1574812710/ufs-rooting-01_fwfhuk.png" style="max-width:100%;"></a></p>
<h2><a id="user-content-fsdb-plugin-print-formatted" class="anchor" aria-hidden="true" href="#fsdb-plugin-print-formatted"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>FSDB Plugin: Print Formatted</h2>
<p>Fortunately, <code>fsdb_ffs</code> exposes all the necessary interfaces to allow accessing this data with little effort.<br>
I implemented a simple plugin that allows browsing all the values inside inodes, the superblock, and the cylinder groups on FFS.
There are still a couple of todos to be finished, but the current version already allows us to review inodes.</p>
<pre><code>fsdb (inum: 2)> pf inode number=2 format=ufs1
command `pf inode number=2 format=ufs1
'
Disk format ufs1 inode: 2 block: 512
----------------------------
di_mode: 0x0 di_nlink: 0x0
di_size: 0x0 di_atime: 0x0
di_atimensec: 0x0 di_mtime: 0x0
di_mtimensec: 0x0 di_ctime: 0x0
di_ctimensec: 0x0 di_flags: 0x0
di_blocks: 0x0 di_gen: 0x6c3122e2
di_uid: 0x0 di_gid: 0x0
di_modrev: 0x0
--- inode.di_oldids ---
</code></pre>
<p>We can see that most of the root inode fields in the filesystem image were wiped out.<br>
For comparison, if we take a look at the root inode of a freshly created FS, we see the proper structure.<br>
Based on that, we can quickly see that the fields <code>di_mode</code>, <code>di_nlink</code>, <code>di_size</code>, and <code>di_blocks</code> differ and could be the root cause.</p>
<pre><code>Disk format ufs1 inode: 2 block: 512
----------------------------
di_mode: 0x41ed di_nlink: 0x2
di_size: 0x200 di_atime: 0x0
di_atimensec: 0x0 di_mtime: 0x0
di_mtimensec: 0x0 di_ctime: 0x0
di_ctimensec: 0x0 di_flags: 0x0
di_blocks: 0x1 di_gen: 0x68881d2c
di_uid: 0x0 di_gid: 0x0
di_modrev: 0x0
--- inode.di_oldids ---
</code></pre>
<h2><a id="user-content-from-fsdb-and-incore-to-source-code" class="anchor" aria-hidden="true" href="#from-fsdb-and-incore-to-source-code"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>From FSDB and incore to source code</h2>
<p>First, let's summarize what we already know:</p>
<ol>
<li>unmount fails with a namei operation failure due to the corrupted FS</li>
<li>The filesystem has a corrupted root inode</li>
<li>The corrupted root inode has the fields di_mode, di_nlink, di_size, and di_blocks set to zero</li>
</ol>
<p>Now we can find the place where inodes are loaded from the disk; for FFS this function is <code>ffs_init_vnode(ump, vp, ino)</code>.<br>
This function is called while loading a vnode in the VFS layer, inside <code>ffs_loadvnode</code>.<br>
A quick walkthrough of <code>ffs_loadvnode</code> exposes the usage of the <code>i_mode</code> field:</p>
<pre><code> error = ffs_init_vnode(ump, vp, ino);
if (error)
return error;
ip = VTOI(vp);
if (ip->i_mode == 0) {
ffs_deinit_vnode(ump, vp);
return ENOENT;
}
</code></pre>
<p>This seems to be the source of our problem. Whenever we load an inode from disk to obtain the vnode, we validate that <code>i_mode</code> is non-zero.<br>
In our case, the root inode is wiped out, with the result that the vnode is dropped and an error returned.<br>
So we simply cannot load any inode with <code>i_mode</code> set to zero, and inode number 2, the root, is no different here.
Because of that, the <code>VFS_LOADVNODE</code> operation always fails, so the lookup fails too, and name resolution returns an <code>ENOENT</code> error.
To fix this issue, we need root inode validation at mount time. I created such a validation and tested it against the corrupted filesystem image.<br>
The mount returned an error, which confirmed that such validation would help.</p>
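<p>To illustrate the idea, here is a sketch of what such a mount-time root inode sanity check could look like. This is not the actual patch sent upstream; the structure is reduced to the fields discussed above, and the thresholds are illustrative:</p>

```c
#include <stdint.h>

#define IFMT  0170000   /* file type mask in di_mode */
#define IFDIR 0040000   /* directory */

/* Reduced model of the on-disk UFS1 inode fields relevant here. */
struct ufs1_dinode_min {
    uint16_t di_mode;
    int16_t  di_nlink;
    uint64_t di_size;
    uint32_t di_blocks;
};

/* Hypothetical mount-time check: reject the filesystem early if its
 * root inode was wiped, instead of failing later with ENOENT on
 * every lookup.  Returns 1 if the root inode looks valid. */
static int root_inode_valid(const struct ufs1_dinode_min *dp)
{
    if (dp->di_mode == 0)                /* unallocated: our bug */
        return 0;
    if ((dp->di_mode & IFMT) != IFDIR)   /* root must be a directory */
        return 0;
    if (dp->di_nlink < 2)                /* at least "." and ".." */
        return 0;
    return 1;
}
```

Running the check against the two inodes printed above, the healthy one (di_mode 0x41ed, i.e. a directory with mode 0755 and di_nlink 2) passes, while the wiped one is rejected.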
<h2><a id="user-content-conclusions" class="anchor" aria-hidden="true" href="#conclusions"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Conclusions</h2>
<p>This post is a continuation of the project "Fuzzing Filesystems with kcov and AFL".<br>
I presented how fuzzed bugs, which do not always show up as system panics, can be analyzed, and what
tools a programmer can use.<br>
The investigation above described the very first bug that I found by fuzzing <code>mount(2)</code> with <code>AFL+kcov</code>.<br>
During that root cause analysis, I realized the need for better tools for debugging filesystem-related issues.<br>
For that reason, I added a small piece of functionality, <code>pf (print-formatted)</code>, to <code>fsdb(8)</code>, to allow walking through the on-disk structures.
The described bug was reported, with a proposed fix based on validation of the root inode, on the tech-kern mailing list.</p>
<h2><a id="user-content-future-work" class="anchor" aria-hidden="true" href="#future-work"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Future work</h2>
<ol>
<li>Tools: I am still making progress with fuzzing the mount process; however, I do not only focus on finding bugs but also on tools that can be used for debugging and for regression testing.
I am planning to add better support for browsing the blocks of an inode to <code>fsdb-pf</code>, as well as write functionality that would make further testing and potential recovery easier.</li>
<li>Fuzzing: In the next post, I will show a remote setup of AFL with an example of its usage.</li>
<li>I got a suggestion to take a look at the FreeBSD UFS security checks on <code>mount(2)</code> done by McKusick. I think it is worth seeing what else is validated there and what we can port to NetBSD's FFS.</li>
</ol>
<h2><a href="https://blog.netbsd.org/tnf/entry/lldb_threading_support_now_ready">LLDB Threading support now ready for mainline</a></h2>
<p>Posted by Michał Górny on 2019-11-09.</p>
<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I started working on LLDB under contract with the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types, fixing compat32 issues, and fixing watchpoint support. Then
I started working on improving thread support, which is taking longer
than expected. You can read more about that in my <a href="https://blog.netbsd.org/tnf/entry/threading_support_in_lldb_continued">September
2019</a>
report.</p>
<p>So far the number of issues uncovered while enabling proper threading
support has stopped me from merging the work-in-progress patches.
However, I've finally reached the point where I believe that the current
work can be merged and the remaining problems can be resolved
afterwards. More on that and other LLVM-related events happening
during the last month in this report.</p>
<h2>LLVM news and buildbot status update</h2>
<h3>LLVM switched to git</h3>
<p>Probably the most important event to note is that the <a href="http://lists.llvm.org/pipermail/llvm-dev/2019-October/136107.html">LLVM project has
switched from Subversion to git, and moved their repositories to
GitHub</a>.
While the original plan provided for maintaining the old repositories
as read-only mirrors, as of today this still hasn't been implemented.
For this reason, we were forced to quickly switch buildbot to the <a href="https://github.com/llvm/llvm-project/">git
monorepo</a>.</p>
<p>The buildbot is operational now, and seems to be handling git correctly.
However, it is connected to the staging server for the time being. Its
URL changed to <a href="http://lab.llvm.org:8014/builders/netbsd-amd64">http://lab.llvm.org:8014/builders/netbsd-amd64</a> (i.e. the port from 8011
to 8014).</p>
<h3>Monthly regression report</h3>
<p>Now for the usual list of 'what they broke this time'.</p>
<p>LLDB has been given a new API for handling files, in particular for
passing them to Python scripts. The change of API has caused some
'bad file descriptor' errors, e.g.:</p>
<pre><code>ERROR: test_SBDebugger (TestDefaultConstructorForAPIObjects.APIDefaultConstructorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/data/motus/netbsd8/netbsd8/llvm/tools/lldb/packages/Python/lldbsuite/test/decorators.py", line 343, in wrapper
return func(self, *args, **kwargs)
File "/data/motus/netbsd8/netbsd8/llvm/tools/lldb/packages/Python/lldbsuite/test/python_api/default-constructor/TestDefaultConstructorForAPIObjects.py", line 133, in test_SBDebugger
sb_debugger.fuzz_obj(obj)
File "/data/motus/netbsd8/netbsd8/llvm/tools/lldb/packages/Python/lldbsuite/test/python_api/default-constructor/sb_debugger.py", line 13, in fuzz_obj
obj.SetInputFileHandle(None, True)
File "/data/motus/netbsd8/netbsd8/build/lib/python2.7/site-packages/lldb/__init__.py", line 3890, in SetInputFileHandle
self.SetInputFile(SBFile.Create(file, borrow=True))
File "/data/motus/netbsd8/netbsd8/build/lib/python2.7/site-packages/lldb/__init__.py", line 5418, in Create
return cls.MakeBorrowed(file)
File "/data/motus/netbsd8/netbsd8/build/lib/python2.7/site-packages/lldb/__init__.py", line 5379, in MakeBorrowed
return _lldb.SBFile_MakeBorrowed(BORROWED)
IOError: [Errno 9] Bad file descriptor
Config=x86_64-/data/motus/netbsd8/netbsd8/build/bin/clang-10
----------------------------------------------------------------------
</code></pre>
<p>I've been able to determine that the error was produced by <code>flush()</code> method call
invoked on a file descriptor referring to stdin. Appropriately, I've <a href="https://github.com/llvm/llvm-project/commit/267cc3292ec4f6a7ea062b3551d20ea4692b6b78">fixed
the type conversion method not to flush read-only fds</a>.</p>
<p>Afterwards, Lawrence D'Anna was able to find and fix <a href="https://github.com/llvm/llvm-project/commit/6a93a12a8dd98291225a282b5b8f3c97e68ebe49">another fflush() issue</a>.</p>
<p>A newly added test revealed that <code>platform process list -v</code> command
on NetBSD missed listing the process name. I've fixed it to <a href="https://github.com/llvm/llvm-project/commit/a6712889f5f1702dfa535718abe400d1a83174c5">provide
Arg0 in process info</a>.</p>
<p>Another new test failed due to our target not implementing
<code>ShellExpandArguments()</code> API. Apparently the only target actually
implementing it is Darwin, so I've just <a href="https://github.com/llvm/llvm-project/commit/02f4cfecf6936a28cbba9314a9cbb3f510aa9710">marked TestCustomShell XFAIL
on all BSD
targets</a>.</p>
<p>LLDB upstream was forced to <a href="https://github.com/llvm/llvm-project/commit/9357b5d08497326a1895cab6c1d712bf12a34519">reintroduce readline module override</a>
that aims to prevent readline and libedit from being loaded into
a single program simultaneously. This module failed to build on NetBSD.
I've discovered that the original was meant to be built on Linux only,
and since the problem still doesn't affect other platforms, I've
<a href="https://github.com/llvm/llvm-project/commit/df3ae1eb296d5193232649b5f282dfc4f01ba61f">made it Linux-only again</a>.</p>
<p><a href="https://github.com/llvm/llvm-project/commit/6db7a5cd7c800a588e94ce5c1ef24ae4d60ecdd3">libunwind build has been changed to link using the C compiler rather
than
C++</a>.
This caused some libc++ failures on NetBSD. The author has reverted
the change for now, and is looking for a better way of resolving
the problem.</p>
<p>Finally, I have <a href="https://github.com/llvm/llvm-project/commit/6f8ee2c5755cb47190bc1aa8c5d8905317e8806f">disabled another OpenMP test that caused NetBSD to
hang</a>.
While ideally I'd like to have the underlying kernel problem fixed,
this is non-trivial and I prefer to focus on LLDB right now.</p>
<h3>New LLD work</h3>
<p>I've been asked to rebase my LLD patches for the new code. While doing
it, I've finally committed the <a href="https://github.com/llvm/llvm-project/commit/2a0fcae3d4d1fd85d6ae8378d7c6f12430c0087d">-z nognustack option</a>
patch from January.</p>
<p>In the meantime, Kamil's been working on finally resolving
the long-standing <a href="https://blog.netbsd.org/tnf/entry/the_first_report_on_lld#clang-lld-driver-design-issues">impasse on LLD design</a>.
He is working on <a href="https://reviews.llvm.org/D70048">a new NetBSD-specific frontend to LLD</a> that would satisfy our system-wide
linker requirements without modifying the standard driver used by other
platforms.</p>
<h3>Upgrade to NetBSD 9 beta</h3>
<p>Our recent work, especially the work on threading support has required
a number of fixes in the NetBSD kernel. Those fixes were backported
to NetBSD 9 branch but not to 8. The 8 kernel used by the buildbot
was therefore suboptimal for testing new features. Furthermore, with
the 9.0 release coming soon-ish, it became necessary to start actively
testing it for regressions.</p>
<p>The buildbot has been upgraded to NetBSD 9 beta on 2019-11-06.
Initially, the upgrade has caused LLDB to start crashing on startup.
I have not been able to pinpoint the exact issue yet. However, I've
established that it happens with <code>-O3</code> optimization level only,
and I've worked it around by switching the build to <code>-O2</code>. I am
planning to look into the problem more once the buildbot is restored
fully.</p>
<p>The upgrade to nb9 has caused 4 LLDB tests to start succeeding,
and 6 to start failing. Namely:</p>
<pre><code>********************
Unexpected Passing Tests (4):
lldb-api :: commands/watchpoints/watchpoint_commands/condition/TestWatchpointConditionCmd.py
lldb-api :: commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandPython.py
lldb-api :: lang/c/bitfields/TestBitfields.py
lldb-api :: commands/watchpoints/watchpoint_commands/command/TestWatchpointCommandLLDB.py
********************
Failing Tests (6):
lldb-shell :: Reproducer/Functionalities/TestExpressionEvaluation.test
lldb-api :: commands/expression/call-restarts/TestCallThatRestarts.py
lldb-api :: functionalities/signal/handle-segv/TestHandleSegv.py
lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/StandardStartupTest.TestStopReplyContainsThreadPcs
lldb-api :: functionalities/inferior-crashing/TestInferiorCrashingStep.py
lldb-api :: functionalities/signal/TestSendSignal.py
</code></pre>
<p>I am going to start investigating the new failures shortly.</p>
<h2>Further LLDB threading work</h2>
<h3>Fixes to register support</h3>
<p>Enabling thread support revealed a problem in register API introspection
specific to NetBSD. The API responsible for passing registers in groups
to Python was unable to name some of the groups on NetBSD, and the null
names have caused the <code>TestRegistersIterator</code> to fail. Threading
support made this specifically visible by replacing a regular test
failure with a Python code error.</p>
<p>In order to resolve the problem, I had to <a href="https://github.com/llvm/llvm-project/commit/6eca4f46912a8318d7a5888506c3f26c20bdc012">describe all supported
register sets in NetBSD register context</a>.
The code was roughly based on the Linux equivalent, modified to match
register sets used by our <code>ptrace()</code> API. Interestingly, I had to also
include MPX registers that are currently unimplemented, as otherwise
LLDB implicitly put them in an anonymous group.</p>
<p>While at it, I've also changed the register set numbering to match
the more common ordering, in order to avoid issues in the future.</p>
<h3>Finished basic thread support patch</h3>
<p>I've finally completed and submitted the <a href="https://reviews.llvm.org/D70022">patch for NetBSD thread
support</a>. Besides fixing a few
mistakes, I've implemented thread affinity support for all relevant
SIGTRAP events (breakpoints, traces, hardware watchpoints) and removed
incomplete hardware breakpoint stub that caused LLDB to crash.</p>
<p>In its current form, this patch combines three changes essential to
correct support of threaded programs:</p>
<ol>
<li>
<p>It enables reporting of new and exited threads, and maintains
debugged thread list based on that.</p>
</li>
<li>
<p>It modifies the signal (generic and <code>SIGTRAP</code>) handling functions
to read the thread identifier and associate the event with correct
thread(s). Previously, all events were assigned to all threads.</p>
</li>
<li>
<p>It updates the process resuming function to support controlling
the state (running, single-stepping, stopped) of individual threads,
and raising a signal either to the whole process or to a single
thread. Previously, the code used only the requested action for
the first thread and populated it to all threads in the process.</p>
</li>
</ol>
<h3>Proper watchpoint support in multi-threaded programs</h3>
<p>I've submitted a separate patch to <a href="https://reviews.llvm.org/D70023">copy watchpoints to newly-created
threads</a>. This is necessary due to
the design of Debug Register support in NetBSD. Quoting the <code>ptrace(2)</code>
manpage:</p>
<blockquote>
<ul>
<li>debug registers are only per-LWP, not per-process globally</li>
<li>debug registers must not be inherited after (v)forking a process</li>
<li>debug registers must not be inherited after forking a thread</li>
<li>a debugger is responsible to set global watchpoints/breakpoints with
the debug registers, to achieve this PTRACE_LWP_CREATE /
PTRACE_LWP_EXIT event monitoring function is designed to be used</li>
</ul>
</blockquote>
<p>LLDB supports per-process watchpoints only at the moment. To fit this
into NetBSD model, we need to monitor new threads and copy watchpoints
to them. Since LLDB does not keep explicit watchpoint information
at the moment (it relies on querying debug registers), the proposed
implementation verbosely copies dbregs from the currently selected
thread (all existing threads should have the same dbregs).</p>
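<p>This mechanism can be modeled in a few lines of C; the names and the reduced structure below are illustrative, not actual LLDB code:</p>

```c
#include <stdint.h>
#include <string.h>

/* Simplified model of per-LWP x86 debug registers. */
struct dbregs {
    uint64_t dr[4];   /* DR0..DR3: watchpoint addresses */
    uint64_t dr6;     /* status: which watchpoint triggered */
    uint64_t dr7;     /* control: enable bits and conditions */
};

/* On a PTRACE_LWP_CREATE event, copy the watchpoint setup from the
 * currently selected thread to the new one, since on NetBSD debug
 * registers are per-LWP and are not inherited.  Zeroing the status
 * register for the new thread is a choice of this sketch; the actual
 * patch verbosely copies dbregs from the selected thread. */
static void copy_watchpoints_to_new_lwp(const struct dbregs *src,
                                        struct dbregs *dst)
{
    memcpy(dst->dr, src->dr, sizeof(dst->dr));
    dst->dr7 = src->dr7;  /* same addresses and control bits */
    dst->dr6 = 0;         /* no pending trigger on a fresh thread */
}
```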
<h3>Fixed support for concurrent watchpoint triggers</h3>
<p>The final problem I've been investigating was a server crash with
the new code when multiple watchpoints were triggered concurrently.
My final patch aims to <a href="https://reviews.llvm.org/D70025">fix handling concurrent watchpoint events</a>.</p>
<p>When a watchpoint is triggered, the kernel delivers SIGTRAP with
<code>TRAP_DBREG</code> to the debugger. The debugger investigates DR6 register
of the specified thread in order to determine which watchpoint was
triggered, and reports it. When multiple watchpoints are triggered
simultaneously, the kernel reports that as series of successive
SIGTRAPs. Normally, that works just fine.</p>
<p>However, on x86 watchpoint triggers are reported before the instruction
is executed. For this reason, LLDB temporarily disables the breakpoint,
single-steps and reenables it. The problem with that is that
the GDB protocol doesn't control watchpoints per thread, so
the operation disables and reenables the watchpoint on all threads.
As a side effect of this, DR6 is cleared everywhere.</p>
<p>Now, if multiple watchpoints were triggered concurrently, DR6 is set
on all relevant threads. However, after handling SIGTRAP on the first
one, the disable/reenable (or more specifically, remove/readd) wipes DR6
on all threads. The handler for next SIGTRAP can't establish
the correct watchpoint number, and starts looking for breakpoints.
Since hardware breakpoints are not implemented, the relevant method
returns an error and lldb-server eventually exits.</p>
<p>There are two problems to be solved there. Firstly, lldb-server should
not exit in these circumstances. This is already solved in the first
patch as mentioned above. Secondly, we need to be able to handle
concurrent watchpoint hits independently of the clear/set packets. This
is solved by this patch.</p>
<p>There are multiple different approaches to this problem. I've chosen
to remodel clear/set watchpoint method in order to prevent it
from resetting DR6 if the same watchpoint is being restored,
as the alternatives (such as pre-storing DR6 on the first SIGTRAP) have
more corner conditions to be concerned about.</p>
<p>The current design of these two methods assumes that the 'clear' method
clears both the triggered state in DR6 and control bits in DR7, while
the 'set' method sets the address in DR0..3, and the control bits
in DR7.</p>
<p>The new design limits the 'clear' method to disabling the watchpoint by
clearing the enable bit in DR7. The remaining bits, as well as trigger
status and address are preserved. The 'set' method uses them to
determine whether a new watchpoint is being set, or the previous one
merely reenabled. In the latter case, it just updates DR7, while
preserving the previous trigger. In the former, it updates all
registers and clears the trigger from DR6.</p>
<p>This solution effectively prevents the disable/reenable logic of LLDB
from clearing concurrent watchpoint hits, and therefore makes it
possible for the SIGTRAP handler to report them correctly. If the user
manually replaces the watchpoint with another one, DR6 is cleared
and LLDB does not associate the concurrent trigger to the watchpoint
that no longer exists.</p>
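<p>The redesigned clear/set pair can be sketched as follows. The bit layout is simplified here (one enable bit per watchpoint) and the function names are illustrative, not the actual LLDB methods:</p>

```c
#include <stdint.h>

/* Simplified layout: DR7 bit (2*i) enables watchpoint i, and DR6
 * bit i records that watchpoint i triggered. */
struct dbregs { uint64_t dr[4], dr6, dr7; };

/* 'clear': only drop the enable bit in DR7; the address in DR0..3
 * and any trigger status in DR6 are preserved. */
static void wp_clear(struct dbregs *r, int i)
{
    r->dr7 &= ~(1ULL << (2 * i));
}

/* 'set': if the same address is merely being reenabled, just restore
 * DR7; only a genuinely new watchpoint rewrites the address and
 * clears the stale trigger bit in DR6. */
static void wp_set(struct dbregs *r, int i, uint64_t addr)
{
    if (r->dr[i] != addr) {
        r->dr[i] = addr;
        r->dr6 &= ~(1ULL << i);   /* old trigger no longer applies */
    }
    r->dr7 |= 1ULL << (2 * i);
}
```

With this model, the disable/reenable cycle performed while single-stepping over watchpoint 0 no longer wipes the concurrent trigger recorded for watchpoint 1.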
<h3>Thread status summary</h3>
<p>The current version of the patches fixes approximately 47 test failures,
and causes approximately 4 new test failures and 2 hanging tests.
There are around 7 new flaky tests, related to signals concurrent
with breakpoints or watchpoints.</p>
<h2>Future plans</h2>
<p>The first immediate goal is to investigate and resolve test suite
regressions related to NetBSD 9 upgrade. The second goal is to get
the threading patches merged, and simultaneously work on resolving
the remaining test failures and hangs.</p>
<p>When that's done, I'd like to finally move on with the remaining TODO items.
Those are:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for i386 and aarch64 targets.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
<li>
<p>Merge LLDB with the base system (under LLVM-style distribution).</p>
</li>
</ol>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/threading_support_in_lldb_continuedThreading support in LLDB continuedMichał Górny2019-10-05T16:30:45+00:002019-10-05T16:30:45+00:00<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I started working on LLDB, as contracted by The NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types, fixing compat32 issues, and fixing watchpoint support. Then,
I've started working on improving thread support. You can read more
about that in my <a href="http://blog.netbsd.org/tnf/entry/work_in_progress_threading_support">July 2019</a>
report.</p>
<p>I was on vacation in August, and in September I resumed the work
on LLDB. I started by fixing new regressions in the LLVM test suite, then
improved my previous patches and continued debugging test failures
and timeouts resulting from my patches.</p>
<h2>LLVM 8 and 9 in NetBSD</h2>
<h3>Updates to LLVM 8 src branch</h3>
<p>I have been asked to rebase my <a href="https://github.com/mgorny/netbsd-src/tree/llvm8">llvm8 branch</a>
of NetBSD src tree. I've done that, and updated it to LLVM 8.0.1 while
at it.</p>
<h3>LLVM 9 release</h3>
<p>The LLVM 9.0.0 final was tagged in September. I have been doing
the pre-release testing for it, and discovered that the following tests
were hanging:</p>
<pre><code>LLVM :: ExecutionEngine/MCJIT/eh-lg-pic.ll
LLVM :: ExecutionEngine/MCJIT/eh.ll
LLVM :: ExecutionEngine/MCJIT/multi-module-eh-a.ll
LLVM :: ExecutionEngine/OrcMCJIT/eh-lg-pic.ll
LLVM :: ExecutionEngine/OrcMCJIT/eh.ll
LLVM :: ExecutionEngine/OrcMCJIT/multi-module-eh-a.ll
</code></pre>
<p>I couldn't reproduce the problem with LLVM trunk, so I instead
focused on looking for a fix. I came to the conclusion that
the problem had been fixed by adding a missing linked library. I
requested a backport in <a href="https://bugs.llvm.org/show_bug.cgi?id=43196">bug 43196</a>
and it has been merged in <a href="https://github.com/llvm/llvm-project/commit/1b8425cf6f834c14645231d3c2f58fc441f2b524">r371042</a>.</p>
<p>I didn't put more effort into figuring out <em>why</em> the lack of this
linkage caused issues for us. However, as Lang Hames said on the bug,
‘adding the dependency was the right thing to do’.</p>
<h3>LLVM 9 for NetBSD src</h3>
<p>Afterwards, I have started working on updating my NetBSD src branch
for LLVM 9. However, in the middle of that I was informed that Joerg
has already finished doing that independently, so I've stopped.</p>
<p>Furthermore, I was informed that LLVM 9.0.0 will not make it to src,
since it still lacks some fixes (most notably, <a href="https://reviews.llvm.org/D65280">adding a pass to lower
is.constant and objectsize intrinsics</a>).
Joerg plans to import some revision of the trunk instead.</p>
<h2>Buildbot regressions</h2>
<h3>Initial regressions</h3>
<p>The first problem that needed solving was LLDB build failure caused by
<a href="https://reviews.llvm.org/D66566#1654948">replacing <code>std::once_flag</code> with <code>llvm::once_flag</code></a>.
I came to the conclusion that the build fails because the call site
in LLDB combined <code>std::call_once</code> with <code>llvm::once_flag</code>. The solution
was to <a href="https://github.com/llvm/llvm-project/commit/3276fffc170037dbb5cf9df41bfcdf62d4fc318b">replace the former with <code>llvm::call_once</code></a>.</p>
<p>After fixing the build failure, we had a bunch of test failures on buildbot
to address. Kamil helped me and tracked one of them down to <a href="https://reviews.llvm.org/D66361#1655903">a new test
for stack exhaustion handling</a>.
The test author decided that it ‘is only a best-effort mitigation for
the case where things have already gone wrong’, and <a href="https://github.com/llvm/llvm-project/commit/03c13e5718a1791c0cf8baf7cd993afb48314371">marked it unsupported
on NetBSD</a>.</p>
<p>On the plus side, two of the tests previously failing on NetBSD have been
fixed upstream. I've <a href="https://github.com/llvm/llvm-project/commit/3461e3ea8eb72c378ebe49488129f5adfb70d16f">un-XFAIL-ed them appropriately</a>.
Five new test failures in LLDB were related to those tests being
unconditionally skipped before — I've <a href="https://github.com/llvm/llvm-project/commit/37f91c32186a3d0d4e77f52d4969230b529ad155">marked them XFAIL</a>
pending further investigation in the future.</p>
<p>Another set of issues was caused by <a href="https://reviews.llvm.org/D62868">enabling -fvisibility=hidden for libc++</a>
which caused problems when building with GCC. After being pinged, the author
decided to <a href="https://github.com/llvm/llvm-project/commit/5afc5a6c1b98f0c93eb1f9902d96e13b58b54a0c">enable it only for builds done using clang</a>.</p>
<h3>New issues through September</h3>
<p>During September, two new issues arose. The first one was my fault,
so I'm going to cover it in the appropriate section below. The second one
was a new <code>thread_local</code> test failing. Since it was a newly added test
that failed on most of the supported platforms, I've just added NetBSD
to the list of failing platforms.</p>
<h3>Current buildbot status</h3>
<p>After fixing the immediate issues, the buildbot returned to its previous
status. The majority of tests pass, with one flaky test repeatedly
timing out. Normally, I would skip this specific test in order to
have the buildbot report only fresh failures. However, since it is
threading-related, I'm waiting to finish my threading update and will
reassess afterwards.</p>
<p>Furthermore, I have added <code>--shuffle</code> to lit arguments in order to randomize
the order in which the tests are run. According to upstream, this reduces
the chance of load-intensive tests being run simultaneously and therefore
causing timeouts.</p>
<p>The buildbot host seems to have started crashing recently. OpenMP tests were
causing similar issues in the past, and I'm currently trying to figure out
whether they are the culprit again.</p>
<h2><code>__has_feature(leak_sanitizer)</code></h2>
<p>Kamil asked me to implement a feature check for leak sanitizer being
used. The <a href="https://github.com/llvm/llvm-project/commit/96f35266a5d6f4497f87d22ff4a38320e8a8b4d9"><code>__has_feature(leak_sanitizer)</code> preprocessor macro</a>
is complementary to <code>__SANITIZE_LEAK__</code> used in NetBSD gcc and is used to avoid
reports when leaks are known but the cost of fixing them exceeds the gain.</p>
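<p>For illustration, a guard of this kind might look as follows. The two detection macros are the ones named above; the <code>lsan_active</code> helper is hypothetical, added only for demonstration.</p>

```c
/* Detect a leak-sanitized build under both the clang spelling
 * (__has_feature(leak_sanitizer)) and the NetBSD gcc spelling
 * (__SANITIZE_LEAK__), so known-but-unfixed leaks can be suppressed. */
#if defined(__has_feature)
# if __has_feature(leak_sanitizer)
#  define HAVE_LSAN 1
# endif
#endif
#if !defined(HAVE_LSAN) && defined(__SANITIZE_LEAK__)
# define HAVE_LSAN 1
#endif
#ifndef HAVE_LSAN
# define HAVE_LSAN 0
#endif

/* Hypothetical helper: code can branch on this to free otherwise
 * intentionally leaked allocations only when LSan would report them. */
int lsan_active(void) {
    return HAVE_LSAN;
}
```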
<h2>Progress in threading support</h2>
<h3>Fixing LLDB bugs</h3>
<p>In the course of previous work, I had a patch for threading support in LLDB
partially ready. However, the improvements have also resulted in some of
the tests starting to hang. The main focus of my recent work was investigating
those problems.</p>
<p>The first issue that I discovered was an inconsistency in expressing that no
signal was sent. In some places, LLDB used <code>LLDB_INVALID_SIGNAL</code> (-1) to express that,
in others it used <code>0</code>. So far this had gone unnoticed since the end result
in ptrace calls was the same. However, the reworked NetBSD threading support
used explicit <code>PT_SET_SIGINFO</code> which — combined with wrong signal parameter —
wiped previously queued signal.</p>
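<p>A minimal sketch of the invariant being enforced: both spellings of "no signal" must be treated the same before touching the pending-signal state. <code>LLDB_INVALID_SIGNAL</code> is the constant named above; the helper itself is hypothetical and not the actual LLDB code.</p>

```c
#include <signal.h>
#include <stdbool.h>

/* Both 0 and LLDB_INVALID_SIGNAL (-1) historically meant "resume without
 * a signal"; neither may be handed to PT_SET_SIGINFO, where a bogus
 * value would wipe a previously queued signal. */
#define LLDB_INVALID_SIGNAL (-1)

static bool should_set_siginfo(int signo) {
    return signo != 0 && signo != LLDB_INVALID_SIGNAL;
}
```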
<p>I've <a href="https://github.com/llvm/llvm-project/commit/e4d25e9e16278d7926ec43c6ea7571e869d0a619">fixed C packet handler</a>,
then <a href="https://github.com/llvm/llvm-project/commit/c36b0bf31067b258af59b9f865cef5a091bf906f">fixed c, vCont and s handlers</a>
to use <code>LLDB_INVALID_SIGNAL</code> correctly. However, I've only tested the fixes
with my updated thread support, causing a regression in the old code. Therefore,
I've also had to <a href="https://github.com/llvm/llvm-project/commit/a292a4943b675545b1a1009eb6611f18dc3d4e78">fix <code>LLDB_INVALID_SIGNAL</code> handling in NetBSD plugin</a> for the time being.</p>
<h3>Thread suspend/resume kernel problem</h3>
<p>Sadly, further investigation of hanging tests led me to the conclusion
that they are caused by kernel bugs. The first bug I've noticed is that
<code>PT_SUSPEND</code>/<code>PT_RESUME</code> do not cause the thread to be resumed correctly.
I have written the following reproducer for it:</p>
<pre><code>#include <assert.h>
#include <lwp.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
void* thread_func(void* foo) {
    int i;
    printf("in thread_func, lwp = %d\n", _lwp_self());
    for (i = 0; i < 100; ++i) {
        printf("t2 %d\n", i);
        sleep(2);
    }
    printf("out thread_func\n");
    return NULL;
}

int main() {
    int ret;
    int pid = fork();
    assert(pid != -1);

    if (pid == 0) {
        int i;
        pthread_t t2;
        ret = ptrace(PT_TRACE_ME, 0, NULL, 0);
        assert(ret != -1);
        printf("in main, lwp = %d\n", _lwp_self());
        ret = pthread_create(&t2, NULL, thread_func, NULL);
        assert(ret == 0);
        printf("thread started\n");
        for (i = 0; i < 100; ++i) {
            printf("t1 %d\n", i);
            sleep(2);
        }
        ret = pthread_join(t2, NULL);
        assert(ret == 0);
        printf("thread joined\n");
    }

    sleep(1);
    ret = kill(pid, SIGSTOP);
    assert(ret == 0);
    printf("stopped\n");
    pid_t waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    printf("wait: %d\n", ret);

    printf("t2 suspend\n");
    ret = ptrace(PT_SUSPEND, pid, NULL, 2);
    assert(ret == 0);
    ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
    assert(ret == 0);
    sleep(3);

    ret = kill(pid, SIGSTOP);
    assert(ret == 0);
    printf("stopped\n");
    waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    printf("wait: %d\n", ret);

    printf("t2 resume\n");
    ret = ptrace(PT_RESUME, pid, NULL, 2);
    assert(ret == 0);
    ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
    assert(ret == 0);
    sleep(5);

    ret = kill(pid, SIGTERM);
    assert(ret == 0);
    waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    printf("wait: %d\n", ret);
    return 0;
}
</code></pre>
<p>The program should run a two-threaded subprocess, with both threads
outputting successive numbers. The second thread should be suspended
shortly, then resumed. However, currently it does not resume.</p>
<p>I believe that this is caused by <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/sys_ptrace_common.c#L778"><code>ptrace_startstop()</code></a>
altering process flags without reimplementing the complete logic as used
by <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/kern_lwp.c#L392"><code>lwp_suspend()</code></a>
and <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/kern_lwp.c#L460"><code>lwp_continue()</code></a>.
I've been able to move forward by calling the two latter functions
from <code>ptrace_startstop()</code>. However, Kamil has indicated that he'd like
to make those routines use separate bits (to distinguish LWPs stopped
by process from LWPs stopped by debugger), so I haven't pushed my patch
forward.</p>
<h3>Multiple thread reporting kernel problem</h3>
<p>The second and more important problem is related to how new LWPs are
reported to the debugger. Or rather, that they are not reported
reliably. When many threads are started by the process in a short time
(e.g. in a loop), the debugger receives reports only for some of them.</p>
<p>This can be reproduced using the following program:</p>
<pre><code>#include <assert.h>
#include <lwp.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
void* thread_func(void* foo) {
    printf("in thread, lwp = %d\n", _lwp_self());
    sleep(10);
    return NULL;
}

int main() {
    int ret;
    int pid = fork();
    assert(pid != -1);

    if (pid == 0) {
        int i;
        pthread_t t[10];
        ret = ptrace(PT_TRACE_ME, 0, NULL, 0);
        assert(ret != -1);
        printf("in main, lwp = %d\n", _lwp_self());
        raise(SIGSTOP);
        printf("main resumed\n");
        for (i = 0; i < 10; i++) {
            ret = pthread_create(&t[i], NULL, thread_func, NULL);
            assert(ret == 0);
            printf("thread %d started\n", i);
        }
        for (i = 0; i < 10; i++) {
            ret = pthread_join(t[i], NULL);
            assert(ret == 0);
            printf("thread %d joined\n", i);
        }
        return 0;
    }

    pid_t waited = waitpid(pid, &ret, 0);
    assert(waited == pid);
    printf("wait: %d\n", ret);
    assert(WSTOPSIG(ret) == SIGSTOP);

    struct ptrace_event ev;
    ev.pe_set_event = PTRACE_LWP_CREATE | PTRACE_LWP_EXIT;
    ret = ptrace(PT_SET_EVENT_MASK, pid, &ev, sizeof(ev));
    assert(ret == 0);
    ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
    assert(ret == 0);

    while (1) {
        waited = waitpid(pid, &ret, 0);
        assert(waited == pid);
        printf("wait: %d\n", ret);
        if (WIFSTOPPED(ret)) {
            assert(WSTOPSIG(ret) == SIGTRAP);
            ptrace_siginfo_t info;
            ret = ptrace(PT_GET_SIGINFO, pid, &info, sizeof(info));
            assert(ret == 0);
            struct ptrace_state pst;
            ret = ptrace(PT_GET_PROCESS_STATE, pid, &pst, sizeof(pst));
            assert(ret == 0);
            printf("SIGTRAP: si_code = %d, ev = %d, lwp = %d\n",
                info.psi_siginfo.si_code, pst.pe_report_event, pst.pe_lwp);
            ret = ptrace(PT_CONTINUE, pid, (void*)1, 0);
            assert(ret == 0);
        } else
            break;
    }
    return 0;
}
</code></pre>
<p>The program starts 10 threads, and the debugger should report 10 SIGTRAP
events for LWPs being started (<code>ev = 8</code>) and the same number for LWPs
exiting (<code>ev = 16</code>). However, initially I was getting no more
than 4 SIGTRAPs, and the remaining 6 threads went unnoticed.</p>
<p>The issue is that <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/sys_lwp.c#L99"><code>do_lwp_create()</code></a>
does not raise SIGTRAP directly but defers that to <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/sys_lwp.c#L75"><code>mi_startlwp()</code></a>
that is called asynchronously as the LWP starts. This means that the former
function can return before SIGTRAP is emitted, and the program can start
another LWP. Since signals are not properly queued, multiple SIGTRAPs
can end up being issued simultaneously and lost.</p>
<p>Kamil has already worked on <a href="https://github.com/NetBSD/src/commit/254e517f2cf1fa3b90977f5d02d86a4589d3e6ce">making simultaneous signal delivery more reliable</a>. However, he reverted his commit as it caused regressions.
Nevertheless, applying it made it possible for the test program to get all
SIGTRAPs at least most of the time.</p>
<p>The ‘repeated’ SIGTRAPs did not include correct LWP information, though.
Kamil has recently fixed that by <a href="https://github.com/NetBSD/src/commit/b61dd2f758a98b11e97cd8287de9d800b04f2f5d">moving the relevant data from process information
to signal information struct</a>.
Combined with his earlier patch, this makes my test program pass most
of the time (sadly, there seem to be some more race conditions involved).</p>
<h3>Summary of threading work</h3>
<p>My current work-in-progress patch can be found on Differential as <a href="https://reviews.llvm.org/D64647">D64647</a>.
However, it is currently unsuitable for merging as some tests start failing
or hanging as a side effect of the changes. I'd like to try to get as many
of them fixed as possible before pushing the changes to trunk, in order
to avoid causing harm to the build bot.</p>
<p>The status with the current set of Kamil's work-in-progress patches applied
to the kernel includes approximately 4 failing tests and 10 hanging tests.</p>
<h2>Other LLVM news</h2>
<p>Manikishan Ghantasala has been working on <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_adding_netbsd1">NetBSD-specific clang-format
improvements</a>
in this year's Google Summer of Code. He is continuing to work on clang-format,
and has recently been given commit access to the LLVM project!</p>
<p>Besides NetBSD-specific work, I've been trying to improve a few other areas
of LLVM. I've been working on fixing regressions in stand-alone build support
and regressions in support for <code>BUILD_SHARED_LIBS=ON</code> builds. I have to admit
that while a year ago I was the only person fixing those issues, nowadays I see
more contributors submitting patches for breakages specific to those builds.</p>
<p>I have recently worked on fixing bad assumptions in LLDB's Python support.
However, it seems that <a href="https://reviews.llvm.org/p/hhb/">Haibo Huang</a> has taken
it over from me and is doing a great job.</p>
<p>My most recent endeavor was fixing <code>LLVM_DISTRIBUTION_COMPONENTS</code> support
in LLVM projects. This is going to make it possible to precisely fine-tune
which components are installed, both in combined tree and stand-alone builds.</p>
<h2>Future plans</h2>
<p>My first goal right now is to determine what is causing the test host to crash,
and restore buildbot stability. Afterwards, I'd like to continue investigating
threading problems and provide more reproducers for any kernel issues we may
be having. Once this is done, I'd like to finally push my LLDB patch.</p>
<p>Since threading is not the only goal left in the TODO, I may switch between
working on it and on the remaining TODO items. Those are:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for i386 and aarch64 targets.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
<li>
<p>Merge LLDB with the base system (under LLVM-style distribution).</p>
</li>
</ol>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/adapting_triforceafl_for_netbsd_part2Adapting TriforceAFL for NetBSD, Part 3Kamil Rytarowski2019-08-26T15:14:54+00:002019-08-26T15:14:54+00:00Prepared by Akul Pillai as part of GSoC 2019.
<p>This is the third report summarising the work done in the third coding period for the GSoC project of Adapting TriforceAFL for NetBSD kernel syscall fuzzing.<br /> Please also go through the <a href="//blog.netbsd.org/tnf/entry/adapting_triforceafl_for_netbsd_part">first</a> and <a href="//blog.netbsd.org/tnf/entry/adapting_triforceafl_for_netbsd_part1">second</a> report.</p>
<p>This post also outlines the work done throughout the duration of GSoC, describes the implications of the same and future improvements to come.</p>
<h2 id="current-state">Current State</h2>
<p>As of now <a href="https://github.com/nccgroup/TriforceAFL">TriforceAFL</a> has been made available in the form of a pkgsrc package (wip/triforceafl). This package allows you to essentially fuzz anything in QEMU’s full system emulation mode using AFL. <a href="https://github.com/akulpillai/TriforceNetBSDSyscallFuzzer">TriforceNetBSDSyscallFuzzer</a> is built on top of TriforceAFL, specifically to fuzz the NetBSD kernel syscalls. It has also now been made available as wip/triforcenetbsdsyscallfuzzer.<br /> Several minor issues found in the above two packages have now been resolved, and the project restructured.</p>
<p>Issues found include:</p>
<ul>
<li>The input generators would also test the generated inputs, causing several syscalls to be executed, which messed up the permissions of several other directories and led to other unwanted consequences. Testing of syscalls has now been disabled, as the working ones are already specified.</li>
<li>Directory specified for the BIOS, VGA BIOS and keymaps for QEMU in the runFuzz script was incorrect. This was because wip/triforceafl did not install the specified directory. The package has now been patched to install the <code>pc-bios</code> directory.</li>
<li>The project was structured in a manner where the host was Linux and the target was NetBSD. Since that is no longer the case and the fuzzer is meant to be run on a NetBSD machine, the project was restructured so that everything is now in one directory and there is no longer a need to move files around.</li>
</ul>
<p>The packages should now work as intended by following the instructions outlined in the <a href="https://github.com/akulpillai/TriforceNetBSDSyscallFuzzer/blob/master/README.md">README</a>.</p>
<p>The fuzzer was able to detect a few bugs in the last coding period; details can be found in the last report. During this coding period, the fuzzer was able to detect 79 unique crashes in a period of 2 weeks running on a single machine. The kernel was built with DEBUG + LOCKDEBUG + DIAGNOSTIC. Work is underway to analyse, report and fix the new bugs and to make the process faster.</p>
<p>With an initial analysis of the outputs on the basis of the syscall that led to the crash, 6 of the above crashes were unique bugs; the rest were duplicates or slight variants, of which 3 have been previously reported.<br /> Here are the backtraces of the new bugs found (the reproducers were run with kUBSan enabled):</p>
<pre><code>BUG1:
[ 110.4035826] panic: cv_enter,172: uninitialized lock (lock=0xffffe3c1b9
fc0c50, from=ffffffff81a436e9)
[ 110.4035826] cpu0: Begin traceback...
[ 110.4035826] vpanic() at netbsd:vpanic+0x1fd
[ 110.4035826] snprintf() at netbsd:snprintf
[ 110.4035826] lockdebug_locked() at netbsd:lockdebug_locked+0x45e
[ 110.4035826] cv_timedwait_sig() at netbsd:cv_timedwait_sig+0xe7
[ 110.4035826] lfs_segwait() at netbsd:lfs_segwait+0x6e
[ 110.4035826] sys___lfs_segwait50() at netbsd:sys___lfs_segwait50+0xe2
[ 110.4035826] sys___syscall() at netbsd:sys___syscall+0x121
[ 110.4035826] syscall() at netbsd:syscall+0x1a5
[ 110.4035826] --- syscall (number 198) ---
[ 110.4035826] 40261a:
[ 110.4035826] cpu0: End traceback...
[ 110.4035826] fatal breakpoint trap in supervisor mode
[ 110.4035826] trap type 1 code 0 rip 0xffffffff8021ddf5 cs 0x8 rflags 0x282 cr2
0x73f454b70000 ilevel 0x8 rsp 0xffff8a0068390d70
[ 110.4035826] curlwp 0xffffe3c1efc556a0 pid 709.1 lowest kstack 0xffff8a006838d
2c0
Stopped in pid 709.1 (driver) at netbsd:breakpoint+0x5: leave
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1fd
snprintf() at netbsd:snprintf
lockdebug_locked() at netbsd:lockdebug_locked+0x45e
cv_timedwait_sig() at netbsd:cv_timedwait_sig+0xe7
lfs_segwait() at netbsd:lfs_segwait+0x6e
sys___lfs_segwait50() at netbsd:sys___lfs_segwait50+0xe2
sys___syscall() at netbsd:sys___syscall+0x121
syscall() at netbsd:syscall+0x1a5
--- syscall (number 198) ---
40261a:
BUG2:
[ 161.4877660] panic: LOCKDEBUG: Mutex error: rw_vector_enter,296: spin lock hel
d
[ 161.4877660] cpu0: Begin traceback...
[ 161.4877660] vpanic() at netbsd:vpanic+0x1fd
[ 161.4877660] snprintf() at netbsd:snprintf
[ 161.4877660] lockdebug_abort1() at netbsd:lockdebug_abort1+0x115
[ 161.4877660] rw_enter() at netbsd:rw_enter+0x645
[ 161.4877660] uvm_fault_internal() at netbsd:uvm_fault_internal+0x1c5
[ 161.4877660] trap() at netbsd:trap+0xa71
[ 161.4877660] --- trap (number 6) ---
[ 161.4877660] config_devalloc() at netbsd:config_devalloc+0x644
[ 161.4877660] config_attach_pseudo() at netbsd:config_attach_pseudo+0x1c
[ 161.4877660] vndopen() at netbsd:vndopen+0x1f3
[ 161.4877660] cdev_open() at netbsd:cdev_open+0x12d
[ 161.4877660] spec_open() at netbsd:spec_open+0x2d0
[ 161.4877660] VOP_OPEN() at netbsd:VOP_OPEN+0xba
[ 161.4877660] vn_open() at netbsd:vn_open+0x434
[ 161.4877660] sys_ktrace() at netbsd:sys_ktrace+0x1ec
[ 161.4877660] sys___syscall() at netbsd:sys___syscall+0x121
[ 161.4877660] syscall() at netbsd:syscall+0x1a5
[ 161.4877660] --- syscall (number 198) ---
[ 161.4877660] 40261a:
[ 161.4877660] cpu0: End traceback...
[ 161.4877660] fatal breakpoint trap in supervisor mode
[ 161.4877660] trap type 1 code 0 rip 0xffffffff8021ddf5 cs 0x8 rflags 0x286 cr2
0xfffffffffffff800 ilevel 0x8 rsp 0xffff9c80683cd4f0
[ 161.4877660] curlwp 0xfffffcbcda7d36a0 pid 41.1 lowest kstack 0xffff9c80683ca2
c0
db{0}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1fd
snprintf() at netbsd:snprintf
lockdebug_abort1() at netbsd:lockdebug_abort1+0x115
rw_enter() at netbsd:rw_enter+0x645
uvm_fault_internal() at netbsd:uvm_fault_internal+0x1c5
trap() at netbsd:trap+0xa71
--- trap (number 6) ---
config_devalloc() at netbsd:config_devalloc+0x644
config_attach_pseudo() at netbsd:config_attach_pseudo+0x1c
vndopen() at netbsd:vndopen+0x1f3
cdev_open() at netbsd:cdev_open+0x12d
spec_open() at netbsd:spec_open+0x2d0
VOP_OPEN() at netbsd:VOP_OPEN+0xba
vn_open() at netbsd:vn_open+0x434
sys_ktrace() at netbsd:sys_ktrace+0x1ec
sys___syscall() at netbsd:sys___syscall+0x121
syscall() at netbsd:syscall+0x1a5
--- syscall (number 198) ---
40261a:
BUG3:
[ 350.9942146] UBSan: Undefined Behavior in /home/ubuntu/triforce/kernel/
src/sys/kern/kern_ktrace.c:1398:2, member access within misaligned address 0x2b0
000002a for type 'struct ktr_desc' which requires 8 byte alignment
[ 351.0025346] uvm_fault(0xffffffff85b73100, 0x2b00000000, 1) -> e
[ 351.0025346] fatal page fault in supervisor mode
[ 351.0025346] trap type 6 code 0 rip 0xffffffff81b9dbf9 cs 0x8 rflags 0x286 cr2
0x2b00000032 ilevel 0 rsp 0xffff8780684d7fb0
[ 351.0025346] curlwp 0xffffa992128116e0 pid 0.54 lowest kstack 0xffff8780684d42
c0
kernel: page fault trap, code=0
Stopped in pid 0.54 (system) at netbsd:ktrace_thread+0x1fd: cmpq %rbx,8(%
r12)
db{0}> bt
ktrace_thread() at netbsd:ktrace_thread+0x1fd
</code></pre>
<h3 id="reproducing-crashes">Reproducing Crashes</h3>
<p>Right now the best way to reproduce a bug detected is to use the Fuzzer’s driver program itself:</p>
<pre><code>./driver -tv < crash_file
</code></pre>
<p>The <code>crash_file</code> can be found in the outputs directory and is a custom file format made for the driver.<br /> Memory allocation and socket creation remain to be added to the reproducer generator (<code>genRepro</code>) highlighted in the previous post and will be prioritised in the future.</p>
<h2 id="implications">Implications</h2>
<p>Considering that we have a working fuzzer now, it is a good time to analyse how effective TriforceAFL is compared to other fuzzers.</p>
<p>Recently Syzkaller has been really effective in finding bugs in NetBSD. As shown in the below diagrams, both TriforceAFL and Syzkaller create multiple instances of the system to be fuzzed, gather coverage data, mutate input accordingly and continue fuzzing, but there are several differences in the way they work.</p>
<center><img src="//netbsd.org/~kamil/gsoc_2019/TriforceAFL.png" alt="TriforceAFL" /><br /><br /> TriforceAFL<br /><br /> <img src="//netbsd.org/~kamil/gsoc_2019/Syzkaller.png" alt="Syzkaller" /> <br /><br />Syzkaller</center>
<p><br /><br /></p>
<p>Key differences between the two include:</p>
<ul>
<li>KCOV<br /> Syzkaller relies on the KCOV module for coverage data in NetBSD whereas TriforceAFL gets coverage information from its modified version of QEMU.</li>
<li>VMs<br /> Syzkaller creates multiple VMs and manages them with syz-manager, whereas TriforceAFL simply forks the VM for each testcase.</li>
<li>Communication<br /> Syzkaller uses RPC and ssh to communicate with the VMs whereas TriforceAFL uses a custom hypercall.</li>
<li>Hardware Acceleration<br /> Syzkaller can use hardware acceleration to run the VMs at native speeds, whereas TriforceAFL can only utilize QEMU’s full system emulation mode, as it relies on it for coverage data.</li>
</ul>
<p>These differences lead to very different results. To get a perspective, here are some stats from syzkaller, which can be found on the <a href="https://syzkaller.appspot.com/netbsd">syzbot dashboard</a>.</p>
<table border="1">
<thead>
<tr>
<th>Bugs Found</th>
<th>Upstream</th>
<th>Fixed</th>
</tr>
</thead>
<tbody>
<tr>
<td>57</td>
<td>37</td>
<td>20</td>
</tr>
</tbody>
</table>
<p>Comparatively, in the first weekend of fuzzing:</p>
<table border="1">
<thead>
<tr>
<th> </th>
<th>Bugs Found</th>
</tr>
</thead>
<tbody>
<tr>
<td>Syzkaller</td>
<td>18</td>
</tr>
<tr>
<td>TriforceAFL</td>
<td>3</td>
</tr>
</tbody>
</table>
<ul>
<li>Compared to syzkaller, the number of bugs found by TriforceAFL in the first few days was significantly lower, but TriforceAFL was nevertheless able to find variants of bugs found by syzkaller, plus one different bug with simpler reproducers.</li>
<li>Going by the stats provided by Syzbot and AFL, Syzkaller does ~80 execs/min whereas TriforceAFL can do ~1500 execs/min on average, although this statistic is without any sanitizers enabled for TriforceAFL and with kASan enabled for Syzkaller.</li>
<li>TriforceAFL has the advantage that it does not rely on KCOV for coverage data. This means it can easily get coverage data for fuzzing other interfaces too, which will be beneficial when we move on to network and USB fuzzing. Issues have been found with KCOV where in certain cases the fuzzer lost track of the kernel trace, especially in networking, where after a packet was enqueued the fuzzer lost track of it because the packet was then handled by some other thread. Coverage data gathered using TriforceAFL will not be sensitive to this, provided that noise from the kernel is handled to some extent.</li>
<li>Efforts were made in the original design of TriforceAFL to make the traces as deterministic as possible at a basic block level. To quote from <a href="https://www.nccgroup.trust/uk/about-us/newsroom-and-events/blogs/2016/june/project-triforce-run-afl-on-everything/">this</a> article: Sometimes QEMU starts executing a basic block and then gets interrupted. It may then re-execute the block from the start or translate a portion of the block that wasn't yet executed and execute that. This introduced non-determinism. To reduce the non-determinism, cpu-exec.c was altered to disable QEMU's "chaining" feature, and moved AFL's tracing feature to cpu_tb_exec to trace a basic block only after it has been executed to completion. Although nothing has been specifically done to reduce noise from the kernel, such as disabling coverage during interrupts.</li>
<li>TriforceAFL runs 1 fuzzing process per instance, whereas Syzkaller can run up to 32 fuzzing processes. But multiple fuzzing instances can be created with TriforceAFL, as detailed in this <a href="https://github.com/nccgroup/TriforceAFL/blob/master/docs/parallel_fuzzing.txt">document</a>, to take advantage of parallelization.</li>
<li>Maxime Villard was also able to rework TriforceNetBSDSyscallFuzzer to now also support the compat_netbsd32 kernel. This work will be integrated as soon as possible.</li>
<li>On the other hand, Syzkaller has syzbot, which can be thought of as a 24/7 fuzzing service. TriforceAFL does not have such a service or a dedicated server for fuzzing.<br /> A service like this will surely be advantageous in the future, but it is still a bit too early to set one up. To utilize such a service truly efficiently, it would be better to first improve TriforceAFL in all ways possible.</li>
</ul>
<p>Targets to be met before a 24/7 fuzzing service is set up include, but are not limited to:</p>
<ul>
<li>Automatic initial Analysis & Report Generation<br /> Right now, there is nothing that can automatically perform an initial analysis to detect truly unique crashes and prepare reports with backtraces; this will be required and very helpful.</li>
<li>Better Reproducers<br /> As mentioned before, the generated reproducers do not take care of memory allocation, socket creation, etc. These need to be included if a good C reproducer is expected.</li>
<li>Parallel Fuzzing and Management<br /> There is no central interface that summarises the statistics from all fuzzing instances, and manages such instances in case of anomalies/errors. Something like this would be needed for a service that will be run for longer periods of time without supervision.</li>
<li>Updated AFL and QEMU versions<br /> Updating AFL and QEMU might significantly increase executions per second, lead to better mutation of inputs, and also decrease the memory requirements.</li>
<li>Fuzzing more Interfaces<br /> Right now we are only fuzzing syscalls; the network and USB layers are great future prospects, and at least prototype versions can be added in the near future.</li>
</ul>
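One building block for the automatic-analysis target above is crash deduplication: hash the top frames of each panic backtrace so that crashes with the same call chain are reported only once. A hypothetical sketch in C (the helper name, hash choice, and frame strings are invented for illustration, not existing NetBSD tooling):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hash the top `depth` frames of a backtrace (djb2 over the symbol
 * names).  Crashes with equal hashes are treated as duplicates.
 */
uint64_t
crash_bucket(const char *frames[], size_t nframes, size_t depth)
{
	uint64_t h = 5381;

	if (depth > nframes)
		depth = nframes;
	for (size_t i = 0; i < depth; i++) {
		for (const char *p = frames[i]; *p != '\0'; p++)
			h = h * 33 + (uint8_t)*p;
	}
	return h;
}
```

With a small depth, two reproducers that panic through the same top frames land in the same bucket even if deeper frames differ, which is usually what you want when triaging fuzzer output.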
<h2 id="future-work">Future Work</h2>
<p>Although GSoC will be officially ending, I am looking forward to continuing the development of TriforceAFL, adding features and making it more effective.<br /> Some improvements that can be expected include:</p>
<ul>
<li>Fuzzing different interfaces - Network Fuzzing, USB Fuzzing</li>
<li>Updating AFL and QEMU versions</li>
<li>Better Reproducers</li>
<li>A syzbot-like service</li>
</ul>
<p>A new repo has been created at <a href="https://github.com/NetBSD/triforce4netbsd">https://github.com/NetBSD/triforce4netbsd</a>. Future collaborative work will be published there.</p>
<h2 id="conclusion">Conclusion</h2>
<p>TriforceAFL has been successfully adapted for NetBSD and all of the original goals of the GSoC proposal have been met, but the work is far from complete. Work done so far shows great potential, and incremental updates will surely make TriforceAFL a great fuzzer. I am looking forward to continuing the work and making sure that this is the case.<br /> GSoC has been an amazing learning experience! I would like to thank Maxime Villard for taking an active interest in TriforceAFL and for assisting in testing and development. I would like to thank my mentor Kamil Rytarowski for being the guiding force throughout the project and for assisting me whenever I needed help. And finally I would like to thank Google for giving me this wonderful opportunity.</p>
https://blog.netbsd.org/tnf/entry/gsoc_2019_report_implementation_ofGSoC 2019 Report: Implementation of compat_netbsd32 DRM ioctl/Getting DRM applications running under compat-linuxChristos Zoulas2019-08-25T09:45:34+00:002019-08-25T09:45:34+00:00<p>
This article was prepared by Surya P as a part of Google Summer of Code 2019
</p>
<p>
To pick up where we left off last time, we were able to fix the suse131 package with this <a href="https://mail-index.netbsd.org/pkgsrc-changes/2019/07/08/msg194069.html">commit</a>. This commit adds the GPU-specific bits to the package, and with that we had direct rendering enabled and working. I tested it out with the glxinfo and glxgears applications.
</p>
<img alt="localhost: glx_info" src="//www.NetBSD.org/~christos/blog-posts/gsoc/2019/drm/2-1.png" />
<img alt="glx_info output" src="//www.NetBSD.org/~christos/blog-posts/gsoc/2019/drm/2-2.png" />
<h2>Testing</h2>
<p>
In order to make sure that applications did not break with this commit, I tried LibreOffice and, to no surprise, everything ran as expected without any hiccups.
</p>
<p>
Then I had to make a choice between porting Steam and implementing compat_netbsd32. Since Steam had a lot of dependencies which needed to be resolved, and since the implementation of compat_netbsd32 had much higher priority, I started with the implementation of compat_netbsd32.
</p>
<h2>Implementing compat_netbsd32 DRM ioctls - The Setup</h2>
<p>
For the setup, I downloaded the i386 sets from the official NetBSD site and extracted them in the /emul directory. I ran some arbitrary programs like cat and ls from the emulated netbsd32 directory to make sure everything ran without any problems. I then tried running the 32-bit glxinfo and glxgears applications and, to no surprise, they kept segfaulting. I ktraced the applications and identified the DRM ioctls that needed to be implemented.
</p>
<h2>Implementing compat_netbsd32 DRM ioctls - The Code</h2>
<p>
There were several functions required for the complete working of the compat_netbsd32 DRM ioctls. We implemented each and every function and had the code compiled. We then made sure that the code compiled both as a module and as a non-module option with which the kernel can be built. I initially tested the code with 32-bit glxinfo and glxgears, and the programs didn't segfault and ran as expected.
</p>
<h2>Implementing compat_netbsd32 DRM ioctls - Testing</h2>
<p>
In order to test the code I built a test application leveraging the APIs provided by libdrm. It is a very simple application which initializes the DRM connection, sets up and draws a gradient on screen, and exits. I initially ran it against the native amd64 architecture, but to my surprise the application didn't work as expected. After some hours of debugging I realized that there can be only one DRM master, and X was already the master. After exiting the X session and running the application, everything ran perfectly on both the amd64 and i386 architectures.
</p>
<img alt="localhost: drm_test" src="//www.NetBSD.org/~christos/blog-posts/gsoc/2019/drm/2-3.png" />
<img alt="gradient" src="//www.NetBSD.org/~christos/blog-posts/gsoc/2019/drm/2-4.png" />
<h2>What is done</h2>
<p>
<ul>
<li>The DRM ioctl implementation of NetBSD has been tested and verified</li>
<li>The suse131 package has been patched and updated (committed)</li>
<li>The compat_netbsd32 DRM ioctls have been implemented (merged)</li>
<li>Subsequently, the DRM ioctls for emulated 32-bit Linux as well</li>
<li>Created a test GUI application for the code (yet to PR)</li>
</ul>
</p>
<h2>TODO</h2>
<p>
<ul>
<li>Create ATF tests for the code and merge them into the tree</li>
<li>Read the code, look for bugs and clean it up</li>
<li>Port Steam and make it available in NetBSD</li>
</ul>
</p>
<h2>Conclusion</h2>
<p>
Completing the tasks listed in the TODO is of the highest priority and will be carried on even if it exceeds the GSoC time period.
</p>
<p>
Last but not least, I would like to thank my mentors @christos and @maya for helping me out and guiding me throughout the process, and Google for providing me with such a wonderful opportunity to work with the NetBSD community.
</p>
https://blog.netbsd.org/tnf/entry/gsoc_2019_report_adding_netbsd1GSoC 2019 Report: Adding NetBSD KNF to clang-format, FinalMichał Górny2019-08-24T06:12:37+00:002019-08-24T06:12:37+00:00<p><em>This report was prepared by Manikishan Ghantasala as a part of Google Summer of Code 2019</em></p>
<p>This is the third and final report of the project <a href="https://wiki.netbsd.org/projects/project/clang-format/">Add KNF (NetBSD style) clang-format configuration</a> that I have been doing as a part of Google Summer of Code (GSoC) ‘19 with NetBSD.</p>
<p>You can refer to the first and second reports here:</p>
<ol start="1">
<li><a href=" http://blog.netbsd.org/tnf/entry/gsoc_2018_report_adding_netbsd">Adding NetBSD KNF to clang-format, Part 1</a></li>
<li><a href=" http://blog.netbsd.org/tnf/entry/gsoc_2019_report_adding_netbsd">Adding NetBSD KNF to clang-format, Part 2</a></li>
</ol>
<h2>About the project</h2>
<p>ClangFormat is a set of tools to format C/C++/Java/JavaScript/Objective-C/Protobuf code. It is built on top of LibFormat to support workflow in various ways including a standalone tool called clang-format, and editor integrations. It supports a few built-in CodingStyles that include: LLVM, Google, Chromium, Mozilla, Webkit. When the desired code formatting style is different from the available options, the style can be customized using a configuration file. The aim of this project is to add <a href=" https://github.com/NetBSD/src/blob/trunk/share/misc/style">NetBSD KNF</a> support to clang-format and new styles to libFormat that support NetBSD’s style of coding. This would allow us to format NetBSD code by passing `-style=NetBSD` as an argument.</p>
<h2>How to use clang-format</h2>
<p> While using clang-format one can choose a style from the predefined styles or create a custom style by configuring specific style options. </p>
<p>Use the following command if you are using one of the predefined styles </p>
<code>clang-format filename -style=<Name of the style></code>
<h3>Configuring style with clang-format</h3>
<p>clang-format supports two ways to provide custom style options: directly specify style name in the -style= command line option or use -style=file and put style configuration in a .clang-format or _clang-format file in the project’s top directory.</p>
<p>Check <a href=" https://clang.llvm.org/docs/ClangFormatStyleOptions.html">Clang-Format Style Options</a> to know how different Style Options works and how to use them.</p>
<p>When specifying configuration in the -style= option, the same configuration is applied for all input files. The format of the configuration is:</p>
<code>-style='{key1: value1, key2: value2, ...}'</code>
<p>The <code>.clang-format</code> file uses YAML format. An easy way to get a valid .clang-format file containing all configuration options of a certain predefined style is: </p>
<code>clang-format -style=llvm -dump-config > .clang-format</code>
<p>After making required changes to the .clang-format file it can be used as a custom Style by:</p>
<code>clang-format <filename> -style=file</code>
<h2>Changes made to clang-format</h2>
<p>The following changes were made to the clang-format as a part of adding NetBSD KNF:</p>
<p> New Style options added:</p>
<ol start="1">
<li>BitFieldDeclarationsOnePerLine</li>
<li>AlignConsecutiveListElements</li>
</ol>
<p> Modifications made to existing styles:</p>
<ol start="1">
<li>Modified SortIncludes and IncludeCategories to support NetBSD like includes.</li>
<li>Modified SpacesBeforeTrailingComments to support block comments.</li>
</ol>
<p> The new NetBSD Style Configurations:</p>
<p>This is the final configurations for clang-format with modified changes to support NetBSD KNF.</p>
<pre>
<code>
AlignTrailingComments: true
AlwaysBreakAfterReturnType: All
AlignConsecutiveMacros: true
AlignConsecutiveListElements: true
BitFieldDeclarationsOnePerLine: true
BreakBeforeBraces: Mozilla
ColumnLimit: 80
ContinuationIndentWidth: 4
Cpp11BracedListStyle: false
FixNamespaceComments: true
IndentCaseLabels: false
IndentWidth: 8
IncludeBlocks: Regroup
IncludeCategories:
- Regex: '^<sys/param\.h>'
Priority: 1
SortPriority: 0
- Regex: '^<sys/types\.h>'
Priority: 1
SortPriority: 1
- Regex: '^<sys.*/'
Priority: 1
SortPriority: 2
- Regex: '^<uvm/'
Priority: 2
SortPriority: 3
- Regex: '^<machine/'
Priority: 3
SortPriority: 4
- Regex: '^<dev/'
Priority: 4
SortPriority: 5
- Regex: '^<net.*/'
Priority: 5
SortPriority: 6
- Regex: '^<protocols/'
Priority: 5
SortPriority: 7
- Regex: '^<(fs|miscfs|msdosfs|nfs|ufs)/'
Priority: 6
SortPriority: 8
- Regex: '^<(x86|amd64|i386|xen)/'
Priority: 7
SortPriority: 8
- Regex: '^<path'
Priority: 9
SortPriority: 11
- Regex: '^<[^/].*\.h'
Priority: 8
SortPriority: 10
- Regex: '^\".*\.h\"'
Priority: 10
SortPriority: 12
SortIncludes: true
SpacesBeforeCpp11BracedList: true
SpacesBeforeTrailingComments: 4
TabWidth: 8
UseTab: Always
</code>
</pre>
<h2>Status of each Style Option</h2>
<h3>Styles Ready to Merge:</h3>
<h4>1. Modified SortIncludes and IncludeCategories:</h4>
<p> </p>
<p> <b>Patch:</b> <a href=" https://reviews.llvm.org/D64695">https://reviews.llvm.org/D64695</a></p>
<p> </p>
<h3>Styles needing revision </h3>
<h4>1. BitFieldDeclarationsOnePerLine:</h4>
<p> </p>
<p> <b>Patch:</b> <a href=" https://reviews.llvm.org/D63062">https://reviews.llvm.org/D63062</a></p>
<p> <b>Bugs:</b> <a href=" https://github.com/sh4nnu/clang/issues/1">1</a> </p>
<h4>2. SpacesBeforeTrailingComments supports Block Comments:</h4>
<p> </p>
<p> <b>Patch:</b> <a href=" https://reviews.llvm.org/D65648">https://reviews.llvm.org/D65648</a></p>
<p> <b>Remark:</b> I still have to discuss the cases in which block comments are used but spaces should not be added before them. </p>
<h3>WIP Style</h3>
<h4>1. AlignConsecutiveListElements:</h4>
<p> </p>
<p> <b>Commit:</b> <a href="https://github.com/sh4nnu/clang/commit/4b4cd45a5f3d211008763f1c0235a22352faa81e">https://github.com/sh4nnu/clang/commit/4b4cd45a5f3d211008763f1c0235a22352faa81e</a></p>
<p> <b>Bugs:</b> <a href=" https://github.com/sh4nnu/clang/issues/2">1</a></p>
<h2>About Styles</h2>
<h3>BitFieldDeclarationsOnePerLine:</h3>
<p> <b>Patch:</b> <a href=" https://reviews.llvm.org/D63062">https://reviews.llvm.org/D63062</a> </p>
<p> This style lines up BitField declarations on consecutive lines with correct indentation. </p>
<strong>Input: </strong>
<pre>
<code>
unsigned int bas :3, hh : 4, jjj : 8;
unsigned int baz:1,
fuz:5,
zap:2;
</code>
</pre>
<strong>Output: </strong>
<pre><code>
unsigned int bas : 3,
hh : 4,
jjj : 8;
unsigned int baz:1,
fuz:5,
zap:2;
</code>
</pre>
<p><b>Bug: </b>Indentation breaks in the presence of block comments in between. </p>
<p><b>Input: </b></p>
<pre>
<code>
unsigned int bas : 3, /* foo */
hh : 4, /* bar */
jjj : 8;
</code>
</pre>
<p><b>Output: </b></p>
<pre>
<code>
unsigned int bas : 3, /* foo */
hh : 4, /* bar */
jjj : 8;
</code>
</pre>
<h3>Modification for SortIncludes and IncludeCategories:</h3>
<p> <b>Patch: </b><a href=" https://reviews.llvm.org/D64695">https://reviews.llvm.org/D64695</a></p>
<p> <b>Status: </b>Accepted, and ready to land.</p>
<p>Clang-format has a style option named <i>SortIncludes</i> which sorts the includes in alphabetical order. The <i>IncludeCategories</i> option allows us to define a custom order for sorting the includes. </p>
<p>It supports POSIX extended regular expressions to assign Categories for includes. </p>
<p>SortIncludes then sorts the <code>#includes</code> first by increasing category number and then lexically within each category. When <i>IncludeBlocks</i> is set to <code>Regroup</code>, multiple <code>#include</code> blocks are merged together and sorted as one, then split into groups based on category priority. </p>
<p>The problem arises when you want to define the order within each category, which was not supported.
In this modification an optional new field named <i>SortPriority</i> is added. </p>
<p>The <code>#includes</code> matching the regexes are sorted according to the values of <i>SortPriority</i>, and regrouping after sorting is done according to the values of <i>Priority</i>. If <i>SortPriority</i> is not defined, it defaults to the value of <i>Priority</i>.</p>
<strong>Example</strong>
<pre>
<code>
IncludeCategories:
- Regex: '^<c/'
Priority: 1
SortPriority: 0
- Regex: '^<(a|b)/'
Priority: 1
SortPriority: 1
- Regex: '^<(foo)/'
Priority: 2
- Regex: '.*'
Priority: 3
</code>
</pre>
<strong>Input</strong>
<pre>
<code>
#include "exe.h"
#include <a/dee.h>
#include <foo/b.h>
#include <a/bee.h>
#include <exc.h>
#include <b/dee.h>
#include <c/abc.h>
#include <foo/a.h>
</code>
</pre>
<strong>Output</strong>
<pre>
<code>
#include <c/abc.h>
#include <a/bee.h>
#include <a/dee.h>
#include <b/dee.h>
#include <foo/a.h>
#include <foo/b.h>
#include <exc.h>
#include "exe.h"
</code>
</pre>
<p>As you can observe in the above example, the <code>#includes</code> are grouped by <i>Priority</i> and sorted by <i>SortPriority</i>. This patch doesn’t affect old configurations, as it behaves like the old <i>SortIncludes</i> if <i>SortPriority</i> is not defined.</p>
<p>Refer to <a href=" http://blog.netbsd.org/tnf/entry/gsoc_2019_report_adding_netbsd">Report 2</a> for detailed examples on this.</p>
<h3>Modification for SpacesBeforeTrailingComments</h3>
<p> <b>Patch:</b> <a href=" https://reviews.llvm.org/D65648">https://reviews.llvm.org/D65648</a></p>
<p>The SpacesBeforeTrailingComments option is modified to also support block comments; previously it supported only line comments. The reason is that block comments have different usage patterns and different exceptional cases. I have tried to exclude the cases where existing tests don't allow spaces before block comments, and I have been discussing with the community which cases should be included and which excluded.</p>
<p>Cases that were excluded due to failing tests:</p>
<ul>
<li>If it is a preprocessor directive</li>
<li>If it is followed by a left parenthesis</li>
<li>If it is after a template closer</li>
</ul>
<h3>AlignConsecutiveListElements</h3>
<p> Status: Work In Progress</p>
<p>This is a new style that aligns the elements of consecutive lists inside a nested list. The style is still a work in progress; there are a few cases left to cover and a few bugs to fix. </p>
<strong>Input: </strong>
<pre>
<code>
keys[] = {
{"all", f_all, 0 },
{ "cbreak", f_cbreak, F_OFFOK },
{"cols", f_columns, F_NEEDARG },
{ "columns", f_columns, F_NEEDARG },
};
</code>
</pre>
<strong>Output: </strong>
<pre>
<code>
keys[] = { { "all", f_all, 0 },
{ "cbreak", f_cbreak, F_OFFOK },
{ "cols", f_columns, F_NEEDARG },
{ "columns", f_columns, F_NEEDARG },
};
</code>
</pre>
<strong>Work to be done:</strong>
<p>This style option aligns list declarations that are nested inside a list, I would also like to extend this style to align individual single line list declarations that are consecutive.</p>
<p>The problem with this style is the case in which each individual list has a different number of elements.</p>
<strong>Example:</strong>
<pre>
<code>
keys[] = { "all", f_all, 0 };
keys2[] = { "cbreak", f_cbreak, F_OFFOK };
keys3[] = { "cols", f_columns, F_NEEDARG, 7 };
keys4[] = { "columns", f_columns };
</code>
</pre>
<h2>Future Work</h2>
<p>Some style options introduced during this GSoC were made to meet all the cases in NetBSD KNF, so they may need revisions with respect to the other languages and coding styles that clang-format supports. I will continue working on this project after the GSoC period on the style options that are yet to be merged, will add new style options if necessary, and will get the NetBSD style merged upstream, which is the final deliverable of the project. I would like to take up the responsibility of maintaining the “NetBSD KNF” support in clang-format.</p>
<h2>Summary</h2>
<p>Even though the GSoC ’19 coding period is officially over, I definitely look forward to continuing to contribute to this project. This summer had me digging a lot into the Clang and NetBSD code for references while creating or modifying the style options. I am very interested in working with NetBSD again; I like being in the community, and I would like to improve my skills and learn more about operating systems by contributing to this organisation.</p>
<p>I would like to thank my mentors Michal and Christos for their constant support and patient guidance. A huge thanks to both the NetBSD and LLVM community who have been supportive and have helped me whenever I have had trouble. Finally a huge thanks to Google for providing me this opportunity.</p>
https://blog.netbsd.org/tnf/entry/fuzzing_netbsd_filesystems_via_aflFuzzing NetBSD Filesystems via AFL. [Part 2]Kamil Rytarowski2019-08-11T22:19:09+00:002019-08-11T23:11:02+00:00This report was written by Maciej Grochowski as a part of developing the AFL+KCOV project.
<p>
<p>Recently I started working on Fuzzing Filesystems on NetBSD using AFL.<br>
You can take a look at the <a href="https://blog.netbsd.org/tnf/entry/write_your_own_fuzzer_for" rel="nofollow">previous post</a> to learn more details about background of this project.<br>
This post summarizes the work that has been done in this area, and is divided into 3 sections:</p>
<ol>
<li>Porting AFL kernel mode to work with NetBSD</li>
<li>Running kernel fuzzing benchmark</li>
<li>An example of how to fuzz a particular filesystem</li>
</ol>
<h2 id="afl-port-for-netbsd">AFL Port for NetBSD</h2>
<p><em>AFL is a well known fuzzer for user space programs and libraries, but with some changes it can be used for fuzzing the kernel binary itself.</em></p>
<p>As the first step towards fuzzing the NetBSD kernel via AFL, I needed to modify AFL to use coverage data provided by the kernel instead of compiled-in instrumentation.
My initial plan was to replace the coverage data gathered via <code>afl-as</code> with that provided by <code>kcov(4)</code>. In this scenario, AFL would just run a wrapper and see the real coverage from the kernel.</p>
<p>I also saw previous work done by <a href="https://events.static.linuxfound.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf" rel="nofollow">Oracle</a> in this area, where instead of running the wrapper as a binary, the wrapper code was included in a custom library (<code>.so</code> object).</p>
<p>Both approaches have some pros and cons. One thing that convinced me to use a solution based on a shared library with initialization code was the potentially easier integration with a remote fork server. AFL has some constraints in the way it manages the fuzzed binary, and keeping that binary on a remote VM is less portable than fuzzing through a shared library, which avoids introducing changes to the original binary.<br>
Porting AFL's kernel fuzzing mode to be compatible with the NetBSD kernel mainly relied on how the operating system manages the coverage data. The port can currently be found on <a href="https://github.com/gotoco/afl/tree/netbsd-port">github</a>.</p>
<h2><a id="user-content-writing-a-kernel-fuzzing-benchmark" class="anchor" aria-hidden="true" href="#writing-a-kernel-fuzzing-benchmark"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writing a kernel fuzzing benchmark</h2>
<p><em>Performance is one of the key factors of fuzzing. If performance of the fuzzing process is not good enough, it's likely that the entire solution won't be useful in practice. In this section we will evaluate our fuzzer with a practice benchmark.</em></p>
<p>One exercise that I wanted to perform to check AFL kernel fuzzing in practice is similar to a password-cracking benchmark. The high-level idea is that a coverage-based fuzzer should be much smarter than brute force or random generation.</p>
<p>To do this, we can write a simple program that will take a text input and compare it with a hardcoded value. If the values match, then the fuzzer cracked the password. Otherwise, it will perform another iteration with a modified input.</p>
<p>Instead of "password cracker", I called my kernel program "lottery dev". It's a character device that takes an input and compares it with a string.</p>
<p>The chances of finding one 6-byte combination (or the "lucky byte" combination, because of the name) are similar to those of winning big in the lottery: every byte contains 8 bits, thus we have <code>2**(8*6)</code> <em>=></em> <code>281,474,976,710,656</code> combinations. The coverage-based fuzzer should be able to do this much quicker and in fewer iterations than blind guessing, as the feedback from code instrumentation guides it.</p>
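The search-space arithmetic is easy to double-check with a throwaway helper (the function name is mine):

```c
#include <assert.h>
#include <stdint.h>

/* Number of distinct values of an `nbytes`-byte input (nbytes < 8). */
uint64_t
search_space(unsigned nbytes)
{
	return 1ULL << (8 * nbytes);
}
```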
<p>I performed a similar test using a simple C program: the program read <code>stdin</code> and compared it with a hardcoded pattern. If the pattern matched, the program aborted; otherwise it returned zero. Such a test took AFL a few hours on my local laptop to break the challenge (some important details can make it faster). The curious reader who wants to learn the basics of AFL should try running a similar test on their machine.</p>
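A minimal sketch of the core check of such a harness, assuming a hypothetical 6-byte secret (in a real AFL target, a main() would read stdin and abort() on a match so the fuzzer records it as a crash):

```c
#include <assert.h>
#include <stddef.h>

#define SECRET "LUCKY!"		/* hypothetical 6-byte "lottery" value */
#define SECRET_LEN 6

/*
 * Returns 1 when the guess matches.  The comparison is byte by byte
 * so that every additional matching prefix byte takes a new branch,
 * which is exactly the feedback a coverage-based fuzzer exploits to
 * recover the secret one byte at a time instead of guessing blindly.
 */
int
check_guess(const char *buf, size_t len)
{
	if (len < SECRET_LEN)
		return 0;
	for (size_t i = 0; i < SECRET_LEN; i++) {
		if (buf[i] != SECRET[i])
			return 0;
	}
	return 1;
}
```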
<p>I ran the fuzzer on my lottery dev for several days, and after almost a week it was still not able to find the combination. So something was fundamentally not right. The kernel module with the wrapper code can be found <a href="https://github.com/gotoco/fuzz_the_world/tree/master/lottery_dev">here</a>.</p>
<h3 id="measuring-coverage-for-a-particular-function">Measuring Coverage for a particular function</h3>
<p>In the previous article, I mentioned that the NetBSD kernel seems to be 'more verbose' in terms of coverage reporting.
I ran my lottery dev wrapper code (the code that writes a given input to the char device) to check the coverage data using standard <code>kcov(4)</code> without the AFL module. My idea was to check the ratio between entries from my code that I wanted to track and other kernel functions, which can be considered noise from other subsystems. These extra entries come from services executed in the same process context, such as memory management, file systems or power management.</p>
<p>To my surprise, there was a lot of data, but I could not find any of the functions from the lottery dev. I quickly noticed that the number of addresses was equal to the size of the <code>kcov(4)</code> buffer, so maybe my data didn't fit in the buffer inside kernel space?</p>
<p>I changed the size of the coverage buffer to make it significantly larger, recompiled the kernel, and reran the test. Now, with the buffer large enough, I collected the data and printed the top 20 entries with their number of occurrences. There were 30578 entries in total.</p>
<pre><code>1544 /usr/netbsd/src/sys/uvm/uvm_page.c:847
1536 /usr/netbsd/src/sys/uvm/uvm_page.c:869
1536 /usr/netbsd/src/sys/uvm/uvm_page.c:890
1536 /usr/netbsd/src/sys/uvm/uvm_page.c:880
1536 /usr/netbsd/src/sys/uvm/uvm_page.c:858
1281 /usr/netbsd/src/sys/arch/amd64/compile/obj/GENERIC/./machine/cpu.h:70
1281 /usr/netbsd/src/sys/arch/amd64/compile/obj/GENERIC/./machine/cpu.h:71
478 /usr/netbsd/src/sys/kern/kern_mutex.c:840
456 /usr/netbsd/src/sys/arch/x86/x86/pmap.c:3046
438 /usr/netbsd/src/sys/kern/kern_mutex.c:837
438 /usr/netbsd/src/sys/kern/kern_mutex.c:835
398 /usr/netbsd/src/sys/kern/kern_mutex.c:838
383 /usr/netbsd/src/sys/uvm/uvm_page.c:186
308 /usr/netbsd/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c:129
307 /usr/netbsd/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c:130
307 /usr/netbsd/src/sys/uvm/uvm_page.c:178
307 /usr/netbsd/src/sys/uvm/uvm_page.c:1568
231 /usr/netbsd/src/sys/lib/libkern/../../../common/lib/libc/gen/rb.c:135
230 /usr/netbsd/src/sys/uvm/uvm_page.c:1567
228 /usr/netbsd/src/sys/kern/kern_synch.c:416
</code></pre>
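The table above boils down to counting repeated file:line entries. A minimal sketch of the tally, assuming the raw kcov program counters have already been resolved to strings (the helper names are mine, not part of any existing tool):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count how many of the n resolved kcov entries equal `line`. */
size_t
count_entries(const char *entries[], size_t n, const char *line)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(entries[i], line) == 0)
			count++;
	return count;
}

/*
 * Index of the most frequent entry (first one wins on ties); sorting
 * the distinct lines by their count yields a table like the one above.
 */
size_t
most_frequent(const char *entries[], size_t n)
{
	size_t best = 0, best_count = 0;

	for (size_t i = 0; i < n; i++) {
		size_t c = count_entries(entries, n, entries[i]);
		if (c > best_count) {
			best_count = c;
			best = i;
		}
	}
	return best;
}
```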
<p>It should not be a surprise that this coverage data does not help our fuzzing with AFL much. Most of the information that the fuzzer sees is related to <code>UVM</code> page management and machine-dependent code.</p>
<p>I decided to remove instrumentation from these most common functions to check the difference. The attribute <code>no_instrument_function</code> tells the compiler not to put coverage-tracing instrumentation inside these functions.</p>
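For reference, the attribute is attached per function and does not change the function's behaviour; whether the compiler actually honours it for the coverage instrumentation is exactly what is being tested here. The function below is a made-up example, not NetBSD code:

```c
#include <assert.h>

/*
 * Excluded from function instrumentation where the compiler supports
 * the attribute for the coverage flags in use; the attribute has no
 * effect on what the function computes.
 */
__attribute__((no_instrument_function))
int
pages_needed(int bytes, int pagesize)
{
	return (bytes + pagesize - 1) / pagesize;
}
```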
<p>Unfortunately, after recompiling the kernel, the most common functions did not disappear from the list. As I figured out, the support in <code>GCC 7</code> may not be fully in place.</p>
<h3 id="gcc-8-for-help">GCC 8 for help</h3>
<p>To solve this issue, I decided to work on using <code>GCC 8</code> to build the NetBSD kernel. After fixing basic build warnings, I got a basic kernel working. This still needs more work to get <code>kcov(4)</code> fully functional. Hopefully, in the next report, I will be able to share these results.</p>
<h2><a id="user-content-fuzzing-filesystem" class="anchor" aria-hidden="true" href="#fuzzing-filesystem"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Fuzzing Filesystem</h2>
<p><em>Given what we already know, we can move on to filesystem fuzzing. As a target I chose FFS, as it is the default filesystem shipped with NetBSD.</em></p>
<p>The reader may ask the question: <em>why would you run a coverage-based fuzzer if the data is not 100% accurate</em>?
So here is the trick: for coverage-based fuzzers, it is usually recommended to leave the input format as is, as genetic algorithms can do a pretty good job here. There is a great post on Michal Zalewski's blog about this process applied to the <code>JPEG</code> format: <a href="https://lcamtuf.blogspot.com/2014/11/pulling-jpegs-out-of-thin-air.html" rel="nofollow">"Pulling JPEGs out of thin air"</a>.
But what will AFL do if we provide an input that is already in the correct format? We already know what a valid FS image should look like (or we can simply generate one). As it turns out, AFL will start performing operations on the input much as mutation fuzzers do; another great source that explains this process is <a href="https://lcamtuf.blogspot.com/2014/08/binary-fuzzing-strategies-what-works.html" rel="nofollow">"Binary fuzzing strategies: what works, what doesn't"</a>.</p>
<h3><a id="user-content-writing-mount-wrapper" class="anchor" aria-hidden="true" href="#writing-mount-wrapper"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Writing mount wrapper</h3>
<p>As we discussed in the previous paragraph, to fuzz the kernel itself we need some code that runs operations inside the kernel. We will call it a wrapper, as it wraps the operations of every fuzzing cycle.
The first step in writing a wrapper for AFL is to describe it as a sequence of operations; Bash-style scripting is usually good enough for that.<br>
We need an input file that will be modified by the fuzzer, and we need to be able to mount it. NetBSD comes with <code>vnd(4)</code>, which allows exposing regular files as block devices.
The simplest sequence can be described as:</p>
<div class="highlight highlight-source-shell"><pre><span class="pl-c"><span class="pl-c">#</span> Expose file from tmpfs as block device</span>
vndconfig vnd0 /tmp/rand.tmp
<span class="pl-c"><span class="pl-c">#</span> Create a new FS image on the blk dev that we created</span>
newfs /dev/vnd0
<span class="pl-c"><span class="pl-c">#</span> Mount our fresh FS</span>
mount /dev/vnd0 /mnt
<span class="pl-c"><span class="pl-c">#</span> Check if FS works fine</span>
<span class="pl-c1">echo</span> <span class="pl-s"><span class="pl-pds">"</span>FFS mounted!<span class="pl-pds">"</span></span> <span class="pl-k">></span> /mnt/test
<span class="pl-c"><span class="pl-c">#</span> Undo mount</span>
umount /mnt
<span class="pl-c"><span class="pl-c">#</span> Last undo step</span>
vndconfig -u vnd0</pre></div>
<h3><a id="user-content-from-bash-to-c-and-system-calls" class="anchor" aria-hidden="true" href="#from-bash-to-c-and-system-calls"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>From bash to C and system calls</h3>
<p>At this point, the reader has probably figured out that a shell script won't be the best approach for fuzzer usage. We need to rewrite it in C using the proper <code>syscall/libc</code> interfaces.</p>
<p><code>vndconfig</code> uses <code>opendisk(3)</code> combined with <a href="https://github.com/NetBSD/src/blob/1d69297d8f71e252a1c33e4b5bb4d47d1bdc6c0c/sys/dev/vndvar.h#L89">vnd_ioctl</a>.
<code>mount(2)</code> is a simple system call which can operate directly on the block device once the file is attached to <code>vnd(4)</code>.</p>
<p>Below is an example conceptual code for mounting an FS:</p>
<div class="highlight highlight-source-c"><pre> <span class="pl-c"><span class="pl-c">//</span> Structure required by mount()</span>
<span class="pl-k">struct</span> ufs_args ufs_args;
<span class="pl-c"><span class="pl-c">//</span> VNConfigure step</span>
rv = run_config(VND_CONFIG, dev, fpath);
<span class="pl-k">if</span> (rv)
<span class="pl-en">printf</span>(<span class="pl-s"><span class="pl-pds">"</span>VND_CONFIG failed: rv: <span class="pl-c1">%d</span><span class="pl-cce">\n</span><span class="pl-pds">"</span></span>, rv);
<span class="pl-c"><span class="pl-c">//</span> Mount FS</span>
<span class="pl-k">if</span> (mount(FS_TYPE, fs_name, mntflags, &ufs_args, <span class="pl-k">sizeof</span>(ufs_args)) == -<span class="pl-c1">1</span>) {
<span class="pl-c1">printf</span>(<span class="pl-s"><span class="pl-pds">"</span>Mount failed: <span class="pl-c1">%s</span><span class="pl-pds">"</span></span>, <span class="pl-c1">strerror</span>(errno));
} <span class="pl-k">else</span> {
<span class="pl-c"><span class="pl-c">//</span> Here FS is mounted</span>
<span class="pl-c"><span class="pl-c">//</span> We can perform any other operations on it</span>
<span class="pl-c"><span class="pl-c">//</span> Umount FS</span>
<span class="pl-k">if</span> (<span class="pl-c1">unmount</span>(fs_name, <span class="pl-c1">0</span>) == -<span class="pl-c1">1</span>) <span class="pl-c1">printf</span>(<span class="pl-s"><span class="pl-pds">"</span>#: Umount failed!<span class="pl-cce">\n</span><span class="pl-pds">"</span></span>);
}
<span class="pl-c"><span class="pl-c">//</span> VNC-unconfigure</span>
rv = run_config(VND_UNCONFIG, dev, fpath);
<span class="pl-k">if</span> (rv) {
<span class="pl-c1">printf</span>(<span class="pl-s"><span class="pl-pds">"</span>VND_UNCONFIG failed: rv: <span class="pl-c1">%d</span><span class="pl-cce">\n</span><span class="pl-pds">"</span></span>, rv);
}</pre></div>
<p>The complete code can be found <a href="https://github.com/gotoco/fuzz_the_world/blob/master/fs_wrappers">here</a></p>
<h3><a id="user-content-ready-to-fuzz-ffs-aka-running-fs-fuzzing-with-a-predifined-corpus" class="anchor" aria-hidden="true" href="#ready-to-fuzz-ffs-aka-running-fs-fuzzing-with-a-predifined-corpus"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Ready to fuzz FFS! aka Running FS Fuzzing with a predefined corpus</h3>
<p>The first thing we need is a wrapper that provides mount/umount functionality. The previous section has already shown how that can be done.
For now, we will be fuzzing the same kernel that we are running on. Isn't that dangerous? Taking a saw to the branch we are sitting on? Of course it is!
In this exercise I want to illustrate the idea from a technical perspective, so that the curious reader can understand it better and make modifications on their own.
The takeaway from this exercise is that the fuzzing target is the kernel itself, the same binary that is running the fuzzing process.</p>
<p><img src="//netbsd.org/~kamil/gsoc_2019/afl2.png" style="max-width:100%;"></p>
<p>Let's come back to the wrapper code. We've already discussed how it works.
Now we need to compile it as a shared library. This is not obvious, but should be easy to understand in light of the sawing-off-the-branch metaphor above.</p>
<p>To compile the <code>so</code> object:</p>
<div class="highlight highlight-source-shell"><pre>gcc -fPIC -lutil -g -shared ./wrapper_mount.c -o wrapper_mount.so </pre></div>
<p>Now we need to create the input corpus. For the first attempt, we will use a sufficiently large empty file.</p>
<div class="highlight highlight-source-shell"><pre>dd if=/dev/zero of=./in/test1 bs=10k count=8</pre></div>
<p>And finally, run. The <code>@@</code> tells AFL where to substitute the name of the input file used for fuzzing.</p>
<div class="highlight highlight-source-shell"><pre>./afl-fuzz -k -i ./in -o ./out -- /mypath/wrapper_mount.so @@</pre></div>
<p>Now, as we described earlier, we need a proper FS image to allow AFL to perform mutations on it. The only difference is the additional <code>newfs(8)</code> step.</p>
<div class="highlight highlight-source-shell"><pre><span class="pl-c"><span class="pl-c">#</span> We need a file, big enough to fit FS image but not too big</span>
dd if=/dev/zero of=./in/test1 bs=10k count=8
<span class="pl-c"><span class="pl-c">#</span> A block is already inside fuzzer ./in</span>
vndconfig vnd0 ./in/test1
<span class="pl-c"><span class="pl-c">#</span> Create new FFS filesystem</span>
newfs /dev/vnd0
vndconfig -u vnd0</pre></div>
<p>Now we are ready for another run!</p>
<div class="highlight highlight-source-shell"><pre>./afl-fuzz -k -i ./in -o ./out -- /mypath/wrapper_mount.so @@
american fuzzy lop 2.35b (wrapper_mount.so)
┌─ process timing ─────────────────────────────────────┬─ overall results ─────┐
│ run <span class="pl-k">time</span> <span class="pl-c1">:</span> 0 days, 0 hrs, 0 min, 17 sec │ cycles <span class="pl-k">done</span> <span class="pl-c1">:</span> 0 │
│ last new path <span class="pl-c1">:</span> none seen yet │ total paths <span class="pl-c1">:</span> 1 │
│ last uniq crash <span class="pl-c1">:</span> none seen yet │ uniq crashes <span class="pl-c1">:</span> 0 │
│ last uniq hang <span class="pl-c1">:</span> none seen yet │ uniq hangs <span class="pl-c1">:</span> 0 │
├─ cycle progress ────────────────────┬─ map coverage ─┴───────────────────────┤
│ now processing <span class="pl-c1">:</span> 0 (0.00%) │ map density <span class="pl-c1">:</span> 17.28% / 17.31% │
│ paths timed out <span class="pl-c1">:</span> 0 (0.00%) │ count coverage <span class="pl-c1">:</span> 3.53 bits/tuple │
├─ stage progress ────────────────────┼─ findings <span class="pl-k">in</span> depth ────────────────────┤
│ now trying <span class="pl-c1">:</span> trim 512/512 │ favored paths <span class="pl-c1">:</span> 1 (100.00%) │
│ stage execs <span class="pl-c1">:</span> 15/160 (9.38%) │ new edges on <span class="pl-c1">:</span> 1 (100.00%) │
│ total execs <span class="pl-c1">:</span> 202 │ total crashes <span class="pl-c1">:</span> 0 (0 unique) │
│ <span class="pl-c1">exec</span> speed <span class="pl-c1">:</span> 47.74/sec (slow<span class="pl-k">!</span>) │ total hangs <span class="pl-c1">:</span> 0 (0 unique) │
├─ fuzzing strategy yields ───────────┴───────────────┬─ path geometry ────────┤
│ bit flips <span class="pl-c1">:</span> 0/0, 0/0, 0/0 │ levels <span class="pl-c1">:</span> 1 │
│ byte flips <span class="pl-c1">:</span> 0/0, 0/0, 0/0 │ pending <span class="pl-c1">:</span> 1 │
│ arithmetics <span class="pl-c1">:</span> 0/0, 0/0, 0/0 │ pend fav <span class="pl-c1">:</span> 1 │
│ known ints <span class="pl-c1">:</span> 0/0, 0/0, 0/0 │ own finds <span class="pl-c1">:</span> 0 │
│ dictionary <span class="pl-c1">:</span> 0/0, 0/0, 0/0 │ imported <span class="pl-c1">:</span> n/a │
│ havoc <span class="pl-c1">:</span> 0/0, 0/0 │ stability <span class="pl-c1">:</span> 23.66% │
│ trim <span class="pl-c1">:</span> n/a, n/a ├────────────────────────┘
┴─────────────────────────────────────────────────────┘ [cpu: 0%]
</pre></div>
<h2><a id="user-content-future-work" class="anchor" aria-hidden="true" href="#future-work"><svg class="octicon octicon-link" viewBox="0 0 16 16" version="1.1" width="16" height="16" aria-hidden="true"><path fill-rule="evenodd" d="M4 9h1v1H4c-1.5 0-3-1.69-3-3.5S2.55 3 4 3h4c1.45 0 3 1.69 3 3.5 0 1.41-.91 2.72-2 3.25V8.59c.58-.45 1-1.27 1-2.09C10 5.22 8.98 4 8 4H4c-.98 0-2 1.22-2 2.5S3 9 4 9zm9-3h-1v1h1c1 0 2 1.22 2 2.5S13.98 12 13 12H9c-.98 0-2-1.22-2-2.5 0-.83.42-1.64 1-2.09V6.25c-1.09.53-2 1.84-2 3.25C6 11.31 7.55 13 9 13h4c1.45 0 3-1.69 3-3.5S14.5 6 13 6z"></path></svg></a>Future work</h2>
<p>Support for NetBSD kernel fuzzing was developed as part of the AFL FileSystems Fuzzing project, which aims to improve the quality of filesystems and catch various issues.<br>
The very next thing on my todo list is to provide support for kernel tracing with <code>GCC 8</code>, to turn off coverage data from the functions that generate a lot of noise.<br>
During the FFS fuzzing, I found a few issues that I need to analyze in detail.<br>
Last but not least, for the next report I plan to show a remote setup of AFL running in a VM, reporting crashes, and being remotely rebooted by the master controller.</p>
https://blog.netbsd.org/tnf/entry/gsoc_2019_report_adding_netbsdGSoC 2019 Report: Adding NetBSD KNF to clang-format, Part 2Michał Górny2019-08-07T17:07:10+00:002019-08-07T18:19:07+00:00<p><em>This report was prepared by Manikishan Ghantasala as a part of Google Summer of Code 2019</em></p>
<p>This report encloses the progress of the project <strong><a href="https://wiki.netbsd.org/projects/project/clang-format/">Add KNF (NetBSD style) clang-format configuration</a></strong> during the second coding period of GSoC 2019.</p>
<p><strong>Clang-format</strong></p>
<p>Clang-format is a powerful code formatter that is part of Clang. It formats code either according to a configuration file, <i>.clang-format</i>, or using one of several predefined coding styles, namely LLVM, Google, Chromium, Mozilla, and WebKit.</p>
<p>The final goal of the project is to add a new style <q><strong>NetBSD</strong></q> along with them by patching the libFormat to support the missing styles and add the configuration according to NetBSD KNF.</p>
<code>
clang-format -style=NetBSD
</code>
<p><strong>Style options introduced in the first coding period:</strong></p>
<ol><li>BitFieldDeclarationsOnePerLine</li>
<li>SortNetBSDIncludes</li></ol>
<p>You can also take a look at the <a href="http://blog.netbsd.org/tnf/entry/gsoc_2018_report_adding_netbsd">first report</a> to learn more about these style options.</p>
<p><strong>Work done in the second coding period</strong></p>
<p>I have worked on the following styles during this second coding period.</p>
<ol><li>Withdrew <i>SortNetBSDIncludes</i> and modified the existing <i>SortIncludes</i>.</li>
<li>Modified <i>SpacesBeforeTrailingComments</i> to support block comments.</li>
<li>Introduced the new style option <i>AlignConsecutiveListElements</i>.</li></ol>
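<p>For a sense of how these options could eventually be used together, below is a hypothetical <i>.clang-format</i> sketch. The NetBSD-specific option names are the ones proposed in this project and may change before the patches are merged upstream:</p>

```yaml
# Hypothetical sketch only; the NetBSD-specific option names below
# are proposals from this project and may change before merging.
BasedOnStyle: LLVM
BitFieldDeclarationsOnePerLine: true
AlignConsecutiveListElements: true
SpacesBeforeTrailingComments: 3
IncludeCategories:
  - Regex: '^<sys/'
    Priority: 1
    SortPriority: 0
  - Regex: '.*'
    Priority: 2
```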
<p><strong>Sortincludes:</strong></p>
<p>The native SortIncludes sorts the includes/headers in alphabetical order. In addition, IncludeCategories allows setting custom priorities, matched via regex, to group the includes after sorting.</p>
<p>Example:</p>
<p><strong>Configuration:</strong></p>
<pre>
<code>
IncludeCategories:
-Regex: <q>^<(a|b|c)/</q>
Priority: 1
-Regex: <q>^<(foo)/</q>
Priority: 2
-Regex: <q>.*</q>
Priority: 3
</code>
</pre>
<p><strong>Input</strong></p>
<pre>
<code>
#include <q>exe.h</q>
#include <q>gaz.h</q>
#include <a/dee.h>
#include <foo/b.h>
#include <a/bee.h>
#include <exc.h>
#include <b/dee.h>
#include <c/abc.h>
#include <foo/a.h>
</code>
</pre>
<p><strong>Output</strong></p>
<pre>
<code>
#include <a/bee.h>
#include <a/dee.h>
#include <b/dee.h>
#include <c/abc.h>
#include <foo/a.h>
#include <foo/b.h>
#include <exc.h>
#include <q>exe.h</q>
#include <q>gaz.h</q>
</code>
</pre>
<p><strong>Modified SortIncludes</strong></p>
<p>The new SortIncludes supports giving a custom sort priority in addition to the grouping priority. IncludeCategories now has a new field named <i>SortPriority</i> alongside <i>Priority</i> to set the priority used while sorting the includes; the default order remains alphabetical. The usage and behavior of the new IncludeCategories style are shown in the following example.</p>
<p>Example</p>
<pre>
<code>
IncludeCategories:
-Regex: <q>^<c/</q>
Priority: 1
SortPriority: 0
-Regex: <q>^<(a|b)/</q>
Priority: 1
SortPriority: 1
-Regex: <q>^<(foo)/</q>
Priority: 2
-Regex: <q>.*</q>
Priority: 3
</code>
</pre>
<p><strong>Input</strong></p>
<pre>
<code>
#include <q>exe.h</q>
#include <a/dee.h>
#include <foo/b.h>
#include <a/bee.h>
#include <exc.h>
#include <b/dee.h>
#include <c/abc.h>
#include <foo/a.h>
</code>
</pre>
<p><strong>Output</strong></p>
<pre>
<code>
#include <c/abc.h>
#include <a/bee.h>
#include <a/dee.h>
#include <b/dee.h>
#include <foo/a.h>
#include <foo/b.h>
#include <exc.h>
#include <q>exe.h</q>
</code>
</pre>
<p>As we observe in the above example, the includes having the same <i>Priority</i> are grouped, and <i>SortPriority</i> defines the sort priority<i>.</i></p>
<p>The patch was accepted and is ready to merge; you can find it here -> <a href="https://reviews.llvm.org/D64695">SortIncludesPatch</a>. This patch also introduces the NetBSD style to clang-format, with configurations for the supported style options.</p>
<p><strong>Spaces Before Trailing Comments</strong></p>
<p>This is also a native style option in clang-format; it lets a user decide the number of spaces before trailing line comments (<code>//</code> comments), but not before trailing block comments (<code>/* */</code> comments). The reason for this is that block comments have different usage patterns and different exceptional cases. The following is an example of the native style option.</p>
<pre>
<code>
<strong>SpacesBeforeTrailingComments: 3</strong>
void f() {
  if (true) {   //foo
    f();   //bar
  }   //foo1
}
</code>
</pre>
<p><strong>Modifications to spaceBeforeTrailingComments:</strong></p>
<p>I am working on modifying this style option to support block comments by covering the cases in which block comments are used. Some cases are yet to be covered; once they are handled, the finished patch will be the deliverable.</p>
<p>The initial revision can be found here -> <a href="https://reviews.llvm.org/D65648">patch</a></p>
<pre>
<code>
<strong>SpacesBeforeTrailingComments: 3</strong>
void f() {
  if (true) {   /*foo */
    f();   /*bar */
  }   /*foo1*/
}
</code>
</pre>
<p><strong>AlignConsecutiveListElements</strong></p>
<p>AlignConsecutiveListElements is a new style option that I am going to introduce to clang-format. It aligns the elements of consecutive list definitions and declarations. The style is not yet ready to submit as a patch: alignTokens(), the function that aligns tokens, needs substantial modification to support it.</p>
<p>Example:</p>
<p><strong>Input</strong></p>
<pre>
<code>
keys[] = {
{"all", f_all, 0 },
{ "cbreak", f_cbreak, F_OFFOK },
{"cols", f_columns, F_NEEDARG },
{ "columns", f_columns, F_NEEDARG },
};
</code>
</pre>
<p><strong>Output</strong></p>
<pre>
<code>
keys[] = {
    { "all",     f_all,     0         },
    { "cbreak",  f_cbreak,  F_OFFOK   },
    { "cols",    f_columns, F_NEEDARG },
    { "columns", f_columns, F_NEEDARG },
};
</code>
</pre>
<p>The blocker for this style is nested structures in list declarations: alignTokens() has to be equipped to parse nested list declarations to support this style option. I will make sure this style option is available by the next report.</p>
<p><strong>Further plans</strong></p>
<p>For the next phase, I will make all the style options modified or introduced during the first two phases mergeable upstream, along with the required unit tests. With these style options ready, I will test the new clang-format, with the NetBSD style patched in, across the NetBSD source and check for bugs. After testing, I will fix any bugs found and get the NetBSD style ready for the final evaluation.</p>
<p><strong>Summary</strong></p>
<p>In the final coding period, the main focus will be on testing the new and modified style options, fixing them, adding any missing styles for NetBSD, and getting the NetBSD style ready by the final evaluation.</p>
<p>I want to thank my mentors, Michal and Christos, and the developers of both LLVM and NetBSD for supporting me in completing this project.</p>
https://blog.netbsd.org/tnf/entry/gsoc_2019_report_update_incorporatingGSoC 2019 Report Update: Incorporating the memory-hard Argon2 hashing scheme into NetBSDKamil Rytarowski2019-08-06T17:00:33+00:002019-08-06T17:14:24+00:00This report was prepared by Jason High as a part of Google Summer of Code 2019
<div>
<h4>Introduction</h4>
<p>As a memory hard hashing scheme, Argon2 attempts to maximize utilization over multiple compute units, providing a defense against both Time Memory Trade-off (TMTO) and side-channel attacks. In our first <a href="http://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the">post</a>, we introduced our GSOC project's phase 1 to integrate the Argon2 reference implementation into NetBSD. Having successfully completed phase 1, here we briefly discuss parameter tuning as it relates to password management and performance.</p>
</div>
<h4>Parameter Tuning</h4>
<div>
<p>
Both the reference paper [1] and the forthcoming RFC [2] provide recommendations on how to determine appropriate parameter values. While there are no hard-and-fast rules, the general idea is to maximize resource utilization while keeping performance, measured in execution run-time, within a tolerable bound. We summarize this process as follows:
<ol>
<li>Determine the Argon2 variant to use</li>
<li>Determine the appropriate salt length</li>
<li>Determine the appropriate tag length</li>
<li>Determine the acceptable time cost</li>
<li>Determine the maximum amount of memory to utilize</li>
<li>Determine the appropriate degree of parallelism</li>
</ol>
<p>
<h5>Step 1</h5>
All three Argon2 variants are available in NetBSD. First, <i>argon2i</i> is a slower variant that uses data-independent memory access, suitable for password hashing and password-based key derivation. Second, <i>argon2d</i> is a faster variant that uses data-dependent memory access, but it is only suitable for applications facing no threat from side-channel attacks. Lastly, <i>argon2id</i> runs argon2i on the first half of the memory passes and argon2d on the remaining passes. If you are unsure which variant to use, it is recommended that you use <b>argon2id</b>.[1][2]
<h5>Step 2-3</h5>
<p>Our current implementation uses a constant 32-byte hash length (defined in crypt-argon2.c) and a 16-byte salt length (defined in pw_gensalt.c). Both of these values are on the high end of the recommendations.</p>
<h5>Steps 4-6</h5>
We parameterize Argon2 on the remaining three variables: time (t), memory (m), and parallelism (p). Time <i>t</i> is defined as the amount of required computation and is specified as a number of iterations. Memory <i>m</i> is defined as the amount of memory utilized, specified in kibibytes (KiB). Parallelism <i>p</i> defines the number of independent threads. Taken together, these three parameters form the <i>knobs</i> with which Argon2 may be tuned. </p>
</div>
<div>
<h4>Recommended Default Parameters</h4>
<p>
For default values, [2] recommends both <i>argon2id</i> and <i>argon2d</i> be used with a time <i>t</i> cost of 1 and memory <i>m</i> set to the maximum available memory. For <i>argon2i</i>, it is recommended to use a time <i>t</i> cost of 3 for all reasonable memory sizes (reasonable is not well-defined). Parallelism <i>p</i> is a factor of workload partitioning and is typically recommended to use twice the number of cores dedicated to hashing.</p>
</div>
<h4>Evaluation and Results</h4>
<div>
<p>Given the above recommendations, we evaluate Argon2 based on execution run-time. Targeting password hashing, we use an execution-time budget of 500 ms to 1 sec. This is slightly higher than common recommendations, but the bound is largely a factor of user tolerance. Our test machine has a 4-core i5-2500 CPU @ 3.30GHz running NetBSD 8. To evaluate the run-time performance of Argon2 on your system, you may use either argon2(1) or libargon2; the header <i>argon2.h</i> provides sufficient documentation, as does the PHC git repo at [4]. An example of using argon2(1) is below.</p>
</div>
<div style="background-color: lightgrey">
<pre>
m2# echo -n 'password'|argon2 somesalt -id -p 2 -t 1 -m 19
Type: Argon2id
Iterations: 1
Memory: 524288 KiB
Parallelism: 2
Hash: 7b9618bf35b02c00cfef32cb4455206dc400b140116710a6c02732e068021609
Encoded: $argon2id$v=19$m=524288,t=1,p=2$c29tZXNhbHQ$e5YYvzWwLADP7zLLRFUgbcQAsUARZxCmwCcy4GgCFgk
0.950 seconds
Verification ok
</pre>
</div>
<div>
<p>In order to establish a performance baseline, we first evaluate the run-time of all three variants using the recommended default parameters without parallelism. Our objective is to maximize memory <i>m</i> while constraining the execution cost to between 500 ms and 1 sec. While we graph all three variants for comparison, our target is the <i>argon2id</i> variant. We loop over Argon2 with monotonically increasing values of <i>m</i> for all three variants. Graphing our results below, we determine that the maximum memory <i>m</i> value within our bound is 2<sup>19</sup>. However, to follow common suggestions [3], we chose 2<sup>20</sup> (1048576 KiB) on the assumption that increased parallelism will bring the execution time back within acceptable limits.</p>
</div>
<div>
<center><img src="//netbsd.org/~kamil/gsoc_2019/v0.png"/><figcaption>Figure 1</figcaption><br></center>
</div>
<div>
Having established our performance baseline and obtained a best guess for our memory <i>m</i> value, we can turn our attention to parallelism. As touched upon earlier, the common suggestion for parallelism <i>p</i> is twice the dedicated core count. Threads in Argon2 form computational <i>lanes</i> which allow work on the memory matrix to be partitioned into independent <i>slices</i>. As such, intuition tells us that by increasing parallelism we should see a decrease in our overall run-time until we are computationally bound. In the graph below, we see the effects of increasing parallelism <i>p</i> over our selected <i>t</i> and <i>m</i> values.
<center><img src="//netbsd.org/~kamil/gsoc_2019/p0.png"/><figcaption>Figure 2</figcaption></center>
<br>
Our baseline memory <i>m</i> value initially exceeded our desired upper bound of 1 sec without parallelism. Fortunately, we found that increasing the thread count sufficiently parallelizes the work until run-time starts to settle around <i>p</i>=8. To see if we can further increase our baseline memory <i>m</i> value, we can follow the graph for argon2id with <i>t</i>=1, <i>p</i>=8. We note that our initial baseline value of <i>m</i>=2<sup>20</sup> is the maximum <i>m</i> value falling within our bound.
<center><img src="//netbsd.org/~kamil/gsoc_2019/g8.png"/><figcaption>Figure 3</figcaption></center>
<br>
</div>
<div>
</div>
Having all three parameters, we can easily validate the run-time execution using argon2(1) to confirm that performance is within our defined bounds. Once you are satisfied, you may use passwd.conf(5) to apply the parameters globally or on a per-user basis. Our first <a href="//blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_the">post</a> includes an example of adding the appropriate stanza to passwd.conf(5).
<div style="background-color: lightgrey">
<pre>
m2# echo -n 'password'|argon2 somesalt -id -m 20 -p 8 -t 1
Type: Argon2id
Iterations: 1
Memory: 524288 KiB
Parallelism: 8
Hash: c62dbebec4a2da3a37dcfa2d82bd2f55541fce80992cd2c1cb887910e859589f
Encoded: $argon2id$v=19$m=524288,t=1,p=8$c29tZXNhbHQ$xi2+vsSi2jo33Potgr0vVVQfzoCZLNLBy4h5EOhZWJ8
0.858 seconds
Verification ok
</pre>
</div>
<h4>Summary</h4>
Argon2 is well-known for its high resource requirements. Fortunately, we are able to tune Argon2 so that we can achieve reasonable performance on NetBSD. In this post, we considered an approach to selecting appropriate parameter values for tuning Argon2 for password hashing. There are no hard-and-fast rules. We have found that it is easiest to start with a baseline value of memory <i>m</i>, then tweak the value of parallelism <i>p</i> to achieve the desired performance.
<h4>Remaining Tasks</h4>
For the third and final portion of Google Summer of Code, we will focus on clean-up, documentation, and finalizing all imported code and build scripts. If you have any questions and/or comments, please let us know.
<h4>References</h4>
<div>
[1] https://github.com/P-H-C/phc-winner-argon2/blob/master/argon2-specs.pdf<br/>
[2] https://tools.ietf.org/html/draft-irtf-cfrg-argon2-04#section-4<br/>
[3] https://argon2-cffi.readthedocs.io/en/stable/parameters.html<br/>
[4] https://github.com/P-H-C/phc-winner-argon2<br/>
<br>
</div>https://blog.netbsd.org/tnf/entry/work_in_progress_threading_supportWork-in-progress threading support in LLDBMichał Górny2019-08-02T17:05:41+00:002019-08-02T17:05:41+00:00<p>Upstream describes LLDB as <em>a next generation, high-performance debugger</em>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I started working on LLDB, as contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support,
extending NetBSD's ptrace interface to cover more register
types, fixing compat32 issues, and lately fixing watchpoint support.
You can read more about that in my
<a href="http://blog.netbsd.org/tnf/entry/lldb_watchpoints_xstate_in_ptrace">June 2019</a>
report.</p>
<p>My July work focused on improving support for NetBSD threads
in LLDB. This involved a lot of debugging and fighting hanging tests,
and I have decided to delay committing the results until I manage to
provide fixes for all the immediate issues.</p>
<h2>Buildbot updates</h2>
<p>During July, upstream made two breaking changes to the build system:</p>
<ol>
<li>
<p>Automatic switching to libc++abi when present in the build tree was initially
removed in <a href="https://reviews.llvm.org/D63883">D63883</a>, so I needed
to force it explicitly. However, upstream eventually reverted
the change.</p>
</li>
<li>
<p>LLDB has enabled Python 3 support, and started requiring SWIG 2+
in the process (<a href="https://reviews.llvm.org/D64782">D64782</a>). We had
to upgrade SWIG on the build host, and eventually switched to Python 3
as well.</p>
</li>
</ol>
<p>As a result of earlier watchpoint fixes, a number of new tests started running.
Due to the lack of multithreading support, I had to XFAIL a number of LLDB tests
in <a href="https://github.com/llvm/llvm-project/commit/10c96cabc17e57c32534fa4c5e9c22bc8c6aaaa0">r365338</a>.</p>
<p>A few days later, upstream has fixed the issue causing
TestFormattersSBAPI to fail. I un-XFAILED it in
<a href="https://github.com/llvm/llvm-project/commit/1447b60eeb2b3026a0c96bef052843a71002d617">r365991</a>.</p>
<p>The breaking xfer:libraries-svr4:read change has been reapplied, breaking the
NetBSD process plugin again, and I have reapplied my earlier fix as
<a href="https://github.com/llvm/llvm-project/commit/b09bc8a27dd7846ce446cab9e2548c8d29d74750">r366889</a>.</p>
<p>Lit maintainers have broken NetBSD support in tests by starting to use
<code>env -u VAR</code> syntax in
<a href="https://github.com/llvm/llvm-project/commit/272a9db115f85488a238cf2ff5a81a5bbc3d0598">r366980</a>.
The <code>-u</code> switch is not specified by POSIX, and not supported by NetBSD
env(1). To fix the problem, I've changed FileCheck's behavior
to treat an empty environment variable as equivalent to a disabled one
(<a href="https://github.com/llvm/llvm-project/commit/ffc722a3581775cf6c4bbedf1364434a932dc378">r367122</a>),
and then switched lit to set both envvars to empty instead
(<a href="https://github.com/llvm/llvm-project/commit/40a10446c080c88176d2c6766736932cd1a48afa">r367123</a>).</p>
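<p>The combination of the two fixes can be illustrated with a small Python sketch (the variable name is hypothetical): the consumer treats an empty environment variable the same as an absent one, so callers can portably "unset" it by assigning an empty value instead of using the non-POSIX <code>env -u VAR</code>:</p>

```python
import os

def option_enabled(name: str) -> bool:
    """Treat an empty environment variable as if it were unset, so that a
    caller can disable it portably with `VAR= cmd` instead of `env -u VAR cmd`."""
    value = os.environ.get(name, "")
    return value != ""

# Portable equivalent of `env -u MY_CHECK_OPTS`: set it to the empty string.
os.environ["MY_CHECK_OPTS"] = ""
print(option_enabled("MY_CHECK_OPTS"))  # an empty value counts as disabled
```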
<p>Finally, I've investigated a number of new test failures by the end of
the month:</p>
<ol>
<li>
<p>New <code>functionalities/signal/handle-abrt</code> test was added in
<a href="https://github.com/llvm/llvm-project/commit/3fd917d8860e9bdcabc14c536da4377307906be0">r366580</a>.
Since trampolines are not properly supported at the moment, I've marked it XFAIL in <a href="https://github.com/llvm/llvm-project/commit/e0ab4c8ee468d1b747e09bc6d1d1f8617a897d4a">r367228</a>.</p>
</li>
<li>
<p>Two <code>functionalities/exec</code> tests started failing since upstream fixed
<code>@skipIfSanitized</code> that previously caused the test to be skipped
unconditionally
(<a href="https://github.com/llvm/llvm-project/commit/3a12e73f6729db2ad4e3c3c8624aff56263f06ba">r366903</a>).
Since it's not a regression, I've marked it XFAIL in
<a href="https://github.com/llvm/llvm-project/commit/e0ab4c8ee468d1b747e09bc6d1d1f8617a897d4a">r367228</a>.</p>
</li>
<li>
<p>Same happened for one of the <code>python_api/hello_world</code> tests. It was
clearly related to another failing test, so I've marked it XFAIL
in
<a href="https://github.com/llvm/llvm-project/commit/f9108f76fa877926b9989d10aa08cd12c3bfdc5f">r367285</a>.</p>
</li>
<li>
<p>Two <code>tools/lldb-vscode</code> tests were failing since upstream compared
realpath'd path with normal path, and our build path happens to
include symlinks as one of the parent directories. I've fixed the
test to compare realpath in
<a href="https://github.com/llvm/llvm-project/commit/89a214eaf10bb83082c5bed8805c4bbdd4145e09">r367291</a>.
While at it, I've replaced weird <code>os.path.split(...)[0]</code> with clearer
<code>os.path.dirname(...)</code> as suggested by Pavel Labath, in
<a href="https://github.com/llvm/llvm-project/commit/71e32aca46d122c801a7db58a0d84d387fa80b29">r367290</a>.</p>
</li>
</ol>
<h2>NetBSD ptrace() interfaces for thread control</h2>
<p>NetBSD currently provides two different methods for thread-related
operations.</p>
<p>The legacy method consists of the following requests:</p>
<ul>
<li>
<p><code>PT_CONTINUE</code> with negative <code>data</code> argument. It is used to resume
execution of a single thread while suspending all other threads.</p>
</li>
<li>
<p><code>PT_STEP</code> with positive <code>data</code> argument. It is used to single-step
the specified thread, while all other threads continue execution.</p>
</li>
<li>
<p><code>PT_STEP</code> with negative <code>data</code> argument. It is used to single-step
the specified thread, while all other threads remain suspended.</p>
</li>
</ul>
<p>This means that using those methods, you can effectively either:</p>
<ul>
<li>
<p>run all threads, and optionally send signal to the process,</p>
</li>
<li>
<p>run one thread, while keeping other threads suspended,</p>
</li>
<li>
<p>single-step one thread, with all other threads either running or being
suspended as a whole.</p>
</li>
</ul>
<p>Furthermore, it is impossible to combine single-stepping with syscall
tracing via <code>PT_SYSCALL</code>.</p>
<p>The new method introduced by Kamil Rytarowski during his ptrace(2) work
is more flexible, and includes the following requests:</p>
<ul>
<li>
<p><code>PT_RESUME</code> that sets the specified thread to continue running after
<code>PT_CONTINUE</code>.</p>
</li>
<li>
<p><code>PT_SUSPEND</code> that sets the specified thread to remain suspended after
<code>PT_CONTINUE</code>.</p>
</li>
<li>
<p><code>PT_SETSTEP</code> that enables single-stepping for the specified thread
after <code>PT_CONTINUE</code>.</p>
</li>
<li>
<p><code>PT_CLEARSTEP</code> that disables single-stepping for the specified thread.</p>
</li>
</ul>
<p>Using the new API, it is possible to control both execution and single-
stepping per thread, and to combine syscall tracing with that. It is
also possible to deliver a single signal either to the whole process or
to one of the threads.</p>
<h2>Implementing threading in LLDB NetBSD process plugin</h2>
<p>When I started my work, support for threads in the NetBSD plugin
was minimal. Technically, the code had the structures needed to track
threads and filled them in at startup. However, it did not register
new or terminated threads, and it did not support per-thread execution
control.</p>
<p>The first change necessary was therefore to implement support for
reporting new and terminated threads. I've prepared an initial patch
in <a href="https://reviews.llvm.org/D65555">D65555</a>. With this patch enabled,
the <code>thread list</code> command now correctly reports the list of threads
at any moment.</p>
<p>The second change necessary is to fix the process-resuming routine to
support multiple threads properly. The routine is passed a data
structure containing the requested action for each thread. The old code
simply took the action for the first thread and applied it to the whole
process. <a href="https://reviews.llvm.org/D64647">D64647</a> is my
work-in-progress attempt at using the new ptrace calls to apply the
correct action for each thread.</p>
<p>However, the patch is currently buggy, as it assumed that LLDB would
provide an explicit <code>eStateSuspended</code> action for each thread that is
supposed to be suspended. The current LLDB implementation, on the other
hand, assumes that a thread should be suspended if no action is specified
for it. I am currently discussing with upstream whether the current
approach is correct, or whether it should be changed to explicit
<code>eStateSuspended</code> usage.</p>
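<p>The default under discussion can be modeled in a few lines of Python (a sketch of the logic only, not LLDB's actual code): the resume routine receives a map of per-thread actions, and any thread without an explicit action defaults to being suspended before <code>PT_CONTINUE</code>:</p>

```python
# Action names modeled on the ptrace(2) requests described above.
RESUME, SUSPEND, STEP = "PT_RESUME", "PT_SUSPEND", "PT_SETSTEP"

def resolve_actions(thread_ids, requested):
    """Return the ptrace-style request to apply to each thread before
    PT_CONTINUE.  A thread with no explicit action is suspended, which is
    the default the current LLDB implementation assumes."""
    return {tid: requested.get(tid, SUSPEND) for tid in thread_ids}

# Thread 1 runs, thread 3 single-steps, thread 2 gets no action and suspends.
actions = resolve_actions([1, 2, 3], {1: RESUME, 3: STEP})
print(actions)
```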
<p>The third change necessary is that we need to explicitly copy debug
registers to newly created threads, in order to enable watchpoints
on them. However, I haven't gotten to writing a patch for this yet.</p>
<h2>Fixing nasty <code>process interrupt</code> bug</h2>
<p>While debugging my threading code, I've hit a nasty bug in LLDB. After
issuing the <code>process interrupt</code> command from a remote LLDB session, the server
terminated. After putting a lot of effort into debugging why the server
terminates with no obvious error, I've discovered that it's terminating
because… the client has disconnected.</p>
<p>Further investigation with the help of Pavel Labath uncovered that
the client was silently disconnecting because it expected a packet
indicating that the process has stopped, and timed out waiting for it.
In order to make the server send this packet, the NetBSD process plugin
needed to explicitly mark the process as stopped in the <code>SIGSTOP</code> handler.
I've fixed it in <a href="https://github.com/llvm/llvm-project/commit/e1c159e86ac2f109d0d4d9342721ce78532660cf">r367047</a>.</p>
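<p>For context, the missing packet is a gdb-remote stop-reply of the form <code>$Txx…;#cs</code>, where <code>cs</code> is the modulo-256 sum of the payload bytes rendered as two hex digits. A minimal sketch of that framing (the payload contents are illustrative; 17, hex 11, is gdb's internal signal number for SIGSTOP):</p>

```python
def frame_packet(payload: str) -> str:
    """Wrap a gdb-remote payload in $...#checksum framing; the checksum is
    the sum of the payload bytes modulo 256, as two lowercase hex digits."""
    checksum = sum(payload.encode()) % 256
    return f"${payload}#{checksum:02x}"

# Illustrative stop-reply: signal 0x11 (SIGSTOP in gdb numbering), thread 1.
print(frame_packet("T11thread:1;"))
```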
<h2>Future plans</h2>
<p>The initial 6 months of my LLDB contract have passed. I am currently
taking a month's break from the work, then I will resume it for 3 more
months. During that time, I will continue working on threading support
and my remaining goals.</p>
<p>The remaining TODO items are:</p>
<ol>
<li>
<p>Add support to backtrace through signal trampoline and extend the support to
libexecinfo, unwind implementations (LLVM, nongnu). Examine adding CFI
support to interfaces that need it to provide more stable backtraces (both
kernel and userland).</p>
</li>
<li>
<p>Add support for i386 and aarch64 targets.</p>
</li>
<li>
<p>Stabilize LLDB and address breaking tests from the test suite.</p>
</li>
<li>
<p>Merge LLDB with the base system (under LLVM-style distribution).</p>
</li>
</ol>
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a href="https://netbsd.org/donations/#how-to-donate">https://netbsd.org/donations/#how-to-donate</a></p>
https://blog.netbsd.org/tnf/entry/enchancing_syzkaller_support_for_netbsdEnchancing Syzkaller Support for NetBSD, Part 2Kamil Rytarowski2019-08-02T16:07:05+00:002019-08-02T16:07:05+00:00<p>Prepared by Siddharth Muralee(@R3x) as a part of Google Summer of Code’19</p>
<p>As a part of Google Summer of Code’19, I am working on improving the support for Syzkaller kernel fuzzer. Syzkaller
is an unsupervised coverage-guided kernel fuzzer, that supports a variety of operating systems including NetBSD.
This report details the work done during the second coding period.</p>
<p>You can also take a look at the <a href="https://blog.netbsd.org/tnf/entry/enhancing_syzkaller_support_for_netbsd">
first report</a> to learn more about the initial support that we added.</p>
<h2>Network Packet Injection</h2>
<p>As part of improving fuzzing support for the NetBSD kernel, we decided to add support for fuzzing the network
stack. This feature already exists for operating systems such as Linux and OpenBSD.</p>
<h3>Motivation</h3>
<p>The aim is to fuzz the network stack by sending packets with malformed/random data and seeing whether they cause
any kernel anomalies. We aim to send packets in such a way that the kernel code paths exercised during ordinary use
of the networking system are also triggered here. This is achieved using the TAP/TUN interface.</p>
<h3>TAP/TUN Interface</h3>
<p>A TAP/TUN interface is a software-only interface, meaning no hardware links are involved. This makes it an
ideal option for interacting with the kernel networking code.</p>
<p>Userspace programs can create TAP/TUN interfaces and then write properly formatted data into them, which is then
sent to the kernel. Likewise, reading from a TAP/TUN interface gives us the data that the interface is sending
out.</p>
<h3>Basic Design</h3>
<p>We create a virtual interface using TAP/TUN and send packets through it. We add two syscalls -
<i>syz_emit_ethernet</i> and <i>syz_extract_tcp_res</i>. The former does the job of sending a packet into the kernel
and the latter receives the response.</p>
<p><img src="//netbsd.org/~kamil/gsoc_2019/Image_0.png" /></p>
<p>We need the response from the kernel to be read because we need the TCP acknowledgement number to be used for the
next packet that we send. So syz_extract_tcp_res also extracts the acknowledgement number from the reply the kernel
sent. <a href="https://packetlife.net/blog/2010/jun/7/understanding-tcp-sequence-acknowledgment-numbers/">This
article</a> explains the concept of TCP acknowledgement and sequence numbers very well.</p>
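<p>The extraction step itself just reads one 32-bit field out of the TCP header. A stand-alone Python sketch of the parsing (the real <i>syz_extract_tcp_res</i> lives in syzkaller's executor, not in Python):</p>

```python
import struct

def tcp_ack_number(segment: bytes) -> int:
    """Return the acknowledgement number from a raw TCP segment.
    TCP header layout: source port (2 bytes), destination port (2),
    sequence number (4), acknowledgement number (4), network byte order."""
    _src, _dst, _seq, ack = struct.unpack("!HHII", segment[:12])
    return ack

# Build a dummy header with a known ack number and read it back.
header = struct.pack("!HHII", 1234, 80, 1000, 4242) + b"\x00" * 8
print(tcp_ack_number(header))  # → 4242
```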
<h3>Parallelizing Network fuzzing</h3>
<p>In the syzkaller config you can define the number of processes that syzkaller runs in a single VM instance.
Since we can have multiple instances of the fuzzer (executor) running at the same time, we need to make sure that
there are no collisions between the interfaces. To solve this, we create a separate interface per fuzzer and assign
it a different IP address (both IPv4 and IPv6) to create an isolated network.</p>
<p><img src="//netbsd.org/~kamil/gsoc_2019/Image_1.png" /></p>
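<p>The per-fuzzer isolation can be sketched as follows; the interface name and address ranges here are invented for illustration, and syzkaller's actual numbering scheme differs:</p>

```python
def fuzzer_network(proc_id: int) -> dict:
    """Assign each fuzzer process its own interface and addresses so the
    generated traffic stays inside an isolated network.  The tap naming and
    the address ranges are illustrative, not syzkaller's real scheme."""
    return {
        "interface": f"tap{proc_id}",
        "ipv4": f"172.16.{proc_id}.2",
        "ipv6": f"fe80::{proc_id:x}:2",
    }

for proc in range(2):
    print(fuzzer_network(proc))
```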
<h2>Filesystem Image fuzzing</h2>
<p>A relatively unexplored area of fuzzing is filesystem image fuzzing. Syzkaller supports filesystem fuzzing only
for the Linux kernel. We are currently working on porting the existing Linux support and then improving it.</p>
<h3>Motivation</h3>
<p>The aim is to fuzz filesystem-specific code by mounting custom images and then performing operations on them.
This leads to execution of the kernel's filesystem code and allows us to find potential bugs.
</p>
<h3>Existing Design</h3>
<p>As I mentioned in the previous report, syzkaller generates inputs based on a pseudo-formal grammar. This allows us
to also write grammar to generate filesystem images on the fly. This is what the current implementation does: generate
random images, write the segments into memory, and then mount the resulting image.</p>
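<p>The image-assembly step can be modeled with a short Python sketch (a simplified model of the mechanism, not syzkaller's actual code): the grammar yields a list of (offset, data) segments, which are written into a zero-filled buffer before the result is mounted.</p>

```python
def build_image(size: int, segments) -> bytes:
    """Assemble a filesystem image from (offset, data) segments, in the
    spirit of the grammar-generated descriptions mentioned above."""
    image = bytearray(size)  # zero-filled backing buffer
    for offset, data in segments:
        image[offset:offset + len(data)] = data
    return bytes(image)

# A toy "image": a magic number at offset 0 and a label at offset 64.
img = build_image(128, [(0, b"\x13\x37"), (64, b"GSOC")])
print(img[64:68])  # → b'GSOC'
```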
<h2>Miscellaneous work</h2>
<p>I have also been fine-tuning the syzkaller fuzzer whenever necessary. This involves adding new features and fixing
issues.</p>
<h3>Coverage Display</h3>
<p>Syzkaller has a utility which takes the coverage from KCOV, marks the corresponding lines in the source code, and
displays them. This feature wasn't working for some operating systems.</p>
<p>The issue was that syzkaller stripped the common prefix from all the file paths retrieved from KCOV and then
appended the directory where the source code is present in order to access the files.</p>
<p><img src="//netbsd.org/~kamil/gsoc_2019/Image_2.png" width="800" /></p>
<p>Stripping the common prefix made sense for Linux, since its files are distributed across multiple folders in the
<i>src/</i> directory. This created issues for NetBSD, since almost all of the kernel files are present in
<i>src/sys/</i>, which led to `sys` also being removed from the resulting path, producing an invalid file name.</p>
<p>I worked on revamping syz-manager so that it removes the prefix computation altogether and takes the prefix to
strip as part of the syz-manager config. We then prepend the file path of the kernel sources to get the path to
the file.</p>
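<p>A minimal Python sketch of the revised path computation (names and values are illustrative; the real implementation is in syz-manager's Go code):</p>

```python
import os

def source_path(kcov_path: str, strip_prefix: str, src_dir: str) -> str:
    """Map a file path reported by KCOV to a path under the kernel sources:
    strip the prefix given in the config (instead of a computed common
    prefix) and prepend the source directory."""
    if kcov_path.startswith(strip_prefix):
        kcov_path = kcov_path[len(strip_prefix):]
    return os.path.join(src_dir, kcov_path.lstrip("/"))

# With a configured prefix, `sys/` survives in the resulting path.
print(source_path("/tmp/build/sys/kern/kern_exec.c", "/tmp/build", "/usr/src"))
```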
<p>The <a href="https://storage.googleapis.com/syzkaller/cover/ci2-netbsd.html"> coverage for NetBSD</a> can be viewed
on the NetBSD dashboard.</p>
<h2>TODO</h2>
<p>The filesystem fuzzing code isn't upstream yet; this will be done shortly.</p>
<p>I have added basic support for fuzzing both the filesystem and the network stack. There are now a lot of
improvements to be made, mainly adding more detailed and improved descriptions.</p>
<h2>Relevant Links</h2>
<ul>
<li><a href="https://syzkaller.appspot.com/netbsd">Syzkaller Dashboard for NetBSD</a></li>
<li><a href="https://github.com/google/syzkaller">Syzkaller repository on Github</a></li>
<li><a href="https://github.com/google/syzkaller/tree/master/docs/netbsd/README.md">NetBSD docs on setting up
syzkaller</a></li>
</ul>
<h2>Summary</h2>
<p>So far, we have found around 70 unique crashes with syzkaller. During the final coding period I will be working on
improving support for filesystem fuzzing.</p>
<p>Last but not least, I want to thank my mentors, @kamil and @cryo, for their useful suggestions and guidance. I would
also like to thank Dmitry Vyukov of Google for helping with any issues faced with regard to Syzkaller. Finally, thanks
to Google for giving me the chance to work with the NetBSD community.</p>https://blog.netbsd.org/tnf/entry/adapting_triforceafl_for_netbsd_part1Adapting TriforceAFL for NetBSD, Part 2Kamil Rytarowski2019-08-02T15:02:39+00:002019-08-02T15:02:39+00:00Prepared by Akul Pillai as part of GSoC 2019.
<p>I have been working on adapting TriforceAFL for NetBSD kernel syscall fuzzing. This blog post summarizes the work done until the second evaluation.</p>
<p>For work done during the first coding period, check out this <a href="//blog.netbsd.org/tnf/entry/adapting_triforceafl_for_netbsd_part">post</a>.</p>
<h2 id="inputgeneration">Input Generation</h2>
<p>For a feedback-driven, mutation-based fuzzer such as TriforceAFL, fuzzing can be greatly improved by providing proper input test cases. The fuzzer can then alter parts of a valid input, leading to more coverage and, hopefully, more bugs.
The TriforceNetBSDSyscallFuzzer was already a working fuzzer at the end of the first evaluation, but it was missing proper input generation for most of the syscalls.<br>
A greater part of this coding period was spent adding and testing basic templates for a majority of NetBSD syscalls; scripts have also been added for cases where more complex input generation was required.
This should now allow the fuzzer to find bugs it previously could not.</p>
<p>Templates for 160 of the 483 syscalls in NetBSD have been added, below is the complete list:</p>
<code>1 exit, 2 fork, 3 read, 4 write, 5 open, 6 close, 7 compat_50_wait4, 8 compat_43_ocreat, 9 link, 10 unlink, 12 chdir, 13 fchdir, 14 compat_50_mknod, 15 chmod, 16 chown, 17 break, 19 compat_43_olseek, 20 getpid, 22 unmount, 23 setuid, 24 getuid, 25 geteuid, 26 ptrace, 33 access, 34 chflags, 35 fchflags, 36 sync, 37 kill, 39 getppid, 41 dup, 42 pipe, 43 getegid, 44 profil, 45 ktrace, 47 getgid, 49 __getlogin, 50 __setlogin, 51 acct, 55 compat_12_oreboot, 56 revoke, 57 symlink, 58 readlink, 59 execve, 60 umask, 61 chroot, 62 compat_43_fstat43, 63 compat_43_ogetkerninfo, 64 compat_43_ogetpagesize, 66 vfork, 73 munmap, 78 mincore, 79 getgroups, 80 setgroups, 81 getpgrp, 82 setpgid, 83 compat_50_setitimer, 86 compat_50_getitimer, 89 compat_43_ogetdtablesize, 90 dup2, 95 fsync, 96 setpriority, 97 compat_30_socket, 100 getpriority, 106 listen, 116 compat_50_gettimeofday, 117 compat_50_getrusage, 120 readv, 121 writev, 122 compat_50_settimeofday, 123 fchown, 124 fchmod, 126 setreuid, 127 setregid, 128 rename, 131 flock, 132 mkfifo, 134 shutdown, 135 socketpair, 136 mkdir, 137 rmdir, 140 compat_50_adjtime, 147 setsid, 161 compat_30_getfh, 165 sysarch, 181 setgid, 182 setegid, 183 seteuid, 191 pathconf, 192 fpathconf, 194 getrlimit, 195 setrlimit, 199 lseek, 200 truncate, 201 ftruncate, 206 compat_50_futimes, 207 getpgid, 209 poll, 231 shmget, 232 compat_50_clock_gettime, 233 compat_50_clock_settime, 234 compat_50_clock_getres, 240 compat_50_nanosleep, 241 fdatasync, 242 mlockall, 243 munlockall, 247 _ksem_init, 250 _ksem_close, 270 __posix_rename, 272 compat_30_getdents, 274 lchmod, 275 lchown, 276 compat_50_lutimes, 289 preadv, 290 pwritev, 286 getsid, 296 __getcwd, 306 utrace, 344 kqueue, 157 compat_20_statfs, 158 compat_20_fstatfs, 416 __posix_fadvise50, 173 pread, 174 pwrite, 197 mmap, 462 faccessat, 463 fchmodat, 464 fchownat, 461 mkdirat, 459 mkfifoat, 460 mknodat, 468 openat, 469 readlinkat, 458 renameat, 470 symlinkat, 471 unlinkat, 453 pipe2, 467 utimensat
</code>
<p>A separate script (targ/gen2.py) generating trickier input cases was added for the following:</p>
<code>104 bind, 105 setsockopt, 118 getsockopt, 98 connect, 30 accept, 31 getpeername, 32 getsockname, 133 sendto, 29 recvfrom, 21 compat_40_mount, 298 compat_30_fhopen
299 compat_30_fhstat, 300 compat_20_fhstatfs, 93 compat_50_select, 373 compat_50_pselect, 345 compat_50_kevent, 92 fcntl, 74 mprotect, 203 mlock, 273 minherit, 221 semget, 222 semop, 202 __sysctl
</code>
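<p>Syscalls like bind and connect are trickier because their arguments are serialized C structures rather than plain integers. A hedged Python sketch of that packing step for a BSD <code>struct sockaddr_in</code> (the layout assumed here, with a leading <code>sin_len</code> byte and 8 bytes of <code>sin_zero</code> padding, is the classic BSD one; the real gen2.py may differ):</p>

```python
import socket
import struct

def pack_sockaddr_in(port: int, addr: str) -> bytes:
    """Pack a BSD-style struct sockaddr_in as a 16-byte blob suitable
    for use as a bind/connect argument template."""
    return struct.pack("!BBH4s8s",
                       16,                      # sin_len: total structure size
                       socket.AF_INET,          # sin_family
                       port,                    # sin_port, network byte order
                       socket.inet_aton(addr),  # sin_addr
                       b"\x00" * 8)             # sin_zero padding

sa = pack_sockaddr_in(8080, "127.0.0.1")
print(len(sa))  # → 16
```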
<img src="//netbsd.org/~kamil/gsoc_2019/crash_screenshot.png">
<h2 id="reproducibility">Reproducibility</h2>
<p>The fuzzer uses the simplest way to reproduce a crash: it stores the exact input for the test case that caused it. This input can then be passed to the driver program, which parses it and executes the recorded syscalls in their original order.<br>
A prototype of a better reproducer generator has now been added to the fuzzer, which provides human-readable, executable C code. This C code can be compiled and executed to reproduce the crash.<br>
To generate the reproducers simply run the <code>genRepro</code> script from the <code>targ</code> directory. If everything goes right, the C reproducers will now be available in <code>targ/reproducers</code>, in separate files as follows:<br>
<pre><code>
// id:000009,sig:00,src:005858+002155,op:splice,rep:8
#include <sys/syscall.h>
#include <unistd.h>
int main() {
__syscall( SYS_mincore, 0x/* removed from the report due to security concerns */, 0x/* ... */, 0x/* ... */);
return 0;
}
// id:000010,sig:00,src:005859+004032,op:splice,rep:2
#include <sys/syscall.h>
#include <unistd.h>
int main() {
__syscall( SYS_mprotect, 0x/* ... */, 0x/* ... */, 0x/* ... */);
__syscall( SYS_mincore, 0x/* ... */, 0x/* ... */, 0x/* ... */);
return 0;
}
// id:000011,sig:00,src:005859+004032,op:splice,rep:4
#include <sys/syscall.h>
#include <unistd.h>
int main() {
__syscall( SYS_mprotect, 0x/* ... */, 0x/* ... */, 0x/* ... */);
__syscall( SYS_mincore, 0x/* ... */, 0x/* ... */, 0x/* ... */);
return 0;
}
</code></pre>
The reproducers currently do not include the allocated memory and such, so not all reproducers will work.
More improvements are to come, but this will hopefully make analysis of the crashes easier.</p>
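<p>The shape of such a generator can be sketched in a few lines of Python. This is an illustrative model only; the record structure and helper name are hypothetical, not the actual genRepro code:</p>

```python
def emit_reproducer(calls, comment: str) -> str:
    """Render a parsed crash record as a compilable C reproducer, in the
    spirit of the genRepro output shown above.  `calls` is a list of
    (syscall_name, args) pairs."""
    lines = [f"// {comment}",
             "#include <sys/syscall.h>",
             "#include <unistd.h>",
             "int main() {"]
    for name, args in calls:
        arglist = ", ".join(f"0x{a:x}" for a in args)
        lines.append(f"    __syscall( SYS_{name}, {arglist});")
    lines += ["    return 0;", "}"]
    return "\n".join(lines)

src = emit_reproducer([("mprotect", [0x1000, 0x1000, 0x7]),
                       ("mincore", [0x1000, 0x1000, 0x0])],
                      "id:000010,op:splice")
print(src)
```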
<h2 id="fuzzing">Fuzzing</h2>
<p>The fuzzer was run for ~4 days, at ~50 execs/sec on a single core. Please note that this uses qemu with software emulation (TCG), as hardware acceleration cannot be used here. During this period the fuzzer detected 23 crashes that it marked as unique.
Not all of these were truly unique; there is scope for adding a secondary filter to detect truly unique crashes. Most of them were duplicates of the crashes highlighted below:</p>
<h4 id="compat_43_osendmsgtcp_outputnotemplate">compat_43_osendmsg - tcp_output: no template</h4>
<pre><code>call 114 - compat_43_osendmsg
arg 0: argStdFile 4 - type 12
arg 1: argVec64 77d0549cc000 - size 4
arg 2: argNum 8003
read 83 bytes, parse result 0 nrecs 1
syscall 114 (4, 77d0549cc000, 8003)
[ 191.8169124] panic: tcp_output: no template
[ 191.8169124] cpu0: Begin traceback...
[ 191.8269174] vpanic() at netbsd:vpanic+0x160
[ 191.8269174] snprintf() at netbsd:snprintf
[ 191.8269174] tcp_output() at netbsd:tcp_output+0x2869
[ 191.8385864] tcp_sendoob_wrapper() at netbsd:tcp_sendoob_wrapper+0xfe
[ 191.8469824] sosend() at netbsd:sosend+0x6e3
[ 191.8469824] do_sys_sendmsg_so() at netbsd:do_sys_sendmsg_so+0x231
[ 191.8469824] do_sys_sendmsg() at netbsd:do_sys_sendmsg+0xac
[ 191.8569944] compat_43_sys_sendmsg() at netbsd:compat_43_sys_sendmsg+0xea
[ 191.8569944] sys___syscall() at netbsd:sys___syscall+0x74
[ 191.8655484] syscall() at netbsd:syscall+0x181
[ 191.8655484] --- syscall (number 198) ---
[ 191.8655484] 40261a:
[ 191.8655484] cpu0: End traceback...
[ 191.8655484] fatal breakpoint trap in supervisor mode
[ 191.8655484] trap type 1 code 0 rip 0xffffffff8021ddf5 cs 0x8 rflags 0x202 cr2
0x7f7795402000 ilevel 0x4 rsp 0xffffc58032d589f0
[ 191.8655484] curlwp 0xfffff7e8b1c63220 pid 700.1 lowest kstack 0xffffc58032d55
2c0
Stopped in pid 700.1 (driver) at netbsd:breakpoint+0x5: leave
</code></pre>
<h4 id="mincoreuvm_fault_unwire_lockedaddressnotinmap">mincore - uvm_fault_unwire_locked: address not in map</h4>
<pre><code>call 78 - mincore
arg 0: argNum d000000
arg 1: argNum 7600000000000000
arg 2: argNum 1b0000000000
read 65 bytes, parse result 0 nrecs 1
syscall 78 (d000000, 7600000000000000, 1b0000000000)
[ 141.0578675] panic: uvm_fault_unwire_locked: address not in map
[ 141.0578675] cpu0: Begin traceback...
[ 141.0691345] vpanic() at netbsd:vpanic+0x160
[ 141.0691345] snprintf() at netbsd:snprintf
[ 141.0774205] uvm_fault_unwire() at netbsd:uvm_fault_unwire
[ 141.0774205] uvm_fault_unwire() at netbsd:uvm_fault_unwire+0x29
[ 141.0774205] sys_mincore() at netbsd:sys_mincore+0x23c
[ 141.0884435] sys___syscall() at netbsd:sys___syscall+0x74
[ 141.0884435] syscall() at netbsd:syscall+0x181
[ 141.0884435] --- syscall (number 198) ---
[ 141.0996065] 40261a:
[ 141.0996065] cpu0: End traceback...
[ 141.0996065] fatal breakpoint trap in supervisor mode
[ 141.0996065] trap type 1 code 0 rip 0xffffffff8021ddf5 cs 0x8 rflags 0x202 cr2
0x761548094000 ilevel 0 rsp 0xffff870032e48d90
[ 141.0996065] curlwp 0xffff829094e51b00 pid 646.1 lowest kstack 0xffff870032e45
2c0
Stopped in pid 646.1 (driver) at netbsd:breakpoint+0x5: leave
</code></pre>
<h4 id="extattrctlkassertfail">extattrctl - KASSERT fail</h4>
<pre><code>call 360 - extattrctl
arg 0: argBuf 7b762b514044 from 2 bytes
arg 1: argNum ff8001
arg 2: argFilename 7b762b515020 - 2 bytes from /tmp/file0
arg 3: argNum 0
arg 4: argNum 0
arg 5: argNum 2100000000
read 59 bytes, parse result 0 nrecs 1
syscall 360 (7b762b514044, ff8001, 7b762b515020, 0, 0, 2100000000)
[ 386.4528838] panic: kernel diagnostic assertion "fli->fli_trans_cnt == 0" fail
ed: file "src/sys/kern/vfs_trans.c", line 201
[ 386.4528838] cpu0: Begin traceback...
[ 386.4528838] vpanic() at netbsd:vpanic+0x160
[ 386.4648968] stge_eeprom_wait.isra.4() at netbsd:stge_eeprom_wait.isra.4
[ 386.4724138] fstrans_lwp_dtor() at netbsd:fstrans_lwp_dtor+0xbd
[ 386.4724138] exit1() at netbsd:exit1+0x1fa
[ 386.4724138] sys_exit() at netbsd:sys_exit+0x3d
[ 386.4832968] syscall() at netbsd:syscall+0x181
[ 386.4832968] --- syscall (number 1) ---
[ 386.4832968] 421b6a:
[ 386.4832968] cpu0: End traceback...
[ 386.4832968] fatal breakpoint trap in supervisor mode
[ 386.4944688] trap type 1 code 0 rip 0xffffffff8021ddf5 cs 0x8 rflags 0x202 cr2
0xffffc100324bd000 ilevel 0 rsp 0xffffc10032ce9dc0
[ 386.4944688] curlwp 0xfffff6278e2fc240 pid 105.1 lowest kstack 0xffffc10032ce6
2c0
Stopped in pid 105.1 (driver) at netbsd:breakpoint+0x5: leave
</code></pre>
<h2 id="pkgsrcpackage">pkgsrc Package</h2>
<p>Lastly, the TriforceNetBSDSyscallFuzzer has now been made available in the form of a pkgsrc package in pkgsrc/wip as <a href="http://pkgsrc.se/wip/triforcenetbsdsyscallfuzzer">triforcenetbsdsyscallfuzzer</a>. The package will require wip/triforceafl which was ported earlier.<br>
All other changes mentioned can be found in the github <a href="https://github.com/akulpillai/TriforceNetBSDSyscallFuzzer">repo</a>.</p>
<h2 id="script">script(1) recording</h2>
<p>A typescript recording of a functional TriforceAFL fuzzer setup and execution is available <a href="//netbsd.org/~kamil/gsoc_2019/triforceafl_eval2_2019_08_02.typescript">here</a>.
Download it and replay it with <code>script -p</code>.</p>
<h2 id="futurework">Future Work</h2>
<p>Work that remains to be done includes: </p>
<ul>
<li>Restructuring of files <br>
The file structure needs to be modified to suit the specific case of the Host and Target being the same OS. Right now, files are separated into Host and Target directories, this is not required.</li>
<li>Testing with Sanitizers enabled<br>
Until now the fuzzing done was without using KASAN or kUBSAN. Testing with them enabled and fuzzing with them will be of major focus in the third coding period.</li>
<li>Improving the 'reproducer generator'<br>
There is some scope of improvement for the prototype that was added. Incremental updates to it are to be expected.</li>
<li>Analysis of crash reports and fixing bugs</li>
<li>Documentation</li>
</ul>
<h2 id="summary">Summary</h2>
<p>So far, the TriforceNetBSDSyscallFuzzer has been made available as a pkgsrc package with the ability to fuzz most NetBSD syscalls. In the final coding period of GSoC, I plan to analyse the crashes found so far, integrate sanitizers, try to find more bugs, and finally wrap up neatly with detailed documentation.</p>
<p>Last but not least, I would like to thank my mentor, Kamil Rytarowski for helping me through the process and guiding me. It has been a wonderful learning experience so far!</p>https://blog.netbsd.org/tnf/entry/gsoc_2019_report_incorporating_theGSoC 2019 Report: Incorporating the memory-hard Argon2 hashing scheme into NetBSDKamil Rytarowski2019-07-09T10:13:28+00:002019-07-09T10:13:28+00:00This report was prepared by Jason High as a part of Google Summer of Code 2019
<p>
Argon2 is a modern memory-hard hashing scheme designed by Biryukov et al.[1] Compared to currently supported hashing algorithms in NetBSD, memory-hard Argon2 provides improved resistance against Time Memory Trade-off (TMTO) and side-channel attacks. In our project, we are working to incorporate Argon2 into the local password management framework of NetBSD.
</p>
<h4>Phase 1 goals and work completed</h4>
<p>
Phase 1 of the project focused on incorporating the Argon2 reference implementation into NetBSD. As such, we concentrated on building the associated libraries and integrating the functionality into the existing password management framework. Our initial phase 1 goals were as follows:
</p>
<ul>
<li>Integrate Argon2 reference code into the existing build framework
<li>Support automated building and installation of argon2 binary and libraries
<li>Extend the existing password management framework to support Argon2 encoding
</ul>
<p>
Towards these goals, we have added the Argon2 reference code into the external source tree and created the necessary build scripts. This work allows us to add Argon2 to the system by setting MKARGON2=yes in /usr/share/mk/bsd.own.mk. After a successful build and installation, we have the following:
</p>
<pre>
/usr/bin/argon2
/lib/libargon2.so
/lib/libargon2.so.1
/lib/libargon2.so.1.0
</pre>
<p>
We then extended the functionality of pwhash(1) and libcrypt(3) to support Argon2 encoding. Currently, we support all three Argon2 variants, although not all variants are recommended (see [1][2]). We support the following standard parameters: execution time (t), memory utilized (m), and degree of parallelism (p). Salt length is currently fixed at the recommended 16 bytes.[1]
</p>
<p>
With our phase 1 goals successfully completed, we have the following functionality available. The argon2(1) binary allows us to easily validate parameters and encodings:
</p>
<div style="background-color:lightgrey;">
<pre>
m2# echo -n password|argon2 somesalt -id -p 3 -m 8
Type: Argon2id
Iterations: 3
Memory: 256 KiB
Parallelism: 3
Hash: 97f773f68715d27272490d3d2e74a2a9b06a5bca759b71eab7c02be8a453bfb9
Encoded: $argon2id$v=19$m=256,t=3,p=3$c29tZXNhbHQ$l/dz9ocV0nJySQ09LnSiqbBqW8p1m3Hqt8Ar6KRTv7k
0.000 seconds
Verification ok
</pre>
</div>
Argon2 support has been added to pwhash(1) using the -A flag, which takes the form -A variant[params], where variant is one of the following: argon2i, argon2d, or argon2id. [params] is a comma-delimited list of the following: p=%d, m=%d, or t=%d (see pwhash(1)). For example, to create an encoding of 'password' using the argon2id variant, we may execute the following
<div style="background-color:lightgrey;">
<pre>
m2# pwhash -A argon2id password
$argon2id$v=19$m=4096,t=3,p=1$.SJJCiU575MDnA8s$+pjT4JsF2eLNQuLPEyhRA5LCFGQWAKsksIPl5ewTWNY
</pre>
</div>
To encode 'password' using the argon2id variant with explicit specification for both parallelism and memory, we execute
<div style="background-color:lightgrey;">
<pre>
m2# pwhash -Aargon2id,p=3,m=8192 password
$argon2id$v=19$m=8192,t=3,p=3$gGs/lLnXIESuSl4H$fGuqUn2PeNeoCFqV3ASvNdkXLZ2A1wZTb2s7LTe4SE0
</pre>
</div>
We support local password hashing using passwd.conf(5). We accept the same parameters as pwhash(1). For example
<div style="background-color:lightgrey;">
<pre>
m1# grep -A1 testuser /etc/passwd.conf
testuser:
localcipher = argon2i,t=6,m=4096,p=1
</pre>
</div>
With the above configuration in place, we are able to support standard password management. For example
<div style="background-color:lightgrey;">
<pre>
m1# id testuser
uid=1001(testuser) gid=100(users) groups=100(users)
m1# grep testuser /etc/master.passwd
testuser:$argon2i$v=19$m=4096,t=6,p=1$MpbO25MF2m4Y/aQT$9STuNmQLMSgYBVoQiXyDLGcb+DSHysJOQh1spI6qEuE:1001:100::0:0::/home/testuser:/sbin/nologin
m1# passwd testuser
Changing password for testuser.
New Password:
Retype New Password:
m1# grep testuser /etc/master.passwd
testuser:$argon2i$v=19$m=4096,t=6,p=1$PDd65qr6JU0Pfnpr$8YOMYcwINuKHoxIV8Q0FJHG+RP82xtmAuGep26brilU:1001:100::0:0::/home/testuser:/sbin/nologin
</pre>
</div>
<h4>Plans for next phase</h4>
Phase 2 will focus on code cleanup and incorporation of any improvements suggested during review. We are also extending our ATF test-set and will begin our performance evaluation, which will be the primary deliverable for phase 2.
<h4>Summary</h4>
We have successfully integrated Argon2 into NetBSD using the native build framework. We have extended existing functionality to support local password management using Argon2 encoding. Moving forward in phase 2, we will work on cleanup, validation, and performance evaluation.
<h4>References</h4>
[1] Biryukov, Alex, Daniel Dinu, and Dmitry Khovratovich. "Argon2: new generation of memory-hard functions for password hashing and other applications." 2016 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2016.<br>
[2] Alwen, Joël, and Jeremiah Blocki. "Towards practical attacks on argon2i and balloon hashing." 2017 IEEE European Symposium on Security and Privacy (EuroS&P). IEEE, 2017.https://blog.netbsd.org/tnf/entry/write_your_own_fuzzer_forWrite your own fuzzer for NetBSD kernel! [Part 1]Kamil Rytarowski2019-07-02T22:29:52+00:002019-07-02T22:37:01+00:00This report was written by Maciej Grochowski as a part of developing the AFL+KCOV project.
<p>
<h2>How Fuzzing works? The dummy Fuzzer.</h2>
<p>
The easy way to describe fuzzing is to compare it to the process of unit testing a program, but with different input. This input can be random, or it can be generated in some way that makes it unexpected from a standard execution perspective.
<p>
The simplest 'fuzzer' can be written in a few lines of bash, by getting N bytes from <code>/dev/rand</code> and passing them to the program as a parameter.
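<p>A minimal sketch of such a dummy fuzzer might look like this; the target program is a placeholder (<code>echo</code> stands in for the program under test), and <code>/dev/urandom</code> is used here to avoid blocking:</p>

```shell
#!/bin/sh
# dummy fuzzer: feed N random bytes to a target program as a parameter
N=16
TARGET=${TARGET:-echo}   # placeholder; set TARGET to the program under test

# hex-encode the bytes so the argument survives the shell
INPUT=$(head -c "$N" /dev/urandom | od -An -tx1 | tr -d ' \n')
"$TARGET" "$INPUT"
```

<p>Run in a loop, this already exercises the target with unexpected input; everything that follows is about making the input generation smarter.</p>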
<p>
<center><img src="//netbsd.org/~kamil/fuzzer_art/krolik.png"></center>
<h2>Coverage and Fuzzing</h2>
<p>
What can be done to make fuzzing more effective? If we think about fuzzing as a process, where we place data into the input of the program (which is a black box), and we can only interact via input, not much more can be done.
<p>
However, programs usually process different inputs at different speeds, which can give us some insight into the program's behavior. During fuzzing, we are trying to crash the program, thus we need additional probes to observe the program's behaviour.
<p>
Additional knowledge about program state can be exploited as a feedback loop for generating new input vectors. Knowledge about the program itself and the structure of the input data can also be considered. As an example, if the input data is in the form of HTML, changing characters inside the body will probably cause fewer problems for the parser than experimenting with headers and HTML tags.
<p>
For open source programs, we can read the source code to know which input takes which execution path. Nonetheless, this might be very time consuming, and it would be much more helpful if it could be automated. As it turns out, this process can be improved by tracing the coverage of the execution.
<p>
<center><img src="//netbsd.org/~kamil/fuzzer_art/lab.png"></center>
<p>
AFL (American Fuzzy Lop) is one of the first successful fuzzers. It uses a technique where the program is compiled with injected traces for every execution branch instruction. During the program execution, every branch is counted, and the analyzer builds a graph out of execution paths and then explores different "interesting" paths.
<p>
Now, fuzzing has become a mainstream technique, and compilers provide an option to embed fuzzing hooks at compilation time via switches.
<p>
The same process can be applied to the kernel world. However, it would be quite hard to run another program on the same machine outside of the kernel to read these counters. Because of that, they usually are made available inside the kernel.
<p>
To illustrate how that is done, we can compile a <code>hello world</code> program written in C for tracing the Program Counter (PC).
<p>
<pre>
gcc main.c -fsanitize-coverage=trace-pc
/usr/local/bin/ld: /tmp/ccIKK7Eo.o: in function `handler':
main.c:(.text+0xd): undefined reference to `__sanitizer_cov_trace_pc'
/usr/local/bin/ld: main.c:(.text+0x1b): undefined reference to `__sanitizer_cov_trace_pc'
</pre>
<p>
The compiler added additional references to <code>__sanitizer_cov_trace_pc</code>, but we didn't implement it, or link with anything else providing the implementation. If we grep the NetBSD kernel sources for the same function, we will find an implementation in <code>sys/kern/subr_kcov.c</code>: <a href="https://github.com/NetBSD/src/blob/trunk/sys/kern/subr_kcov.c">kcov(4)</a>.
<p>
<h2>Which Fuzzer should I choose?</h2>
<p>
In recent years, AFL has grown into an industry standard. Many projects have integrated it into their development process. This has caused many different bugs and issues to be found and fixed in a broad spectrum of projects (see the AFL website for examples). As this technique has become mainstream, many people have started developing custom fuzzers. Some of them were just modified clones of AFL, but there were also many different and innovative approaches. Connecting a custom fuzzer or testing some unusual execution path is no longer considered just a hackathon project, but part of security research.
<p>
I personally believe that we are still in the early days of fuzzing. A lot of interesting work and research is already available, but we cannot yet explain or prove why one approach is better than another, how a reference fuzzer should work, or what its technical specifications should be.
<p>
Many approaches have been developed to do efficient fuzzing, and many bugs have been reported, but most of the knowledge comes still from empirical experiments and comparison between different techniques.
<p>
<h2>Modular kcov inside the kernel</h2>
<p>
Coverage metrics inside the kernel became a standard even before the fuzzing era. The primary use-case of coverage is not fuzzing, but testing and measuring test coverage. While code coverage is well understood, kernel fuzzing is still a kind of Wild West, where most projects have their own techniques. There are some great projects with a large community around them, like <code>Honggfuzz</code> and <code>Syzkaller</code>. Various companies and projects maintain several fuzzers for kernel code. This shows us that, as a kernel community, we need to be open and flexible to different approaches that allow people interested in fuzzing to do their job efficiently. In return, various fuzzers can find different sets of bugs and improve the overall quality of our kernel.
<p>
In the past, Oracle made some effort to upstream an interface for AFL inside the Linux kernel (<a href="https://lkml.org/lkml/2016/11/16/668">see the patch here</a>). However, the patches were rejected by the kernel community for various reasons.
<p>
We did our own research on the needs of fuzzers in context of kcov(4) internals, and quickly figured out that per-fuzzer changes in the main code do not scale up, and can leave unused code inside the kernel driver.
<p>
In NetBSD, we want to be compatible with <code>AFL</code>, <code>Honggfuzz</code>, <code>Syzkaller</code> and a few other fuzzers, so keeping all fuzzer-specific data inside one module would be hard to maintain.
<p>
One idea that we had was to keep raw coverage data inside the kernel, and process it inside the user space fuzzer module. Unfortunately, we found that the current coverage verbosity in the NetBSD kernel is higher than in Linux, and more advanced traces can have thousands of entries. One of the main requirements for fuzzers is performance. If the fuzzer is slow, even if it is smarter than others, it will most likely find fewer bugs. If it is significantly slower, then it is not useful at all. We found that storing raw kernel traces in <code>kcov(4)</code>, copying the data into user space, and transforming it into the AFL format is not an option. The performance suffers, and the fuzzing process becomes very slow, making it not useful in practice.
<p>
We decided to keep the AFL conversion of the data inside the kernel, and not introduce too much complexity to the coverage part. As a current proof-of-concept API, we made <code>kcov</code> more modular, allowing different modules to implement functionality outside of the core requirements. The current code can be viewed <a href="http://netbsd.org/~kamil/patch-00131-modular-kcov.txt">here</a> or on <a href="https://github.com/krytarowski/kcov_modules">GitHub</a>.
<p>
<h2>KCOV Modules</h2>
<p>
As we mentioned earlier, the coverage data available in the kernel is generated during tracing by one of the hooks enabled by the compiler. Currently, NetBSD supports PC and CMP tracing. The kcov module can gather this data during the trace, convert it, and expose it to user space via <code>mmap</code>. To write our own coverage module for the new PoC API, we need to provide operations such as <code>open</code>, <code>free</code>, <code>enable</code>, <code>disable</code>, <code>mmap</code>, and trace handling.
<p>
This can be done using the kcov_ops structure:
<p>
<pre>
static struct kcov_ops kcov_mod_ops = {
.open = kcov_afl_open,
.free = kcov_afl_free,
.setbufsize = kcov_afl_setbufsize,
.enable = kcov_afl_enable,
.disable = kcov_afl_disable,
.mmap = kcov_afl_mmap,
.cov_trace_pc = kcov_afl_cov_trace_pc,
.cov_trace_cmp = kcov_afl_cov_trace_cmp
};
</pre>
<p>
During load or unload, the module must run <code>kcov_ops_set</code> or <code>kcov_ops_unset</code>. After set, the default <code>kcov_ops</code> are overwritten by the module. After unset, they are restored to the default.
<p>
<h2>Porting AFL as a module</h2>
<p>
The next step would be to develop a sub-module compatible with the AFL fuzzer.
<p>
To do that, the module needs to expose a buffer to user space, and in kernel space it needs to keep information about the 64kB SHM region, the previous PC, and the thread id. The thread id is crucial, as fuzzing usually runs a few tasks. This data is gathered inside the AFL context structure:
<p>
<pre>
typedef struct afl_ctx {
uint8_t *afl_area;
struct uvm_object *afl_uobj;
size_t afl_bsize;
uint64_t afl_prev_loc;
lwpid_t lid;
} kcov_afl_t;
</pre>
<p>
The most important part of the integration is to translate the execution shadow, a list of previous PCs along the execution path, to the AFL-compatible hash-map keyed by (prev PC, PC) pairs. That can be done, according to the AFL documentation, with this method:
<p>
<pre>
++afl->afl_area[(afl->afl_prev_loc ^ pc) & (bsize-1)];
afl->afl_prev_loc = pc;
</pre>
<p>
<center><img src="//netbsd.org/~kamil/fuzzer_art/a.png"></center>
<p>
In our implementation, we use a trick by Quentin Casasnovas of Oracle to improve the distribution of the counters, by storing hashed PC pairs instead of raw ones.
<p>
The rest of operations, like <code>open</code>, <code>mmap</code>, and <code>enable</code>, can be reviewed in the <a href="https://github.com/krytarowski/kcov_modules/blob/afl_submodule/afl/kcov_afl.c">GitHub repository</a> together with the testing code that dumps 64kB of SHM data.
<p>
<h2>Debug your fuzzer</h2>
<p>
Everyone knows that kernel debugging is more complicated than debugging programs running in user space. Many tools can be used for doing that, and there is always a discussion about usability vs. complexity of the setup. People tend to be divided into two groups: those that prefer to use a complicated setup like a kernel debugger (with remote debugging), and those for whom tools like <code>printf</code> and other simple debug interfaces are sufficient.
<p>
<center><img src="//netbsd.org/~kamil/fuzzer_art/b.png"></center>
<p>
Enabling coverage brings even more complexity to kernel debugging. Everyone's favourite <code>printf</code> also becomes traced, so putting it inside the trace function will result in a stack overflow. Also, touching any <code>kcov</code> internal structures becomes very tricky and should be avoided if possible.
<p>
A debugger is still a sufficient tool. However, as we mentioned earlier, trace functions are called for every branch, which translates to thousands or even tens of thousands of breakpoints before any specific condition occurs.
<p>
I am personally more of a <code>printf</code> than <code>gdb</code> guy, and in most cases, the ability to print variables' contents is enough to find the issues. For validating my AFL <code>kcov</code> plugin, I found out that <code>debugcon_printf</code> written by Kamil Rytarowski is a great tool.
<p>
<h2>Example of debugcon_printf</h2>
<p>
To illustrate that idea, let's say that we want to print every PC trace that comes to our AFL submodule.
<p>
The most intuitive way would be to put <code>printf("#:%p\n", pc)</code> at the very beginning of <code>kcov_afl_cov_trace_pc</code>, but as mentioned earlier, such a trick would end up with a kernel crash whenever we enable tracing with our module. However, if we switch <code>printf</code> to <code>debugcon_printf</code>, and add a simple option to our QEMU invocation:
<p>
<code>-debugcon file:/tmp/qemu.debug.log -global isa-debugcon.iobase=0xe9</code>
<p>
we can see on our host machine that all traces are written to the file <code>qemu.debug.log</code>.
<p>
<pre>
kcov_afl_cov_trace_pc(void *priv, intptr_t pc) {
kcov_afl_t *afl = priv;
debugcon_printf("#:%x\n", pc);
++afl->afl_area[(afl->afl_prev_loc ^ pc) & (afl->afl_bsize-1)];
afl->afl_prev_loc = _long_hash64(pc, BITS_PER_LONG);
return;
}
</pre>
<p>
<h2>Future work</h2>
<p>
The AFL submodule was developed as part of the <a href="https://wiki.netbsd.org/projects/project/afl_filesystem_fuzzing/">AFL FileSystems Fuzzing</a> project to simplify the fuzzing of different parts of the NetBSD kernel.
<p>
I am using it currently for fuzzing different filesystems. In a future article I plan to show more practical examples.
<p>
Another great thing to do will be to refactor KLEAK, which is using PC trace data and is disconnected from kcov. A good idea would be to rewrite it as a kcov module, to have one unified way to access coverage data inside NetBSD kernel.
<p>
<h2>Summary</h2>
<p>
In this article, we familiarized the reader with the technique of fuzzing, starting from the theoretical background and going up to the level of kernel fuzzing.
<p>
Based on this information, we demonstrated the purpose of a modular coverage framework inside the kernel, and an example implementation of a submodule that can be consumed by AFL.
<p>
More details can be learned by downloading and trying the sample code shown in the examples.
<p>
At the end of this article, I want to thank Kamil, for such a great idea for a project, and for allowing me to work on NetBSD development.https://blog.netbsd.org/tnf/entry/gsoc_2018_report_adding_netbsdGSoC 2019 Report: Adding NetBSD KNF to clang-format, Part 1Michał Górny2019-06-29T05:40:53+00:002019-07-01T06:49:07+00:00 <p><em>Prepared by Manikishan Ghantasala (shannu) as a part of Google Summer of Code 2019.</em></p>
<p>Greetings everyone, I am Manikishan, an undergraduate pursuing my Bachelor's degree in Computer Science from Amrita Vishwa Vidyapeetham, Amritapuri, Kerala, India. I have been very interested in working on lower-level development such as operating systems, kernels, and compilers. I have also worked on building a computer from scratch, in a project named <q>From Nand To Tetris</q>, which helped elevate my interest in the field of operating systems and led me to apply to this organization. I am very grateful to be a part of this program and would like to thank the community and my mentors for granting me this opportunity and being supportive at all times.</p>
<p>Regarding the first evaluation, it has been quite interesting working on the <a href="https://summerofcode.withgoogle.com/projects/#5661503443697664">Add KNF (NetBSD style) in clang-format</a> project. I love the NetBSD community and look forward to continuing. It has helped me to learn a lot during this period. It has been challenging and amazing so far.</p>
<p>This is a blog post about the work I have done prior to the first evaluation period.</p>
<h2>What is clang-format?</h2>
<p>Clang-format is a set of tools for formatting code, built upon LibFormat. It supports a number of coding styles, such as LLVM, Google, Chromium, Mozilla, and WebKit, which can be chosen with -style=<StyleName>. Alternatively, a config file named .clang-format containing custom style rules can be supplied. My project is to add NetBSD KNF support to clang-format.</p>
<p>With the added support to clang-format, it will be able to format the code according to the NetBSD Style when run with -style=NetBSD.</p>
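<p>Once merged, the style could then be selected either on the command line or from a project-local configuration file; a hypothetical <code>.clang-format</code> is shown below (the NetBSD style name is the goal of this project and is not yet available in released clang-format; IndentWidth and UseTab are existing options matching KNF's 8-column tabs):</p>

```
# .clang-format -- hypothetical, once NetBSD support is merged
BasedOnStyle: NetBSD
# KNF basics expressible with existing clang-format options:
IndentWidth: 8
UseTab: Always
```

<p>Any option set after BasedOnStyle overrides the base style, so projects could still deviate from KNF where they need to.</p>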
<h2>Getting familiar to LLVM Source</h2>
<p>For the first week, I went on exploring the LLVM source to find implementations of style rules similar to those I listed in my <a href="https://docs.google.com/document/d/1xHjb0YZRNwfPOMD90whcD9c2ljc2niMpnj_PiXVONeA/edit">proposal</a>. I managed to figure out an approach with the help of my supportive mentors. I have implemented two styles:
<ol>
<li>BitFieldDeclarationsOnePerLine</li>
<li>SortNetBSDIncludes</li>
</ol>
in the first phase.</p>
<h2>About BitFieldDeclarationsOnePerLine</h2>
<p>This rule lines up BitField declarations on consecutive lines with correct indentation.</p>
<p>Example:</p>
<p><b>Input:</b></p>
<pre><code>
unsigned int bas :3, hh : 4, jjj : 8;
unsigned int baz:1,
fuz:5,
zap:2;
</code></pre>
<p><b>Output:</b></p>
<pre><code>
unsigned int bas:3,
hh:4,
jjj:8;
unsigned int baz:1,
fuz:5,
zap:2;
</code></pre>
<p>Submitted a patch regarding this in the differential for review.</p>
<p> -> Patch: <a href="https://reviews.llvm.org/D63062">https://reviews.llvm.org/D63062</a></p>
<p>There is a bug in the implementation where the indentation breaks when there is a comment between the bitfields; I set it aside to move on to the next style and will fix it in the coming weeks.</p>
<h2>About SortNetBSDIncludes</h2>
<p>Clang-format has a native SortIncludes style, which sorts all headers in alphabetical order, whereas NetBSD headers follow a special order due to dependencies between headers. After discussing with my mentors and on tech-toolchain, we have come up with a more precise order to follow for headers:</p>
<ol>
<li><sys/param.h></li>
<li><sys/types.h></li>
<li><sys/*> -- kernel headers</li>
<li><uvm/*> -- vm headers</li>
<li><net*/*> -- network protocol headers</li>
<li><*fs/*> -- filesystem headers</li>
<li><dev/*> -- device driver headers</li>
<li><protocols/.h></li>
<li><machine/*></li>
<li><[arch]/*></li>
<li>< /usr includes next></li>
<li><paths.h></li>
<li><em>User include files</em></li>
</ol>
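<p>Applied to a source file, the ordering above would arrange an include block roughly as follows (a hand-made illustration with arbitrary headers, not tool output):</p>

```c
#include <sys/param.h>
#include <sys/types.h>
#include <sys/systm.h>		/* kernel headers */

#include <uvm/uvm_extern.h>	/* vm headers */

#include <net/if.h>		/* network protocol headers */

#include <dev/pci/pcivar.h>	/* device driver headers */

#include <machine/cpu.h>

#include <paths.h>

#include "extern.h"		/* user include files */
```

<p>Blocks are separated by blank lines, and sorting is applied within each group rather than across the whole list.</p>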
<p>I have made a smarter version using regexes, with the necessary changes from the hardcoded approach I had before; the patch for this will be up by this weekend.</p>
<h2>Plan for the next phase</h2>
<p>I was a bit behind the proposed schedule because understanding the LLVM source took more time than expected. I am confident that I will speed up my progress in the upcoming phase and complete as many styles as possible. The final plan is to add support for NetBSD in clang-format, which can then be used with --style=NetBSD.</p>
<h2>Summary</h2>
<p>In the coming weeks, I will fix and add the missing features to existing and new styles, and concentrate on optimizing and testing, following all NetBSD guidelines, to make sure formatted code doesn't cause a build failure.</p>
<p>Lastly, I would like to thank my wonderful mentors Michal and Christos for helping me through the process and guiding me whenever needed.</p>
https://blog.netbsd.org/tnf/entry/enhancing_syzkaller_support_for_netbsdEnhancing Syzkaller support for NetBSD, Part 1Kamil Rytarowski2019-06-27T18:12:32+00:002019-06-27T18:12:32+00:00<p>Prepared by Siddharth Muralee(@R3x) as a part of Google Summer of Code 2019</p>
<p>As a part of Google Summer of Code 19, I am working on improving the support for Syzkaller kernel fuzzer. Syzkaller is an unsupervised coverage-guided kernel fuzzer, that supports a variety of operating systems including NetBSD. This report details the work done during the first coding period.</p>
<h2>Syzkaller</h2>
<p>Initially, <a href="https://github.com/google/syzkaller">Syzkaller</a> was developed with Linux kernel fuzzing in mind, but now it's being extended to support other OS kernels as well. The main target of the Syzkaller fuzzer is the system call layer.</p>
<p>Thanks to Google, we now have a 24/7 continuous fuzzing instance on a Google Cloud engine for NetBSD managed by Syzbot, a sub-system of the Syzkaller fuzzer. Syzbot maintains a <a href="https://syzkaller.appspot.com/netbsd">dashboard</a> where it reports all the bugs that were found. Syzbot also maintains a <a href="https://groups.google.com/forum/#!forum/syzkaller-netbsd-bugs">mailing list</a> to report bugs found.</p>
<img alt="Syzbot Dashboard" height="580" src="http://netbsd.org/~kamil/gsoc_2019/dashboard.png" width="690">
<p>Syzbot is currently fuzzing NetBSD-HEAD which it updates from the <a href="https://github.com/NetBSD/src">Github mirror</a>.</p>
<p>You can go through the <a href="https://github.com/google/syzkaller/tree/master/docs">Syzkaller documentation</a> or take a look at my <a href="http://troopers.de/troopers19/agenda/jerqpv/">slides</a> from Troopers 19 for learning more about Syzkaller.</p>
<h2>Automated Image creation</h2>
<p>Syzkaller requires NetBSD images with ssh keys for continuous fuzzing. Due to frequent updates to HEAD, images sometimes become unusable, so I decided to automate the process of creating NetBSD images with ssh keys and appropriate settings.</p>
<p>Initial attempts with Packer by HashiCorp failed because the boot sequence was controlled by waits and not by output matching. So I wrote a <a href="https://github.com/R3x/netbsd-fuzzing-aids/blob/master/install_netbsd.sh">shell script</a> which basically adds a wrapper around anita to do the job.</p>
<h2>Kcov(4) support</h2>
<p><a href="//man.NetBSD.org/kcov.4">Kernel Code Coverage (KCov)</a> is a compiler-instrumented feature which helps us find the code paths executed inside the kernel for a given set of system calls. The initial port was done by me, with modifications by @kamil and @maxv.</p>
<p>Syzkaller uses coverage for modifying and mutating the arguments of syscalls. <a href="https://storage.googleapis.com/syzkaller/cover/ci2-netbsd.html">Coverage information</a> for NetBSD is publicly available.</p>
<h2>Sanitizers Support</h2>
<p>Sanitizers are compiler-instrumented tools to improve code correctness. Currently, NetBSD supports the Kernel Address Sanitizer (KASAN) and the Kernel Undefined Behaviour Sanitizer (KUBSAN).</p>
<p>We use the Sanitizers to increase the chances of finding bugs. Syzkaller now compiles kernels with the Sanitizers.</p>
<h2>Report Generation and Symbolization</h2>
<p>Syzkaller logs the console, and in the event of a crash it examines the log to find details about the crash; these details are then used to classify the crash and create a report.</p>
<p>For better crash reports, we decided to enable a ddb(4) shell in the event of a kernel panic. This allows us to print backtraces and details of locks and processes.</p>
<p>I also added support for better symbolization of the reports. Symbolization adds more details to the crash report to make it easier for developers to go through. Currently we have added file names and line numbers for the functions in the crash, based on the kernel debug object (netbsd.gdb).</p>
<h4>Initial backtrace :</h4>
<pre>
do_ptrace() at netbsd:do_ptrace+0x33d
sys_ptrace() at netbsd:sys_ptrace+0x71
sys_syscall() at netbsd:sys_syscall+0xf5
</pre>
<h4>After Symbolization :</h4>
<pre>
do_ptrace() at netbsd:do_ptrace+0x33d sys/kern/sys_ptrace_common.c:1430
sys_ptrace() at netbsd:sys_ptrace+0x71 sys/kern/sys_ptrace.c:218
sys_syscall() at netbsd:sys_syscall+0xf5 sy_call sys/sys/syscallvar.h:65 [inline]
</pre>
<h2>Syscall Coverage Improvements</h2>
<p>Syzkaller uses a pseudo-formal grammar for system calls, which it uses to build programs to fuzz the kernel. The related files are stored under sys/netbsd in the Syzkaller repo. These files were a rough copy of the Linux files, with whatever failed to compile removed.</p>
<p>We had to review all the existing syscall descriptions, find the missing ones and add them.</p>
<p>I wrote a <a href="https://github.com/R3x/netbsd-fuzzing-aids/blob/master/checker.py">Python script</a> which helps find existing syscall descriptions and match them against NetBSD's syscall list. The script also finds missing syscalls and logs them.</p>
<p>I have <a href="https://github.com/R3x/netbsd-fuzzing-aids/blob/master/netbsd_fuzzed_syscalls">listed</a> the system calls that are currently fuzzed with Syzkaller.</p>
<p>We are currently working on adding and improving descriptions for the more important syscalls. If you would like to see a NetBSD system call fuzzed you can reach out to us.</p>
<h2>Summary</h2>
<p>During the last month, I was focusing on improving support for NetBSD. I have managed to complete the tasks that we had planned for the first evaluation.</p>
<p>For the next coding period (28th June - 22nd July) I will be working on adding support for fuzzing the Network layer.</p>
<p>Last but not least, I want to thank my mentors, @kamil and @cryo, for their useful suggestions and guidance. I would also like to thank Dmitry Vyukov of Google for helping with any issues faced with regard to Syzkaller. Finally, thanks to Google for giving me the chance to work with the NetBSD community.</p>https://blog.netbsd.org/tnf/entry/xsave_and_compat32_kernel_workXSAVE and compat32 kernel work for LLDBMichał Górny2019-06-05T16:46:07+00:002019-06-05T16:46:07+00:00<p>Upstream describes LLDB as <cite>a next generation, high-performance debugger</cite>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I started working on LLDB, contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support
and lately extending NetBSD's ptrace interface to cover more register
types. You can read more about that in my <a class="reference external" href="http://blog.netbsd.org/tnf/entry/lldb_extending_cpu_register_inspection">Apr 2019</a>
report.</p>
<p>In May, I was primarily continuing the work on the new ptrace interface.
Besides that, I've found and fixed a bug in <kbd class="docutils literal">ptrace()</kbd> compat32 code,
pushed LLVM buildbot to ‘green’ status and found some upstream LLVM
regressions. More below.</p>
<div class="section" id="buildbot-status-update">
<h2>Buildbot status update</h2>
<p>Traditionally, let's start with buildbot updates. The buildbot is providing
continuous integration for a number of LLVM projects on NetBSD, including
LLDB, clang and clang's runtime libraries. It is available at:
<a class="reference external" href="http://lab.llvm.org:8011/builders/netbsd-amd64">http://lab.llvm.org:8011/builders/netbsd-amd64</a>.</p>
<p>Previously, the most significant problem in using the buildbot was the flakiness
of LLDB tests, which resulted in frequent false positives. I was finally able
to reduce this by lowering the number of parallel test
runs for the LLDB tests. To avoid slowing down other test suites, I used
a sed hack that overrides the job count directly in the specific lit invocation.</p>
<p>Additionally, I have fixed a few regressions during the period, notably:</p>
<ul class="simple">
<li><p>worked around missing <kbd class="docutils literal">nexttowardl()</kbd> in NetBSD 8 causing libc++ test
failure, by using <span class="docutils literal"><kbd class="pre">std::nextafter()</kbd></span> in the problematic test: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/87ae6bf80b4d4fd5f44cc3ee3658de6891cca5bc">r360673</a>,</p></li>
<li><p>fixed compiler path test to work correctly without specific linker being
available: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/9de9b5e950761cfeac936af9e9e9b2182bb1fffb">r360761</a>,</p></li>
<li><p>fixed inferring source paths in libunwind that prevented the tests from
finding libc++: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/e04b002cf89014592b3603bb9af9f74a01626771">r361931</a>,</p></li>
<li><p>removed test case that relied on <kbd class="docutils literal">read()</kbd> attempt from a directory producing
very specific error message: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/9158d57d19c84cc117d7002b2dab466a60608df4">r362404</a>
(NB: this failed because NetBSD permits reading from directory descriptors).</p></li>
</ul>
<p>Those fixes permitted the buildbot to become green for a short period of time.
Sadly, shortly afterwards one of AMDGPU tests started failing and we are still
trying to find the cause.</p>
</div>
<div class="section" id="adding-register-read-write-tests-to-atf-tests">
<h2>Adding register read/write tests to ATF tests</h2>
<p>Last month, I have implemented a number of register reading/writing tests
for LLDB. This month I've introduced matching tests inside NetBSD's ATF test
suite. This provides the ability to test NetBSD's ptrace implementation
directly on the large variety of platforms and kernels supported by NetBSD.
Given the pace of NetBSD development, running the LLDB test suite everywhere would
not be feasible.</p>
<p>While porting the tests, I've made a number of improvements, some of them
requested specifically by LLDB upstream. Those include:</p>
<ul class="simple">
<li><p>starting to use better input/output operands for assembly, effectively
reducing the number of direct register references and redundant code:
<a class="reference external" href="https://github.com/llvm/llvm-project/commit/60211cb8728352fcadbe56b7ab8dea6f07335c16">r359978</a>,</p></li>
<li><p>using more readable/predictable constants for register data, read part:
<a class="reference external" href="https://github.com/llvm/llvm-project/commit/acbaa496ec8ea63f2315e0fdc7ed86c888a21a53">r360041</a>,
write part: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/96a8241084eabf897d9bd4f7e9e9e072d055a160">r360154</a>,</p></li>
<li><p>using <span class="docutils literal">%0</span> and <span class="docutils literal">%1</span> operands to reference memory portably between i386
and amd64: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/25f64629761f583324c716aab319cf6298aed45b">r360148</a>.</p></li>
</ul>
<p>The relevant NetBSD commits for added tests are (using the git mirror):</p>
<ul class="simple">
<li><p>general-purpose register reading tests: <a class="reference external" href="https://github.com/NetBSD/src/commit/7a58d92435a91c159f4f938e4b6dfb3679c66837">7a58d92435a9</a>,</p></li>
<li><p>fix to the above: split tests for reading i386 gp registers as not to require MMX: <a class="reference external" href="https://github.com/NetBSD/src/commit/06be77bbafa6912818b17e73750894c7c0c580e9">06be77bbafa6</a>,</p></li>
<li><p>r8..r15 amd64 register reading tests: <a class="reference external" href="https://github.com/NetBSD/src/commit/86f6b1d4dab691071abced55595b1bb6ed9da38a">86f6b1d4dab6</a>,</p></li>
<li><p>mm & xmm register reading tests: <a class="reference external" href="https://github.com/NetBSD/src/commit/3ef02e1666aef5ec0b6f538620b0c6085fe88e25">3ef02e1666ae</a>,</p></li>
<li><p>general-purpose register writing tests: <a class="reference external" href="https://github.com/NetBSD/src/commit/95bfedcb6a892b8421658241451767c8fa84d1a5">95bfedcb6a89</a>,</p></li>
<li><p>mm & xmm register writing tests: <a class="reference external" href="https://github.com/NetBSD/src/commit/2c8335920f611813428fd3da30c9ad27ff680f4c">2c8335920f61</a>.</p></li>
</ul>
<p>While working on this, I've also noticed that <kbd class="docutils literal">struct fpreg</kbd> and <kbd class="docutils literal">struct xmmregs</kbd>
are not fully specified on i386. In <a class="reference external" href="https://github.com/NetBSD/src/commit/bbc3f184d470ce691340d3a7ceec77464883258f">bbc3f184d470</a>,
I've added the fields needed to make use of those structures convenient.</p>
</div>
<div class="section" id="fixing-compat32-request-mapping-and-debug-registers">
<h2>Fixing compat32: request mapping and debug registers</h2>
<p>Kamil has asked me to look into <a class="reference external" href="http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54233">PR#54233</a>
indicating problems with 32-bit application debugging on amd64. While
the problem in question most likely combines multiple issues, one specifically
related to my work was missing <kbd class="docutils literal">PT_*DBREGS</kbd> support in compat32.</p>
<p>While working on this, I've found out that the functions responsible for
implementing those requests were not called at all. After investigating,
I came to the following conclusion: the i386 userland code passed <kbd class="docutils literal">PT_*</kbd>
request codes corresponding to the i386 headers to the compat32 layer. The compat32
layer passed those codes unmodified to the common kernel code, where they were compared
against the <kbd class="docutils literal">PT_*</kbd> constants available in kernel code, which happened to be the amd64
constants.</p>
<p>This worked fine for low request numbers that happened to match on both
architectures. However, i386 adds two additional requests (<kbd class="docutils literal">PT_*XMMREGS</kbd>)
after <kbd class="docutils literal">PT_SETFPREGS</kbd>, and therefore all remaining requests are offset.</p>
<p>To solve this, I've created a request code mapping function that converts i386
codes coming from userland to the matching amd64 values used in the kernel.
For the time being, this supports only requests common to both architectures,
and therefore <kbd class="docutils literal">PT_*XMMREGS</kbd> can't be implemented without further work.</p>
<p>Once I've managed to fix compat32, I went ahead to implement <kbd class="docutils literal">PT_*DBREGS</kbd>
in compat32. Kamil has made an initial implementation in the past but it was
commented out and lacked input verification. However, I've chosen to change the implementation
a bit and reuse <kbd class="docutils literal">x86_dbregs_read()</kbd> and <kbd class="docutils literal">x86_dbregs_write()</kbd> functions
rather than altering pcb directly. I've also added the needed value checks
for <kbd class="docutils literal">PT_SETDBREGS</kbd>.</p>
<p>Both changes were committed to /usr/src:</p>
<ul class="simple">
<li><p><a class="reference external" href="https://github.com/NetBSD/src/commit/cd7d01a76ac4287c73be09dbca245e2864ac3948">Translate userland PT_* request values into kernel codes</a>,</p></li>
<li><p><a class="reference external" href="https://github.com/NetBSD/src/commit/1916125b8c2f6c62769fad094265495d39b06b60">Implement PT_GETDBREGS and PT_SETDBREGS</a>.</p></li>
</ul>
</div>
<div class="section" id="initial-xsave-work">
<h2>Initial XSAVE work</h2>
<p>In the previous report, I have been considering which approach to take
in order to provide access to the additional FPU registers via ptrace.
Eventually, the approach of exposing the raw contents of the XSAVE area
got the blessing, and I've started implementing it.</p>
<p>However, this approach proved impractical. The XSAVE area in standard
format (which we are using) consists of three parts: FXSAVE-compatible
legacy area, XSAVE header and zero or more extended components.
The offsets of those extended components turned out to be unpredictable
and potentially differing between various CPUs. The architecture
developer's manual indicates that the relevant offsets can be obtained
using CPUID calls.</p>
<p>Apparently neither Linux nor FreeBSD took this into consideration
when implementing their APIs, and they effectively require the caller
to issue CPUID calls directly. While such an approach could be doable
in NetBSD, it would prevent core dumps from working correctly
on a different CPU. Therefore, it would be necessary to perform
the calls in kernel instead, and include the results along with XSAVE
data.</p>
<p>However, I believe that doing so would introduce unnecessary complexity
for no clear gain. Therefore, <a class="reference external" href="http://mail-index.netbsd.org/tech-kern/2019/05/28/msg025062.html">I proposed two alternative solutions</a>.
They were to either:</p>
<ol class="loweralpha simple">
<li><p>copy XSAVE data into custom structure with predictable indices, or</p></li>
<li><p>implement separate <kbd class="docutils literal">PT_*</kbd> requests for each component group,
with separate data structure each.</p></li>
</ol>
</div>
<div class="section" id="comparison-of-the-two-proposed-solutions">
<h2>Comparison of the two proposed solutions</h2>
<p>Both solutions are roughly equivalent. The main difference between them
is that the first solution covers all extended registers (and is future-extensible)
in one request call, while the second one requires a new pair of requests
for each new register set.</p>
<p>I personally prefer the former solution because it reduces the number of ptrace
calls needed to perform typical operations. This is especially relevant when
reading registers whose contents are split between multiple components: YMM
registers (whose lower bits are in SSE area), and lower ZMM registers (whose
lower bits are YMM registers).</p>
<p>Example code reading a ZMM register using the single-request solution would look
like:</p>
<pre class="literal-block">struct xstate xst;
struct iovec iov;
char zmm_reg[64];
iov.iov_base = &xst;
iov.iov_len = sizeof(xst);
ptrace(PT_GETXSTATE, child_pid, &iov, 0);
// verify that all necessary components are available
assert(xst.xs_xstate_bv & XCR0_SSE);
assert(xst.xs_xstate_bv & XCR0_YMM_Hi128);
assert(xst.xs_xstate_bv & XCR0_ZMM_Hi256);
// combine the values
memcpy(&zmm_reg[0], &xst.xs_fxsave.fx_xmm[0], 16);
memcpy(&zmm_reg[16], &xst.xs_ymm_hi128.xs_ymm[0], 16);
memcpy(&zmm_reg[32], &xst.xs_zmm_hi256.xs_zmm[0], 32);</pre>
<p>For comparison, the equivalent code for the other variant would roughly be:</p>
<pre class="literal-block">#if defined(__x86_64__)
struct fpreg fpr;
#else
struct xmmregs fpr;
#endif
struct ymmregs ymmr;
struct zmmregs zmmr;
char zmm_reg[64];
#if defined(__x86_64__)
ptrace(PT_GETFPREGS, child_pid, &fpr, 0);
#else
ptrace(PT_GETXMMREGS, child_pid, &fpr, 0);
#endif
ptrace(PT_GETYMMREGS, child_pid, &ymmr, 0);
ptrace(PT_GETZMMREGS, child_pid, &zmmr, 0);
memcpy(&zmm_reg[0], &fpr.fxstate.fx_xmm[0], 16);
memcpy(&zmm_reg[16], &ymmr.xs_ymm_hi128.xs_ymm[0], 16);
memcpy(&zmm_reg[32], &zmmr.xs_zmm_hi256.xs_zmm[0], 32);</pre>
<p>I've submitted a patch set implementing the first solution, as it was
the easier conversion from my initial approach. If the feedback indicates
a preference for the other solution, converting the patches the other way
around should not be hard either. The patch set is available on the tech-kern mailing
list: <a class="reference external" href="http://mail-index.netbsd.org/tech-kern/2019/06/04/msg025098.html">[PATCH 0/2] PT_{GET,SET}XSTATE implementation, WIP v1</a>.</p>
<p>The initial implementation should support getting and setting x87, SSE, AVX
and AVX-512 registers (i.e. all types currently enabled in the kernel).
The tests cover all but AVX-512. I have tested it on native amd64 and i386,
and via compat32.</p>
</div>
<div class="section" id="future-plans">
<h2>Future plans</h2>
<p>The most immediate goal is to finish the work on XSAVE. This includes
responding to any feedback received, finding AVX-512 hardware to test it on,
writing tests for AVX-512 registers and eventually committing the patches
to the NetBSD kernel. Once this is done, I need to extend XSAVE support
into core dumps, and implement userland-side of both into LLDB.</p>
<p>Besides that, the next items on TODO are:</p>
<ol class="arabic simple">
<li><p>Adding support for debug registers (moved from last month's TODO).</p></li>
<li><p>Adding support to backtrace through signal trampoline.</p></li>
<li><p>Working on i386 and aarch64 LLDB port.</p></li>
</ol>
<p>In the meantime, Kamil's going to continue working on improving fork
and thread support kernel-side, preparing it for my LLDB-side work.</p>
</div>
<div class="section" id="this-work-is-sponsored-by-the-netbsd-foundation">
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a class="reference external" href="http://netbsd.org/donations/#how-to-donate">http://netbsd.org/donations/#how-to-donate</a></p>
</div>
https://blog.netbsd.org/tnf/entry/lldb_extending_cpu_register_inspectionLLDB: extending CPU register inspection supportMichał Górny2019-05-02T20:56:28+00:002019-05-02T20:56:28+00:00<p>Upstream describes LLDB as <cite>a next generation, high-performance debugger</cite>.
It is built on top of LLVM/Clang toolchain, and features great
integration with it. At the moment, it primarily supports debugging C,
C++ and ObjC code, and there is interest in extending it to more
languages.</p>
<p>In February, I started working on LLDB, contracted by the NetBSD
Foundation. So far I've been working on reenabling continuous
integration, squashing bugs, improving NetBSD core file support
and updating NetBSD distribution to LLVM 8 (which is still stalled by
unresolved regressions in inline assembly syntax). You can read more
about that in my <a class="reference external" href="http://blog.netbsd.org/tnf/entry/lldb_llvm_report_for_march">Mar 2019</a> report.</p>
<p>In April, my main focus was on fixing and enhancing the support
for reading and writing CPU registers. In this report, I'd like to
shortly summarize what I have done, what I have learned in the process
and what I still need to do.</p>
<div class="section" id="buildbot-status-update">
<h2>Buildbot status update</h2>
<p>Last month I reported a temporary outage of buildbot service. I am glad
to follow up on that and inform you that the service has been restored
and the results of CI testing are once again available at:
<a class="reference external" href="http://lab.llvm.org:8011/builders/netbsd-amd64">http://lab.llvm.org:8011/builders/netbsd-amd64</a>. While the tests are
currently failing, they still serve as a useful source of information on
potential issues and regressions.</p>
<p>The new discoveries include an update on the flaky test problem. It turned
out that the flaky markings I had tried to use to work around it do not
currently work with the lit test runner. I am still looking
for a good way of implementing this, and will probably return to it
once I finish my priority tasks. It is possible that I will simply
skip the most problematic tests for the time being.</p>
<p>Additionally, the libc++ test suite identified that NetBSD is missing
the <code>nexttowardl()</code> function. Kamil noticed that and asked me if
I could implement it. From a quick manpage reading, I came to
the conclusion that <code>nexttowardl()</code> is equivalent to <code>nextafterl()</code>,
and appropriately implemented it as an alias: <a class="reference external" href="https://github.com/NetBSD/src/commit/517c7caa3d964348f23dedd5cc1e9604832293b5">517c7caa3d9643 in src</a>.</p>
</div>
<div class="section" id="fixing-mm-register-support">
<h2>Fixing MM register support</h2>
<p>The first task in my main TODO was to fix a bug in reading/writing MM
registers that was identified earlier. The MM registers were introduced
as part of MMX extensions to x86, and they were designed as overlapping
with the earlier ST registers used by x87 FPU. For this reason, they
are returned by the <code>ptrace()</code> call as a single <code>fx_87_ac</code> array
whose elements are afterwards used for both kinds of registers.</p>
<p>The bug in question turned out to be mistaken use of <code>fx_xmm</code> instead
of <code>fx_87_ac</code>. As a result, the values of mm0..mm7 registers were
mapped to subsets of xmm0..xmm7 registers, rather than the correct
set of st(0)..st(7) registers. The fix for the problem in question
landed as <a class="reference external" href="https://github.com/llvm/llvm-project/commit/40733618bd27dff66ef5cf0581dbb826a86e857a">r358178</a>.</p>
<p>However, the fix itself was the easier part. The natural consequence
of identifying a problem with the register was to add a regression test
for it. This in turn triggered a whole set of events that deserve
a section of their own.</p>
</div>
<div class="section" id="adding-tests-for-register-operations">
<h2>Adding tests for register operations</h2>
<p>Initially, the test for MM and XMM registers consisted of a simple
program written in pure amd64 assembler that wrote known patterns
to the registers in question, then triggered SIGTRAP via <code>int3</code>,
and a lit test that runs LLDB in order to execute the program, read
registers and compare their values to expected patterns. However,
following upstream recommendations it quickly evolved.</p>
<p>Firstly, upstream suggested replacing the assembly file with inline
assembly in C or C++ program, in order to improve portability between
platforms. As a result, I ended up learning how to use GCC extended
inline assembly syntax (whose documentation is not exactly the most
straightforward to use) and created a test case that works fine both
for i386 and amd64, and in a wide range of platforms supported by LLDB.</p>
<p>Secondly, it was necessary to restrict the tests into native runs
on i386 and amd64 hardware. I discovered that lit partially provides
for this by defining a <code>native</code> feature whenever LLDB is built
as a native executable (vs. cross-compiled). It also defines a few
platform-related features, so it seemed only natural to extend them
to provide explicit <code>target-x86</code> and <code>target-x86_64</code> features,
corresponding to i386 and amd64 targets. This was done in <a class="reference external" href="https://github.com/llvm/llvm-project/commit/3ec58c4ef618e20500cea0759cc04a3cf0bb0526">r358177</a>.</p>
<p>Thirdly, upstream asked me to add tests also for other register types,
as well as for writing registers. This overlapped with our need to test
new register routines for NetBSD, so I've focused on them.</p>
<p>The main problem in adding more tests was that I needed to verify
whether the processor supported specific instruction sets. For the time
being, it seemed reasonable to assume that every possible user of LLDB
would have at least SSE, and to filter tests specific to long mode
on the amd64 platform. However, adding tests for registers introduced
by AVX extension required explicit check.</p>
<p>I have discussed the problem with Pavel Labath of LLDB upstream,
and considered multiple options. His suggestion was to make the test
program itself run the cpuid instruction, and exit with a specific status
if the needed registers are not supported. Then I could catch this
status from dotest.py test and mark the test as unsupported. However,
I really preferred using plain lit over dotest.py (mostly because it
naturally resembled LLDB usage), and wanted to avoid duplicating cpuid
code in multiple tests.</p>
<p>However, lit does not seem to support translating a specific exit status
into 'unsupported'. The 'lit way' of solving this is to determine
whether the necessary feature is available up front, and make the test
depend on it. Of course, the problem was how to check supported CPU
extensions from within lit.</p>
<p>Firstly, I've considered the possibility of determining cpuinfo from
within Python. This would be the most trivial option, however Python
stdlib does not seem to provide appropriate functions and I wanted to
avoid relying on external modules.</p>
<p>Secondly, I've considered the possibility of running clang from within
lit in order to build a simple test program running cpuid, and using it
to fill the supported features.</p>
<p>Finally, I've arrived at the simpler idea of making <code>lit-cpuid</code>,
an additional utility program built as part of LLDB. This program
uses the very nice cpuid API exposed by the LLVM libraries to determine
the available extensions and print them for lit's use. This landed
as <a class="reference external" href="https://github.com/llvm/llvm-project/commit/9c3824aad7f727591798441a65af5e92eaef8f67">r359303</a>
and opened the way for more register tests.</p>
<p>To this moment, I've implemented the following tests:</p>
<ul class="simple">
<li><p>tests for mm0..mm7 64-bit MMX registers and xmm0..xmm7 128-bit SSE
registers mentioned above, common to i386 and amd64; read: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/40733618bd27dff66ef5cf0581dbb826a86e857a">r358178</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/b268a2a4af4259ce446bf85a0c136520d2aea58f">r359681</a>.</p></li>
<li><p>tests for the 8 general purpose registers: <code>*AX</code>..<code>*DX</code>, <code>*SP</code>,
<code>*BP</code>, <code>*SI</code>, <code>*DI</code>, in separate versions for i386 (32-bit
registers) and amd64 (64-bit registers); read: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/d25710f5a0cd9ec5d86e179a04b6303221beee3c">r359438</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/8507d4f48a79ebc3e46dfb515e497c0dae19e3d7">r359441</a>.</p></li>
<li><p>tests for the 8 additional 64-bit general purpose registers r8..r15,
and 8 additional 128-bit xmm8..xmm15 registers introduced in amd64;
read:
<a class="reference external" href="https://github.com/llvm/llvm-project/commit/19376ebd1aa07ca9099e3083d0fba5d258df45dd">r359210</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/97799084947c42aa0b299a6f4199fa3ecfc5db2b">r359682</a>.</p></li>
<li><p>tests for the 256-bit ymm0..ymm15 registers introduced by AVX,
in separate versions for i386 (where only ymm0..ymm7 are available)
and amd64; read: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/2ff59e554b41fd47341e4f80a5485f90c6221b2b">r359304</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/72d3ca957cea533c8d05bccbbde497e8072f2fb8">r359783</a>.</p></li>
<li><p>tests for the 512-bit zmm0..zmm31 registers introduced by AVX-512,
in separate versions for i386 (where only zmm0..zmm7 are available)
and amd64; read: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/eae149368866d41d8deaba3cf3365062e34d6af1">r359439</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/2f6c579ecb79d125367d39c10d1dec899cb70650">r359797</a>.</p></li>
<li><p>tests for the xmm16..xmm31 and ymm16..ymm31 registers that were
implicitly added by AVX-512 (xmm, ymm and zmm registers overlap/extend
their predecessors); read: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/8120b7ac074dec6a53e66f875fbfc9b2506c39d8">r359780</a>,
write: <a class="reference external" href="https://github.com/llvm/llvm-project/commit/2f6c579ecb79d125367d39c10d1dec899cb70650">r359797</a>.</p></li>
</ul>
</div>
<div class="section" id="fixing-memory-reading-and-writing-routine">
<h2>Fixing memory reading and writing routine</h2>
<p>The general-purpose register tests were initially failing on NetBSD.
More specifically, the test worked correctly to the point of reading
registers but afterwards lldb indicated a timeout and terminated
the program instead of resuming it.</p>
<p>While investigating this, I've discovered that it is caused
by overwriting RBP. Curious enough, it happened only when large values
were written to it. I've 'bisected' it to an approximate max value
that still worked fine, and Kamil has identified it to be close to
<code>vm.maxaddress</code>.</p>
<p>GDB did not suffer from this issue. I've discussed it with Pavel Labath
and he suggested it might be related to unwinding. Upon debugging it
further, I've noticed that lldb-server is apparently calling ptrace()
in an infinite loop, and this is causing communications with the CLI
process (LLDB is using client-server model internally) to timeout.
Ultimately, I've pinpointed it to the memory reading routine not expecting
a read to set <code>piod_len</code> to 0 bytes (EOF). Apparently, this is exactly
what happens when you try to read past the maximum virtual memory address.</p>
<p>I've made a patch for this. While reviewing it, Kamil also noticed that
the routines were not summing up the results of multiple split read/write
calls. I've addressed both issues in <a class="reference external" href="https://github.com/llvm/llvm-project/commit/d14a0de9ad073f02ef10dc4a7c8d7b372c85c7e3">r359572</a>.</p>
</div>
<div class="section" id="necessary-extension-of-ptrace-interface">
<h2>Necessary extension of ptrace interface</h2>
<p>At the moment, NetBSD implements 4 requests related to i386/amd64
registers:</p>
<ul class="simple">
<li><p><code>PT_[GS]ETREGS</code> which covers general-purpose registers, IP, flags
and segment registers,</p></li>
<li><p><code>PT_[GS]ETFPREGS</code> which covers FPU registers (and xmm0..xmm15
registers on amd64),</p></li>
<li><p><code>PT_[GS]ETDBREGS</code> which covers debug registers,</p></li>
<li><p><code>PT_[GS]ETXMMREGS</code> which covers xmm0..xmm15 registers on i386
(not present on amd64).</p></li>
</ul>
<p>The interface is missing methods to get AVX and AVX-512 registers,
namely ymm0..ymm15 and zmm0..zmm31. Apparently there's <code>struct xsave_ymm</code> for the former in kernel headers but it is not used
anywhere. I am considering different options for extending this.</p>
<p>Important points worth noting are that:</p>
<ol class="arabic simple">
<li><p>YMM registers extend XMM registers, and therefore overlap with them.
The existing <code>struct xsave_ymm</code> seems to use that, and expect
only the upper half of YMM register to be stored there, with
the lower half being accessible via XMM. Similar fact holds for ZMM
vs YMM.</p></li>
<li><p>AVX-512 increased the register count from 16 to 32. This means that
there are 16 new XMM registers that are not accessible via current
API.</p></li>
</ol>
<p>This also opens questions about future extensibility of the interface.
After all, we are not only seeing new register types added but also
an increase in number of registers of existing types. What I'd really
like to avoid is having an increasingly cluttered interface.</p>
<p>How are other systems solving it?</p>
<p>Linux introduced <code>PT_[GS]ETREGSET</code> request that accepts a <code>NT_*</code>
constant identifying register set to operate on, and <code>iovec</code> structure
containing buffer location and size. For x86, the constants equivalent
to older <code>PT_*</code> requests are available, and a <code>NT_X86_XSTATE</code> that
uses full XSAVE area. The interface supports operating on complete
XSAVE area only, and requires the caller to identify the correct size
for the CPU used beforehand.</p>
<p>FreeBSD introduced <code>PT_[GS]ETXSTATE</code> request that operates on full or
partial XSAVE data. If the buffer provided is smaller than necessary,
it is partially filled. Additionally, <code>PT_GETXSTATE_INFO</code> is provided
to get the buffer size for the CPU used.</p>
<p>A similar solution would be convenient for future extensions,
as the caller would be able to implement them without having the kernel
updated. Its main disadvantage is that it requires all callers to
implement XSAVE area format parsing. Pavel Labath also suggested that
we could further optimize it by supplying an offset argument, in order
to support partial XSAVE area transfer.</p>
<p>An alternative is to keep adding new requests for new register types,
i.e. <code>PT_[GS]ETYMMREGS</code> for YMM, and <code>PT_[GS]ETZMMREGS</code> for ZMM.
In this case, it is necessary to further discuss the data format used.
It could either be the 'native' XSAVE format (i.e. YMM would contain
only the upper halves of the registers, and ZMM would contain the upper halves
of zmm0..zmm15 plus the complete data of zmm16..zmm31), or, more conveniently
for clients (at the cost of data duplication), whole registers.
If the latter, another question arises: should we then provide a dedicated
interface for xmm16..xmm31 (and ymm16..ymm31), or infer them from the
zmm16..zmm31 registers?</p>
</div>
<div class="section" id="future-plans">
<h2>Future plans</h2>
<p>My work continues with the two milestones from last month, plus a third
that's closely related:</p>
<ol class="arabic simple" start="7">
<li><p>Add support for FPU registers for NetBSD/i386 and NetBSD/amd64.</p></li>
<li><p>Support XSAVE, XSAVEOPT, ... registers in core(5) files on NetBSD/amd64.</p></li>
<li><p>Add support for Debug Registers for NetBSD/i386 and NetBSD/amd64.</p></li>
</ol>
<p>The most important point right now is deciding on the format for passing
the remaining registers, and implementing the missing ptrace interface
on the kernel side. Support for core files should then follow, using the same
format.</p>
<p>On the userland side, I will work on adding matching ATF tests for ptrace
features, and implement the LLDB side of support for the new ptrace interface
and core file notes. Afterwards, I will start working on improving
support for the same things in 32-bit (i386) executables.</p>
</div>
<div class="section" id="this-work-is-sponsored-by-the-netbsd-foundation">
<h2>This work is sponsored by The NetBSD Foundation</h2>
<p>The NetBSD Foundation is a non-profit organization and welcomes any
donations to help us continue funding projects and services
to the open-source community. Please consider visiting the following URL
to chip in what you can:</p>
<p><a class="reference external" href="http://netbsd.org/donations/#how-to-donate">http://netbsd.org/donations/#how-to-donate</a></p>
</div>
<h1>From Zero to NVMM</h1>
<p><i>Posted by Maxime Villard, 2019-04-09</i></p>
<p><i>It will bring you good fortune, good luck, good health, and strength</i></p>
<p>
<center>
<img src="//www.netbsd.org/~maxv/nvmm/NVMM.png">
</center>
</p>
Six months ago, I told myself I would write a small hypervisor for an old x86
AMD CPU I had, just to learn more about virtualization and see how far I could
go alone in my spare time. Today, it turns out that I've gone as far as
implementing a full, fast and flexible virtualization stack for NetBSD. I'd like
to present here some aspects of it.
<h1>Design Aspects</h1>
<h3>General Considerations</h3>
In order to achieve hardware-accelerated virtualization, two components need to
interact together:
<ul>
<li>
A kernel driver that will switch the machine's CPU into a mode where it can
safely execute guest instructions.
</li>
<li>
A userland emulator, which talks to the kernel driver to run virtual machines.
</li>
</ul>
Simply said, the emulator asks the kernel driver to run virtual machines, and
the kernel driver runs them until a <i>VM exit</i> occurs. When this
happens, the kernel driver returns to the emulator, telling it along the way
why the <i>VM exit</i> occurred.
Such exits can be caused by IO accesses, for instance, which a virtual machine
is not allowed to perform and which the emulator must virtualize.
<h3>The NVMM Design</h3>
<p>
NVMM provides the infrastructure needed for both the kernel driver and the userland
emulators.
</p>
<p>
The kernel NVMM driver comes as a kernel module that can be dynamically loaded
into the kernel. It is made of a generic machine-independent frontend, and of
several machine-dependent backends. In practice, it means that NVMM is not
specific to x86, and could support ARM 64bit for example. During initialization,
NVMM selects the appropriate backend for the system. The frontend handles
everything that is not CPU-specific: the virtual machines, the virtual CPUs, the
guest physical address spaces, and so forth. The frontend also has an
<i>IOCTL</i> interface, that a userland emulator can use to communicate with
the driver.
</p>
<p>
When it comes to userland emulators, NVMM <b>does not provide one</b>. In other
words, it does not re-implement a Qemu, a VirtualBox, a Bhyve (FreeBSD) or a VMD
(OpenBSD). Rather, it provides a virtualization API via the libnvmm library, which
makes it easy to add NVMM support to existing emulators. This API is
meant to be simple and straightforward, and is fully documented. It has some
similarities with WHPX on Windows and HVF on macOS.
</p>
<p>
<center>
<img src="//www.netbsd.org/~maxv/nvmm/design.png">
<br>
<font size="1">Fig. A: General overview of the NVMM design.</font>
</center>
</p>
<h1>The Virtualization API: An Example</h1>
<p>
The virtualization API is installed by default on NetBSD. The idea is to provide
an easy way for applications to use NVMM to implement services, which can go
from small sandboxing systems to advanced system emulators.
</p>
<p>
Let's put ourselves in the context of a simple C application we want to write,
to briefly showcase the virtualization API. Note that this API may change a
little in the future.
</p>
<h3>Creating Machines and VCPUs</h3>
<p>
In libnvmm, each machine is described by an opaque <i>nvmm_machine</i> structure.
We start with:
</p>
<pre style="padding: 8px; background-color: #eeeeec; color: black; width: 100%; white-space: pre-wrap;">
#include <nvmm.h>
...
struct nvmm_machine mach;
nvmm_machine_create(&mach);
nvmm_vcpu_create(&mach, 0);
</pre>
<p>
This creates a machine in 'mach', and then creates VCPU number zero (VCPU0)
in this machine. This VM is associated with our process, so if our application
gets killed or exits bluntly, NVMM will automatically destroy the VM.
</p>
<h3>Fetching and Setting the VCPU State</h3>
<p>
In order to operate our VM, we need to be able to fetch and set the <i>state</i>
of its VCPU0, that is, the content of VCPU0's registers. Let's say we want to
set the value '123' in VCPU0's RAX register. We can do this by adding four more
lines:
</p>
<pre style="padding: 8px; background-color: #eeeeec; color: black; width: 100%; white-space: pre-wrap;">
struct nvmm_x64_state state;
nvmm_vcpu_getstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
state.gprs[NVMM_X64_GPR_RAX] = 123;
nvmm_vcpu_setstate(&mach, 0, &state, NVMM_X64_STATE_GPRS);
</pre>
<p>
Here, we fetch the GPR component of the VCPU0 state (GPR stands for <i>General
Purpose Registers</i>), we set RAX to '123', and we put the state back into
VCPU0. We're done.
</p>
<h3>Allocating Guest Memory</h3>
<p>
Now it is time to give our VM some memory, let's say a single page. (What follows
is a bit technical.)
</p>
<p>
The VM has its own MMU, which translates <i>guest virtual addresses</i> (GVA) to
<i>guest physical addresses</i> (GPA). A secondary MMU (which we won't discuss)
is set up by the host to translate the GPAs to host physical addresses. To give
our single page of memory to our VM, we need to tell the host to create this
secondary MMU.
</p>
<p>
Then, we will want to read/write data in the guest memory, that is to say, read/write
data into our guest's single GPA. To do that, in NVMM, we also need to tell the
host to associate the GPA we want to read/write with a <i>host virtual address</i>
(HVA) in our application. The big picture:
</p>
<p>
<center>
<img src="//www.netbsd.org/~maxv/nvmm/mapping3.png">
<br>
<font size="1">Fig. B: Memory relations between our application and our VM.</font>
</center>
</p>
<p>
In Fig. B above, if the VM wants to read data at virtual address 0x4000, the CPU
will perform a GVA→GPA translation towards the GPA 0x3000. Our application
is able to see the content of this GPA, via its virtual address 0x2000. For
example, if our application wants to zero out the page, it can simply invoke:
</p>
<pre style="padding: 8px; background-color: #eeeeec; color: black; width: 100%; white-space: pre-wrap;">
memset((void *)0x2000, 0, PAGE_SIZE);
</pre>
<p>
With this system, our application can modify guest memory by reading and
writing to it as if it were its own memory. All of this sounds complex, but
it comes down to just the following four lines of code:
</p>
<pre style="padding: 8px; background-color: #eeeeec; color: black; width: 100%; white-space: pre-wrap;">
uintptr_t hva = (uintptr_t)mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
gpaddr_t gpa = 0x3000;
nvmm_hva_map(&mach, hva, PAGE_SIZE);
nvmm_gpa_map(&mach, hva, gpa, PAGE_SIZE, PROT_READ|PROT_WRITE);
</pre>
<p>
Here we allocate a simple HVA in our application via mmap. Then, we turn this
HVA into a special buffer that NVMM will be able to use. Finally, we tell
the host to link the GPA (0x3000) towards the HVA. From then on, the guest
is allowed to touch what it perceives as being a simple physical page
located at address 0x3000, and our application can directly modify the
content of this page by reading and writing into the address pointed to by
'hva'.
</p>
<h3>Running the VM</h3>
<p>
The final step is running the VM for real. This is achieved with a <i>VCPU Loop</i>,
which runs our VCPU0 and processes the different <i>exit reasons</i>, typically
in the following form:
</p>
<pre style="padding: 8px; background-color: #eeeeec; color: black; width: 100%; white-space: pre-wrap;">
struct nvmm_exit exit;
while (1) {
nvmm_vcpu_run(&mach, 0, &exit);
switch (exit.reason) {
case NVMM_EXIT_NONE:
break; /* nothing to do */
case ... /* completed as needed */
}
}
</pre>
<p>
The <i>nvmm_vcpu_run</i> function blocks, and runs the VM until an exit or a
rescheduling occurs.
</p>
<h3>Full Code</h3>
<p>
We're done now: we know how to create a VM and give it VCPUs, we know how to modify
the registers of the VCPUs, we know how to allocate and modify guest memory, and we
know how to run a guest.
</p>
<p>
Let's sum it all up in one concrete example: a calculator that runs inside a VM.
This simple application receives two 16-bit integers as parameters, launches a VM
that performs the addition of these two integers, fetches the result, and displays it.
</p>
<p>
<b><a href="//www.netbsd.org/~maxv/nvmm/calc-vm.c">Full code: calc-vm.c</a></b>
</p>
<p>
That's about it, we have our first NVMM-based application in less than 100 lines
of C code, and it is an example of how NetBSD's new virtualization API can be
used to easily implement VM-related services.
</p>
<h1>Advanced Use of the Virtualization API</h1>
<p>
Libnvmm can go further than just providing wrapper functions around IOCTLs.
Simply said, certain exit reasons are very complex to handle, and libnvmm
provides <i>assists</i> that can emulate certain guest operations <b>on behalf</b>
of the userland emulator.
</p>
<p>
Libnvmm embeds a comprehensive machinery, made of three main components:
</p>
<ul>
<li>
The MMU Walker: the component in charge of performing a manual GVA→GPA
translation. It basically walks the MMU page tree of the guest; if the guest is
running in x86 64bit mode for example, it will walk the four layers of pages in
the guest to obtain a GPA.
</li>
<li>
The instruction decoder: fetches and disassembles the guest instructions that
cause MMIO exits. The disassembler uses a Finite State Machine. The result of the
disassembly is summed up in a structure that is passed to the instruction emulator,
possibly several times consecutively.
</li>
<li>
The instruction emulator: as its name indicates, it emulates the execution of an
instruction. Contrary to many other disassemblers and hypervisors, NVMM makes a
clear distinction between the decoder and the emulator.
</li>
</ul>
<p>
An NVMM-based application can therefore avoid the burden of implementing these
components, by just leveraging the assists provided in libnvmm.
</p>
<h1>Security Aspects</h1>
<p>
NVMM can be used in security products, such as sandboxing systems, to provide
contained environments. Without elaborating more on my warplans, this is a
project I've been thinking about for some time on NetBSD.
</p>
<p>
One thing you may have noticed in Fig. A is that the complex emulation
machinery is <b>not in the kernel, but in userland</b>. This is an excellent
security property of NVMM, because it reduces the risk for the host in case of
a bug or vulnerability (the host kernel remains unaffected), and it also makes
the machinery easy to fuzz. Currently, this property is not found in other
hypervisors such as KVM, HAXM or Bhyve, and I hope we'll be able to preserve it
as we move forward with more backends.
</p>
<p>
Another security property of NVMM is that the assists provided by libnvmm are
invoked only if the emulator explicitly calls them. In other words, the complex
machinery is not launched automatically, and an emulator is free <b>not to use it</b>
if it doesn't want to. This can limit the attack surface of applications that
create limited VMs, and want to keep things as simple and under control as
possible.
</p>
<p>
Finally, NVMM naturally benefits from the modern bug detection features available
in NetBSD (KASAN, KUBSAN, and more), and from NetBSD's automated test framework.
</p>
<h1>Performance Aspects</h1>
<p>
Contrary to other pseudo-cross-platform kernel drivers such as VirtualBox or
HAXM, NVMM is well integrated into the NetBSD kernel, and this allows us to
optimize the context switches between the guests and the host, in order to avoid
expensive operations in certain cases.
</p>
<p>
Another performance aspect of NVMM is the fact that in order to implement the
secondary MMU, NVMM uses NetBSD's pmap subsystem. This allows us to have
pageable guest pages, that the host can allocate on-demand to limit memory
consumption, and can then swap out when it comes under memory pressure.
</p>
<p>
It also goes without saying that NVMM is fully MP-safe, and uses fine-grained
locking to be able to run many VMs and many VCPUs simultaneously.
</p>
<p>
On the userland side, libnvmm tries to minimize the processing cost, for
example by doing only a partial emulation of certain instructions, or by
batching certain guest IO operations together. A lot of work has been done to
reduce the number of syscalls an emulator has to make, in order to
increase the overall performance on the userland side; but there are several
cases where it is not easy to keep a clean design.
</p>
<h1>Hardware Support</h1>
<p>
As of this writing, NVMM supports two backends, x86-SVM for AMD CPUs and
x86-VMX for Intel CPUs. In each case, NVMM can support up to 128 virtual
machines, each having a maximum of 256 VCPUs and 128GB of RAM.
</p>
<h1>Emulator Support</h1>
<p>
Armed with our full virtualization stack, our flexible backends, our user-friendly
virtualization API, our comprehensive assists, and our swag NVMM logo, we can now
add NVMM support in whatever existing emulator we want.
</p>
<p>
That's what was done in Qemu, with
<b><a href="//www.netbsd.org/~maxv/nvmm/patch-nvmm-support.diff">this patch</a></b>,
which shall soon be upstreamed. It uses libnvmm to provide hardware-accelerated
virtualization on NetBSD.
</p>
<p>
It is now fully functional, and can run a wide variety of operating systems, such
as NetBSD (of course), FreeBSD, OpenBSD, Linux, Windows XP/7/8.1/10,
among others. All of that works equally across the currently supported NVMM
backends, which means that Qemu+NVMM can be used on both AMD and Intel CPUs.
</p>
<p>
<center>
<a href="//www.netbsd.org/~maxv/nvmm/QemuWin10.png"><img src="//www.netbsd.org/~maxv/nvmm/QemuWin10.png" alt="Windows 10 on Qemu+NVMM" style="max-width:45%; max-height:45%;" /></a>
<br>
<font size="1">Fig. C: Example, Windows 10 running on Qemu+NVMM, with 3 VCPUs,
on a host that has a quad-core AMD CPU.</font>
</center>
</p>
<p>
<center>
<a href="//www.netbsd.org/~maxv/nvmm/QemuFedora.png"><img src="//www.netbsd.org/~maxv/nvmm/QemuFedora.png" alt="Fedora 29 on Qemu+NVMM" style="max-width:45%; max-height:45%;" /></a>
<br>
<font size="1">Fig. D: Example, Fedora 29 running on Qemu+NVMM, with 8 VCPUs,
on a host that has a quad-core Intel CPU.</font>
</center>
</p>
<p>
The instructions on how to use Qemu+NVMM are available on
<b><a href="//m00nbsd.net/4e0798b7f2620c965d0dd9d6a7a2f296.html">this page</a></b>.
</p>
<h1>What Now</h1>
<p>
All of NVMM is available in NetBSD-current, and will be part of the NetBSD 9
release.
</p>
<p>
Even if perfectly functional, the Intel backend of NVMM is younger than its AMD
counterpart, and it will probably receive some more performance and stability
improvements.
</p>
<p>
There are also still several design aspects that I haven't settled yet,
because I haven't decided the best way to address them.
</p>
<p>
Overall, I expect new backends to be added for other architectures than x86, and
I also expect to add NVMM support in more emulators.
</p>
<p>
That's all, ladies and gentlemen. In six months of spare time, we went from Zero
to NVMM, and now have a full virtualization stack that can run advanced operating
systems in a flexible, fast and secure fashion.
</p>
<p>
Not bad
</p>