NetBSD Blog

Bookmarks

Feeds

GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 1

June 13, 2018 posted by Kamil Rytarowski

Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

During the Google Summer of Code 2018, I'm working on the project of integrating libFuzzer for the userland applications. The libFuzzer is a fuzzing engine based on the coverage information provided by the SanitizerCoverage in LLVM. It can repeatedly generate mutations of input data and test them until it finds the potential bugs. In this post, I'm going to share what I have done in the first month of this summer.

For the first month, I mainly tried to apply the sanitizers to the userland applications. Sanitizers (such as MemorySanitizer, AddressSanitizer, and etc.) are helpful to the fuzzing process because they can detect various types of run-time errors like uninitialized reads, out-of-bounds accesses, use-after-free and so on. I tried to apply MemorySanitizer as a start and there were three steps to finish this:

Import new version LLVM as an external toolchain
Add new interceptors for userland applications
Enable MemorySanitizer for userland applications and test them

Compile New Version LLVM Statically with `EXTERNAL_TOOLCHAIN`

Using a new version of LLVM toolchain is necessary because the LLVM in NetBSD trunk is old and there are some changes in the new version. However, updating the toolchain in the src will introduce extra work for this project, so we decided to use the EXTERNAL_TOOLCHAIN parameter provided by NetBSD to work with the new version.

During this period, I chose to use a pure-LLVM userland to avoid potential problems. This means that we should replace the libc++ instead of libstdc++ library for the userland programs. As a result, I used -DSANITIZER_CXX_ABI=libc++ and -DCLANG_DEFAULT_CXX_STDLIB=libc++ flags to eliminate some compilation errors while compiling the LLVM toolchain.

Another compiling issue is related to the sanitizers. Whenever there is failed check with sanitizers, the program will abort with backtrace information like this:

    ==15299==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837 in main /home/zhengy/free.c:6:3
        #1 0x41c580 in ___start (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/zhengy/free.c:6:3 in main
    Exiting

The backtrace is generated with the support of llvm-symbolizer. However, if we compile some dynamic libraries, which are needed by llvm-symbolizer, with sanitizers (because some userland programs with sanitizers also need them), then it will not available for generating a readable backtrace anymore:

    ==1623==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837  (//./a.out+0x41c837)
        #1 0x41c580  (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value (//./a.out+0x41c837)
    Exiting

So, to remove the dependencies of the sanitized dynamic libraries for llvm-symbolizer and other LLVM tools, we chose to compile the whole LLVM toolchain statically. For this purpose, we found that the static building behavior of LLVM on NetBSD is not workable, so we need to do some subtle modification to the cmake file. But this modification still needs some correctness confirmation from the LLVM community.

After all of these preparations, I wrote a shell script to automatically do the jobs of preparing external LLVM toolchains, compiling the NetBSD from source and finally generate a chroot(8)-able environment to work with sanitizers and libFuzzer.

With this environment, I first tried to run the test cases from both the LLVM and the NetBSD. For the LLVM part, I found that some libFuzzer cases were not working. But finally, we found that this resulted from the improper usages of sem_open(3) interface in the libFuzzer and so I submitted a patch to fix this.

For the NetBSD part, it worked well with the existing ATF(7) test cases for the AddressSanitizer and UndefinedBehaviorSanitizer. To test the MemorySanitizer, ThreadSanitizer, and libFuzzer, I added some test cases for them.

Add New Interceptors

Some libraries (such as libc, libm, and libpthread) and syscalls cannot be applied properly with sanitizers. This will introduce some troubles because we will lack information with these unsanitized interfaces. Fortunately, sanitizers can provide wrappers, namely interceptors, for these interfaces to manually provide some information. However, the set of interceptors is quite incomplete and thus need some effort to add some unsanitized functions needed by userland applications. As a summary, I added interceptors for the following interfaces:

strtonum(3) family: strtonum(3), strtoi(3), strtou(3)
vis(3) family: vis(3), nvis(3), strvis(3) and etc.
getmntinfo(3)
puts(3), fputs(3)
Hash interfaces: sha1(3), md2(3), md4(3), md5(3), rmd160(3) and sha2(3)
getvfsstat(2)
nl_langinfo(3)
fparseln(3)
unvis(3) family: unvis(3), strunvis(3) and etc.
statvfs(2) family: statvfs(2), fstatvfs(2) and etc.
mount(2) and unmount(2)
fseek(3) family: fseek(3), ftell(3), rewind(3) and etc.
cdbr(3) family: cdbr_open(3), cdbr_get(3), cdbr_find(3) and etc.
setvbuf(3) family: setbuf(3), setbuffer(3), setlinebuf(3), setvbuf(3)
mi_vector_hash(3)

Most of these interceptors are easy to add, we only need to leverage the interceptor interfaces provided by the compiler-rt and do the pre- and post- function call check. As an example, I choose the interceptor of strvis(3) to illustrate the implementation:

    INTERCEPTOR(int, strvis, char *dst, const char *src, int flag) {
      void *ctx;
      COMMON_INTERCEPTOR_ENTER(ctx, strvis, dst, src, flag);
      if (src)
        COMMON_INTERCEPTOR_READ_RANGE(ctx, src, REAL(strlen)(src) + 1);
      int len = REAL(strvis)(dst, src, flag);
      if (dst)
        COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dst, len + 1);
      return len;
    }

The strvis(3) interface will transform the representation of string stored in src and then return it with dst. So, its interceptor wants to tell the sanitizers two messages:

strvis(3) will read the string in src (COMMON_INTERCEPTOR_READ_RANGE interface)
strvis(3) will write a string to dst (COMMON_INTERCEPTOR_WRITE_RANGE interface)

So, with interceptors, the sanitizers can obtain information of unsanitized interfaces. There are three unsolved issues with interceptors:

Interceptors with FILE type: the FILE type is implemented as a structure and contains some pointers inside. This means that we should check these pointers one by one in the interceptors. However, the FILE type is common among different OSs and their implementations vary a lot. So, for different OSs, we should write different conditions. What's worse, there are some interceptors (such as fopen) implemented by others skipping the checks for FILE. This will introduce some incompatible problems if we enforce the check with other interfaces (like fputs). For example, the fopen is the interface to initialize the FILE type, if we skip marking the returned FILE pointer as initialized (with COMMON_INTERCEPTOR_WRITE_RANGE), we will get an error in the interceptor of fputs after we enforce the check of this pointer (with COMMON_INTERCEPTOR_READ_RANGE).
mount(2) interface: The mount(2) interface requires data parameter for different file systems. This parameter can be different types, such as struct ufs_args, struct nfs_args and so on. These types usually contain pointers, so we need to check them one by one. However, there are around 34 different struct xxx_args types in NetBSD, so it will be quite hard to add and maintain them in compiler-rt repository.
getchar(3) and putchar(3) family interfaces: these interfaces will be defined by macros with some compiler conditions, so their implementation will be complicated.

Enable the `Sanitizers` for the Userland with `MKSANITIZER`

After adding interceptors, we can then enable the sanitizers for userland applications. To ship the sanitizers to the user, Christos Zoulas prepared the MKSANITIZER framework, dedicated for building the whole sanitizer userland with a dedicated sanitizer (including UndefinedBehaviorSanitizer, Control Flow Integrity, MemorySanitizer, ThreadSanitizer, SafeStack, LeakSanitizer and etc).

Based on this framework, Kamil Rytarowski used the NetBSD building parameters like MKSANITIZER=yes USE_SANITIZER=undefined HAVE_LLVM=yes and managed to enable the UndefinedBehaviorSanitizer option for the whole userland. There is the ongoing effort on upstreaming local patches, fixing detected bugs. It is planned to follow up this with the remaining sanitizer options.

I also tried to enable the MemorySanitizer for the userland programs and here is the result. If you have any insights or suggestions, please feel free to comment on it. Applying the MemorySanitizer option also helped to improve the interceptors and integrate MKSANITIZER. The MemorySanitizer is sensitive to the interceptor issues and so actually this job was twisted with the process of adding and improving the interceptors. With the MemorySanitizer, I also find out two bugs with top(1) program. You can refer to this post to learn about it.

There are also some unsolved issues with some applications. As shown in the sheet, I divide them into five categories:

DEADLYSIGNAL: mainly happening when sending CTRL-C to programs
IOCTL: ioctl(2)-related errors
GETC, PUTC, FFLUSH: stdio(3)-related errors
REALLOC: realloc(3)-related errors
Compilation errors: conflict symbols between programs and base libraries

The challenging of GETC, PUTC, FFLUSH category has been mentioned above, it mainly results from lacking the interceptors of these interfaces. The other categories are still remained to be investigated.

Summary

In the last month, I have a good start of working with LLVM and NetBSD and successfully build some userland programs with MemorySanitizer. All of these jobs mentioned above are based on the forked repositories instead of the official ones. If you have interests in them, please refer to these repositories: NetBSD source, pkgsrc-wip, LLVM, clang, and compiler-rt. Next, I will switch to the integration work of libFuzzer and try to run some programs as a trial.

Last but not least, I want to thank my mentors, Christos Zoulas and Kamil Rytarowski, they help me a lot with so many good suggestions and assistance. I also want to thank Matthew Green and Joerg Sonnenberger for their help with LLVM-related suggestions. Finally, thanks to Google to give me a good chance to work with NetBSD community.

[0 comments]

« =?iso-8859-8-i?Q?... | Main | GSoC 2018 Reports:... »