GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 1
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018
During Google Summer of Code 2018, I am working on integrating libFuzzer with the userland applications. libFuzzer is a fuzzing engine driven by the coverage information that SanitizerCoverage provides in LLVM. It repeatedly generates mutations of the input data and tests them until it finds potential bugs. In this post, I share what I did in the first month of the summer.
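As a rough illustration of what libFuzzer drives, here is a minimal fuzz target sketch. LLVMFuzzerTestOneInput is the entry point documented by libFuzzer; starts_with_magic is a hypothetical stand-in for real code under test and is not taken from this project. Assuming a Clang with libFuzzer support, such a file would typically be built with something like clang -g -fsanitize=fuzzer,address target.c.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code under test: reports whether the input starts with "FUZ". */
static int starts_with_magic(const uint8_t *data, size_t size) {
    return size >= 3 && data[0] == 'F' && data[1] == 'U' && data[2] == 'Z';
}

/* libFuzzer calls this once for every input it generates or mutates. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* Exercise the code under test; sanitizers watch for run-time errors. */
    (void)starts_with_magic(data, size);
    return 0; /* 0 is the conventional return value for a processed input */
}
```

The engine supplies main() itself, so the target only has to feed the bytes it receives into the code being tested.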
In the first month, I mainly tried to apply the sanitizers to the userland applications. Sanitizers (such as MemorySanitizer and AddressSanitizer) help the fuzzing process because they detect various kinds of run-time errors, such as uninitialized reads, out-of-bounds accesses, and use-after-free. I started with MemorySanitizer, which took three steps:
- Import a new version of LLVM as an external toolchain
- Add new interceptors for userland applications
- Enable MemorySanitizer for userland applications and test them
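To make the kind of error MemorySanitizer targets more concrete, here is a small sketch of an uninitialized read. Both function names are hypothetical and chosen only for illustration. Assuming a build with clang -g -fsanitize=memory, calling first_is_zero on a buffer from make_buffer(n, 0) is the sort of code expected to produce a use-of-uninitialized-value report, because a branch then depends on a value that was never written.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints and zeroes them only when init != 0. */
int *make_buffer(size_t n, int init) {
    int *buf = malloc(n * sizeof *buf);
    if (buf != NULL && init) {
        for (size_t i = 0; i < n; i++)
            buf[i] = 0;
    }
    return buf; /* contents are indeterminate when init == 0 */
}

/* Branches on buf[0]; with an uninitialized buffer this is the faulty use. */
int first_is_zero(const int *buf) {
    if (buf[0] == 0)
        return 1;
    return 0;
}
```

With an initialized buffer the same code is well defined, which is why such bugs often go unnoticed without a sanitizer.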
Compile a New Version of LLVM Statically with EXTERNAL_TOOLCHAIN
Using a new version of the LLVM toolchain is necessary because the LLVM in NetBSD trunk is old and lacks some changes made in the new version. However, updating the toolchain in src would add extra work to this project, so we decided to use the EXTERNAL_TOOLCHAIN parameter provided by NetBSD to work with the new version.
During this period, I chose to use a pure-LLVM userland to avoid potential problems. This means using libc++ instead of libstdc++ as the C++ library for the userland programs. As a result, I used the -DSANITIZER_CXX_ABI=libc++ and -DCLANG_DEFAULT_CXX_STDLIB=libc++ flags to eliminate some compilation errors while compiling the LLVM toolchain.
Another compilation issue is related to the sanitizers. Whenever a sanitizer check fails, the program aborts with backtrace information like this:

    ==15299==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837 in main /home/zhengy/free.c:6:3
        #1 0x41c580 in ___start (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/zhengy/free.c:6:3 in main
    Exiting

The backtrace is generated with the support of llvm-symbolizer. However, if the dynamic libraries that llvm-symbolizer needs are compiled with sanitizers (because some sanitized userland programs need them as well), llvm-symbolizer can no longer produce a readable backtrace:
    ==1623==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837 (//./a.out+0x41c837)
        #1 0x41c580 (//./a.out+0x41c580)

    SUMMARY: MemorySanitizer: use-of-uninitialized-value (//./a.out+0x41c837)
    Exiting

So, to remove the dependency of llvm-symbolizer and other LLVM tools on the sanitized dynamic libraries, we chose to compile the whole LLVM toolchain statically. We found that the static build of LLVM does not work on NetBSD as it stands, so we needed to make a subtle modification to the cmake file. This modification still needs to be confirmed as correct by the LLVM community.
After all of these preparations, I wrote a shell script that automatically prepares the external LLVM toolchain, compiles NetBSD from source, and finally generates a chroot(8)-able environment for working with sanitizers and libFuzzer.
With this environment, I first tried to run the test cases from both LLVM and NetBSD. On the LLVM side, I found that some libFuzzer cases were not working. We eventually found that this resulted from improper use of the sem_open(3) interface in libFuzzer, so I submitted a patch to fix it.
On the NetBSD side, the existing ATF(7) test cases for AddressSanitizer and UndefinedBehaviorSanitizer worked well. To test MemorySanitizer, ThreadSanitizer, and libFuzzer, I added some test cases for them.
Add New Interceptors
Some libraries (such as libc, libm, and libpthread) and syscalls cannot be properly instrumented with sanitizers. This causes trouble because the sanitizers lack information about these unsanitized interfaces. Fortunately, sanitizers can provide wrappers, namely interceptors, that supply this information manually. However, the set of interceptors is quite incomplete, so some effort is needed to add interceptors for the unsanitized functions that userland applications need. In summary, I added interceptors for the following interfaces:
- strtonum(3) family: strtonum(3), strtoi(3), strtou(3)
- vis(3) family: vis(3), nvis(3), strvis(3), etc.
- getmntinfo(3)
- puts(3), fputs(3)
- Hash interfaces: sha1(3), md2(3), md4(3), md5(3), rmd160(3) and sha2(3)
- getvfsstat(2)
- nl_langinfo(3)
- fparseln(3)
- unvis(3) family: unvis(3), strunvis(3), etc.
- statvfs(2) family: statvfs(2), fstatvfs(2), etc.
- mount(2) and unmount(2)
- fseek(3) family: fseek(3), ftell(3), rewind(3), etc.
- cdbr(3) family: cdbr_open(3), cdbr_get(3), cdbr_find(3), etc.
- setvbuf(3) family: setbuf(3), setbuffer(3), setlinebuf(3), setvbuf(3)
- mi_vector_hash(3)
Most of these interceptors are easy to add: we only need to use the interceptor interfaces provided by compiler-rt and perform the checks before and after the function call. As an example, I chose the interceptor of strvis(3) to illustrate the implementation:
    INTERCEPTOR(int, strvis, char *dst, const char *src, int flag) {
      void *ctx;
      COMMON_INTERCEPTOR_ENTER(ctx, strvis, dst, src, flag);
      /* Pre-call check: strvis reads src, including the terminating NUL. */
      if (src)
        COMMON_INTERCEPTOR_READ_RANGE(ctx, src, REAL(strlen)(src) + 1);
      /* Call the real, unsanitized function. */
      int len = REAL(strvis)(dst, src, flag);
      /* Post-call check: strvis wrote len characters plus the NUL to dst. */
      if (dst)
        COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dst, len + 1);
      return len;
    }

The strvis(3) interface transforms the representation of the string stored in src and returns it in dst. So its interceptor tells the sanitizers two things:
- strvis(3) will read the string in src (the COMMON_INTERCEPTOR_READ_RANGE interface)
- strvis(3) will write a string to dst (the COMMON_INTERCEPTOR_WRITE_RANGE interface)
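The same read-before, write-after pattern can be sketched outside compiler-rt with a toy model. Everything below is an assumption made for illustration: the arena, the shadow array, and the toy_* functions are hypothetical and much simpler than the real sanitizer runtime, and strcpy stands in for an unsanitized libc function.

```c
#include <stddef.h>
#include <string.h>

enum { ARENA_SIZE = 64 };

static char arena[ARENA_SIZE];           /* memory the toy tool tracks */
static unsigned char shadow[ARENA_SIZE]; /* 1 = byte known to be initialized */
static int reports;                      /* uninitialized reads seen so far */

/* Post-call hook: mark n bytes starting at p as initialized. */
static void toy_write_range(const char *p, size_t n) {
    size_t off = (size_t)(p - arena);
    memset(shadow + off, 1, n);
}

/* Pre-call hook: count one report if any of the n bytes is unmarked. */
static void toy_read_range(const char *p, size_t n) {
    size_t off = (size_t)(p - arena);
    for (size_t i = 0; i < n; i++) {
        if (!shadow[off + i]) {
            reports++;
            return;
        }
    }
}

/* Toy interceptor: wraps strcpy with a pre-call read check and a
 * post-call write mark. Both pointers must point into arena. */
static char *toy_strcpy(char *dst, const char *src) {
    size_t n = strlen(src) + 1;   /* string plus terminating NUL */
    toy_read_range(src, n);       /* the call will read src */
    char *ret = strcpy(dst, src); /* the real function */
    toy_write_range(dst, n);      /* the call has written dst */
    return ret;
}
```

In this model, a string placed in the arena without calling toy_write_range plays the role of data produced by an unintercepted function: the next toy_strcpy that reads it counts a report even though the bytes are valid. A missing interceptor causes the same kind of false report in a real sanitizer.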
So, with interceptors, the sanitizers can obtain information about unsanitized interfaces. There are three unsolved issues with interceptors:
- Interceptors with the FILE type: the FILE type is implemented as a structure that contains some pointers. This means that the interceptors should check these pointers one by one. However, the FILE type is common to different OSs and its implementations vary a lot, so we would have to write different conditions for each OS. What's worse, some interceptors implemented by others (such as fopen) skip the checks for FILE. This introduces incompatibilities if we enforce the check in other interfaces (like fputs). For example, fopen is the interface that initializes the FILE type; if we skip marking the returned FILE pointer as initialized (with COMMON_INTERCEPTOR_WRITE_RANGE), we get an error in the interceptor of fputs once we enforce the check of this pointer (with COMMON_INTERCEPTOR_READ_RANGE).
- The mount(2) interface: mount(2) requires a data parameter whose type depends on the file system, such as struct ufs_args, struct nfs_args and so on. These types usually contain pointers, so we need to check them one by one. However, there are around 34 different struct xxx_args types in NetBSD, so it would be quite hard to add and maintain them in the compiler-rt repository.
- The getchar(3) and putchar(3) family of interfaces: these interfaces are defined as macros under some compiler conditions, so implementing their interceptors will be complicated.
Enable the Sanitizers for the Userland with MKSANITIZER
After adding interceptors, we can enable the sanitizers for userland applications. To ship the sanitizers to users, Christos Zoulas prepared the MKSANITIZER framework, which builds the whole userland with a dedicated sanitizer (including UndefinedBehaviorSanitizer, Control Flow Integrity, MemorySanitizer, ThreadSanitizer, SafeStack, LeakSanitizer, etc.).
Based on this framework, Kamil Rytarowski used NetBSD build parameters like MKSANITIZER=yes USE_SANITIZER=undefined HAVE_LLVM=yes and managed to enable the UndefinedBehaviorSanitizer option for the whole userland. There is an ongoing effort to upstream local patches and fix the detected bugs. The plan is to follow this up with the remaining sanitizer options.
I also tried to enable MemorySanitizer for the userland programs, and here is the result. If you have any insights or suggestions, please feel free to comment on it. Applying the MemorySanitizer option also helped to improve the interceptors and integrate MKSANITIZER. MemorySanitizer is sensitive to interceptor issues, so this job was actually intertwined with the process of adding and improving the interceptors. With MemorySanitizer, I also found two bugs in the top(1) program. You can refer to this post to learn about them.
There are also some unsolved issues with some applications. As shown in the sheet, I divide them into five categories:
- DEADLYSIGNAL: mainly happening when sending CTRL-C to programs
- IOCTL: ioctl(2)-related errors
- GETC, PUTC, FFLUSH: stdio(3)-related errors
- REALLOC: realloc(3)-related errors
- Compilation errors: conflicting symbols between programs and base libraries
The GETC, PUTC, FFLUSH category has been mentioned above; it mainly results from the missing interceptors for these interfaces. The other categories remain to be investigated.
Summary
In the last month, I got off to a good start working with LLVM and NetBSD and successfully built some userland programs with MemorySanitizer. All of the work mentioned above is based on forked repositories instead of the official ones. If you are interested, please refer to these repositories: NetBSD source, pkgsrc-wip, libFuzzer, and try to run some programs as a trial.
Last but not least, I want to thank my mentors, Christos Zoulas and Kamil Rytarowski, who helped me a lot with many good suggestions and much assistance. I also want to thank Matthew Green and Joerg Sonnenberger for their help with LLVM-related suggestions. Finally, thanks to Google for giving me a good chance to work with the NetBSD community.