GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 1
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018
  During the Google Summer of Code 2018, I'm working on the project
  of integrating libFuzzer for the userland applications. libFuzzer is
  a coverage-guided fuzzing engine that relies on the coverage
  information provided by SanitizerCoverage in LLVM. It repeatedly
  generates mutations of the input data and tests them until it finds
  potential bugs. In this post, I'm going to share what I have done in
  the first month of this summer.
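  To make the fuzzing loop concrete: libFuzzer repeatedly calls a
  user-supplied entry point named LLVMFuzzerTestOneInput with mutated
  byte buffers. The sketch below is a minimal fuzz target, not code from
  this project; key_length() is a hypothetical function under test. With
  a clang that ships libFuzzer, such a file can typically be built with
  -fsanitize=fuzzer, optionally combined with a sanitizer (for example
  -fsanitize=fuzzer,memory), in which case libFuzzer supplies main().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical function under test: returns the length of the key in a
 * "key=value" record, or -1 if there is no '=' separator. */
static long key_length(const uint8_t *data, size_t size) {
    if (size == 0)
        return -1;              /* avoid calling memchr on an empty buffer */
    const uint8_t *eq = memchr(data, '=', size);
    return eq ? (long)(eq - data) : -1;
}

/* Entry point that libFuzzer calls once per generated input. Crashes and
 * sanitizer reports are the "findings", so the result is not checked. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    (void)key_length(data, size);
    return 0;                   /* libFuzzer expects 0 */
}
```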
  For the first month, I mainly tried to apply the sanitizers to the
  userland applications. Sanitizers (such as MemorySanitizer,
  AddressSanitizer, etc.) are helpful to the fuzzing process because
  they can detect various types of run-time errors like uninitialized
  reads, out-of-bounds accesses, use-after-free and so on. I started
  with MemorySanitizer, which took three steps:
  
- Import a new version of LLVM as an external toolchain
- Add new interceptors for userland applications
- Enable MemorySanitizer for userland applications and test them
Compile a New Version of LLVM Statically with EXTERNAL_TOOLCHAIN
  Using a new version of the LLVM toolchain is necessary because the
  LLVM in NetBSD trunk is old and lacks some changes available in newer
  versions. However, updating the toolchain in the src tree would
  introduce extra work for this project, so we decided to use the
  EXTERNAL_TOOLCHAIN parameter provided by NetBSD to work with the new
  version.
  During this period, I chose to use a pure-LLVM userland to avoid
  potential problems. This means that the userland programs should use
  libc++ instead of libstdc++. As a result, I passed the
  -DSANITIZER_CXX_ABI=libc++ and -DCLANG_DEFAULT_CXX_STDLIB=libc++
  flags to eliminate some compilation errors while compiling the LLVM
  toolchain.
Another compilation issue is related to the sanitizers. Whenever a sanitizer check fails, the program prints a report with a backtrace and exits, like this:
    ==15299==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837 in main /home/zhengy/free.c:6:3
        #1 0x41c580 in ___start (//./a.out+0x41c580)
    SUMMARY: MemorySanitizer: use-of-uninitialized-value /home/zhengy/free.c:6:3 in main
    Exiting
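  A report of this kind can be reproduced with a small program that
  branches on memory it never wrote. The function below is a
  hypothetical example, not the free.c used for the output above; build
  it with clang -fsanitize=memory -g and call read_uninitialized() from
  main. MemorySanitizer does not complain when the uninitialized value
  is merely copied; it reports when the value decides a conditional
  branch (or is passed to certain library calls), so with optimizations
  the report may move to wherever the value is finally used.

```c
#include <stdlib.h>

/* Hypothetical example: branches on a heap value that was never written. */
int read_uninitialized(void) {
    int *p = malloc(sizeof *p);   /* malloc(3) does not initialize memory */
    if (p == NULL)
        return -1;
    int result = 0;
    if (*p)                       /* branch on uninitialized memory: MSan reports here */
        result = 1;
    free(p);
    return result;                /* 0 or 1 (unspecified) when run without MSan */
}
```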
  
  The backtrace is symbolized with the help of llvm-symbolizer.
  However, if the dynamic libraries that llvm-symbolizer depends on are
  compiled with sanitizers (because some sanitized userland programs
  also need them), llvm-symbolizer can no longer be used, and the
  backtrace is no longer readable:
  
    ==1623==WARNING: MemorySanitizer: use-of-uninitialized-value
        #0 0x41c837  (//./a.out+0x41c837)
        #1 0x41c580  (//./a.out+0x41c580)
    SUMMARY: MemorySanitizer: use-of-uninitialized-value (//./a.out+0x41c837)
    Exiting
  
  So, to remove the dependencies of llvm-symbolizer and other LLVM
  tools on the sanitized dynamic libraries, we chose to compile the
  whole LLVM toolchain statically. However, the static build of LLVM
  did not work on NetBSD as-is, so we had to make a subtle modification
  to the cmake file. This modification still needs confirmation of its
  correctness from the LLVM community.
  After all of these preparations, I wrote a shell script that
  automatically prepares the external LLVM toolchain, compiles NetBSD
  from source, and finally generates a chroot(8)-able environment to
  work with sanitizers and libFuzzer.
  With this environment, I first tried to run the test cases from both
  LLVM and NetBSD. For the LLVM part, I found that some libFuzzer test
  cases were not working. We eventually found that this resulted from
  improper usage of the sem_open(3) interface in libFuzzer, so I
  submitted a patch to fix it.
  For the NetBSD part, the existing ATF(7) test cases for
  AddressSanitizer and UndefinedBehaviorSanitizer worked well. To test
  MemorySanitizer, ThreadSanitizer, and libFuzzer, I added some test
  cases for them.
Add New Interceptors
  Some libraries (such as libc, libm, and libpthread) and syscalls
  cannot be properly instrumented with sanitizers. This causes trouble
  because the sanitizers lack information about what these
  uninstrumented interfaces read and write. Fortunately, sanitizers can
  provide wrappers, namely interceptors, for these interfaces to supply
  this information manually. However, the set of interceptors is quite
  incomplete, so some effort is needed to add interceptors for the
  uninstrumented functions used by userland applications. In summary,
  I added interceptors for the following interfaces:
  
- strtonum(3) family: strtonum(3), strtoi(3), strtou(3)
- vis(3) family: vis(3), nvis(3), strvis(3), etc.
- getmntinfo(3)
- puts(3), fputs(3)
- Hash interfaces: sha1(3), md2(3), md4(3), md5(3), rmd160(3) and sha2(3)
- getvfsstat(2)
- nl_langinfo(3)
- fparseln(3)
- unvis(3) family: unvis(3), strunvis(3), etc.
- statvfs(2) family: statvfs(2), fstatvfs(2), etc.
- mount(2) and unmount(2)
- fseek(3) family: fseek(3), ftell(3), rewind(3), etc.
- cdbr(3) family: cdbr_open(3), cdbr_get(3), cdbr_find(3), etc.
- setvbuf(3) family: setbuf(3), setbuffer(3), setlinebuf(3), setvbuf(3)
- mi_vector_hash(3)
  Most of these interceptors are easy to add: we only need to leverage
  the interceptor interfaces provided by compiler-rt and perform the
  checks before and after the real function call. As an example, I
  chose the interceptor of strvis(3) to illustrate the implementation:
  
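    INTERCEPTOR(int, strvis, char *dst, const char *src, int flag) {
      void *ctx;
      COMMON_INTERCEPTOR_ENTER(ctx, strvis, dst, src, flag);
      // strvis(3) reads the whole NUL-terminated src string.
      if (src)
        COMMON_INTERCEPTOR_READ_RANGE(ctx, src, REAL(strlen)(src) + 1);
      // Call the real, uninstrumented strvis(3).
      int len = REAL(strvis)(dst, src, flag);
      // strvis(3) writes len characters plus the terminating NUL to dst.
      if (dst)
        COMMON_INTERCEPTOR_WRITE_RANGE(ctx, dst, len + 1);
      return len;
    }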
  
  The strvis(3) interface transforms the representation of the string
  stored in src and writes the result into dst. So, its interceptor
  needs to tell the sanitizers two things:
- strvis(3) will read the string in src (the COMMON_INTERCEPTOR_READ_RANGE interface)
- strvis(3) will write a string to dst (the COMMON_INTERCEPTOR_WRITE_RANGE interface)
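  The protocol above (check what the real function reads, then mark
  what it writes) can be modeled outside compiler-rt. The sketch below
  is a toy, self-contained analogue, not compiler-rt code:
  check_read_range() and mark_write_range() are hypothetical stand-ins
  for COMMON_INTERCEPTOR_READ_RANGE and COMMON_INTERCEPTOR_WRITE_RANGE,
  a per-byte bool array plays the role of MemorySanitizer's shadow
  memory, and toy_vis() is a simplified hypothetical encoder standing
  in for strvis(3).

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ARENA_SIZE 256

/* Toy memory arena plus one "initialized" flag per byte (the shadow).
 * All pointers passed to the helpers below must point into arena. */
static char arena[ARENA_SIZE];
static bool shadow[ARENA_SIZE];
static int read_errors;

/* Stand-in for COMMON_INTERCEPTOR_READ_RANGE: report the first byte in
 * [p, p + n) that was never marked as written. */
static void check_read_range(const char *p, size_t n) {
    size_t off = (size_t)(p - arena);
    for (size_t i = 0; i < n; i++) {
        if (!shadow[off + i]) {
            fprintf(stderr, "toy-msan: uninitialized read at arena[%zu]\n", off + i);
            read_errors++;
            return;
        }
    }
}

/* Stand-in for COMMON_INTERCEPTOR_WRITE_RANGE: mark [p, p + n) as initialized. */
static void mark_write_range(const char *p, size_t n) {
    size_t off = (size_t)(p - arena);
    for (size_t i = 0; i < n; i++)
        shadow[off + i] = true;
}

/* Hypothetical strvis(3) stand-in: copies src to dst, doubling backslashes
 * and escaping non-printable bytes as a backslash plus three octal digits.
 * dst needs room for 4 * strlen(src) + 1 bytes. Returns the number of
 * characters written, excluding the terminating NUL. */
static int toy_vis(char *dst, const char *src) {
    int len = 0;
    for (const unsigned char *s = (const unsigned char *)src; *s != '\0'; s++) {
        if (*s == '\\') {
            dst[len++] = '\\';
            dst[len++] = '\\';
        } else if (isprint(*s)) {
            dst[len++] = (char)*s;
        } else {
            len += sprintf(dst + len, "\\%03o", (unsigned)*s);
        }
    }
    dst[len] = '\0';
    return len;
}

/* Same shape as the strvis(3) interceptor: check the input before the
 * call, then mark the output (including the NUL) as written. */
static int intercepted_toy_vis(char *dst, const char *src) {
    if (src)
        check_read_range(src, strlen(src) + 1);
    int len = toy_vis(dst, src);
    if (dst)
        mark_write_range(dst, (size_t)len + 1);
    return len;
}
```

  If the wrapper skipped mark_write_range(), a later check_read_range()
  on dst would flag the encoded output as uninitialized, which is the
  same kind of mismatch described below for fopen and fputs.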
So, with interceptors, the sanitizers can obtain information about uninstrumented interfaces. There are still three unsolved issues with interceptors:
- Interceptors with the FILE type: the FILE type is implemented as a structure that contains some pointers. This means that we should check these pointers one by one in the interceptors. However, FILE is common among different OSs and its implementations vary a lot, so we need different conditions for different OSs. What's worse, some interceptors implemented by others (such as fopen) skip the checks for FILE. This introduces incompatibilities if we enforce the check in other interfaces (like fputs). For example, fopen is the interface that initializes a FILE; if its interceptor skips marking the returned FILE pointer as initialized (with COMMON_INTERCEPTOR_WRITE_RANGE), we will get an error in the interceptor of fputs once we enforce the check of this pointer (with COMMON_INTERCEPTOR_READ_RANGE).
- mount(2) interface: The mount(2) interface requires a data parameter specific to each file system. This parameter can have different types, such as struct ufs_args, struct nfs_args and so on. These types usually contain pointers, so we need to check them one by one. However, there are around 34 different struct xxx_args types in NetBSD, so it will be quite hard to add and maintain them in the compiler-rt repository.
- getchar(3) and putchar(3) family interfaces: these interfaces are defined as macros under some compiler conditions, so their interceptors are complicated to implement.
Enable the Sanitizers for the Userland with MKSANITIZER
  After adding interceptors, we can then enable the sanitizers for
  userland applications. To ship the sanitizers to users, Christos
  Zoulas prepared the MKSANITIZER framework, dedicated to building the
  whole userland with a selected sanitizer (including
  UndefinedBehaviorSanitizer, Control Flow Integrity, MemorySanitizer,
  ThreadSanitizer, SafeStack, LeakSanitizer, etc.).
  Based on this framework, Kamil Rytarowski used NetBSD build
  parameters like MKSANITIZER=yes USE_SANITIZER=undefined
  HAVE_LLVM=yes and managed to enable the UndefinedBehaviorSanitizer
  option for the whole userland. There is an ongoing effort on
  upstreaming local patches and fixing detected bugs. It is planned to
  follow this up with the remaining sanitizer options.
  I also tried to enable MemorySanitizer for the userland programs,
  and here is the result. If you have any insights or suggestions,
  please feel free to comment on it. Applying the MemorySanitizer
  option also helped to improve the interceptors and the MKSANITIZER
  integration. MemorySanitizer is sensitive to interceptor issues, so
  this work was intertwined with adding and improving the interceptors.
  With MemorySanitizer, I also found two bugs in the top(1) program.
  You can refer to this post to learn about them.
There are also some unsolved issues with some applications. As shown in the sheet, I divide them into five categories:
- DEADLYSIGNAL: mainly happens when sending CTRL-C to programs
- IOCTL: ioctl(2)-related errors
- GETC, PUTC, FFLUSH: stdio(3)-related errors
- REALLOC: realloc(3)-related errors
- Compilation errors: conflicting symbols between programs and base libraries
The GETC, PUTC, FFLUSH category was mentioned above; it mainly results
  from the lack of interceptors for these interfaces. The other
  categories remain to be investigated.
Summary
  In the last month, I got a good start working with LLVM and NetBSD
  and successfully built some userland programs with MemorySanitizer.
  All of the work mentioned above is based on forked repositories
  instead of the official ones. If you are interested, please refer to
  these repositories: NetBSD source, pkgsrc-wip, libFuzzer, and try
  running some programs as a trial.
Last but not least, I want to thank my mentors, Christos Zoulas and Kamil Rytarowski, who helped me a lot with many good suggestions and much assistance. I also want to thank Matthew Green and Joerg Sonnenberger for their LLVM-related suggestions. Finally, thanks to Google for giving me a good chance to work with the NetBSD community.
