GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 3

August 12, 2018 posted by Kamil Rytarowski

Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

This is the final report of the project of integrating libFuzzer for the userland applications, here are the former two parts of this project:

For the last month of GSoC 2018, there two kinds of contributions:

  • Fuzzed some functions (instead of the whole program) from libraries and applications
  • Honggfuzz related work

Fuzzing Functions with libFuzzer

In previous work, we mainly focus on the fuzzing of whole programs, such as expr(1), sed(1), ping(8) and so on. However, fuzzing these applications as a whole usually needs significant modifications for various kinds of reasons:

  1. Collocation of main functions and other target functions
  2. Getting inputs from command line or network
  3. Complex options provided to the users

For the first problem, we cannot solve it without splitting them into separate files or using some macro tricks such as "ifdef". Under the second situation, the original program may write some lines of code to handle the input sources. So we must either wrap the input buffers provided by the libFuzzer into the format the programs expect or we need to transform the buffers into internal data structures. As for the third case, it may be better to avoid it by manually trying different options because fuzzing options blindly can easily result in meaningless test cases.

For the first two cases, honggfuzz can probably handle them elegantly and we will discuss it in the next section. But in this section, we will focus on fuzzing single function with libFuzzer.

Fuzzing regex(3) Functions

The regex(3) functions we have fuzzed includes the regcomp(3) and regexec interfaces. The regcomp(3) is used to compile the pattern we used to match strings; while the regexec(3) matches the strings with the compiled pattern. We have fuzzed 6 versions of regex(3) interfaces, they come from different libraries or applications:

  1. agrep version
  2. cvs version
  3. diffutils version
  4. grep version
  5. libc version
  6. vi version

For all of these versions, we have found some potential bugs. In the following part of this section, I will introduce what are these bugs. For the links given in the following cases, the "crash-XXXX" files are the input files to reproduce the bug, the "output-XXXX" files are corresponding expected outputs and the Makefile will generate the program to reproduce the bugs.

Bug in agrep Version regcomp(3)

The potential bug for this version appears around these lines:

for (i = 0; i < TRE_PARAM_LAST; i++)
  params[i] = lit->u.params[i];
The "params" field of the lit->u is set to NULL, so it will trigger a SIGSEGV. The further reason for why it is NULL is still unknown yet. You can reproduce this with files in this link.

Bug in cvs Version regcomp(3)

This is a potential bug to result in unterminated recursion. With the files from this link, this version of regcomp(3) will repeatedly call the calc_eclosure_iter function until it runs out of the stack memory.

Bug in diffutils and grep Version regcomp(3)

For these two versions of regcomp(3), they both use a macro named EXTRACT_NUMBER_AND_INCR, and finally, this macro will use this line to do left shift:

(destination) += SIGN_EXTEND_CHAR (*((source) + 1)) << 8;

So, it is possible that the result of SIGN_EXTEND_CHAR (*((source) + 1)) will be a negative number and the left shift operation might be an undefined behavior. To reproduce these two bugs, you can refer to the links for diffutils and grep.

Bug in libc Version regexec(3)

There would be a buffer-overflow bug with the heap memory in the libc regexec(3). This potential bug appears here:

1. for (;;) {
2.     /* next character */
3.     lastc = c;
4.     c = (p == m->endp) ? OUT : *p;
5.     if (EQ(st, fresh))
6.         coldp = p;

The pointer p starts from the matched string and it will be increased in every round of this loop. However, it seems that this loop fails to break even when the p points to the next character after the end of the matched string. So at the line 4, the dereference of pointer p will trigger an overflow error. This potential bug can be reproduced with the files from this link.

Fuzzing Checksum Functions

The checksum algorithms we have fuzzed are:

All these algorithms except the crc are implemented in the libc. For these algorithms implemented in the libc, the interfaces are quite similar. These interfaces can be divided into two categories, the first one is "update-style", which includes "XXXInit", "XXXUpdate" and "XXXFinal". The "XXX" is the name of checksum algorithm. The "Init" function is used for initializing the context, the "Update" function is used for executing the checksum process incrementally and the "Final" function is used for extracting the results. The second one is "data-style", which only uses "XXXData" interface. This interface is used to directly calculate the checksum from a complete buffer. For the crc algorithms, we have fuzzed the implementations from kernel and the cksum(1)

For the checksum algorithms, there has been no bug found during the fuzzing process.

Fuzzing libutil(3)

The libutil(3) contains various system-dependant used in some system daemons. During this project, we have chosen these functions from this library:

Bug in the strspct

Among all these functions, I have only found one potential bug in the strspct function. The potential bug of strspct appears around these lines:

if (numerator < 0) {
    numerator = -numerator;

From these lines, we can find that the numerator variable is negated. So when we assign this variable with the minimum integer, it is possible that this integer will overflow. You can reproduce this bug with the files under this directory, where crash-XXXXX is the input file, output-XXXXX is the expected output, and the Makefile is used to compile the binary which can accept the input file to reproduce the bug.

Fuzzing bozohttpd(8)

The main target function we have fuzzed for bozohttpd is the "bozo_process_request". However, we cannot fuzz it barely, because there are several dependencies to fuzz it. Specifically, this function needs a "bozo_httpreq_t" type to be processed. So we need to introduce the "bozo_read_request" to get a request and the "bozo_clean_request" to clean the request. To feed the data through the "bozo_read_request", we also need to mock some interfaces from the "ssl-bozo.c" The source for fuzzing bozohttpd(8) can be found here.

Bug in bozohttpd(8)

The potential bug found in the bozohttpd(8) is around these lines:

1.  val = bozostrnsep(&str, ":", &len);
2.  debug((httpd, DEBUG_EXPLODING,
3.         "read_req2: after bozostrnsep: str ``%s'' val ``%s''",
4.         str, val));
5.  if (val == NULL || len == -1) {
6.         (void)bozo_http_error(httpd, 404, request,
7. 	                         "no header");
8.         goto cleanup;
9.  }
10. while (*str == ' ' || *str == '\t')
11.        len--, str++;

The bug appears at the line 10, where the "str" is NULL and it tirggers a SIGSEGV with the input of this input file: here. Please notice that this file contains some non-printable characters.

The reason for this bug is that str is changed by the bozostrnsep function in the first line, however, the following lines only check whether val is NULL bug ignore the str variable. The possible workaround might be adding the check for this variable after calling bozostrnsep.

Honggfuzz Related Work

Fuzzing ping(8) with LD_PRELOAD and HF_ITER

In this last post, we have fuzzed the ping(8) with honggfuzz with plenty of modifications. This is because we need to modify the behaviors of the socket interfaces to get inputs from the honggfuzz. With the suggestions from Robert Swiecki in this pull request, we have finished a fuzzing implementation without any modification to the original source of ping(8).

The LD_PRELOAD environment variable can be used to load a list of libraries in advance. This means that we can use it to shadow the implementations of some functions in these libraries. The HF_ITER interface is used to get the inputs actively from the honggfuzz. So, if we combine these two together, we can re-implement the socket interfaces in some library and this implementation will retrieve the inputs with the HF_ITER interface. After that, we can load this library with LD_PRELOAD and then we can shadow the socket interfaces for ping(8). You can find the detailed implementation of this library in this link.

Similar to this idea, Kamil Rytarowski also suggests me to implement a fuzzing mode for honggfuzz to fuzz programs with inputs from the command line. The basic idea is that we can implement a library to replace the command line with inputs from HF_ITER interface. Currently, we have finished a simple implementation but it seems that we have encountered some problems with exec(3) interface because it might drop the information or states for fuzzing.

Adding "Only-Printable" Mode for honggfuzz

The libFuzzer provides -only-ascii option to provide only-printable inputs for fuzzed programs. This option is useful for some programs such as the expr(1), sed(1) and so on. So we have added the only-printable mode for the honggfuzz to finish similar tasks. This implementation has been merged by the official repository in a pull request.


The GSoC 2018 project of "Integrating libFuzzer for the Userland Applications" has finally ended. During this period, I have been more and more proficient with different fuzzing tools and the NetBSD system. At the same time, I also feel so good to contribute something to the community especially when some commits or suggestions have been accepted. With the help of Kamil Rytarowski, I fortunately also get a chance to give a talk about this project in the EuroBSDcon 2018.

Thanks to my mentors, Kamil Rytarowski and Christos Zoulas for their patient guidance during this summer. Besides, I also want to thank Robert Swiecki for his great suggestions on fuzzing with honggfuzz, thank Kamil Frankowicz for his help on fuzzing programs with both AFL and honggfuzz, thank Matthew Green and Joerg Sonnenberger for their help for working with LLVM on NetBSD. Finally, thanks to Google Summer of Code and the NetBSD Community for this chance to work with you in this unforgettable summer!



Post a Comment:
  • HTML Syntax: NOT allowed