GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 3
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018
This is the final report of the project to integrate libFuzzer with the
userland applications. The two earlier parts of this project are:
- GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 1
- GSoC 2018 Reports: Integrate libFuzzer with the Basesystem, Part 2
For the last month of GSoC 2018, there were two kinds of contributions:
- Fuzzed some functions (instead of the whole program) from libraries and applications
- Honggfuzz related work
Fuzzing Functions with libFuzzer
In previous work, we mainly focused on fuzzing whole programs, such as
expr(1), sed(1), ping(8) and so on. However, fuzzing these applications
as a whole usually requires significant modifications, for several reasons:
- Collocation of main functions and other target functions
- Getting inputs from command line or network
- Complex options provided to the users
The first problem cannot be solved without splitting the code into
separate files or using macro tricks such as "ifdef". In the second
situation, the original program contains code to handle its input
sources, so we must either wrap the input buffers provided by libFuzzer
into the format the program expects or transform the buffers into its
internal data structures. As for the third case, it may be better to try
different options manually, because fuzzing options blindly can easily
result in meaningless test cases.
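As a minimal sketch of the first two points, the example below keeps a target function and main in one file, hides main behind a macro, and lets the libFuzzer entry point wrap the raw buffer into the NUL-terminated string the target expects. The count_words function and the STANDALONE macro are made up for illustration; they do not come from any of the fuzzed programs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical target function that shares a source file with main(). */
static size_t
count_words(const char *s)
{
	size_t n = 0;
	int in_word = 0;

	for (; *s != '\0'; s++) {
		if (*s == ' ' || *s == '\t' || *s == '\n') {
			in_word = 0;
		} else if (!in_word) {
			in_word = 1;
			n++;
		}
	}
	return n;
}

/* libFuzzer entry point: the buffer is not NUL-terminated, so copy it. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	char *s = malloc(size + 1);

	if (s == NULL)
		return 0;
	if (size > 0)
		memcpy(s, data, size);
	s[size] = '\0';
	(void)count_words(s);
	free(s);
	return 0;
}

#ifdef STANDALONE /* hypothetical macro: defined only for the normal build */
int
main(int argc, char **argv)
{
	if (argc > 1)
		printf("%zu\n", count_words(argv[1]));
	return 0;
}
#endif
```

When the file is built without STANDALONE and with clang's -fsanitize=fuzzer, libFuzzer supplies its own main, so the two definitions never collide.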
For the first two cases, honggfuzz can probably handle them elegantly,
and we will discuss it in the next section. In this section, we will
focus on fuzzing single functions with libFuzzer.
Fuzzing regex(3) Functions
The regex(3) functions we have fuzzed include the regcomp(3) and
regexec(3) interfaces. regcomp(3) compiles the pattern used to match
strings, while regexec(3) matches strings against the compiled pattern.
We have fuzzed 6 versions of the regex(3) interfaces, which come from
different libraries and applications.
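A harness for these two interfaces could look roughly like the sketch below. It is not the harness used in the project: the convention of splitting the input at the first NUL byte into a pattern and a subject string, and the choice of REG_EXTENDED, are assumptions made only for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	/* Copy the input and terminate it so it can be used as C strings. */
	char *buf = malloc(size + 1);

	if (buf == NULL)
		return 0;
	if (size > 0)
		memcpy(buf, data, size);
	buf[size] = '\0';

	/* Assumed layout: pattern, a NUL byte, then the subject string. */
	const char *pattern = buf;
	size_t plen = strlen(pattern);
	const char *subject = (plen < size) ? buf + plen + 1 : "";

	regex_t re;
	if (regcomp(&re, pattern, REG_EXTENDED) == 0) {
		regmatch_t matches[4];

		/* The result is ignored; only crashes and sanitizer reports matter. */
		(void)regexec(&re, subject, 4, matches, 0);
		regfree(&re);
	}
	free(buf);
	return 0;
}
```

To fuzz a copy of regcomp(3) taken from an application instead of libc, the same harness would be linked against that application's regex sources.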
For all of these versions, we have found some potential bugs, which are
described in the rest of this section. In the links given for each case,
the "crash-XXXX" files are the inputs that reproduce the bug, the
"output-XXXX" files are the corresponding expected outputs, and the
Makefile builds the program that reproduces the bug.
Bug in agrep Version regcomp(3)
The potential bug for this version appears around these lines:
for (i = 0; i < TRE_PARAM_LAST; i++)
	params[i] = lit->u.params[i];
The "params" field of lit->u is set to NULL, so reading from it triggers
a SIGSEGV. The underlying reason why it is NULL is still unknown. You can
reproduce this with the files in this link.
Bug in cvs Version regcomp(3)
This is a potential bug that results in unbounded recursion. With the
files from this link, this version of regcomp(3) will repeatedly call
the calc_eclosure_iter function until it runs out of stack memory.
Bug in diffutils and grep Version regcomp(3)
These two versions of regcomp(3) both use a macro named
EXTRACT_NUMBER_AND_INCR, which eventually uses this line to do a left
shift:
(destination) += SIGN_EXTEND_CHAR (*((source) + 1)) << 8;
The result of SIGN_EXTEND_CHAR (*((source) + 1)) can be a negative
number, and left-shifting a negative value is undefined behavior in C.
To reproduce these two bugs, you can refer to the links for diffutils
and grep.
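One way to avoid shifting a negative value is to assemble the bits as an unsigned number first and apply the sign afterwards. The helper below is a hypothetical illustration of that idea, not a patch taken from diffutils or grep; it assumes the number is stored as two bytes, low byte first, in two's complement.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not the EXTRACT_NUMBER_AND_INCR macro itself):
 * decode a 16-bit two's-complement number stored low byte first.
 */
static int
extract_number(const unsigned char *source)
{
	/* Shifts on unsigned operands are always well defined. */
	unsigned int raw = (unsigned int)source[0] |
	    ((unsigned int)source[1] << 8);

	/* Apply the sign with arithmetic instead of shifting a signed char. */
	return (raw & 0x8000u) ? (int)raw - 0x10000 : (int)raw;
}
```

For the bytes 0xff 0xff this yields -1, the same value the original expression is meant to produce, but without relying on undefined behavior.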
Bug in libc Version regexec(3)
There appears to be a heap buffer overflow in the libc regexec(3). This
potential bug appears here:
1. for (;;) {
2.     /* next character */
3.     lastc = c;
4.     c = (p == m->endp) ? OUT : *p;
5.     if (EQ(st, fresh))
6.         coldp = p;
The pointer p starts at the beginning of the string being matched and is
advanced in every round of this loop. However, it seems that the loop
fails to break even when p points to the next character after the end of
that string. So at line 4, the dereference of pointer p triggers an
overflow error. This potential bug can be reproduced with the files from
this link.
Fuzzing Checksum Functions
The checksum algorithms we have fuzzed are:
All of these algorithms except crc are implemented in libc, and their
libc interfaces are quite similar. These interfaces can be divided into
two categories. The first is "update-style", which includes "XXXInit",
"XXXUpdate" and "XXXFinal", where "XXX" is the name of the checksum
algorithm. The "Init" function initializes the context, the "Update"
function runs the checksum process incrementally, and the "Final"
function extracts the result. The second is "data-style", which only
uses the "XXXData" interface to calculate the checksum directly from a
complete buffer. For the crc algorithm, we have fuzzed the
implementations from the kernel and from cksum(1).
No bugs were found in the checksum algorithms during the fuzzing process.
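The two interface styles lend themselves to a cross-check: feeding a buffer in two pieces through the update-style calls should give the same result as the data-style call on the whole buffer. The sketch below illustrates this with a small Adler-32 style checksum standing in for a real algorithm. All Toy* names are hypothetical, and unlike the libc XXXData functions this stand-in returns the value as an integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in checksum with update-style and data-style APIs. */
typedef struct {
	uint32_t a, b;
} TOY_CTX;

static void
ToyInit(TOY_CTX *ctx)
{
	ctx->a = 1;
	ctx->b = 0;
}

static void
ToyUpdate(TOY_CTX *ctx, const uint8_t *data, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		ctx->a = (ctx->a + data[i]) % 65521u;
		ctx->b = (ctx->b + ctx->a) % 65521u;
	}
}

static uint32_t
ToyFinal(TOY_CTX *ctx)
{
	return (ctx->b << 16) | ctx->a;
}

static uint32_t
ToyData(const uint8_t *data, size_t len)
{
	TOY_CTX ctx;

	ToyInit(&ctx);
	ToyUpdate(&ctx, data, len);
	return ToyFinal(&ctx);
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	if (size < 1)
		return 0;

	/* The first byte picks where the remaining bytes are split. */
	const uint8_t *payload = data + 1;
	size_t len = size - 1;
	size_t split = data[0] % (len + 1);

	TOY_CTX ctx;
	ToyInit(&ctx);
	ToyUpdate(&ctx, payload, split);
	ToyUpdate(&ctx, payload + split, len - split);

	/* Incremental and one-shot results must agree. */
	assert(ToyFinal(&ctx) == ToyData(payload, len));
	return 0;
}
```

A mismatch aborts the process, which libFuzzer reports as a crash together with the offending input.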
Fuzzing libutil(3)
libutil(3) contains various system-dependent utility functions used in
some system daemons. During this project, we chose these functions from
this library:
Bug in the strspct
Among all these functions, I have found only one potential bug, in the
strspct function. It appears around these lines:
if (numerator < 0) {
	numerator = -numerator;
	sign++;
}
These lines negate the numerator variable. When this variable holds the
minimum representable integer, the negation overflows, because the
corresponding positive value cannot be represented. You can reproduce
this bug with the files under this directory, where crash-XXXXX is the
input file, output-XXXXX is the expected output, and the Makefile
compiles the binary that accepts the input file to reproduce the bug.
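A common way to take the magnitude of a signed value without this overflow is to convert to the unsigned type first and negate there. The helper below is a hypothetical sketch of that technique, assuming the numerator is a signed type such as intmax_t; it is not the fix applied to strspct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: magnitude of a signed value without signed overflow. */
static uintmax_t
magnitude(intmax_t value)
{
	/*
	 * Convert first, then negate: unsigned arithmetic wraps modulo 2^N,
	 * so this is well defined even when value is INTMAX_MIN.
	 */
	return (value < 0) ? (uintmax_t)0 - (uintmax_t)value : (uintmax_t)value;
}
```

With this approach the sign is tracked separately, as the original code already does with its sign variable, and the magnitude is kept in an unsigned variable.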
Fuzzing bozohttpd(8)
The main target function we have fuzzed for bozohttpd is
"bozo_process_request". However, we cannot fuzz it in isolation, because
it has several dependencies. Specifically, this function needs a
"bozo_httpreq_t" object to process, so we need "bozo_read_request" to
get a request and "bozo_clean_request" to clean it up. To feed the data
through "bozo_read_request", we also need to mock some interfaces from
"ssl-bozo.c".
The source for fuzzing bozohttpd(8) can be found here.
Bug in bozohttpd(8)
The potential bug found in bozohttpd(8) is around these lines:
1. val = bozostrnsep(&str, ":", &len);
2. debug((httpd, DEBUG_EXPLODING,
3.     "read_req2: after bozostrnsep: str ``%s'' val ``%s''",
4.     str, val));
5. if (val == NULL || len == -1) {
6.     (void)bozo_http_error(httpd, 404, request,
7.         "no header");
8.     goto cleanup;
9. }
10. while (*str == ' ' || *str == '\t')
11.     len--, str++;
The bug appears at line 10, where "str" is NULL and dereferencing it
triggers a SIGSEGV with this input file: here. Please notice that this
file contains some non-printable characters.
The reason for this bug is that str is changed by the bozostrnsep
function in the first line, but the following lines only check whether
val is NULL and ignore the str variable. A possible fix is to add a
check for this variable after calling bozostrnsep.
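The suggested check can be illustrated with a self-contained sketch. Here split_once is a hypothetical stand-in that assumes strsep(3)-like semantics (the remainder pointer becomes NULL when no delimiter is found); it is not bozostrnsep, and parse_header is not the bozohttpd code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strsep(3)-style splitter; NOT bozostrnsep itself. */
static char *
split_once(char **strp, char delim)
{
	char *token = *strp;
	char *end;

	if (token == NULL)
		return NULL;
	end = strchr(token, delim);
	if (end != NULL) {
		*end = '\0';
		*strp = end + 1;	/* remainder starts after the delimiter */
	} else {
		*strp = NULL;		/* no delimiter: nothing is left */
	}
	return token;
}

/* Returns 0 on success, -1 when the line has no "name: value" shape. */
static int
parse_header(char *line, const char **name, const char **value)
{
	char *str = line;
	char *val = split_once(&str, ':');

	/* Check the remainder as well as the token before touching it. */
	if (val == NULL || str == NULL)
		return -1;
	while (*str == ' ' || *str == '\t')
		str++;
	*name = val;
	*value = str;
	return 0;
}
```

A line without a colon is then rejected up front instead of reaching the whitespace-skipping loop with a NULL pointer.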
Honggfuzz Related Work
Fuzzing ping(8) with LD_PRELOAD and HF_ITER
In the last post, we fuzzed ping(8) with honggfuzz, which required
plenty of modifications, because we needed to change the behavior of the
socket interfaces to get inputs from honggfuzz. With the suggestions
from Robert Swiecki in this pull request, we have finished a fuzzing
implementation that needs no modification to the original source of
ping(8).
The LD_PRELOAD environment variable can be used to load a list of
libraries in advance, which means that we can use it to shadow functions
that would otherwise come from other libraries. The HF_ITER interface is
used to actively fetch inputs from honggfuzz. Combining the two, we can
re-implement the socket interfaces in a library so that they retrieve
their inputs through the HF_ITER interface, and then load this library
with LD_PRELOAD to shadow the socket interfaces used by ping(8). You can
find the detailed implementation of this library in this link.
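The idea can be sketched as a tiny shim that shadows one socket call. This is not the library linked above: the choice of recv(2) as the shadowed function is an assumption for illustration (the calls that need shadowing depend on the target), and the HF_ITER prototype is written as I understand it from honggfuzz's libhfuzz and should be checked against its header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Provided by honggfuzz at run time. Declared weak here only so that this
 * sketch also links on its own; the prototype is an assumption.
 */
extern void HF_ITER(const uint8_t **buf, size_t *len) __attribute__((weak));

/* Copy at most out_len bytes of fuzz input into the caller's buffer. */
static size_t
copy_input(const uint8_t *in, size_t in_len, void *out, size_t out_len)
{
	size_t n = (in_len < out_len) ? in_len : out_len;

	if (n > 0)
		memcpy(out, in, n);
	return n;
}

/* Shadows the libc recv(2) when this file is preloaded as a library. */
ssize_t
recv(int fd, void *buf, size_t len, int flags)
{
	const uint8_t *in = NULL;
	size_t in_len = 0;

	(void)fd;
	(void)flags;
	if (HF_ITER == NULL)	/* not running under honggfuzz */
		return -1;
	HF_ITER(&in, &in_len);	/* ask the fuzzer for the next input */
	return (ssize_t)copy_input(in, in_len, buf, len);
}
```

Built as a shared library and named in LD_PRELOAD, such a shim makes the target read fuzzer-generated bytes wherever it expects network data, while its own source stays untouched.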
Following a similar idea, Kamil Rytarowski also suggested that I
implement a fuzzing mode for honggfuzz to fuzz programs that take their
inputs from the command line. The basic idea is to implement a library
that replaces the command line with inputs from the HF_ITER interface.
Currently, we have finished a simple implementation, but we have
encountered some problems with the exec(3) interface, because it might
drop the information or state needed for fuzzing.
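The core of such a mode is turning a flat input buffer into an argument vector. The helper below is a hypothetical sketch of that step only, not the implementation mentioned above; it assumes the fuzz input has already been copied into a writable, NUL-terminated buffer and that arguments are separated by blanks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int
is_blank(char c)
{
	return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: split buf in place into at most max_args - 1
 * arguments, NULL-terminate argv, and return the argument count.
 * max_args must be at least 1.
 */
static int
split_args(char *buf, char **argv, int max_args)
{
	int argc = 0;
	char *p = buf;

	while (argc < max_args - 1) {
		while (is_blank(*p))
			p++;
		if (*p == '\0')
			break;
		argv[argc++] = p;		/* start of the next argument */
		while (*p != '\0' && !is_blank(*p))
			p++;
		if (*p == '\0')
			break;
		*p++ = '\0';			/* terminate this argument */
	}
	argv[argc] = NULL;
	return argc;
}
```

A real fuzzing mode would also have to decide how argv[0] is set and how quoting is handled, which this sketch leaves out.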
Adding "Only-Printable" Mode for honggfuzz
libFuzzer provides the -only_ascii option to supply only printable
inputs to fuzzed programs. This option is useful for programs such as
expr(1), sed(1) and so on. So we have added an only-printable mode to
honggfuzz for the same purpose. This implementation has been merged
into the official repository in a pull request.
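One simple way to obtain printable inputs is to fold every byte into the printable ASCII range after mutation. The function below is a sketch of that approach under my own assumptions; it is not necessarily what the merged honggfuzz change or libFuzzer does, and the name make_printable is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map every byte into the range 0x20..0x7e. */
static void
make_printable(uint8_t *buf, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		/* There are 95 printable ASCII characters, starting at space. */
		buf[i] = (uint8_t)(buf[i] % 95 + 32);
	}
}
```

Note that this mapping never produces newlines or tabs, so a target that needs them would require a slightly wider range.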
Summary
The GSoC 2018 project "Integrating libFuzzer for the Userland Applications" has finally ended. During this period, I have become more and more proficient with different fuzzing tools and with the NetBSD system. I was also very glad to contribute something to the community, especially when some commits or suggestions were accepted. With the help of Kamil Rytarowski, I was also fortunate to get a chance to give a talk about this project at EuroBSDcon 2018.
Thanks to my mentors, Kamil Rytarowski and Christos Zoulas, for their
patient guidance during this summer. I also want to thank Robert Swiecki
for his great suggestions on fuzzing with honggfuzz, Kamil Frankowicz
for his help with fuzzing programs with both AFL and honggfuzz, and
Matthew Green and Joerg Sonnenberger for their help with working with
LLVM on NetBSD. Finally, thanks to Google Summer of Code and the NetBSD
Community for this chance to work with you in this unforgettable summer!