Enhancing Syzkaller Support for NetBSD, Part 3


August 27, 2019 posted by Kamil Rytarowski

Prepared by Siddharth Muralee (@R3x) as a part of Google Summer of Code’19

As a part of Google Summer of Code’19, I am working on improving NetBSD support in the Syzkaller kernel fuzzer. Syzkaller is an unsupervised, coverage-guided kernel fuzzer that supports a variety of operating systems, including NetBSD.

The first report covers the initial changes that we made, and the second report describes the initial support we added for fuzzing the network stack.

This report details the work done during the final coding period, where the target was to improve support for fuzzing the filesystem stack.

Filesystem fuzzing is a relatively unexplored area. Syzkaller itself only supports filesystem fuzzing on Linux.

Analysis of the existing Linux setup

Filesystems are a more complex fuzzing target than standalone system calls. Fuzzing a filesystem starts with a standard operation such as mount, which takes the usual system call arguments plus a binary image of the filesystem itself. While the arguments of a normal syscall generally take up a few bytes, real-world filesystem images are on the order of gigabytes or larger; for fuzzing, a minimal image on the order of kilobytes to megabytes can be used instead. Syzkaller uses a technique called mutational fuzzing, where it mutates random parts of the input (according to specified guidelines), so a large input slows fuzzing down because of the higher I/O time.

Syzkaller deals with large images by splitting them into their non-zero chunks. It extracts the non-zero chunks together with their offsets and stores them as separate segments. Just before execution, it writes each chunk back at its offset, regenerating the new or modified image.
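
The chunking idea can be sketched in C as follows. This is only an illustration of the technique, not syzkaller's actual code: the names struct segment, extract_segments and rebuild_image are hypothetical, and the sketch assumes that every maximal run of non-zero bytes becomes one segment.

```c
#include <stddef.h>
#include <string.h>

/* One run of non-zero bytes in the image (hypothetical layout). */
struct segment {
    size_t offset;             /* where the run starts in the image */
    size_t len;                /* length of the run in bytes */
    const unsigned char *data; /* points at the run's bytes */
};

/*
 * Record each maximal run of non-zero bytes in img[0..size).
 * Stops after max_segs runs; returns the number of segments stored.
 */
size_t extract_segments(const unsigned char *img, size_t size,
                        struct segment *segs, size_t max_segs)
{
    size_t count = 0, i = 0;

    while (i < size && count < max_segs) {
        while (i < size && img[i] == 0)   /* skip zero bytes */
            i++;
        if (i == size)
            break;
        size_t start = i;
        while (i < size && img[i] != 0)   /* walk to the end of the run */
            i++;
        segs[count].offset = start;
        segs[count].len = i - start;
        segs[count].data = img + start;
        count++;
    }
    return count;
}

/*
 * Rebuild an image: zero-fill out[0..size), then copy each segment
 * to its offset. Returns -1 if a segment does not fit, 0 otherwise.
 */
int rebuild_image(unsigned char *out, size_t size,
                  const struct segment *segs, size_t nsegs)
{
    memset(out, 0, size);
    for (size_t i = 0; i < nsegs; i++) {
        if (segs[i].offset > size || segs[i].len > size - segs[i].offset)
            return -1;
        memcpy(out + segs[i].offset, segs[i].data, segs[i].len);
    }
    return 0;
}
```

In this sketch a fuzzer would mutate the bytes of individual segments between the two calls, so only the segments have to be stored and mutated, not the zero-filled gaps.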

Porting it to NetBSD

As an initial step towards filesystem fuzzing, we decided to port the existing Linux approach of creating random segments to NetBSD. The mounting process differs between the two operating systems in a few ways, the most significant being the arguments to mount(2).

Linux:

int mount(const char *source, const char *target, const char *filesystemtype, unsigned long mountflags, const void *data);

The data argument is interpreted by the different filesystems. Typically it is a string of comma-separated options understood by this filesystem. mount(8) lists the possible options for each filesystem.

Possible options for the XFS filesystem on Linux:

    wsync, noalign, swalloc, nouuid, mtpt, grpid, nogrpid, bsdgroups,
    sysvgroups, norecovery, inode64, inode32, ikeep, noikeep,
    largeio, nolargeio, attr2, noattr2, filestreams, quota,
    noquota, lazytime, nolazytime, usrquota, grpquota, prjquota,
    uquota, gquota, pquota, uqnoenforce, gqnoenforce, pqnoenforce,
    qnoenforce, discard, nodiscard, dax, barrier, nobarrier, logbufs,
    biosize, sunit, swidth, logbsize, allocsize, logdev, rtdev

NetBSD:

int mount(const char *type, const char *dir, int flags, void *data, size_t data_len);

The argument data describes the file system object to be mounted, and is data_len bytes long. data is a pointer to a structure that contains the type specific arguments to mount.

For FFS (one of the most common filesystems on NetBSD), the arguments look like this:

struct ufs_args {
        char      *fspec;   /* block special file to mount */
};
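
To make the difference concrete, here is a minimal sketch that mounts a device read-only on either system. The function name mount_image_ro is hypothetical, the device path and mount point are supplied by the caller, and the Linux branch picks nouuid from the XFS option list above purely as an example. On NetBSD the sketch assumes struct ufs_args comes from <ufs/ufs/ufsmount.h> and MOUNT_FFS from <sys/mount.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#if defined(__NetBSD__)
#include <sys/param.h>
#include <sys/mount.h>
#include <ufs/ufs/ufsmount.h>   /* struct ufs_args */
#elif defined(__linux__)
#include <sys/mount.h>
#endif

/* Hypothetical helper: mount dev on dir read-only; returns 0 or -1. */
int mount_image_ro(const char *dev, const char *dir)
{
#if defined(__NetBSD__)
    /* NetBSD: type-specific arguments travel in a struct plus its size. */
    struct ufs_args args;
    memset(&args, 0, sizeof(args));
    args.fspec = (char *)dev;   /* block special file to mount */
    return mount(MOUNT_FFS, dir, MNT_RDONLY, &args, sizeof(args));
#elif defined(__linux__)
    /* Linux: filesystem options travel as a comma-separated string. */
    return mount(dev, dir, "xfs", MS_RDONLY, "nouuid");
#else
    (void)dev;
    (void)dir;
    errno = ENOSYS;             /* other systems are out of scope here */
    return -1;
#endif
}
```

For a fuzzer the practical consequence is that the Linux data argument can be described as a string of options, while the NetBSD one has to be described as a per-filesystem structure together with its length.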

Currently, we have a pseudo syscall, syz_mount_image, which writes the mutated chunks of the filesystem into a file at their offsets, then configures the loop device using vndconfig(8) and mounts the filesystem image using mount(8).
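
The first step of syz_mount_image, writing the chunks into a file at their offsets, could look roughly like the sketch below. It is not the actual syzkaller implementation; struct img_chunk and write_image_file are hypothetical names. Attaching the resulting file with vndconfig(8) and mounting it are left out of the sketch.

```c
#define _XOPEN_SOURCE 700   /* for pwrite() and ftruncate() */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

/* One chunk of image data and where it belongs (hypothetical layout). */
struct img_chunk {
    off_t offset;       /* byte offset inside the image file */
    size_t len;         /* number of bytes in the chunk */
    const void *data;   /* chunk contents */
};

/*
 * Create path as an image of the given size and write every chunk at
 * its offset. Returns 0 on success and -1 on any error.
 */
int write_image_file(const char *path, off_t size,
                     const struct img_chunk *chunks, size_t nchunks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;

    /* Extend the file to full size; unwritten ranges read back as zeros. */
    if (ftruncate(fd, size) == -1)
        goto fail;

    for (size_t i = 0; i < nchunks; i++) {
        const char *p = chunks[i].data;
        size_t left = chunks[i].len;
        off_t off = chunks[i].offset;

        /* pwrite() may write less than requested, so loop until done. */
        while (left > 0) {
            ssize_t n = pwrite(fd, p, left, off);
            if (n <= 0)
                goto fail;
            p += n;
            left -= (size_t)n;
            off += n;
        }
    }
    return close(fd);

fail:
    close(fd);
    return -1;
}
```

Because the gaps between chunks are never written, the file can stay sparse on filesystems that support it, which keeps the I/O cost proportional to the non-zero data.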

Analysis of the current approach

One way to create mountable filesystems is to convert an existing filesystem image into syzkaller's pseudo-grammar representation and add it to the corpus, so that syzkaller starts mutating from a proper image.

Some of the issues with the syzkaller approach (as noted in "Fuzzing File Systems via Two-Dimensional Input Space Exploration"):

  • Lack of metadata knowledge - Mutations may corrupt filesystem-specific metadata such as checksums.
  • Lack of context awareness - Syzkaller isn't aware of the state of the filesystem image after a few operations have been performed on it.

Steps Forward

We also spent some time researching possible options to solve the existing issues and developing an approach that would give us better results.

Image mutator approach

One possible way forward is to use a seed image (a working filesystem image) and write a mutator that is aware of all the metadata in the image. The mutator should also be able to recreate metadata components such as the checksum so that the image stays mountable.

An existing implementation of such a mutator is JANUS, a filesystem mutator written for Linux with inspiration from fsck.
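
As a toy illustration of why a metadata-aware mutator helps, the sketch below mutates a block and then recomputes its checksum so that the block still validates. Everything here is hypothetical: the 64-byte block layout, the position of the checksum in the last four bytes and the checksum function are invented for the example and do not correspond to FFS, JANUS or any real filesystem.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE     64                 /* hypothetical block size */
#define PAYLOAD_SIZE (BLK_SIZE - 4)     /* last 4 bytes hold the checksum */

/* Toy checksum over the payload (not a real filesystem checksum). */
static uint32_t toy_checksum(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = sum * 31u + p[i];
    return sum;
}

/* Recompute the checksum and store it at the end of the block. */
void block_fix_checksum(uint8_t *blk)
{
    uint32_t sum = toy_checksum(blk, PAYLOAD_SIZE);
    memcpy(blk + PAYLOAD_SIZE, &sum, sizeof(sum));
}

/* Return 1 if the stored checksum matches the payload, else 0. */
int block_is_valid(const uint8_t *blk)
{
    uint32_t stored;
    memcpy(&stored, blk + PAYLOAD_SIZE, sizeof(stored));
    return stored == toy_checksum(blk, PAYLOAD_SIZE);
}

/* Metadata-aware mutation: change one payload byte, then repair. */
int mutate_block(uint8_t *blk, size_t pos, uint8_t val)
{
    if (pos >= PAYLOAD_SIZE)
        return -1;
    blk[pos] = val;
    block_fix_checksum(blk);
    return 0;
}
```

A blind mutation of a payload byte would leave the stored checksum stale, and a filesystem that verifies it would reject the block early; mutate_block keeps the two in sync so that the mutated data can reach deeper code paths.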

Grammar based approach

Syzkaller uses a pseudo-formal grammar for representing arguments to syscalls. This grammar could also be extended to generate filesystem images properly.

Writing a grammar to represent a filesystem image is quite a daunting task and we are not yet sure if it is possible, but it is the approach that we plan to take for now.

Proper documentation detailing the structure of a filesystem image is rather scarce, which has led me to go through the filesystem code to figure out the type, uses and limits of a given filesystem image. This data then has to be converted to the syzkaller representation to be used for fuzzing.

One advantage of writing a grammar that can generate mountable images is that we would get more coverage than fuzzing with a seed image, since we would also be creating new images instead of just mutating the same one.

I am currently learning the internals of FFS and trying to write a grammar definition that can properly generate filesystem images.

Miscellaneous Work

Meanwhile, I have also been working in parallel on improving the existing state of Syzkaller.

Add kernel compiled with KUBSAN for fuzzing

So far we have only used a kernel compiled with KCOV and KASAN for fuzzing with syzkaller. We decided to also add support for syzkaller building a kernel with KUBSAN and KCOV. This adds another dimension to the fuzzing process.

This required some changes in the build config. We had to remove the hardcoded kernel config and add support for building a kernel with a config passed to the fuzzer. This change will also make it easy to add support for upcoming sanitizers such as KMSAN.

Improve syscall descriptions

Improving system call descriptions is ongoing work. I recently added support for fuzzing syscalls such as mount, fork and posix_spawn.

We are also planning to add support for fuzzing device drivers soon.

Relevant Links

  • Syzkaller Dashboard for NetBSD
  • Syzkaller repository on Github
  • NetBSD docs on setting up syzkaller
  • GSoC'19 proof of work repository

Summary

We have managed to meet most of the goals that we had planned for the GSoC project. Overall, I have had a wonderful summer with the NetBSD Foundation and I look forward to working with them to complete the project.

Last but not least, I want to thank my mentors, @kamil and @cryo, for their useful suggestions and guidance. I also thank Maciej for his insight and guidance, which were fundamental during the course of the project. I would also like to thank Dmitry Vyukov of Google for helping with any issues faced with regard to Syzkaller. Finally, thanks to Google for giving me the chance to work with the NetBSD community.
