Google Summer of Code 2020: [Final Report] Enhancing Syzkaller support for NetBSD


October 19, 2020 posted by Kamil Rytarowski

This report was written by Ayushu Sharma as part of Google Summer of Code 2020.

This post is a follow up of the first report and second report. Post summarizes the work done during the third and final coding period for the Google Summer of Code (GSoc’20) project - Enhance Syzkaller support for NetBSD

Sys2syz

Sys2syz would give an extra edge to Syzkaller for NetBSD. It has a potential of efficiently automating the conversion of syscall definitions to syzkaller’s grammar. This can aid in increasing the number of syscalls covered by Syzkaller significantly with the minimum possibility of manual errors. Let’s delve into its internals.

A peek into Syz2syz Internals

This tool parses the source code of device drivers present in C to a format which is compatible with grammar customized for syzkaller. Here, we try to cull the details of the target device by compiling, and then collocate the details with our python code. For further details about proposed design for the tool, refer to previous post.

Python code follows 4 major steps:

  • Extractor.py - Extraction of all ioctl commands of a given device driver along with their arguments from the header files.
  • Bear.py - Preprocessing of the device driver's files using compile_commands.json generated during the setup of tool using Bear.
  • C2xml.py - XML files are generated by running c2xml on preprocessed device files. This eases the process of fetching the information related to arguments of commands
  • Description.py - Generates descriptions for the ioctl commands and their arguments (builtin-types, arrays, pointers, structures and unions) using the XML files.

Extraction:

This step involves fetching the possible ioctl commands for the target device driver and getting the files which have to be included in our dev_target.txt file. We have already seen all the commands for device drivers are defined in a specific way. These commands defined in the header files need to be grepped along with the major details, regex comes in as a rescue for this


	io = re.compile("#define\s+(.*)\s+_IO\((.*)\).*")
	iow = re.compile("#define\s+(.*)\s+_IOW\((.*),\s+(.*),\s+(.*)\).*")
	ior = re.compile("#define\s+(.*)\s+_IOR\((.*),\s+(.*),\s+(.*)\).*")
	iowr = re.compile("#define\s+(.*)\s+_IOWR\((.*),\s+(.*),\s+(.*)\).*")

Code scans through all the header files present in the target device folder and extracts all the commands along with their details using compiled regex expressions. Details include the direction of buffer(null, in, out, inout) based on the types of Ioctl calls(_IO, _IOR, _IOW, _IOWR) and the argument of the call. These are stored in a file named ioctl_commands.txt at location out/<target_name>. Example output:


out, I2C_IOCTL_EXEC, i2c_ioctl_exec_t

Preprocessing:

Preprocessing is required for getting XML files, about which we would look in the next step. Bear plays a major role when it comes to preprocessing C files. It records the commands executed for building the target device driver. This step is performed when setup.sh script is executed.

Extracted commands are modified with the help of parse_commands() function to include ‘-E’ and ‘-fdirectives’ flags and give it a new output location. Commands extracted by this function are then used by the compile_target function which filters out the unnecessary flags and generates preprocessed files in our output directory.

Generating XML files

Run C2xml on the preprocessed files to fetch XML files which stores source code in a tree-like structure, making it easier to collect all the information related to each and every element of structures, unions etc. For eg:


	<symbol type="struct" id="_5970" file="am2315.i" start-line="13240" start-col="16" end-line="13244" end-col="11" bit-size="96" alignment="4" offset="0">
		<symbol type="node" id="_5971" ident="ipending" file="am2315.i" start-line="13241" start-col="33" end-line="13241" end-col="41" bit-size="32" alignment="4" offset="0" base-type-builtin="unsigned int"/<
		<symbol type="node" id="_5972" ident="ilevel" file="am2315.i" start-line="13242" start-col="33" end-line="13242" end-col="39" bit-size="32" alignment="4" offset="4" base-type-builtin="int"/>
		<symbol type="node" id="_5973" ident="imasked" file="am2315.i" start-line="13243" start-col="33" end-line="13243" end-col="40" bit-size="32" alignment="4" offset="8" base-type-builtin="unsigned int"/>
	</symbol>
	<symbol type="pointer" id="_5976" file="am2315.i" start-line="13249" start-col="14" end-line="13249" end-col="25" bit-size="64" alignment="8" offset="0" base-type-builtin="void"/>
	<symbol type="array" id="_5978" file="am2315.i" start-line="13250" start-col="33" end-line="13250" end-col="39" bit-size="288" alignment="4" offset="0" base-type-builtin="unsigned int" array-size="9"/>

We would further see how attributes like - idents, id, type, base-type-builtin etc conveniently helps us to analyze code and generate descriptions in a trouble-free manner .

Descriptions.py

Final part, which offers a txt file storing all the required descriptions as its output. Here, information from the xml files and ioctl_commands.txt are combined together to generate descriptions of ioctl commands and their arguments.

Xml files for the given target device are parsed to form trees,


for file in (os.listdir(self.target)):
	tree = ET.parse(self.target+file)
	self.trees.append(tree)

We then traverse through these trees to search for the arguments of a particular ioctl command (particularly _IOR, _IOW, _IOWR commands) by the name of the argument. Once an element with the same value for ident attribute is found, attributes of the element are further examined to get its type. Possible types for these arguments are - struct, union, enum, function, array, pointer, macro and node. Using the type information we determine the way to define the element in accordance with syzkaller’s grammar syntax.

Building structs and unions involves defining their elements too, XML makes it easier. Program analyses each and every element which is a child of the root (struct/union) and generates its definitions. A dictionary helps in tracking the structs/unions which have been already built. Later, the dictionary is used to pretty print all the structs and union in the output file. Here is a code snippet which depicts the approach


            name = child.get("ident")
            if name not in self.structs_and_unions.keys():
                elements = {}
                for element in child:
                    elem_type = self.get_type(element)
                    elem_ident = element.get("ident")
                    if elem_type == None:
                        elem_type = element.get("type") 
                    elements[element.get("ident")] = elem_type

                element_str = ""
                for element in elements: 
                    element_str += element + "\t" + elements[element] + "\n"
                self.structs_and_unions[name] = " {\n" + element_str + "}\n"
            return str(name)

Task of creating descriptions for arrays is made simpler due to the attribute - `array-size`. When it comes to dealing with pointers, syzkaller needs the user to fill in the direction of the pointer. This has already been taken care of while analyzing the ioctl commands in Extractor.py. The second argument with in/out/inout as its possible value depends on ‘fun’ macros - _IOR, _IOW, _IOWR respectively.

There is another category named as nodes which can be distinguished using the base-type-builtin and base-type attributes.

Result

Once the setup script for sys2syz is executed, sys2syz can be used for a certain target_device file by executing the python wrapper script (sys2syz.py) with :

#bin/sh
python sys2syz.py -t <absolute_path_to_device_driver_source> -c compile_commands.json -v

This would generate a dev_<device_driver>.txt file in the out directory. An example description file autogenerated by sys2syz for i2c device driver.


#Autogenerated by sys2syz
include 

resource fd_i2c[fd]

syz_open_dev$I2C(dev ptr[in, string["/dev/i2c"]], id intptr, flags flags[open_flags]) fd_i2c

ioctl$I2C_IOCTL_EXEC(fd fd_i2c, cmd const[I2C_IOCTL_EXEC], arg ptr[out, i2c_ioctl_exec])

i2c_ioctl_exec {
iie_op	flags[i2c_op_t_flags]
iie_addr	int16
iie_buflen	len[iie_buf, intptr]
iie_buf	buffer[out]
iie_cmdlen	len[iie_cmd, intptr]
iie_cmd	buffer[out]
}

Future Work

Though we have a basic working structure of this tool, yet a lot has to be worked upon for leveling it up to make the best of it. Perfect goals would be met when there would be least of manual labor needed. Sys2syz still looks forward to automating the detection of macros used by the flag types in syzkaller. List of to-dos also includes extending syzkaller’s support for generation of description of syscalls.

Some other yet-to-be-done tasks include-

  • Generating descriptions for function type
  • Calculating attributes for structs and unions

Summary

We have surely reached closer to our goals but the project needs active involvement and incremental updates to scale it up to its full potential. Looking forward to much more learning and making more contribution to NetBSD community.

Atlast, a word of thanks to my mentors William Coldwell, Siddharth Muralee, Santhosh Raju and Kamil Rytarowski as well as the NetBSD organization for being extremely supportive. Also, I owe a big thanks to Google for giving me such a glaring opportunity to work on this project.

[1 comment]

 



Comments:

How to upgrade NetBSD 9.0 to 9.1? sysupgrade auto http://nycdn.netbsd.org/pub/NetBSD-daily/netbsd-9/latest/amd64 error 404 not found

Posted by eryk on October 21, 2020 at 01:49 AM UTC #

Post a Comment:
Comments are closed for this entry.