NetBSD fully reproducible builds


February 20, 2017 posted by Christos Zoulas

Introduction

I have been working on and off for almost a year trying to get reproducible builds (the same source tree always builds an identical cdrom) on NetBSD. I did not think at the time it would take as long or be so difficult, so I did not keep a log of all the changes I needed to make. I was also not the only one working on this. Other NetBSD developers have been making improvements for the past 6 years.

I would like to acknowledge the NetBSD build system (aka build.sh) which is a fully portable cross-build system. This build system has given us a head-start in the reproducible builds work.

I would also like to acknowledge the work done by the Debian folks who have provided a platform to run, test and analyze reproducible builds. Special mention to the diffoscope tool that gives an excellent overview of what's different between binary files, by finding out what they are (and if they are containers what they contain) and then running the appropriate formatter and diff program to show what's different for each file.

Finally other developers who have started, motivated and did a lot of work getting us here like Joerg Sonnenberger and Thomas Klausner for their work on reproducible builds, and Todd Vierling and Luke Mewburn for their work on build.sh.

Sources of difference

Here's is what we found that we needed to fix, how we chose to fix it and why, and where are we now.

There are many reasons why two separate builds from the same sources can be different. Here's an (incomplete) list:

  1. timestamps

    Many things like to keep track of timestamps, specially archive formats (tar(1), ar(1)), filesystems etc. The way to handle each is different, but the approach is to make them either produce files with a 0 timestamp (where it does not matter like ar), or with a specific timestamp when using 0 does not make sense (it is not useful to the user).
  2. dates/times/authors etc. embedded in source files

    Some programs like to report the date/time they were built, the author, the system they were built on etc. This can be done either by programmatically finding and creating source files containing that information during build time, or by using standard macros such as __DATE__, __TIME__ etc. Usually putting a constant time or eliding the information (such as we do with kernels and bootblocks) solves the problem.
  3. timezone sensitive code

    Certain filesystem formats (iso 9660 etc.) don't store raw timestamps but formatted times; to achieve this they convert from a timestamp to localtime, so they are affected by the timezone.
  4. directory order/build order

    The build order is not constant especially in the presence of parallel builds; neither is directory scan order. If those are used to create output files, the output files will need to be sorted so they become consistent.
  5. non-sanitized data stored into files

    Writing data structures into raw files can lead to problems. Running the same program in different operating systems or using ASLR makes those issues more obvious.
  6. symbolic links/paths

    Having paths embedded into binaries (specially for debugging information) can lead to binary differences. Propagation of the logical path can prove problematic.
  7. general tool inconsistencies

    gcc(1) profiling uses a PROFILE_HOOK macro on RISC targets that utilizes the "current function" number to produce labels. Processing order of functions is not guaranteed. gpt(8) creation involves uuid generation; these are generally random. block allocation on msdos filesystems had a random component. makefs(8) uses timezones with timestamps (iso9660), randomness for block selection (msdos), stores stray pointers in superblock (ffs).
  8. toolchain

    Every program that is used to generate other output needs to have consistent results. In NetBSD this is done with build.sh, which builds a set of tools from known sources before it can use those tools to build the rest of the system). There is a large number of tools. There are also internal issues with the tools that make their output non reproducible, such as nondeterministic symbol creation or capturing parts of the environment in debugging information.
  9. build information / tunables / environment

    There are many environment settings, or build variable settings that can affect the build. This needs to be kept constant across builds so we've changed the list of variables that are reported in Makefile.params:
    .if ${MKREPRO:Uno} != "yes"
    RELEASEVARS+=   BSDOBJDIR BSDSRCDIR BUILDID BUILDINFO BUILDSEED \
                    DESTDIR KERNARCHDIR KERNCONFDIR KERNOBJDIR KERNSRCDIR MAKE \
                    MAKEFLAGS NBUILDJOBS NETBSDSRCDIR OBJMACHINE OBJMACHINE_ARCH \
                    RELEASEDIR RELEASEMACHINEDIR TOOLDIR USR_OBJMACHINE X11SRCDIR
    .endif
    
  10. making sure that the source tree has no local changes

Variables controlling reproducible builds

Reproducible builds are controlled on NetBSD with two variables: MKREPRO (which can be set to yes or no) and MKREPRO_TIMESTAMP which is used to set the timestamp of the builds artifacts. This is usually set to the number of seconds from the epoch. The build.sh -P flag handles reproducible builds automatically: sets the MKREPRO variable to yes, and then finds the latest source file timestamp in the tree and sets MKREPRO_TIMESTAMP to that.

Handling timestamps

The first thing that we needed to understand was how to deal with timestamps. Some of the timestamps are not very useful (for example inside random ar archives) so we choose to 0 them out. Others though become annoying if they are all 0. What does it mean when you mount install media and all the dates on the files are Jan 1, 1970?

We decided that a better timestamp would be the timestamp of the most recently modified file in the source tree. Unfortunately this was not easy to find on NetBSD, because we are still using CVS as the source control system, and CVS does not have a good way to provide that. For that we wrote a tool called cvslatest, that scans the CVS metadata files (CVS/Entries) and finds the latest commit. This works well for freshly checked out trees (since CVS uses the source timestamp when checking out), but not with updated trees (because CVS uses the current time when updating files, so that make(1) thinks they've been modified). To fix that, we've added a new flag to the cvs(1) "update" command -t, that uses the source checkout time.

The build system needs now to evaluate the tree for the latest file running cvslatest(1) and find the latest timestamp in seconds from the Epoch which is set in the MKREPRO_TIMESTAMP variable. This is the same as SOURCE_DATE_EPOCH. Various Makefiles are using this variable and MKRERPO to determine how to produce consistent build artifacts.

For example many commands (tar(1), makefs(8), gpt(8), ...) have been modified to take a --timestamp or -T command line switch to generate output files that use the given timestamp, instead of the current time.

Other software (am-utils, acpica, bootblocks, kernel) used __DATE__ or __TIME__, or captured the user, machine, etc. from the environment and had to be changed to a constant time, user, machine, etc.

roff(7) documents used the td macro to generate the date of formatting in the document have been changed to conditionally use the macro based on register R, for example as in intro.me and then the Makefile was changed to set that register for MKREPRO.

Handling Order

We don't control the build order of things and we also don't control the directory order which can be filesystem dependent. The collation order also is environment specific, and sorting needs to be stable (we have not encountered that problem yet). Two different programs caused us problems here:

  • file(1) with the generation of the compiled magic file using directory order (fixed by changing file(1)).
  • install-info(1), texinfo(5) files that have no specific order. For that we developed another tool called sortinfo(1) that sorts those files as a post-process step.

Fortunately the filesystem builders and tar programs usually work with input directories that appear to have a consistent order so far, so we did not have to fix things there.

Permissions

NetBSD already keeps permissions for most things consistent in different ways:

  • the build system uses install(8) and specifies ownership and mode.
  • the mtree(8) program creates build artifacts using consistent ownership and permissions.

Nevertheless, the various architecture-specific distribution media installers used cp(1)/mkdir(1) and needed to be corrected.

Toolchain

Most of the issues found had to do with capturing the environment in debugging information. The two biggest issues were: DW_AT_Producer and DW_AT_comp_dir:

DW_AT_producer    : (indirect string, offset: 0x80): GNU C99 5.4.0 \
    -fno-canonical-system-headers -mtune=nocona \
    -march=x86-64 -g -O2 -std=gnu99 -fPIE -fstack-protector \
    -fdebug-prefix-map=$NETBSDSRCDIR=/usr/src \
    -fdebug-prefix-map=$X11SRCDIR=/usr/xsrc \
    -fdebug-regex-map=/usr/src/(.*)/obj.*=/usr/obj/\1 \
    -fdebug-regex-map=/usr/src/(.*)/obj.*/(.*)=/usr/obj/\1/\2 \
    --param ssp-buffer-size=1

Here you see two changes we made for reproducible builds:

  • We chose to allow variable names (and have gcc(1) expand them) for the source of the prefix map because the source tree location can vary. Others have chosen to skip -fdebug-prefix-map from the variables to be listed.
  • We added -fdebug-regex-map so that we could handle the NetBSD specific objdir build functionality. Object directories can have many flavors in NetBSD so it was difficult to use -fdebug-prefix-map to capture that.

DW_AT_comp_dir presented a different challenge. We got non-reproducibility when building on paths where either the source or the object directories contained symbolic links. Although gcc(1) does the right thing handling logical paths (respects $PWD), we found that there were problems both in the NetBSD sh(1) (fixed here) and in the NetBSD make(1) (fixed here). Unfortunately we can't depend on the shell to obey the logical path so we decided to go with:

    ${MAKE} -C other/dir
instead of:
    cd other/dir && ${MAKE}

This works because make(1) is a tool (part of the toolchain we provide) whereas sh(1) is not.

Another weird issue popped up on sparc64 where a single file in the whole source tree does not build reproducibly. This file is asn1_krb5_asn1.c which is generated in here. The problem is that when profiling on RISC machines gcc uses the PROFILE_HOOK macro which in turn uses the "function number" to generate labels. This number is assigned to each function in a source file as it is being compiled. Unfortunately this number is not deterministic because of optimization (a bug?), but fortunately turning optimization off fixes the problem.

Status and future work

As of 2017-02-20 we have fully reproducible builds on amd64 and sparc64. We are planning to work on the following areas:

  • Vary more parameters on the system build (filesystem types, build OS's)
  • Verify that cross building is reproducible
  • Verify that unprivileged builds work
  • Test on all the platforms
[4 comments]

 



Comments:

amazing work. well done

Posted by don bright on February 21, 2017 at 03:59 AM UTC #

for how wide of a scope did you turn off optimization?

Posted by sang yong on February 21, 2017 at 12:11 PM UTC #

That single file is compiled with -O0: http://cvsweb.netbsd.org/bsdweb.cgi/src/crypto/external/bsd/heimdal/lib/libasn1/Makefile.diff?r1=1.3&r2=1.4

Posted by Christos Zoulas on February 21, 2017 at 04:06 PM UTC #

I has worked on reproducible builds almost a year. I am keen to get a way to communicate with each other, in the field of reproducible builds. How could i do?

Posted by lijun on February 24, 2017 at 06:28 AM UTC #

Post a Comment:
Comments are closed for this entry.