Reproducible/Verifiable Builds

( short link: https://s.apache.org/reproducible-builds )

Status
 WIP
Version
Issue(s)
MNG-6276
Sources
Developer(s)

Context

https://reproducible-builds.org/ (see mailing list):
Reproducible builds are a set of software development practices that create a verifiable path from human readable source code to the binary code used by computers.

How?

First, the build system needs to be made entirely deterministic: transforming a given source must always create the same result. Typically, the current date and time must not be recorded and output always has to be written in the same order.

Second, the set of tools used to perform the build and more generally the build environment should either be recorded or pre-defined.

Third, users should be given a way to recreate a close enough build environment, perform the build process, and verify that the output matches the original build.

Tools like diffoscope have been created to measure differences between the contents of archives.

Java builds are not reproducible out of the box: timestamps in jar files are the first source of variation (if you build twice with just the javac and jar commands, the results won't be identical bit for bit).
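To make the two main fixes concrete (fixed entry timestamps and stable entry order), here is a minimal sketch using only java.util.zip. The class name DeterministicZip and its write method are hypothetical and are not part of Maven or plexus-archiver; the sketch only illustrates the idea and ignores directories, permissions and extra fields.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical helper for illustration only; not an existing Maven API. */
final class DeterministicZip {

    /** Writes a name -> content map as a ZIP with sorted entries and one fixed timestamp. */
    static byte[] write(Map<String, String> files, long fixedTimeMillis) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            // A TreeMap iterates in key order, so entry order no longer depends on the caller.
            for (Map.Entry<String, String> file : new TreeMap<String, String>(files).entrySet()) {
                ZipEntry entry = new ZipEntry(file.getKey());
                // Without an explicit time, ZipOutputStream stamps the entry with the current time.
                entry.setTime(fixedTimeMillis);
                zip.putNextEntry(entry);
                zip.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Calling DeterministicZip.write twice with the same content and the same fixed time should give identical bytes, whatever the insertion order of the map. As far as I know, ZipEntry.setTime converts the value through the JVM default time zone, so the comparison is only meaningful when both builds run in the same time zone.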

In addition, Maven plugins across the whole ecosystem (not only those provided by the Apache Maven team) sometimes add variable content that worsens the problem: timestamp text or username in MANIFEST.MF, ...

reproducible-build-maven-plugin has been created to try to fix these issues after packaging, by rewriting the archive and reworking content known to contain variable parts.

The goal of this proposal is to prepare a set of configurations and practices to obtain reproducible/verifiable builds at packaging time, both by improving the default Java build behaviour and by removing the variability introduced by some Maven plugins (core plugins at first, but also plugins in the wider Maven ecosystem).
In parallel to this proposal, the "Reproducible Maven Builds" site has been created to work on prototypes.

Use cases

  1. As a user of artifacts published on repositories like Maven Central, I want to be able to check that the binary version of the artifact matches its source version.
    From a software QA point of view, this would make it possible to detect quality problems in the build/publish process.
    From a computer security point of view, this would make it possible to detect the introduction of a backdoor during the build/publish process (instead of other solutions based on checking signatures, like the one envisioned in MNG-6026).
  2. As a developer voting on an Apache source release against a staging repository, I want to verify that the binary from my local build from sources is the same as the binary that is staged and signed by the release manager.
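Both use cases come down to checking that two binaries are bit-for-bit identical. A minimal sketch of such a check, comparing SHA-256 digests with the standard java.security.MessageDigest API, could look like this. The class ArtifactCheck and its methods are hypothetical names chosen for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for illustration only. */
final class ArtifactCheck {

    /** Returns the SHA-256 digest of the content as lowercase hexadecimal. */
    static String sha256Hex(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    /** True when the locally rebuilt artifact matches the published one bit for bit. */
    static boolean sameBinary(byte[] localBuild, byte[] published) {
        return sha256Hex(localBuild).equals(sha256Hex(published));
    }
}
```

In practice the two byte arrays would typically come from java.nio.file.Files.readAllBytes on the local jar and on the jar downloaded from the repository.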

Sources of unreproducible bits

  • Timestamps:
    • Timestamps in ZIP/JAR files: file last modification time/date in central directory and file entry headers + possible optional fields "X5455_ExtendedTimestamp" (PLEXUS-ARCHIVER-48)
    • Timestamp in pom.properties generated by maven-archiver (MSHARED-494 (tick))
    • Timestamp in plugin.xml and plugin-help.xml descriptors generated by maven-plugin-tools-generator (MPLUGIN-326 (tick))
    • Timestamp in MANIFEST.MF (Bnd-LastModified) generated by Felix maven-bundle-plugin
    • Timestamps in generated javadoc HTML files (can be disabled with javadoc options "notimestamp" and "bottom")
    • Timestamps in bytecode generated from Groovy code (added by GroovyClassLoader.addTimeStamp())
  • Username:
    • UID/GID in tar file entries
    • Username in MANIFEST.MF (Built-By) generated by maven-archiver (MSHARED-661 (tick))
  • Ordering:
    • Order of the file entries in a ZIP/JAR file (depends on file system order)
    • Order of the entries in the MANIFEST (MSHARED-511 (tick))
    • Order of goals in plugin.xml generated by maven-plugin-tools (MPLUGIN-261 (tick))
    • Order of the methods of the ObjectFactory.java file generated by JAXB/xjc (JAXB-598 (tick))
    • Order of components in META-INF/plexus/components.xml generated by plexus metadata (issue #8 (tick))
  • Tools Versions:
    • exact JDK version used to build in MANIFEST.MF (Build-Jdk) generated by maven-archiver (MSHARED-797 (tick))
      Notice that keeping the major version of the JDK used still makes sense, since it influences the generated bytecode: with the same source code and the same -target version, javac from JDK 6, 7, 8, ... does not produce the same bytecode. To isolate the generated binary from the JDK used, the compiler must not be the javac provided by the running JDK (see Using Non-Javac Compilers)
    • exact Maven version used to build in MANIFEST.MF (Created-By) generated by maven-archiver (MSHARED-799 (tick))
    • exact Maven version used to build in META-INF/.../pom.properties generated by maven-archiver (MSHARED-800 (tick))
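Several items in the list above (timestamp, ordering, and line endings in generated text files) show up together when a file is written with java.util.Properties.store(): it adds a comment line with the current date, iterates in hash order, and uses the platform line separator. The sketch below shows one way to avoid all three. ReproducibleProperties is a hypothetical name, and the sketch assumes keys and values that need no escaping.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; assumes keys and values need no escaping. */
final class ReproducibleProperties {

    /** Renders sorted key=value lines with LF line endings and no date comment. */
    static String render(Map<String, String> properties) {
        StringBuilder out = new StringBuilder();
        // Sorting by key removes the dependency on hash iteration order.
        for (Map.Entry<String, String> e : new TreeMap<String, String>(properties).entrySet()) {
            // A literal '\n' is used instead of the platform line separator.
            out.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return out.toString();
    }
}
```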

Line endings are also a problem: even if we could force given line endings for build-generated text files (MANIFEST, pom.properties...), it would be hazardous to try to change the line endings of resource files.

Out of scope

Given the variety of sources of unreproducible builds, and balancing their impact against the complexity of fixing them, a few are considered out of scope of this proposal. Once reproducible builds work well within the chosen limitations, and if they prove successful with users, these limitations can be revisited later:

  • version ranges in pom.xml: version ranges make version resolution unstable over time. This proposal starts from a stable build.
    Notice that some nice strategies have been discussed on how to introduce stability while maintaining version ranges: see the discussion on Maven dev mailing list...
  • line endings (Windows CRLF vs Unix LF): updating plugins that generate content can be easy, but this will require normalizing line endings of resource files, which may be hazardous
    Notice that building with -Dline.separator='\n' is an easy first step
  • JDK version: from initial tests, only the major version has an impact, which keeps the environment required for a reproducible build manageable.
    Future strategies to ease rebuild management could consider using a compiler other than javac, which could be downloaded as a plugin dependency...
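When a rebuild differs and a JDK mismatch is suspected, one cheap first check is the class file version recorded in the header of each .class file. The sketch below reads it with plain java.io. ClassFileVersion is a hypothetical name, and the check is only partial: it will not reveal differences between two JDKs that target the same class file version.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical helper for illustration only. */
final class ClassFileVersion {

    /** Returns the major class file version, for example 52 for Java 8 or 55 for Java 11. */
    static int major(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            // Every class file starts with the magic number 0xCAFEBABE.
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version, ignored here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```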

Output Archive Entries Timestamp

Packaging plugins that create zip or tar archives will require a parameter to define the timestamp value to use for archive entries, independently of the effective build timestamp. This is equivalent to Reproducible Builds' SOURCE_DATE_EPOCH environment variable.

Life would become easier if there were a dedicated POM element like ${project.build.outputTimestamp} (with an ISO-8601 formatted date+time) which could be used to specify the timestamp value once for the entire project. Every plugin could use it as its default value, as has been done for source file encoding:

/**
 * Timestamp for reproducible output archive entries, either formatted as ISO 8601
 * <code>yyyy-MM-dd'T'HH:mm:ssXXX</code> or as an int representing seconds since the epoch (like
 * <a href="https://reproducible-builds.org/docs/source-date-epoch/">SOURCE_DATE_EPOCH</a>).
 */
@Parameter( defaultValue = "${project.build.outputTimestamp}" )
private String outputTimestamp;

Adding this element to the POM structure without breaking backward compatibility can only happen in a future version, yet to be defined (at least after Maven 3.0, see POM Model Version 5 proposal):

<project>
  ...
  <build>
    <!-- NOTE: This is just a vision for the future, it's not yet implemented: see MNG-xxx -->
    <outputTimestamp>2019-10-02T08:04:00Z</outputTimestamp>
    ...
  </build>
  ...
</project>

For Maven 2.x and 3.x, the value can be defined as an equivalent property:

<project>
  ...
  <properties>
    <project.build.outputTimestamp>2019-10-02T08:04:00Z</project.build.outputTimestamp>
    ...
  </properties>
  ...
</project>

Thus plugins can immediately be modified to use ${project.build.outputTimestamp} as the default value, whatever Maven version is used.

MSHARED-837 has been created to provide plugins with a shared API to parse the timestamp and configure reproducible archive creation in a uniform way.
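As an illustration of the two accepted formats (ISO 8601 with an offset, or seconds since the epoch), a parser could be sketched as follows with java.time. This is not the MSHARED-837 API: the class name OutputTimestamp and its parse method are hypothetical, and handling of a missing or disabled value is left out.

```java
import java.time.Instant;
import java.time.OffsetDateTime;

/** Hypothetical helper for illustration only; not the maven-archiver API. */
final class OutputTimestamp {

    /** Parses either epoch seconds (digits only) or an ISO 8601 date-time with offset. */
    static Instant parse(String value) {
        String trimmed = value.trim();
        if (trimmed.matches("\\d+")) {
            // Same convention as SOURCE_DATE_EPOCH: seconds since 1970-01-01T00:00:00Z.
            return Instant.ofEpochSecond(Long.parseLong(trimmed));
        }
        // Throws java.time.format.DateTimeParseException when the text is not valid.
        return OffsetDateTime.parse(trimmed).toInstant();
    }
}
```

With this sketch, the value 2019-10-02T08:04:00Z used in the POM examples above and the epoch value 1570003440 denote the same instant.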

MRELEASE-1029 has been created to update the timestamp value during release:prepare.

Rebuilding

The underlying problem is that the pom file does not capture all the configuration of the build environment: it includes the plugins used during the build with their version number, but it does not include the version of Maven and the JDK, the Operating System and the architecture used to produce the artifact, etc...

What can we do? (non-exhaustive list):

  • Drop the Maven/JDK version numbers in MANIFEST: most of the time you should get exactly the same result with two different versions of Maven and/or JDK (if you keep the same major version number). But you have to restrict yourself to a given OS family because of line endings. However, with the new development roadmap of OpenJDK (2 "major" versions per year, each one increasing the class file version number), it may become difficult to find 2 versions of javac that produce the same class files in the near future.
  • Keep the version numbers in the MANIFEST: they are not easily accessible there. Moreover the semantics are poor (e.g. there is no JDK vendor, so you can get different results if you use OpenJDK/Oracle JDK/Eclipse compiler, and there is no reference to the Operating System used, which matters for line endings).
  • Add a "reproducible build bill of materials" in an external file. This is the way Debian took to manage reproducible builds: they record the "build environment" in an external ".buildinfo" file that has all the information required to reproduce the build environment. In the context of Maven we can think of several ways to achieve that (non-exhaustive list):
    • Create a secondary artifact (e.g. *-buildinfo.xml) with the required information
    • "Patch" the published pom file to add properties with the required information, something like:
<properties>
  <maven.reproducible.build.maven.version>3.5.0</maven.reproducible.build.maven.version>
  <maven.reproducible.build.jdk.version>8u123</maven.reproducible.build.jdk.version>
  <maven.reproducible.build.jdk.vendor>openjdk</maven.reproducible.build.jdk.vendor>
  <maven.reproducible.build.arch>amd64</maven.reproducible.build.arch>
  <maven.reproducible.build.os>linux</maven.reproducible.build.os>
</properties>

This buildinfo for the jvm is currently a work in progress on reproducible-builds.
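To give an idea of what recording the build environment could look like, here is a minimal sketch that captures the values behind the properties shown above from standard Java system properties. BuildEnvironment is a hypothetical name, the Maven version is passed in by the caller because plain Java cannot discover it, and the exact value formats (for example "8u123" or "amd64") will differ from what the system properties return.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not an existing buildinfo format. */
final class BuildEnvironment {

    /** Captures the build environment as a key-sorted map, reusing the property names above. */
    static Map<String, String> capture(String mavenVersion) {
        // A TreeMap keeps the keys sorted, so the recorded file itself has a stable order.
        Map<String, String> info = new TreeMap<>();
        info.put("maven.reproducible.build.maven.version", mavenVersion);
        info.put("maven.reproducible.build.jdk.version", System.getProperty("java.version"));
        info.put("maven.reproducible.build.jdk.vendor", System.getProperty("java.vendor"));
        info.put("maven.reproducible.build.arch", System.getProperty("os.arch"));
        info.put("maven.reproducible.build.os", System.getProperty("os.name"));
        return info;
    }
}
```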

Another way to ease reproducibility would be to use a wrapper script that would download from Maven Central the exact Maven & JDK versions that should be used to build the project. It is the same kind of idea as the maven-wrapper tool, but extended to the JDK itself. This feature would also benefit people not interested in reproducible builds, because it would ease the computer setup of every developer and erase most of the discrepancies between developers' builds and CI builds.

What are the issues to solve?

issue tracking | affected file | description
MSHARED-661 ((tick) maven-archiver 3.4.0) | META-INF/MANIFEST.MF | maven-archiver adds "Built-By: <username>" Manifest entry: the entry was removed
MSHARED-796 ((tick) maven-archiver 3.4.0) | META-INF/MANIFEST.MF | maven-archiver adds "Build-Jdk: <detailed java version>" Manifest entry: better replaced with "Build-Jdk: <java specification version>"
MSHARED-494 ((tick) maven-archiver 3.1.0) | META-INF/maven/$groupId/$artifactId/pom.properties | Timestamp in pom.properties
MSHARED-800 (tick) | META-INF/maven/$groupId/$artifactId/pom.properties | Maven version in pom.properties
MPLUGIN-261 ((tick) maven-plugin-plugin 3.3) | META-INF/maven/plugin.xml | generated plugin.xml is non-deterministic
MPLUGIN-326 ((tick) maven-plugin-plugin 3.5.1) | META-INF/maven/plugin.xml, META-INF/maven/$groupId/$artifactId/plugin-help.xml | Timestamp in plugin.xml and plugin-help.xml descriptors generated by maven-plugin-tools-generator
plexus-containers issue #8 ((tick) plexus-component-metadata 2.0.0) | META-INF/plexus/components.xml | sort components when generating META-INF/plexus/components.xml
plexus-containers issue #27 ((tick) plexus-component-metadata 2.1.0) | META-INF/plexus/components.xml | sort components when merging discovered components with manually crafted component files
bnd-maven-plugin #3521 ((tick) bnd-maven-plugin configuration) | META-INF/MANIFEST.MF | see bnd-maven-plugin documentation to configure Reproducible Build
FELIX-6269 ((tick) maven-bundle-plugin:manifest & bundle 4.2.2) | META-INF/MANIFEST.MF | "Built-By: <user name>": user name not reproducible; "Build-Jdk: <detailed JDK version>": patch version of the JDK not reproducible; "Private-Package" does not have the same order between builds
FELIX-6203 ((tick) maven-bundle-plugin:bundle 4.2.2) | META-INF/maven/$groupId/$artifactId/pom.properties | current timestamp in pom.properties for bundle goal
sisu-maven-plugin PR#5 ((tick) sisu.inject 0.3.4) | META-INF/sisu/javax.inject.Named | META-INF/sisu/javax.inject.Named content (created by sisu-maven-plugin) has a non-reproducible order
MRRESOURCES-114 ((tick) maven-remote-resources-plugin 1.7.0) | projectTimespan, as often printed in META-INF/NOTICE | projectTimespan property, containing current year, is calculated using current date through new Date()
JDK-8240734 (JDK 15, perhaps JDK 11uxxx) | module-info.class | ModuleHashes attribute in module-info.class not reproducible between builds (see Java core-libs-dev email thread)

zip entries timestamp and order
COMPRESS-485 ((tick) commons-compress 1.19) | keep entries order when gathering ParallelScatterZipCreator
plexus-archiver issue #48, PR #49 ((tick) plexus-archiver 4.2.1) | avoid timestamp issues in archives created by plexus-archiver (widely used in Maven plugins creating jar, zip, war, tar... archives)
plexus-archiver issue #114 ((tick) plexus-archiver 4.2.0) | To enable reproducible builds `AbstractArchiver#addFileSet` should add the files in order
MSHARED-837 ((tick) maven-archiver 3.5.0) | support SOURCE_DATE_EPOCH environment variable or equivalent: see https://reproducible-builds.org/docs/timestamps/ => see "Output Archive Entries Timestamp" section of the proposal
- | remove variation based on user's umask on Unixes
plexus-archiver #124 ((tick) plexus-archiver 4.2.0) | remove variation based on uid/gid & userName/groupName in tar
MSOURCES-120 ((tick) maven-source-plugin 3.2.0) | apply reproducible zip (entries order and timestamp) to maven-source-plugin
MASSEMBLY-921 ((tick) maven-assembly-plugin 3.2.0) | apply reproducible archive (entries order and timestamp) to maven-assembly-plugin
MJAR-263 ((tick) maven-jar-plugin 3.2.0) | apply reproducible zip (entries order and timestamp) to maven-jar-plugin
MSITE-851 ((tick) maven-site-plugin 3.9.0) | apply reproducible zip (entries order and timestamp) to site:jar
MJAVADOC-627 ((tick) maven-javadoc-plugin 3.2.0) | apply reproducible zip (entries order and timestamp) to javadoc:*jar
MSHADE-347 ((tick) maven-shade-plugin 3.2.2) | apply reproducible zip (entries order and timestamp) to shade:shade
MSHADE-352 ((tick) maven-shade-plugin 3.2.3) | keep reproducible timestamp when shading with transformer
ARCHETYPE-590 ((tick) maven-archetype-plugin 3.2.0) | apply reproducible zip (entries order and timestamp) to archetype:jar
MWAR-432 ((tick) maven-war-plugin 3.3.0) | apply reproducible zip (entries order and timestamp) to war:war
MACR-53 ((tick) maven-acr-plugin 3.2.0), MEAR-280 ((tick) maven-ear-plugin 3.1.0), MEJB-128 ((tick) maven-ejb-plugin 3.1.0), MRAR-86 ((tick) maven-rar-plugin 3.0.0), MJLINK (maven-jlink-plugin) | issues fixed in maven-archiver will have to be picked up by 9 other plugins managed by the Apache Maven team (acr, ear, ejb, jlink, rar) and perhaps other plugins managed outside the Apache Maven team
FELIX-6304 ((tick) maven-bundle-plugin:bundle) | order and timestamp of jar entries for bundle goal
spring-boot-maven-plugin:repackage #20176 ((tick) 2.3.0-M4) | timestamp

Debian approach

Debian has a strong reproducible builds effort that has been working on the topic for several years: see BuildinfoFiles for environment info recording.

On Java and Maven issues, Debian maintains a series of patches that could perhaps be integrated (thank you Emmanuel Bourg for the summary):

Lessons learned (REX) on Clojure: the source .clj file must have a one-second timestamp difference with the .class file, or Clojure will recompile