The term "Reproducible Builds" refers to making sure the build process for various artifacts is so deterministic that building the same sources twice results in a bit-by-bit identical artifact. You can read more about it on https://reproducible-builds.org/.

One of the advantages of Reproducible Builds is that, when those two builds happen on independently-managed infrastructure, validating that both environments produce the same bit-by-bit artifact improves the confidence that no backdoor or other malware was injected into the artifact due to a compromise of the infrastructure.

Reproducible Builds for ASF releases

It is good practice for all artifacts released by the ASF to be reproducible. For projects that want to build and sign artifacts on CI, Reproducible Builds are required. This means:

  • your builds must be deterministic enough that independent builds produce bit-by-bit identical artifacts
  • you have documented how and when artifacts are actually independently rebuilt and verified in your release process
  • you follow this process in practice.
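As an illustration of the verification step, the comparison of two independently built artifacts can be sketched as below. This is only a minimal sketch: the function names are hypothetical, and using SHA-512 is an assumption made for the example, not something this page prescribes.

```python
import hashlib
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def artifacts_identical(first: Path, second: Path) -> bool:
    """Return True when both artifacts have the same size and SHA-512 digest."""
    # Cheap size check first; a size mismatch already proves the builds differ.
    if first.stat().st_size != second.stat().st_size:
        return False
    return sha512_of(first) == sha512_of(second)
```

A release process could run such a check on the artifact produced by CI and on the one rebuilt by a second party, and refuse to proceed when it returns False.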

Ecosystem-specific notes

See below for ecosystem-specific notes that could help other projects make their builds reproducible. If you have additional input to share, feel free to edit this wiki page. If you have questions or want to discuss approaches, you can use the security-discuss mailing list or Slack channel.

Java / Maven

If you have a project that is built with Apache Maven, refer to the Configuring for Reproducible Builds guide.

Python

Modern Python tooling (such as Flit and Hatch) supports reproducible builds for pure-Python projects. You can read more about reproducible build support in Flit reproducible build docs and Hatch reproducible build docs. It is a bit more complex if your assets require native compilation, but if you can ensure that your native compilation produces reproducible libraries on its own, the packaging tool will produce reproducible builds.

A few guidelines:

  • Follow modern packaging practice: ideally define your project's metadata in pyproject.toml (PEP 621) and specify your build requirements as pinned dependencies following PEP 518.
  • To get plausible-looking packages in which files have "real" modification dates, set the SOURCE_DATE_EPOCH environment variable to a fixed timestamp in your build process before running hatch build or flit build.
  • It is recommended that you store the timestamp in the repository and update it whenever a release is being prepared (for example, when the release notes change). Airflow has an example of how it can be stored (in a YAML file) and updated automatically (with pre-commit).
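The guidelines on SOURCE_DATE_EPOCH can be sketched roughly as follows. This is a hedged illustration, not the Airflow implementation: it assumes the timestamp is kept as a plain integer in a text file (Airflow uses a YAML file), and the helper names read_epoch, build_environment and run_build are hypothetical.

```python
from __future__ import annotations

import os
import subprocess
from pathlib import Path


def read_epoch(timestamp_file: Path) -> int:
    """Read a Unix timestamp (in seconds) stored as plain text in the repository."""
    return int(timestamp_file.read_text(encoding="utf-8").strip())


def build_environment(epoch: int) -> dict[str, str]:
    """Copy the current environment and pin SOURCE_DATE_EPOCH for the build."""
    env = dict(os.environ)
    env["SOURCE_DATE_EPOCH"] = str(epoch)
    return env


def run_build(timestamp_file: Path, command: tuple[str, ...] = ("hatch", "build")) -> None:
    """Run the packaging command with the pinned timestamp in its environment."""
    env = build_environment(read_epoch(timestamp_file))
    # check=True makes a failing build raise instead of passing silently.
    subprocess.run(list(command), check=True, env=env)
```

Because the timestamp comes from a file under version control, anyone rebuilding the same commit passes the same value to the packaging tool.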


Preparing reproducible .tar.gz packages

If you prepare a source tarball or another .tar.gz package, you can use scripts similar to this one, which takes the same source_date_epoch and repacks the .tar.gz file to be reproducible. There are, however, a few gotchas:

1) Make sure to remove permissions for "group" and "other" from all files that you add to the repository. This is needed because group/other permissions have different default settings (based on umask), and clearing them is the most reliable way to get a reproducible result. This can be done in a few ways:

  • If you use git archive, pass -c tar.umask=0077 to git (it is a git option, so it goes before the archive subcommand); it removes all permissions for group/others.
  • Run chmod -R og= <directory> on the directory to compress before running the reproducible script.

2) If you use git archive, you can exclude some directories with the export-ignore attribute in .gitattributes.
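To make the repacking idea concrete, here is a minimal sketch of such a script. It is not the script linked above; the function name repack_tar_gz and the specific normalisation choices (sorted members, zeroed owner fields, GNU tar format) are assumptions made for the illustration. Byte-identical output also assumes the same Python and zlib versions on both sides.

```python
import gzip
import tarfile
from pathlib import Path


def repack_tar_gz(source: Path, target: Path, source_date_epoch: int) -> None:
    """Rewrite a .tar.gz so its bytes depend only on member names, types and contents."""
    with tarfile.open(source, "r:gz") as original:
        # Sort by name so the member order does not depend on how the archive was created.
        members = sorted(original.getmembers(), key=lambda m: m.name)
        with open(target, "wb") as raw:
            # An empty filename and a fixed mtime keep local details out of the gzip header.
            with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=source_date_epoch) as gz:
                with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as repacked:
                    for member in members:
                        member.mtime = source_date_epoch  # same timestamp for every entry
                        member.uid = member.gid = 0  # drop the builder's numeric ids
                        member.uname = member.gname = ""  # drop the builder's user/group names
                        member.mode &= 0o700  # clear group/other bits, like chmod og=
                        data = original.extractfile(member) if member.isfile() else None
                        repacked.addfile(member, data)
```

Repacking two archives that contain the same files, but were created at different times or by different users, should then yield the same bytes when the same source_date_epoch is passed.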

