Prerequisites

  • python3 and docker
  • time and patience

This page is a work in progress

This example is for PDFBox. After Tika's harnesses have been contributed to the oss-fuzz project, we'll update this page.

Steps

  • grab the repo: git clone https://github.com/google/oss-fuzz && cd oss-fuzz
  • build the image: python3 infra/helper.py build_image pdfbox 
  • build the project and its fuzzers:
    • from pdfbox's github repo main : python3 infra/helper.py build_fuzzers pdfbox 
    • from a local repo: python3 infra/helper.py build_fuzzers pdfbox /home/Intellij/my-pdfbox 
  • run a fuzzer: python3 infra/helper.py run_fuzzer pdfbox PDFExtractTextFuzzer 
  • reproduce a problem: python3 infra/helper.py reproduce pdfbox PDFExtractTextFuzzer build/out/pdfbox/timeout-bc0fe673ec0c97982de56ef8ab1ee08eff081a3b 
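
The build and run steps above can be chained in a small script. This is a minimal sketch, assuming it is started from the root of the oss-fuzz checkout. helper_command, plan_steps and run_steps are hypothetical names introduced here for illustration; the helper.py subcommands and arguments are the ones listed above.

```python
import subprocess


def helper_command(action, project, *args):
    """Build the argument list for one infra/helper.py invocation."""
    return ["python3", "infra/helper.py", action, project, *args]


def plan_steps(project, fuzzer, local_repo=None):
    """Return the build_image, build_fuzzers and run_fuzzer commands in order.

    If local_repo is given, it is passed to build_fuzzers so that the
    build uses the local checkout instead of the GitHub repo.
    """
    build_args = [local_repo] if local_repo else []
    return [
        helper_command("build_image", project),
        helper_command("build_fuzzers", project, *build_args),
        helper_command("run_fuzzer", project, fuzzer),
    ]


def run_steps(project, fuzzer, local_repo=None):
    """Run each planned command, stopping at the first one that fails."""
    for argv in plan_steps(project, fuzzer, local_repo):
        print("running:", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, check=True)
```

Calling run_steps("pdfbox", "PDFExtractTextFuzzer") would run the three commands in sequence. Some steps may ask interactive questions, so this is probably best run from a terminal rather than from a scheduled job.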


Notes

  • When a problem file has been found, it is written to oss-fuzz/build/out/pdfbox
  • 172282862    REDUCE cov: 2049 ft: 7874 corp: 1460/766Kb lim: 4096 exec/s: 16587 rss: 1233Mb L: 197/4096 MS 
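    • In libFuzzer's status output, cov: 2049 means the current corpus covers 2049 code blocks or edges, and ft: 7874 means it has triggered 7874 features. Features are finer-grained coverage signals (edge counters, value profiles and so on), which is why ft is larger than cov.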
  • Seeds are so, so important. PDFBox's PDFExtractTextFuzzer had cov=2049 ft=7874 after several hours with no seeds. When I added 1,000 PDFs as a seed corpus, I hit cov=13886 ft=58782 within a few minutes.
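
To compare runs (for example with and without seeds), it can help to pull cov and ft out of the status lines. A minimal sketch, assuming the line layout shown in the note above; parse_status_line is a hypothetical helper, not part of oss-fuzz or libFuzzer, and real logs may contain other line shapes that it simply skips.

```python
import re


def parse_status_line(line):
    """Parse one fuzzer status line into a dict, or return None if it does not match.

    Expected shape: "<execs> <EVENT> cov: N ft: N corp: ... exec/s: N ...".
    A leading '#' before the execution count is accepted as well.
    """
    parts = line.strip().lstrip("#").split(None, 2)
    if len(parts) < 3 or not parts[0].isdigit():
        return None
    # Collect every "key: value" pair that follows the event name.
    fields = dict(re.findall(r"(\S+): (\S+)", parts[2]))
    if not fields.get("cov", "").isdigit() or not fields.get("ft", "").isdigit():
        return None
    return {
        "execs": int(parts[0]),
        "event": parts[1],
        "cov": int(fields["cov"]),
        "ft": int(fields["ft"]),
        "fields": fields,  # raw strings, e.g. fields["rss"] == "1233Mb"
    }


SAMPLE = ("172282862    REDUCE cov: 2049 ft: 7874 corp: 1460/766Kb "
          "lim: 4096 exec/s: 16587 rss: 1233Mb L: 197/4096 MS ")
status = parse_status_line(SAMPLE)
print(status["event"], status["cov"], status["ft"])
```

Feeding each line of a saved fuzzer log through parse_status_line and keeping the last non-None result gives the final cov and ft for that run.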

Typical Workflow

  • Build the image, fuzz, find bug
  • Fix bug in local repo, rebuild image, fuzz again. Find new bugs.
  • Repeat.

While it is possible to configure the fuzzer to keep going, some bugs are just easier to hit. The fuzzer will often find the super easy bugs first and won't reveal the true, rare beauties until after the easier bugs are fixed. In general, I got little benefit from running the fuzzer multiple times... at least to start.

When enrolling a new repo or building a new harness, there will likely be lots of findings initially. Once the initial findings are fixed, the maintenance period is not bad. The startup costs are non-trivial, though, in fixing a repo that was not designed with security as the first goal. 

Common problems

Not building from a local repo

There are two different things that can go wrong.

  1. A number of repos in oss-fuzz do not work out of the box with local builds. The error message looks like this: ERROR:__main__:Cannot use local checkout with "WORKDIR: /src". The workaround is straightforward: change the WORKDIR in the project's Dockerfile to something else, like WORKDIR $SRC/project-parent/pdfbox. You'll also have to adjust your build.sh slightly to reflect the change in the working directory. Then, of course, open a PR to fix these repos in oss-fuzz!
  2. This may have been unique to our slight modifications to oss-fuzz, but there were a number of times when I thought that I was working from a local repo, but the build was silently building from the GitHub repo and ignoring my local repo. I got into the habit of removing a parenthesis in my local repo to see if that would cause the build to fail. I think the fix for this was to rm -rf the oss-fuzz/build/out/pdfbox directory and start from scratch.
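
For the first problem, a project's Dockerfile can be checked up front before attempting a local build. This is a rough sketch, not helper.py's actual logic: it assumes $SRC expands to /src inside the build image, that the last WORKDIR instruction wins, and that a Dockerfile with no WORKDIR ends up in /src. effective_workdir and blocks_local_checkout are hypothetical names, and the sample Dockerfile text is made up for illustration.

```python
import posixpath
import re

WORKDIR_RE = re.compile(r"^\s*WORKDIR\s+(\S+)", re.IGNORECASE)


def effective_workdir(dockerfile_text, src="/src"):
    """Return the final WORKDIR of a Dockerfile with $SRC expanded.

    Assumption: the build starts in src when no WORKDIR is given.
    """
    workdir = src
    for line in dockerfile_text.splitlines():
        match = WORKDIR_RE.match(line)
        if match:
            value = match.group(1).replace("${SRC}", src).replace("$SRC", src)
            # A relative WORKDIR is resolved against the previous one;
            # an absolute one replaces it.
            workdir = posixpath.normpath(posixpath.join(workdir, value))
    return workdir


def blocks_local_checkout(dockerfile_text):
    """True if the final WORKDIR is /src, the case the error message complains about."""
    return effective_workdir(dockerfile_text) == "/src"


# Hypothetical Dockerfile content using the workaround described above.
SAMPLE = """\
FROM example-base-image
WORKDIR $SRC/project-parent/pdfbox
"""
print(effective_workdir(SAMPLE), blocks_local_checkout(SAMPLE))
```

If blocks_local_checkout returns True for a project's Dockerfile, a build from a local repo will most likely hit the error above until the WORKDIR is changed.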

Other issues

  • Drive running out of space. I initially started with Docker on Ubuntu installed with snap. The /var/snap/docker directory took up nearly 600GB after a couple of runs because of the way it caches images/containers/something. docker system prune -af  worked to clear that, but then I had to rebuild my images. I uninstalled Docker in snap and reinstalled with apt , and everything was instantly better on this front.
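
To notice this kind of growth before the drive fills up, the size of Docker's data directory can be checked between runs. A minimal sketch: tree_size_bytes and needs_cleanup are hypothetical helpers, the threshold is arbitrary, and reading a directory such as /var/snap/docker will usually require root. It sums apparent file sizes, so the number may differ somewhat from what du reports.

```python
import os


def tree_size_bytes(root):
    """Sum the sizes of regular files under root, skipping symlinks and unreadable entries."""
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # The file vanished or is not readable; ignore it.
                pass
    return total


def needs_cleanup(root, limit_gb):
    """True if the tree under root is larger than limit_gb gibibytes."""
    return tree_size_bytes(root) > limit_gb * 1024 ** 3
```

For example, needs_cleanup("/var/snap/docker", 100) could be checked before each fuzzing run as a reminder to prune.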

