Web lists-archives.com

Re: [RFC][PATCH] index-pack: add testcases found using AFL

On 10/03/2017 20:06, Jeff King wrote:
On Fri, Mar 10, 2017 at 04:15:56PM +0100, Vegard Nossum wrote:

I've used AFL to generate a corpus of pack files that maximises the edge
coverage for 'git index-pack'.

This is a supplement to (and not a replacement for) the regular test cases
where we know exactly what each test is checking for. These testcases are
more useful for avoiding regressions in edge cases or as a starting point
for future fuzzing efforts.

To see the output of running 'git index-pack' on each file, you can do
something like this:

  make -C t GIT_TEST_OPTS="--run=34 --verbose" t5300-pack-object.sh

I observe the following coverage changes (for t5300 only):

  path                  old%  new%    pp
  builtin/index-pack.c  74.3  76.6   2.3
  pack-write.c          79.8  80.4    .6
  patch-delta.c         67.4  81.4  14.0
  usage.c               26.6  35.5   8.9
  wrapper.c             42.0  46.1   4.1
  zlib.c                58.7  64.1   5.4

I'm not sure how I feel about this. More coverage is good, I guess, but
we don't have any idea what these packfiles are doing, or whether
index-pack is behaving sanely in the new lines. The most we can say is
that we tested more lines of code and that nothing segfaulted or
triggered something like ASAN.

That's something I guess, but I'm not enthused by the idea of just
dumping a bunch of binary test cases that nobody, not even the author,

I understand your concern. This is how I see it:


 - 'make test' takes 1 second longer to run

 - 548K data added to git.git


 - regularly exercising more of the code, especially some of the corner
cases which are not caught by the rest of the test suite, possibly
catching bugs in a security-critical bit of git before it makes it into
a release

 - no impact to existing code, everything self-contained in 1 directory

 - giving more people access to the testcases I discovered without
having to repeat the effort of setting up AFL, fixing up SHA1 checksums,
minimising the corpus, running AFL for a week, etc. (each step by itself
is pretty small, but taken altogether I think it's worthwhile not
to have to repeat that).

Then I guess you have to weigh the negatives and positives. For me it's
a clear net win, but others may see it differently.

For sure, I (or somebody else) can go through each testcase and figure
out what it's doing, what it's doing differently from the existing
manual testcases in t5300, and what its expected output should be. It's
not that I couldn't understand what each testcase is doing if I tried,
but I don't think it's worth the effort. I've run everything under
valgrind and the only thing that turned up were some suspicious-looking
allocations (which AFAICT should be safe to ignore because of git's
built-in limits). Otherwise it's mostly hitting sanity checks:

error: inflate: data stream error (incorrect header check)
fatal: pack has 4 unresolved deltas
fatal: pack has bad object at offset 12: delta base offset is out of bound
fatal: invalid tag

These are errors you wouldn't see normally and which the existing
testcases don't check for (or not exhaustively or systematically, in any

If somebody were to look at the code and say: "hey, that check looks a
bit off" (which is something I personally do all the time), then being
able to quickly find an input to execute exactly that line of code is
extremely valuable -- and you can do that simply by running the
testcases through gcov.

Anyway, the patch/data is there, use it or don't.