Re: [PATCH v8 2/3] p0006-read-tree-checkout: perf test to time read-tree

On Mon, Apr 10, 2017 at 09:14:02PM +0000, git@xxxxxxxxxxxxxxxxx wrote:

> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
> Created t/perf/repos/many-files.sh to generate large, but
> artificial repositories.

I think this is a good direction. In the long run we might want some
kind of magic to pull from the "library" of repos when running perf
tests, but it's not a big deal to run the script manually and point
GIT_PERF_REPO at the result.

As a bonus, this should be faster when running perf tests, since we can
reuse the built repo when testing each version of Git.

> +## This test measures the performance of various read-tree
> +## and checkout operations.  It is primarily interested in
> +## the algorithmic costs of index operations and recursive
> +## tree traversal -- and NOT disk I/O on thousands of files.
> +## Therefore, it uses sparse-checkout to avoid populating
> +## the ballast files.
> +##
> +## It expects the test repo to have certain characteristics.
> +## Branches:
> +## () master        := an arbitrary commit.
> +## () ballast       := an arbitrary commit with a large number
> +##                     of changes relative to "master".
> +## () ballast-alias := a branch pointing to the same commit
> +##                     as "ballast".
> +## () ballast-1     := a commit with a 1 file difference from
> +##                     "ballast".

I'm OK with leaving these requirements on the repo in the name of
simplicity, though it does make it harder to perf-test against a regular
repository.

I wonder if we could make reasonable guesses, like:

  master => HEAD
  ballast => $(git rev-list HEAD | tail -n 1)
  ballast-alias => git branch $ballast
  ballast-1 => HEAD^

That would approximate your conditions in a real-world repository, and
it should be easy to make your synthetic one fit the bill exactly.
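
To make that concrete, here is a rough sketch of a helper that applies
those guesses to an existing repository. The function name
setup_perf_branches is made up for illustration and is not part of the
patch. It assumes HEAD has at least one parent, that none of the ballast
branches is currently checked out, and it leaves an existing "master"
alone rather than forcing it to HEAD.

```shell
# Hypothetical helper, not part of the patch: create the branches that
# the test expects, using the guesses above, in an existing repository.
setup_perf_branches () {
	repo=${1:-.}

	# Last commit listed by rev-list: a root commit, which in a
	# real-world history should differ from HEAD by many files.
	ballast=$(git -C "$repo" rev-list HEAD | tail -n 1)
	test -n "$ballast" || return 1

	# Only create "master" if it is missing; forcing it would fail
	# when it is the branch currently checked out.
	git -C "$repo" rev-parse --verify -q refs/heads/master >/dev/null ||
	git -C "$repo" branch master HEAD || return 1

	git -C "$repo" branch -f ballast "$ballast" &&
	git -C "$repo" branch -f ballast-alias "$ballast" &&
	git -C "$repo" branch -f ballast-1 HEAD^
}
```

You would run it once against a clone, e.g. "setup_perf_branches
/path/to/clone", and then point GIT_PERF_REPO at that clone.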

I don't know if you'd want to turn on sparse checkout manually or not
when testing a real-world repo.