Re: merge-base --is-ancestor A B is unreasonably slow with unrelated history B

On 1/9/2018 10:17 AM, Ævar Arnfjörð Bjarmason wrote:
This is a pathological case I don't have time to dig into right now:

     git branch -D orphan;
     git checkout --orphan orphan &&
     git reset --hard &&
     touch foo &&
     git add foo &&
     git commit -m"foo" &&
     time git merge-base --is-ancestor master orphan

This takes around 5 seconds on linux.git to return 1, which is around
the same time it takes to run current master against the first commit in
linux.git:

     git merge-base --is-ancestor 1da177e4c3f4 master

This is obviously a pathological case, but maybe we should work slightly
harder on the RHS and discover that it is itself an orphan commit.

I ran into this while writing a hook where we'd like to do:

     git diff $master...topic

Or not, depending on whether the topic is an orphan or just something
recently branched off. I figured I could use --is-ancestor as an
optimization, and then discovered it's not much of an optimization.


This is the same performance problem that we are trying to work around with Jeff's "Add --no-ahead-behind to status" patch [1]. For commits that are far apart, many commits need to be parsed. I think the right solution is to create a serialized commit graph that stores the adjacency information of the commits and can create commit structs quickly. This requires storing the commit id, commit date, parents, and root tree id to satisfy the needs of parse_commit_gently(). Once the framework for this data is constructed, it is simple to add generation numbers to that data and start consuming them in other algorithms (by adding the field to 'struct commit').
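
As a rough illustration of why generation numbers could help with the orphan case, here is a minimal sketch in Python. It is not the patch mentioned in this thread and not Git's code. It assumes the usual definition (a root commit has generation 1, any other commit has 1 + the maximum generation of its parents), and the names generation_numbers, is_ancestor, and the toy history are hypothetical. If A has a higher generation than B, A cannot be an ancestor of B, so the walk can be skipped entirely.

```python
from collections import deque


def generation_numbers(parents):
    """Map each commit to its generation number.

    parents: dict of commit id -> list of parent ids (assumed acyclic).
    Roots get 1; other commits get 1 + max generation of their parents.
    """
    gen = {}
    for start in parents:
        stack = [start]  # iterative DFS, so deep histories do not recurse
        while stack:
            c = stack[-1]
            if c in gen:
                stack.pop()
                continue
            pending = [p for p in parents[c] if p not in gen]
            if pending:
                stack.extend(pending)  # resolve parents first
            else:
                gen[c] = 1 + max((gen[p] for p in parents[c]), default=0)
                stack.pop()
    return gen


def is_ancestor(parents, gen, a, b):
    """Return (answer, commits_visited) for "is a an ancestor of b?".

    A commit counts as its own ancestor, as with merge-base --is-ancestor.
    """
    if gen[a] > gen[b]:
        return False, 0  # a sits "above" b, so no walk is needed
    seen = {b}
    queue = deque([b])
    visited = 0
    while queue:
        c = queue.popleft()
        visited += 1
        if c == a:
            return True, visited
        for p in parents[c]:
            # Commits below a's generation can never lead back to a.
            if p not in seen and gen[p] >= gen[a]:
                seen.add(p)
                queue.append(p)
    return False, visited


# Toy stand-in for the scenario above: a 1000-commit linear "master"
# (c1 is the root, c1000 the tip) plus an unrelated one-commit orphan.
parents = {f"c{i}": ([f"c{i - 1}"] if i > 1 else []) for i in range(1, 1001)}
parents["orphan"] = []
gen = generation_numbers(parents)

orphan_check = is_ancestor(parents, gen, "c1000", "orphan")
root_check = is_ancestor(parents, gen, "c1", "c1000")
```

In this toy history the "master vs. orphan" query is answered without visiting any commit, while the "first commit vs. master" query still walks the whole chain, since the root really is a distant ancestor and generation numbers alone do not shortcut that case.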

I'm working on such a patch right now, but it will be a few weeks before I'm ready.


[1] v5 of --no-ahead-behind https://public-inbox.org/git/20180109185018.69164-1-git@xxxxxxxxxxxxxxxxx/T/#t

[2] v4 of --no-ahead-behind https://public-inbox.org/git/nycvar.QRO.