Web lists-archives.com

Re: [PATCH 0/3] Make add_missing_tags() linear




On Thu, Nov 1, 2018 at 5:32 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
> On 11/1/2018 2:52 AM, Elijah Newren wrote:
> > On Wed, Oct 31, 2018 at 5:05 AM Derrick Stolee <stolee@xxxxxxxxx> wrote:
> >> On 10/31/2018 2:04 AM, Elijah Newren wrote:
> >>>
> >>> On the original repo where the topic was brought up, with commit-graph
> >>> NOT turned on and using origin/master, I see:
> >>>
> >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]       test5 -> test5
> >>>
> >>> real 1m20.081s
> >>> user 1m19.688s
> >>> sys 0m0.292s
> >>>
> >>> Merging this series in, I now get:
> >>>
> >>> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]       test5 -> test5
> >>>
> >>> real 0m2.857s
> >>> user 0m2.580s
> >>> sys 0m0.328s
> >>>
> >>> which provides a very nice speedup.
> >>>
> >>> Oddly enough, if I _also_ do the following:
> >>> $ git config core.commitgraph true
> >>> $ git config gc.writecommitgraph true
> >>> $ git gc
> >>>
> >>> then my timing actually slows down just slightly:
> >>> $ time git push --follow-tags --dry-run /home/newren/repo-mirror
> >>> To /home/newren/repo-mirror
> >>>   * [new branch]          test5 -> test5
> >>>
> >>> real 0m3.027s
> >>> user 0m2.696s
> >>> sys 0m0.400s
> >> So you say that the commit-graph is off in the 2.8s case, but not here
> >> in the 3.1s case? I would expect _at minimum_ that the cost of parsing
> >> commits would have a speedup in the commit-graph case.  There may be
> >> something else going on here, since you are timing a `push` event that
> >> is doing more than the current walk.
> >>
> >>> (run-to-run variation seems pretty consistent, < .1s variation, so
> >>> this difference is just enough to notice.)  I wouldn't be that
> >>> surprised if that means there's some really old tags with very small
> >>> generation numbers, meaning it's not gaining anything in this special
> >>> case from the commit-graph, but it does pay the cost of loading the
> >>> commit-graph.
> >> While you have this test environment, do you mind applying the diff
> >> below and re-running the tests? It will output a count for how many
> >> commits are walked by the algorithm. This should help us determine if
> >> this is another case where generation numbers are worse than commit-date,
> >> or if there is something else going on. Thanks!
> > I can do that, but wouldn't you want a similar patch for the old
> > get_merge_bases_many() in order to compare?  Does an absolute number
> > help by itself?
> > It's going to have to be tomorrow, though; not enough time tonight.
>
> No rush. I'd just like to understand how removing the commit-graph file
> can make the new algorithm faster. Putting a similar count in the old
> algorithm would involve giving a count for every call to
> in_merge_bases_many(), which would be very noisy.

$ time git push --dry-run --follow-tags /home/newren/repo-mirror
count: 92912
To /home/newren/repo-mirror
 * [new branch]              test5 -> test5

real    0m3.024s
user    0m2.752s
sys    0m0.320s


Also:
$ git rev-list --count HEAD
55764
$ git rev-list --count --all
91820

Seems a little odd to me that count is greater than `git rev-list
--count --all`.  However, the fact that they are close in magnitude
isn't surprising since I went digging for the commit with smallest
generation number not found in the upstream repo, and found:
$ git ls-remote /home/newren/repo-mirror/ | grep refs/tags/v0.2.0; echo $?
1
$ git rev-list --count refs/tags/v0.2.0
4
$ git rev-list --count refs/tags/v0.2.0 ^HEAD
4


So, the commit-graph can only help us avoid parsing 3 or so commits,
but we have to parse the 5M .git/objects/info/commit-graph file, and
then for every parse_commit() call we need to bsearch_graph() for the
commit.    My theory is that parsing the commit-graph file and binary
searching it for commits is relatively fast, but that the time is just
big enough to measure and notice, while avoiding parsing 3 more
commits is a negligible time savings.

To me, I'm thinking this is one of those bizarre corner cases where
the commit-graph is almost imperceptibly slower than without the
commit-graph.  (And it is a very weird repo; someone repeatedly
filter-branched lots of small independent repos into a monorepo, but
didn't always push everything and didn't clean out all old stuff.)
But if you still see weird stuff you want to dig into further (maybe
the 92912 > 91820 bit?), I'm happy to try out other stuff.