Re: Git blame performance on files with a lot of history
- Date: Mon, 17 Dec 2018 12:59:33 -0800
- From: Clement Moyroud <clement.moyroud@xxxxxxxxx>
- Subject: Re: Git blame performance on files with a lot of history
On Fri, Dec 14, 2018 at 1:31 PM Derrick Stolee <stolee@xxxxxxxxx> wrote:
> Please double-check that you have the 'core.commitGraph' config setting
> enabled, or you will not read the commit-graph at run-time:
> git config core.commitGraph true
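For anyone reproducing this, here is a minimal sketch of the full setup, shown in a throwaway repository so it is safe to paste. It assumes git 2.19 or later (for `commit-graph write --reachable`) and a single-file graph at `.git/objects/info/commit-graph`. The config setting only tells git to read the graph; the file itself still has to be written.

```shell
#!/bin/sh
# Sketch: write a commit-graph and enable reading it, in a scratch repo.
# Assumes git >= 2.19 (for "commit-graph write --reachable").
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false \
    commit -q --allow-empty -m "initial commit"

# Without this, git ignores the commit-graph file at run-time.
git config core.commitGraph true

# Build the graph from all commits reachable from refs.
git commit-graph write --reachable

# Confirm both pieces are in place.
git config --get core.commitGraph
if [ -f .git/objects/info/commit-graph ]; then
    echo "commit-graph file present"
else
    echo "commit-graph file missing" >&2
    exit 1
fi
```

In a real repository, skip the `mktemp`/`init`/`commit` lines and run the rest from the repository root.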
Yeah, this is what happens when trying too many things at once :( I had removed it to get with/without timings, and forgot to re-enable it before running my last set of experiments.
Here are the results with it enabled:
> time GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git rev-list --count --full-history HEAD -- important/file.C
10:32:06.665057 revision.c:483 bloom filter total queries: 286363 definitely not: 234605 maybe: 51758 false positives: 48212 fp
GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y rev-list --count HEAD - 2.62s user 0.14s system 97% cpu 2.830 total
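Spelling out the trace counts above, on the assumption that the PoC counts a "false positive" as a "maybe" answer where the actual tree diff showed no change to the path (the printed fp ratio is cut off, so these are just the raw fractions):

```latex
\begin{aligned}
\text{definitely not} + \text{maybe} &= 234605 + 51758 = 286363 = \text{total queries} \\
\frac{234605}{286363} &\approx 0.819 \quad \text{(queries answered without a tree diff)} \\
\text{maybe} - \text{false positives} &= 51758 - 48212 = 3546 \quad \text{(real changes to the path)} \\
\frac{48212}{51758} &\approx 0.931 \quad \text{(share of ``maybe'' answers that were false)}
\end{aligned}
```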
> time /path/to/git rev-list --count --full-history HEAD -- ic/lv/src/iclv/drc_compiler.C
/path/to/git rev-list 8.86s user 0.15s system 99% cpu 9.031 total
So I'm getting roughly a 3x speedup, not bad! This is on the re-repacked repo, which is why I reran both with and without the Bloom filter.
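The "3x" is just the ratio of the two `time` results above:

```latex
\frac{9.031\,\text{s}}{2.830\,\text{s}} \approx 3.19 \ \text{(wall clock)}, \qquad
\frac{8.86\,\text{s}}{2.62\,\text{s}} \approx 3.38 \ \text{(user time)}
```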
Let's see what this does for blame:
> time GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git blame master -- important/file.C > /tmp/foo
Blaming lines: 100% (33179/33179), done.
12:50:42.703522 revision.c:483 bloom filter total queries: 0 definitely not: 0 maybe: 0 false positives: 0 fp ratio: -nan
GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y blame master -- 132.59s user 2.15s system 99% cpu 2:14.95 total
The trace reports zero Bloom filter queries, so it seems blame doesn't consult the filter yet. I'll be happy to test any implementation.