Re: Git blame performance on files with a lot of history
- Date: Fri, 14 Dec 2018 16:31:24 -0500
- From: Derrick Stolee <stolee@xxxxxxxxx>
- Subject: Re: Git blame performance on files with a lot of history
On 12/14/2018 1:29 PM, Clement Moyroud wrote:
> My group at work is migrating a CVS repo to Git. The biggest issue we
> face so far is the performance of git blame, especially compared to
> CVS on the same file. One file especially causes us trouble: it's a
> 30k lines file with 25 years of history in 3k+ commits. The complete
> repo has 200k+ commits over that same period of time.

I think the 30k lines is the bigger issue than the 200k+ commits. I'm
not terribly familiar with the blame code, though.

> Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
> (without -M nor -C) takes 145s.
>
> I tried using the commit-graph with the Bloom filter, per

Thanks for the interest in this prototype feature. Sorry that it doesn't
appear to help you in this case. It should definitely be a follow-up
when that feature gets polished to production-quality.

> Looking at the blame code, it does not seem to be able to use the
> commit graph, so I tried the same rev-list command from the e-mail,
> using my own file:
> > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y
> /path/to/git rev-list --count --full-history HEAD -- important/file.C

Please double-check that you have the 'core.commitGraph' config setting
enabled, or you will not read the commit-graph at run-time:

    git config core.commitGraph true

I see that the commit introducing GIT_TRACE_BLOOM_FILTER does
nothing if the commit-graph is not loaded.
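
A quick way to check both conditions (the setting is on and a commit-graph
file exists to be loaded) is sketched below. It assumes a Git version that
ships the 'git commit-graph' builtin with the '--reachable' option, and it
uses a throwaway repository only so that it runs on its own. The prototype
Bloom filter variables are left out, since they exist only in that
experimental branch.

```shell
#!/bin/sh
set -e

# Throwaway repository so the sketch is self-contained; in practice, run the
# config, commit-graph, and check commands inside the migrated repository.
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Allow Git to read the commit-graph file at run-time.
git -C "$repo" config core.commitGraph true

# Write a commit-graph covering every commit reachable from the refs.
git -C "$repo" commit-graph write --reachable

# Confirm the setting and the file that Git would load.
git -C "$repo" config --get core.commitGraph
ls "$repo/.git/objects/info/commit-graph"
```

If the last command reports a missing file, there is nothing for Git to
load, whatever the config setting says.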