Re: Git blame performance on files with a lot of history




On 12/14/2018 1:29 PM, Clement Moyroud wrote:
My group at work is migrating a CVS repo to Git. The biggest issue we
face so far is the performance of git blame, especially compared to
CVS on the same file. One file especially causes us trouble: it's a
30k lines file with 25 years of history in 3k+ commits. The complete
repo has 200k+ commits over that same period of time.

I think the 30k lines are a bigger issue than the 200k+ commits. I'm not terribly familiar with the blame code, though.

Currently, 'cvs annotate' takes 2.7 seconds, while 'git blame'
(without -M nor -C) takes 145s.

I tried using the commit-graph with the Bloom filter, per
https://public-inbox.org/git/61559c5b-546e-d61b-d2e1-68de692f5972@xxxxxxxxx/.

Thanks for the interest in this prototype feature. Sorry that it doesn't appear to help you in this case. It should definitely be a follow-up when that feature gets polished to production quality.

Looking at the blame code, it does not seem to be able to use the
commit graph, so I tried the same rev-list command from the e-mail,
using my own file:
     > GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y /path/to/git rev-list --count --full-history HEAD -- important/file.C
     3576

Please double-check that you have the 'core.commitGraph' config setting enabled, or you will not read the commit-graph at run-time:

    git config core.commitGraph true

I see that the commit introducing GIT_TRACE_BLOOM_FILTER [1] does nothing if the commit-graph is not loaded.
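
A quick way to confirm both halves of that (the setting is on, and a commit-graph file actually exists to be read) might look like the sketch below. The helper name check_commit_graph is made up for illustration and is not a Git command. It assumes the graph was written as a single file under objects/info/, which is what 'git commit-graph write --reachable' produces.

```shell
#!/bin/sh
# Sketch only: check_commit_graph is a hypothetical helper, not part of Git.
# Run it from inside the repository you want to inspect.
check_commit_graph() {
    # Read the setting as a boolean; the result is empty if it is unset.
    setting=$(git config --bool core.commitGraph 2>/dev/null) || true

    # Ask Git where the single-file commit-graph would live for this repo.
    graph_file=$(git rev-parse --git-path objects/info/commit-graph)

    if [ "$setting" != "true" ]; then
        echo "core.commitGraph is not set to true"
        return 1
    fi
    if [ ! -f "$graph_file" ]; then
        echo "no commit-graph file at $graph_file"
        return 1
    fi
    echo "commit-graph enabled and present"
}
```

If the file is missing, running 'git commit-graph write --reachable' once and then calling check_commit_graph again should report success, after which the rev-list experiment above can be repeated.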

Thanks,
-Stolee

[1] https://github.com/derrickstolee/git/commit/adc469894b755512c9d02f099700ead2a7a78377