Re: Git blame performance on files with a lot of history

On Fri, Dec 14, 2018 at 2:48 PM Ævar Arnfjörð Bjarmason
<avarab@xxxxxxxxx> wrote:
> On Fri, Dec 14 2018, Clement Moyroud wrote:
> > My group at work is migrating a CVS repo to Git. The biggest issue we
> > face so far is the performance of git blame, especially compared to
> > CVS on the same file. One file especially causes us trouble: it's a
> > 30k-line file with 25 years of history in 3k+ commits. The complete
> > repo has 200k+ commits over that same period of time.
> There's a real-world repo with a shape & size very similar to this that
> has good performance, gcc.git: https://github.com/gcc-mirror/gcc
>     $ wc -l ChangeLog
>     20240 ChangeLog
>     $ git log --oneline -- ChangeLog | wc -l
>     2676
>     $ git log --oneline | wc -l
>     165309
>     $ time git blame ChangeLog >/dev/null
>     real    0m1.977s
>     user    0m1.909s
>     sys     0m0.069s
> Its history began in 1997, and the changes to the ChangeLog file are,
> by their nature, fairly evenly spread through that period.
> So check out that repo to see if you have similar or worse
> performance. Does your work repo show the same problem with a history
> produced with 'git fast-export --anonymize', and if so is that something
> you'd be OK with sharing?

Hi Ævar,

I see around 3s here on the GCC repo, but I'm on a VM and the repo is
cloned on an NFS disk, so I'd say it matches :) That's around 45x faster
than blame on my repo, on the same NFS share and VM. So there's
definitely something to improve here on my end (see my reply to Bryan
re: repack in a separate e-mail).

The anonymized export won't work in this case: all file contents are
replaced with 'anonymous blob <n>', so there's no per-line history left
for blame to follow. Let me see if I can post-process a non-anonymized
version to keep the relevant data available.
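
For reference, here is a minimal sketch of what I mean, assuming a
throwaway repo (the file name, commit messages and identity are made up
for illustration; only 'git fast-export --anonymize' comes from the
thread). It commits two versions of a small file, then checks what the
anonymized stream contains:

```shell
#!/bin/sh
# Sketch: show that 'git fast-export --anonymize' replaces blob contents.
# The temp repo, file.txt and the identity below are hypothetical.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git config user.name "Example"
git config user.email "example@example.com"

# Two commits touching the same file, so blame would have history to walk.
printf 'line one\nline two\n' > file.txt
git add file.txt
git commit -q -m "first"
printf 'line one\nline 2\n' > file.txt
git commit -q -a -m "second"

# Count placeholder payloads and leftover original lines in the stream.
anon_blobs=$(git fast-export --anonymize --all | grep -c 'anonymous blob' || true)
leaked=$(git fast-export --anonymize --all | grep -c 'line one' || true)
echo "placeholder blobs: $anon_blobs, original lines left: $leaked"
```

Each blob comes out as a single placeholder line, so every commit
rewrites the whole "file" and blame has nothing line-level to attribute.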