Bad performance when using git log --parents (used by gitk)




Hi!

The LLVM project is moving from SVN to git, creating a single repo on GitHub for several LLVM sub-projects.
Until now we have had one git mirror repo for each sub-project (mirroring the SVN projects).

Unfortunately, I've run into some performance problems with git (or rather gitk) since starting to use the new llvm-project git repo.

It seems like gitk runs "git log --no-color -z --pretty=raw --show-notes --parents --boundary HEAD -- <file>" when loading the history. So, as far as I can tell, it is the performance of "git log --parents -- <file>" that is causing the problem.


Example:

Run "git log --parents" for an old file (bswap.ll) and for a brand-new file (dummy).

First we try it using the new "llvm-project" repository.

--------------------------------------------------------------------------------
bash-4.1$ git clone https://github.com/llvm/llvm-project.git && cd llvm-project
Cloning into 'llvm-project'...
remote: Enumerating objects: 130, done.
remote: Counting objects: 100% (130/130), done.
remote: Compressing objects: 100% (98/98), done.
remote: Total 3361980 (delta 39), reused 58 (delta 26), pack-reused 3361850
Receiving objects: 100% (3361980/3361980), 605.50 MiB | 15.63 MiB/s, done.
Resolving deltas: 100% (2755544/2755544), done.
Checking out files: 100% (82618/82618), done.

bash-4.1$ /usr/bin/time git log --parents -- llvm/test/CodeGen/Generic/bswap.ll >> /dev/null
190.63user 0.43system 3:11.01elapsed 100%CPU (0avgtext+0avgdata 702756maxresident)k
232inputs+0outputs (2major+177913minor)pagefaults 0swaps

bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master ce43ac2e487] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dummy
bash-4.1$ /usr/bin/time git log --parents -- dummy >> /dev/null
205.54user 0.37system 3:25.83elapsed 100%CPU (0avgtext+0avgdata 644576maxresident)k
0inputs+0outputs (0major+163134minor)pagefaults 0swaps
--------------------------------------------------------------------------------


Now do the same for the old "llvm" repository.

--------------------------------------------------------------------------------
bash-4.1$ git clone https://github.com/llvm-mirror/llvm.git llvm && cd llvm
Cloning into 'llvm'...
remote: Enumerating objects: 84, done.
remote: Counting objects: 100% (84/84), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 1673859 (delta 25), reused 35 (delta 23), pack-reused 1673775
Receiving objects: 100% (1673859/1673859), 373.08 MiB | 12.72 MiB/s, done.
Resolving deltas: 100% (1369306/1369306), done.
Checking out files: 100% (36477/36477), done.
bash-4.1$ /usr/bin/time git log --parents -- test/CodeGen/Generic/bswap.ll >> /dev/null
4.89user 0.27system 0:05.19elapsed 99%CPU (0avgtext+0avgdata 468072maxresident)k
0inputs+0outputs (0major+120244minor)pagefaults 0swaps

bash-4.1$ touch dummy
bash-4.1$ git add dummy
bash-4.1$ git commit -m "test"
[master 1db81b43a30] test
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 dummy
bash-4.1$ /usr/bin/time git log --parents -- dummy >> /dev/null
4.05user 0.24system 0:04.32elapsed 99%CPU (0avgtext+0avgdata 437920maxresident)k
0inputs+0outputs (0major+112503minor)pagefaults 0swaps
--------------------------------------------------------------------------------


So with the new repo, "git log --parents" takes about 190/5 = 38 times longer for bswap.ll,
and about 205/4 = 51 times longer for the new dummy file.
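For reference, the ratios can be recomputed from the exact "user" CPU times in the transcripts above. This is a minimal sketch assuming a POSIX shell with awk; `slowdown` is just a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print new_time / old_time with one decimal.
# $1 = user seconds with llvm-project, $2 = user seconds with the old llvm mirror
slowdown() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", new / old }'
}

slowdown 190.63 4.89   # bswap.ll
slowdown 205.54 4.05   # dummy
```

This prints 39.0 and 50.8, consistent with the rough estimates above.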

The llvm-project repo is somewhat larger, since several projects have been merged into it:
the number of commits increases from ~180000 to ~310000. But I doubt that such an increase
should slow down git log --parents by a factor of 50.
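As a rough back-of-envelope check: if the cost of "git log --parents -- <file>" grew linearly with the number of commits (an assumption made purely for illustration, not something measured), the old repo's timings would scale to only a few seconds for llvm-project. `expected_time` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: scale an old timing by the commit-count ratio,
# ASSUMING (for illustration only) that run time is linear in the number of commits.
# $1 = old user seconds, $2 = old commit count, $3 = new commit count
expected_time() {
  awk -v t="$1" -v a="$2" -v b="$3" 'BEGIN { printf "%.1f\n", t * b / a }'
}

expected_time 4.89 180000 310000   # bswap.ll (measured with llvm-project: 190.63)
expected_time 4.05 180000 310000   # dummy    (measured with llvm-project: 205.54)
```

This gives roughly 8.4 and 7.0 seconds, far from the ~190-205 seconds actually measured.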


From what I understand, --parents can take some time, but I see a huge degradation with our new repo compared to the old one.
I'm not sure whether the repo is simply too large (or poorly packed?), or whether this is a git problem.

Any help understanding this is welcome.

I used git version 2.20.0 in the tests above.


PS. I think the problem can also be seen for files with a longer history, for example CODE_OWNERS.txt (llvm/CODE_OWNERS.txt in llvm-project). But then git log starts printing commits much sooner, so with gitk I do get to see some history after just a few seconds, even with llvm-project (although loading the full history takes some time). For files with a very short history (like the dummy file in the example), nothing is printed until the very end (after ~200 seconds), so git log (and gitk) just appears to be stuck. Is git log caching the result somehow, not printing anything until it has more than one commit to print?

Regards,
Björn Pettersson A    

Ericsson
Datalinjen 4 (Hus K)
58330, Linköping
Sweden