Re: Questions on GSoC 2019 Ideas
- Date: Wed, 6 Mar 2019 17:17:17 +0700
- From: Duy Nguyen <pclouds@xxxxxxxxx>
- Subject: Re: Questions on GSoC 2019 Ideas
On Wed, Mar 6, 2019 at 6:47 AM Matheus Tavares Bernardino wrote:
> This exercise of estimating a good spot to gain performance with
> parallelism at git seems more difficult than I thought, firstly. Also,
> I'm not that familiar yet with git packing (neither with the sections
> of it that could benefit from parallelism). So could anyone point me
> some good references on this, where I could study and maybe come back
> with more valuable suggestions?
I think you should skim through
Documentation/technical/pack-format.txt to get an idea of what we're
talking about here (inflation, delta cache...).
The short (and slightly inaccurate) version is, in order to return the
content for a given SHA-1, we need to
1. locate that "object" in the pack file (I'm ignoring loose objects here)
2. assume (worst case) that this object is a delta object: it contains
only the modifications relative to another object, so you need to get
the content of that other object first, then apply the modifications
("the delta") on top to get the full content. This can repeat many
times along a delta chain.
3. once you have the full content, it's actually zlib-compressed, so
you need to uncompress/inflate it.
Step 2 would be super slow if you had to uncompress that "other
object" every single time. So there's a "delta cache" that stores the
content of these other objects. The next time you see a delta object
on top of one of them, you can just apply the delta and be done with
it. Much faster. This delta cache is of course global and not thread
safe.
Step 3 can also be slow when dealing with large blobs.
Another piece of global state that Jeff mentioned, pack windows, is I
think a bit harder to explain quickly. But basically we have many
"windows" through which we see the raw content of a pack file; these
windows are global (per pack, actually) and also not thread safe.
So none of these steps is thread safe. When a command with multithread
support accesses the pack, all of them are protected by a single mutex
(grep_mutex in builtin/grep.c, or read_lock() in
builtin/pack-objects.c). As you can see, the steps here are CPU-bound
(step 3 obviously; step 2 also has to inflate the other objects), so
if you use a more fine-grained mutex, chances are the inflation step
can be done in parallel.
I think the good spots to gain performance are the commands that
already have multithread support. I mentioned git-grep and
git-pack-objects above. git-index-pack is a third one, but it's
special and I think does not use the general pack access code.
I think if we could somehow measure lock contention on those big locks
above, we could guesstimate how much gain there is. If threads in
git-grep, for example, often have to wait on grep_mutex, then in the
best-case scenario, once you make pack access thread safe, lock
contention goes down to near zero.
> On Tue, Mar 5, 2019 at 9:57 AM Duy Nguyen <pclouds@xxxxxxxxx> wrote:
> > On Tue, Mar 5, 2019 at 11:51 AM Jeff King <peff@xxxxxxxx> wrote:
> > > > processing power from multiple cores, but about _not_ blocking. I
> > > > think one example use case here is parallel checkout. While one thread
> > > > is blocked by pack access code for whatever reason, the others can
> > > > still continue doing other stuff (e.g. write the checked out file to
> > > > disk) or even access the pack again to check more things out.
> > >
> Hmm, you mean distributing the process of inflating, reconstructing
> deltas and checking out files between the threads? (having each one
> doing the process for a different file?)
Yes. So if one thread hits a giant file (and spends a lot of time
inflating it), the other threads can still go on inflating smaller
files and writing them to disk.
> > > I'm not sure if it would help much for packs, because they're organized
> > > to have pretty good cold-cache read-ahead behavior. But who knows until
> > > we measure it.
> > >
> > > I do suspect that inflating (and delta reconstruction) done in parallel
> > > could be a win for git-grep, especially if you have a really simple
> > > regex that is quick to search.
> > Maybe git-blame too. But this is based purely on me watching CPU
> > utilization of one command with hot cache. For git-blame though, diff
> > code has to be thread safe too but that's another story.
> I don't know if this relates to parallelizing pack access, but I
> thought that sharing this with you all could perhaps bring some new
> insights (maybe even on parallelizing some other git section): I asked
> my friends who contribute to the Linux Kernel what git commands seems
> to take longer during their kernel work, and the answers were:
> - git log and git status, sometimes
> - using pager's search at git log
> - checking out to an old commit
> - git log --oneline --decorate --graph
Sometimes the slowness is not because of serialized pack access. I'm
sure git-status is not that much impacted by slow pack access.
The pager search case may be sped up (git-log has to look at lots of
blobs and trees), but you would also need to parallelize the diff
code, and that I think is way too big to consider.
Checking out an old commit could, I think, also be sped up with
parallel checkout. Again, this is really out of scope. But of course
it can't be done until pack access is thread safe. Or it can be done,
but not as pretty. […] suggests a 30% speedup. That's the
share-nothing best possible case, I think. But that code is in no way
optimized, so the real speedup could be higher.