Web lists-archives.com

Re: Questions on GSoC 2019 Ideas

On Mon, Mar 11, 2019 at 09:18:47PM -0300, Matheus Tavares Bernardino wrote:
> I've been thinking on how I could implement a test to estimate the
> lock contention but had no success until now. I wanted to try
> mutrace[2] but couldn't install it; I tried valgrind's drd but it
> didn't seem to report a contention time estimation; And I tried
> measuring the time of "pthread_mutex_lock(&grep_mutex)" but I don't
> know how much significative this value is, since we can't directly
> compare it (the time of many threads at lock) with the overall
> execution time. Do you have an idea on how we could measure the lock
> contention here?

(I'm supposed to be doing something else, but this is more fun :D)

Yeah lock contention is probably hard to measure. But at least we can
measure how much time is blocked by mutex. Something like this [1]
gives me

    $ time ~/w/git/temp/git grep --threads=8 abc HEAD >/dev/null
    warning: let's have some fun
    block_time = 20ms
    block_count = 10725
    real    0m0,379s
    user    0m0,425s
    sys     0m0,073s

>From this I know "git grep" took 379ms and at least 20ms (probably
including measurement overhead) is wasted on pthread_mutex_lock(). It
does not look that significant, I admit.

> Another thing that is occurring to me right now is whether git-grep,
> as it is implemented today, would really benefit from thread-safe pack
> access. I may have to study the code more, but it seems to me that
> just the producer thread uses pack access.

That producer I think is just handing out assignments to worker
threads. The real work is still done by worker threads.

The entry point to pack access in this code is protected by
grep_read_lock(). I believe the multithread one is in this deep call
chain (I found it out by gdb)

[main thread] grep_oid() -> add_work() ->
[worker thread] grep_source() -> grep_source_1() ->
grep_source_is_binary() -> grep_source_load() ->
grep_source_load_oid() -> read_object_file()

Note that there's another source of pack access, the
lock_and_read_oid_file() in grep_tree(). This is where we unpack tree
objects and traverse to get blob SHA-1.

This code is currently on the main thread (maybe this is what you
found?) so it does not really benefit from multi thread. We could
still add some sort of lookahead queue to inflate tree objects in
advance in parallel, but I don't know how much gain that will be.

One thing I didn't notice is we currently force no threads in the case
we need pack access, e.g. "git grep <regex> <commit>" or "git grep
--cached <regex>". So if you need to experiment, you need to hack it
and remove that restriction. That's the "let's have some fun" code in

So, assuming this is CPU bottleneck, let's have a look at how CPU is used.

    perf record git grep --threads=1 abc HEAD >/dev/null
    perf report

shows me the top CPU consumers are

  51,16%  git      libz.so.1.2.11      [.] inflate_fast
  19,55%  git      git                 [.] bmexec

You need to do some guessing here. But I believe the top call is from
inflating objects (either from packs or from loose objects). The
second one is from regular expression engine.

Since the regex here is very short (I deliberately try Jeff's case
where object access dominates), the CPU is mostly used up for object
inflation. That suggests that if we could spread it out across cores,
the gain could be quite good. There aren't many dependencies in grep
tasks so if we spread the workload on 8 cores, execution time should
be reduced by 7 or 8 times.

For fun, I hacked up a horrible, horrible "thread safe" version [2]
that probably broke 99% of git and made helgrind extremely unhappy, so
these numbers are at best guidelines, but..

    $ time ./git grep --threads=1 abc HEAD >/dev/null
    real    0m0,253s
    user    0m0,225s
    sys     0m0,029s
    $ time ./git grep --threads=8 abc HEAD >/dev/null
    warning: let's have some fun!
    real    0m0,157s
    user    0m0,312s
    sys     0m0,089s

You can see the "real" rows show quite good time reduction. Not 8
times reduction, mind you, but that's where serious people dig in and
really speed it up ;-)

[1] https://gitlab.com/snippets/1834609
[2] https://gitlab.com/snippets/1834613

> So, although it would be out of scope for GSoC, checkout, diff and log
> (and maybe others) could all benefit from a thread-safe/parallel pack
> access, right? If so, it is very motivating the impact this project
> could, in theory, have :)

We have to analyze case by case. It may turn out that there are many
opportunity to utilize multi threads. I think checkout is definitely a
good candidate. For "git diff" and "git log" maybe you can try "perf"
to see how much workload is locked in pack access (mostly inflation,
because it's easier to spot from the profile report)