Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
- Date: Wed, 7 Feb 2018 09:13:28 -0500
- From: Ben Peart <peartben@xxxxxxxxx>
- Subject: Re: [PATCH v1] dir.c: don't flag the index as dirty for changes to the untracked cache
On 2/6/2018 7:27 AM, Duy Nguyen wrote:
This is another thing that bugs me. I know you're talking about huge
index files, but at what size should we start this sort of
optimization? Writing down a few MBs on linux is cheap enough that I
won't bother optimizing (and I get my UNTR extension repaired all the
time, so reduced lstat calls and stuff). This "typically" only comes
at certain size, what size?
It's important to identify what we're trading off here. With the
proposed optimization, we'll end up doing less writes of the index but
potentially more lstat calls. Even with a small index, writing the
index is much more expensive than calling lstat on a file. Exactly how
much more expensive depends on a lot of variables but even with a SSD
its still orders of magnitude difference.
That means we could potentially lstat hundreds or thousands of files and
still come out ahead. Given the untracked cache works at the directory
level, the lstat cost actually scales with the number of directories
that have had changes (ie changing a 2nd file in the same directory
doesn't add any additional cost). In "typical" workflows, developers
don't often change hundreds of files across different directories so
we'd "typically" still come out ahead.
We have internal performance tests based on common developer sequences
of commands (status, edit a file, status, add, status, commit, log,
push, etc) that show that having the untracked cache turned on actually
makes these sequences slower. At the time, we didn't have time to
investigate why so we just turned off the untracked cache.
When writing the fsmonitor patch series which can utilize the untracked
cache, I did some investigation into why the untracked cache was slowing
things down in these tests and discovered the cause was the additional
index writes. That is what led to this patch.
I've been sitting on it for months now because I didn't have the time to
write some performance tests for the git perf framework to demonstrate
the problem and how this helps solve it. With the discussion about the
fsmonitor extension, I thought this might be a good time to send it out
If you have the cycles, time a typical series of commands with and
without the untracked cache (ie don't just call status over and over in
a loop) and you should see the same results. Given my limited time
right now, I'm OK putting this on the back burner again until a git perf
test can be written to ensure it actually speeds things up as advertised.