Web lists-archives.com

Re: What's cooking in git.git (Nov 2018, #06; Wed, 21)




On Thu, Nov 22 2018, Jeff King wrote:

> On Wed, Nov 21, 2018 at 11:48:14AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>>
>> On Wed, Nov 21 2018, Junio C Hamano wrote:
>>
>> > * jk/loose-object-cache (2018-11-13) 9 commits
>> >   (merged to 'next' on 2018-11-18 at 276691a21b)
>> >  + fetch-pack: drop custom loose object cache
>> >  + sha1-file: use loose object cache for quick existence check
>> >  + object-store: provide helpers for loose_objects_cache
>> >  + sha1-file: use an object_directory for the main object dir
>> >  + handle alternates paths the same as the main object dir
>> >  + sha1_file_name(): overwrite buffer instead of appending
>> >  + rename "alternate_object_database" to "object_directory"
>> >  + submodule--helper: prefer strip_suffix() to ends_with()
>> >  + fsck: do not reuse child_process structs
>> >
>> >  Code clean-up with optimization for the codepath that checks
>> >  (non-)existence of loose objects.
>> >
>> >  Will cook in 'next'.
>>
>> I think as noted in
>> https://public-inbox.org/git/e5148b8c-9a3a-5d2e-ac8c-3e536c0f2358@xxxxxx/
>> that we should hold off the [89]/9 of this series due to the performance
>> regressions this introduces in some cases (while fixing other cases).
>>
>> I hadn't had time to follow up on that, and figured it could wait until
>> post-2.20 for a re-roll.
>
> Yeah, my intent had been to circle back around to this, but I just
> hadn't gotten to it. I'm still pondering a config option or similar,
> though I remain unconvinced that the cases in which you've showed it
> being slow are actually realistic or worth worrying about

FWIW those "used to be 2ms are now 20-40ms" pushes on ext4 are
representative of the actual prod setup I'm mainly targeting. Now, I
don't run on ext4 this patch helps there, but it seems plausible that it
matters to someone who's counting on that performance.

Buh yeah, it's certainly obscure. I don't blame you if you don't want to
hack on it, and not ejecting this out before 2.20 isn't going to break
anything for me. But do you mind if I make it configurable as part of my
post-2.20 "disable collisions?"

>  (and certainly having an obscure config option is not enough to help
> most people). If we could have it kick in heuristically, that would be
> better.

Aside from this specific scenario. I'd really prefer if we avoid having
heuristic performance optimizations at all costs.

Database servers tend to do that sort of thing with their query planner,
and it results in cases where your entire I/O profile changes overnight
because you're now on the wrong side of some if/else heuristic about
whather to use some index or not.

> However, note that the cache-load for finding abbreviations _must_ have
> the complete list. And has been loading it for some time. So if you run
> "git-fetch", for example, you've already been running this code for
> months (and using the cache in more places is now a free speedup).

This is reminding me that I need to get around to re-submitting my
core.validateAbbrev series, which addresses this part of the problem:
https://public-inbox.org/git/20180608224136.20220-21-avarab@xxxxxxxxx/

> At the very least, we'd want this patch on top, too. I also think René's
> suggestion use access() is worth pursuing (though to some degree is
> orthogonal to the cache).

I haven't had time to test that, and wasn't prioritizing it since I
figured this was post-2.20. My hunch is it doesn't matter much if at all
on NFS. The roundtrip time is what matters, whether that roundtrip is
fstat() or access() probably not.

> -- >8 --
> Subject: [PATCH] odb_load_loose_cache: fix strbuf leak
>
> Commit 66f04152be (object-store: provide helpers for
> loose_objects_cache, 2018-11-12) moved the cache-loading code from
> find_short_object_filename(), but forgot the line that releases the path
> strbuf.
>
> Reported-by: René Scharfe <l.s.r@xxxxxx>
> Signed-off-by: Jeff King <peff@xxxxxxxx>
> ---
>  sha1-file.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/sha1-file.c b/sha1-file.c
> index 5894e48ea4..5a272f70de 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2169,6 +2169,7 @@ void odb_load_loose_cache(struct object_directory *odb, int subdir_nr)
>  				    NULL, NULL,
>  				    &odb->loose_objects_cache);
>  	odb->loose_objects_subdir_seen[subdir_nr] = 1;
> +	strbuf_release(&buf);
>  }
>
>  static int check_stream_sha1(git_zstream *stream,