Re: What's so special about objects/17/ ?




On Sun, Oct 07 2018, Johannes Sixt wrote:

> Am 07.10.18 um 20:28 schrieb Ævar Arnfjörð Bjarmason:
>> In 2007 Junio wrote
>> (https://public-inbox.org/git/7vr6lcj2zi.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxx/):
>>
>>      +static int need_to_gc(void)
>>      +{
>>      +	/*
>>      +	 * Quickly check if a "gc" is needed, by estimating how
>>      +	 * many loose objects there are.  Because SHA-1 is evenly
>>      +	 * distributed, we can check only one and get a reasonable
>>      +	 * estimate.
>>      +	 */
>
>> 1. We still have this check of objects/17/ in builtin/gc.c today. Why
>>     objects/17/ and not e.g. objects/00/ to go with other 000* magic such
>>     as the 0000000000000000000000000000000000000000 SHA-1?  Statistically
>>     it doesn't matter, but 17 seems like an odd thing to pick at random
>>     out of 00..ff, does it have any significance?
>
> The reason is explained in the comment. And, BTW, you do know about
> this one: https://xkcd.com/221/ don't you? (TLDR: the title is "Random
> Number")

Picking any one number is explained in the comment. I'm asking why 17 in
particular, not for correctness reasons but as a bit of historical lore,
and because my ulterior motive is to improve the GC docs.

The number in that comic is 4 (and there is no datestamp showing when it
was published). Are you saying Junio's patch is somehow a reference to
that xkcd in particular, or that it's just a funny reference in this
context?

>> 2. It seems overly paranoid to be checking that the files in
>>    .git/objects/17/ look like a SHA-1. If we have stuff not generated by
>>    git in .git/objects/??/ we probably have bigger problems than
>>    prematurely triggering auto gc, can this just be removed as
>>    redundant. Was this some check e.g. expecting that this would need to
>>    deal with tempfiles in these directories that we created at the time
>>    (but no longer do?)?
>
> It's not about that there are SHA-1s in there, it's about how many
> there are.

Right, I'm wondering if it couldn't be replaced by some general path.c
"number_of_files_in_dir" helper. I.e. why is this code being paranoid
about ignoring the likes of
.git/objects/17/{foo,bar,some-other-garbage}? A number_of_files_in_dir()
would obviously need to ignore "." and "..".
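
To make the comparison concrete, here is a rough sketch of the two
approaches side by side. It is not the code in builtin/gc.c:
number_of_files_in_dir() is the hypothetical helper named above, and
looks_like_loose_object(), number_of_loose_objects_in_dir() and
estimate_loose_objects() are names made up for illustration. It assumes
loose objects are named by the remaining 38 lowercase hex digits of the
SHA-1 under a two-digit fan-out directory, and that multiplying one
directory's count by 256 is a good enough estimate.

```c
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical generic helper: count every entry in a directory
 * except "." and "..". Returns -1 if the directory cannot be opened.
 */
int number_of_files_in_dir(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int n = 0;

	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (!strcmp(ent->d_name, ".") || !strcmp(ent->d_name, ".."))
			continue;
		n++;
	}
	closedir(dir);
	return n;
}

/*
 * The "paranoid" filter: 1 if name is exactly 38 lowercase hex
 * digits (a SHA-1 minus its two-digit directory prefix), else 0.
 */
int looks_like_loose_object(const char *name)
{
	return strlen(name) == 38 &&
	       strspn(name, "0123456789abcdef") == 38;
}

/* Count only the entries that pass the filter above. */
int number_of_loose_objects_in_dir(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int n = 0;

	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL)
		if (looks_like_loose_object(ent->d_name))
			n++;
	closedir(dir);
	return n;
}

/*
 * Scale the count from one of the 256 fan-out directories up to an
 * estimate for the whole object store; passes an error (-1) through.
 */
long estimate_loose_objects(int count_in_one_dir)
{
	return count_in_one_dir < 0 ? -1 : 256L * count_in_one_dir;
}
```

The two counters only disagree when something other than a loose object
sits in the directory, e.g. with foo and bar next to 27 real objects in
.git/objects/17/ the generic helper would give 29 and the filtered one
27, so the estimates would be 7424 versus 6912.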