Web lists-archives.com

Re: [PATCH v2 3/5] gc --auto: exclude base pack if not enough mem to "repack -ad"

Duy Nguyen <pclouds@xxxxxxxxx> writes:

>>> +Set environment variable `GIT_TRACE` in order to see the memory usage
>>> +estimation in `git gc --auto` that determines whether the base pack is
>>> +kept.
>> This is somewhat a puzzling use of trace.  It sounds more like a way
>> to find out "how" memory usage estimation is done and arriving at a
>> wrong value for those who want to debug the feature.
> Yeah. I'm not sure if this estimation could be really problematic that
> people need to debug this way. A cleaner way (if we think people will
> need this often) is just add a new option in "git gc" to report this
> estimation breakdown and do nothing else.

Actually after reading the implementation and seeing what it does, I
personally do not have any problem with the way GIT_TRACE is used
for this purpose in this patch.  I am not sure how interesting the
information given by that codepath in real life; it looks even less
intereseting than say what comes out of "verify-pack --stat".

>> This is finding the largest pack.
> The discussion on .keep files raises one question for me, what if the
> largest pack already has a .keep file, do we still consider it the
> base pack, or should we find the next largest non-kept pack?
> I'm guessing we keep things simple here and ignore .keep files.

I agree that it is a sensible design decision.

>>> +#elif defined(GIT_WINDOWS_NATIVE)
>>> +     MEMORYSTATUSEX memInfo;
>>> +
>>> +     memInfo.dwLength = sizeof(MEMORYSTATUSEX);
>>> +     if (GlobalMemoryStatusEx(&memInfo))
>>> +             return memInfo;ullTotalPhys;
>> Is this legal C in Microsoft land?
> That's the problem with writing code without a way to test it. At
> least travis helped catch a compiler bug on mac.
> I'm torn between just #error here and let Windows/Mac guys implement
> it (which they may scream "too much work, I don't wanna") but if I
> help write something first, yes things are potentially broken and need
> verification from those guys. I guess I'll just fix this up and hope
> non-linux guys do the rest.

Yup, we all collaborate and help in ways each of us can.  None of us
can be expected to do any more than that ;-)

>> I wonder if the above should go somewhere under compat/ without
>> ifdef but split into separate files for individual archs (I do not
>> know the answer to this question).
> My first choice too. I chose this way after seeing online_cpus()
> thread-utils.c. Not sure which way is best either.


>> But to those who say "packs larger than this value is too big" via
>> configuration, keeping only the largest of these above-threshold
>> packs would look counter-intuitive, wouldn't it, I wonder?
> I think I'll just clarify this in the document. There may be a use
> case for keeping multiple large packs, but I don't see it (*). We can
> deal with it when it comes.

When the project's history grows too much, a large pack that holds
its first 10 years of stuff, together with another one that holds
its second 20 years of stuff, may both be larger than the threshold
and want to be kept.  If we pick only the largest one, we would
explode the other one and repack together with loose objects.

But realistically, those who would want to control the way in which
their repository is packed to such a degree are very likely to add
".keep" files to these two packfiles themselves, so the above would
probably not a concern.  Perhaps we shouldn't do the "automatically
pick the largest one and exclude from repacking" when there is a
packfile that is marked with ".keep"?

> The use of super large gc.bigBasePackThreshold to disable this keeping
> base pack is intended. But I can't go too high here it may break
> limits on 32 bit platforms. And 2g sounds really arbitrary.

You could use 42m instead to clarify that it really is an arbitrary
threshold that was chosen only for the purpose of this test perhaps?