Re: Is there some script to find un-delta-able objects?
- Date: Fri, 05 Oct 2018 09:47:47 -0700
- From: Junio C Hamano <gitster@xxxxxxxxx>
- Subject: Re: Is there some script to find un-delta-able objects?
Jeff King <peff@xxxxxxxx> writes:
> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I.e. something to generate the .gitattributes file using this format:
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.
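For the latter case, the two checks Peff mentions could be combined roughly as in the sketch below. It is only an illustration, not something that ships with Git: the helper names are made up, it assumes the usual "path: delta: unset" output of "git check-attr", and it assumes the size rule is the documented core.bigFileThreshold default of 512 MiB. Any other internal cutoffs are ignored.

```python
import subprocess

# Assumption: the documented default of core.bigFileThreshold (512 MiB).
DEFAULT_BIG_FILE_THRESHOLD = 512 * 1024 * 1024


def parse_check_attr(line):
    """Extract the value from one line of `git check-attr delta -- <path>`.

    The line looks like "<path>: delta: <value>", where <value> is
    typically "set", "unset" or "unspecified".
    """
    return line.rstrip("\n").rsplit(": ", 2)[-1]


def delta_skipped(attr_value, size, threshold=DEFAULT_BIG_FILE_THRESHOLD):
    """Guess whether Git would not even try to deltify this object."""
    return attr_value == "unset" or size > threshold


def check_path(path, rev="HEAD"):
    """Run both commands for one path (hypothetical helper, needs a repo)."""
    attr = subprocess.run(
        ["git", "check-attr", "delta", "--", path],
        capture_output=True, text=True, check=True,
    ).stdout
    size = subprocess.run(
        ["git", "cat-file", "-s", f"{rev}:{path}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return delta_skipped(parse_check_attr(attr), int(size))
```

Inside a work tree, check_path("some/file.gpg") (a hypothetical path) would return True if the path carries "-delta" or the blob at HEAD exceeds the assumed threshold.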
I am reasonably sure that the question is about solving the former
so that "-delta" attribute is set appropriately.
Initially, I thought that an undeltifiable object is likely to have
higher randomness than deltifiable ones and that can be exploited,
but if you have such a highly random blob A (and no other object
like it) in the repository and then later acquire another blob B
that happens to share most of the data with A, then A and B by
themselves will pass the "highly random" test, yet each can still
be expressed as a delta derived from the other. So your "what isn't
a delta yet" is a reasonable assessment of what mechanically can be
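To make the A/B scenario concrete, here is a small sketch. It does not use Git's delta code at all; as a stand-in it compresses B with zlib using A as a preset dictionary, which is my own choice for illustration. The blob sizes and the inserted bytes are arbitrary.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (max 8)."""
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def size_given(base, target):
    """Compressed size of target when base is available as a zlib dictionary.

    This is only a rough proxy for "target expressed as a delta against base".
    """
    comp = zlib.compressobj(zdict=base)
    return len(comp.compress(target) + comp.flush())


# Blob A: random bytes, so it looks "highly random" on its own.
a = os.urandom(16 * 1024)
# Blob B: shares almost all of its data with A, apart from a small insertion.
b = a[:8000] + b"a small edit" + a[8000:]

print("entropy A:", byte_entropy(a))
print("entropy B:", byte_entropy(b))
print("B compressed alone:", len(zlib.compress(b)))
print("B compressed against A:", size_given(a, b))
```

Both blobs score close to 8 bits per byte and barely compress on their own, yet B shrinks to a small fraction of its size once A is available, which is why a per-file randomness test cannot settle the question.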
Knowledge/heuristic like "No two '*.gpg' files are expected to be
alike" needs something more than the randomness of individual files,