Re: Is there some script to find un-delta-able objects?

Jeff King <peff@xxxxxxxx> writes:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> I.e. something to generate the .gitattributes file using this format:
>> https://git-scm.com/docs/gitattributes#_packing_objects
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

I am reasonably sure that the question is about the former, i.e.
finding such objects so that the "-delta" attribute can be set
appropriately.

Initially, I thought that an undeltifiable object is likely to have
higher randomness than deltifiable ones and that can be exploited,
but if you have such a highly random blob A (and no other object
like it) in the repository and then later acquire another blob B
that happens to share most of the data with A, then A and B by
themselves will pass the "highly random" test, and yet each can
be expressed as a delta derived from the other.  So your "what isn't
a delta yet" is a reasonable assessment of what mechanically can be
done.
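
To illustrate the A/B case above, here is a rough Python sketch. It
is not how Git computes deltas: priming zlib with A as a preset
dictionary is only a crude stand-in for "B expressed as a delta
against A", and the helper names (byte_entropy, deflate_size) and the
blob sizes are made up for this example.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0..8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def deflate_size(data: bytes, base: bytes = b"") -> int:
    """Deflated size of data, optionally primed with base as a preset
    dictionary (a crude approximation of a delta against base)."""
    if base:
        comp = zlib.compressobj(level=9, zdict=base)
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


# Blob A: 16 KiB of pseudo-random bytes (seeded so runs are repeatable).
rng = random.Random(0)
blob_a = bytes(rng.getrandbits(8) for _ in range(16 * 1024))

# Blob B: identical to A except for 16 bytes in the middle.
patch = bytes(rng.getrandbits(8) for _ in range(16))
blob_b = blob_a[:8192] + patch + blob_a[8208:]

# Looked at individually, both blobs appear highly random ...
print("entropy A:", byte_entropy(blob_a))
print("entropy B:", byte_entropy(blob_b))
print("B deflated alone:", deflate_size(blob_b))

# ... yet B is cheap to express once A is available as a base.
print("B deflated against A:", deflate_size(blob_b, blob_a))
```

Both blobs score close to 8 bits per byte and do not shrink on their
own, but B becomes tiny once A is known, which is why a per-object
randomness test cannot decide "-delta" by itself.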

Knowledge/heuristic like "No two '*.gpg' files are expected to be
alike" needs something more than the randomness of individual files,
I guess.
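
For the "what isn't a delta yet" side, something like the following
Python sketch could at least tally, per file extension, how many
blobs are currently stored as deltas after a repack. It relies on
"git rev-list --objects --all" for object paths and on the
%(deltabase) placeholder of "git cat-file --batch-check", which
expands to the all-zero object name for non-delta objects. The
function names and the grouping by extension are assumptions made for
this example, and paths are assumed to decode as text. A pattern
whose delta count stays at zero is only a hint, for the reason given
above.

```python
import os
import subprocess
from collections import defaultdict


def tally_by_extension(rev_list_lines, batch_check_lines):
    """Return {extension: [total_blobs, blobs_stored_as_delta]}.

    rev_list_lines: "<oid> <path>" lines, as printed by
        git rev-list --objects --all
    batch_check_lines: "<oid> <type> <deltabase>" lines, as printed by
        git cat-file --batch-check='%(objectname) %(objecttype) %(deltabase)'
    """
    paths = {}
    for line in rev_list_lines:
        oid, _, path = line.partition(" ")
        if path:
            paths[oid] = path

    stats = defaultdict(lambda: [0, 0])
    for line in batch_check_lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # e.g. "<oid> missing"
        oid, objtype, base = parts
        if objtype != "blob" or oid not in paths:
            continue
        ext = os.path.splitext(paths[oid])[1] or "(none)"
        stats[ext][0] += 1
        if set(base) != {"0"}:  # an all-zero base means "not a delta"
            stats[ext][1] += 1
    return dict(stats)


def scan_repo(repo="."):
    """Run the two git commands in repo and tally the result."""
    objs = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    oids = "".join(line.split(" ", 1)[0] + "\n" for line in objs)
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file",
         "--batch-check=%(objectname) %(objecttype) %(deltabase)"],
        input=oids, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return tally_by_extension(objs, checked)
```

Calling scan_repo() in a freshly repacked repository and looking for
extensions with many blobs but no deltas would give candidates to
review by hand before writing "-delta" into .gitattributes.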