
Re: Is there some script to find un-delta-able objects?




On Fri, Oct 05 2018, Jeff King wrote:

> On Fri, Oct 05, 2018 at 04:20:27PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I.e. something to generate the .gitattributes file using this format:
>>
>> https://git-scm.com/docs/gitattributes#_packing_objects
>>
>> Some stuff is obvious, like "*.gpg binary -delta", but I'm wondering if
>> there's some repo scanner utility to spew this out for a given repo.
>
> I'm not sure what you mean by "un-delta-able" objects. Do you mean ones
> where we're not likely to find a delta? Or ones where Git will not try
> to look for a delta?
>
> If the latter, I think the only rules are the "-delta" attribute and the
> object size. You should be able to use git-check-attr and "git-cat-file"
> to get that info.
>
> If the former, I don't know how you would know. We can only report on
> what isn't a delta _yet_.

Some version of the former: objects for which we haven't found any (or
many) useful deltas yet. E.g. say I had a repository with a lot of files
generated by this command at various points in the history:

    dd if=/dev/urandom of=file.binary count=1024 bs=1024

I'm after some script similar to git-sizer which could report that the
packed+compressed+delta'd version of the 10 *.binary files I had in my
history had a 1:1 ratio of how large they were in .git vs. how large
the sum of each file retrieved by "git show" was (i.e. uncompressed,
un-delta'd).
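
A rough sketch of how such a per-extension report might be computed,
not an existing tool. It assumes the object-to-path mapping comes from
"git rev-list --objects --all" and the sizes from "git cat-file
--batch-all-objects --batch-check" with the %(objectsize) and
%(objectsize:disk) atoms. The function names are made up for
illustration, and on-disk size is only a meaningful signal when summed
over many objects, since which object of a delta pair ends up as the
base is somewhat arbitrary.

```python
import os
import subprocess
from collections import defaultdict

# Format for `git cat-file --batch-check`: id, type, raw size, on-disk size.
BATCH_FORMAT = "%(objectname) %(objecttype) %(objectsize) %(objectsize:disk)"


def ratio_by_extension(rev_list_text, batch_check_text):
    """Return {extension: (disk_bytes, raw_bytes)} summed over all blobs.

    rev_list_text:    output of `git rev-list --objects --all`
    batch_check_text: output of `git cat-file --batch-check` using BATCH_FORMAT
    """
    # Lines look like "<oid> <path>"; commits have no path and are skipped.
    paths = {}
    for line in rev_list_text.splitlines():
        oid, _, path = line.partition(" ")
        if path:
            paths.setdefault(oid, path)

    totals = defaultdict(lambda: [0, 0])
    for line in batch_check_text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue
        oid, otype, raw_size, disk_size = parts
        # Trees also carry paths in rev-list output, so filter on type.
        if otype != "blob" or oid not in paths:
            continue
        ext = os.path.splitext(paths[oid])[1] or "(none)"
        totals[ext][0] += int(disk_size)
        totals[ext][1] += int(raw_size)
    return {ext: (disk, raw) for ext, (disk, raw) in totals.items()}


def report_for_repo(repo="."):
    """Hypothetical driver: run git in `repo` and print '<ratio> <extension>'."""
    def git(*args):
        cmd = ["git", "-C", repo] + list(args)
        return subprocess.run(cmd, capture_output=True, text=True,
                              check=True).stdout

    rev_list = git("rev-list", "--objects", "--all")
    sizes = git("cat-file", "--batch-all-objects",
                "--batch-check=" + BATCH_FORMAT)
    for ext, (disk, raw) in sorted(ratio_by_extension(rev_list, sizes).items()):
        if raw:
            print("%.2f %s" % (disk / raw, ext))
```

A ratio close to 1.00 for an extension would suggest that neither zlib
nor delta compression is buying much for those files so far.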

That doesn't mean that tomorrow I won't commit 10 new objects which
would have a really good delta ratio to those 10 existing files,
bringing the ratio to ~1:2, but if I had some report like:

    <ratio> <extension>

for a given repo, that could be fed into .gitattributes to say we
shouldn't bother to delta files with certain extensions.
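
To make the idea concrete, here is a minimal sketch of the last step,
turning such a report into .gitattributes lines. It assumes the ratio is
a decimal of packed size over raw size (so ~1.00 means no gain), and the
0.9 cut-off and the function name are arbitrary placeholders rather than
anything Git defines.

```python
def gitattributes_lines(report_text, threshold=0.9):
    """Turn '<ratio> <extension>' rows into '*.ext -delta' lines.

    Assumption: ratio = packed size / raw size, so values near 1.0 mean
    that packing has not helped for that extension.
    """
    lines = []
    for row in report_text.splitlines():
        fields = row.split()
        if len(fields) != 2:
            continue  # skip blank or malformed rows
        try:
            ratio = float(fields[0])
        except ValueError:
            continue  # first column is not a number
        ext = fields[1].lstrip(".")
        if ext and ratio >= threshold:
            lines.append("*.%s -delta" % ext)
    return lines


# Example: only extensions at or above the threshold are emitted.
for line in gitattributes_lines("1.00 .binary\n0.25 .c\n"):
    print(line)
```

Since tomorrow's commits could still delta well against today's
objects, the output is best treated as a suggestion to review by hand
before committing it.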