Re: Is there some script to find un-delta-able objects?

On Fri, Oct 05, 2018 at 06:44:25PM +0200, Ævar Arnfjörð Bjarmason wrote:

> Some version of the former. Ones where we haven't found any (or much of)
> useful deltas yet. E.g. say I had a repository with a lot of files
> generated by this command at various points in the history:
>     dd if=/dev/urandom of=file.binary count=1024 bs=1024
> Some script similar to git-sizer which could report that the
> packed+compressed+delta'd version of the 10 *.binary files I had in my
> history had a 1:1 ratio of how large they were in .git, vs. how large
> the sum of each file retrieved by "git show" was (i.e. uncompressed,
> un-delta'd).

You can get the uncompressed and on-disk sizes with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectname) %(objectsize) %(objectsize:disk)'

and then compare the sizes/ratios however you like. If you want just a
subset of the blobs, drop the "--batch-all-objects" and feed the object
names (or even "HEAD:filename") on stdin.
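
As a rough illustration of the subset case, something like the Python
sketch below could feed a few names to cat-file on stdin and parse what
comes back. The helper names (parse_sizes, sizes_for) are made up for
this example, and it assumes git is on the PATH and that lookups which
fail are reported as "<name> missing".

```python
import subprocess

# Same format string as the command above.
FORMAT = "%(objectname) %(objectsize) %(objectsize:disk)"


def parse_sizes(output):
    """Turn batch-check output into (name, size, disk_size) tuples.

    Lines for objects that could not be resolved end in "missing"
    and are skipped.
    """
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[-1] == "missing":
            continue
        rows.append((fields[0], int(fields[1]), int(fields[2])))
    return rows


def sizes_for(names, repo="."):
    """Hypothetical helper: ask git about just the given names.

    Each name can be an object id or something like "HEAD:filename".
    """
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check=" + FORMAT],
        input="\n".join(names) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_sizes(result.stdout)
```

Calling sizes_for(["HEAD:file.binary"]) inside a repository should then
give one tuple whose last two numbers can be compared directly.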

> That doesn't mean that tomorrow I won't commit 10 new objects which
> would have a really good delta ratio to those 10 existing files,
> bringing the ratio to ~1:2, but if I had some report like:
>     <ratio> <extension>
> For a given repo that could be fed into .gitattributes to say we
> shouldn't bother to delta files of certain extensions.

I don't know of a tool that does that, but I think a modest application
of perl to the cat-file output would produce it.
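
For what it's worth, here is an untested-against-real-repos sketch of
the same idea in Python rather than perl. cat-file alone does not know
paths, so this assumes the names come from "git rev-list --objects
--all" and are carried through with the %(rest) placeholder. The
function names are invented for the example, and the ratio it prints is
on-disk size divided by uncompressed size, so values near 1.00 mean
compression and deltas bought almost nothing.

```python
import os
import subprocess
from collections import defaultdict

# %(rest) echoes whatever followed the object id on the input line,
# which for rev-list --objects output is the path (empty for commits).
FORMAT = "%(objecttype) %(objectsize) %(objectsize:disk) %(rest)"


def ratio_by_extension(lines):
    """Map file extension -> total on-disk size / total uncompressed size."""
    totals = defaultdict(lambda: [0, 0])  # ext -> [size, disk]
    for line in lines:
        fields = line.rstrip("\n").split(" ", 3)
        if len(fields) != 4 or fields[0] != "blob":
            continue  # skip commits, trees, tags and "missing" lines
        size, disk, path = int(fields[1]), int(fields[2]), fields[3]
        ext = os.path.splitext(path)[1] or "(none)"
        totals[ext][0] += size
        totals[ext][1] += disk
    return {
        ext: (disk / size if size else 0.0)
        for ext, (size, disk) in totals.items()
    }


def report(repo="."):
    """Hypothetical driver: print "<ratio> <extension>", worst first."""
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    )
    batch = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch-check=" + FORMAT],
        input=rev_list.stdout,
        capture_output=True, text=True, check=True,
    )
    ratios = ratio_by_extension(batch.stdout.splitlines())
    for ext, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ratio:.2f} {ext}")
```

Two caveats: rev-list reports only one path per object, and the on-disk
size of a delta base is charged entirely to that base, so per-object
numbers can be lopsided even when the per-extension totals look sane.
Extensions that stay close to 1.00 would be the candidates for a
"-delta" entry in .gitattributes.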