Re: easy way to demonstrate length of colliding SHA-1 prefixes?

On 12/02/2018 05:23 AM, Ævar Arnfjörð Bjarmason wrote:

On Sun, Dec 02 2018, Robert P. J. Day wrote:

   as part of an upcoming git class i'm delivering, i thought it would
be amusing to demonstrate the maximum length of colliding SHA-1
prefixes in a repository (in my case, i use the linux kernel git repo
for most of my examples).

   is there a way to display the objects in the object database that
clash in the longest object name SHA-1 prefix; i mean, short of
manually listing all object names, running that through cut and sort
and uniq and ... you get the idea.

   is there a cute way to do that? thanks.

Here is a one-liner to do it. It is Perl line noise, so it's not very cute, though that is subjective. The output shown below is for the Git project (not Linux) repository as I've currently synced it:

$ git rev-list --objects HEAD | sort | perl -anE 'BEGIN { $prev = ""; $long = "" } $n = $F[0]; for my $i (reverse 1..40) {last if $i < length($long); if (substr($prev, 0, $i) eq substr($n, 0, $i)) {$long = substr($prev, 0, $i); last} } $prev = $n; END {say $long}'


$ git cat-file -t c68038ef

error: short SHA1 c68038ef is ambiguous
hint: The candidates are:
hint: c68038effe commit 2012-06-01 - vcs-svn: suppress a signed/unsigned comparison warning
hint:   c68038ef00 blob
fatal: Not a valid object name c68038ef
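
For anyone who finds the Perl hard to read, here is a rough sketch of the same idea as a shell function. The name longest_shared_prefix is made up for illustration. It assumes object names arrive one per line on stdin, optionally followed by a path as in the output of git rev-list --objects. Once the names are sorted, the two that share the longest prefix must sit next to each other, so comparing each name with its predecessor is enough.

```shell
# Hypothetical helper: print the longest prefix shared by two distinct
# object names read from stdin (first field of each line).
longest_shared_prefix() {
    awk '{ print $1 }' | LC_ALL=C sort -u | awk '
        NR > 1 {
            # Count the leading characters this name shares with the previous one.
            n = 0
            while (n < length($0) && substr($0, n + 1, 1) == substr(prev, n + 1, 1))
                n++
            # Remember the longest shared run seen so far.
            if (n > best) { best = n; prefix = substr($0, 1, n) }
        }
        { prev = $0 }
        END { print prefix }
    '
}
```

It would be used along the lines of: git rev-list --objects HEAD | longest_shared_prefix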

You'll always need to list them all. It's inherently an operation where
for each SHA-1 you need to search for other ones with that prefix up to
a given length.

Perhaps you've missed that you can use --abbrev=N for this, and just
grep for things that are longer than that N, e.g. for linux.git:

     git log --oneline --abbrev=10 --pretty=format:%h |
     grep -E -v '^.{10}$' |
     perl -pe 's/^(.{10}).*/$1/'

I think the goal was to search all object hashes, not just commits. And git rev-list --objects will do that.
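
If a fixed prefix length is good enough for the demonstration, the cut/sort/uniq approach mentioned in the original question is short too. The following is only a sketch. The function name ambiguous_prefixes is invented here, and it again assumes one object name per line on stdin, optionally followed by a path. It prints every N-character prefix that belongs to more than one distinct object name.

```shell
# Hypothetical helper: print each prefix of length $1 that is shared by
# at least two distinct object names read from stdin.
ambiguous_prefixes() {
    n=$1
    # Keep only the object name, drop duplicates, truncate to n characters,
    # then report the truncated values that occur more than once.
    awk '{ print $1 }' | LC_ALL=C sort -u | cut -c "1-$n" | LC_ALL=C uniq -d
}
```

For example: git rev-list --objects HEAD | ambiguous_prefixes 8

Note that git rev-list --objects HEAD only lists objects reachable from HEAD. To cover everything in the object database, including unreachable objects, the input could instead come from git cat-file --batch-all-objects --batch-check='%(objectname)', assuming a Git version recent enough to have --batch-all-objects.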