Re: Design of multiple hash support

On Sun, Nov 4, 2018 at 6:36 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
> "brian m. carlson" <sandals@xxxxxxxxxxxxxxxxxxxx> writes:
> > I'm currently working on getting Git to support multiple hash algorithms
> > in the same binary (SHA-1 and SHA-256).  In order to have a fully
> > functional binary, we'll need to have some way of indicating to certain
> > commands (such as init and show-index) that they should assume a certain
> > hash algorithm.
> >
> > There are basically two approaches I can take.  The first is to provide
> > each command that needs to learn about this with its own --hash
> > argument.  So we'd have:
> >
> >   git init --hash=sha256
> >   git show-index --hash=sha256 <some-file
> >
> > The other alternative is that we provide a global option to git, which
> > is parsed by all programs, like so:
> >
> >   git --hash=sha256 init
> >   git --hash=sha256 show-index <some-file
> I am assuming that "show-index" above is a typo for something like
> "hash-object"?

Actually, both seem plausible, as neither requires
RUN_SETUP, which means they cannot rely on the
extensions.objectFormat setting.

With a global option, would it override the object format
extension configured in a repository, or would we error out?

So maybe

  git -c extensions.objectFormat=sha256 init

is the way to go, for now? (Are repository format extensions parsed
just like normal config, such that non-RUN_SETUP commands
can rely on the (non-)existence to determine whether to use
the default or the given hash function?)
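
The override-or-error question above can be made concrete with a small sketch. This is Python, not Git code: `resolve_hash_algo`, its two arguments, and the "error out on mismatch" policy are assumptions made purely for illustration. `cli_value` stands for whatever a global option such as the proposed --hash would carry, and `repo_value` for a value read from extensions.objectFormat, or None when no repository configuration was read.

```python
from typing import Optional

SUPPORTED = ("sha1", "sha256")
DEFAULT_ALGO = "sha1"  # assumption: no setting anywhere means SHA-1


def resolve_hash_algo(cli_value: Optional[str],
                      repo_value: Optional[str]) -> str:
    """Pick the hash algorithm for one command invocation (hypothetical)."""
    for value in (cli_value, repo_value):
        if value is not None and value not in SUPPORTED:
            raise ValueError(f"unknown hash algorithm: {value}")

    if repo_value is None:
        # No repository setting was read, e.g. the command ran without setup.
        return cli_value or DEFAULT_ALGO

    if cli_value is not None and cli_value != repo_value:
        # Policy chosen for this sketch: refuse rather than silently override.
        raise ValueError(
            f"requested {cli_value}, but the repository uses {repo_value}"
        )
    return repo_value
```

Under these assumptions, `resolve_hash_algo("sha256", None)` models running init outside any repository, while `resolve_hash_algo("sha256", "sha1")` models the conflicting case and raises. Swapping the `raise` for `return cli_value` would give the "override" policy instead.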

> It is hard to answer the question without knowing what exactly does
> "(to) support multiple hash algorithms" mean.  For example, inside
> today's repository, what should this command do?
>         git --hash=sha256 cat-file commit HEAD

There is a section "Object names on the command line"
in Documentation/technical/hash-function-transition.txt,
and I assume that this is before the "dark launch"
phase, so I would expect the latter to work (no error,
but conversion/translation on the fly) eventually as a goal.
But the former might be in the scope of one series.

> It can work this way:
>  - read HEAD, discover that I am on 'master' branch, read refs/heads/master
>    to learn the object name in 40-hex, realize that it cannot be
>    sha256 and report "corrupt ref".
> Or it can work this way:
>  - read repository format, realize it is a good old sha1 repository.
>  - do the usual thing to get to read_object() to read the commit
>    object data for the commit at HEAD, doing all of it in sha1.
>  - in the commit object data, locate references to other objects
>    that use sha1 name.
>  - replace these sha1 references with their sha256 counterparts and
>    show the result.
> I am guessing that you are doing the former as a good first step, in
> which case, as an option that changes/affects the behaviour of git
> globally, I think "git --hash=sha256" would make sense, like other
> global options like --literal-pathspecs and --no-replace-objects.
> Thanks.
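
The two behaviours described in the quoted message can be sketched roughly as follows. This is illustrative Python, not Git's implementation: `check_ref_value`, `translate_commit`, and the `sha1_to_sha256` mapping are hypothetical names, and the sketch assumes a lookup table from SHA-1 names to their SHA-256 counterparts already exists. Only `tree` and `parent` header lines are rewritten here; a real translation would have to cover every place an object name can appear.

```python
HEX_LEN = {"sha1": 40, "sha256": 64}


def check_ref_value(hex_name: str, algo: str) -> str:
    """Strict behaviour: reject a ref whose length does not fit `algo`."""
    if len(hex_name) != HEX_LEN[algo]:
        raise ValueError(
            f"corrupt ref: expected {HEX_LEN[algo]} hex digits for {algo}, "
            f"got {len(hex_name)}"
        )
    return hex_name


def translate_commit(data: str, sha1_to_sha256: dict) -> str:
    """Translating behaviour: rewrite object names in the commit header."""
    header, sep, message = data.partition("\n\n")
    out = []
    for line in header.split("\n"):
        key, _, value = line.partition(" ")
        if key in ("tree", "parent"):
            # A KeyError here means no SHA-256 counterpart is known.
            line = f"{key} {sha1_to_sha256[value]}"
        out.append(line)
    return "\n".join(out) + sep + message
```

With the strict function, a 40-hex value checked against "sha256" fails with a "corrupt ref" error, which corresponds to the first behaviour. The second function leaves the stored object untouched and changes only what is shown, which corresponds to the on-the-fly conversion.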