
Re: [PATCH 1/1] protocol: limit max protocol version per service

On Wed, Oct 3, 2018 at 2:34 PM Josh Steadmon <steadmon@xxxxxxxxxx> wrote:

> I believe that git-upload-archive can still speak version 1 without any
> trouble, but it at least doesn't break anything in the test suite to
> limit this to v0 either.

OK, let me just play around with archive to follow along:

Running the excerpt that I found in one of the tests in t5000

    GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar

(Whoa, that spews a lot of text into my terminal, which makes sense:
transported tar files, unlike packets starting with PACK, are not cut
short in the trace. So onwards:)

    $ git init test && cd test
    $ GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar
    git< \2fatal: no such ref: HEAD
    fatal: sent error to the client: git upload-archive: archiver died with error
    remote: git upload-archive: archiver died with error

This sounds similar to a bug that I have on my todo list for
clone recursing into submodules. Maybe we need to talk
about HEAD-less repositories and how to handle them
in general. Usually it doesn't happen except in corner cases like
this one, so:

    $ git commit --allow-empty -m "commit"
    [master (root-commit) 24d7967] commit
    $ GIT_TRACE_PACKET=1 git archive --remote=. HEAD >b5.tar
15:28:00.762615 pkt-line.c:80           packet:          git> argument HEAD
15:28:00.762704 pkt-line.c:80           packet:          git> 0000
15:28:00.766336 pkt-line.c:80           packet:          git> ACK
15:28:00.766428 pkt-line.c:80           packet:          git> 0000
15:28:00.766483 pkt-line.c:80           packet:          git< ACK
15:28:00.766508 pkt-line.c:80           packet:          git< 0000
15:28:00.767694 pkt-line.c:80           packet:          git< \2
15:28:00.767524 pkt-line.c:80           packet:          git< argument HEAD
15:28:00.767583 pkt-line.c:80           packet:          git< 0000
remote: 15:28:00.767524 pkt-line.c:80           packet:          git< argument HEAD
remote: 15:28:00.767583 pkt-line.c:80           packet:          git< 0000
15:28:00.768758 pkt-line.c:80           packet:          git> 0000
15:28:00.770475 pkt-line.c:80           packet:          git<
    ... \0\0\0\0\0\0\0\0\0\0\0\0\ ...
 # not too bad but the tar file contains a lot of zeros

Ah, I forgot the crucial part (setting protocol.version), so:

    $ GIT_TRACE_PACKET=1 git -c protocol.version=1 archive --remote=. HEAD >b5.tar
15:33:10.030508 pkt-line.c:80           packet:          git> argument HEAD

This pretty much looks like v0, as v1 would send a "version 1" packet, cf.

    $ GIT_TRACE_PACKET=1 git -c protocol.version=1 fetch .
15:34:26.111013 pkt-line.c:80           packet:  upload-pack> version 1
15:34:26.111140 pkt-line.c:80           packet:        fetch< version 1

> Is there a method or design for advertising multiple acceptable versions
> from the client?

I think the client can send multiple versions, looking through protocol.c
(and not the documentation, as I should for this):

    /*
     * Determine which protocol version the client has requested.  Since
     * multiple 'version' keys can be sent by the client, indicating that
     * the client is okay to speak any of them, select the greatest version
     * that the client has requested.  This is due to the assumption that
     * the most recent protocol version will be the most state-of-the-art.
     */
    const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
    string_list_split(&list, git_protocol, ':', -1);
    for_each_string_list_item(item, &list) {
        if (skip_prefix(item->string, "version=", &value))
            ...

in determine_protocol_version_server(), which runs once the client has
already spoken to it, so I think at least the server can deal with
multiple versions.
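
To make the "select the greatest version" rule concrete, here is a
minimal standalone sketch. It is not the code in protocol.c: the name
max_requested_version() is made up, it uses plain libc instead of git's
string_list and skip_prefix() helpers, and ignoring malformed values and
versions above 2 is my assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's API): scan a GIT_PROTOCOL-style value
 * such as "version=1:version=2" and return the greatest version found.
 * Returns 0 when the value is NULL or has no usable "version=" key.
 */
static int max_requested_version(const char *git_protocol)
{
    static const char prefix[] = "version=";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *p = git_protocol;
    int best = 0;

    while (p && *p) {
        /* Find the end of the current colon-separated key. */
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len > prefix_len && !strncmp(p, prefix, prefix_len)) {
            char *tail;
            long v = strtol(p + prefix_len, &tail, 10);

            /*
             * Accept the key only if its whole value parsed as a
             * number. Assumption: versions above 2 are unknown and
             * therefore ignored.
             */
            if (tail == p + len && v > best && v <= 2)
                best = (int)v;
        }
        p = end ? end + 1 : NULL;
    }
    return best;
}
```

With this, an input of "version=1:version=2" yields 2 regardless of the
order of the keys, and keys other than "version=" are skipped.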

But given that we transport the version in environment variables, we'd
need to be extra careful: maybe we just did not see the version
in the --remote=. run above?

> From my understanding, we can only add a single
> version=X field in the advertisement, but IIUC we can extend this fairly
> easily? Perhaps we can have "version=X" to mean the preferred version,
> and then a repeatable "acceptable_version=Y" field or similar?

Just re-use "version=X", repeated and separated by colons as above.
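
A rough sketch of what the client side could produce under that scheme,
listing every version from the maximum it accepts down to 0. The name
format_version_advertisement() and the descending order are assumptions
of mine, not something the patch or protocol.c defines.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "version=N:...:version=0" into buf, one
 * key per version from max_version down to 0. Returns 0 on success,
 * -1 if max_version is negative or buf is too small.
 */
static int format_version_advertisement(char *buf, size_t size,
                                        int max_version)
{
    size_t used = 0;
    int v;

    if (max_version < 0 || !size)
        return -1;
    buf[0] = '\0';

    for (v = max_version; v >= 0; v--) {
        /* Prepend the ':' separator to every key except the first. */
        int n = snprintf(buf + used, size - used, "%sversion=%d",
                         used ? ":" : "", v);

        /* snprintf() reports truncation via n >= remaining space. */
        if (n < 0 || (size_t)n >= size - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}
```

For max_version 2 this fills the buffer with
"version=2:version=1:version=0", which a server that picks the greatest
requested version can consume without any new key name.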

> > From a maintenance perspective, do we want to keep
> > this part of the code central, as it ties protocol (as proxied
> > by service name) to the max version number?
> > I would think that we'd rather have the decision local to the
> > code, i.e. builtin/fetch would need to tell protocol.c that it
> > can do (0,1,2) and builtin/push can do (0,1), and then the
> > networking layers of code would figure out by the input
> > from the caller and the input from the user (configured
> > protocol.version) what is the best to go forward from
> > then on.
> I like having it centralized, because enforcing this in git_connect()
> and discover_refs() catches all the outgoing version advertisements, but
> there's lots of code paths that lead to those two functions that would
> all have to have the acceptable version numbers plumbed through.

Makes sense.

> I suppose we could also have a registry of services to version numbers,
> but I tend to dislike non-local sources of data. But if the list likes
> that approach better, I'll be happy to implement it.

> > But I guess having the central place here is not too
> > bad either. How will it cope with the desire of protocol v2
> > to have only one end point (cf. serve.{c,h} via builtin/serve
> > as "git serve")?
> I'm not sure about this. In my series to add a v2 archive command, I
> added support for a new endpoint for proto v2 and I don't recall seeing
> any complaints, but that is still open for review.

Ah I guess new end points would imply that you can speak at least
a given version N.

> I suppose if we are strict about serving from a single endpoint, the
> version registry makes more sense, and individual operations can declare
> acceptable version numbers before calling any network code?

Ah yeah, that makes sense!
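
Something like the following is what I picture for such a registry. This
is only a sketch under assumptions: the table contents mirror the
(0,1,2) / (0,1) / v0 limits mentioned in this thread, mapping fetch and
push to the git-upload-pack and git-receive-pack service names is my
guess, and none of the names below exist in git today.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: a service and the highest version it speaks. */
struct service_limit {
    const char *service;
    int max_version;
};

/* Illustrative values only, taken from the discussion in this thread. */
static const struct service_limit service_limits[] = {
    { "git-upload-pack", 2 },
    { "git-receive-pack", 1 },
    { "git-upload-archive", 0 },
};

/* Look up a service; unknown services conservatively get version 0. */
static int max_version_for_service(const char *service)
{
    size_t i;

    for (i = 0; i < sizeof(service_limits) / sizeof(service_limits[0]); i++)
        if (!strcmp(service, service_limits[i].service))
            return service_limits[i].max_version;
    return 0;
}

/* Clamp the user's configured protocol.version to the service's limit. */
static int effective_version(const char *service, int configured)
{
    int max = max_version_for_service(service);

    if (configured < 0)
        return 0;
    return configured < max ? configured : max;
}
```

An operation would then ask for effective_version() with its service
name before calling any network code, so a user with protocol.version=2
would still get v0 for git-upload-archive.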

Thanks for your explanations and prodding,