Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
- Date: Tue, 12 Jun 2018 18:54:17 +0000
- From: Kirill Smelkov <kirr@xxxxxxxxxx>
- Subject: Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all
On Tue, Jun 12, 2018 at 05:48:49AM -0400, Jeff King wrote:
> On Mon, Jun 11, 2018 at 09:43:02AM +0000, Kirill Smelkov wrote:
> > > Looking deeper, we do not need these trees and blobs at all. The problem
> > > is really just a tag that peels to an object that is not otherwise a ref
> > > tip, regardless of its type.
> > Thanks for feedback and for coming up with the fix. Sure, I'm ok with
> > moving the test into your patch. However, even if a test becomes
> > different - narrowing down root of _current_ problem, I suggest to also
> > keep explicitly testing tag-to-blob and tag-to-tree (and if we really
> > also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip
> > those now, they can potentially break in the future.
> Yeah, I have no problem testing these cases separately. There's no bug
> with them now, but it is a slightly uncommon case. My suggestion would
> be to submit a patch that goes on top of mine that covers these cases.
Ok, I will try to do it.
> > I would also suggest to fix upload-pack, as it is just not consistent to
> > reject sending objects that were advertised, and so can strike again
> > some way in the future. After all git.git's fetch-pack is not the only
> > git client that should be possible to interact with git.git's
> > upload-pack on remote side, right?
> No, it's not the only client. At the same time, I am on the fence over
> whether upload-pack's behavior is wrong or not. It depends what you take
> a peeled advertisement line to mean. Does it mean: this object has been
> advertised and clients should be able to fetch it? Or does it mean: by
> the way, you may be interested to know the peeled value of this tag in
> case you want to do tag-following?
> So far I think it has only meant the latter. I could see an argument for
> the former, but any client depending on that would never have worked,
> AFAICT. We could _make_ it work, but how would a client know which
> server version it's talking to (and therefore whether it is safe to make
> the request?). I think you'd have to add a capability to negotiate.
I see. I don't know the details of the exchange, just it was surprising
for outside observer that fetching what was advertised is rejected. For
the reference there is no strong need for me for this to work anymore
(please see below).
> > I'm not sure, but I would say that `fetch-pack --all` from an empty
> > repository should not fail and should just give empty output as fetch
> > does.
> Yeah, that seems reasonable to me. The die() that catches this dates
> back to 2005-era, and we later taught the "fetch" porcelain to handle
> this. I don't _think_ anybody would be upset that the plumbing learned
> to treat this as a noop. It's probably a one-liner change in
> fetch_pack() to return early instead of dying.
Ok, I will try to send related testcase, and it is indeed easy to find
- the fix itself.
> > For the reference all the cases presented here are real - they appear in
> > our repositories on lab.nexedi.com for which I maintain the backup, and
> > I've noticed them in the process of switching git-backup from using
> > fetch to fetch-pack here:
> > https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436
> I applaud you using the porcelain for your scripts, but I suspect that
> fetch-pack by itself is not at all well-used or well-tested these days
> (certainly this --all bug has been around for almost 6 years and is not
> very hard to trigger in practice).
I see; thanks for the warning.
> If an extra connection isn't a problem, you might be better off with
> "git ls-remote", and then picking through the results for refs of
> interest, and then "git fetch-pack" to actually get the pack. That's how
> git-fetch worked when it was a shell script (e.g., see c3a200120d, the
> last shell version).
Yes, this is what I ended up doing:
but for another reason - to avoid repeating for every fetched repository
slow (in case of my "big" destination backup repository) quickfetch()
checking in every spawned `git fetch`: git-backup can build index of
objects we already have ourselves only once at startup, and then in
fetch, after checking lsremote output, consult that index, and if we see
we already have everything for an advertised reference - just avoid
giving it to fetch-pack to process. It turns out for many pulled
repositories there is usually no references changed at all and this way
fetch-pack can be skipped completely:
> It may also be sane to just use "git fetch", which I'd say is _fairly_
> safe to script. Of course I have no problem if you want to fix all of
> the corner cases in fetch-pack. Just giving you fair warning. :)
Thanks again for the warning. I'm happy the switch to fetch plumbing
happenned on my side, and so far it is working well. Like I said above I
cannot use `git fetch` as is, because quickfetch() overhead for my case
became dominant and very slow, taking ~ 30 seconds of the time just to
check whether we have everything from one fetched repository, which, if
there are 100x or 1000x of such, adds up to hours.
If ls-remote has to be used anyway switching to plumbing seems natural.
Let's see if I hit any more corner place or not - I will be keeping your
warning in mind.