Web lists-archives.com

Re: BUG: fetch in certain repo always gives "did not send all necessary objects"




On Tue, Feb 06, 2018 at 04:00:32PM -0800, Elijah Newren wrote:

> > According to Peff this got fixed
> > https://public-inbox.org/git/20171020031630.44zvzh3d2vlhglv4@xxxxxxxxxxxxxxxxxxxxx/
> > and but you've had a corrupted repo from back when you were using an older
> > version of Git.
> >
> > Did that repo exist before d0c39a49cc was rolled out? Then we can keep that
> > hypothesis of "left-over corruption" as Peff put it.
> 
> I'm somewhat confused by this explanation.  That precise commit is the
> one I bisected to that _caused_ the fetch to fail.  Also, there might
> be one important difference here -- in the link you provide, it
> suggests that you had a corrupted working directory that made use of a
> now gc'ed commit.  In the case I was able to dig into, we did not.
> (There was a left-over .git/worktree/<something> that had a now gc'ed
> commit, but no working directory that used it.)

If you had a corrupted .git/worktree/<something>/HEAD, then that does
sound like the same problem. It's true that the commit you bisected to
caused "fetch" to fail, but only because it started looking at more of
your corrupted repository. The corruption happened long before (and I
don't know exactly when it was fixed, but I couldn't replicate it
anymore; it might even still exist).

In your case it sounds like you have the extra twist that the matching
working directory for "<something>" had gone away, but I don't think
that materially changes anything. Until you run "git worktree prune",
that HEAD file is still there and still supposed to be valid.

> I suspect you mean that there was another previous bug that induced
> corruption, that this commit fixed that other bug, while also
> introducing this new bug that makes folks' clones unusable because the
> error doesn't provide enough information for users to know how to fix.

If you want to call that last thing a bug, then I guess so. It's perhaps
a matter for the philosophers whether it is the fault of the new code to
start complaining about an existing on-disk corruption.

> It took me hours to figure it out, after users ran out of ideas and
> came and asked me for help.  (Maybe if I was familiar with worktree,
> and knew they had been using it, then I might have guessed that "HEAD"
> meant "not your actual HEAD but the HEAD of the vestige of some other
> worktree").

Yeah, this is the obvious thing that seems like it ought to be improved.

> Does anyone have pointers about what might be doable in terms of
> providing a more useful error message to allow users to recover?
> And/or ideas of what steps could cause corruption so I can send out a
> PSA to help users avoid it?

Here's a minimal manual reproduction:

  # new repo...
  git init
  git commit --allow-empty -m one

  # with a worktree...
  git worktree add foo
  git -C foo commit --allow-empty -m two
  obj=.git/objects/$(git rev-parse foo | sed 's#..#&/#')

  # now we stop using that worktree
  git -C foo checkout --detach
  git branch -f -D foo
  rm -rf foo

  # and this is the corruption; this might have happened ye olden days
  # because of a bug in the worktree code, but we'll assume that somehow
  # the object went away
  rm -f $obj

And now lots of commands may fail with confusing errors:

  $ git prune
  fatal: unable to parse object: HEAD

Unfortunately fixing that is a little tricky. In this case the stack
looks like:

  parse_object_or_die (oid=0x7fffffffd690, name=0x555555792880 "HEAD") at object.c:239
  add_one_ref (path=0x555555792880 "HEAD", oid=0x7fffffffd690, flag=0, cb_data=0x7fffffffd8e0) at reachable.c:38
  refs_head_ref (refs=0x555555a65430, fn=0x5555556b6ef5 <add_one_ref>, cb_data=0x7fffffffd8e0) at refs.c:1316
  other_head_refs (fn=0x5555556b6ef5 <add_one_ref>, cb_data=0x7fffffffd8e0) at worktree.c:404

So other_head_refs knows that it's looking at the worktrees. And it
passes the alternate ref-store to refs_head_ref(), with "add_one_ref" as
the callback. But the knowledge that we're not talking about the real
"HEAD" is lost as we cross that callback boundary. We'd need to either
add another parameter to the callback, or have some way of talking about
"HEAD in this worktree" as a refname (which AFAIK we don't have).

As for PSAs, my normal go-to in confusing matters like this is git-fsck.
But it seems that it does not check worktree HEADs. :(

  $ git fsck
  Checking object directories: 100% (256/256), done.

So that seems like another bug.

The best PSA for this particular bug may be "try pruning the worktrees":

  $ git worktree prune -v
  Removing worktrees/foo: gitdir file points to non-existent location

  $ git prune; echo $?
  0

-Peff