Re: reg. fatal: The remote end hung up unexpectedly on NFS
- Date: Fri, 6 Apr 2018 15:48:45 -0400
- From: Jeff King <peff@xxxxxxxx>
- Subject: Re: reg. fatal: The remote end hung up unexpectedly on NFS
On Fri, Apr 06, 2018 at 11:55:51PM +0530, Satya Prakash GS wrote:
> We have a distributed filesystem with NFS access. On the NFS mount, I
> was doing a git-clone and if NFS server crashed and came back up while
> the clone is going on, clone fails with the below message:
> git clone https://satgs@xxxxxxxxxx/fs/private-qa.git
> remote: Counting objects: 139419, done.
> remote: Compressing objects: 100% (504/504), done.
> Receiving objects: 7% (9760/139419), 5.32 MiB | 5.27 MiB/s
> error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
> fatal: The remote end hung up unexpectedly
> fatal: early EOF
> fatal: index-pack failed
Curl's result=18 is CURLE_PARTIAL_FILE. Usually that means the other
side hung up partway through. But given the NFS symptoms you describe, I
wonder if fwrite() to the file simply returned an error, and curl gave
up.
> On NFS server crash, it usually takes a minute or two for our
> filesystem to failover to new NFS server. Initially I suspected it had
> something to do with the filesystem, like attributes of the file
> written by git weren't matching what it was expecting but the same
> test fails on open source NFS server. While clone is going on, if I
> stopped the open source NFS server for 2 minutes and restarted it, git
> clone fails.
> Another interesting thing is, if the restart happens within a few
> seconds, git clone succeeds.
> Sideband_demux fails while trying to read from the pipe. Read size
> doesn't match what is expected. If there are 2 parts to git clone
> which is fetching data and writing to local filesystem, is this error
> happening while trying to fetch ? Since it succeeds if the restart is
> done immediately, has this got something to do with the protocol
> Please advise on how to debug this further.
If you're on Linux, strace could show you the write error. Unfortunately
it's a little tricky because the http bits happen in a sub-process. But
you can wrap the remote helper in a script like this:

  cat >/in/your/$PATH/git-remote-strace <<\EOF
  #!/bin/sh
  protocol=$(echo "$2" | cut -d: -f1)
  exec strace -f -o /tmp/foo.out git-remote-$protocol "$@"
  EOF
  chmod +x /in/your/$PATH/git-remote-strace

  git clone strace::https://github.com/whatever
My guess is you may find a failed `write()` in there.
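
Once the clone has failed, the trace in /tmp/foo.out can be large, so a small filter helps. The following is only a sketch: `failed_writes` is a hypothetical helper name, and it assumes the usual strace output format in which a failed syscall ends in "= -1 ERRNO (description)". It only looks at write(), writev() and pwrite64(), and it will miss calls that strace splits into "unfinished"/"resumed" pairs.

```shell
# Hypothetical helper: print failed write-family syscalls from an strace log.
# Usage: failed_writes /tmp/foo.out
failed_writes() {
    # With "strace -f -o", each line starts with a pid, so the syscall name
    # is preceded by whitespace; without -f it starts the line.
    # A failed call is reported as "... = -1 EIO (Input/output error)" etc.
    grep -E '(^|[[:space:]])(write|writev|pwrite64)\(.* = -1 E[A-Z]+' "$1"
}
```

Running `failed_writes /tmp/foo.out` after the failed clone should list any writes that returned an error, along with the pid and errno, which would indicate whether the NFS mount is what broke the transfer.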