reg. fatal: The remote end hung up unexpectedly on NFS




Hi,

We have a distributed filesystem with NFS access. I was running git clone
on the NFS mount, and if the NFS server crashes and comes back up while
the clone is in progress, the clone fails with the message below:

git clone https://satgs@xxxxxxxxxx/fs/private-qa.git

remote: Counting objects: 139419, done.
remote: Compressing objects: 100% (504/504), done.
Receiving objects:   7% (9760/139419), 5.32 MiB | 5.27 MiB/s
error: RPC failed; result=18, HTTP code = 200 MiB | 96.00 KiB/s
fatal: The remote end hung up unexpectedly
fatal: early EOF
fatal: index-pack failed

When an NFS server crashes, it usually takes a minute or two for our
filesystem to fail over to a new NFS server. Initially I suspected the
filesystem itself, for example that the attributes of the files written
by git did not match what it expected, but the same test also fails on
an open source NFS server: if I stop the open source NFS server for
2 minutes while the clone is in progress and then restart it, git clone
fails.

Another interesting thing is that if the restart happens within a few
seconds, git clone succeeds.

sideband_demux fails while trying to read from the pipe: the read size
doesn't match what is expected. If git clone has two parts, fetching the
data and writing it to the local filesystem, is this error happening
during the fetch? Since the clone succeeds if the restart is done
immediately, does this have something to do with protocol timeouts?
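
To make the "read size doesn't match" symptom concrete, here is a
minimal sketch of a sideband reader. It is not git's actual
sideband_demux code. It assumes the pkt-line framing (a 4-hex-digit
length that includes the header itself, "0000" as the flush packet) and
a one-byte band number at the start of each payload. The names
read_exact, demux_sideband, pkt and EarlyEOF are made up for
illustration.

```python
import io


class EarlyEOF(Exception):
    """Raised when the stream ends before a full packet has been read."""


def read_exact(stream, n):
    """Read exactly n bytes from stream, or raise EarlyEOF on a short read."""
    data = b""
    while len(data) < n:
        chunk = stream.read(n - len(data))
        if not chunk:  # the writer closed the pipe before sending everything
            raise EarlyEOF(f"wanted {n} bytes, got {len(data)}")
        data += chunk
    return data


def demux_sideband(stream):
    """Yield (band, payload) pairs until a flush packet ("0000") is seen."""
    while True:
        header = read_exact(stream, 4)
        length = int(header.decode("ascii"), 16)
        if length == 0:  # flush packet: normal end of the stream
            return
        if length <= 4:  # no room for the band byte
            raise ValueError(f"bad packet length {length}")
        body = read_exact(stream, length - 4)
        yield body[0], body[1:]


def pkt(band, data):
    """Build one sideband packet (hypothetical helper for the demo)."""
    return f"{len(data) + 5:04x}".encode("ascii") + bytes([band]) + data


if __name__ == "__main__":
    good = io.BytesIO(pkt(2, b"Counting objects\n") + pkt(1, b"PACK") + b"0000")
    print(list(demux_sideband(good)))

    # Simulate the remote end hanging up in the middle of a packet.
    cut = io.BytesIO(pkt(1, b"PACKDATA")[:-3])
    try:
        list(demux_sideband(cut))
    except EarlyEOF as exc:
        print("early EOF:", exc)
```

In this sketch a short read only happens when the writing side of the
pipe goes away, which would suggest the failure starts on the fetch side
and the reader merely reports it. Whether that holds for the real clone
is an assumption to verify.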

Please advise on how to debug this further.
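
One possible starting point, sketched here as an assumption rather than
a known fix, is to rerun the clone with git's tracing environment
variables (GIT_TRACE, GIT_TRACE_PACKET, GIT_CURL_VERBOSE) and keep the
stderr output in a log file. The helper names trace_env and
clone_with_trace, and the log path argument, are hypothetical.

```python
import os
import subprocess


def trace_env(base=None):
    """Return a copy of the environment with git tracing switched on."""
    env = dict(os.environ if base is None else base)
    env.update({
        "GIT_TRACE": "1",         # general trace of what git executes
        "GIT_TRACE_PACKET": "1",  # packet-level protocol trace
        "GIT_CURL_VERBOSE": "1",  # verbose HTTP transport output
    })
    return env


def clone_with_trace(url, dest, log_path):
    """Run git clone with tracing on; write stderr to log_path."""
    with open(log_path, "wb") as log:
        result = subprocess.run(
            ["git", "clone", url, dest],
            env=trace_env(),
            stderr=log,
        )
    return result.returncode
```

Running it once with dest on the NFS mount and once with dest on a local
disk, while stopping the NFS server in the same way, might help tell the
fetch side apart from the local write side.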

Thanks,
Satya.