Re: Heck, why did the server freeze?

On Thu 25/Oct/2018 20:30:27 +0200 Brian wrote:
> On Thu 25 Oct 2018 at 19:53:26 +0200, Alessandro Vesely wrote:
>> Hi all,
>> early this morning a network card burned out.  A few hours later, the server
>> was not responding on any network address, nor on the system console.  I had to
>> power it down.
>> Upon rebooting, network errors were detected and I arranged the server to work
>> with the available hardware.  The last line logged was an incoming email from a
>> spammer in Brazil.  It shouldn't have triggered any severe damage.  I found no
>> breakdown hint in the logs.
>> My theory is that the system didn't realize that the card was broken, didn't
>> turn the interface down, and kept storing outgoing stuff until it blew off.  Is
>> that reasonable or should I be more paranoid?
> You have given an exact diagnosis of your problem - the network
> card failed. What's your problem? Replace it instead of agonising
> and theorising.

The problem is that the server froze.  I don't think that's what it is supposed
to do when a card fails.

During the reboot, I got a plethora of RTNETLINK answers: Network is down /
Network is unreachable / File exists.  It took me several RJ45 swaps to find
out which network and which side of the link was the culprit.
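
For what it's worth, here is a minimal sketch of how the link state that the
kernel believes in could be listed after a reboot, rather than swapping cables.
It assumes a Linux system exposing /sys/class/net; the names link_states and
read_attr are made up for illustration.  It would not help with a frozen
machine, and a card that fails without reporting loss of carrier may still
show up as healthy.

```python
from pathlib import Path


def read_attr(path):
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        # Reading 'carrier' fails while the interface is administratively
        # down, and a missing attribute ends up here as well.
        return None


def link_states(sysfs_root="/sys/class/net"):
    """Map each interface name to its (operstate, carrier) pair."""
    states = {}
    for iface in sorted(Path(sysfs_root).iterdir()):
        states[iface.name] = (
            read_attr(iface / "operstate"),  # e.g. "up", "down", "unknown"
            read_attr(iface / "carrier"),    # "1" if a link is detected, "0" if not
        )
    return states
```

Calling link_states() with no argument and printing the resulting dictionary
gives one line of evidence per interface; the sysfs_root parameter is only
there so that the function can be pointed at a test directory.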

Contrast that with log lines about anything else, from non-redundant power
supplies to failed GPG signatures.  In part, the lack of a precise diagnosis
must be a shortcoming on the part of the card vendor.  However, how come the kernel
didn't realize that the link had to go down, log something, and just fail any
subsequent call on that interface, instead of freezing?  Or did it freeze for
an unrelated reason?


P.S. New card expected on Monday.