Re: Failing disk advice

On Sun, Mar 05, 2017 at 08:38:27PM -0800, David Christensen wrote:
> On 03/05/2017 01:02 PM, Gregory Seidman wrote:
> >I have a disk that is reporting SMART errors. It is an active disk in a
> >(kernel, not hardware) RAID1 configuration. I also have a hot spare in the
> >RAID1, and md hasn't decided it should fail the disk and switch to the hot
> >spare. Should I proactively tell md to fail the disk (and let the hot spare
> >take over), or should I just wait until md notices a problem?
> I'm confused by "I also have a hot spare in the RAID1".  Do you have a
> two-member RAID1 with a hot spare, or a three-member RAID1?  I would prefer
> the latter:
> https://manpages.debian.org/jessie/mdadm/md.4.en.html

Refining this advice a bit, I would convert the spare to a full RAID
member now, without first explicitly failing the disk that reports
SMART errors.
Assuming you have a two-member RAID1 with a hot spare, the command
should be similar to this (untested):
  mdadm -G /dev/mdX -n 3
This way the array stays redundant during any further maintenance,
even if the suspect disk drops out while you are working on it.
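
A slightly longer sketch of the same step, using the long option names
(--grow is -G, --raid-devices is -n). It is as untested as the one-liner
above. MD_DEV, DRY_RUN, run and grow_to_three are names made up for this
example, and /dev/mdX is still a placeholder for the real array. By
default it only prints what it would do:

```shell
#!/bin/sh
# Sketch only. MD_DEV is a placeholder for the real array (the /dev/mdX above).
# DRY_RUN=1 (the default here) prints the commands instead of running them.
MD_DEV="${MD_DEV:-/dev/mdX}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

grow_to_three() {
    # Show the current members and spares before changing anything.
    run mdadm --detail "$MD_DEV"
    # Raise the number of active devices to 3 so the spare becomes a member.
    run mdadm --grow "$MD_DEV" --raid-devices=3
    # The resync progress, if any, is visible here.
    run cat /proc/mdstat
}
```

Call grow_to_three to see the commands; set MD_DEV to the real array and
DRY_RUN=0 (as root) only after checking that the --detail output shows
the two members and the spare you expect.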

Which SMART errors do you get, and which tool reports them?
What is the output of the following command for the failing drive?
  smartctl -A /dev/sdY
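
If the full table is too long to read at a glance, a small filter like
the following can pull out the attributes that usually matter most for a
failing disk. This is only a sketch: smart_key_attrs is a name made up
here, and it assumes the usual ATA attribute table layout of smartctl -A
(ID in the first column, name in the second, raw value in the tenth) and
the commonly used IDs 5, 197 and 198 (reallocated, pending and offline
uncorrectable sectors). Drives and vendors can differ, so please still
post the complete output.

```shell
# Sketch: reads `smartctl -A` output on stdin and prints NAME=RAW_VALUE
# for attribute IDs 5, 197 and 198. Assumes the standard 10-column table.
smart_key_attrs() {
    awk '$1 == 5 || $1 == 197 || $1 == 198 { print $2 "=" $10 }'
}
```

Usage, with /dev/sdY again standing in for the failing drive:

  smartctl -A /dev/sdY | smart_key_attrs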