Re: Failing disk advice




On 03/05/2017 01:02 PM, Gregory Seidman wrote:
I have a disk that is reporting SMART errors. It is an active disk in a
(kernel, not hardware) RAID1 configuration. I also have a hot spare in the
RAID1, and md hasn't decided it should fail the disk and switch to the hot
spare. Should I proactively tell md to fail the disk (and let the hot spare
take over), or should I just wait until md notices a problem?

AFAIK desktop disks and "enterprise RAID" disks handle failing sectors differently. When a desktop disk has trouble reading a sector, it will retry many times before giving up, because the data likely does not exist anywhere else. An enterprise RAID disk, by contrast, will retry only a few times and then report a failure, because the data should exist elsewhere and hung reads are intolerable in enterprise environments. So if you are using desktop disks in a RAID, you might need to intervene manually to compensate for the mismatch.
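
To make the mismatch concrete, here is a rough sketch that compares how long a drive may keep retrying with how long the kernel waits for a command. It rests on assumptions that are not part of the report above: the drive's retry limit is given in tenths of a second (the unit smartctl uses for its "-l scterc" setting), the kernel timeout is the value in /sys/block/<dev>/device/timeout (in seconds, 30 by default on Linux), and a drive with no limit set is treated as a mismatch. The function name is made up for illustration.

```python
from typing import Optional

# Assumption: Linux's default per-command timeout for SCSI/SATA disks, in seconds.
DEFAULT_KERNEL_TIMEOUT_S = 30


def recovery_mismatch(erc_deciseconds: Optional[int],
                      kernel_timeout_s: int = DEFAULT_KERNEL_TIMEOUT_S) -> bool:
    """Return True when the drive may retry for as long as, or longer than,
    the kernel is willing to wait.

    erc_deciseconds is the drive's error recovery limit in tenths of a
    second, or None when no limit is set (hypothetical convention chosen
    for this sketch).
    """
    if erc_deciseconds is None:
        # No limit: the drive may retry indefinitely from the kernel's view.
        return True
    return erc_deciseconds / 10 >= kernel_timeout_s


if __name__ == "__main__":
    print(recovery_mismatch(70))    # limit of 7.0 seconds against a 30 second timeout
    print(recovery_mismatch(None))  # no limit set
```

If this reports a mismatch for your disks, that is only a hint that a marginal sector could stall the array for a long time; it says nothing about whether the disk is actually failing.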


I'm confused by "I also have a hot spare in the RAID1". Do you have a two-member RAID1 with a hot spare, or a three-member RAID1? I would prefer the latter:

https://manpages.debian.org/jessie/mdadm/md.4.en.html
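
If it helps to tell the two layouts apart, the sketch below reads the array's line from /proc/mdstat and separates active members from spares. The sample text, the device names, and the function name are illustrative assumptions, not taken from your system; the sketch relies on md marking spares with "(S)" and faulty devices with "(F)", and on the "[n/m]" pair on the following line giving configured and working members.

```python
import re

# Hypothetical /proc/mdstat contents for a two-member RAID1 with one hot spare.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdc1[2](S) sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""


def describe_array(mdstat_text: str, array: str) -> dict:
    """Split the devices of one md array into active, spare and faulty."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(array + " :"):
            continue
        # Each device looks like "sda1[0]", optionally followed by "(S)" or "(F)".
        devices = re.findall(r"(\w+)\[\d+\](\([A-Z]\))?", line)
        result = {
            "active": [name for name, flag in devices if flag == ""],
            "spares": [name for name, flag in devices if flag == "(S)"],
            "faulty": [name for name, flag in devices if flag == "(F)"],
            "raid_devices": None,
            "working": None,
        }
        # The next line carries "[configured/working]", e.g. "[2/2]".
        if i + 1 < len(lines):
            counts = re.search(r"\[(\d+)/(\d+)\]", lines[i + 1])
            if counts:
                result["raid_devices"] = int(counts.group(1))
                result["working"] = int(counts.group(2))
        return result
    raise ValueError(f"array {array!r} not found in mdstat text")


if __name__ == "__main__":
    print(describe_array(SAMPLE_MDSTAT, "md0"))
```

On a real machine you would pass in the contents of /proc/mdstat instead of the sample. A non-empty "spares" list together with "raid_devices" of 2 would mean the two-member-plus-spare layout.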


If you're planning on buying a fourth disk and adding it after fixing the RAID, can you instead add it now as a fourth RAID1 member, let it resync, remove the failing disk from the array (i.e. reconfigure as a three-member RAID1), and then pull the failing disk?
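
The sequence I have in mind could be sketched roughly as follows. This only builds and prints the mdadm command lines; it does not run anything. The array and device names (/dev/md0, /dev/sdd1, /dev/sda1) and the helper name are placeholders I made up, and the plan assumes the array currently has three active members, so please check each step against the mdadm man page for your version before trying it.

```python
from typing import List


def plan_disk_swap(array: str, new_dev: str, failing_dev: str,
                   members: int) -> List[List[str]]:
    """Build (but do not run) the mdadm commands for swapping out a disk.

    members is the number of active RAID1 members before the swap.
    """
    return [
        # Add the new disk; it joins the array as a spare at first.
        ["mdadm", array, "--add", new_dev],
        # Grow the mirror by one member so the new disk becomes active.
        ["mdadm", "--grow", array, f"--raid-devices={members + 1}"],
        # Block until the resync/recovery onto the new disk has finished.
        ["mdadm", "--wait", array],
        # Only now mark the suspect disk as failed and remove it.
        ["mdadm", array, "--fail", failing_dev],
        ["mdadm", array, "--remove", failing_dev],
        # Shrink the mirror back to the original member count.
        ["mdadm", "--grow", array, f"--raid-devices={members}"],
    ]


if __name__ == "__main__":
    # Placeholder names; substitute your own array and partitions.
    for cmd in plan_disk_swap("/dev/md0", "/dev/sdd1", "/dev/sda1", members=3):
        print(" ".join(cmd))
```

The point of this ordering is that the failing disk stays in the mirror as a read source until the new disk is fully synced, so redundancy never drops below what you have now.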


David