Failure to boot - LVM problems?

Hi all,

This is one of those annoying cases where I claim "It was working, and I
didn't do anything, and now it doesn't" - suspicious, I know ...

In this case, I can see from my emails that this machine booted (via
wake-on-lan from a cronjob) this morning, and then shut itself down (via
a local cronjob), having done its job. Later, I tried to boot it manually,
and it didn't come up. When I plugged in a screen and keyboard, I found it
at the 'root password for maintenance or Ctrl-D to continue' prompt.

On logging in, I found it had had problems mounting filesystems.

All further attempts to boot it have gone straight from grub to a
blinking underline cursor in the top left.

If I boot with one of the 'recovery mode' options, I can get back to the
maintenance prompt. Having dug around a bit, I find that running
'vgchange -ay' followed by 'systemctl default' brings it up in an
apparently normal state.
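
For reference, the manual workaround described above amounts to roughly the
sketch below, run as root from the maintenance shell. The run wrapper and the
DRY_RUN switch are my own additions for illustration (hypothetical, not part
of LVM or systemd); with DRY_RUN left at its default of 1, the script only
prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the manual workaround from the maintenance shell (needs root).
# DRY_RUN is a hypothetical switch for this example only:
#   DRY_RUN=1 (default) prints each command, DRY_RUN=0 really runs it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

recover() {
    # Activate all logical volumes in all volume groups.
    run vgchange -ay
    # Leave the maintenance shell and continue to the default target.
    run systemctl default
}

recover
```

This only works around the symptom; it does not explain why the volumes are
not activated automatically at boot.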

It reports warnings about being unable to connect to lvmetad, but I
gather that's not normally something to worry about. On the other hand,
the searches I've done have only found suggestions for disabling it, not
making it work.
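
In case it helps anyone looking at the same warning: as far as I understand,
whether the LVM tools try to talk to lvmetad is controlled by the use_lvmetad
setting in the global section of /etc/lvm/lvm.conf. A minimal, read-only
sketch for checking it follows. The function name lvmetad_setting and the
LVM_CONF variable are hypothetical names of mine, and the path is the usual
default, which I'm assuming applies here.

```shell
#!/bin/sh
# Print the value (0 or 1) of the first uncommented use_lvmetad line
# in the given lvm.conf. Read-only; changes nothing.
lvmetad_setting() {
    sed -n 's/^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*\([01]\).*/\1/p' "$1" | head -n 1
}

# LVM_CONF is a hypothetical override for this sketch; the default path
# is an assumption about where the config lives on this system.
conf="${LVM_CONF:-/etc/lvm/lvm.conf}"

if [ -r "$conf" ]; then
    value=$(lvmetad_setting "$conf")
    echo "use_lvmetad = ${value:-not set explicitly}"
else
    echo "cannot read $conf"
fi
```

If it reports 1 while the daemon isn't running, that would at least be
consistent with the warnings, though I can't say whether it is related to
the boot failure.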

General info about the system:

It's a generic tower system, with (currently) 6 disks.

I use RAID 1 for everything - but in an effort to keep the arrays small
(an attempt to reduce the risk of failing to rebuild before another
error happens), there are many partitions, grouped into vgs with lvm.
However, the filesystems I'm having problems with are on a vg that has
only one md in it, because those disks are relatively small anyway.

It runs a single kvm guest, which does my backups using dirvish (pulling
from various machines both locally and over the net) - hence the
automated boot in the middle of the night.

The weather is fairly hot (New Zealand summer), which is one of the
reasons I went with the cronjob solution, so that the machine doesn't
(normally) run during the day. I started that last summer, but then had it
running 24/7 for a while. A couple of days ago I shut it down and went back
to relying on the scheduled boots, so it hasn't had many boot cycles recently.

Any tips/questions very welcome.

