
Re: Failure to boot - LVM problems?

On 4/01/19 2:48 PM, Richard Hector wrote:
> Hi all,
> This is one of those annoying cases where I claim "It was working, and I
> didn't do anything, and now it doesn't" - suspicious, I know ...
> In this case, I can see from my emails that this machine booted (via
> wake-on-lan from a cronjob) this morning, and then shut itself down (via
> a local cronjob), having done its job. Then later, I booted it manually,
> and it didn't - when I plugged in a screen and keyboard, I found it at
> the 'root password for maintenance or Ctrl-D to continue' prompt.
> On logging in, I found it had had problems mounting filesystems.
> All further attempts to boot it have gone straight from grub to a
> blinking underline cursor in the top left.
> If I boot with one of the 'recovery mode' options, I can get back to the
> maintenance option. Having dug around a bit, I find that 'vgchange -ay'
> followed by 'systemctl default' brings it up, in an apparently normal state.
> It reports warnings about being unable to connect to lvmetad, but I
> gather that's not normally something to worry about. On the other hand,
> the searches I've done have only found suggestions for disabling it, not
> making it work.
> General info about the system:
> It's a generic tower system, with (currently) 6 disks.
> I use RAID 1 for everything - but in an effort to keep the arrays small
> (an attempt to reduce the risk of failing to rebuild before another
> error happens), there are many partitions, grouped into vgs with lvm.
> However, the filesystems I'm having problems with are on a vg that has
> only one md in it, because those disks are relatively small anyway.
> It runs a single kvm guest, which does my backups using dirvish (pulling
> from various machines both locally and over the net) - hence the
> automated boot in the middle of the night.
> The weather is kind of hot (New Zealand summer), which is one of the
> reasons I went to the cronjob solution to not (normally) run it during
> the day - I started that last summer, but have had it running 24/7 for a
> while, and then shut it down to rely on this system a couple of days
> ago, so it hasn't had many boot cycles recently.
> Any tips/questions very welcome.

It turns out the later "failures" to boot probably weren't failures at
all; I just had 'quiet' enabled on the kernel command line, so nothing was
shown on screen. Removing it let me see where the boot was hanging, which
I've now asked about in my 'Slow boot?' thread.

So the initial failure must have been a one-off, which if anything is
more worrying: it sounds more like a hardware problem.
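
For anyone landing here from a search, here is a rough sketch of the two steps mentioned in this thread: checking whether 'quiet' is on the kernel command line, and activating the volume groups by hand from the maintenance shell. The function names (has_quiet, strip_quiet, activate_and_continue) are made up for illustration, and the sample command line in the usage note is hypothetical. Treat it as a starting point under those assumptions, not a tested recovery procedure.

```shell
#!/bin/sh
# Illustrative helpers only; the function names are not from any package.

# Succeed if the given kernel command line contains the standalone word "quiet".
has_quiet() {
    case " $1 " in
        *" quiet "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the given command line with every standalone "quiet" removed.
# Relies on plain word splitting, so it assumes the string contains
# no shell glob characters.
strip_quiet() {
    out=""
    for word in $1; do
        [ "$word" = "quiet" ] && continue
        out="${out:+$out }$word"
    done
    printf '%s\n' "$out"
}

# The manual recovery described in the quoted message: activate all
# volume groups, then continue to the default systemd target.
# Needs root, so it is defined here but never called automatically.
activate_and_continue() {
    vgchange -ay && systemctl default
}
```

To check the running system, something like `has_quiet "$(cat /proc/cmdline)" && echo "quiet is set"` should do. On a typical Debian install with GRUB, 'quiet' usually lives in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and takes effect after running update-grub; for a one-off test it can also be removed by editing the boot entry from the GRUB menu.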


