Web lists-archives.com

Re: Debian Buster release to partially drop non-systemd support

On Tue, Oct 16, 2018 at 08:38:06PM +0200, Bastian Blank wrote:
> On Tue, Oct 16, 2018 at 07:20:24PM +0200, Adam Borowski wrote:
> > On Tue, Oct 16, 2018 at 05:54:34PM +0200, Petter Reinholdtsen wrote:
> > > Absolutely.  And the sysvinit boot system have lots of unsolved problems
> > > we never got around to figuring out, related to disk and other device
> > > setup.  The main cause is the fact that the linux kernel is no longer
> > > predicatble and sequencial, but asynchonous.  No amount of wishful
> > > thinking is going to bring is back to a world where static sequencing of
> > > boot events is going to handle all the interdependencies.
> > Systemd fails to solve them as well -- while introducing a lot of unsolved
> > problems on its own, such as degraded RAID problems (no, it's not possible
> > do to that in an event-driven way, you need a static sequence in at least
> > some cases).
> But in the case of degraded RAID the system _is_ already broken.  How
> does a non-event-driven solution work for it?

1. MD timeouts then proceeds
2. btrfs stops unless -odegraded is given

Neither works if systemd is managing mounting, both work with mountall.

As I don't use systemd my data about MD+systemd might be stale, but judging
from a recent flamewar on linux-btrfs, this is still fundamentally broken
on systemd.

Systemd's algorithm for btrfs RAID is:
* take the device named in fstab (one of RAID parts)
* wait for it, query for fsid and number of devices it knows about
* endlessly wait until that number of devices with the given fsid pop up
* if the filesystem gets mounted by the user anyway, unmount it (?!?)

That's neither required nor sufficient to successfully mount.  In slightly
convoluted (but not entirely unrealistic) scenarios the mount might even
succeed non-degraded!  (The first device might have a stale idea about the
filesystem's layout, that's ok.)  In general, there's no way to know if a
mount will succeed other than to actually try it (and at most back off at
the last moment).  You need to read the chunk tree to know if you have
enough data to proceed: a non-degraded mount needs all replicas of every
block group, a degraded mount needs enough to recover the data -- which is
not the same as "X devices are present".

Thus, instead of trying to be smarter than the kernel, it's better to
request the mount and let the kernel do the job.

⣾⠁⢰⠒⠀⣿⡁ 10 people enter a bar: 1 who understands binary,
⢿⡄⠘⠷⠚⠋⠀ 1 who doesn't, D who prefer to write it as hex,
⠈⠳⣄⠀⠀⠀⠀ and 1 who narrowly avoided an off-by-one error.