
Removal of linux-base from jessie-backports broke Xen upstream CI




Introduction
------------

I would like to recount a situation.  I'm not sure where, if anywhere,
the root bug(s) lie, but I am inclined to say that a big part of the
problem was a change to the contents of jessie-backports.  I would be
interested to hear what the backports team and ftpmaster have to say;
in particular, if anyone knows the answers to my questions below.

My tentative conclusions are that:

1. Packages should not be removed from foo-backports just because a
similar package is in foo-security, because there are situations where
a host may have been relying on the package being in foo-backports and
a similar (or even newer) package being in foo-security is not
sufficient.

2. Cruft removal in stable releases, including in -backports, should
perhaps be done with more care and caution, or at least be announced.


Background
----------

The upstream Xen project CI system does baremetal testing of Xen
hypervisors etc. and therefore needs to reinstall hosts quite often.
This is done by running a debian-installer netinst image with a
preseed file.  For Reasons we are still mostly on jessie.

We have some arm64 boxes.  They don't work with the kernel from
jessie.  So we arrange to use the kernel from jessie-backports.

Using the jessie-backports kernel with the jessie installer involves
using the preseed hook mechanism to add jessie-backports to the
target's apt sources, and an in-target apt-get install rune to install
the kernel package.

(Using the jessie-backports kernel also involves editing the installer
image to have the jessie-backports kernel and modules, but that is not
relevant to this tale.)
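
For illustration, the hook described above might be sketched as a
small shell script called from the preseed late_command.  This is not
the actual Xen CI configuration: the mirror URL, the function names
and the `--apply' switch are assumptions made up for this sketch.
Only `in-target' (the debian-installer helper that runs a command
inside /target) and the apt-get options are standard.

```shell
#!/bin/sh
# Hypothetical late_command helper; not the real CI script.

# Print the sources.list line for SUITE-backports on MIRROR.
backports_line() {
    printf 'deb %s %s-backports main\n' "$1" "$2"
}

# Add backports to the target's apt sources, then install the kernel.
install_backports_kernel() {
    mirror=$1 suite=$2 kernel=$3
    backports_line "$mirror" "$suite" \
        > /target/etc/apt/sources.list.d/backports.list
    in-target apt-get update
    in-target apt-get install -y -t "$suite-backports" "$kernel"
}

# Only touch the system when explicitly asked to (assumed convention).
if [ "${1:-}" = "--apply" ]; then
    install_backports_kernel http://mirror.example/debian jessie \
        linux-image-4.9.0-0.bpo.2-arm64
fi
```

The `-t' option only raises the priority of the named release; it
does not by itself guarantee that a given package exists there.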

The arm64 kernel in jessie-backports is this package
  linux-image-4.9.0-0.bpo.2-arm64 (4.9.18-1~bpo8+1)
It Depends on `linux-base (>= 4.3~)'.

So it is necessary to have a newer linux-base.  According to
my git commit logs, in January 2017 I added the equivalent of
  apt-get install -t jessie-backports linux-base
to the commands run via the preseed mechanism: at that time a newer
linux-base was available in backports.


Breakage
--------

According to snapshot.d.o, until the 6th of February, linux-base
4.3~bpo8+1 was available in jessie-backports.  So things worked fine.

Around 16:00 UTC on the 7th of February, linux-base was removed from
jessie-backports, presumably because it was considered cruft.  After
all, linux-base 4.5~deb8u1 is now in jessie-security.

However, after that change to the archive, the dependency resolver
from jessie's apt, in our CI, is no longer willing to update to
linux-base from jessie-security.  (I have not yet investigated in
detail but I suspect that the apt-get -t jessie-backports rune above
is part of that causal chain.)

The result is that linux-image-4.9.*'s version dependency on
linux-base could not be satisfied.  In our CI this resulted in a
mysterious failure where despite us not having changed anything, the
host would fail to boot when it wanted to reboot into the installed
system, because it would try to use the original jessie 3.16 kernel
(which does not run on our hardware).
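
It may help to confirm that the version now in jessie-security would,
in principle, satisfy the kernel's dependency, so that the failure is
about which version apt selects rather than about version ordering.
A minimal sketch using `dpkg --compare-versions' (the `satisfies'
helper name is invented here):

```shell
#!/bin/sh
# satisfies VERSION MINIMUM: succeed if VERSION >= MINIMUM in Debian
# version ordering.  Helper name is illustrative only.
satisfies() {
    dpkg --compare-versions "$1" ge "$2"
}

for v in '4.3~bpo8+1' '4.5~deb8u1'; do
    if satisfies "$v" '4.3~'; then
        echo "linux-base $v satisfies (>= 4.3~)"
    else
        echo "linux-base $v does NOT satisfy (>= 4.3~)"
    fi
done
```

Both versions mentioned above should pass this check, since `~' sorts
before everything else, including the end of the version string.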


Logs
----

For the very curious, and for my reference, complete logs of an
example failure are preserved here:

 http://logs.test-lab.xenproject.org/~iwj/132973.test-arm64-arm64-xl/info.html

Mostly you want to look at the `Logfiles etc.'.  You can also click on
the entries in the `status' column to see the output from the CI
system perl scripts.  The installer syslog is here:

 http://logs.test-lab.xenproject.org/~iwj/132973.test-arm64-arm64-xl/3.ts-syslog-server.log

When looking at the serial log:

 http://logs.test-lab.xenproject.org/~iwj/132973.test-arm64-arm64-xl/serial-laxton0.log

it is important to realise that that logfile contains a fair amount of
previous output.  Look at the timestamps: you want the part of the log
starting at 2019-02-07 15:13:32 Z.


Analysis and questions
----------------------

I'm almost certain that the proximate cause of the breakage was the
removal of linux-base from jessie-backports.

I think, but I am not sure, that that apt-get rune to request
linux-base from backports was previously necessary.

The reason I say that I am not sure is that the CI commit which added
that rune had, according to its commit message, an additional effect
of putting backports in the apt sources; perhaps that latter would
have been sufficient.  (After I have sent this mail I am going to mess
about with the system to find a way to get it working properly again.)

Q: Was `apt-get install -t backports linux-base'
   unnecessary (and wrong) ?
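
One way to investigate this question on an affected host would be to
ask apt which version it considers the installation candidate, with
and without the target-release override.  The sketch below assumes a
host with jessie, jessie-security and jessie-backports in its apt
sources; `candidate_of' is an invented helper name.

```shell
#!/bin/sh
# candidate_of: read `apt-cache policy PKG` output on stdin and print
# the version apt would install.  Helper name is illustrative only.
candidate_of() {
    awk '$1 == "Candidate:" { print $2 }'
}

if command -v apt-cache >/dev/null 2>&1; then
    # Default pinning.
    apt-cache policy linux-base | candidate_of
    # With jessie-backports as the default release, which is the
    # configuration option that `-t` sets.
    apt-cache -o APT::Default-Release=jessie-backports policy linux-base |
        candidate_of
fi
```

If the two candidates differ, that would suggest the `-t' rune is
part of the causal chain.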

It is unfortunate that something which worked for a period of over 2
years was broken by an archive change.

I don't know for sure that this removal was cruft removal but it
seems like the most plausible explanation.  I haven't so far found any
explanation anywhere but perhaps I looked in the wrong places.

Q: Why was linux-base removed from jessie-backports ?


Opinions and suggestions welcome.

Thanks,
Ian.



-- 
Ian Jackson <ijackson@xxxxxxxxxxxxxxxxxxxxxx>   These opinions are my own.

If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.