Re: What can Debian do to provide complex applications to its users?
- Date: Sat, 17 Feb 2018 23:14:51 +0000
- From: Colin Watson <cjwatson@xxxxxxxxxx>
- Subject: Re: What can Debian do to provide complex applications to its users?
On Sat, Feb 17, 2018 at 07:22:05PM +0100, Tollef Fog Heen wrote:
> I think there's at least two types of vendoring you're referring to
> here, and they're substantially different.
> One is how Go currently does (but my understanding is that this is
> changing in newer versions). Here, the source code of dependencies are
> shipped together with the source code for the application. This leads
> to trees like
> https://github.com/kubernetes/kubernetes/tree/master/vendor where any
> one of those dependencies might be a released version or tag, or it
> might just be a random git snapshot, and there's not really any way to
> The other (you refer to this as freezing dependencies) is how
> Node.js/npm/yarn, Ruby/gem, (to some extent) Python/pip, and Rust/cargo
> does it. In those cases, you have some file specifying the versions of
> libraries the application needs, usually as «this version of this
> gem/crate/package» and there is somewhere those packages live by
> default. Quite often, there's also a lock file of some sort which lists
> out the exact versions used, recursively, which ensures you can deploy
> the exact same code multiple times.
> The second method means you can reason about what versions of code are
> included where.
This is an excellent point. (In fact, I think you can do some of that
reasoning for the first method too; it's just going to take more effort.
For example, you could run "govendor sync" or whatever and check that
the output tree actually matches what's promised. In some ways this is
similar to the question of "is this package's .orig.tar actually the
version it claims to be?".)
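As a rough illustration of that kind of check, here is a minimal sketch in Python. It is not how govendor itself works: it assumes a hypothetical manifest that maps each vendored directory to a SHA-256 digest of the tree as fetched from the pinned revision, and compares that against what is actually shipped. The function names and the manifest shape are invented for this example.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Hash every file under root (relative path plus contents), in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def mismatched_dependencies(vendor_dir: Path, expected: dict[str, str]) -> list[str]:
    """Return vendored dependencies whose contents differ from the manifest.

    `expected` is the hypothetical manifest: dependency directory -> tree digest.
    A dependency that is listed but missing from the tree also counts as a mismatch.
    """
    bad = []
    for name, want in sorted(expected.items()):
        dep_dir = vendor_dir / name
        if not dep_dir.is_dir() or tree_digest(dep_dir) != want:
            bad.append(name)
    return bad
```

Anything reported by mismatched_dependencies() would be a dependency that is not what the pinned version promises, which is the same question one asks of an .orig.tar.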
It's worth remembering that this isn't just some lazy upstreams not
getting round to keeping their dependencies up to date:
* Often they need newer versions of dependencies than are in Debian
  stable, because retaining compatibility back to whatever version we
  shipped for a half-million-line web application and actually testing
  the result is a combinatorially-hard problem.
* Often the existence of packaged web applications in Debian itself
  makes it harder to upgrade the libraries they depend on. Take
  Django, for instance: it has really remarkably good documentation of
  what you need to do to upgrade an application to a newer version and
  a very clear deprecation policy, but you still have to do it and
  keeping your application compatible with more than two or three
  Django versions (by which I mean 1.8, 1.9, 1.10, etc.) before the
  newest one you support is likely to be pretty painful. Now package
  half a dozen Django applications and suddenly the library package
  maintainers have a heck of a time keeping everything working.
* And yes, sometimes an upstream can end up with out-of-date
  dependencies that are going to take significant effort to resolve,
  but there's something to be said for making it possible to shift the
  upgrade effort to a time when it's convenient to do so, even though
  there's obviously a trade-off with the possibility that you might end
  up putting it off indefinitely. Not to mention that if they did
  upgrade their dependencies, they might end up being incompatible with
  the versions in Debian stable, and as it is that doesn't help our
  users of packaged applications very much either. The lockstep
  problem is really nasty.
* Packaging individual libraries that aren't especially useful on the
  client side can be pretty boring work and is often not very socially
  rewarding (people packaging such libraries for Debian have taken a
  lot of flak, for instance).
And yet ... it can be really nice to have some of these things packaged
in a common format. I know roughly nothing about PHP, and I don't want
to have to learn yet another packaging stack or another upstream's
favourite container technology, so it's great that Icinga Web 2 is
packaged so that I can just install it without having to learn that. I
want it to be as easy as possible for the security team and the people
who care about it to keep it up to date. For the case where something
depends on OpenSSL, it's mostly pretty obvious where the trade-offs lie.
For the case where something depends on 80 npm libraries and a security
update to the application might require updating 10 of those and maybe
breaking some other packaged application in the process, it's not so
clear that it even serves our developers, let alone our users.
Disclaimer on the Django example above: I only do in-house development
at work of Django applications that aren't packaged in Debian, and I'm
not involved in Django's Debian maintenance, so this is an outside view
and I may have some details wrong.
> You might still have to patch five versions of libvulnerable, but at
> least it's possible to find which five versions are in use, and get
> those fixed in a central place and then rebuild everything that's
> vulnerable, recursively. Not the most fun job in the world, but it's
> at least possible to automate somewhat.
So maybe what we need is to run with that a bit further? Debian has
historically been really good at coming up with ways to represent what
disparate upstream developers are doing in formats that can be consumed
in single common ways, so e.g. you can upgrade your system with a single
command rather than having to go and work out what to do for each
package. We could probably come up with a common format representing a
package's frozen dependencies, in the spirit of Built-Using, and see if
it's possible to build reasonable security workflows around those. For
instance, as a strawman starting point, we could aim for rules like
* Constrained to the sort of server-side applications that might
  reasonably be run in a container on their own, just to keep the
  problem size down a bit.
* No freezing of stuff that's outside of the primary language ecosystem
  involved (so your Python webapp doesn't get to come up with a fancy
  way to vendor OpenSSL).
* Maybe truncate the frozen dependency tree at C extensions, in order
  that we can make sure those are built for all architectures, so you'd
  still have to care about compatibility with those. It'd be a much
  smaller problem, though.
* Every frozen dependency must be installed in a particular defined
  path based on (application name, dependency name, dependency
  version), and must be installed using blessed tools that generate
  appropriate packaging metadata that can be scanned centrally without
  having to download lots of source packages.
* Updating a single frozen dependency (assuming no API changes to the
  application's source code) must be no harder than updating a single
  line in a single file and bumping the Debian package version.
* Where applicable, deprecation warnings must be configured as
  warnings, not errors, to maximise compatibility.
* Dependencies with a non-trivial CVE history would have an acceptable
  range of versions, so we could drop applications that persistently
  cause trouble while still allowing a bit of slack. For some
  dependencies it might not be a problem at all; e.g. we probably don't
  care what version of python-testtools something uses.
* We put all this together in a separate archive that explicitly
  doesn't have security support; the security team wouldn't be expected
  to provide support until the tooling is in a state where it's easy
  enough for them.
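To make the strawman slightly more concrete, here is a minimal Python sketch of the central scan implied by the path and version-range rules. Everything in it is hypothetical: the /usr/share/frozen layout, the FrozenDep record, the acceptable-range table, and the version numbers in the usage lines are all invented for illustration. Version comparison is naive dotted-integer comparison, which real tooling would have to replace with each ecosystem's own ordering.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenDep:
    application: str
    name: str
    version: str

    def install_path(self) -> str:
        # Hypothetical layout derived from (application, dependency, version).
        return f"/usr/share/frozen/{self.application}/{self.name}/{self.version}"


def parse_version(version: str) -> tuple[int, ...]:
    # Naive: only handles purely numeric dotted versions.
    return tuple(int(part) for part in version.split("."))


def out_of_range(deps, acceptable):
    """Return frozen dependencies whose version is outside the acceptable range.

    `acceptable` maps dependency name -> (minimum, maximum), both inclusive.
    Dependencies with no entry are treated as unconstrained.
    """
    flagged = []
    for dep in deps:
        if dep.name not in acceptable:
            continue
        low, high = acceptable[dep.name]
        if not parse_version(low) <= parse_version(dep.version) <= parse_version(high):
            flagged.append(dep)
    return flagged


# Made-up metadata, as it might be collected from the archive.
deps = [
    FrozenDep("webapp-one", "django", "1.8.3"),
    FrozenDep("webapp-two", "django", "1.11.10"),
    FrozenDep("webapp-two", "testtools", "2.3.0"),
]
acceptable = {"django": ("1.11.0", "1.11.99")}

for dep in out_of_range(deps, acceptable):
    print(dep.application, dep.install_path())
```

The point of the fixed path and the generated metadata is that a scan like this only needs the list of (application, dependency, version) triples, not the source packages themselves.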
I dunno. Maybe this is all so much blowing smoke, since I'm not
actually volunteering ... but I develop webapps in the
frozen-dependencies style at work and still see value that a
distribution like Debian could provide by way of packaging, so wanted to
offer some suggestions that I think might be useful compromises.
Colin Watson [cjwatson@xxxxxxxxxx]