Re: Proposal: A new approach to differential debs
- Date: Sun, 13 Aug 2017 13:11:15 -0400
- From: Peter Silva <peter@xxxxxxxxxxxxxxx>
- Subject: Re: Proposal: A new approach to differential debs
> apt by default automatically deletes packages files after a successful install,
I don't think it does that. It seems keep them around after install,
and even multiple
versions. I don't know the algorithm for the cache, but it doesn't do
what you say.
On my debian box, I have nothing but a straight install, changed nothing.
blacklab% cd /var/cache/apt/archives
blacklab% ls -al | wc -l
blacklab% ls -al xsensors*
-rw-r--r-- 1 root root 17688 Mar 6 21:11 xsensors_0.70-3+b1_amd64.deb
blacklab% dpkg -l xsensors
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
ii xsensors 0.70-3+b1 amd64
hardware health information viewer
blacklab% ls -al xserver-xorg-video-nvidia_375*
-rw-r--r-- 1 root root 3129858 Feb 23 16:13
-rw-r--r-- 1 root root 3138320 May 28 11:28
-rw-r--r-- 1 root root 3101944 Jul 16 17:14
On Sun, Aug 13, 2017 at 12:43 PM, Julian Andres Klode <jak@xxxxxxxxxx> wrote:
> On Sun, Aug 13, 2017 at 10:53:16AM -0400, Peter Silva wrote:
>> You are assuming the savings are substantial. That's not clear. When
>> files are compressed, if you then start doing binary diffs, well it
>> isn't clear that they will consistently be much smaller than plain new
>> files. it also isn't clear what the impact on repo disk usage would
> The suggestion is for it to be opt-in, so packages can optimize their file
> layout when opting in - for example, the gzipped changelogs and stuff can be
> the rsyncable ones (this might need some changes in debhelper to generate
> diffable files if the opt-in option is set).
>> The most straigforward option:
>> The least intrusive way to do this is to add differential files in
>> addition to the existing binaries, and any time the differential file,
>> compared to a new version exceeds some threhold size (for example:
>> 50%) of the original file, then you end up adding the sum total of the
>> diff files in addition to the regular files to the repos. I haven't
>> done the math, but it's clear to me that it ends up being about double
>> the disk space with this approach. it's also costly in that all those
>> files have to be built and managed, which is likely a substantial
>> ongoing load (cpu/io/people) I think this is what people are
>> objecting to.
> I would throw away patches that are too large, obviously. I think
> that twice the amount of space seems about reasonable, but then we'll
> likely restrict that to
> (a) packages where it's actually useful
> (b) to updates and their base distributions
> So we don't keep oldstable->stable diffs, or unstable->unstable
> diffs around. It's questionable if we want them for unstable at
> all, it might just be too much there, and we should just use them
> for stable-(proposed-)updates (and point releases) and stable/updates;
> and experimental so we don't accidentally break it during the
> dev cycle.
>> A more intrusive and less obvious way to do this is to use zsync (
>> http://zsync.moria.org.uk/. ) With zsync, you build tables of content
>> for each file, using the same 4k blocking that rsync does. To handle
>> compression efficiently, there needs to be an understanding of
>> blocking, so a customized gzip needs to be used. With such a format,
>> you produce the same .deb's as today, with the .zsyncs (already in
>> use?) and the author already provides some debian Packages files as
>> examples. The space penalty here is probably only a few percent.
>> the resource penalty is one read through each file to build the
>> indices, and you can save that by combining the index building with
>> the compression phase. To get differential patches, one just fetches
>> byte-ranges in the existing main files, so no separate diff files
>> needed. And since the same mechanisms can (should?) be used for repo
>> replication, the added cost is likely a net savings in bandwidth
>> usage, and relatively little complexity.
>> The steps would be:
>> -- add gzip-ware flavour to package creation logic
>> -- add zsync index creation on repos. (potentially combined with first step.)
>> -- add zsync client to apt-* friends.
>> To me, this makes a lot of sense to do just for repo replication,
>> completely ignoring the benefits for people on low bandwidth lines,
>> but it does work for both.
> It does not work for client use. apt by default automatically deletes
> packages files after a successful install, and upgrades are fairly infrequent
> (so an expiration policy would not help much either); so we can't rely on old
> packages to be available for upgradexs.
> Debian Developer - deb.li/jak | jak-linux.org - free software dev
> | Ubuntu Core Developer |
> When replying, only quote what is necessary, and write each reply
> directly below the part(s) it pertains to ('inline'). Thank you.