Exploring Software: Delta Packages — Speed Up Updates By Downloading Less

Delta packages

An RPM package is essentially an archive with packaging and control information; for example, you can extract the contents of a .rpm file using the rpm2cpio utility. A delta RPM is created by doing a binary diff of each file in the archive, and the diff files are repackaged as a .drpm archive. Control information is also needed in the archive, for example to ensure that unchanged or deleted files in the package are handled properly.

After downloading the .drpm file, a utility on your system can recreate the new (updated) RPM file by taking the data from the older version of the package already installed on disk and applying the diffs contained in the delta RPM.
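The idea of rebuilding a new file from an old one plus a small diff can be illustrated with a toy delta format. The sketch below is not bsdiff, xdelta3 or the .drpm format; it is a simple copy/add scheme, assumed here purely for illustration, in which the delta is a list of "copy these bytes from the old file" and "add these literal bytes" operations. The block size of 16 bytes is an arbitrary choice.

```python
from typing import List, Tuple, Union

# A toy delta: ("copy", offset, length) reuses bytes from the old file,
# ("add", data) carries bytes that exist only in the new file.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(old: bytes, new: bytes, block: int = 16) -> List[Op]:
    """Encode `new` as copies from `old` plus literal insertions."""
    # Index the old file's block-aligned chunks by their content.
    index = {}
    for off in range(0, len(old) - block + 1, block):
        index.setdefault(old[off:off + block], off)

    ops: List[Op] = []
    literal = bytearray()
    i = 0
    while i < len(new):
        off = index.get(new[i:i + block])
        if off is None:
            # No matching block here: emit this byte literally.
            literal.append(new[i])
            i += 1
            continue
        # Extend the match forward as long as the bytes agree.
        length = block
        while (i + length < len(new) and off + length < len(old)
               and new[i + length] == old[off + length]):
            length += 1
        if literal:
            ops.append(("add", bytes(literal)))
            literal = bytearray()
        ops.append(("copy", off, length))
        i += length
    if literal:
        ops.append(("add", bytes(literal)))
    return ops


def apply_delta(old: bytes, ops: List[Op]) -> bytes:
    """Rebuild the new file from the old (installed) file and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += old[off:off + length]
        else:
            out += op[1]
    return bytes(out)
```

For example, with `old = b"The quick brown fox jumps over the lazy dog. " * 4` and `new = old.replace(b"lazy", b"sleepy")`, `apply_delta(old, make_delta(old, new))` returns `new`, while only a few dozen literal bytes need to be shipped. Real tools use far more sophisticated matching, but the division of labour is the same: the bulk of the data comes from the copy already on disk.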

I have been using delta RPMs with the yum-presto plugin for a couple of years, starting with the experimental repository for Fedora 8. The experience has been very satisfying; it has let me keep my Fedora installation up to date with reasonable effort despite limited bandwidth. Now, even though my bandwidth is better, updates seem to be more frequent, so the delta RPM repository remains very valuable.

There were two ideas I wanted to explore: the first was whether delta RPMs could be used across Fedora versions (distribution upgrades), and the second was whether the same, or a similar, solution was possible with Ubuntu.

Fedora 12 to Fedora 13 delta RPMs

Let’s first consider an upgrade across Fedora distribution versions. My Fedora 13 system had over 2,000 RPMs installed. Of these, almost 1,800 had a corresponding RPM in Fedora 12. As I had updated Fedora 12 just before upgrading to Fedora 13, I was curious to know how much could have been saved if delta packages had been available.

The deltarpm package contains the makedeltarpm program, which was just what I needed. It uses the bsdiff algorithm to create diffs between the old and new versions of an RPM. I wrote a short Python program that tried to match each RPM installed on Fedora 13 with one from the package cache I had saved from Fedora 12. For each matching pair found, the program ran makedeltarpm to create a .drpm. (The applydeltarpm utility would then be used to recreate the new RPM from the delta RPM, as explained above.)
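The author's program is not shown, so the following is only a minimal sketch of the same idea under stated assumptions: the old and new .rpm files sit in two directories (the names `old_dir`, `new_dir` and `out_dir` are hypothetical), packages are paired by name and architecture parsed from the usual `name-version-release.arch.rpm` file name, and makedeltarpm is invoked with its positional arguments `oldrpm newrpm deltarpm`. By default it only builds the command lines (a dry run).

```python
import re
import subprocess
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# name-version-release.arch.rpm; the name itself may contain hyphens,
# but version and release may not.
RPM_RE = re.compile(
    r"^(?P<name>.+)-(?P<ver>[^-]+)-(?P<rel>[^-]+)\.(?P<arch>[^.]+)\.rpm$")


def rpm_key(filename: str) -> Optional[Tuple[str, str]]:
    """Return (name, arch) for an RPM file name, or None if it doesn't parse."""
    m = RPM_RE.match(filename)
    return (m.group("name"), m.group("arch")) if m else None


def pair_rpms(old: List[str], new: List[str]) -> List[Tuple[str, str]]:
    """Pair each new RPM with an old RPM of the same name and architecture."""
    by_key: Dict[Tuple[str, str], str] = {}
    for f in old:
        key = rpm_key(f)
        if key is not None:
            by_key[key] = f
    pairs = []
    for f in new:
        key = rpm_key(f)
        match = by_key.get(key) if key is not None else None
        # Skip packages with no old counterpart, or that did not change.
        if match is not None and match != f:
            pairs.append((match, f))
    return pairs


def make_deltas(old_dir: Path, new_dir: Path, out_dir: Path,
                dry_run: bool = True) -> List[List[str]]:
    """Build (and, unless dry_run, execute) one makedeltarpm command per pair."""
    old = sorted(p.name for p in old_dir.glob("*.rpm"))
    new = sorted(p.name for p in new_dir.glob("*.rpm"))
    cmds = []
    for o, n in pair_rpms(old, new):
        drpm = out_dir / (n[:-len(".rpm")] + ".drpm")
        cmd = ["makedeltarpm", str(old_dir / o), str(new_dir / n), str(drpm)]
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return cmds
```

Pairing on name and architecture (rather than the full file name) is what lets a Fedora 12 package be matched with its Fedora 13 successor even though version and release both change.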

I found that, overall, the delta RPMs were about a third of the size of the original RPMs, and about a quarter of the delta RPMs were less than 20 per cent of the original size. Using delta RPMs, the download for these packages would have dropped from 1.3 GB to a little more than 400 MB, a saving of about 900 MB.
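Figures like these come from summing the sizes of each (full RPM, delta RPM) pair. As a sketch, assuming you have collected those sizes in bytes (the function name `summarize` and the sample numbers in the check below are made up for illustration), the overall ratio, the saving and the share of "small" deltas can be computed like this:

```python
from typing import Dict, List, Tuple


def summarize(pairs: List[Tuple[int, int]],
              threshold: float = 0.20) -> Dict[str, float]:
    """pairs holds (full_package_bytes, delta_package_bytes) per package."""
    full = sum(f for f, _ in pairs)
    delta = sum(d for _, d in pairs)
    # Count deltas strictly smaller than `threshold` times their full package.
    small = sum(1 for f, d in pairs if d < threshold * f)
    return {
        "full_bytes": full,
        "delta_bytes": delta,
        "overall_ratio": delta / full,  # about 1/3 in the Fedora experiment
        "saved_bytes": full - delta,
        "fraction_below_threshold": small / len(pairs),
    }
```

As a sanity check on the numbers above: 400 MB out of 1.3 GB is a ratio of roughly 0.31, which matches "about a third", and 1.3 GB minus 400 MB leaves the 900 MB saving.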

Obviously, a delta repository would be of little use when installing (or upgrading) from the DVD. However, for anyone upgrading with preupgrade or another online upgrade option, the benefit would be considerable.

The same logic should be implementable for Debian packages as well.

Ubuntu and more

A little searching showed that there is a debdelta package, which has not been widely used. There is also a delta repository for Debian; its date-time stamps indicate that it is current, though I have not tried it.

A posting by Onkar Shinde on the Ubuntu India mailing list pointed to an effort to create one for Ubuntu. Sadly, the people who would find it most useful are also those most constrained by server resources, in particular disk space and bandwidth.

Using debdelta is as simple as using makedeltarpm. I experimented with it using the packages on the Ubuntu 10.04 CD, about 240 of which had been updated on my system. The size of the update dropped from around 200 MB to merely 12 MB, just 6 per cent of the original size. About 70 per cent of the delta packages were less than 10 per cent of their original size; this may be unusually low. Still, it is obvious that debdelta would be as useful for Ubuntu as delta RPMs are for Fedora. It just needs someone with the server resources to generously set up a delta repository, along with scripts to keep it in sync with the official repository.
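The Debian side of the experiment can be sketched the same way. This assumes the standard `name_version_arch.deb` file naming (package names and versions cannot contain underscores) and debdelta's positional arguments `fromfile tofile patchout`; the output name `name_oldver_newver_arch.debdelta` is a hypothetical scheme chosen here for readability, and the function only builds command lines rather than running them.

```python
import re
from typing import Dict, List, Optional, Tuple

# Debian archives are named name_version_arch.deb.
DEB_RE = re.compile(r"^(?P<name>[^_]+)_(?P<ver>[^_]+)_(?P<arch>[^_]+)\.deb$")


def deb_fields(filename: str) -> Optional[Tuple[str, str, str]]:
    """Split a .deb file name into (name, version, arch), or None."""
    m = DEB_RE.match(filename)
    return (m.group("name"), m.group("ver"), m.group("arch")) if m else None


def debdelta_commands(old: List[str], new: List[str]) -> List[List[str]]:
    """One `debdelta OLD NEW DELTA` command per package whose version changed."""
    old_by_key: Dict[Tuple[str, str], Tuple[str, str]] = {}
    for f in old:
        fields = deb_fields(f)
        if fields:
            name, ver, arch = fields
            old_by_key[(name, arch)] = (f, ver)
    cmds = []
    for f in new:
        fields = deb_fields(f)
        if not fields:
            continue
        name, ver, arch = fields
        prev = old_by_key.get((name, arch))
        if prev is None or prev[1] == ver:
            continue  # new package, or unchanged version
        # Hypothetical naming scheme for the delta file.
        delta = f"{name}_{prev[1]}_{ver}_{arch}.debdelta"
        cmds.append(["debdelta", prev[0], f, delta])
    return cmds
```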

It was nice to come across interest in delta repositories in other distributions too. Since I also use Arch Linux, a post on the subject there was a welcome find. As expected, one person with the username “sabooky” has taken the lead and created a delta repository for i686 systems.

At present, it uses xdelta3 rather than bsdiff. It looks very promising and is very easy to use, since the developers of pacman (the Arch Linux package manager) had already integrated support for delta repositories.

There is bound to be more activity in this area. All delta repositories stand to benefit from the work being done on Courgette, the differential update technique developed for Google Chrome updates. Remarkably, the statistics shown as an example (for a Chromium update) were as follows:

  • Full update: 10,385,920 bytes
  • bsdiff update: 704,512 bytes
  • Courgette update: 78,848 bytes
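To put those byte counts in perspective, a quick calculation using only the three figures listed above shows that the bsdiff update is about 6.8 per cent of the full update, the Courgette update is under 1 per cent, and Courgette is roughly nine times smaller than bsdiff. The helper name `relative_sizes` is just for illustration.

```python
def relative_sizes(full: int, bsdiff: int, courgette: int) -> dict:
    """Express each update size as a fraction of the full download."""
    return {
        "bsdiff_vs_full": bsdiff / full,
        "courgette_vs_full": courgette / full,
        "bsdiff_vs_courgette": bsdiff / courgette,
    }


# Chromium example figures, in bytes.
ratios = relative_sizes(10_385_920, 704_512, 78_848)
for label, value in ratios.items():
    print(f"{label}: {value:.4f}")
```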

I can’t wait to see which distribution implements a repository with Courgette first!

Feature image courtesy: Marcus Mo. Reused under the terms of CC-BY-NC-ND 2.0 License.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.
Open Source For You is powered by WordPress, which gladly sits on top of a CentOS-based LEMP stack.
