Save Bandwidth by Setting Up a Fedora Mirror

The mirroring procedure

Having finished with the requirements, we now move on to actually setting up a Fedora mirror. Before you get your hands dirty, take some time to study the directory structure of the Fedora repository. You can find it here.

Synchronising content

Synchronising content means, put simply, copying the content of a Fedora mirror to your server in such a way that all the properties of the files and directories being transferred remain unchanged. As this is the most time-consuming step, involving a large number of file downloads, it is best to start it first and carry out the other necessary configuration while it pulls content from the server. The recommended tool for mirroring is rsync, a utility for incremental file transfer. Like FTP, rsync transfers files between a server and a client, but if a transfer breaks midway because of a network or power outage, running the same command again resumes from the point where it left off instead of starting over. From now on, we shall use the terms ‘synchronise’ or ‘pull’ instead of ‘file transfer’.

It is best to set up a new user account on your system, which will perform the synchronisation.

# useradd -r -m mirror

The directory structure of your mirror should match that of Fedora’s master mirrors. To achieve this, create the directories and give your mirror user write permissions:

# mkdir -p /var/www/html/pub/fedora/linux/releases
# chown -R mirror:mirror /var/www/html/pub
# find /var/www/html/pub -type d -exec chmod 0755 {} \;

If you wish to exclude some content from synchronising, create an exclude.txt file. You can put patterns into that file, and when rsync is told about it, it won’t pull any content that matches them. You can do this as your new mirror user:

# su - mirror
$ touch exclude.txt

An exclude.txt file typically looks like what follows:

#don’t sync any ppc content
ppc*

#don’t sync debug directories
debug*

#don’t sync source directories
source*

As you can see, you can put wildcard patterns (shell-style globs, not full regular expressions) in the exclude file. This means that you need not list the names of all the directories that you want to exclude. When you put ppc* in the exclude.txt file, no file or directory whose name starts with ppc will be pulled.
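One way to see what an exclude file will filter, before pulling anything over the network, is a local dry run. The following is only a sketch: the function name preview_excludes and the directory names it creates are made up for the illustration, and it assumes rsync is installed locally.

```shell
#!/bin/sh
# Sketch: preview which paths survive an exclude file, using a throwaway
# local tree. The directory names are hypothetical examples.
preview_excludes() {
    exclude_file=$1
    src=$(mktemp -d) || return 1
    dst=$(mktemp -d) || return 1
    mkdir -p "$src/i386/os" "$src/i386/debug" "$src/ppc/os" "$src/source/SRPMS"
    # -n: dry run, nothing is copied; -i: itemise what would be transferred.
    # The second column of the itemised output is the path name.
    rsync -a -n -i --exclude-from="$exclude_file" "$src/" "$dst/" |
        awk '{print $2}' | sort
    rm -rf "$src" "$dst"
}
```

Calling preview_excludes /home/mirror/exclude.txt with the exclude file shown above should list i386/ and i386/os/ but nothing beginning with ppc, debug or source, because a pattern without a slash is matched against the last component of every path.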

Now that we are finished with the exclude part, we are ready to pull in the actual content. The rsync command may look like what’s given below:

$ rsync -vaH --exclude-from=/home/mirror/exclude.txt --numeric-ids --delete --delete-after --delay-updates rsync://mirror.anl.gov/fedora/linux/releases/11 /var/www/html/pub/fedora/linux/releases/

This command will start pulling the Fedora 11 repository and put it into /var/www/html/pub/fedora/linux/releases/11.

Now, let’s see what this means. rsync, as stated earlier, is an incremental file transfer utility. -v stands for verbose mode, -a selects archive mode (recurse into directories and preserve permissions, ownership, timestamps and symbolic links), and -H means that the rsync run will preserve hard links between the files (which saves considerable amounts of disk space and reduces file transfers).

We then define what not to synchronise using --exclude-from. --numeric-ids transfers user and group IDs as numbers instead of mapping them by name. --delete removes files from your mirror that no longer exist on the server, --delete-after postpones those deletions until the transfer has finished, and --delay-updates holds every updated file in a temporary location and moves them all into place at the end of the run. Together, these options keep the old files and directories available until the synchronisation is complete. Finally, we define the remote rsync server and the destination directory.
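For repeated runs it can help to keep the command in a small script. The sketch below wraps the same options in a function; the names SRC, DEST, EXCLUDES, DRY_RUN and pull_release are assumptions introduced here, not part of rsync or of the original setup.

```shell
#!/bin/sh
# Sketch: the pull command as a reusable function.
# SRC has no trailing slash, so rsync creates the directory "11" under DEST.
SRC=${SRC:-rsync://mirror.anl.gov/fedora/linux/releases/11}
DEST=${DEST:-/var/www/html/pub/fedora/linux/releases/}
EXCLUDES=${EXCLUDES:-/home/mirror/exclude.txt}

pull_release() {
    # -v verbose, -a archive mode, -H preserve hard links
    # --numeric-ids     keep numeric user and group IDs
    # --exclude-from    skip anything matching the patterns in the file
    # --delete          remove local files that are gone from the server
    # --delete-after    do those deletions only after the transfer
    # --delay-updates   move updated files into place at the end of the run
    set -- rsync -v -a -H \
        --numeric-ids \
        --exclude-from="$EXCLUDES" \
        --delete --delete-after \
        --delay-updates \
        "$SRC" "$DEST"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Only show the command that would be run.
        echo "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling pull_release prints the command instead of running it, which is a cheap way to check the source and destination before a long transfer starts.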

If you are wondering which server to pull the repositories from, you can get a list of servers that provide the rsync service from the Fedora mirrorlist. Choose a reliable server near you. Also, don’t forget to drop a mail to the admin of the server, both as a matter of courtesy and to ensure that no outage is planned at their end in the next couple of days.

Saving some bandwidth

A little trick can save you a few gigabytes of download. If you are not yet familiar with the directory structure of the Fedora repositories, take extra care with the paths used here.

The ISO of the Fedora DVD resides at the Fedora/$architecture/iso/ directory. Also, the same contents of the DVD are at Fedora/$architecture/os/, but as extracted files and directories. For example, http://118.102.181.66/releases/11/Fedora/i386/os/ contains the files of http://118.102.181.66/releases/11/Fedora/i386/iso/Fedora-11-i386-DVD.iso. So if you download the ISO image first and then copy the content over to the os/ directory, you need not download the same content twice. Let’s see how we do it.

Once the download of the DVD ISO file is completed, mount it somewhere:

# mount -o loop /var/www/html/pub/fedora/linux/releases/11/Fedora/i386/iso/Fedora-11-i386-DVD.iso /mnt
# cp -prv /mnt/* /var/www/html/pub/fedora/linux/releases/11/Fedora/i386/os/
# umount /mnt

Similarly, you can repeat this for the x86_64 DVD ISO, if you are mirroring that architecture too.

Note: Be sure you use the -p option with cp. If you don’t, the copy operation will change the timestamps of the files being copied, and rsync will no longer recognise them as up to date. It will then have to checksum every copied file on both ends before accepting it, which costs time, although only the differences (here, none) are actually transferred.
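To confirm that a copy really kept the original modification times, a check along the following lines could be run while the ISO is still mounted. This is a sketch under assumptions: the function names same_mtime and check_copy are hypothetical, and the -nt file test, though widely supported (bash, dash, ksh), is not part of strict POSIX sh.

```shell
#!/bin/sh
# Sketch: list files whose copy is missing or has a different modification
# time from the original.
same_mtime() {
    # True when both files exist and neither is newer than the other.
    [ -e "$1" ] && [ -e "$2" ] && [ ! "$1" -nt "$2" ] && [ ! "$2" -nt "$1" ]
}

check_copy() {
    src=$1
    dst=$2
    # Walk the original tree and compare each file with its copy.
    ( cd "$src" && find . -type f ) | while IFS= read -r f; do
        same_mtime "$src/$f" "$dst/$f" || echo "timestamp differs: $f"
    done
}
```

For the copy above, check_copy /mnt /var/www/html/pub/fedora/linux/releases/11/Fedora/i386/os should print nothing if cp -p did its job.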

If the download stops

In the course of synchronising, it is quite possible that you will receive a few messages like this: “Suddenly the Dungeon collapses!! – You die…” and the download will stop. Don’t panic. It only means that rsync has stopped for some reason. Just press the up arrow key and then Enter to run the same command again. rsync will pick up from where it left off. Also, because of --delay-updates, you won’t see the new files in their final place until the run completes. You can assure yourself that the download is indeed progressing by running this command periodically and watching the total grow:

# du -m /var/www/html/ | tail -n 1

Let rsync run its course. You have nothing to do other than check periodically whether it has stopped. In the meantime, let’s do the other necessary configuration.
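If you would rather not watch the terminal, the rerun step can be automated. The following is a minimal sketch, not a hardened tool: retry is a hypothetical helper that reruns any command until it succeeds or a maximum number of attempts is reached.

```shell
#!/bin/sh
# Sketch: rerun a command until it exits successfully.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        "$@" && return 0
        # Give up once the attempt limit is reached.
        [ "$attempt" -ge "$max" ] && return 1
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}
```

For example, prefixing the rsync command shown earlier with retry 20 60 would rerun it up to 20 times, waiting a minute between attempts. Because rsync skips files that are already in place, each rerun continues where the previous one stopped.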

  • ashishkumar2703

    Just commenting to get a wave invite. ;)

  • tinhed

    Thanks for this very useful article.

  • DamnitDog

    Re the mount ISO “use cp -p” or DIE … errr, not quite. ;-)

    It’s a good idea to get used to doing the right thing, but if you don’t, then rsync -a (which implies -t) will save you in this case.

    Missing files get copied completely. Identical date/time/size/name files get skipped. But filenames with different date stamps get checksummed on both sides — and only the differences are sent. In this case there *are* none, so you spend a bit of time (but not much bandwidth) figuring that out — but the entire file doesn’t come down.

    (If a file _really has_ changed, then just the checksum and changes are transmitted, not the entire file.)

    VERY good article besides that single nit.

  • susmit

    @DamnitDog, you are right.

    The rsync will only change the timestamp if it is already present, it won’t pull the entire content.

    Sorry for the mistake.

  • http://linuxexplore.wordpress.com Rahul Panwar

    Thanks for the useful post.

    I try the same, but i am getting the error, on starting the apache service, after adding the following lines in httpd.conf:

    Header set Cache-Control "must-revalidate"
    ExpiresActive On
    ExpiresDefault "now"

    ERROR:
    Invalid command ‘Header’, perhaps misspelled or defined by a module not included in the server configuration

    What can i do to resolve this? I think this is related to some module that is not included in my conf file. Can you please tell me the name of that.

    Thanks & Regards,
    Your Fan :-)
    Rahul Panwar

  • http://www.mspy.com/ Tomfille

    A proxy mirror is a local mirror that does not sync the entire Fedora install tree. Instead, it serves files through a reverse caching proxy that connects to a public Fedora mirror and downloads files as needed.

All published articles are released under Creative Commons Attribution-NonCommercial 3.0 Unported License, unless otherwise noted.