The most common nocturnal activity of an engineering student, particularly when exams are approaching, is to fire up a first-person shooter game. Let’s suppose the geek in you, fed up with all the trivialities a textbook has to offer, decides to challenge your friend to a deadly duel in OpenArena. According to Murphy’s Law, he won’t have it installed, and the Internet connection will go down at that very moment.
Now, being a geek, and sure as you are that the world is conspiring against you, you won’t give up so easily, will you? You will decide to see this through to the end by creating a local mirror of the Fedora Linux distribution, so that every package is ready to serve whenever you want it. Of course, you could work it all out yourself, but I consider it my duty to make it easier, leaving you free to take up more important duties like running an OpenArena server.
Now, having read so far, if you are not entirely sure what this is all about, let me tell you: it’s about mirroring Fedora repositories within your organisation or institute. The benefits: faster downloads for you and your friends, effective use of bandwidth and lowered cost.
According to Wikipedia, “In computing, a mirror is an exact copy of a data set. On the Internet, a mirror site is an exact copy of another Internet site.” When you install a new package on your Fedora system, using either PackageKit or Yum, the tool fetches the package from an Internet site, along with the libraries and other software it requires, and installs them on your computer. Software like OpenOffice.org or OpenArena is very big, and together with all its dependencies the download size may be in the order of hundreds of megabytes.
Let’s consider a simple calculation: if your organisation has 100 users and each downloads OpenOffice.org separately, the aggregate download is around 100 x 150 MB, which is 15,000 MB (roughly 15 GB). In a normal usage scenario, where users occasionally install new software and update their systems, such downloads can easily reach terabyte levels per month.
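The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the 150 MB figure is the article's rough estimate for OpenOffice.org, not a measured size. The second function assumes that, with a local mirror, each package crosses the Internet link only once.

```python
def aggregate_download_mb(users, size_mb_per_user):
    """Total Internet traffic when every user downloads separately."""
    return users * size_mb_per_user


def saving_with_mirror_mb(users, size_mb_per_user):
    """Traffic saved if a local mirror fetches the package only once.

    Assumption: the mirror downloads the package a single time and all
    users then install from the LAN.
    """
    return aggregate_download_mb(users, size_mb_per_user) - size_mb_per_user


if __name__ == "__main__":
    total = aggregate_download_mb(100, 150)
    saved = saving_with_mirror_mb(100, 150)
    print("Without a mirror: %d MB" % total)
    print("Saved by mirroring: %d MB" % saved)
```

Plugging in your own user count and typical package sizes gives a quick feel for whether a mirror is worth the disk space.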
In countries like India, where bandwidth is a costly commodity, it is hardly possible for an organisation to spend an astronomical amount on bandwidth, and this can easily hold back the adoption of FOSS.
The easy solution to this problem is to put up a server inside the institute or organisation, where all the content is downloaded and updated periodically, so that users get their software from this local server instead of the Internet. Since bandwidth inside a LAN costs next to nothing and a LAN usually offers much better throughput, mirroring can be an ideal way to reduce expenditure, and it can considerably speed up installations of new software or updates. It can even reduce the need for physical media like CDs or DVDs, as you can use the server for disk-less network installations.
In the subsequent sections, I will take you through the steps of setting up a Fedora mirror.
Mirroring does not cost much as far as hardware is concerned. If you are going to mirror the whole Fedora content, at least 1TB of disk space is needed. But if you are not an ISP or a big educational institute, you probably won’t need all the content available in the Fedora repositories. It should be fine for most organisations to keep 32-bit and 64-bit repositories of the last two releases, along with their updates. For example, if you are mirroring right now, it would be good to keep 32-bit (generally called x86) and 64-bit (x86_64) repositories of Fedora 10 and 11, along with their updates.
A server with approximately 250 GB of hard disk space (though the actual need will depend upon the content you want to keep), and 2 – 4 GB of RAM should do perfectly.
The software requirements for mirroring are also minimal. All you need is an Apache Web server or an FTP server. However, please check your version of Apache. If the version is 1.x or 2.0, you will need both the Apache and the FTP server, because earlier Apache servers cannot handle files over 2 GB in size; so you have to redirect the ISO download requests to the FTP server. However, if you are using Apache 2.1 or 2.2, you need not worry about this as large file handling support has been added in these versions. Here, we will explore mirroring only using Apache. Mirroring with FTP is similar and needs no remarkably different configuration.
The most essential requirement for mirroring is bandwidth. How long your download will take depends on the available bandwidth. Mirroring over a 5 Mbps leased line may take several days for each release being mirrored, but most of this content needs to be downloaded only once. Subsequent syncs will need much less bandwidth, often as little as a couple of hundred megabytes per day.
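To get a feel for these numbers, here is a rough back-of-the-envelope estimate in Python. It is a sketch under stated assumptions: the link speed is taken to be in megabits per second, 1 GB is counted as 1,000 MB, and the `utilisation` parameter (a hypothetical knob, not something any tool reports) stands for the fraction of the link the sync actually gets. The 250 GB figure is the disk estimate mentioned earlier, used here only as an example size.

```python
def transfer_days(size_gb, link_mbps, utilisation=1.0):
    """Estimate how many days a download of size_gb takes.

    size_gb     -- amount of data in gigabytes (1 GB = 1000 MB assumed)
    link_mbps   -- link speed in megabits per second (assumed unit)
    utilisation -- fraction of the link available to the sync, 0 < u <= 1
    """
    if link_mbps <= 0 or not (0 < utilisation <= 1):
        raise ValueError("link speed and utilisation must be positive")
    size_megabits = size_gb * 1000 * 8          # GB -> MB -> megabits
    seconds = size_megabits / (link_mbps * utilisation)
    return seconds / 86400.0                    # seconds in a day


if __name__ == "__main__":
    # Initial sync of about 250 GB over a fully available 5 Mbps line
    print("Initial sync: %.1f days" % transfer_days(250, 5))
    # The same sync if only half the link can be spared
    print("At 50%% utilisation: %.1f days" % transfer_days(250, 5, 0.5))
```

Real transfers are slower than this because of protocol overhead and other traffic on the link, so treat the result as a lower bound.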
If you want to be listed as a public Fedora mirror, offering downloads to people outside your organisation, the official bandwidth requirement is 100 Mbps. However, in countries like India, where very few public mirrors are available, this requirement is often relaxed. The first public Fedora mirror in India started on a 5 Mbps leased line, before other institutes like NIT-H, IIT-M and IIT-K stepped in.
What to mirror?
Though I have suggested that you choose to mirror the last two releases along with their updates, this would obviously depend on you. The complete list of directories, along with their sizes, is given here. You can choose what to mirror and what not to, depending on your organisational needs.
Public or private
It’s also very important to decide whether you want a public mirror, which serves content to people outside your organisation, or a private mirror, which serves only people inside it. If you don’t have substantial bandwidth (roughly 100 Mbps or more), it is better to go for a private mirror. However, in countries like India, where the number of mirrors is far lower than required, you can go public with 15-20 Mbps of bandwidth.
Thanks for this very useful article.
Re the mount ISO “use cp -p” or DIE … errr, not quite. ;-)
It’s a good idea to get used to doing the right thing, but if you don’t, then rsync -a (which implies -t) will save you in this case.
Missing files get copied completely. Identical date/time/size/name files get skipped. But filenames with different date stamps get checksummed on both sides — and only the differences are sent. In this case there *are* none, so you spend a bit of time (but not much bandwidth) figuring that out — but the entire file doesn’t come down.
(If a file _really has_ changed, then just the checksum and changes are transmitted, not the entire file.)
VERY good article besides that single nit.
@DamnitDog, you are right.
The rsync will only change the timestamp if it is already present, it won’t pull the entire content.
Sorry for the mistake.
Thanks for the useful post.
I tried the same, but I am getting the following error on starting the Apache service, after adding this line to httpd.conf:
Header set Cache-Control "must-revalidate"
Invalid command 'Header', perhaps misspelled or defined by a module not included in the server configuration
What can I do to resolve this? I think it is related to some module that is not included in my conf file. Can you please tell me the name of that module?
Thanks & Regards,
Your Fan :-)
A proxy mirror is a local mirror that does not sync the entire Fedora install tree. Instead, it serves files through a reverse caching proxy that connects to a public Fedora mirror and downloads files as needed.