Vital data can be lost because of physical damage to the storage hardware or because of a virus. That’s why it should be backed up using a suitable backup method and stored on a reliable storage medium. A number of open source software tools can help businesses back up data automatically, and a few of them are discussed in this article.
Data can be backed up in various ways, but the most commonly used types of backup are full, differential, incremental and reverse-incremental backups. Each of these can be done manually or automated using backup software. The resulting backups are kept in a repository, which is a place where data is stored safely, in an organised manner, on storage media.
In a full backup, all the data is copied from the source storage (usually the hard drive) to the storage media every time a backup is taken, and is kept in the repository. For example, if a full backup of all the data is taken on Day 1, then on Day 2 all the data, including anything that has changed, is copied to the backup storage media again. The same is done on Day 3, Day 4, and so on. This type of backup is reliable and simple to restore, but every run needs a lot of storage space and time.
A differential backup copies the data that has changed since the last full backup. First, a full backup is created. After that, each differential backup copies only the data that differs from that full backup, and is stored in the repository.
For example, on Day 1 a full backup of the source data is created. On Day 2, only the data that changed since Day 1 is backed up. On Day 3, all the data that changed since Day 1 (including the Day 2 changes) is backed up again, and the same is repeated on Day 4, and so on. This type of backup is faster than a full backup and initially takes less storage space. However, each differential backup grows larger as changes accumulate after the last full backup. Restoring is fairly simple, because only the full backup and the latest differential backup are needed. Because of the growing storage requirement, this type of backup is usually done on an HDD or a tape.
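To see how quickly differential backups grow, here is a minimal Python sketch of the storage arithmetic. It assumes, purely for illustration, that a source of `full_gb` gigabytes has a fresh, non-overlapping `daily_change_gb` gigabytes of changes every day. Real changes often overlap, so actual differential backups usually grow more slowly. The function names are hypothetical.

```python
def full_scheme_sizes(full_gb: float, days: int) -> list[float]:
    """Size of each day's backup when a full backup is taken every day."""
    return [full_gb] * days


def differential_scheme_sizes(full_gb: float, daily_change_gb: float, days: int) -> list[float]:
    """Size of each day's backup under a full-then-differential scheme.

    Day 1 is a full backup. Day n (n >= 2) copies everything changed since
    Day 1, which under the non-overlapping assumption is n - 1 days of change.
    """
    sizes = [full_gb]
    for n in range(2, days + 1):
        sizes.append(daily_change_gb * (n - 1))
    return sizes


if __name__ == "__main__":
    days = 7
    full = full_scheme_sizes(100, days)
    diff = differential_scheme_sizes(100, 2, days)
    print("full:", full, "total", sum(full))
    print("differential:", diff, "total", sum(diff))
```

Consider a 100GB source with 2GB of daily change and a week of backups. The full scheme uses 700GB and the differential scheme uses 142GB. However, the Day 7 differential (12GB) is already six times the size of the Day 2 one.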
An incremental backup also starts with a full backup. After that, each run copies only the data that changed since the last backup of any kind. By contrast, a differential backup copies all the data changed since the last full backup.
For example, on Day 1 a full backup is created. On Day 2, only the data that changed since Day 1 is copied to the backup storage medium. On Day 3, only the changes made since Day 2 are copied, and the same is done on Day 4. Incremental backups take less storage space and less time than differential backups, because each run copies only the most recent changes. The trade-off is a slower restore, because the full backup and every incremental backup taken after it must be applied in order. Each incremental backup is small, so this type of backup can also be done on optical discs like DVDs and Blu-ray discs.
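The difference between differential and incremental backups comes down to two things: the reference point used when picking files, and how many backups a restore needs. The following Python sketch models both under some simplifying assumptions. Each file’s last modification is recorded as a day number (`mtimes`), and a backup taken on day d captures every change made up to and including that day. The backup history is a list of `(day, kind)` tuples. All names are hypothetical.

```python
def files_to_copy(mtimes: dict[str, int], history: list[tuple[int, str]], kind: str) -> set[str]:
    """Pick the files a new backup of the given kind must copy."""
    if kind == "full" or not history:
        return set(mtimes)  # nothing earlier to compare with: copy everything
    if kind == "differential":
        # Reference point: the most recent *full* backup.
        since = max(day for day, k in history if k == "full")
    elif kind == "incremental":
        # Reference point: the most recent backup of *any* kind.
        since = max(day for day, _ in history)
    else:
        raise ValueError(f"unknown backup kind: {kind}")
    return {path for path, mtime in mtimes.items() if mtime > since}


def restore_chain(history: list[tuple[int, str]], day: int) -> list[tuple[int, str]]:
    """Backups needed to restore the data as it was on `day`.

    Assumes the backups after the last full one all use the same scheme.
    """
    upto = [b for b in history if b[0] <= day]
    start = max(i for i, (_, k) in enumerate(upto) if k == "full")
    chain = [upto[start]]
    after = upto[start + 1:]
    if after and after[-1][1] == "differential":
        chain.append(after[-1])  # only the latest differential is needed
    else:
        # every incremental since the full backup, applied in order
        chain.extend(b for b in after if b[1] == "incremental")
    return chain
```

This sketch ignores deleted files. Real backup tools also have to record deletions so that a restore does not bring back files that were removed.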
In a reverse-incremental backup, each new incremental backup is merged into the existing full backup to create an up-to-date synthetic full backup. The data that each merge replaces is kept as a reverse increment, so earlier versions can still be restored.
For example, on Day 1 a full backup is created. On Day 2, an incremental backup of only the changed data is taken and merged into the full backup created on Day 1. The process is repeated on Day 3, Day 4, and so on, so that a synthetic full backup is available for each day. This process takes a little longer than a normal incremental backup. However, restoring the latest data is much faster, because the synthetic full backup can be restored directly without applying a chain of incremental backups. This type of backup can also be done on optical discs, but tapes are more suitable for creating the synthetic full backup.
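Here is a minimal Python model of a reverse-incremental repository. It assumes that each snapshot is a dictionary mapping file paths to contents, and the class and method names are hypothetical. The repository always holds the newest synthetic full backup. Each backup run merges the changes into it and saves a reverse delta containing the replaced data. Older states can then be rebuilt by undoing the deltas from newest to oldest.

```python
_ABSENT = object()  # marks "this file did not exist in that version"


class ReverseIncrementalRepo:
    def __init__(self, initial: dict[str, str]):
        self.full = dict(initial)  # always the newest (synthetic) full backup
        self.reverse_deltas: list[dict[str, object]] = []  # oldest first

    def backup(self, current: dict[str, str]) -> None:
        """Merge the changes in `current` into the full backup."""
        delta = {}
        for path in self.full.keys() | current.keys():
            old = self.full.get(path, _ABSENT)
            new = current.get(path, _ABSENT)
            if old != new:
                delta[path] = old  # remember what this run overwrites
                if new is _ABSENT:
                    del self.full[path]  # file was deleted at the source
                else:
                    self.full[path] = new  # merge the change into the full backup
        self.reverse_deltas.append(delta)

    def restore(self, backups_ago: int = 0) -> dict[str, str]:
        """Rebuild the data as it was `backups_ago` backup runs ago."""
        if not 0 <= backups_ago <= len(self.reverse_deltas):
            raise ValueError("no such backup")
        state = dict(self.full)
        # Undo the most recent runs first.
        for delta in reversed(self.reverse_deltas[len(self.reverse_deltas) - backups_ago:]):
            for path, old in delta.items():
                if old is _ABSENT:
                    state.pop(path, None)
                else:
                    state[path] = old
        return state
```

Restoring the latest data (`restore()`) needs no deltas at all, which is the advantage described above. Only older restores pay the cost of undoing changes.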
Of these types, incremental and reverse-incremental backups are generally preferred. They can also be used at home for personal data, but backup software is needed to automate them. Many backup tools also offer Continuous Data Protection (CDP), in which every change to the source data is automatically copied to the backup storage media in real time.
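Real CDP products typically hook into the file system or storage layer to capture every write. The Python sketch below only approximates the idea by polling. It repeatedly scans a source directory and copies any file whose modification time has changed since the previous scan. The function names, the polling interval and the directory layout are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def copy_changes(src: Path, dst: Path, seen: dict[str, float]) -> list[str]:
    """Copy new or modified files from src to dst; return their relative paths."""
    copied = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        mtime = path.stat().st_mtime
        if seen.get(str(rel)) != mtime:
            target = dst / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            seen[str(rel)] = mtime
            copied.append(str(rel))
    return sorted(copied)


def watch(src: Path, dst: Path, interval: float = 1.0, passes: Optional[int] = None) -> None:
    """Poll src every `interval` seconds; run forever unless `passes` is given."""
    seen: dict[str, float] = {}
    done = 0
    while passes is None or done < passes:
        for rel in copy_changes(src, dst, seen):
            print("backed up", rel)
        done += 1
        time.sleep(interval)
```

Polling is simple, but it can miss a file that changes twice between scans, and it wastes work on large directory trees. That is why real CDP tools react to change notifications instead.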
Backups can be made on various storage media, such as hard drives, optical discs (CDs, DVDs and Blu-ray discs) and magnetic tapes. A hard drive is the most common storage medium for backups, but choosing the right type and capacity is the biggest challenge. There are two types of drives: the Hard Disk Drive (HDD) and the Solid State Drive (SSD). HDDs are the traditional type, whereas SSDs are newer and much faster. HDDs store data on spinning magnetic platters and have mechanical parts, whereas SSDs store data on NAND flash memory chips and have no moving parts. That’s why SSDs are more durable than HDDs and less prone to data loss from a physical impact or shock. However, SSDs cost considerably more per gigabyte than HDDs, so they are not usually preferred for storing huge amounts of data.
When selecting an HDD, check its spindle speed in RPM, because 7200 RPM drives are faster than slower (for example, 5400 RPM) ones. An HDD’s transfer rate depends largely on its RPM. A 7200 RPM SATA III HDD can reach up to about 300MB/s at best, whereas a SATA SSD can approach the interface limit of about 600MB/s. Although SSDs are faster, they are less suited to keeping data for long periods without power. SSDs store data as electrical charge in flash cells, and this charge slowly leaks away. As a result, data on an unpowered SSD can become corrupted sooner than data on an HDD, which stores data magnetically. A typical SSD is often quoted as holding data for up to 10 years without power, although retention drops as the flash wears out. An HDD, by comparison, is often quoted as holding data for 30 years or more. Finally, choose a capacity large enough for your backups without overpaying for space you will never use.
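A rough capacity estimate can help when choosing a drive. The Python sketch below assumes a retention policy of `cycles_kept` backup cycles. Each cycle is made up of one full backup plus a number of incremental backups of roughly `daily_change_gb` each. The sketch also adds a safety margin (`headroom`, 20% by default) for growth. The drive sizes listed are only examples, and all names and numbers are assumptions.

```python
from typing import Optional

EXAMPLE_DRIVE_SIZES_TB = [1, 2, 3, 4, 6, 8, 12]  # illustrative sizes only


def required_capacity_gb(full_gb: float, daily_change_gb: float,
                         incrementals_per_cycle: int, cycles_kept: int,
                         headroom: float = 0.2) -> float:
    """Space for `cycles_kept` cycles of one full plus N incremental backups."""
    per_cycle = full_gb + incrementals_per_cycle * daily_change_gb
    return per_cycle * cycles_kept * (1 + headroom)


def smallest_drive_tb(needed_gb: float, sizes_tb=EXAMPLE_DRIVE_SIZES_TB) -> Optional[int]:
    """Smallest listed drive that fits, in decimal units (1TB = 1000GB) as drive makers use."""
    for size in sorted(sizes_tb):
        if size * 1000 >= needed_gb:
            return size
    return None  # nothing in the list is big enough
```

Take a 500GB source with about 10GB of daily change, weekly cycles of one full backup plus six incremental backups, and four cycles kept. This works out to roughly 2,688GB, so a 3TB drive would fit.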
Apart from hard drives, backups can also be made on optical discs. An optical disc is a storage medium on which data is written and read using a laser beam. Common types include CDs, DVDs and Blu-ray discs. A CD can store up to 700MB. A DVD can store 4.7GB (single-sided, single-layer), 8.5GB (single-sided, dual-layer), 9.4GB (double-sided, single-layer) or 17.08GB (double-sided, dual-layer). A Blu-ray disc can store 25GB (single-layer) or 50GB (dual-layer).
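These capacities make it easy to estimate how many discs a backup needs. The Python sketch below uses the nominal capacities quoted above (formatted capacity is slightly lower in practice, and 700MB is taken as 0.7GB). It ignores compression and file-system overhead, and the table and function names are illustrative.

```python
import math

# Nominal capacities in GB, as quoted above.
DISC_CAPACITY_GB = {
    "CD": 0.7,
    "DVD SS/SL": 4.7,
    "DVD SS/DL": 8.5,
    "DVD DS/SL": 9.4,
    "DVD DS/DL": 17.08,
    "BD SL": 25.0,
    "BD DL": 50.0,
}


def discs_needed(backup_gb: float, disc: str) -> int:
    """Whole discs needed to hold a backup of `backup_gb` gigabytes."""
    return math.ceil(backup_gb / DISC_CAPACITY_GB[disc])


if __name__ == "__main__":
    for disc in DISC_CAPACITY_GB:
        print(f"{disc}: {discs_needed(100, disc)} discs for 100GB")
```

A 100GB backup works out to 143 CDs, 22 single-layer DVDs or 4 single-layer Blu-ray discs.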
Optical discs are also popular for backups because they are cheap. There are two formats of optical discs: recordable (R) and rewritable (RW). For Blu-ray, the rewritable format is called recordable erasable (RE). A recordable disc can be written only once, whereas a rewritable disc can be erased and reused many times. Therefore, rewritable DVDs (DVD-RW/DVD+RW) and recordable erasable Blu-ray discs (BD-RE) are preferred for regular backups. Archival-grade discs are claimed to last 100 years or more, although ordinary recordable and rewritable discs usually have a much shorter life span.
Optical discs can be a good option for backups, but they are easily damaged if not handled carefully. A scratch or other damage on the readable surface can corrupt the stored data. That’s why magnetic tape is a more robust alternative. The most widely used tape format for backing up digital data is Linear Tape-Open (LTO) Ultrium. An LTO-8 cartridge can store up to 12TB of uncompressed data, or 30TB of compressed data (assuming a 2.5:1 compression ratio).
Tapes are good for making multiple copies of a backup, because cartridges are easy to duplicate and replicate. Their high sustained transfer rates also make it quick to restore large backups, which is why many data centres still use tape drives in their backup repositories. However, tapes are not recommended for home backups, because tape drives and cartridges cost far more than optical discs. LTO Ultrium tapes also have a longer shelf life than SSDs, and can retain data for up to 30 years.
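The compressed capacity depends entirely on how well the data compresses. LTO-8’s 30TB rating assumes a 2.5:1 ratio. The Python sketch below estimates how many LTO-8 cartridges a backup needs. It uses the standard library’s zlib on a sample of the data as a rough stand-in for the drive’s own hardware compression. This is an assumption: the two algorithms differ, so treat the estimate as approximate. The function names are hypothetical.

```python
import math
import zlib

LTO8_NATIVE_TB = 12.0  # uncompressed capacity of one LTO-8 cartridge


def compression_ratio(sample: bytes) -> float:
    """Rough compressibility of a data sample (original size / compressed size)."""
    return len(sample) / len(zlib.compress(sample))


def cartridges_needed(data_tb: float, ratio: float = 1.0) -> int:
    """LTO-8 cartridges needed, assuming the data compresses at `ratio`:1."""
    # Incompressible data is treated as 1:1 rather than expanding.
    effective_tb = LTO8_NATIVE_TB * max(ratio, 1.0)
    return math.ceil(data_tb / effective_tb)


if __name__ == "__main__":
    sample = b"log line: backup completed OK\n" * 1000  # stand-in for real data
    r = compression_ratio(sample)
    print(f"sample ratio {r:.1f}:1 -> {cartridges_needed(100, r)} cartridges for 100TB")
```

For 100TB of data, this gives 9 cartridges if the data does not compress, and 4 at the rated 2.5:1 ratio.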
Besides these storage media, backups can also be made on an online cloud storage platform like Google Drive. However, backing up large quantities of data to the cloud needs a fast and stable Internet connection. Storing very large backups in the cloud is also risky, because network problems can make the data slow or difficult to restore when it is needed. In addition, cloud services can suffer outages that leave backups temporarily inaccessible.
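A quick calculation shows why connection speed matters. The Python sketch below converts a backup size in gigabytes and an upload speed in megabits per second into hours. The `efficiency` factor, which allows for protocol overhead and other traffic, is an assumed value rather than a measured one.

```python
def upload_hours(size_gb: float, uplink_mbps: float, efficiency: float = 0.8) -> float:
    """Hours needed to upload `size_gb` (decimal GB) at `uplink_mbps` megabits per second."""
    bits = size_gb * 1e9 * 8  # decimal gigabytes -> bits
    seconds = bits / (uplink_mbps * 1e6 * efficiency)
    return seconds / 3600
```

Uploading 500GB over a 20Mbit/s uplink takes about 56 hours even at full speed, and about 69 hours at the default 80% efficiency. A restore is similarly limited by the download speed.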
Popular FOSS tools for backup
Backups should be made on a reliable storage medium using good backup software. Various FOSS tools are available for the job, and choosing the right one makes the backup process much more convenient. A few of them are outlined below.
Bacula: This is a popular FOSS enterprise-level backup tool for heterogeneous networks. It can automate backup tasks that would otherwise need intervention from a systems administrator or computer operator. It supports various operating systems, including Linux, UNIX, Windows and macOS, and a range of professional backup devices, including tape libraries. Bacula can be managed from a command line console, a GUI or a Web interface. The project was started in January 2000 by Kern Sibbald, and is written in C and C++. Bacula is open source and released under the AGPL version 3 license, with an exception that permits linking with OpenSSL and distributing the Windows binaries. Firewall administration and network security are simple, because Bacula’s client-server communication runs over TCP/IP on standard ports. It does not rely on network file-sharing services such as NFS or SMB. Clients and servers authenticate each other with configurable CRAM-MD5 authentication, and optional GZIP or LZO compression on the client side reduces network bandwidth consumption. Bacula also offers the following features:
- Transport Layer Security (TLS) encryption for network communication
- MD5/SHA file integrity verification
- cyclic redundancy check (CRC) verification of data blocks
- public key infrastructure (PKI) encryption of backup data
- a Network Data Management Protocol (NDMP) plugin in the enterprise version
- cloud backup to Amazon S3-compatible storage services
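Two of the mechanisms mentioned above can be illustrated with a minimal Python sketch. The first is a generic CRAM-MD5 challenge-response exchange, as defined in RFC 2195. The second is a pair of integrity checks: a CRC on each data block and a SHA-256 digest of the whole file. This is not Bacula’s code. Bacula’s daemons use their own variant of the handshake and their own on-disk formats, and all function names here are hypothetical.

```python
import hashlib
import hmac
import secrets
import zlib


def make_challenge() -> bytes:
    """Server side: a fresh random challenge for each connection."""
    return secrets.token_hex(16).encode()


def cram_md5_response(password: bytes, challenge: bytes) -> str:
    """Client side: prove knowledge of the password without sending it."""
    return hmac.new(password, challenge, hashlib.md5).hexdigest()


def verify_response(password: bytes, challenge: bytes, response: str) -> bool:
    """Server side: recompute the HMAC and compare in constant time."""
    return hmac.compare_digest(cram_md5_response(password, challenge), response)


def block_crcs(data: bytes, block_size: int = 64 * 1024) -> list[int]:
    """CRC-32 of each fixed-size block, for cheap per-block corruption checks."""
    return [zlib.crc32(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def file_digest(data: bytes) -> str:
    """SHA-256 of the whole file, for end-to-end integrity verification."""
    return hashlib.sha256(data).hexdigest()
```

Only the HMAC travels over the network, so an eavesdropper never sees the password itself. The CRCs catch damaged blocks cheaply, while the digest confirms that the restored file matches the original as a whole.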
Amanda: This is another FOSS tool that can back up data from multiple computers on a network. It uses a client-server architecture, in which the server connects to the clients to back up data at scheduled times. It was developed at the University of Maryland in the USA, and is written in C and Perl. A commercial version, Amanda Enterprise Edition, was developed by Zmanda Inc. Amanda runs on Linux and almost any UNIX system, including Solaris and macOS, and also supports Windows. It can back up data to both tapes and disk drives. A special feature of Amanda is tape-spanning: if a backup doesn’t fit on one tape, the data is split and saved across multiple tapes. Another is its intelligent scheduler, which balances the work across many backup runs to optimise the use of computing resources.
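The Python sketch below illustrates both features in a much simplified form. `span_across_tapes` splits one backup stream into tape-sized parts, and `reassemble` joins them back together. `balance_full_dumps` spreads the full backups of several disk areas across the days of a backup cycle, so that no single night carries too much work. This loosely follows the spirit of Amanda’s `dumpcycle` setting. Amanda’s real planner is far more elaborate: it also estimates incremental sizes and can move full backups earlier or later. These function names are hypothetical.

```python
def span_across_tapes(stream: bytes, tape_capacity: int) -> list[bytes]:
    """Split one backup stream into consecutive parts that each fit on a tape."""
    if tape_capacity <= 0:
        raise ValueError("tape_capacity must be positive")
    return [stream[i:i + tape_capacity] for i in range(0, len(stream), tape_capacity)]


def reassemble(parts: list[bytes]) -> bytes:
    """Restore by reading the tapes back in order."""
    return b"".join(parts)


def balance_full_dumps(sizes: dict[str, int], dumpcycle: int) -> dict[int, list[str]]:
    """Greedy balancing: place the largest full dumps first, each on the least-loaded day."""
    load = [0] * dumpcycle
    plan: dict[int, list[str]] = {day: [] for day in range(dumpcycle)}
    for name, size in sorted(sizes.items(), key=lambda item: (-item[1], item[0])):
        day = min(range(dumpcycle), key=lambda d: (load[d], d))
        plan[day].append(name)
        load[day] += size
    return plan
```

For example, with full dumps of 50, 40, 30 and 5 units and a two-day cycle, the greedy rule gives nightly loads of 55 and 70. Scheduling the two biggest dumps on the same night would produce a 90-unit night.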
Back in Time: This FOSS tool is written for Linux. It was developed by Oprea Dan, Bart de Koning, Richard Bailey, Germar Reitze and Taylor Raack, starting in 2008, and is written in Python. It is available in the repositories of many Linux distributions. Like Apple’s Time Machine, it uses rsync (a utility for synchronising files between storage locations) as its backend, and stores files using hard links. A file that has not changed between snapshots is stored once and hard-linked into each snapshot, instead of being copied again. This avoids wasting disk space on identical copies. Every snapshot still appears as a complete directory tree, so it is easy to browse the system as it was at different times. An unneeded snapshot can also be deleted without affecting the others. Back in Time also supports encrypted backups and backups over SSH.
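The hard-link technique can be sketched in a few lines of Python. This is not Back in Time’s implementation, which relies on rsync. It is a simplified illustration, similar in spirit to rsync’s `--link-dest` option: each new snapshot hard-links files that are identical to the ones in the previous snapshot, and copies only new or changed files. The function name and directory layout are assumptions, only regular files are handled, and files are compared by content rather than by size and timestamp.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def take_snapshot(src: Path, snapshots: Path, name: str,
                  previous: Optional[Path] = None) -> Path:
    """Create snapshots/name as a complete-looking copy of src."""
    dest = snapshots / name
    dest.mkdir(parents=True, exist_ok=False)
    for path in sorted(src.rglob("*")):
        rel = path.relative_to(src)
        target = dest / rel
        if path.is_dir():
            target.mkdir(parents=True, exist_ok=True)
            continue
        if not path.is_file():
            continue  # skip sockets, devices, etc. in this sketch
        target.parent.mkdir(parents=True, exist_ok=True)
        old = previous / rel if previous is not None else None
        if old is not None and old.is_file() and filecmp.cmp(path, old, shallow=False):
            os.link(old, target)        # unchanged: share the existing copy
        else:
            shutil.copy2(path, target)  # new or changed: store a fresh copy
    return dest
```

Deleting an old snapshot directory only removes one of the links. Files still referenced by newer snapshots survive, which is what makes pruning snapshots safe.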
A disaster can happen at any time, and keeping data safe is one of the biggest challenges for any business. Backups help recover data after a disaster, but it is important to choose the right backup method and storage medium. Backups can be automated with a reliable tool, and choosing the right one makes the process much easier. The FOSS backup tools discussed here can all help keep data safe.