To prevent a disastrous loss of data, regular backups are not only recommended but are de rigueur. This article surveys the many open source tools available for this purpose, to help systems administrators decide which one is best for their systems.
Before discussing the need for backup software, a brief history of storage is useful. In 1953, IBM recognised the importance and immediate application of what it called the ‘random access file’, which it described as having high capacity with rapid random access to files. This led to the invention of the hard disk drive (HDD) at IBM’s laboratory in San Jose, California. The disk drive created a new level in the computer data hierarchy, then termed random access storage but today known as secondary storage.
The commercial use of hard disk drives began in 1957, with the shipment of an IBM 305 RAMAC system that included IBM Model 350 disk storage, for which US Patent No. 3,503,060 was issued on March 24, 1970.
The year 2016 marks the 60th anniversary of the venerable hard disk drive. Nowadays, new computers are increasingly adopting SSDs (solid-state drives) for main storage, but HDDs remain the champions of low cost and very high capacity data storage.
The cost per GB of data has come down significantly over the years because of a number of innovations and advanced techniques developed in manufacturing HDDs. The graph in Figure 1 gives a glimpse of this.
The general assumption is that this cost will fall further. Since storing data now costs far less than it did in the 1970s and ’80s, why should one back up data when it is so cheap to buy new storage? What are the advantages of having a backup?
Today, we generate a lot of data using various gadgets like mobile phones, tablets, laptops, handheld computers, servers, etc. When we exceed the storage capacity of these devices, we tend to push the data to the cloud or take a backup to guard against future disasters. Many corporate and enterprise level customers generate huge volumes of data, and having backups is critical for them.
Backing up data is very important. After taking a backup, we also have to make sure that the data is secure and manageable, and that its integrity is not compromised. With these aspects in mind, many open source backup tools have been developed over the years.
Data backup comes in different flavours like individual files and folders, whole drives or partitions, or full system backups. Nowadays, we also have the ‘smart’ method, which automatically backs up files in commonly used locations (syncing) and we have the option of using cloud storage.
Backups can be scheduled, running as incremental, differential or full backups, as required.
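The difference between the three backup types can be illustrated with a minimal Python sketch. It assumes that each file is described only by a path and a last-modified time, and that a file needs copying when it was modified after the relevant reference backup; the `FileInfo` class and the `select_files` function are hypothetical names used for illustration, not part of any tool discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    mtime: float  # last-modified time, in seconds since the epoch


def select_files(files, kind, last_full=0.0, last_backup=0.0):
    """Return the paths that a backup run of the given kind would copy."""
    if kind == "full":
        # A full backup copies everything, regardless of modification time.
        return [f.path for f in files]
    if kind == "differential":
        # A differential backup copies what changed since the last FULL backup.
        return [f.path for f in files if f.mtime > last_full]
    if kind == "incremental":
        # An incremental backup copies what changed since the last backup of ANY kind.
        return [f.path for f in files if f.mtime > last_backup]
    raise ValueError(f"unknown backup kind: {kind}")


files = [FileInfo("a.txt", 100.0), FileInfo("b.txt", 250.0), FileInfo("c.txt", 400.0)]

# Assume the last full backup ran at t=200 and the last incremental at t=300.
print(select_files(files, "full"))
print(select_files(files, "differential", last_full=200.0))
print(select_files(files, "incremental", last_backup=300.0))
```

In this example the differential run picks up both files changed since the full backup, while the incremental run picks up only the file changed since the most recent run. This is why incremental backups are smaller but need the whole chain of earlier backups to restore.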
For organisations and large enterprises that are planning to select backup software tools and technologies, this article reviews the best open source tools. Before choosing a tool, users should evaluate the features it provides, as well as its stability and open source community support.
Advanced open source storage software like Ceph, Gluster, ZFS and Lustre can be integrated with some of the popular backup tools like Bareos, Bacula, AMANDA and Clonezilla. Ceph, Gluster and several of these backup tools are described in the sections that follow.
Ceph is one of the leading choices in open source software for storage and backup. Ceph provides object storage, block storage and file system storage features. It is very popular because of its CRUSH algorithm, which liberates storage clusters from the scalability and performance limitations imposed by centralised data table mapping. Ceph eliminates many tedious tasks for administrators by replicating and rebalancing data within the cluster, and is designed to deliver high performance while scaling out to very large clusters.
Ceph also has RADOS (reliable autonomic distributed object store), which provides the object, block and file system storage described earlier in a single unified storage cluster. The Ceph RBD backup script, ceph_rbd_bck.sh (release v0.1.1), provides a backup solution for Ceph. The script helps in backing up Ceph pools. It was developed to back up specified storage pools and not only individual images; it also supports retention dates and implements a synthetic full backup schedule if needed.
Many organisations are now moving towards large scale object storage and take backups regularly. Ceph is a strong fit for them, as it provides object storage management along with state-of-the-art backup. It also integrates with private cloud solutions like OpenStack, which helps in managing backups of data in the cloud.
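The idea of computing where data lives, instead of looking it up in a central table, can be illustrated with a short Python sketch. This is not the CRUSH algorithm itself: it uses a much simpler technique, rendezvous (highest random weight) hashing, as a stand-in, and the node names and the `place` function are hypothetical.

```python
import hashlib


def place(object_name, nodes, replicas=2):
    """Pick `replicas` nodes for an object by ranking nodes on a hash score.

    Simplified stand-in for CRUSH: every client that knows the node list
    computes the same placement, so no central mapping table is needed.
    """
    def score(node):
        digest = hashlib.sha256(f"{node}:{object_name}".encode()).hexdigest()
        return int(digest, 16)

    # The highest-scoring nodes hold the object and its replicas.
    return sorted(nodes, key=score, reverse=True)[:replicas]


nodes = ["osd.0", "osd.1", "osd.2", "osd.3"]  # hypothetical storage nodes
chosen = place("backup.img", nodes)
print(chosen)
```

Because the result depends only on the object name and the set of nodes, removing a node that does not hold a given object leaves that object's placement unchanged. The real CRUSH algorithm additionally takes the cluster hierarchy and device weights into account.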
The Ceph script can also archive data, remove all the old files and purge all snapshots. This triggers the creation of a new, full and initial snapshot.
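A snapshot-based backup cycle of this kind can be sketched in Python as follows. This is a minimal illustration and not the code of ceph_rbd_bck.sh: it only builds the standard `rbd` commands (`snap create`, `export`, `export-diff` and `snap purge`) without running them, and the pool, image and snapshot names and the `/backup` directory are hypothetical.

```python
import shlex


def rbd_backup_commands(pool, image, new_snap, prev_snap=None, dest_dir="/backup"):
    """Build (but do not run) the rbd commands for one backup of one image."""
    spec = f"{pool}/{image}"
    commands = [["rbd", "snap", "create", f"{spec}@{new_snap}"]]
    if prev_snap is None:
        # No earlier snapshot: export the whole image as the initial full backup.
        commands.append(["rbd", "export", f"{spec}@{new_snap}",
                         f"{dest_dir}/{image}@{new_snap}.full"])
    else:
        # Earlier snapshot exists: export only the blocks changed since then.
        commands.append(["rbd", "export-diff", "--from-snap", prev_snap,
                         f"{spec}@{new_snap}",
                         f"{dest_dir}/{image}@{new_snap}.diff"])
    return commands


def rbd_purge_command(pool, image):
    """Build the command that removes every snapshot of an image."""
    return ["rbd", "snap", "purge", f"{pool}/{image}"]


# Hypothetical pool "vms" and image "disk1"; "day1" was the previous snapshot.
for cmd in rbd_backup_commands("vms", "disk1", "day2", prev_snap="day1"):
    print(shlex.join(cmd))
print(shlex.join(rbd_purge_command("vms", "disk1")))
```

After a purge, there is no previous snapshot to diff against, so the next run falls back to a full export. This mirrors the behaviour described above, where purging triggers a new, full, initial snapshot.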
OpenStack has a built-in Ceph backup driver, which is an intelligent solution for VM volume backup and maintenance. This helps in taking regular and incremental backups of volumes to maintain consistency of data. Along with Ceph backup, one can use a tool called CloudBerry for versatile control over Ceph based backup and recovery mechanisms.
Ceph also has good support from the community and from large organisations, many of which have adopted it for storage and backup management and in turn contribute back to the community.
Ceph is under continuous development and enhancement. A number of research organisations have predicted that Ceph’s adoption rate will increase in the future. Ceph also has certain cost advantages in comparison with other software products.
More information about the Ceph RBD script can be found at http://obsidiancreeper.com/2017/04/03/Updated-Ceph-Backup/.
Red Hat’s Gluster is another open source, software defined, scale-out backup and storage solution. It is also known as Red Hat Gluster Storage (RHGS). It helps in managing unstructured data for physical, virtual and cloud environments. The advantages of Gluster are its cost effectiveness and its highly available storage, which does not compromise on scale or performance.
RHGS has a useful feature called ‘snapshotting’, which takes ‘point-in-time’ copies of Red Hat Gluster Storage server volumes. This helps administrators easily revert to previous states of data in case of any mishap.
Some of the benefits of the snapshot feature are:
- Allows file and volume restoration with a point-in-time copy of Red Hat Gluster Storage volume(s)
- Has little to no impact on the user or applications, regardless of the size of the volume when snapshots are taken
- Supports up to 256 snapshots per volume, providing flexibility in data backup to meet production environment recovery point objectives
- Creates a read-only volume that is a point-in-time copy of the original volume, which users can use to recover files
- Allows administrators to create scripts to take snapshots of a supported number of volumes in a scheduled fashion
- Provides a restore feature that helps the administrator return to any previous point-in-time copy
- Allows the instant creation of a clone or a writable snapshot, which is a space-efficient clone that shares the back-end logical volume manager (LVM) with the snapshot
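A scheduled snapshot script of the kind mentioned in the list above could be sketched in Python as follows. It is a minimal illustration that only builds `gluster snapshot` commands without running them; the volume and snapshot names and the `plan_snapshot` function are hypothetical, and the retention policy (delete the oldest snapshots first) is an assumption rather than something Gluster enforces.

```python
import shlex

MAX_SNAPSHOTS = 256  # per-volume limit quoted in the list above


def plan_snapshot(volume, existing, new_name, keep=MAX_SNAPSHOTS):
    """Build the commands for one scheduled snapshot run.

    `existing` lists the volume's current snapshot names, oldest first.
    The oldest snapshots are deleted so that at most `keep` remain
    after the new snapshot has been created.
    """
    if not 1 <= keep <= MAX_SNAPSHOTS:
        raise ValueError("keep must be between 1 and 256")
    excess = max(0, len(existing) + 1 - keep)
    # --mode=script suppresses the interactive confirmation prompt on delete.
    commands = [["gluster", "--mode=script", "snapshot", "delete", name]
                for name in existing[:excess]]
    # no-timestamp keeps the snapshot name exactly as given.
    commands.append(["gluster", "snapshot", "create", new_name, volume, "no-timestamp"])
    return commands


# Hypothetical volume "vol0" with three snapshots and a retention of three.
for cmd in plan_snapshot("vol0", ["snap1", "snap2", "snap3"], "snap4", keep=3):
    print(shlex.join(cmd))
```

A script like this would typically be run from cron. Restoring is a separate, deliberate step, using `gluster snapshot restore` with the name of the chosen snapshot.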
Bareos configured on GlusterFS has the advantage of being able to take incremental backups. One can create a ‘glusterfind’ session, which remembers the time when the volume was last synced or when processing was completed. For example, your backup application (Bareos) can run every day and get incremental results at each run.
More details on the RHGS snapshot feature can be found at https://www.redhat.com/cms/managed-files/st-gluster-storage-snapshot-technology-overview-inc0407879-201606-en.pdf.
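The glusterfind workflow can be sketched in Python as follows. The sketch builds the `glusterfind create`, `pre` and `post` commands without running them, and parses a sample change list. It assumes the default output format of one change per line (`NEW`, `MODIFY`, `RENAME` or `DELETE` followed by space-separated, URL-encoded paths); the session, volume and file names are hypothetical, and this is not how Bareos itself consumes the list.

```python
from urllib.parse import unquote_plus


def setup_command(session, volume):
    """Command run once, to create the session that tracks the last sync time."""
    return ["glusterfind", "create", session, volume]


def run_commands(session, volume, outfile):
    """Commands for each backup run: list the changes, then mark them processed."""
    return [["glusterfind", "pre", session, volume, outfile],
            ["glusterfind", "post", session, volume]]


def parse_changes(text):
    """Split a change list into paths to back up and paths to drop."""
    to_backup, to_delete = [], []
    for line in text.splitlines():
        parts = line.split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        kind, paths = parts[0], [unquote_plus(p) for p in parts[1:]]
        if kind in ("NEW", "MODIFY"):
            to_backup.append(paths[0])
        elif kind == "RENAME" and len(paths) == 2:
            # Treat a rename as "drop the old name, back up the new one".
            to_delete.append(paths[0])
            to_backup.append(paths[1])
        elif kind == "DELETE":
            to_delete.append(paths[0])
    return to_backup, to_delete


sample = "NEW docs/report.txt\nMODIFY data/db.sqlite\nRENAME old.log new.log\nDELETE tmp/cache%20file\n"
to_backup, to_delete = parse_changes(sample)
print(to_backup)
print(to_delete)
```

The `post` step should only run after the backup of the listed files has succeeded. Otherwise the session time advances and the next run would miss those changes.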
The best open source backup software tools
AMANDA open source backup software
AMANDA, the Advanced Maryland Automatic Network Disk Archiver (https://amanda.zmanda.com/), is popular, enterprise grade open source backup and recovery software. According to the project, it runs on servers and desktop systems running Linux, UNIX, BSD, Mac OS X and MS Windows.
AMANDA comes as both an enterprise edition and an open source edition (though the latter may need some customisation). At the time of writing, the latest release of AMANDA Enterprise is 3.3.5.
It is one of the key backup tools deployed by government, database, healthcare and cloud based organisations across the globe.
AMANDA has a number of good features to cope with explosive data growth and the need for high data availability. It aims to simplify backup and recovery, which can be complex and expensive to manage with other software products.
Some of its advantages and features are:
- Centralised management for heterogeneous environments (involving multiple OSs and platforms)
- Powerful protection with simple administration
- Wide platform and application support
- Industry standard open source support and data formats
- Low cost of ownership
Bareos (Backup Archiving Recovery Open Sourced)
Bareos is cross-network open source backup software that offers high data security and reliability. It emerged from the Bacula project in 2010 and is actively developed.
Bareos supports Linux/UNIX, Mac and Windows based OS platforms, and offers both a Web GUI and a CLI.
Clonezilla
Clonezilla is a partition and disk imaging/cloning program, similar to commercial products such as Norton Ghost and True Image. It has features like bare metal backup and recovery, and supports massive cloning with high efficiency in multi-cluster node environments.
Clonezilla comes in two variants—Clonezilla Live and Clonezilla SE (Server Edition). Clonezilla Live is suitable for single machine backup and restore, and Clonezilla SE for massive deployment. The latter can clone many (40 plus) computers simultaneously.
Duplicati
Designed to be used in a cloud computing environment, Duplicati is a client application for creating encrypted, incremental, compressed backups to be stored on a server. It works with public clouds like Amazon, Google Drive and Rackspace, as well as private clouds and networked file servers. It is compatible with Windows, Linux and Mac OS X.
FOG
Like Clonezilla, FOG is a disk imaging and cloning tool that can aid with both backup and deployment. It is easy to use, supports networks of all sizes, and includes other features like virus scanning, memory testing, disk wiping, disk testing and file recovery. It is compatible with Linux and Windows.