Backup solutions are a must for enterprises. Without a backup and recovery solution, a business could lose a fortune in the event of a data breach or accidental data loss. This article looks at a few backup and restore solutions that, true to the open source tradition, are free of cost.
According to a recent Forbes report, 2.5 quintillion bytes of data are generated every day, and this figure is expected to multiply several times in the coming years as we inch towards the AI-enabled smart digital era. New technologies such as machine learning and IoT are fuelling data creation and, at the same time, increasing our vulnerability to data breaches. The spectre of a potential breach haunts small and large businesses alike, and losing data can hit a business very hard. For smaller firms, even a single hour of downtime can cost more than US$ 10,000, while for medium to large companies the cost can go up to US$ 1 million. With such potential losses, it’s little wonder that enterprises invest heavily in backup and recovery solutions to help them get their data back.
For a small business or home user, a hard drive crash may mean losing decades of family photos, or important business reports, tax forms and other irreplaceable files. Without a backup and recovery solution, it can take days or even years to recover from such damage. A backup solution helps businesses avoid the heavy losses that uncontrollable disasters can cause.
Backup and recovery is the process of creating and storing copies of data that can be used to protect firms and organisations against data loss; it is also referred to as operational recovery. Recovering from a backup typically involves restoring the data to its original location, or to an alternate one, where it can be used in place of the lost or damaged data. A proper backup copy is stored on a different system or medium (such as tape) from the primary data, so that it survives a failure of the primary hardware or software, or corruption of the primary data. This secondary medium can be as simple as a USB stick or an external drive, or something more substantial, like a cloud storage container, a disk storage system or a tape drive, and it can sit in the same location as the primary data or at a remote site. Backups should be made on a regular basis to minimise the data lost between consecutive copies: the longer the interval between backups, the more recent data is lost when recovering from one.
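To make the point about backup intervals concrete, here is a minimal Python sketch. The backup schedule, the failure time and the 0.5 GB/hour change rate are purely illustrative assumptions, and `data_lost_window` and `worst_case_loss_gb` are hypothetical helper names: the first measures how much recent work is lost when restoring from the newest backup taken before a failure, and the second gives a rough upper bound on the changed data at risk for a given interval.

```python
from datetime import datetime, timedelta


def data_lost_window(backup_times: list[datetime], failure: datetime) -> timedelta:
    """Time between the newest backup taken at or before `failure` and the
    failure itself: changes made in this window are lost on restore."""
    earlier = [t for t in backup_times if t <= failure]
    if not earlier:
        raise ValueError("no backup exists before the failure")
    return failure - max(earlier)


def worst_case_loss_gb(interval_hours: float, change_rate_gb_per_hour: float) -> float:
    """Rough upper bound on changed data at risk: a failure just before the
    next scheduled backup loses about one full interval of changes."""
    return interval_hours * change_rate_gb_per_hour


# Illustrative schedule (an assumption): one backup at 02:00 every day, 1 to 7 January.
daily = [datetime(2024, 1, day, 2, 0) for day in range(1, 8)]
failure = datetime(2024, 1, 5, 23, 0)

print(data_lost_window(daily, failure))   # 21:00:00
print(worst_case_loss_gb(24, 0.5))        # 12.0 (daily backups)
print(worst_case_loss_gb(4, 0.5))         # 2.0 (every four hours)
```

Shortening the interval from 24 hours to four hours shrinks the worst-case exposure in proportion, which is exactly the trade-off between backup frequency and potential data loss.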
Fortunately, the open source community offers many solutions that can help individuals and organisations of all sizes protect their valuable data at minimal expense.
Why do we need open source backup solutions?
- The main purpose of a backup is to create and store a copy of the data that can be easily recovered if the primary data fails, whether through data corruption, hardware or software failure, or a human-caused event such as a malware attack or accidental deletion.
- Backup copies allow data to be easily restored from an earlier point in time to help the business recover from an unexpected event.
- Retaining multiple copies of data sets taken at regular intervals provides the flexibility and assurance to restore data from any of those points in time, including one from before a malicious attack or data corruption occurred.
- Open source backup solutions generally offer broad hardware and software compatibility.
- As the source code is open, these solutions can be customised and configured to suit specific requirements.
- They typically have large, active communities, which helps get bugs and vulnerabilities fixed quickly.
- The security of open source solutions is often underestimated; in practice, they can be as secure as, or more secure than, many licensed backup solutions.
- Open source backup solutions often provide a good user experience, which helps anyone easily use the features they require.
Different types of recovery and backup solutions
The two main approaches to backup and recovery are described below.
Traditional or streaming backups: This is the traditional approach to recovery. Data is streamed from the application server or host, through a backup server, to a secondary storage system. The backup server ingests the data and performs additional actions to optimise it before writing it to the secondary medium. These additional actions commonly include:
- Indexing the data set for easy search and restore
- Data reduction (which includes de-duplication, compression, etc)
- Encrypting the data to protect it in transit
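The optimisation steps listed above can be sketched in Python. This is a toy model under stated assumptions, not how any particular backup server works: data is split into fixed 4 KB chunks (many products use variable-size chunking instead), each unique chunk is stored once under its SHA-256 digest and compressed with zlib, and a per-file index records the chunk order for search and restore. Encryption is left out because it would need a third-party library, and `BackupStore` and its methods are hypothetical names.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks; real products may use variable-size chunking


class BackupStore:
    """Toy secondary store: de-duplicated, compressed chunks plus a file index."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}     # SHA-256 digest -> compressed chunk
        self.index: dict[str, list[str]] = {}  # file path -> ordered chunk digests

    def ingest(self, path: str, data: bytes) -> None:
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # De-duplication: a chunk already in the store is not written again.
            if digest not in self.chunks:
                # Data reduction: only compressed bytes reach the secondary medium.
                self.chunks[digest] = zlib.compress(chunk)
            digests.append(digest)
        # Indexing: remember which chunks make up this file, in order.
        self.index[path] = digests

    def search(self, text: str) -> list[str]:
        """Find backed-up paths containing `text`, using only the index."""
        return sorted(p for p in self.index if text in p)

    def restore(self, path: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[d]) for d in self.index[path])

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.chunks.values())


store = BackupStore()
report = b"quarterly figures " * 1000
store.ingest("/srv/reports/q1.txt", report)
store.ingest("/srv/reports/q1-copy.txt", report)  # adds no new chunks
print(store.search("q1"))  # ['/srv/reports/q1-copy.txt', '/srv/reports/q1.txt']
print(store.restore("/srv/reports/q1-copy.txt") == report)  # True
print(len(report) * 2, store.stored_bytes())  # logical bytes vs bytes actually stored
```

Because the duplicate file maps to digests that are already stored, the second ingest only adds index entries, which is where the reduced data footprint of streaming backups comes from.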
Traditional backup and recovery systems offer a few benefits, including:
- The ability to manage and consolidate backup and recovery from multiple primary systems through a single backup interface and storage target
- The ability to reduce the data footprint with compression and global de-duplication
- Application integration, so that data is in a consistent state when restored
- Intelligent metadata management to improve data recovery
Array-based backup or recovery: Array-based backup and recovery solutions are built using storage snapshots and offer an alternative approach to protecting the data set. There are various benefits to this approach:
- High-performance snapshots on the primary storage easily create local recovery points with a very low impact on the production workload. This enables higher service levels with shorter backup windows and frequent recovery points.
- Local snapshots generally offer faster recovery than streaming backups, since the snapshot is already present on the primary storage and doesn’t need to be retrieved from secondary media.
- Array-based data replication removes the need to stream data through a virtual host or the application server.
- Data efficiencies, such as compression and de-duplication, are retained when data moves from the primary storage to the secondary storage.
- Data transferred to the secondary storage is not kept in a proprietary backup format, which makes it faster and easier to reuse for other purposes, like instant testing or development workflows.
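Why snapshots are cheap to create and quick to restore can be illustrated with a copy-on-write model. The `Volume` class below is a hypothetical, in-memory simplification rather than any storage array's implementation: taking a snapshot copies nothing, and an old block is preserved only when it is first overwritten, so both the snapshot and the rollback stay on primary storage.

```python
class Volume:
    """Toy block volume with copy-on-write snapshots (hypothetical model)."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [b""] * num_blocks
        # snapshot name -> {block number: contents preserved at snapshot time}
        self.snapshots: dict[str, dict[int, bytes]] = {}

    def snapshot(self, name: str) -> None:
        # No data is copied when the snapshot is taken, so it is near-instant.
        self.snapshots[name] = {}

    def write(self, block: int, data: bytes) -> None:
        # Copy-on-write: before the first overwrite of a block after a
        # snapshot, keep the old contents for that snapshot.
        for preserved in self.snapshots.values():
            preserved.setdefault(block, self.blocks[block])
        self.blocks[block] = data

    def read_snapshot(self, name: str, block: int) -> bytes:
        # Unchanged blocks are shared with the live volume.
        return self.snapshots[name].get(block, self.blocks[block])

    def restore(self, name: str) -> None:
        # Roll back locally: only blocks changed since the snapshot are
        # rewritten, and nothing is read from secondary media.
        for block, data in list(self.snapshots[name].items()):
            self.write(block, data)


vol = Volume(4)
vol.write(0, b"orders v1")
vol.snapshot("hourly-0900")
vol.write(0, b"orders v2 (corrupted)")
print(vol.read_snapshot("hourly-0900", 0))  # b'orders v1'
vol.restore("hourly-0900")
print(vol.blocks[0])  # b'orders v1'
```

The cost of a snapshot grows only with the number of blocks changed after it is taken, which is why frequent recovery points have a low impact on the production workload.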
Top open source backup solutions
AMANDA (Advanced Maryland Automatic Network Disk Archiver) is a popular enterprise backup system. It was developed at the University of Maryland in 1991 to protect files on a large number of client workstations using a single backup server; James Da Silva was one of its original developers. AMANDA runs on almost a million servers and desktops across the globe and supports multiple operating systems. It is available in three editions: Enterprise, Community and the Zmanda Backup Appliance. The Community edition is available free of cost, whereas the Enterprise edition supports live application backups. The Zmanda Backup Appliance is a virtual machine that can easily back up an entire network.
- Supports multiple operating systems, including Windows, Linux, UNIX, BSD and macOS.
- Supports tape, disk and optical media backups.
- Allows users to set up a single master backup server to back up multiple Windows, Linux and UNIX hosts.
- It can back up sparse files and hard links as well.
- There is no change to the file’s timestamp during backup.
- Supports exclusion of files and directories.
- It uses standard utilities such as dump and GNU tar, which makes recovering data straightforward.
- Its unique scheduler optimises the backup levels for various clients in such a way that the total backup time is about the same for every backup run.
- It frees system admins from having to guess the rate of data changes in their environments.
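AMANDA's planner is more sophisticated than this, but the core idea of spreading full dumps across the dump cycle so that each run is about the same size can be sketched with a simple greedy balancer. The disk names, sizes and three-day cycle are made-up examples, and `plan_full_dumps` is a hypothetical helper, not part of AMANDA; the real scheduler also has to account for incremental dumps and tape capacity.

```python
import heapq


def plan_full_dumps(sizes_gb: dict[str, float], cycle_days: int) -> dict[int, list[str]]:
    """Give each disk its full dump on one day of the cycle, balancing the
    total full-dump volume per day (greedy: largest disk to least-loaded day)."""
    heap = [(0.0, day) for day in range(cycle_days)]  # (GB planned so far, day)
    heapq.heapify(heap)
    plan: dict[int, list[str]] = {day: [] for day in range(cycle_days)}
    for disk, size in sorted(sizes_gb.items(), key=lambda item: item[1], reverse=True):
        load, day = heapq.heappop(heap)  # the day with the least work so far
        plan[day].append(disk)
        heapq.heappush(heap, (load + size, day))
    return plan


def daily_load(plan: dict[int, list[str]], sizes_gb: dict[str, float]) -> dict[int, float]:
    """Total full-dump size scheduled for each day."""
    return {day: sum(sizes_gb[d] for d in disks) for day, disks in plan.items()}


# Hypothetical environment: six disks, full dump of each once every three days.
sizes = {"db01:/var/lib/pgsql": 120, "web01:/srv": 60, "web02:/srv": 55,
         "mail:/var/mail": 50, "home:/home": 40, "build:/opt": 30}
plan = plan_full_dumps(sizes, cycle_days=3)
print(daily_load(plan, sizes))  # {0: 120, 1: 130, 2: 105}
```

Without such balancing, scheduling every full dump on the same night would produce one huge run followed by several small ones; the greedy plan keeps each night's work within the size of the largest single disk of the others.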
BackupPC is a free disk-to-disk backup software suite with a Web-based front-end. The cross-platform server can run on Solaris, Linux and other UNIX-based servers. In 2007, it was ranked among the three best-known open source backup packages, yet not many people have heard of it. This network backup system can archive a large number of files to a local or networked disk storage solution, and its data de-duplication reduces the disk space required to store the backups in the disk pool. BackupPC can also be used as a disk-to-disk-to-tape solution, by using its archive function to back up the disk pool to tape.
- BackupPC uses compression and pooling to make different archived files as small as possible, hence reducing the storage hardware capacity costs and requirements.
- No dedicated client software is required, as the server itself acts as a client for several protocols (such as SMB, rsync and tar over SSH) that are handled by services native to the client OS.
- It is a file-level, not a block-level, backup system: it backs up and restores individual files.
- It incorporates an SMB (Server Message Block) client that can be used to back up the network shares of computers running Windows.
- It is particularly useful for Web servers that run SSH (Secure Shell) and have GNU tar and rsync available, as the BackupPC server can then be placed in a subnet separate from the Web server’s DMZ.
- It supports Windows, Solaris, Linux and other UNIX-based servers.
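BackupPC's pooling works at the level of whole files shared across hosts and backup runs. The minimal Python model below illustrates the idea only: the `Pool` class, host names and file contents are hypothetical, and BackupPC's real pool has its own on-disk layout and digest scheme. It shows why backing up many similar Windows machines, or the same machine repeatedly, costs little extra storage.

```python
import hashlib
import zlib


class Pool:
    """Toy pool: identical files from any host or backup run are stored once,
    compressed, and referenced from each backup's file list by digest."""

    def __init__(self) -> None:
        self.pool: dict[str, bytes] = {}                          # digest -> compressed file
        self.backups: dict[tuple[str, int], dict[str, str]] = {}  # (host, run) -> {path: digest}

    def backup(self, host: str, run: int, files: dict[str, bytes]) -> int:
        """Record one backup run for `host`; return the number of new pool entries."""
        listing, added = {}, 0
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            if digest not in self.pool:  # content never seen from any host
                self.pool[digest] = zlib.compress(content)
                added += 1
            listing[path] = digest
        self.backups[(host, run)] = listing
        return added

    def restore(self, host: str, run: int, path: str) -> bytes:
        return zlib.decompress(self.pool[self.backups[(host, run)][path]])


DLL = "C:/Windows/System32/kernel32.dll"
shared = {DLL: b"MZ" * 50_000}  # stand-in for a system file present on every PC

pool = Pool()
print(pool.backup("pc-ana", 1, {**shared, "C:/Users/ana/notes.txt": b"ana's notes"}))  # 2
print(pool.backup("pc-raj", 1, {**shared, "C:/Users/raj/todo.txt": b"raj's list"}))    # 1
print(pool.backup("pc-ana", 2, {**shared, "C:/Users/ana/notes.txt": b"ana's notes"}))  # 0
print(pool.restore("pc-raj", 1, DLL) == shared[DLL])  # True
```

Each backup run still has a complete file listing, so any file can be restored from any run, while the pool itself only grows when genuinely new content appears.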
Bacula is a set of computer programs for the backup, recovery and verification of computer data across a network of different kinds of computers. It’s relatively easy and efficient to use, and offers several advanced storage management features that make it simple to find and recover lost or damaged files. It’s made up of five major components or services.
- The Bacula Director service supervises all backup, restore, verify and archive operations.
- The Bacula Console service lets the administrator or user communicate with the Bacula Director. Currently, the Bacula Console is available in three versions: a GNOME-based interface, a text-based console interface and a wxWidgets graphical interface.
- The Bacula File service (also referred to as the client program) is installed on each machine to be backed up.
- The Bacula Storage service consists of the software programs that store and recover file attributes and data to and from the physical backup media.
- The Catalogue service comprises the software programs responsible for maintaining the file indexes and volume databases for all the backed-up files.
- Supports Windows, Linux and UNIX based OSs.
- Backs up and restores clients of any type, ensuring that all file attributes are properly saved and restored.
- Supports multi-volume backups.
- Maintains a comprehensive SQL catalogue database of all the files that are backed up, which allows the saved files on any particular volume to be viewed online.
- Offers enterprise-grade backup, recovery and data verification capabilities.
- Supports automatic removal of old records, simplifying database administration.
- Can use several SQL database engines, such as PostgreSQL and MySQL/MariaDB, making it quite flexible.
- Provides a built-in job scheduler.
- Supports a rescue CD for Linux systems.
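How a SQL catalogue lets you browse a volume's contents and prune old records can be sketched with Python's built-in `sqlite3` module. The three-table schema, the client and volume names and the 60-day retention are hypothetical simplifications; Bacula's real catalogue schema is considerably richer, and Bacula itself controls pruning through retention directives such as `Job Retention` and `AutoPrune` rather than code like this.

```python
import sqlite3
from datetime import datetime, timedelta

SCHEMA = """
CREATE TABLE job    (id INTEGER PRIMARY KEY, client TEXT, level TEXT, started TEXT);
CREATE TABLE volume (id INTEGER PRIMARY KEY, label TEXT UNIQUE);
CREATE TABLE file   (job_id INTEGER REFERENCES job(id) ON DELETE CASCADE,
                     volume_id INTEGER REFERENCES volume(id),
                     path TEXT);
"""


def open_catalog() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE in SQLite
    db.executescript(SCHEMA)
    return db


def record_job(db, client, level, started, volume_label, paths):
    """Store one job and the files it wrote to `volume_label`."""
    db.execute("INSERT OR IGNORE INTO volume(label) VALUES (?)", (volume_label,))
    (volume_id,) = db.execute("SELECT id FROM volume WHERE label = ?",
                              (volume_label,)).fetchone()
    job_id = db.execute("INSERT INTO job(client, level, started) VALUES (?, ?, ?)",
                        (client, level, started.isoformat())).lastrowid
    db.executemany("INSERT INTO file(job_id, volume_id, path) VALUES (?, ?, ?)",
                   [(job_id, volume_id, p) for p in paths])
    db.commit()
    return job_id


def files_on_volume(db, volume_label):
    """Browse a volume's contents from the catalogue, without mounting the media."""
    rows = db.execute("""SELECT file.path FROM file
                         JOIN volume ON volume.id = file.volume_id
                         WHERE volume.label = ? ORDER BY file.path""", (volume_label,))
    return [path for (path,) in rows]


def prune(db, now, retention):
    """Delete jobs older than the retention period; their file rows cascade."""
    cutoff = (now - retention).isoformat()
    removed = db.execute("DELETE FROM job WHERE started < ?", (cutoff,)).rowcount
    db.commit()
    return removed


db = open_catalog()
record_job(db, "web01-fd", "Full", datetime(2024, 1, 1), "Vol-0001",
           ["/etc/nginx/nginx.conf", "/srv/www/index.html"])
record_job(db, "web01-fd", "Incremental", datetime(2024, 3, 1), "Vol-0002",
           ["/srv/www/index.html"])
print(files_on_volume(db, "Vol-0001"))  # ['/etc/nginx/nginx.conf', '/srv/www/index.html']
print(prune(db, now=datetime(2024, 3, 15), retention=timedelta(days=60)))  # 1
print(files_on_volume(db, "Vol-0001"))  # []
```

Keeping the file index in a database is what makes "which files are on this volume?" a quick query instead of a scan of the media, and automatic pruning keeps that database from growing without bound.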
BAREOS (Backup Archiving Recovery Open Sourced) is a network-based open source backup solution. It can back up to disk and tape drives, including tape libraries. Apart from common backup strategies like full, incremental and differential, BAREOS also offers copies, migrations, virtual full backups and de-duplication with the help of base jobs. By using migrations, a backup into the cloud is also possible.
- Supports most well-established OSs, including UNIX, Linux, Windows and Mac OS X.
- It supports encryption and compression, both hardware and software based.
- The latest release of BAREOS also offers native integration with cloud storage technologies like GlusterFS (Red Hat Storage) and Ceph, allowing backups to be written directly into the cloud backend. Encryption of both stored data and data in transport makes it comparatively secure.
- With the help of its self-backup feature, it can get back into operation quickly after a disaster.
- The software is available as open source, and offers completely open interfaces so that it can easily integrate with existing IT environments.
- It supports IPv6 and passive clients, which make it easy to integrate into complex networks while complying with security guidelines.
- It provides efficient bandwidth utilisation and practical console commands.
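The difference between the backup levels BAREOS offers, and why a virtual full backup can be built without contacting the client again, can be shown with a small Python model. Everything here is a hypothetical simplification: files are represented only by a version string, `run_backup` and `virtual_full` are illustrative names rather than BAREOS functions, and deleted files are tracked explicitly, whereas real products need their own bookkeeping to notice deletions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Job:
    level: str            # "Full", "Incremental", "Differential" or "VirtualFull"
    files: dict[str, str]  # path -> file version stored by this job
    deleted: set[str] = field(default_factory=set)  # paths removed since the reference state


def run_backup(level: str, client: dict[str, str], reference: dict[str, str] | None = None) -> Job:
    """Full copies everything; Incremental and Differential copy only what changed
    relative to `reference` (the client state at the last backup of any level,
    or at the last Full, respectively)."""
    if level == "Full":
        return Job(level, dict(client))
    if reference is None:
        raise ValueError(f"{level} backup needs a reference state")
    changed = {path: ver for path, ver in client.items() if reference.get(path) != ver}
    return Job(level, changed, set(reference) - set(client))


def virtual_full(chain: list[Job]) -> Job:
    """Consolidate a Full and later jobs into a new Full on the backup server,
    without reading anything from the client."""
    if not chain or chain[0].level != "Full":
        raise ValueError("chain must start with a Full backup")
    state = dict(chain[0].files)
    for job in chain[1:]:
        for path in job.deleted:
            state.pop(path, None)
        state.update(job.files)
    return Job("VirtualFull", state)


# Hypothetical client states on three consecutive days.
mon = {"/etc/hosts": "v1", "/srv/app.py": "v1", "/tmp/old.log": "v1"}
tue = {"/etc/hosts": "v1", "/srv/app.py": "v2", "/tmp/old.log": "v1"}
wed = {"/etc/hosts": "v2", "/srv/app.py": "v2"}  # old.log was deleted

full = run_backup("Full", mon)
inc_tue = run_backup("Incremental", tue, reference=mon)
inc_wed = run_backup("Incremental", wed, reference=tue)
diff_wed = run_backup("Differential", wed, reference=mon)
print(inc_wed.files, inc_wed.deleted)  # {'/etc/hosts': 'v2'} {'/tmp/old.log'}
print(diff_wed.files)                  # {'/etc/hosts': 'v2', '/srv/app.py': 'v2'}
print(virtual_full([full, inc_tue, inc_wed]).files == wed)  # True
print(virtual_full([full, diff_wed]).files == wed)          # True
```

Incrementals stay small but a restore needs the whole chain; a differential needs only the last full plus itself; and a virtual full turns either chain into a fresh full on the storage side, so the client never has to send a complete copy again.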