How to Choose the Right Open Source Database

April 17, 2018

6183

Databases store data in an organised manner so that its retrieval becomes easy. Also, the management of data is easier when it is stored in a database. There are many factors to be considered, however, before choosing a database for a particular software application. Let’s take a look at how we can make the right choice.

Considering the number of users working with different online applications nowadays, databases are about the most important part of any software application, and can make a huge impact on its performance. It is really difficult and time consuming for a database to operate if it has a huge load of data or if it has a variety of data sets to operate upon. There are also several other factors like the security of data, as well as the cost (if there is one) associated with the database, impacting the choice of the database. A free database helps users to avoid huge costs. Hence, open source databases have been playing quite an important role in many of the widely used software applications across the globe.

Different kinds of databases available in the market

There are various types of databases being used by different users across the globe, based on their requirement. The following is a broad classification of databases.

1. Relational databases: Relational databases are the most common among all the types. In such databases, the data is actually stored in the form of different data tables. Each of the tables has a unique key field and that key is used to connect one table to the other tables. Hence, different tables are related to each other with the help of various key fields. Such databases are widely used in industries like media, telecom, etc, and this is probably the type of database one is most likely to come across.

2. Operational databases: An operational database is very important for organisations, as it supports the customer database and the inventory database. It helps companies to keep a track of inventories as well as store details of the customers who buy its products. The data stored in various operational databases can be analysed and used based on the requirements of the company.

Figure 1: Interaction of a user with a DBMS

3. Database warehouses: There might be a requirement for different organisations to keep some of the relevant data for several years. Such data sets act as significant sources of information to compare and analyse the present year’s data with that of previous years, which makes it quite easy to determine key trends. All such data from previous years is maintained and stored in a large database warehouse. Since the stored data set has already gone through various kinds of editing, screening and integration, it no longer requires any editing or alteration.4. Distributed databases: Many organisations have several regional offices, manufacturing plants, branch offices and a head office. Each such work group may have its own set of databases, which collectively form the main database of the company. Such a system of databases is known as a distributed database.

5. End user databases: There is a variety of data available at the workstations of different end users of an organisation. Each workstation acts like a small database in itself, and has data in the form of spreadsheets, presentations, downloaded files, Word files and in Notepad format. All such small databases together form an end user database.

Now let’s have a look at some of the popular open source databases available in the market.

MySQL

MySQL has been in use since 1995 and is currently owned by Oracle. In addition to its open source version, there are a couple of paid editions available as well, which offer some additional features, such as cluster geo-replication and automatic scaling. We all know that MySQL has become an industry standard now, as it is compatible with almost every operating system and has been written in both C and C++. This database option is great for different international users as well, since the server can provide various error messages to the clients in a number of languages.

Pros:

Offers a flexible privilege and password system
Can be used even if there is no available network
Uses host-based verification
It consists of different libraries that can be embedded into various standalone applications
It has security encryption for all the password traffic
Supports servers as a separate program for the client server networked environment

Cons:

Users feel that MySQL no longer falls under the free OS category
It is no longer community driven, so bug fixes and patches do not happen in time
Falls behind other similar available options due to its slow updates

PostgreSQL

PostgreSQL, also called Postgres, is basically an object-relational DBMS with much emphasis on extensibility and standards compliance. As a database server, its important functions are to securely store data and return the data or data sets in response to different requests made from other software applications. PostgreSQL can handle different workloads, ranging from that of small single-machine applications to even large Internet-facing applications with many concurrent users. It has been developed by PostgreSQL Global Development Group, and is available as free and open source software.

Pros:

It is transactional and ACID (atomicity, consistency, isolation, durability)-compliant
Supports updatable views and materialised views
Functions, stored procedures, triggers, etc, can be very well used in it
Supports concurrency with the help of a system known as MVCC (Multi Version Concurrency Control).
Provides three levels of transaction isolation, which are: Read Committed, Repeatable Read and Serialisable
Supports serialisability using the serialisable snapshot isolation technique
Supports a large variety of data types like Boolean, binary, etc

Cons:

It does not have any bug tracker (whereas it supports a bug-submission form, which feeds into the pgsql-bugs mailing list), making it a bit difficult to know the status of bugs.

Figure 2: Global database markets at a glance

MariaDB

MariaDB is a database that is widely used by tech giants like Wikipedia, Facebook and even Google. It has been developed by various developers who worked as part of MySQL. It is basically a database server that provides drop-in replacement functionality for MySQL. Data security is one of the most significant concerns and priorities for developers of MariaDB, and almost in each of its solution releases the developers merge in all of MySQL’s security patches and also enhance the same, if required.

Pros:

It provides real-time access to the data sets
It supports the maximum number of core functionalities of MySQL (it is an alternative to MySQL)
It supports high scalability with easier integration
It provides a couple of alternate patches, storage engines and server optimisations

Cons:

It does not provide support for the memcached interface
The password complexity plugin is not available
Has no optimiser trace

Choosing the best available open source database

1. Size of data: When choosing a database, we should always consider the volume of data that we need to retrieve and store as critical application data in a database. The amount of data that we can retrieve and store may vary depending on the combination of the selected data structure, and the ability of any database to differentiate between various data sets available across multiple servers and file systems. Hence, we need to choose our database by considering the overall volume of data generated by a software application at any specific rate, and also the size of data that needs to be retrieved from the database.

2. Speed and scalability: We must also gauge the speed that we require for reading and writing different sets of available data into the database—the time taken to service all the incoming reads and writes to any specific application. Some databases are designed to optimise read-heavy applications, whereas others are designed to support write-heavy solutions. Selecting a database that can handle our application’s input/output needs can really go a long way to a scalable architecture.

3. Structure of data: Structure of the data set is all about the manner in which we need to store and retrieve our data. Since an application deals with data present in diverse formats, before selecting a database we should consider picking the right data structure for storing and retrieving the data sets. If we fail to select the right data structure for persisting available data, the application may take more time to retrieve data from the database. This could also lead to more development efforts to work around any of the data issues.

4. Accessibility of data: We should also think about the number of users that may concurrently access the database in order to perform any operation on available data, and also the level of computation involved in accessing any set of data. The processing speed of software applications may get affected if the database chosen is not really good enough to handle large loads.

5. Safety and security of data: We must also check the level of security that a database provides for the data stored in it. In case of highly confidential data, we really need to have a highly secured database for the application using it. The different safety measures implemented by a database in case of any system crash or failure are an important factor that you should look at before choosing a database.