Picking the Right Database

0
323
Right Database

The right database is what you need for your application’s success. Here we cover the key database types as well as what you should check before choosing one for your application.

In today’s data-driven world, businesses and organisations are faced with a wide range of options when it comes to selecting a database system to handle their data storage and management needs. The choice of the right database is crucial as it can significantly impact the performance, scalability, and overall success of an application or a project. Whether you are building a high-traffic e-commerce website, developing a mobile application, or working on a complex data analytics project, understanding the intricacies of database selection can help you build a robust and efficient system. So let’s explore the essential considerations for selecting the right database for different use cases along with the types of databases and their pros/cons.

  • Data model: The primary aspect is the nature of data that you wish to store. Data can be structured or unstructured, and can range from being a simple key-value to storing a document against a key. It can be data against a domain entity or relationships among the domain entities. Based on these use cases you can end up with primarily row-based, column-based, document-based or graph-based modelling at a high level. The data modelling in itself is heavily influenced by the domain modelling and design.
  • Query patterns/use cases: Application read and write patterns are a key decider because the data being stored is eventually for an application to read or write. So it’s crucial to make it read and/or write friendly. As one delves into the data flow level details of any application design, the query patterns become clearer. For instance, an application may want to look up a particular value against a key, in which case a row/document oriented database would make more sense, or say, get the keys for a range of values, in which case a columnar database would be a better choice.
  • CAP theorem: Production databases are generally deployed as distributed systems for the sake of fault tolerance and reliability. Hence various factors come into play, namely, consistency (i.e., how consistent is the value returned by all the members in a cluster), availability (i.e., how operational is the system despite failures or network partitions), and partition tolerance (i.e., how tolerant is the distributed system for network failures among the members in the cluster). The famous CAP theorem helps one disambiguate as to what is required and what can be traded off among consistency, availability and partition tolerance. Given network partitions are practically unavoidable, if your use case demands consistency then you may need to trade off availability or vice versa. For instance, in a simple 2-node database system (one master and one replica), if the use case demands consistency then the master will need to block the transactions against a certain key until the slave also gets updated for that key. However, during this window of master-replica synchronisation, any updates against the key will be blocked, thereby trading off the availability.
  • Scale: As you get into the non-functional aspects of the system, scale requirements get crucial. This concerns forecasting the volume of data and requests to the database. A few systems scale well horizontally while others do so vertically, and in some use cases, certain keys can end up being heavily accessed.
  • Performance: This involves assessing the performance requirements of your application. At some point of system design, you will understand which queries are read-heavy and which are write-heavy. The application response time decides the database transaction latency requirements.
  • Interoperability with ecosystem: A database may never end up being called only from one or two services. Over time new requirements can add up, and your database technology should be able to communicate with the rest of your infrastructure with ease. For instance, as your organisation expands you may wish to export the data from your database to a warehouse or, say, trigger actions on row updates, at which point your database should integrate well with the adjacent technologies in your architecture. You can also look into the ecosystem support for the technology choice – in terms of how good the support is, open source knowledge and off-the-shelf available libraries (for example, ORM or object-relational mapping to manage the data access for relational databases).
  • Security and compliance: In this age of growing fraud, security mechanisms like authentication, authorisation, encryption at rest and/or in transit are pretty basic offerings that you must look for. Compliance requirements can vary based on the organisation and can include handling sensitive data, ensuring the database complies with regulations, auditing, etc.
  • Development and operational overhead: Eventually, it is the developers who will maintain the system and hence need to ensure that their life is going to be easy while dealing with the decided choice. For instance, some databases are serverless while some need you to create and manage your own cluster. One also needs to evaluate how steep the learning curve will be for a newbie joining the team to learn about the technology.
  • Cost: Cost is an important consideration when making the selection. Consider not only the upfront costs but also the long-term costs associated with scaling, maintenance, and support. Some databases are free to use while some are licensed.

Types of databases

Let’s look into the broad types of databases based on data modelling that will usually be a pivot point for you to start making the choice.

Relational/row-based databases

  • These databases store and organise data in the form of tables using rows and columns with a predefined schema. They are ideal for handling complex querying (e.g., joins between tables), transactional support (through commits/rollbacks), and data integrity (through constraints like primary/foreign key, etc) to support ACID properties. These are also called SQL-based databases, given SQL is the most common form of accessing data.
  • A major downside is the schema extensibility and that makes these less suited for non-structural data or data whose structure is not known fully and can vary over time. This is where NoSQL (not only SQL) databases find their fit.
  • Examples include MySQL, Postgres, and Oracle databases. These are widely used in finance/banking domains where data consistency/integrity is of paramount importance, and also ERP and CRM.

Columnar-based databases

  • Unlike row-based databases, where data is stored as rows on the disk, columnar databases store data as columns, thereby optimising for queries against columns. Hence, they find good use for analytical queries. These databases are often used as data warehouses for business intelligence and reporting. Examples are Apache Cassandra and AWS Redshift.
  • These databases are not optimal for transactional workloads that demand high concurrency as they are generally optimised for read-heavy workloads. They could also be an overkill when your data set is small and doesn’t demand an analytical set of use cases but rather needs ACID properties. The read cost increases based on how many columns you like to pick in the query.

Document-oriented databases

  • These databases store data in self-describing documents in JSON/BSON or any custom form that is flexible/dynamic in schema. The structure of documents can be easily extended over time, and each key can have its own structure of the document. General use case is storing various attributes against a domain entity. They are indexed using one or more keys or even using attributes for a few technologies.
  • Examples are MongoDB and CouchDB; almost every modern-day application can be found using these databases.
  • You can choose between technologies based on consistency and availability needs, as discussed with respect to the CAP theorem earlier in this article. Examples include MongoDB and AWS DynamoDB.

Key-value databases

  • These are simple databases used to store values against a key. The key spaces are highly partitional and horizontally scalable, making these best suited for high throughput, low latency reads and write use cases.
  • Examples of key-value databases include Redis, Riak DB, and AWS DynamoDB. A few databases serve a dual purpose – both as a key-value store and as a document database. For instance, AWS DynamoDB can be used as a document database (to store a document against a key) or even a simple value against a key.

Graph databases

  • These databases model data in the form of a graph, and use nodes and edges. These nodes and edges can have properties. They are well suited for modelling highly connected data and conducting complex multi-hop traversals, reverse lookups and relationship analysis which is either too complex with other databases or even impossible to achieve.
  • Examples include Neo4j, AWS Neptune, and Azure Cosmos DB.

Time series databases

  • These are designed for efficient storage, retrieval and analysis of timestamp-based information like sensor data, system metrics like CPU, memory usage, financial market data, etc. Trying to achieve these use cases with other databases will mean conducting complex queries or incurring a lot of I/O cost, which is where time-series databases come to the rescue.
  • Examples include Influx DB and Open TSDB.

In-memory databases

  • For high performance/low latency applications that demand microsecond range of read/write latency (as opposed to other databases that usually deliver milliseconds range performance), in-memory databases are the choice. The improved performance is because data is stored and accessed from the primary memory/RAM as opposed to another database that involves disk I/O.
  • The downside over other databases is the cost. Because primary memory is generally more costly than disk, one pays a higher cost for the benefit of low latency while choosing in-memory databases. Examples include SAP HANA and Redis.

Besides these, there are several other database types such as ledger databases, spatial databases, and vector databases that are popular for large language models or LLMs and have made a mark in different fields. I’ve covered just the top commonly used database types for sake of the article’s length.

LEAVE A REPLY

Please enter your comment!
Please enter your name here