NoSQL databases can handle unstructured data. Find out how they differ from other databases, and when and where they should be used.
Different types of databases can be used for managing structured and unstructured data. Relational databases are used for structured data, which is formatted and has well-known types. NoSQL databases are used for unstructured data. A database management system typically has an interface, a query processing engine, metadata processor, indexing engine, and storage manager. These components help in querying and storing data into the DBMS. Earlier, blobs (an acronym for binary large objects) were used for storing unstructured data. Tabular data stores evolved to columnar data stores to handle data that is not normalised.
NoSQL databases differ from an SQL DBMS in the way data is available — unstructured or structured. SQL is the query language for relational databases. These have well defined schema. Relational databases are used in transactional or reporting scenarios. NoSQL databases are used to handle unstructured databases in data lake scenarios. The data can be document based and key value type, or can be graph based. Fields can be added dynamically, just like in a document.
The most popular NoSQL databases are MongoDB, Apache CouchDB, Apache HBase, Oracle NoSQL, Apache Cassandra, Neo4j, BigTable, Riak, Objectivity Infinite Graph, and MariaDB. Many enterprises are choosing NoSQL databases to handle data from multi-channel and IoT apps.
NoSQL database management systems were developed to handle key value data types, wide column data types, documents, and graph data types. A key-value data store is based on dictionary/hash tables. Document data stores can handle data formats like XML, YAML, and JSON. Graph data stores can handle graph based data. NoSQL databases are performant and scalable. REST, Graph QL, and other NoSQL query methods are used for accessing data from NoSQL databases.
SQL DBMSs handle data types like int, char, varchar, binary, varbinary, float, decimal, double, bit, blob, bool, longtext, tinytext, mediumtext, and text. Relational data format is used for transactional data processing like commerce ordering, accounting, and payment systems.
Programming language support
A structured query language is used for creation, updation, modification, and deletion of schema and data. In the design phase, a relational database (SQL DBMS) is designed for its schema and database procedures. NoSQL DBMS is based on a dynamic schema for handling unstructured and semi-structured data. Columns can be added dynamically and the syntax is different across the NoSQL databases. The query languages for NoSQL can handle documents, columns, graphs, and key-value data types.
Programming languages like Java, C#, Python and others provide support through JDBC drivers for SQL databases. Similarly, they provide support for NoSQL data access.
|“The cost of managing traditional databases is high. Mistakes made during routine maintenance are responsible for 80 per cent of application downtime.”
—Dev Ittycheria, president and CEO of MongoDB Inc.
An important design concern is the query performance. Query languages need to have support for optimisation to handle complex queries. SQL provides stored procedures and constructs to handle complex data processing programs. It is a lightweight and declarative query language. In NoSQL, query languages are not optimised for handling Big Data. On the other hand, a NoSQL database is scalable and flexible. Big Data frameworks such as Hadoop MapReduce and others help in handling complex data processing for a NoSQL database. In Web applications, developers handle the complex query processing in the application layer, and keep the data access layer plain and simple.
NoSQL database management systems can be used for creating documents, adding columns to the schema, storing documents, handling complex structures, and content. NoSQL databases abide by the BASE model — basic availability, soft state, and eventual consistency. They ensure data is available, and handle the changes in the state of this data. A NoSQL DBMS ensures the consistency of data. SQL databases abide by ACID (atomicity, consistency, isolation, and durability) properties. A few NoSQL databases also do so.
SQL is used for data querying and modification. The query performance can be optimised easily in SQL databases. In NoSQL database based architectures, query processing is handled in the business layer and not in the data access layer.
Relational DBMSs are scalable, and load can be handled by adding more processing power, memory and storage. NoSQL DBMSs can be scaled by adding new database nodes to handle the traffic using sharding. Sharding helps in handling Big Data and data which is changing.
Huge volumes of data can be processed at higher velocity by scaling out. Unstructured/semi-structured data can be stored by using the NoSQL datastore. This type of DBMS is consistent, performant and scalable. NoSQL, by definition, is not SQL-compliant. It has its own query language to handle data processing. Vertical scaling can be visualised as adding more computing power vertically. Horizontal scaling is like growing a farm by adding more devices horizontally.
Tables are used for designing database schema in SQL databases. NoSQL database management systems use key-value data types, documents, graphs, and wide column data types. SQL DBMSs abide by atomicity, consistency, isolation, and durability (ACID) properties. ACID properties help in preserving the integrity and validity of data. NoSQL DBMSs are based on the CAP theorem, which was coined by Brewers to focus on consistency, availability, and partition tolerance properties. This theorem states that a distributed database system can only have two of the three properties of consistency, availability and partition tolerance.
Hence an SQL DBMS is suitable for transactional applications like online ordering systems. NoSQL is good to handle unstructured data.
Convergence of NoSQL and SQL DBMSs
There are a few NoSQL DBMSs that comply with ACID properties as well as the CAP theorem, signifying a convergence of the SQL and NoSQL DBMSs. MySQL and other SQL databases provide a document store to handle unstructured data. MongoDB, on the other hand, provides features to handle ACID based transactions. Similarly, AWS managed NoSQL, DynamoDB, and others abide by ACID properties.
|“The choice to use a NoSQL database is often based on hype, or a wrong assumption that relational databases cannot perform as well as a NoSQL database. Operational costs, as well as other stability and maturity concerns, are often overlooked by engineers when it comes to selecting a database.”
—Yoav Abrahami, Scaling to 100M: MySQL is a Better NoSQL | Wix Engineering
Both SQL and NoSQL DBMSs come with their own advantages and disadvantages. If one DBMS can handle relational and unstructured data, it works the best in terms of cost and operational efficiency.