Cosmos DB: Microsoft’s New Globally Distributed and Horizontally Scalable Database Service


Data is gold in today’s world. But it has no value unless it is processed, manipulated and used to derive useful insights. Databases enable the handling of data, and there are many out there to choose from. Cosmos DB has a lot going for it, as readers will discover in this article.

Cosmos DB is the successor to Azure Document DB and an enhanced version of it. Azure Cosmos DB is Microsoft’s globally distributed, multi-model database, which scales at the click of a button. Because it’s multi-model, it supports document, table (key-value) and graph data together in a single database service. It is designed to store semi-structured data such as documents, typically in JSON format. Unlike traditional relational databases, the schema for each non-relational (NoSQL) document can vary, giving developers, database administrators and IT professionals more flexibility in organising and storing application data and reducing the storage required for optional values.
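As a rough illustration of what a varying schema means in practice, the sketch below keeps two JSON documents with different fields side by side in the same collection. It is plain Python and does not use any Cosmos DB SDK; the `customers` list merely stands in for a collection, and all field names and values are made up for the example.

```python
import json

# A hypothetical "collection": a Python list standing in for a Cosmos DB collection.
customers = [
    {"id": "1", "name": "Asha", "email": "asha@example.com"},
    # The second document omits "email" and adds fields the first one lacks.
    {"id": "2", "name": "Ravi", "address": {"city": "Pune"}, "loyalty_points": 120},
]


def fields_used(documents):
    """Return the set of top-level field names used by each document, keyed by id."""
    return {doc["id"]: set(doc) for doc in documents}


# Each document serialises to JSON on its own; optional fields that are
# absent take up no space at all in the stored document.
serialised = [json.dumps(doc, sort_keys=True) for doc in customers]

if __name__ == "__main__":
    for line in serialised:
        print(line)
```

Because no shared schema is enforced, adding a new optional field to one document does not require touching the others.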

Cosmos DB is a planet-scale database. It is a good choice for any serverless application that needs response times in the order of a few milliseconds, and has to scale rapidly and globally. Scaling and replication are transparent to the application, so its configuration does not need to change.

How it was created
Microsoft Cosmos DB isn’t entirely new. It grew out of an internal Microsoft development initiative called Project Florence that began in 2010. The project set out to address the problems Microsoft’s own developers faced when building large-scale, globally distributed applications.

  • The technology was first commercialised in 2015 with the release of a NoSQL database called Azure Document DB.
  • Cosmos DB was introduced in 2017.
  • It expands on Document DB by adding multi-model support, global distribution capabilities and SLA-backed guarantees for latency, throughput, consistency and availability.

Defining Cosmos DB
Cosmos DB is a globally distributed database service. It allows data to be managed even when it is stored in data centres scattered throughout the world. It provides the tools needed to scale both the global distribution of data and the computational resources provided by Microsoft Azure.

Reasons for using Cosmos DB

  • It is schema-free. It indexes all the data without requiring schema and index management.
  • It is multi-model, natively supporting the document, key-value, graph and column-family data models.
  • It is the industry’s first globally distributed, horizontally scalable, multi-model database service. Azure Cosmos DB guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world. It offers multiple well-defined consistency models to fine-tune performance and guarantees high availability (HA).
  • There’s no need to worry about instances, servers, CPU or memory. Just select the throughput and the required storage, and create collections. Capacity in Cosmos DB is expressed purely in terms of throughput. It is integrated with Azure Functions, which makes it a good fit for serverless, event-driven solutions.
  • APIs and access methods: Cosmos DB comes with the Document DB API, Graph API (Gremlin), MongoDB API, RESTful HTTP API and the Table API. This gives more flexibility to the developer.
  • It is elastic, globally scalable and highly available, and it automatically indexes all data.
  • It has five consistency levels: strong consistency, bounded staleness, session consistency, consistent prefix and eventual consistency. This gives developers more options for trading consistency off against performance.
Figure 1: Managing data without Cosmos DB

Figure 1 shows a simple example of geographical data that highlights the importance of and need for Cosmos DB.

Managing data without Cosmos DB:

  • Data geo-replication might be a challenge for the developer.
  • Users from remote locations might experience latency and inconsistency in their data.
  • Providing an automatic failover is a real challenge.

Managing data with Cosmos DB:

  • Data can be geo-distributed with just a few clicks.
  • Developers do not need to worry about data replication.
  • Strong consistency can be offered to end users across geo-distributed locations.
  • The web tier of an application can be switched between the primary and secondary regions at any time, with just a few clicks.
  • Failover can be initiated manually at any time, and automatic failover is also available.
Figure 2: Managing data with Cosmos DB
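
The failover behaviour listed above can be pictured with a small toy model: regions are ranked by priority, and the highest-ranked region that is still healthy takes over. This is only a sketch of the idea, not Cosmos DB's actual mechanism; the function name `pick_write_region` and the region names are assumptions made for the example.

```python
def pick_write_region(failover_priority, healthy_regions):
    """Return the first healthy region in priority order, or None if none is healthy."""
    for region in failover_priority:
        if region in healthy_regions:
            return region
    return None


# Hypothetical region names, listed from highest to lowest failover priority.
priority = ["West Europe", "East US", "Southeast Asia"]

if __name__ == "__main__":
    # All regions healthy: the top-priority region serves writes.
    print(pick_write_region(priority, {"West Europe", "East US", "Southeast Asia"}))
    # The top-priority region is down: the next one in the list takes over.
    print(pick_write_region(priority, {"East US", "Southeast Asia"}))
```

A manual failover in this model amounts to reordering the priority list; an automatic one amounts to a region dropping out of the healthy set.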

The five consistency levels

Data consistency is configurable on Cosmos DB, letting application developers choose from five different levels:

  • Eventual consistency doesn’t guarantee any ordering and only ensures that replicas will eventually converge.
  • Consistent prefix adds ordering guarantees on top of eventual consistency.
  • Session consistency is scoped to a single client session, and ensures read-your-own-writes consistency for each client; it is the default consistency level.
  • Bounded staleness augments consistent prefix by ensuring that reads won’t lag behind writes by more than a specified number of versions of an item or a specified time window.
  • Strong consistency (or linearisability) ensures that clients always read the latest globally committed write.
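
To make the differences between the levels more concrete, here is a minimal toy model, assuming each write bumps an integer version number and each replica knows which version it has caught up to. It only captures how stale a read is allowed to be (it ignores ordering, which is what separates consistent prefix from eventual consistency), and every name in it (`Consistency`, `can_serve_read`, `max_lag`, `session_version`) is invented for illustration rather than taken from any Cosmos DB SDK.

```python
from enum import IntEnum


class Consistency(IntEnum):
    # Ordered from the weakest to the strongest guarantee.
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def can_serve_read(level, replica_version, latest_version,
                   session_version=0, max_lag=0):
    """Toy check: may a replica that has applied `replica_version` answer a read?

    latest_version  -- version of the most recent committed write
    session_version -- last version written by the reading client's own session
    max_lag         -- staleness bound, in versions, for bounded staleness
    """
    if level == Consistency.STRONG:
        # Only a fully caught-up replica may answer.
        return replica_version == latest_version
    if level == Consistency.BOUNDED_STALENESS:
        # The replica may lag, but by no more than max_lag versions.
        return latest_version - replica_version <= max_lag
    if level == Consistency.SESSION:
        # The client must at least see its own writes.
        return replica_version >= session_version
    # Eventual and consistent prefix put no bound on how far a replica lags.
    return True


if __name__ == "__main__":
    for level in Consistency:
        ok = can_serve_read(level, replica_version=7, latest_version=10,
                            session_version=8, max_lag=5)
        print(f"{level.name}: {ok}")
```

In this model, a replica that is three versions behind can serve eventual, consistent-prefix and (with a bound of five versions) bounded-staleness reads, but not a strong read, and not a session read for a client whose last write was version 8.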

Until now, Microsoft has been a follower on the NoSQL DB path, with MongoDB as the leader. However, with the new release of Cosmos DB and the strong community evolving around developing applications for this database, it is fast acquiring a share in the NoSQL DB market. The use of data migration tools also makes the import of data from other databases to Cosmos DB rather easy. So Cosmos DB is here to stay and is a very good offering from Microsoft. Let’s see what the tech giant offers next.

