Yugabyte was first developed as closed source commercial software. However, by 2019, its founders realised the power of the open source community and made it available under the Apache 2.0 license. The timing was right as developers were on the lookout for a cloud native database. The adoption and popularity of this database has increased significantly in recent years for a number of reasons.
yugabyteDB was developed by former Facebook software engineers nearly six years ago. It is today widely adopted across enterprises and industries as it is a fully functional open source distributed SQL database, which supports cloud-native applications.
Distributed SQL acts as a single logical database, whereas behind the scenes it is a database cluster of one or more nodes. A distributed SQL database has a three-layer architecture:
- SQL API is the primary interface for applications to model relational data. Like RDBMS indexes, foreign key constraints, JOIN queries and multi-row ACID transactions are unique data models for distributed SQL.
- Distributed query execution takes the responsibility for distributed query processing by sharing query requests across other nodes in the cluster. The execution across multiple nodes minimises latency and keeps a check on the amount of data transferred amongst them.
- Distributed data storage ensures data is distributed and replicated, avoiding bottlenecks and ensuring high availability, leading to high performance.
Figure 1 gives a reference implementation of distributed SQL across three nodes. An application connecting to node 1 leverages node 2 and 3 for query processing. In case node 1 goes down, node 2 or node 3 picks up the request and continues to process the application, ensuring high availability at all times.
There are many distributed SQL implementations. YugabyteDB is one of the first and most successful ones and is being used by enterprises of all sizes. For a distributed database to be successful there are typical technology considerations like scale, consistency, resiliency, geo-replication and data locality. Given the growing popularity of multi-cloud, it’s also critical for a distributed SQL database to be able to support multi-network deployment. YugabyteDB meets all these criteria.
YugabyteDB has a layered architecture.
Yugabyte Query Layer (YQL) is the top layer that interfaces with applications interacting directly with it using client drivers. This layer manages API specific aspects for query/command compilation and runtime data representations. YQL is built with extensibility, allowing new APIs to be added as use cases grow.
Yugabyte SQL (YSQL) is a distributed SQL API built on top of the PostgreSQL language layer code. It’s stateless SQL query engine is compatible with PostgreSQL.
Yugabyte Cloud QL (YCQL) is a semi-relational language that reflects Cassandra Query Language. Although it is SQL-like language, it’s built specifically to be aware of the clustering of data across nodes.
DocDB is the distributed document store that has strong write consistency and is resilient to failures. It has automatic sharding, load balancing and geo aware data placement policies.
Sharding is done to tables stored in DocDB and is transparent to users.
Replication is applied with a configurable replication factor using the Raft consensus algorithm. It is performed at the table level, ensuring single row linearizability in case of failure.
Persistence is achieved with log-structured row/document-oriented storage including many optimisations supporting a growing data set for efficiency.
What make YugabyteDB stand apart from other distributed databases are its features:
- Supports single- or multi-row transactions (OLTP apps)
- High performance and linear scalability
- Open source and cloud native with multi-cloud deployability
- Geo-location deployment and global data consistency
- Supports change data capture (CDC) mechanism
- Distributed and multi model
- Offers managed services
The need for data availability, consistency and optimum read/write performance is growing each day. YugaByteDB stands tall amongst all the distributed SQL databases. Its 5x storage sets it apart, given the growing cost of data storage, processing and analytics.