By harnessing the right open source database solutions for IoT, organisations can drive innovation, enhance data management, and get actionable insights that propel them ahead in the IoT landscape. Unlock the full potential of IoT by selecting the right set of open source databases.
The technology ecosystem of Internet of Things (IoT) is complex, and includes hardware, connectivity, platforms, middleware, analytics, operating systems, applications, and services.
As per IDC, there will be 41.6 billion connected IoT devices or ‘things’, generating 79.4 zettabytes (ZB) of data in 2025. Sixty-four per cent of those surveyed think open source is significant in deploying IoT, as per the IDC Global IoT Decision-Maker Survey. According to a survey by W3.Org, 91% of IoT developers use open source software, open hardware, or open data in at least one part of their development stack.
According to Statista, the total installed base of Internet of Things (IoT) connected devices is projected to amount to 75.44 billion worldwide by 2025. By 2030, IoT could enable US$ 5.5 trillion to $12.6 trillion in value globally, including the value captured by consumers and customers of IoT products and services, as per McKinsey.
With billions of devices generating trillions of bytes of data, it is crucial for enterprises to organise, store and work with all the data that is generated.
In general, IoT applications leverage both relational and non-relational (NoSQL) databases. The choice of database type depends on the application, and in most cases a combination of both is used.
Most IoT applications are heterogeneous and domain-centric. Choosing the most efficient database for these applications can be challenging. The important parameters for choosing the right database for IoT applications are scalability, availability, ability to handle huge amounts of data, high processing speed, schema flexibility, integration with varied analytical tools, security and costs.
Open source adoption for IoT
The following are the major drivers that prompt enterprises to adopt open source technologies for IoT.
Cost: Open source IoT frameworks carry no licence fees, as they are free to use.
Efficiency: Adoption of open source helps developers reduce development times, thereby reducing the development cost.
Scalability: Open source technology makes it affordable to scale to billions of supported devices.
Innovation: Open source adoption in building newer applications is permissionless and low risk, which leads to better innovation and agility.
Open source API: Open source APIs are a standard gateway for IoT framework-based applications. They help to improve the communication between various software components, hardware devices, and systems.
Libraries: An open source IoT framework offers a wide range of libraries, SDKs, and open source hardware. Raspberry Pi and Arduino are open source tools for customising IoT platforms.
Security: Open source software can protect individuals’ data by implementing powerful encryption (SSH, SSL, PGP, etc).
Interoperability: The adoption of open source helps in collaboration and interoperability.
Drivers for adopting open source databases for IoT
Most IoT solutions are distributed across various geographical locations. This necessitates solutions that adopt fog computing at the edge and cloud computing at the enterprise level. However, no single database product in the market can fulfil the IoT database requirements across the organisation. There is a need for a collection of databases, potentially from a variety of vendors, used in one or more stages of the IoT life cycle.
The key business drivers for open source database adoption are:
- Flexibility to process the data at the edge
- Synchronising data between edge servers and the cloud
- Real-time data streaming and analytics
- Data filtering and aggregation
- Increasing cost of ownership for the database landscape
- Increased complexity of integrating and managing the databases needed to deliver IoT solutions
- Multiple databases with duplicate functionality and products that are under-utilised
- Need for a wide range of skills to support the IoT landscape
Selection of open source databases for IoT implementation depends on the following requirements:
- Nature and type of data to be collected
- Business criticality of the data
- Importance of the collected data
- High availability and disaster recovery considerations for database processing
- A database that addresses single points of failure
- Intensity of the data communication
- Integration with various sources of data for analytics
Characteristics of open source databases
An open source database for IoT should be fault-tolerant and highly available. It should have the following characteristics:
- It should avoid vendor lock-in and integrate seamlessly with enterprise-wide tools, applications, products and systems developed or deployed by different organisations and vendors.
- It should increase productivity, speed up time-to-market, reduce risks and increase quality.
- It should be free of vendor monopoly. With data transferability and open data formats, there are greater opportunities to share data across interoperable platforms.
- It should enhance interoperability with other enterprise applications through the reuse of software stacks, libraries, and components.
Layers of IoT architecture and the database they need
The architecture of an IoT database analytics system has the following significant requirements.
- Context: Capturing the context 24 hours a day, 365 days a year
- Standards: Leveraging standard protocols of communication between IoT devices and enterprise systems
- Scalability: Responding to increased load with graceful degradation in performance rather than failure, and increasing capacity proportionally as resources are added
- Data management: Efficiently managing enormous volumes of data
- Connectivity: Providing high network connectivity for large data payloads and continuous streaming
- Security: Moving and encrypting information securely even as IoT introduces new risks and vulnerabilities
- Interoperability: Networking all systems together and ensuring interoperability of all data
- Device management: Connecting to devices remotely and at scale to manage them, for example by updating security credentials and firmware
- AI-driven analytics: Enriching and exploring prescriptive and predictive analytics data to deliver actionable insights
In an enterprise IoT solution, thousands of sensors and actuators are connected to the edge server, continuously collecting data. The database must efficiently perform data transformation operations. The IoT data stream normalises the data to a standard format and sends it to a central repository.
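The normalisation step described above can be illustrated with a minimal sketch. It assumes two hypothetical vendor payload shapes (one reporting Fahrenheit with epoch seconds, one reporting Celsius with an ISO 8601 timestamp); the field names and the `normalise` function are illustrative assumptions, not part of any particular IoT platform.

```python
from datetime import datetime, timezone


def normalise(raw: dict) -> dict:
    """Map a vendor-specific payload onto one common record shape."""
    if "temp_f" in raw:
        # Hypothetical vendor A: Fahrenheit reading, epoch seconds
        celsius = (raw["temp_f"] - 32) * 5 / 9
        ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
        device = raw["id"]
    elif "temperature" in raw:
        # Hypothetical vendor B: Celsius reading, ISO 8601 string with offset
        celsius = raw["temperature"]
        ts = datetime.fromisoformat(raw["time"])
        device = raw["device"]
    else:
        raise ValueError("unknown payload shape")

    return {
        "device_id": device,
        "temperature_c": round(celsius, 2),
        # Always emit UTC so downstream stores compare timestamps consistently
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
    }
```

Whatever the source device, every record sent on to the central repository then carries the same three fields in the same units.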
The various layers of IoT architecture are data layer, things layer, communication layer, query layer, management layer, and application layer.
Things layer: This layer consists of entities that produce data for the IoT application and its modules. It covers IoT sensors, actuators, and devices that act like data production objects. Data collection, processing, and real-time data aggregation are performed at this layer.
‘Data ingest’ collects and stores logs and messages from the devices. The database needs to support high-speed write operations and ensure that the data captured is not lost under any circumstances. MQTT, Kafka and REST service components are used to ingest the data from the devices to the database.
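One common way to avoid losing captured data is store-and-forward: persist each message locally before forwarding it, and delete it only after the downstream system acknowledges it. The sketch below is an illustration under stated assumptions, using Python's built-in SQLite module as a stand-in for the ingest database; the `IngestBuffer` class and the `send` callback are hypothetical names.

```python
import json
import sqlite3


class IngestBuffer:
    """Store-and-forward buffer: persist first, delete only after an ack."""

    def __init__(self, path=":memory:"):
        # Assumption: pass a file path for durability; ":memory:" is for demos
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pending ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def accept(self, message: dict) -> int:
        """Persist a device message and return its row id."""
        cur = self.db.execute(
            "INSERT INTO pending (payload) VALUES (?)", (json.dumps(message),)
        )
        self.db.commit()
        return cur.lastrowid

    def flush(self, send) -> int:
        """Forward pending messages in order; stop at the first failure."""
        rows = self.db.execute(
            "SELECT id, payload FROM pending ORDER BY id"
        ).fetchall()
        delivered = 0
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break  # keep this and later messages for the next attempt
            self.db.execute("DELETE FROM pending WHERE id = ?", (row_id,))
            self.db.commit()
            delivered += 1
        return delivered

    def pending(self) -> int:
        """Number of messages not yet acknowledged downstream."""
        return self.db.execute("SELECT COUNT(*) FROM pending").fetchone()[0]
```

In a real deployment, `accept` would be called from the MQTT, Kafka or REST handler, and `send` would wrap the write to the target database.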
‘Edge analytics’ performs the data translation, aggregation and filtering on the incoming data, enabling real-time decision-making at the edge. The database needs to support high-speed reads and writes with sub-millisecond latency and perform complex analytical computations on the data.
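The filtering and aggregation performed at the edge can be sketched as follows. This is a simplified, in-process illustration rather than a database feature: the `aggregate` function, the record fields, and the default window and valid range are all assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate(readings, window_s=60, valid_range=(-40.0, 125.0)):
    """Drop out-of-range readings, then summarise per device and time window."""
    low, high = valid_range
    buckets = defaultdict(list)
    for r in readings:
        if not (low <= r["value"] <= high):
            continue  # filter: discard implausible sensor values
        # Align the timestamp to the start of its fixed-size window
        window_start = int(r["ts"] // window_s) * window_s
        buckets[(r["device_id"], window_start)].append(r["value"])

    return [
        {
            "device_id": device,
            "window_start": start,
            "count": len(values),
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
        }
        for (device, start), values in sorted(buckets.items())
    ]
```

Sending only these per-window summaries upstream, instead of every raw reading, reduces the volume of data that has to travel from the edge to the cloud.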
Communication layer: The communication layer supports the transmission of requests, queries, data, and results (collection and delivery). This layer acts as a bridge between distributed data sources and data storage processing units. Inter-objects and objects-to-infrastructure communication technologies are used, and interoperation guarantees are provided at upper layers.
The device manager communicates messages to the devices. The database needs to access and deliver messages to the devices with minimum latency. It consists of an IoT/cloud gateway that provides endpoints for device connectivity, facilitating bi-directional communication with the platform and enterprise systems. It also implements edge intelligence with different processing capabilities, enabling connectivity between devices, routing, filtering, and protocol and identity translation.
Query layer: The query layer handles the details of query processing and optimisation in cooperation with the federation layer and the complementary transactions layer (processing, delivery). This layer handles the components for generating, optimising, and executing queries on the IoT database level. It can be deployed at both the central and local levels.
Data layer: The data layer is the core element in the data architecture. It performs the discovery of the data sources across the system. The catalogue of data sources, storage of data, and indexing of collected data are performed at this layer. In addition, filtering of data, pre-processing, and processing of data steps are handled at the data layer. It gathers local and autonomous data repositories to perform these activities.
A hot database is a high-speed in-memory database that reads and writes with very low latency. It provides real-time querying capabilities. This database is highly available and addresses disaster recovery.
Historical data of the IoT systems is stored in a cold database. Typical choices range from relational databases to a data lake.
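The split between a fast in-memory tier and a cold historical tier can be sketched with a small model. Plain Python lists stand in for both databases here, and the `TieredStore` class, its method names and the one-hour retention default are assumptions made purely for illustration.

```python
class TieredStore:
    """Toy model: recent readings in memory, older readings in a cold tier."""

    def __init__(self, retention_s=3600):
        self.retention_s = retention_s
        self.memory_tier = []  # stand-in for the in-memory database
        self.cold_tier = []    # stand-in for a relational database or data lake

    def write(self, reading: dict) -> None:
        """New readings always land in the fast tier first."""
        self.memory_tier.append(reading)

    def age_out(self, now: float) -> int:
        """Move readings older than the retention period to the cold tier."""
        cutoff = now - self.retention_s
        expired = [r for r in self.memory_tier if r["ts"] < cutoff]
        self.cold_tier.extend(expired)
        self.memory_tier = [r for r in self.memory_tier if r["ts"] >= cutoff]
        return len(expired)

    def query(self, since: float) -> list:
        """Read across both tiers, oldest first."""
        combined = self.cold_tier + self.memory_tier
        return sorted(
            (r for r in combined if r["ts"] >= since), key=lambda r: r["ts"]
        )
```

Real-time queries would normally touch only the in-memory tier, while reporting queries such as `query` above span the historical data as well.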
Management layer: The management layer provides access and security to the various data stores in the data layer of the IoT framework. This layer consists of transaction, recovery, and security managers.
System analytics collects the data from the edge server and performs data transformation and analytics operation. The database provides the commands to perform analytical computations on the data, and stores the data as long as is required by the analytics engine.
Application layer: This layer is an important layer for IoT applications as it acts as an interface to interact with applications for end users. Devices like smartphones, computers and tablets are used as a medium to make IoT applications interactive. This layer handles the orchestration of device data collected into business processes and enterprise applications. It provides interfaces to end users, including operations.
It also implements domain-specific enterprise applications, enterprise data (includes a system of records, reference data, historical data, data warehouses, and transactional data), enterprise’s active directory (stores user profile data), and rules or decision support systems.
Enterprise BI (business intelligence) runs reports, queries, and interfaces from historical data. The database needs to store data cost-effectively for an extended period.
Key open source databases for IoT adoption
The following are some of the top open source databases available for IoT based applications.
MongoDB: This is a powerful, flexible, scalable, open source, document-oriented NoSQL database. It supports features like indexes, range queries, sorting, aggregations, and geospatial indexes. It stores and transmits information as JSON-like documents. JSON, being a widely used standard data format, is a great advantage for both the web and the database. MongoDB’s rich query language supports read and write operations (CRUD) as well as data aggregation, text search and geospatial queries.
As an example, Bosch has built its IoT suite on MongoDB.
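To make the document model concrete, the sketch below builds a sensor reading as a JSON-style document and a filter that combines a range condition with a geospatial condition. The field names and helper functions are hypothetical; the `$gte`, `$near`, `$geometry` and `$maxDistance` operators are MongoDB query operators, and `$near` on GeoJSON data assumes a `2dsphere` index on the `location` field. With a driver such as PyMongo, a filter like this would be passed to a collection's `find` method; no database connection is made here.

```python
from datetime import datetime, timezone


def make_reading(device_id, lon, lat, temperature_c, ts):
    """Build one sensor reading as a document (hypothetical field names)."""
    return {
        "deviceId": device_id,
        "ts": ts,
        # GeoJSON point: longitude comes first, then latitude
        "location": {"type": "Point", "coordinates": [lon, lat]},
        "temperatureC": temperature_c,
    }


def recent_nearby_filter(lon, lat, max_metres, since):
    """Filter for readings taken after `since` within `max_metres` of a point."""
    return {
        "ts": {"$gte": since},  # range condition on the timestamp
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": max_metres,
            }
        },
    }


since = datetime(2025, 1, 1, tzinfo=timezone.utc)
doc = make_reading("sensor-17", 77.59, 12.97, 24.3, since)
query = recent_nearby_filter(77.59, 12.97, 500, since)
```

Because documents need no fixed schema, a different device type could add its own fields to the same collection without a migration.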
InfluxDB: This time series database is designed to handle high write and query loads. It has no external dependencies and provides an SQL-like query language called InfluxQL for querying a data structure comprising measurements, series, and points. Each point consists of key-value pairs called the field set, plus a timestamp. Field values can be 64-bit integers, 64-bit floating points, strings, and Booleans. Points are indexed by their time and tag set. InfluxDB accepts data over HTTP, TCP and UDP, and has plugin support for other data ingestion protocols such as Graphite and OpenTSDB.
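The structure of a point (measurement, tag set, field set and timestamp) maps directly onto InfluxDB's text-based line protocol. The following is a minimal sketch of a formatter for that protocol; the `to_line` helper and the sample names are illustrative assumptions, and the escaping covers only the common cases (commas, spaces, equals signs, and quotes in string values).

```python
def _escape(text, chars):
    """Backslash-escape each of the given special characters."""
    for c in chars:
        text = text.replace(c, "\\" + c)
    return text


def _field_value(value):
    """Render a field value using line protocol type markers."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # trailing "i" marks a 64-bit integer
    if isinstance(value, float):
        return repr(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tags fields timestamp."""
    if not fields:
        raise ValueError("a point needs at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys give a stable series key
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")
    field_part = ",".join(
        _escape(key, ",= ") + "=" + _field_value(value)
        for key, value in fields.items()
    )
    return f"{head} {field_part} {timestamp_ns}"
```

Tags are indexed and suit metadata such as device or site identifiers, while fields carry the measured values themselves.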
BigchainDB: This is a blockchain database that can be used with IoT devices. The assets created are stored as JSON documents in BigchainDB. It helps to convert physical objects into blockchain services by building blockchain-specific hardware compatible with any IoT device. It helps to define read and write permissions for the IoT device. BigchainDB can be integrated into any IoT scenario that needs immutable, tamper-proof storage of data assets, along with high-throughput search and query capability.
MySQL: This open source relational database management system brings data consistency, scalability, high performance, availability, and flexibility to IoT solutions through efficient collection of data from IoT devices. It helps in data transformation through annotation and aggregation, making the data easier to understand.
Based on the IoT solution requirement, if the data format is fixed then MySQL is the preferred database.
GridDB: This database uses a container data model that extends the NoSQL key-value store, representing data as a collection referenced by keys. It provides high scalability, reliability, and availability.
The two types of containers in GridDB are:
- Collection container: General-purpose container
- TimeSeries container: Container for data associated with timestamps, providing functions like data compression and data aggregation
Redis: This is an in-memory open source database. It is a popular choice for IoT solutions as a hot database. It is widely used by IoT solutions for data ingest, real-time analytics, messaging, caching, and many other use cases. It helps in edge computing involving deep learning, image recognition and other innovative computing requirements.
CrateDB: This is an open source distributed SQL database management system that fully integrates a searchable document-oriented data store. The CrateDB platform provides a distributed SQL query engine for fast joins, aggregations, and ad-hoc queries. This highly scalable and available database supports various types of data.
Cassandra: This is a highly scalable, distributed open source database for managing large volumes of structured data across many commodity servers. It provides availability, linear scale performance, simplicity and easy distribution of data across multiple database servers. It supports tunable data consistency across a distributed architecture, up to strong consistency when required.
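Consistency in Cassandra is chosen per request through consistency levels. A widely used rule of thumb is that a read is guaranteed to see the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The arithmetic can be sketched as follows; the function names are illustrative and not part of any Cassandra driver.

```python
def quorum(replication_factor: int) -> int:
    """Replicas needed for a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_replicas, read_replicas):
    """True when the read and write replica sets must share a replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
q = quorum(rf)
# Quorum writes plus quorum reads overlap on at least one replica
strong = read_sees_latest_write(rf, q, q)
# Writing and reading a single replica each is faster but may return stale data
eventual = read_sees_latest_write(rf, 1, 1)
```

For high-volume sensor writes, a lower write consistency level trades some read freshness for throughput, which is often acceptable for telemetry.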
Hadoop: This is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. It helps in driving analytics from all IoT data. It easily ingests data from multiple data sources and supports both batch and real-time data ingest from sensors using tools such as Apache Kafka and Apache Flume. It handles multiple IoT data types, structures, and schemas. It supports real-time processing and applications on streaming data. Hadoop is a flexible, scalable and secure data platform.
A number of leading organisations, including automotive manufacturers, utilities, industrial automation companies, insurers, healthcare organisations, and telecom and technology leaders, are adopting Hadoop.
Benefits of using open source databases in IoT
Open source databases in IoT have the following advantages over proprietary databases:
- Easy upgrade to new technologies with an open source database
- Ability to connect with upcoming device protocols and backend applications
- Lower overall software cost, ease of change in technology, and open source APIs for integration
- Flexibility and easy change in architecture of solutions that are centred on microservices
- Flexibility to switch cloud service providers
IoT is one of the most significant sources of Big Data. This data is rendered useless without analytics power. It is very important to choose the right set of open source databases for an IoT solution, as there are so many available in the market.
We need to first analyse the business problem, arrive at a solution, break this solution into services, and understand the database needs of these services. This will help to narrow down the database choices.
Most IoT solutions can depend on a hot database for real-time data collection, processing, messaging and analytics. Cold databases are better suited to store historical data and gather business intelligence. This will make the architecture simple, lean and robust.
Disclaimer: The views expressed in this article are those of the author and HCL does not subscribe to the substance, veracity, or truthfulness of the said opinion.