Twitter has released the code of Heron, a stream-processing engine that succeeded Apache Storm. Licensed under Apache v2.0, the distributed stream computation system has powered real-time analytics at Twitter for more than two years.
Heron was born out of challenges that Twitter faced as data volume, data diversity and the number of use cases grew. After encountering numerous problems with its previous Storm deployment, the internal team decided to build a streaming system that was easy to scale, debug, deploy and manage.
“Heron represents a fundamental change in streaming architecture from a thread-based system to a process-based system,” said Karthik Ramasamy, engineering manager at Twitter, in a blog post.
Some startups have deployed Heron in their projects. Even Microsoft is using a modified version of Heron that runs on top of YARN, the cluster management component of Hadoop. Moreover, the technology supports use cases ranging from extract, transform, load (ETL) to advertising bidding and, to a certain extent, even augmented reality (AR).
Heron is written in Java, C++ and Python to deliver efficiency, maintainability and easier adoption by the open source community. Further, the system supports modern cluster environments such as Apache Mesos, Aurora and REEF, among others.
In addition to Heron, Twitter has open sourced several other major projects, including Diffy, Scalding and Summingbird. All these projects are live on GitHub.