Kestra, An Open Source Platform For Orchestration And Scheduling

Kestra, an open source orchestration and scheduling platform, helps developers create, execute, schedule, and monitor complex pipelines. It is built on widely used tools such as Apache Kafka and Elasticsearch. The Kafka-based architecture allows for scalability: every worker in the Kestra cluster is a Kafka consumer, and the state of a workflow’s execution is managed by a Kafka Streams executor. Elasticsearch serves as the database, making it possible to view, search, and aggregate all of the data.

The platform is built around the concept of a process, which Kestra calls a Flow. A flow is a list of tasks written in a YAML-based description language. It can describe simple processes, but it also supports more advanced scenarios such as dynamic tasks and flow dependencies.
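To make the idea of a flow as an ordered list of tasks concrete, here is a minimal sketch in Python. It is a toy model only: the `Task` and `Flow` classes, the `id` and `namespace` fields, and the `execute` method are hypothetical names chosen for illustration, not Kestra's API or its YAML schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


# Hypothetical toy model of a flow; these names are NOT Kestra's API.
@dataclass
class Task:
    id: str
    # Receives the outputs of all earlier tasks, keyed by task id.
    run: Callable[[Dict[str, Any]], Any]


@dataclass
class Flow:
    id: str
    namespace: str
    tasks: List[Task] = field(default_factory=list)

    def execute(self) -> Dict[str, Any]:
        """Run tasks in declaration order and collect each output by task id."""
        outputs: Dict[str, Any] = {}
        for task in self.tasks:
            outputs[task.id] = task.run(outputs)
        return outputs


# A three-step flow in which each task reads the previous task's output.
flow = Flow(
    id="daily-report",
    namespace="demo",
    tasks=[
        Task("extract", lambda out: [3, 1, 2]),
        Task("transform", lambda out: sorted(out["extract"])),
        Task("load", lambda out: f"loaded {len(out['transform'])} rows"),
    ],
)
result = flow.execute()
```

A real orchestrator adds what this sketch leaves out, such as persistence of execution state, retries, and distribution of tasks across workers.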

Flows can be triggered by events such as the outcomes of previous flows, the discovery of files in Google Cloud Storage, or the results of a SQL query. A cron expression can also be used to schedule flows at regular intervals. Furthermore, Kestra provides an API that can be used to initiate a workflow from any application or straight from the Web UI.
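As a rough illustration of how a cron expression maps to a schedule, the sketch below checks whether a given timestamp matches a five-field expression. It is a simplified stand-in, not Kestra's scheduler: the helper names are hypothetical, only `*`, `*/n`, ranges, and comma lists are handled, and all five fields must match, whereas classic cron treats day-of-month and day-of-week as alternatives when both are restricted.

```python
from datetime import datetime


def _field_matches(expr: str, value: int, minimum: int) -> bool:
    """Check one cron field: supports '*', '*/n', 'a-b', 'a-b/n', and comma lists."""
    for part in expr.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            if (value - minimum) % step == 0:
                return True
        elif "-" in rng:
            low, high = (int(x) for x in rng.split("-"))
            if low <= value <= high and (value - low) % step == 0:
                return True
        elif int(rng) == value:
            return True
    return False


def cron_matches(expression: str, when: datetime) -> bool:
    """Return True if `when` satisfies all five fields of `expression`."""
    minute, hour, day, month, weekday = expression.split()
    cron_weekday = (when.weekday() + 1) % 7  # cron counts Sunday as 0
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(day, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(weekday, cron_weekday, 0)
    )
```

For example, `cron_matches("0 6 * * 1-5", some_datetime)` is true only at 06:00 on Monday through Friday, and `"*/15 * * * *"` matches every quarter of an hour.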

Kestra also has a rich web interface that allows developers to edit, run, and monitor flows in real time. It can be used as a data orchestrator to handle complex workflows such as moving, transforming, and loading large datasets (ETL or ELT); as a distributed crontab to schedule work across multiple workers and monitor them all; or as an event-driven workflow engine that reacts to external events such as API calls.

It can be deployed almost anywhere, including Kubernetes, cloud compute instances, Docker, and on-premises servers. Thanks to its pluggable architecture, plugins can add further features, such as integration with Amazon S3, Apache Avro, Google BigQuery, and MongoDB.

The Kestra platform is comparable to Apache Airflow, except that Airflow defines workflows in Python whereas Kestra uses YAML. The most recent release improves overall performance by lowering CPU consumption and latency, and introduces a new JDBC plugin that supports bulk queries. The team announced the first public release in February 2022, so the project is still relatively young, but it is already being used in production by Leroy Merlin, one of Europe’s largest retailers. The most recent version, 0.4.2, is available on GitHub.
