Kestra: A Scalable Open-Source Orchestration and Scheduling Platform – InfoQ.com

Kestra, a new open-source orchestration and scheduling platform, helps developers to build, run, schedule, and monitor complex pipelines.

It is built on well-known tools such as Apache Kafka and Elasticsearch. Kafka provides scalability: every worker in a Kestra cluster is implemented as a Kafka consumer, and the execution state of a workflow is managed by an executor implemented with Kafka Streams. Elasticsearch is used as the database, which allows displaying, searching and aggregating all the data.

The concept of a workflow, called a Flow in Kestra, is at the heart of the platform. A Flow is a list of tasks defined in a descriptive language based on YAML. It can describe simple workflows, but it also supports more complex scenarios such as dynamic tasks and flow dependencies.

Flows can be triggered by events such as the results of other flows, the detection of files in Google Cloud Storage or the results of a SQL query. Flows can also be scheduled at regular intervals with a cron expression. Furthermore, Kestra exposes an API to trigger a workflow from any application, and a workflow can also be started directly from the web UI.

Kestra also provides a rich web interface that allows developers to edit, run, and monitor flows in real time.

A Kestra web interface is shown below:

Kestra can be employed as a data orchestrator, to handle complex workflows that move, transform and load large datasets (ETL or ELT); as a distributed crontab, to schedule work on multiple workers and monitor all these processes; or as an event-driven workflow engine, to react to external events such as API calls.

It can be deployed anywhere, for example, on Kubernetes, Cloud Compute, Docker or even on-premises. And thanks to its pluggable architecture, additional features can be added with plugins such as integration with Amazon S3, Apache Avro, Google BigQuery and MongoDB.

The Kestra platform is similar to Apache Airflow, but the latter relies on workflows written in Python instead of YAML.

An example of a flow written in YAML is shown below:
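
The listing that follows is a minimal illustrative sketch, not an official example. It assumes the task type `io.kestra.core.tasks.debugs.Echo` and the trigger type `io.kestra.core.models.triggers.types.Schedule` as they were named in Kestra's core plugins around the 0.4 releases; these names may differ in other versions. The flow id, namespace, task ids and message are hypothetical.

```yaml
# Illustrative sketch only; ids, namespace and message are hypothetical.
id: hello-world
namespace: com.example.demo

tasks:
  # Tasks run in the order in which they are listed.
  - id: say-hello
    type: io.kestra.core.tasks.debugs.Echo   # assumed core task type
    format: "Hello from Kestra"

triggers:
  # Assumed core schedule trigger; this cron expression fires at minute 0 of every hour.
  - id: hourly
    type: io.kestra.core.models.triggers.types.Schedule
    cron: "0 * * * *"
```

The sketch combines the two ideas described earlier: a Flow as an ordered list of tasks, and a cron-based schedule attached to the flow as a trigger.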

The latest release improved overall performance by reducing CPU usage and latency, and introduced a new JDBC plugin that allows for bulk queries.

The software is still relatively new: the team announced the first public release in February 2022. The latest version, 0.4.2, is available on the GitHub repository. Despite its young age, Kestra is already used in production by Leroy Merlin, one of the retail leaders in Europe.
