Distributed Scheduling

arjun dhar
3 min read · Jan 25, 2024

Scheduling is a common practice required by many types of applications, be it Big Data wrangling or any other ETL job. While event-driven stream-processing middleware hogs the limelight, let's be honest: scheduled jobs are the bread and butter of any robust enterprise application.

What’s wrong with CRON or CRON-like?

Some of you may be lucky enough to get away with a CRON job at the localhost level, or an equivalent (cron-like) schedule within the application using a framework like Quartz or any library of choice; a minimal sketch of this in-application approach follows the list below. However, in an event-driven or high-availability environment, local scheduling carries the risks of:

  • Single point of failure
  • Duplicate firing of tasks
  • Lack of coordination when scheduled tasks are not mutually exclusive and have real-time or state dependencies.
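To make the in-application ("cron-like") option concrete, here is a minimal sketch in Python using APScheduler as an assumed stand-in for a Quartz-style library; the job name and schedule are purely illustrative:

    from apscheduler.schedulers.blocking import BlockingScheduler

    # Hypothetical ETL job, purely for illustration.
    def nightly_etl():
        print("Running nightly ETL ...")

    scheduler = BlockingScheduler()

    # Cron-like trigger: fire every day at 02:00 local time.
    scheduler.add_job(nightly_etl, "cron", hour=2, minute=0)

    # The scheduler lives inside this one process: run two instances of the
    # application and the job fires twice; lose this process and nothing
    # fires at all -- exactly the risks listed above.
    scheduler.start()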

Hence, with distributed architectures comes the need for a distributed scheduler as a service (not a library).

Which Distributed Scheduler to go for?

I typically like to say "there is no one size fits all"; however, through our experience we found a clear winner, barring a few nuances. The winner, as per our research, is Apache DolphinScheduler. This robust, modern scheduler is built in Java and internally uses Quartz.

What are the scheduler options Dolphin was compared with?

In order of preference:

  1. [Apache Dolphin]: For production one still needs MySQL or PostgreSQL, but for local testing it can use an in-memory DB. Uses ZooKeeper to elect and coordinate.
  2. [Apache Airflow scheduler]
  3. [Apache Oozie]: Oozie is a workflow scheduler system to manage Apache Hadoop jobs that uses MySQL, PostgreSQL, or other databases. Uses ZooKeeper to elect and coordinate.
  4. [Ballista]: A distributed scheduler for Apache Arrow.

You can check independently verifiable notes at A Brief Comparison of Apache Dolphin Scheduler With Other Alternatives.

DISCLAIMER: Our findings are not a reflection of other blogs; they are our own independent conclusions from due diligence.

What makes Dolphin the better choice?

A lot of the other schedulers exist to support a bigger framework. For example, the Airflow scheduler is designed for Airflow; it can be used independently, but that is not its primary role. Oozie has a tough configuration and setup and is designed specifically for Hadoop ecosystems. That said, here are the points we found in Dolphin's favour:

  • Strong focus on UI and API proxies to manage the scheduler and monitor jobs, including task-level status in the DAG monitor.
  • The scheduler architecture minimizes dependencies on technologies an organization may not want to adopt, including the choice of database for persistent task metadata, and ensures that in-flight data does not always have to pass through the database. In some schedulers we evaluated, the DB was the bottleneck for everything.
  • Clean, Docker-based, modern distributed architecture. Dolphin clearly separates the API server, Master nodes, and Worker nodes.
  • Clear API for creating tasks, updating tasks, and sending event callbacks to the main application (supporting various protocols).
  • Strong support for programmatic interaction via the API or via language-native options like PyDolphinScheduler (a sketch follows this list).
  • … and more, which we are still discovering.
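As a taste of the programmatic route, the sketch below is loosely based on the PyDolphinScheduler tutorial. Class names and parameters vary between releases (older versions use ProcessDefinition where newer ones use Workflow), and the workflow name, task names, and commands here are assumptions for illustration, not a copy-paste recipe:

    from pydolphinscheduler.core.workflow import Workflow
    from pydolphinscheduler.tasks.shell import Shell

    # Define a workflow scheduled daily at midnight (Quartz cron syntax).
    with Workflow(
        name="nightly_etl",
        schedule="0 0 0 * * ? *",
        start_time="2024-01-01",
    ) as workflow:
        extract = Shell(name="extract", command="echo extracting data")
        load = Shell(name="load", command="echo loading data")

        # Declare the dependency: load runs only after extract succeeds.
        extract >> load

        # Push the definition (and its schedule) to the API server;
        # the exact method (submit()/run()) differs by version.
        workflow.submit()

Connection details for the target DolphinScheduler instance live in the client's own configuration rather than in the workflow definition, so the same definition can be pushed to different environments.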

Sample implementation and designs

[Diagram: Usage and internals of Dolphin Scheduler]

