Schedule A Dag In Airflow To Run For Every 5 Minutes, Starting From Today I.e., 2019-12-18
Solution 1:
The datetime object you are passing to Airflow isn't timezone aware. Airflow uses UTC internally, so a naive datetime may not line up with the scheduler's notion of time, and this could be why the DAG isn't being scheduled to run at midnight "today" (2019-12-18).
Instead of passing a naive datetime object like this:
'start_date': dt.datetime(2019, 12, 18, 10, 00, 00)
Try using pendulum to make your DAG timezone aware:
import pendulum
...
'start_date': pendulum.datetime(year=2019, month=12, day=18).in_timezone('YOUR TIMEZONE'), # See list of tz database time zones here -> https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
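To make the naive/aware distinction concrete, here is a minimal sketch using only the standard library. The fixed UTC+1 offset is an assumption standing in for a real tz database zone (pendulum would also handle daylight saving time), and the variable names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Naive: no tzinfo attached, so "10:00" does not say 10:00 *where*.
naive_start = datetime(2019, 12, 18, 10, 0, 0)

# Aware: the same wall-clock time pinned to an offset.
# UTC+1 is a placeholder assumption; substitute your own zone.
local_tz = timezone(timedelta(hours=1))
aware_start = datetime(2019, 12, 18, 10, 0, 0, tzinfo=local_tz)

# The same instant expressed in UTC, which is what Airflow uses internally.
aware_start_utc = aware_start.astimezone(timezone.utc)

print(naive_start.tzinfo)           # None
print(aware_start_utc.isoformat())  # 2019-12-18T09:00:00+00:00
```

The aware value is unambiguous: 10:00 at UTC+1 is 09:00 UTC, whereas the naive value leaves that interpretation to whoever reads it.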
The docs (https://airflow.apache.org/docs/stable/timezone.html) are quite useful for getting tips on how to handle datetimes in Airflow.
As for your other question on run frequency ... DAG runs are designed to do "Catchup" on all the intervals between your start and end date by default. To disable this behavior you will need to add catchup=False when instantiating your DAG.
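As a rough sketch of how this fits together for a 5-minute schedule, the arguments could look like the dict below. The `dag_id` value is hypothetical, the cron string is an assumption about the schedule you want, and the parameter names (`schedule_interval`, `catchup`) are the ones used in the documentation quoted in this answer; other Airflow releases may name them differently. The Airflow import is left in a comment so the snippet runs on its own.

```python
from datetime import datetime, timezone

# Keyword arguments one might pass to airflow.DAG(...).
dag_kwargs = {
    "dag_id": "every_five_minutes",      # hypothetical DAG name
    "start_date": datetime(2019, 12, 18, tzinfo=timezone.utc),  # timezone aware
    "schedule_interval": "*/5 * * * *",  # cron expression: every 5 minutes
    "catchup": False,                    # do not backfill missed intervals
}

# With Airflow installed, this would be used roughly as:
#   from airflow import DAG
#   dag = DAG(**dag_kwargs)
print(dag_kwargs["schedule_interval"])
```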
From the Airflow docs
Backfill and Catchup
An Airflow DAG with a start_date, possibly an end_date, and a schedule_interval defines a series of intervals which the scheduler turn into individual Dag Runs and execute. A key capability of Airflow is that these DAG Runs are atomic, idempotent items, and the scheduler, by default, will examine the lifetime of the DAG (from start to end/now, one interval at a time) and kick off a DAG Run for any interval that has not been run (or has been cleared). This concept is called Catchup.
If your DAG is written to handle its own catchup (IE not limited to the interval, but instead to “Now” for instance.), then you will want to turn catchup off (Either on the DAG itself with dag.catchup = False) or by default at the configuration file level with catchup_by_default = False. What this will do, is to instruct the scheduler to only create a DAG Run for the most current instance of the DAG interval series.
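The difference between the two settings can be illustrated with a toy model. This is not Airflow's scheduler code, only a simplified sketch of the behaviour described in the quote; the function name `intervals_to_run` and the example timestamps are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

def intervals_to_run(start, now, step, catchup):
    """Return the start times of the intervals a scheduler would trigger.

    Toy model: an interval becomes runnable once it has fully elapsed.
    """
    starts = []
    t = start
    while t + step <= now:  # keep only intervals that have already ended
        starts.append(t)
        t += step
    if catchup:
        return starts       # every elapsed interval gets a run
    return starts[-1:]      # only the most recent elapsed interval

start = datetime(2019, 12, 18, 0, 0, tzinfo=timezone.utc)
now = datetime(2019, 12, 18, 0, 22, tzinfo=timezone.utc)
step = timedelta(minutes=5)

all_runs = intervals_to_run(start, now, step, catchup=True)
latest_only = intervals_to_run(start, now, step, catchup=False)

print(len(all_runs))                     # 4
print(latest_only[0].strftime("%H:%M"))  # 00:15
```

With catchup enabled, the 22 minutes since the start produce four runs (00:00, 00:05, 00:10, 00:15); with it disabled, only the 00:15 interval is run.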
I'd suggest going over the two pages I linked to get a better intuition of the basic Airflow concepts.