
Synchronize DAGs with a remote Git repository

The default DAGs directory is located at /opt/bitnami/airflow/dags. This directory is on a shared filesystem that all instances of the deployment can access, and it is used to synchronize tasks across them.

To use DAG files from a Git repository and synchronize them automatically, follow these steps:

  • Empty the default DAGs directory so that the Git repository containing the Python DAG files can be cloned into it. Git only clones into an existing directory if that directory is empty, and hidden files count as content.

    $ cd /opt/bitnami/airflow/dags
    $ find . -mindepth 1 -delete
    
  • Install Git and clone the repository with the DAG files, replacing URL with the address of your repository:

    $ sudo apt-get update && sudo apt-get install git
    $ cd /opt/bitnami/airflow/dags
    $ git clone URL .
    
  • Confirm that the DAG files are now in the default DAGs directory at /opt/bitnami/airflow/dags. Within a few minutes, the Apache Airflow scheduler detects them automatically and lists them in the Apache Airflow dashboard, where they can be enabled.


  • After completing the steps above, you will probably also want to pull the latest DAG files and changes on an ongoing basis. A simple way to do this is to use the system cron daemon to pull changes every 5 minutes. Open the cron editor and add the line below.

    $ crontab -e
    
    */5 * * * * cd /opt/bitnami/airflow/dags && git pull
    

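The clone and pull steps above can also be combined into one small script. The following is a minimal sketch, not part of the Bitnami image: the function names dir_is_empty and sync_dags are hypothetical, and URL remains a placeholder for your repository address. It clones on the first run, fast-forwards on later runs, and refuses to touch a directory that has content but is not a Git checkout.

```shell
#!/bin/sh
# Hypothetical helper functions for illustration only.

# Succeeds only if the directory exists and has no entries at all,
# including hidden files such as .gitignore.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Clone on the first run, fast-forward on later runs.
# $1: DAGs directory, $2: repository URL (placeholder).
sync_dags() {
    dags_dir=$1
    repo_url=$2
    if [ -d "$dags_dir/.git" ]; then
        # Existing checkout: fail instead of merging if history has diverged.
        git -C "$dags_dir" pull --ff-only
    elif dir_is_empty "$dags_dir"; then
        git clone "$repo_url" "$dags_dir"
    else
        echo "error: $dags_dir is not empty and is not a Git checkout" >&2
        return 1
    fi
}
```

If these functions are saved in a script that ends with a call such as sync_dags /opt/bitnami/airflow/dags URL, the cron entry could run that script in place of the bare git pull. Using --ff-only is a design choice here: it makes the job fail visibly when someone has edited files on the server, rather than creating a merge commit unattended.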
Learn more about building your own DAGs in the Apache Airflow documentation.

As an automated alternative to the manual steps above, you can specify the Git repository when deploying Airflow.

IMPORTANT: Airflow will not create the shared filesystem if you specify a Git repository. Instead, it will clone the DAG files to each of the nodes, and sync them periodically with the remote repository.

Last modification May 28, 2019