Synchronize DAGs with a remote Git repository

The default DAGs directory is located at /opt/bitnami/airflow/dags. This directory is a shared filesystem accessible by all the instances of the deployment and is used to synchronize tasks.

To use DAG files from a Git repository and synchronize them automatically, follow these steps:

  • Empty the default DAGs directory so that the Git repository containing the Python DAG files can be cloned into it. Git only clones into an existing directory if that directory is completely empty, hidden files included.

    $ cd /opt/bitnami/airflow/dags
    $ find . -mindepth 1 -delete
    
  • Install Git and clone the repository with the DAG files:

    $ sudo apt-get update && sudo apt-get install git
    $ cd /opt/bitnami/airflow/dags
    $ git clone URL .
    
  • Confirm that the DAG files are now in the default DAGs directory, /opt/bitnami/airflow/dags. After a few minutes, the Apache Airflow scheduler automatically detects them and lists them in the Apache Airflow dashboard, where they can be enabled.

  • After completing the steps above, you will probably also want to pull the latest DAG files and changes on an ongoing basis. A simple way to do this is to use the system cron daemon to pull changes every 5 minutes. To do so, open the cron editor and add the line below.

    $ crontab -e
    
    */5 * * * * cd /opt/bitnami/airflow/dags && git pull
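
If a bare git pull in cron proves too fragile, the pull can be wrapped in a small script. The following is only a sketch, not part of the stock deployment: the function name sync_dags, the lock file path, and the use of flock to keep slow pulls from overlapping are all assumptions made for illustration. It checks that the target is a Git working copy, skips the run if an earlier pull is still in progress, and only accepts fast-forward updates.

```shell
#!/usr/bin/env bash
# Sketch of a cron-friendly DAG sync wrapper.
# The lock file path and the function name are assumptions for this example.

LOCK_FILE="${LOCK_FILE:-/tmp/airflow-dags-sync.lock}"

# Pull the latest DAG files into the directory given as the first argument.
# Returns non-zero if the directory has no .git directory or the pull fails.
sync_dags() {
    local dags_dir="$1"

    if [ ! -d "$dags_dir/.git" ]; then
        echo "sync_dags: $dags_dir is not a Git working copy" >&2
        return 1
    fi

    (
        # Skip this run if a previous pull still holds the lock.
        if ! flock -n 9; then
            echo "sync_dags: previous sync still running, skipping" >&2
            exit 0
        fi
        # --ff-only refuses to merge when the local branch has diverged
        # from the remote, so local edits are never silently merged.
        git -C "$dags_dir" pull --ff-only --quiet
    ) 9>"$LOCK_FILE"
}

# Run only when a directory is passed on the command line.
if [ "$#" -gt 0 ]; then
    sync_dags "$1"
fi
```

Assuming the script is saved as /path/to/sync_dags.sh (a placeholder location) and made executable, the cron entry would pass the DAGs directory as its argument:

    */5 * * * * /path/to/sync_dags.sh /opt/bitnami/airflow/dags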
    

Learn more about building your own DAGs in the Apache Airflow documentation.

Last modification February 27, 2019