
Understand the default Apache Airflow configuration

This Bitnami Multi-Tier Solution uses two virtual machines, one for the application frontend (webserver) and one for the scheduler, plus a configurable number of worker virtual machines. It also uses Azure Database for PostgreSQL to store application data and Azure Cache for Redis to queue tasks.

  • Webserver instance: This instance hosts the frontend of the Apache Airflow application.
  • Scheduler instance: The Apache Airflow scheduler triggers tasks and provides tools to monitor task progress.
  • Worker instances: Apache Airflow workers listen to, and process, queues containing workflow tasks.
  • Azure Database for PostgreSQL: The PostgreSQL database service stores application data. Read more about Azure Database for PostgreSQL.
  • Azure Cache for Redis: This service is used as a queuing system for Apache Airflow tasks. Read more about Azure Cache for Redis.
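A worker pool fed by a Redis queue, with a PostgreSQL metadata database, is typically wired together through two connection URLs: one for the database and one for the task broker. The sketch below only builds such URLs. The host names, user names and passwords are hypothetical placeholders, and this page does not say how the Bitnami template itself writes these values. It assumes the `postgresql+psycopg2` SQLAlchemy dialect and Azure Cache for Redis on its TLS port 6380. Percent-encoding matters here because Azure Database for PostgreSQL (single server) logins take the form `user@servername`, and an unescaped `@` would be read as the host separator.

```python
from urllib.parse import quote


def postgres_url(user, password, host, db, port=5432, sslmode="require"):
    """Build a SQLAlchemy URL for PostgreSQL with percent-encoded credentials."""
    # safe="" also encodes "@" and "/", which appear in Azure logins and passwords.
    return (
        f"postgresql+psycopg2://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{db}?sslmode={sslmode}"
    )


def redis_url(password, host, port=6380, db=0, tls=True):
    """Build a Redis broker URL; "rediss" selects TLS, "redis" plain TCP."""
    scheme = "rediss" if tls else "redis"
    return f"{scheme}://:{quote(password, safe='')}@{host}:{port}/{db}"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(postgres_url("airflow@myserver", "p@ss/word",
                       "myserver.postgres.database.azure.com", "airflow"))
    print(redis_url("secret", "mycache.redis.cache.windows.net"))
```

In Airflow 1.10-era configuration files, URLs like these would go in `sql_alchemy_conn` under `[core]` and `broker_url` under `[celery]`.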

Apache Airflow version

To see which Apache Airflow version your system is running, execute the following command:

$ airflow version

Apache Airflow DAGs directory

The default DAGs directory is /opt/bitnami/airflow/dags. It resides on a shared filesystem that all instances of the deployment can access, and it is used to keep tasks synchronized across them.
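Because the scheduler and workers read DAG files from this shared folder, copying a file in place can briefly expose a half-written DAG. The following is a minimal sketch, assuming you deploy DAGs by copying a local `.py` file into the folder: it stages the copy under a temporary name that does not end in `.py`, then renames it into place. The function name `deploy_dag` is hypothetical. The rename is atomic on local POSIX filesystems; guarantees on network shares vary, so treat this as a precaution rather than a promise.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Default DAGs folder from this page; override it for testing or other layouts.
DEFAULT_DAGS_DIR = "/opt/bitnami/airflow/dags"


def deploy_dag(src, dags_dir=DEFAULT_DAGS_DIR):
    """Copy a DAG file into the shared DAGs folder without exposing a partial file."""
    src = Path(src)
    dest = Path(dags_dir) / src.name
    # Stage the copy under a hidden ".tmp" name in the same directory so the
    # scheduler does not try to parse a half-written .py file.
    fd, tmp = tempfile.mkstemp(dir=dags_dir, prefix=f".{src.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
        # mkstemp creates the file as 0600; make it readable by other users
        # in case the Airflow services run under a different account.
        os.chmod(tmp, 0o644)
        # Swap the finished file into place with a single rename.
        os.replace(tmp, dest)
    except BaseException:
        # Remove the staging file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return dest
```

For example, `deploy_dag("my_pipeline.py")` (a hypothetical file) would place `my_pipeline.py` in /opt/bitnami/airflow/dags.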

Apache Airflow configuration file

The Apache Airflow configuration file is located at /opt/bitnami/airflow/airflow.cfg.
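airflow.cfg is an INI-style file, so Python's `configparser` can read it. Two details are worth knowing: the file contains literal `%(...)s` log-format strings, so interpolation must be disabled, and Airflow lets an environment variable named `AIRFLOW__{SECTION}__{KEY}` override a value from the file. The sketch below models only that simplified precedence (environment variable, then airflow.cfg); Airflow's real lookup has additional sources and built-in defaults. The function name `get_setting` is hypothetical.

```python
import configparser
import os

DEFAULT_CFG = "/opt/bitnami/airflow/airflow.cfg"


def get_setting(section, key, cfg_path=DEFAULT_CFG):
    """Return a setting, letting an AIRFLOW__SECTION__KEY env var override airflow.cfg."""
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in os.environ:
        return os.environ[env_name]
    # interpolation=None: airflow.cfg contains literal "%(...)s" format strings
    # that the default interpolation would try to expand.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)
    return parser.get(section, key)
```

For example, `get_setting("core", "dags_folder")` would be expected to return /opt/bitnami/airflow/dags on a default deployment, unless `AIRFLOW__CORE__DAGS_FOLDER` is set.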

The official Apache Airflow documentation has more details about how to add settings to this file.

Apache Airflow ports

The webserver listens on port 8080. It is also remotely accessible on port 80 through the virtual machine's public IP address.

Other ports are used for internal communication between the workers, but they are not remotely accessible.
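To confirm the webserver ports described above, a plain TCP connection test is enough. This is a minimal sketch; the function name `port_is_open` is hypothetical, and it only checks that something accepts connections on the port, not that the Airflow web interface is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
```

On the webserver virtual machine, `port_is_open("127.0.0.1", 8080)` checks the local listener; from another machine, passing the public IP address and port 80 checks remote access.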

Apache Airflow log files

The main Apache Airflow log files are located in the /opt/bitnami/airflow/logs/ directory on each virtual machine.
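When troubleshooting, it often helps to start with the most recently written logs. The sketch below assumes only that log files end in `.log` somewhere under this directory; the exact file names and subdirectory layout are not documented here, and the function name `latest_logs` is hypothetical.

```python
from pathlib import Path

DEFAULT_LOGS_DIR = "/opt/bitnami/airflow/logs"


def latest_logs(logs_dir=DEFAULT_LOGS_DIR, limit=5):
    """Return up to `limit` *.log files under logs_dir, most recently modified first."""
    # rglob searches subdirectories too, since task logs may be nested.
    files = [p for p in Path(logs_dir).rglob("*.log") if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]
```

Run it on each virtual machine separately, since every instance keeps its own copy of this directory.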

Last modification February 21, 2019