Deploy Apache Airflow on Azure Kubernetes Service with Azure Database for PostgreSQL and Azure Cache for Redis
Apache Airflow is a powerful open source tool to manage and execute workflows, expressed as directed acyclic graphs (DAGs) of tasks. It is both extensible and scalable, making it suitable for many different use cases and workloads.
Bitnami's Apache Airflow Helm chart makes it quick and easy to deploy Apache Airflow on Kubernetes. This chart gives you a preconfigured Apache Airflow deployment that is up-to-date and compliant with current security best practices. It is also highly customizable, allowing you to (for example) integrate your Apache Airflow deployment with external services or scale out the solution with more nodes after deployment.
This guide gets you started with Bitnami's Apache Airflow Helm chart on Microsoft Azure, showing you how to deploy Apache Airflow on Azure Kubernetes Service (AKS) and connect it with Azure Database for PostgreSQL and Azure Cache for Redis to create a scalable, cloud-based Apache Airflow deployment.
Assumptions and prerequisites
This guide assumes that:
- You have provisioned an Azure Kubernetes Service cluster.
- You have the kubectl CLI and the Helm v3.x package manager installed and configured to work with your Kubernetes cluster. Learn how to install kubectl and Helm v3.x.
- You have a domain name and the ability to configure a DNS record for that domain name.
- You have access to the psql PostgreSQL client, either installed locally or via a Docker container like the Bitnami PostgreSQL Docker image. Learn more about psql.
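If you take the Docker route, the psql invocations shown later in this guide can be run through the Bitnami PostgreSQL image instead of a local install. A minimal sketch that assembles the command (the placeholders match those used in Step 1; Docker is required to actually execute it):

```shell
# Placeholders: substitute your real server host name and admin username.
POSTGRES_HOST="POSTGRES-HOST"
POSTGRES_ADMIN_USER="POSTGRES-ADMIN-USER"

# Run the psql client from the Bitnami image instead of a local install.
CMD="docker run --rm -it bitnami/postgresql psql \"sslmode=disable host=${POSTGRES_HOST} user=${POSTGRES_ADMIN_USER} dbname=postgres\""
echo "$CMD"   # inspect the command, then execute it
```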
Step 1: Create and configure an Azure Database for PostgreSQL service
The first step is to create an Azure Database for PostgreSQL service, as follows:
- Log in to the Microsoft Azure portal.
- Navigate to the Azure Database for PostgreSQL service page using the left navigation bar or the search field. Click the "Add" button to create a new service.
- Select the "Single server" option. Click "Create".
- On the service deployment page, enter a server name, administrator account username and administrator password. Modify the deployment location if required. Select the same deployment resource group as your AKS service. Click "Review + create".
- Review the details shown. Click "Create" to proceed.
A new Azure Database for PostgreSQL service will be created. This process may take a few minutes to complete. Once created, the service will appear within the selected resource group. Select it to go to its detail page. Note the server host name and administrator username, as you will need these to interact further with the service.
It is now necessary to make some changes to the service's default configuration, to enable easier integration with both the Bitnami Apache Airflow Helm chart and external PostgreSQL client tools. Follow the steps below:
- From the service detail page, navigate to the "Settings -> Connection security" page.
- In the "Firewall rules" section:
- Set the "Allow access to Azure services" field to "Yes".
- Create a new firewall rule for the IP address of your psql client host/Docker host. This is a temporary rule only to enable you to connect to the database service and create a database and user account for Apache Airflow.
- Set the "Enforce SSL connection" field to "Disabled".
- Click "Save" to save the new configuration.
SSL access is disabled because, at the time of writing, the Bitnami Apache Airflow Helm chart does not support SSL connections to external PostgreSQL and Redis services.
You can now connect to the Azure Database for PostgreSQL service using the psql client and create a database and user for Apache Airflow.
Use the command below to initiate the connection, replacing the POSTGRES-HOST and POSTGRES-ADMIN-USER placeholders with the server name and administrator username obtained previously. When prompted for a password, enter the administrator password supplied at deployment time.
psql "sslmode=disable host=POSTGRES-HOST user=POSTGRES-ADMIN-USER dbname=postgres"
Use the commands below at the psql client command prompt to create a new database and user account for Apache Airflow. Replace the POSTGRES-AIRFLOW-PASSWORD placeholder with a unique password for the new user account. Note this password as you will need it in Step 3.
CREATE DATABASE airflow;
CREATE USER airflow_user WITH PASSWORD 'POSTGRES-AIRFLOW-PASSWORD';
GRANT ALL PRIVILEGES ON DATABASE airflow TO airflow_user;
\q
Delete the temporary firewall rule.
Step 2: Create and configure an Azure Cache for Redis service
Next, create an Azure Cache for Redis service, as follows:
- Log in to the Microsoft Azure portal (if you're not already logged in).
- Navigate to the Azure Cache for Redis service page using the left navigation bar or the search field. Click the "Add" button to create a new service.
- On the service deployment page, enter a DNS name for the service and check the box to unblock port 6379. Modify the deployment location and pricing tier if required. Select the same deployment resource group as your AKS service and your Azure Database for PostgreSQL service. Click "Create".
The Azure Cache for Redis service will be created. This process may take a few minutes to complete. Once created, the service will appear within the selected resource group. Select it to go to its detail page. Navigate to the "Settings -> Access keys" section and note the primary access key, as you will need it in Step 3.
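Optionally, you can sanity-check the new cache from any machine with the redis-cli client before wiring it into Apache Airflow. A sketch that assembles the command (hypothetical DNS suffix shown; port 6379 must be unblocked as described above, and a PONG reply confirms connectivity):

```shell
# Placeholders: your cache DNS name and the primary access key from this step.
REDIS_HOST="REDIS-HOST.redis.cache.windows.net"
REDIS_KEY="REDIS-KEY"

# Ping the cache over the unblocked non-SSL port.
CMD="redis-cli -h ${REDIS_HOST} -p 6379 -a ${REDIS_KEY} ping"
echo "$CMD"   # inspect the command, then execute it
```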
Step 3: Deploy Apache Airflow on Azure Kubernetes Service with Helm
You can now go ahead and deploy Apache Airflow on the AKS cluster. Follow these steps:
Add the Bitnami charts repository to Helm:
helm repo add bitnami https://charts.bitnami.com/bitnami
Deploy the Apache Airflow Helm chart using the command below. Replace the placeholders as explained below:
- Replace the DOMAIN placeholder with your domain name.
- Replace the POSTGRES-HOST placeholder with the host name of the Azure Database for PostgreSQL service (obtained in Step 1).
- Replace the POSTGRES-AIRFLOW-PASSWORD placeholder with the password assigned to the airflow_user user account (defined by you when creating the airflow database in Step 1).
- Replace the REDIS-HOST placeholder with the DNS name of the Azure Cache for Redis service (defined by you in Step 2 at deployment time).
- Replace the REDIS-KEY placeholder with the primary key for the Redis service (obtained in Step 2).
- Replace the AIRFLOW-PASSWORD placeholder with a unique password to access the Apache Airflow Web user interface.
helm install airflow-aks bitnami/airflow \
  --set service.type=LoadBalancer \
  --set airflow.loadExamples=true \
  --set airflow.baseUrl=http://DOMAIN \
  --set postgresql.enabled=false \
  --set redis.enabled=false \
  --set externalDatabase.host=POSTGRES-HOST \
  --set externalDatabase.user=airflow_user@POSTGRES-HOST \
  --set externalDatabase.database=airflow \
  --set externalDatabase.password=POSTGRES-AIRFLOW-PASSWORD \
  --set externalDatabase.port=5432 \
  --set externalRedis.host=REDIS-HOST \
  --set externalRedis.port=6379 \
  --set externalRedis.password=REDIS-KEY \
  --set airflow.auth.password=AIRFLOW-PASSWORD
Here is a brief explanation of the parameters supplied to the chart:
- The postgresql.enabled and redis.enabled parameters, when set to false, ensure that the chart does not create its own PostgreSQL and Redis service and instead uses external services.
- The service.type parameter makes the Apache Airflow service available at a public load balancer IP address.
- The airflow.auth.password parameter defines the password for the Apache Airflow Web control panel.
- The airflow.loadExamples parameter installs some example DAGs. If you already have custom DAGs, you can set this parameter to false and install your custom DAGs from the file system, a GitHub repository or a ConfigMap, as described in the chart documentation.
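One note on the externalDatabase.user value: depending on the Azure Database for PostgreSQL deployment type, the server may expect the login in user@servername form, where servername is the short server name (the first label of the full host name) rather than the complete FQDN. A quick sketch of deriving it, with a hypothetical host name:

```shell
# Assumption: Single Server logins take the form user@servername.
# Hypothetical host name; replace with your own.
POSTGRES_HOST="example-pg.postgres.database.azure.com"

# Strip everything after the first dot to get the short server name.
DB_USER="airflow_user@${POSTGRES_HOST%%.*}"
echo "$DB_USER"   # -> airflow_user@example-pg
```

If the connection is rejected, try the alternate form, as the expected format has varied between Azure PostgreSQL offerings.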
Review the complete list of parameters in the chart documentation.
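If you prefer not to manage a long list of --set flags, the same settings can be kept in a values file and passed with helm install airflow-aks bitnami/airflow -f values.yaml. Here is a sketch mirroring the command above (same placeholders; the key names are transcribed from the flags, so verify them against the chart documentation for your chart version):

```yaml
# values.yaml - equivalent to the --set flags above (placeholders as in Step 3)
service:
  type: LoadBalancer
airflow:
  loadExamples: true
  baseUrl: http://DOMAIN
  auth:
    password: AIRFLOW-PASSWORD
postgresql:
  enabled: false
redis:
  enabled: false
externalDatabase:
  host: POSTGRES-HOST
  user: airflow_user@POSTGRES-HOST
  database: airflow
  password: POSTGRES-AIRFLOW-PASSWORD
  port: 5432
externalRedis:
  host: REDIS-HOST
  port: 6379
  password: REDIS-KEY
```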
Wait for the deployment to complete and obtain the public load balancer IP address using the command below:
kubectl get svc | grep airflow-aks
Update the DNS record for your domain name to point to the above load balancer IP address.
You should now be able to access the Apache Airflow Web control panel by browsing to http://DOMAIN:8080 and logging in with the username user and the password set in the AIRFLOW-PASSWORD placeholder.
You can now proceed to use Apache Airflow to manage and execute your workflows.