How to easily install Apache Airflow on Windows?

VivekR
Sep 13, 2023
Airflow on Windows

Apache Airflow is a powerful open-source platform for orchestrating, scheduling, and monitoring complex workflows. It is widely used to manage data pipelines, ETL processes, and task automation. Airflow is designed for Unix-like systems such as Linux and macOS and does not run natively on Windows, but you can run it on a Windows machine through the Windows Subsystem for Linux (WSL). In this article, we will walk you through installing Apache Airflow on Windows with WSL.

Prerequisites

Before installing Apache Airflow on Windows, ensure that you have the following prerequisites in place:

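  1. Windows Subsystem for Linux (WSL): Airflow does not run natively on Windows and needs a Unix-like environment. Enable WSL on your Windows system by following Microsoft’s official documentation.
  2. Python: Install Python 3.8 or later inside your WSL Linux distribution (recent Airflow releases no longer support Python 3.7). A Python installed on the Windows side with the installer from Python’s official website is not used by the Linux environment.
  3. Pip: Make sure pip, the Python package manager, is installed for that same Python (on Ubuntu: sudo apt install python3-pip).
  4. Virtual Environment (Optional): It’s a good practice to create a virtual environment to isolate your Airflow installation (on Ubuntu, the venv module requires the python3-venv package).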
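Once WSL is set up, the Python-related prerequisites can be checked from the Linux terminal. This is a minimal sketch, assuming bash and a python3 executable on PATH; the 3.8 threshold is an assumption based on recent Airflow releases, so adjust it to the release you plan to install.

```shell
# Minimal prerequisite check, run inside the WSL terminal (assumes bash).
# Assumption: Python 3.8+ as the minimum; adjust for the Airflow release you install.

# Python version check: exit status 0 means the interpreter is new enough
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)' 2>/dev/null; then
  PY_STATUS="ok"
else
  PY_STATUS="missing or older than 3.8"
fi

# pip check: "python3 -m pip" uses the pip that belongs to this interpreter
if python3 -m pip --version >/dev/null 2>&1; then
  PIP_STATUS="ok"
else
  PIP_STATUS="missing (on Ubuntu: sudo apt install python3-pip)"
fi

echo "python3: $PY_STATUS"
echo "pip:     $PIP_STATUS"
```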

Installing Apache Airflow on Windows with WSL

Step 1: Install and Launch Ubuntu (or Another Linux Distribution)

  • If you haven’t already, make sure you’ve enabled Windows Subsystem for Linux (WSL) and installed a Linux distribution such as Ubuntu from the Microsoft Store. Launch the Linux terminal to proceed.
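WSL itself is installed from Windows (for example, with wsl --install in an administrator PowerShell window); every command after that runs in the Linux terminal. A quick sketch to confirm the current shell really is running under WSL, assuming the usual behavior that the WSL kernel reports "microsoft" in /proc/version:

```shell
# The WSL kernel identifies itself as "microsoft" in /proc/version
if grep -qi microsoft /proc/version 2>/dev/null; then
  IN_WSL="yes"
else
  IN_WSL="no"
fi
echo "Running under WSL: $IN_WSL"

# Show which Linux distribution is active (e.g. Ubuntu), if the standard os-release file exists
if [ -r /etc/os-release ]; then
  . /etc/os-release
  echo "Distribution: ${PRETTY_NAME:-unknown}"
fi
```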

Step 2: Create a Virtual Environment and Activate It

  • To create a virtual environment, you can use Python’s built-in venv module. Navigate to your desired project directory and execute these commands:
# Create the virtual environment
python3 -m venv airflow-env

# Activate the virtual environment
source airflow-env/bin/activate
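To confirm that activation worked, you can compare the interpreter's prefix with its base prefix: inside a virtual environment they differ. A small sketch, assuming bash and that python3 resolves to the environment's interpreter after activation:

```shell
# Inside a venv, sys.prefix points at the environment while sys.base_prefix points at the system Python
if python3 -c 'import sys; sys.exit(0 if sys.prefix != sys.base_prefix else 1)' 2>/dev/null; then
  VENV_STATE="active"
else
  VENV_STATE="inactive"
fi
echo "Virtual environment: $VENV_STATE (VIRTUAL_ENV=${VIRTUAL_ENV:-unset})"
echo "python3 resolves to: $(command -v python3)"
```

Activation only lasts for the current terminal, so run source airflow-env/bin/activate again in every new terminal before using the airflow command.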

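Step 3: Set the AIRFLOW_HOME Environment Variable

  • Set the AIRFLOW_HOME environment variable to specify where Airflow stores its configuration file (airflow.cfg), logs, and, with the default SQLite backend, its metadata database. If the variable is not set, Airflow uses ~/airflow. To make the setting permanent, add an export line to your shell profile file (e.g., .bashrc or .zshrc):
nano ~/.bashrc

# Add this line at the end of the file
# (WSL mounts the Windows C: drive at /mnt/c; replace vjadhav with your own user name)
export AIRFLOW_HOME=/mnt/c/Users/vjadhav/airflow

# Press Ctrl+O and Enter to save, then Ctrl+X to exit the editor

# Reload the profile so the variable is set in the current terminal
source ~/.bashrc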
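If you prefer not to edit the profile by hand, the same kind of line can be appended from the command line. This is a sketch, assuming bash with ~/.bashrc as the profile; it uses Airflow's default location $HOME/airflow purely for illustration, so substitute your own path. The grep check keeps the line from being added twice if the script is rerun.

```shell
# Assumption: bash with ~/.bashrc as the profile; set PROFILE=~/.zshrc for zsh
PROFILE="${PROFILE:-$HOME/.bashrc}"
# Single quotes keep $HOME literal so it is expanded each time the profile is loaded
EXPORT_LINE='export AIRFLOW_HOME="$HOME/airflow"'

# Append the line only if an identical line is not already in the profile;
# the leading newline guards against a profile that does not end with one
if ! grep -qxF "$EXPORT_LINE" "$PROFILE" 2>/dev/null; then
  printf '\n%s\n' "$EXPORT_LINE" >> "$PROFILE"
fi

# Apply the same setting to the current shell
export AIRFLOW_HOME="$HOME/airflow"
echo "AIRFLOW_HOME=$AIRFLOW_HOME"
```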

Step 4: Install Apache Airflow

  • With the virtual environment active, use pip to install Apache Airflow:
pip install apache-airflow
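A plain pip install can pull in dependency versions that Airflow was not tested with, so the Airflow installation documentation recommends installing against a published constraints file that matches both the Airflow and the Python version. The sketch below follows that pattern; 2.7.1 is only an example value, so set AIRFLOW_VERSION to the release you want.

```shell
# Assumption: example release; replace with the Airflow version you want to install
AIRFLOW_VERSION="2.7.1"
# Major.minor version of the interpreter in the active virtual environment, e.g. "3.10"
PYTHON_VERSION="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
# Constraints files are published per Airflow release and per Python version
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
echo "Constraints file: ${CONSTRAINT_URL}"

pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
```

If pip reports that the constraints URL cannot be found, that Airflow release most likely does not support your Python version.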

Step 5: Initialize the Database

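  • Initialize the Airflow metadata database, which stores information about your DAGs, task runs, users, and connections. On Airflow 2.7 and later, airflow db init is deprecated and airflow db migrate is the recommended replacement: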
airflow db init
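To confirm that initialization worked, you can look for the metadata database file and, if the CLI is available, ask Airflow to connect to it with airflow db check. This sketch assumes the default SQLite backend, whose file is airflow.db inside AIRFLOW_HOME; with PostgreSQL or MySQL only the CLI check applies.

```shell
# Default SQLite metadata database location (assumes sql_alchemy_conn was not changed)
DB_FILE="${AIRFLOW_HOME:-$HOME/airflow}/airflow.db"
if [ -f "$DB_FILE" ]; then
  DB_STATE="present"
else
  DB_STATE="missing"
fi
echo "SQLite metadata database: $DB_FILE ($DB_STATE)"

# If the airflow CLI is on PATH (virtual environment active), test the database connection
if command -v airflow >/dev/null 2>&1; then
  airflow db check
fi
```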

Step 6: Create an Admin User

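  • To access the Airflow web interface, create an admin user with the following command. Replace the placeholders, and for anything beyond a local test choose a stronger password than admin: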
airflow users create \
--username admin \
--password admin \
--firstname <YourFirstName> \
--lastname <YourLastName> \
--role Admin \
--email admin@example.com

Step 7: Run the Web Server and Scheduler

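  • Start the Airflow web server and the scheduler in two separate WSL terminals. In each new terminal, activate the virtual environment first and make sure AIRFLOW_HOME is set:
# Terminal 1: web server on port 8080
airflow webserver --port 8080

# Terminal 2: scheduler
airflow scheduler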

Once both are running, open http://localhost:8080 in your web browser and log in with the user you created in Step 6.
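To check from the terminal that both components are up, you can query the web server's /health endpoint, which reports the status of the metadata database and the scheduler as JSON. A sketch, assuming curl is installed and the web server listens on port 8080 as above:

```shell
# Base URL of the web server started above (assumption: default host and port 8080)
AIRFLOW_URL="${AIRFLOW_URL:-http://localhost:8080}"

# -f: fail on HTTP errors, -sS: quiet but show errors, --max-time: don't hang if nothing is listening
if command -v curl >/dev/null 2>&1 && curl -fsS --max-time 5 "$AIRFLOW_URL/health"; then
  WEB_STATE="up"
else
  WEB_STATE="down"
fi
echo
echo "Airflow web server at $AIRFLOW_URL: $WEB_STATE"
```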

Additional Tips

1. Disable Example DAGs:

In the [core] section of the Airflow configuration file ($AIRFLOW_HOME/airflow.cfg), set load_examples = False so that the example DAGs bundled with Airflow are not loaded. This keeps your Airflow environment clean and focused on your own workflows.

load_examples = False
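Airflow generates airflow.cfg the first time it runs. A sketch of changing the setting without an editor, assuming GNU sed (the default on Ubuntu); it also shows the environment-variable form, which follows Airflow's AIRFLOW__{SECTION}__{KEY} naming convention and overrides the file. Restart the web server and scheduler afterwards so the change is picked up.

```shell
CFG="${AIRFLOW_HOME:-$HOME/airflow}/airflow.cfg"

if [ -f "$CFG" ]; then
  # Rewrite the existing load_examples line in place, keeping a backup as airflow.cfg.bak
  sed -i.bak 's/^load_examples = .*/load_examples = False/' "$CFG"
  grep '^load_examples' "$CFG"
else
  echo "No airflow.cfg at $CFG yet; run an airflow command first to generate it" >&2
fi

# Alternative: an environment variable overrides airflow.cfg for processes started from this shell
export AIRFLOW__CORE__LOAD_EXAMPLES=False
```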

2. Configure the DAGs Directory:

In the [core] section of airflow.cfg, dags_folder sets the directory Airflow scans for DAG files; by default it is the dags subdirectory of AIRFLOW_HOME. Set it explicitly if you want to keep your DAGs in a specific directory, for example:

dags_folder = ~/airflow/dags
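Airflow scans the Python files in that folder for DAG definitions. As a sketch, assuming the default folder under AIRFLOW_HOME and Airflow 2.4 or later (for the schedule argument), the following creates the folder and a hypothetical one-task DAG named hello_wsl, then checks that the file at least parses as Python:

```shell
# Assumption: DAGs live in the default dags folder under AIRFLOW_HOME
DAGS_DIR="${AIRFLOW_HOME:-$HOME/airflow}/dags"
mkdir -p "$DAGS_DIR"

# Write a hypothetical one-task DAG; the quoted 'EOF' stops the shell from expanding anything inside
cat > "$DAGS_DIR/hello_wsl.py" <<'EOF'
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="hello_wsl",
    start_date=datetime(2023, 9, 1),
    schedule=None,  # run only when triggered manually
    catchup=False,
) as dag:
    BashOperator(task_id="say_hello", bash_command="echo 'Hello from WSL'")
EOF

# Syntax check only: ast.parse does not import airflow, so this works without Airflow installed
python3 -c 'import ast, sys; ast.parse(open(sys.argv[1]).read())' "$DAGS_DIR/hello_wsl.py" \
  && echo "hello_wsl.py parses"
```

After the scheduler's next scan of the folder, hello_wsl should appear in the web UI, where it can be triggered manually.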

3. Use a Reverse Proxy:

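If you plan to deploy Airflow in a production environment, consider putting a reverse proxy such as Nginx or Apache HTTP Server in front of the Airflow web server. The proxy can terminate HTTPS, restrict access, and balance load, which adds a layer of security and helps with scalability. When Airflow is served through a proxy, the base_url and enable_proxy_fix settings in the [webserver] section of airflow.cfg usually need to be adjusted.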

4. Monitoring and Logging:

Configure logging and monitoring to keep an eye on your Airflow instance. Task logs are stored under $AIRFLOW_HOME/logs by default, and tools like Prometheus, Grafana, or third-party services can provide insights into your workflow performance.
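One common setup is to have Airflow emit metrics over StatsD, let Prometheus collect them through a StatsD exporter, and build dashboards in Grafana. A sketch of the Airflow side only, assuming the statsd extra is installed (pip install 'apache-airflow[statsd]') and a StatsD-compatible listener is running; the host, port, and prefix are example values. The variables correspond to statsd_on, statsd_host, statsd_port, and statsd_prefix in the [metrics] section of airflow.cfg.

```shell
# Enable StatsD metrics via environment variables ([metrics] section -> AIRFLOW__METRICS__<KEY>)
export AIRFLOW__METRICS__STATSD_ON=True
# Assumption: adjust host and port to wherever your StatsD listener actually runs
export AIRFLOW__METRICS__STATSD_HOST=localhost
export AIRFLOW__METRICS__STATSD_PORT=8125
export AIRFLOW__METRICS__STATSD_PREFIX=airflow

# Restart the scheduler and web server from this shell so they read the new settings
env | grep '^AIRFLOW__METRICS__' | sort
```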

5. Backup and Recovery Plan:

Back up your Airflow metadata database regularly to prevent data loss in case of system failures, and have a recovery plan so you can restore your Airflow instance quickly.
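For the default SQLite backend, a backup can be as simple as a timestamped copy of airflow.db taken while the web server and scheduler are stopped, since copying the file during writes can capture an inconsistent state. A sketch under those assumptions; the backups directory is a hypothetical location, and a PostgreSQL or MySQL backend would use that database's own dump tools instead.

```shell
# Assumptions: default SQLite backend; web server and scheduler are stopped
AF_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
BACKUP_DIR="$AF_HOME/backups"   # hypothetical location; ideally copy backups to another disk too
mkdir -p "$BACKUP_DIR"

STAMP="$(date +%Y%m%d-%H%M%S)"
if [ -f "$AF_HOME/airflow.db" ]; then
  # Copy the database and the config file side by side under the same timestamp
  cp "$AF_HOME/airflow.db" "$BACKUP_DIR/airflow-$STAMP.db"
  [ -f "$AF_HOME/airflow.cfg" ] && cp "$AF_HOME/airflow.cfg" "$BACKUP_DIR/airflow-$STAMP.cfg"
  echo "Backup written to $BACKUP_DIR (timestamp $STAMP)"
else
  echo "No SQLite database found at $AF_HOME/airflow.db" >&2
fi
```

To restore, stop Airflow and copy the chosen airflow-<timestamp>.db back to $AIRFLOW_HOME/airflow.db.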

VivekR

Data Engineer, Big Data Enthusiast and Automation using Python