Apache Airflow is a powerful open-source platform used for orchestrating, scheduling, and monitoring complex workflows. It is an essential tool for managing data pipelines, ETL processes, and task automation. While it is commonly used on Unix-based systems, such as Linux and macOS, you can also set up Airflow on a Windows machine to harness its capabilities. In this article, we will walk you through the process of installing Apache Airflow on a Windows system.
Before installing Apache Airflow on Windows, ensure that you have the following prerequisites in place:
- Python: Install Python 3.7 or later. You can download it from Python’s official website.
- Pip: Make sure you have pip, the Python package manager, installed.
- Windows Subsystem for Linux (WSL): Apache Airflow works best in a Unix-like environment. Enable WSL on your Windows system by following Microsoft’s official documentation.
- Virtual Environment (Optional): It’s a good practice to create a virtual environment to isolate your Airflow installation.
Installing Apache Airflow on Windows with WSL
Step 1: Activate Ubuntu or Other Linux System
- If you haven’t already, make sure you’ve enabled Windows Subsystem for Linux (WSL) and installed a Linux distribution such as Ubuntu from the Microsoft Store. Launch the Linux terminal to proceed.
Step 2: Create a Virtual Environment and Activate It
- To create a virtual environment, you can use Python’s built-in venv module. Navigate to your desired project directory and execute these commands:
#Create virtual environment
python3 -m venv airflow-env
#Activate the virtual environment
source airflow-env/bin/activate
Step 3: Set the $AIRFLOW_HOME Parameter
- Set the $AIRFLOW_HOME environment variable to specify where Airflow will store its configuration and metadata files. Open your shell profile file in a text editor and add this line:
#Type the following
export AIRFLOW_HOME=~/airflow
#Press Ctrl+S to save and Ctrl+X to exit the editor
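As a minimal sketch of this step, the snippet below appends the export line to a profile file only if it is not already there, then applies it to the current shell. It assumes Bash is your shell, so the profile file is ~/.bashrc (set PROFILE_FILE to override), and it assumes ~/airflow as the Airflow home, which is also Airflow's default location. PROFILE_FILE and EXPORT_LINE are illustrative names, not Airflow settings.

```shell
# Assumption: Bash is the shell, so ~/.bashrc is the profile file.
PROFILE_FILE="${PROFILE_FILE:-$HOME/.bashrc}"
# Assumption: ~/airflow is used as the Airflow home directory.
EXPORT_LINE='export AIRFLOW_HOME="$HOME/airflow"'

# Append the export line only if it is not already present.
touch "$PROFILE_FILE"
if ! grep -qxF -- "$EXPORT_LINE" "$PROFILE_FILE"; then
  printf '%s\n' "$EXPORT_LINE" >> "$PROFILE_FILE"
fi

# Apply the setting to the current shell session as well.
export AIRFLOW_HOME="$HOME/airflow"
echo "AIRFLOW_HOME is set to $AIRFLOW_HOME"
```

Running the snippet a second time leaves the profile file unchanged, so it is safe to repeat.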
Step 4: Install Apache Airflow
- Use pip to install Apache Airflow:
pip install apache-airflow
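A plain pip install can pull in dependency versions that a given Airflow release was not tested with, which is why the Airflow project publishes constraint files for each release and Python version. The sketch below builds the constraint URL and prints the resulting install command for review. The version 2.10.5 is only an example chosen for illustration, so substitute the Airflow 2.x release you actually want.

```shell
# Assumption: example version only; pick the Airflow 2.x release you need.
AIRFLOW_VERSION="2.10.5"
# Detect the active interpreter's major.minor version, e.g. 3.10.
PYTHON_VERSION="$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")')"
# Constraint files are published per Airflow version and Python version.
CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"

# Print the command for review; run it once the versions look right.
echo "pip install \"apache-airflow==${AIRFLOW_VERSION}\" --constraint \"${CONSTRAINT_URL}\""
```

If no constraint file exists for your combination of Airflow and Python versions, the printed URL will not resolve, which is a sign to choose a different pairing.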
Step 5: Initialize the Database
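- Initialize the Airflow metadata database, which stores information about your workflows such as DAG runs and task states. On Airflow 2.7 and later, airflow db migrate is the preferred equivalent and airflow db init is deprecated: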
airflow db init
Step 6: Create an Admin User
- To access the Airflow web interface, create an admin user with the following command:
airflow users create \
--username admin \
--password admin \
--firstname <YourFirstName> \
--lastname <YourLastName> \
--role Admin \
--email <YourEmail>
Step 7: Run the Web Server and Scheduler
- Start the Airflow web server and the scheduler in two separate terminals, activating the virtual environment in each:
airflow webserver --port 8080
airflow scheduler
The Airflow web interface will be accessible at http://localhost:8080 in your web browser. Log in with the user you created in Step 6.
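To confirm from the terminal that both components are up, you can query the web server's /health endpoint, which reports the status of the metadata database and the scheduler as JSON. This is a small sketch, assuming curl is installed and the web server listens on port 8080 as above. check_airflow_health is a helper name made up for this example.

```shell
# Hypothetical helper: fetch the health report from a running web server.
check_airflow_health() {
  base_url="${1:-http://localhost:8080}"
  # --fail makes curl return non-zero on HTTP errors; --max-time avoids hanging.
  curl --silent --fail --max-time 5 "${base_url}/health"
}

if check_airflow_health; then
  echo
  echo "Airflow web server responded."
else
  echo "Airflow web server is not reachable yet."
fi
```

The web server can take a little while to start, so a failed first attempt does not necessarily mean something is wrong.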
1. Disable Example DAGs:
In the Airflow configuration file (airflow.cfg), set load_examples = False to prevent the automatic loading of example DAGs during initialization. This can help keep your Airflow environment clean and focused on your specific use cases.
load_examples = False
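Airflow also reads settings from environment variables named AIRFLOW__{SECTION}__{KEY}, which take precedence over airflow.cfg. As a sketch, the same setting can be applied without editing the file. The check at the end only runs if the airflow command is available.

```shell
# Equivalent of load_examples = False in the [core] section of airflow.cfg.
export AIRFLOW__CORE__LOAD_EXAMPLES=False

# Optional check: print the value Airflow actually resolves.
if command -v airflow >/dev/null 2>&1; then
  airflow config get-value core load_examples
else
  echo "airflow not found on PATH; activate the virtual environment first."
fi
```

The export only lasts for the current shell session, so add it to your shell profile file if you want it to persist.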
2. Configure the DAGs Directory:
In the Airflow configuration file (airflow.cfg), specify the location where your DAG files will reside. This allows you to organize your DAGs in a specific directory.
dags_folder = ~/airflow/dags
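The directory is not guaranteed to exist after installation, so it may need to be created before you add DAG files. A short sketch, assuming AIRFLOW_HOME falls back to ~/airflow when it is unset (DAGS_DIR is just a local variable for this example):

```shell
# Fall back to Airflow's default home when the variable is not set.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
DAGS_DIR="$AIRFLOW_HOME/dags"

# Create the DAGs directory if it does not exist yet.
mkdir -p "$DAGS_DIR"

# Optional check: list the DAGs Airflow can see.
if command -v airflow >/dev/null 2>&1; then
  airflow dags list
else
  echo "Created $DAGS_DIR (airflow not found on PATH, skipping the listing)."
fi
```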
3. Use a Reverse Proxy:
If you plan to deploy Airflow in a production environment, consider setting up a reverse proxy like Nginx or Apache in front of the Airflow web server. The proxy can terminate TLS and restrict direct access to the web server, which adds an extra layer of security, and it can also help with performance and scalability.
4. Monitoring and Logging:
Configure logging and monitoring solutions to keep an eye on your Airflow instance. Tools like Prometheus, Grafana, or third-party services can provide insights into your workflow performance.
5. Backup and Recovery Plan:
Regularly back up your Airflow metadata database to prevent data loss in case of system failures. Implement a recovery plan so that you can restore your Airflow instance quickly.
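With the default setup from the steps above, the metadata database is a SQLite file named airflow.db inside $AIRFLOW_HOME. Under that assumption, a minimal backup sketch is a timestamped copy of the file, taken while the scheduler and web server are stopped. BACKUP_DIR and backup_airflow_db are names invented for this example.

```shell
# Fall back to Airflow's default home when the variable is not set.
AIRFLOW_HOME="${AIRFLOW_HOME:-$HOME/airflow}"
# Assumption: default SQLite metadata database created by the steps above.
DB_FILE="$AIRFLOW_HOME/airflow.db"
# Hypothetical backup location; any writable directory works.
BACKUP_DIR="${BACKUP_DIR:-$AIRFLOW_HOME/backups}"

# Hypothetical helper: copy the database file to a timestamped backup.
backup_airflow_db() {
  if [ ! -f "$DB_FILE" ]; then
    echo "No SQLite database found at $DB_FILE" >&2
    return 1
  fi
  mkdir -p "$BACKUP_DIR"
  cp "$DB_FILE" "$BACKUP_DIR/airflow-$(date +%Y%m%d-%H%M%S).db"
}

backup_airflow_db || echo "Backup skipped."
```

For a PostgreSQL or MySQL backend, the database's own dump tool (for example pg_dump or mysqldump) would take the place of the file copy.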