Airflow Prometheus (0.4.2)

This is an Airflow extension that adds support for generating Prometheus metrics. The package is an extension of the awesome airflow-prometheus-exporter project by Robinhood.

We extended the project, improved the code and added new features to enable better monitoring of your Airflow workloads. :rocket:

Installation

To install this package, run:

  $ python3 -m pip install "airflow-prometheus==0.4.2"

Or if you are using Poetry to run Apache Airflow:

  $ poetry add apache-airflow@latest
  $ poetry add "airflow-prometheus@0.4.2"
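To confirm which version ended up in the environment, a minimal sketch using only the Python standard library is shown here. The helper name `installed_version` is hypothetical and not part of this package; the distribution name is the one used in the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "airflow-prometheus") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    print(installed_version())
```

Run it with the same interpreter that runs Airflow; a result of `None` suggests the package was installed into a different environment.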

What does this package provide?

  • Support for exporting Prometheus metrics
  • Support for exporting additional data into Grafana

Prometheus metrics

Metrics are exported on the /metrics endpoint:

| Metric | Labels | Description |
|---|---|---|
| dag_bag_stats | property | Statistics for the DAG bag. property=loaded_dags_count is the number of loaded DAGs |
| airflow_dag_status | dag_id, owner, status | Number of DAG starts with this status |
| airflow_dag_run_duration | dag_id | Duration of successful DAG runs in seconds |
| airflow_dag_scheduler_delay | dag_id | Airflow DAG scheduling delay |
| airflow_task_status | dag_id, task_id, operator_name, owner, state | Number of task instances with a particular status |
| airflow_task_duration | aggregation, operator_name, task_id, dag_id | Durations of tasks in seconds by operator. aggregation is one of max, min, avg |
| airflow_task_max_tries | operator_name, task_id, dag_id | Max tries for tasks |
| airflow_last_dag_run | status, task_id, dag_id | Task statuses for the latest DAG run |
| airflow_successful_task_duration | task_id, dag_id, execution_date | Duration of successful tasks in seconds |
| airflow_task_fail_count | dag_id, task_id | Count of failed tasks |
| airflow_xcom_parameter | dag_id, task_id | Airflow XCom parameter |
| airflow_task_scheduler_delay | queue | Airflow task scheduling delay |
| airflow_num_queued_tasks | - | Number of queued Airflow tasks |
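Because /metrics serves the Prometheus text exposition format, the output can also be inspected without a Prometheus server. The following is a minimal sketch of a parser written with the standard library only. The function name `parse_metrics` and the sample text are made up for illustration (the label values and numbers are not real output), and the parser deliberately ignores timestamps and does not unescape label values.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {labels}, then the value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces: name="value".
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_metrics(text: str) -> List[Sample]:
    """Parse exposition-format text into (name, labels, value) tuples."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Invented sample text; metric and label names are taken from the table above.
SAMPLE_TEXT = '''\
# HELP airflow_dag_status Shows the number of dag starts with this status
airflow_dag_status{dag_id="example_dag",owner="airflow",status="success"} 3.0
airflow_num_queued_tasks 2.0
'''

if __name__ == "__main__":
    for name, labels, value in parse_metrics(SAMPLE_TEXT):
        print(name, labels, value)
```

Against a running webserver, the input text could be fetched with `urllib.request.urlopen` from the /metrics URL of your deployment and passed to `parse_metrics` unchanged.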

JSON metadata

You can use the SimpleJson datasource to display the states of DAGs in Grafana. Install the plugin with the following command or via grafana.com:

    $ sudo grafana-cli plugins install grafana-simple-json-datasource

Now create a JSON datasource and point it to /metrics/json/ (the trailing slash is important, and you may need to check "skip TLS verify" for it to work).
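Since a missing trailing slash is an easy mistake to make, here is a small sketch that builds the datasource URL from the webserver's base URL. The helper name `json_datasource_url` and the example hosts are hypothetical; only the /metrics/json/ path comes from this README.

```python
from urllib.parse import urlsplit, urlunsplit


def json_datasource_url(base_url: str) -> str:
    """Return the base URL extended with /metrics/json/, trailing slash included."""
    parts = urlsplit(base_url)
    # Drop any trailing slash on the base path so the pieces join cleanly.
    prefix = parts.path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, prefix + "/metrics/json/", "", ""))


if __name__ == "__main__":
    # Hypothetical local webserver address.
    print(json_datasource_url("http://localhost:8080"))
```

The result is the value to paste into the URL field of the datasource settings.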

Now add an ad-hoc variable.

You can now see the ad-hoc filter at the top of the dashboard and use it to select DAGs. Next, we need to add some visualizations.

Add a new panel and select the newly created JSON datasource. As the metric, select dags, and for the visualization type, select NodeGraph.

The node graph shows the dependencies between tasks and their status for the latest instance of the DAG. DAGs can be selected with the ad-hoc variable you created. You can remove the ad-hoc filter to show all DAGs, but this is not recommended because the NodeGraph panel is fairly bad at zooming and panning the diagram.

Example dashboard

The example dashboard is available here: example/dashboard.json