docs: pipeline_task_performance automated monitor #2238
base: docs
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,10 +1,14 @@ | ||
| <CardGroup cols={2}> | ||
| <Card title="Automated Freshness" href="/features/anomaly-detection/automated-freshness"> | ||
| Monitors updates to tables and how frequently a table is updated, | ||
| and fails if there is an unexpected delay. | ||
| Monitors updates to tables and how frequently a table is updated, | ||
| and fails if there is an unexpected delay. | ||
| </Card> | ||
| <Card title="Automated Volume" href="/features/anomaly-detection/automated-volume"> | ||
| Monitors how many rows were added or removed to a table on each update, | ||
| and fails if there is an unexpected drop or spike in rows. | ||
| and fails if there is an unexpected drop or spike in rows. | ||
| </Card> | ||
| <Card title="Pipeline Task Performance" href="/features/anomaly-detection/automated-pipeline-task-performance"> | ||
| Monitors execution duration of dbt models, seeds, and snapshots, and fails | ||
| on unexpected slowdowns or SLA breaches. | ||
| </Card> | ||
| </CardGroup> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,104 @@ | ||
| --- | ||
|
Review comment (Contributor, Author): not sure we need this page
||
| title: Performance Monitoring (dbt-package) | ||
| sidebarTitle: "Performance monitoring" | ||
| --- | ||
|
|
||
| <Info> | ||
| **For Elementary Cloud users:** the recommended approach for pipeline performance monitoring is the automated [`pipeline_task_performance`](/features/anomaly-detection/automated-pipeline-task-performance) monitor — see [Performance Alerts](/features/performance-monitoring/performance-alerts). This page documents the manual dbt-test alternative using the Elementary dbt package. | ||
| </Info> | ||
|
|
||
| ## Overview | ||
|
|
||
| Elementary's dbt package exposes the `model_run_results` view, which contains run results for every dbt model enriched with model metadata. You can write dbt singular tests against this view to assert performance SLAs directly in your dbt project. | ||
|
|
||
| This approach requires no additional infrastructure — it runs as part of your existing dbt test invocations. | ||
|
|
||
| ## `model_run_results` schema | ||
|
|
||
| `model_run_results` is a view that joins `dbt_run_results` with `dbt_models`. Key columns: | ||
|
|
||
| | Column | Type | Description | | ||
| |---|---|---| | ||
| | `unique_id` | string | dbt node unique ID | | ||
| | `alias` | string | Model alias (table name) | | ||
| | `schema_name` | string | Schema the model is materialized in | | ||
| | `execution_time` | float | Execution duration in seconds | | ||
| | `status` | string | Run status (`success`, `error`, etc.) | | ||
| | `invocation_id` | string | dbt invocation ID | | ||
| | `generated_at` | timestamp | When the run result was recorded | | ||
|
|
||
| Full schema is documented in [Elementary package models](/dbt/package-models#model_run_results). | ||
|
|
||
| ## Writing a performance SLA test | ||
|
|
||
| Create a singular test that queries `model_run_results` and returns rows where execution time exceeds your threshold. A dbt singular test fails when any rows are returned. | ||
|
|
||
| ```sql tests/assert_model_performance_sla.sql | ||
| -- Fail if any model exceeded its execution time SLA in the last run | ||
| with latest_run as ( | ||
| select max(generated_at) as latest_ts | ||
| from {{ ref('model_run_results') }} | ||
| ), | ||
|
|
||
| violations as ( | ||
| select | ||
| r.unique_id, | ||
| r.alias, | ||
| r.execution_time, | ||
| r.generated_at | ||
| from {{ ref('model_run_results') }} r | ||
| cross join latest_run lr | ||
| where r.generated_at >= lr.latest_ts | ||
| and r.status = 'success' | ||
| and r.execution_time > 600 -- 10 minute SLA | ||
| ) | ||
|
|
||
| select * from violations | ||
| ``` | ||
|
|
||
| Adjust the threshold (`600`) per model by parameterising the test or writing per-model singular tests. | ||
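A minimal Python sketch of the same check with per-model thresholds, for readers who want to see the logic outside SQL. It is not part of the Elementary package: the `SLA_SECONDS` mapping, `DEFAULT_SLA_SECONDS`, the `RunResult` class, and `find_sla_violations` are hypothetical names, and the rows are assumed to carry the `model_run_results` columns listed above.

```python
from dataclasses import dataclass

# Hypothetical per-model SLA thresholds in seconds, keyed by model alias.
SLA_SECONDS = {"orders": 300}
DEFAULT_SLA_SECONDS = 600  # fallback for models without an explicit entry


@dataclass
class RunResult:
    # Mirrors a subset of the model_run_results columns.
    alias: str
    execution_time: float
    status: str
    generated_at: str  # ISO-8601 in one format, so string order is time order


def find_sla_violations(rows):
    """Return successful rows from the latest run that exceed their SLA."""
    if not rows:
        return []
    latest_ts = max(r.generated_at for r in rows)
    return [
        r
        for r in rows
        if r.generated_at >= latest_ts
        and r.status == "success"
        and r.execution_time > SLA_SECONDS.get(r.alias, DEFAULT_SLA_SECONDS)
    ]
```

As with a singular test, an empty result means the check passes and any returned row is a violation.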
|
|
||
| ## Per-model SLA tests | ||
|
|
||
| To enforce different thresholds per model, write a separate singular test for each: | ||
|
|
||
| ```sql tests/assert_orders_model_sla.sql | ||
| with latest_run as ( | ||
| select max(generated_at) as latest_ts | ||
| from {{ ref('model_run_results') }} | ||
| ) | ||
|
|
||
| select | ||
| r.unique_id, | ||
| r.execution_time | ||
| from {{ ref('model_run_results') }} r | ||
| cross join latest_run lr | ||
| where r.generated_at >= lr.latest_ts | ||
| and r.alias = 'orders' | ||
| and r.execution_time > 300 -- 5 minute SLA for orders model | ||
| ``` | ||
|
|
||
| ## Alerts | ||
|
|
||
| Failures from these singular tests appear in Elementary's alert pipeline alongside other dbt test failures. Use [alert rules](/features/alerts-and-incidents/alert-rules) and tags/owners to route them to the right channel. | ||
|
|
||
| Add a tag to the test to make routing easier: | ||
|
|
||
| ```yaml tests/schema.yml | ||
| version: 2 | ||
|
|
||
| data_tests: | ||
|   - name: assert_orders_model_sla | ||
|     config: | ||
|       tags: | ||
|         - performance_sla | ||
|       severity: warn | ||
| ``` | ||
|
|
||
| ## Limitations | ||
|
|
||
| - Runs only as part of `dbt test` invocations — not evaluated in real time after each model run. | ||
| - Requires manual threshold configuration per model. | ||
| - No anomaly detection or baseline learning — purely static thresholds. | ||
|
|
||
| For automated, baseline-aware performance monitoring without manual configuration, use the [pipeline_task_performance automated monitor](/features/performance-monitoring/performance-alerts) in Elementary Cloud. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| --- | ||
| title: Pipeline Task Performance Monitor | ||
| sidebarTitle: "Pipeline task performance" | ||
| --- | ||
|
|
||
| <Snippet file="cloud/cloud-feature-tag.mdx" /> | ||
|
|
||
| The pipeline task performance monitor tracks the **execution duration** of dbt models, seeds, and snapshots on each run, and alerts when a node takes significantly longer than expected — or exceeds a fixed SLA threshold. | ||
|
|
||
| This is an automated monitor: Elementary creates it for every dbt node in your project, with no configuration required to get started. | ||
|
|
||
| ## Monitoring modes | ||
|
|
||
| ### Anomaly detection (default) | ||
|
|
||
| Elementary uses a z-score seasonal model to learn normal execution duration for each node. The model accounts for time-of-day and day-of-week patterns, so a model that reliably runs longer on Monday mornings will not be flagged during that window. | ||
|
|
||
| No configuration is required. Elementary builds the baseline from your pipeline run history automatically. | ||
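To make the idea of a seasonal z-score concrete, here is an illustrative Python sketch. It is an assumption about how such a model could work, not Elementary's implementation: the bucketing by (day of week, hour), the `z_threshold` value, and the function names are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def seasonal_bucket(ts: datetime):
    # One bucket per (day of week, hour of day).
    return (ts.weekday(), ts.hour)


def is_duration_anomalous(history, ts, duration, z_threshold=3.0):
    """Flag a run that is slower than expected for its seasonal bucket.

    history: iterable of (datetime, duration_seconds) pairs from past runs.
    """
    buckets = defaultdict(list)
    for past_ts, past_duration in history:
        buckets[seasonal_bucket(past_ts)].append(past_duration)

    samples = buckets[seasonal_bucket(ts)]
    if len(samples) < 2:
        return False  # not enough history in this bucket to judge
    mu, sigma = mean(samples), stdev(samples)
    if sigma == 0:
        return duration > mu
    # "spike" direction only: flag runs slower than the bucket's baseline.
    return (duration - mu) / sigma > z_threshold
```

With this bucketing, a 10-minute run on a Monday morning is compared only against earlier Monday-morning runs, so it is not flagged just because the same model takes one minute on other days.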
|
|
||
| ### Static SLA | ||
|
|
||
| Set a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical norms. | ||
|
|
||
| Use static SLA when you have a hard operational or contractual limit (e.g., "this model must finish within 10 minutes"). | ||
|
|
||
| ## Understand the monitor result | ||
|
|
||
| The result shows the execution duration of the dbt node for the latest run alongside the historical baseline. | ||
|
|
||
| - **Anomaly detection** — data points outside the expected range (grey band) are flagged. The expected range is derived from the seasonal model trained on historical durations. | ||
| - **Static SLA** — a horizontal line marks the threshold. Any run that crosses it is a failure. | ||
|
|
||
| Use the **Anomaly settings** button to adjust mode, sensitivity, or training period after the monitor is created. | ||
|
|
||
| ## Anomaly settings | ||
|
|
||
| | Setting | Description | Default | | ||
| |---|---|---| | ||
| | `mode` | `"anomaly"` or `"static"` | `"anomaly"` | | ||
| | `sensitivity` | `"low"`, `"medium"`, or `"high"` (anomaly mode only) | `"medium"` | | ||
| | `training_period` | Days of history used to train the model (anomaly mode only) | `14` | | ||
| | `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"` (duration shorter than expected), or `"both"` (anomaly mode only) | `"spike"` | | ||
| | `excluded_time_ranges` | Time ranges excluded from training and detection (anomaly mode only) | `[]` | | ||
| | `fixed_threshold` | Max allowed execution duration in seconds (static SLA mode only) | — | | ||
|
|
||
| <Snippet file="cloud/features/anomaly-detection/all-anomalies-configuration.mdx" /> | ||
|
|
||
| ## Alerts | ||
|
|
||
| Failures raise alerts in the **`pipeline_performance`** alert category, routed through [alert rules](/features/alerts-and-incidents/alert-rules) to all configured integrations (Slack, PagerDuty, Teams, OpsGenie, email). | ||
|
|
||
| By default, alerts are not active. To enable them, go to **Setup → Alert Rules** and add **Pipeline performance** to the relevant rule's alert categories. | ||
|
|
||
| ## Related | ||
|
|
||
| - [Performance Alerts](/features/performance-monitoring/performance-alerts) — setup guide and API reference for creating `pipeline_task_performance` monitors | ||
| - [Automated monitors overview](/features/anomaly-detection/automated-monitors) — how automated monitors work across freshness, volume, and performance |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| --- | ||
| title: Pipeline Performance Alerts | ||
| sidebarTitle: "Performance alerts" | ||
| --- | ||
|
|
||
| <Snippet file="cloud/cloud-feature-tag.mdx" /> | ||
|
|
||
| Elementary Cloud automatically monitors the execution duration of your dbt models, seeds, and snapshots — detecting unexpected slowdowns before they affect your data consumers. | ||
|
|
||
| The `pipeline_task_performance` monitor runs after each dbt job execution and compares the observed duration against a learned baseline or a fixed SLA threshold. | ||
|
|
||
| ## Monitoring modes | ||
|
|
||
| ### Anomaly detection (default) | ||
|
|
||
| The default mode uses a z-score seasonal model to learn normal execution duration for each dbt node. No configuration is required — Elementary automatically builds a baseline from your pipeline history. | ||
|
|
||
| The model accounts for time-of-day and day-of-week seasonality. For example, a model that normally runs longer on Monday morning will not be flagged during that window. | ||
|
|
||
| ### Static SLA | ||
|
|
||
| Static SLA mode lets you define a fixed duration threshold. The monitor fails whenever the observed execution time exceeds that threshold, regardless of historical behavior. | ||
|
|
||
| Use static SLA when you have a contractual or operational hard limit (e.g., "this model must complete within 10 minutes"). | ||
|
|
||
| ## Creating a monitor | ||
|
|
||
| ### Add Test wizard | ||
|
|
||
| <Note> | ||
| The Add Test wizard supports **models only**. To add a `pipeline_task_performance` monitor for seeds or snapshots, use the [API](#api). | ||
| </Note> | ||
|
|
||
| 1. Open the model in Elementary Cloud. | ||
| 2. Click **Add Test** → select **Pipeline Task Performance**. | ||
| 3. Choose the mode: **Anomaly detection** or **Static SLA**. | ||
| 4. Configure parameters (see [Config reference](#config-reference) below). | ||
| 5. Save. | ||
|
|
||
| ### API | ||
|
|
||
| ```http | ||
| POST /{env_id}/tests | ||
| ``` | ||
|
|
||
| ```json | ||
| { | ||
| "node_id": "<dbt_node_id>", | ||
| "test_type": "pipeline_task_performance", | ||
| "params": { | ||
| "mode": "anomaly", | ||
| "sensitivity": "medium" | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| For static SLA: | ||
|
|
||
| ```json | ||
| { | ||
| "node_id": "<dbt_node_id>", | ||
| "test_type": "pipeline_task_performance", | ||
| "params": { | ||
| "mode": "static", | ||
| "fixed_threshold": 600 | ||
| } | ||
| } | ||
| ``` | ||
|
Review comment on lines +40 to +68 (Contributor, Author): remove it
||
|
|
||
| ## Config reference | ||
|
|
||
| ### Anomaly detection mode | ||
|
|
||
| | Parameter | Description | Default | | ||
| |---|---|---| | ||
| | `mode` | `"anomaly"` | `"anomaly"` | | ||
| | `sensitivity` | Detection sensitivity: `"low"`, `"medium"`, `"high"` | `"medium"` | | ||
| | `training_period` | Number of days used to build the baseline | `14` | | ||
| | `anomaly_direction` | `"spike"` (duration longer than expected), `"drop"` (duration shorter than expected), or `"both"` | `"spike"` | | ||
| | `excluded_time_ranges` | List of time ranges to exclude from training and detection | `[]` | | ||
|
|
||
| ### Static SLA mode | ||
|
|
||
| | Parameter | Description | Default | | ||
| |---|---|---| | ||
| | `mode` | `"static"` | — | | ||
| | `fixed_threshold` | Maximum allowed execution duration in seconds | Required | | ||
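The two tables can be read as a small validation rule set. The sketch below is a hypothetical client-side helper, not part of any Elementary SDK: `anomaly_defaults` and `build_params` are made-up names, and the checks reflect only the parameters and defaults documented above, not how the API itself validates requests.

```python
def anomaly_defaults():
    # Fresh dict per call so the list default is never shared between callers.
    return {
        "mode": "anomaly",
        "sensitivity": "medium",
        "training_period": 14,
        "anomaly_direction": "spike",
        "excluded_time_ranges": [],
    }


def build_params(mode="anomaly", **overrides):
    """Return a params dict for the given mode, applying documented defaults."""
    if mode == "anomaly":
        params = anomaly_defaults()
        unknown = set(overrides) - set(params)
        if unknown:
            raise ValueError(f"unexpected anomaly parameters: {sorted(unknown)}")
        params.update(overrides)
        if params["sensitivity"] not in ("low", "medium", "high"):
            raise ValueError("sensitivity must be low, medium, or high")
        if params["anomaly_direction"] not in ("spike", "drop", "both"):
            raise ValueError("anomaly_direction must be spike, drop, or both")
        return params
    if mode == "static":
        threshold = overrides.get("fixed_threshold")
        if threshold is None or threshold <= 0:
            raise ValueError("static mode requires a positive fixed_threshold")
        return {"mode": "static", "fixed_threshold": threshold}
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, `build_params("static", fixed_threshold=600)` yields the static SLA parameters for a 10-minute limit, while `build_params()` yields the anomaly detection defaults.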
|
|
||
| ## Alerts and incidents | ||
|
|
||
| Failures from `pipeline_task_performance` raise alerts in the **`pipeline_performance`** alert category. | ||
|
|
||
| Alerts are routed through [alert rules](/features/alerts-and-incidents/alert-rules) and delivered to all configured integrations: | ||
| - [Slack](/cloud/integrations/alerts/slack) | ||
| - [Microsoft Teams](/cloud/integrations/alerts/ms-teams) | ||
| - [PagerDuty](/cloud/integrations/alerts/pagerduty) | ||
| - [OpsGenie](/cloud/integrations/alerts/opsgenie) | ||
|
|
||
| Incidents are created and managed through the standard [incidents workflow](/features/alerts-and-incidents/incidents). | ||
|
|
||
| ### Activating alerts | ||
|
|
||
| By default, `pipeline_task_performance` alerts are not active. To enable them: | ||
|
|
||
| 1. Go to **Setup → Alert Rules**. | ||
| 2. Edit the default rule (or create a new one) and add **Pipeline performance** to the alert categories. | ||
|
|
||
| <Frame> | ||
| <div className="dark:bg-white rounded-md p-1"> | ||
| <img | ||
| src="https://res.cloudinary.com/diuctyblm/image/upload/v1725443568/alert_categories_ikzjqr.png" | ||
| alt="Alert categories in alert rules" | ||
| width="420" | ||
| /> | ||
| </div> | ||
|
Review comment on lines +110 to +116 (Contributor, Author)
||
| </Frame> | ||
|
|
||
| ## Related | ||
|
|
||
| - [Pipeline task performance monitor](/features/anomaly-detection/automated-pipeline-task-performance) — full monitor reference under Automated Monitors | ||
| - [Performance monitoring dashboard](/features/performance-monitoring/performance-monitoring) — view execution trends in the UI | ||
| - [Manual performance monitoring (dbt-package)](/data-tests/performance-monitoring) — manual dbt-test alternative using `model_run_results` | ||
