
feat: add column_value_anomalies test for row-level outlier detection#982

Open
devin-ai-integration[bot] wants to merge 7 commits into master from devin/1775991629-column-value-anomalies
Conversation


@devin-ai-integration devin-ai-integration bot commented Apr 12, 2026

Summary

Adds a new elementary.column_value_anomalies dbt test that performs row-level outlier detection on raw column values, using each value's z-score against the historical distribution. This is fundamentally different from the existing column_anomalies test, which operates on aggregated metrics (min, max, avg, null_count) per time bucket.

How it works:

  1. Splits data into training and detection periods using timestamp_column
  2. Computes baseline statistics (mean + stddev) from training period values (optionally per day-of-week when seasonality is configured)
  3. For each row in the detection period, computes the z-score: (value - mean) / stddev
  4. Flags rows whose z-score exceeds the anomaly_sensitivity threshold (subject to the configured anomaly_direction)
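The four steps above can be sketched in plain Python. This is an illustrative model of the z-score logic, not the macro's generated SQL; the function name and data are hypothetical:

```python
from statistics import mean, pstdev

def detect_value_anomalies(training, detection, sensitivity=3.0, direction="both"):
    """Flag detection-period values whose z-score against the training
    baseline exceeds the sensitivity threshold (hypothetical helper)."""
    mu, sigma = mean(training), pstdev(training)
    anomalies = []
    for value in detection:
        if sigma == 0:
            continue  # constant baseline: z-score is undefined
        z = (value - mu) / sigma
        is_spike, is_drop = z > sensitivity, z < -sensitivity
        if (direction == "spike" and is_spike) \
                or (direction == "drop" and is_drop) \
                or (direction == "both" and (is_spike or is_drop)):
            anomalies.append((value, round(z, 2)))
    return anomalies

training = [100, 102, 98, 101, 99, 100, 103, 97]   # stable baseline, mean 100
print(detect_value_anomalies(training, [101, 250, 40]))  # flags 250 and 40
```

With direction set to "spike", only 250 would be flagged; with "drop", only 40, mirroring the directional detection the test exposes.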

New files:

  • macros/edr/tests/test_column_value_anomalies.sql — the test macro
  • integration_tests/tests/test_column_value_anomalies.py — integration tests (TDD-style, written first)

The test reuses the existing get_anomalies_test_configuration, get_read_anomaly_scores_query, store_anomaly_test_results, and get_anomaly_query infrastructure. It skips the intermediate "metrics" table and directly produces an "anomaly_scores" table since there are no bucketed aggregate metrics.

Related: docs PR at elementary-data/elementary#2183.

Review & Testing Checklist for Human

  • Run integration tests against at least Postgres to verify the generated SQL is valid and the test behavior matches expectations. The tests were written TDD-style but have not been executed against a real database yet. This is the highest-risk item.
  • Compatibility with get_read_anomaly_scores_query: The reused anomaly query expects bucket-based data. bucket_start = bucket_end = row_timestamp is a novel pattern that may interact unexpectedly with the backfill_days filter (bucket_end > max_bucket_end - backfill_days) and the buckets CTE join. Verify this works correctly on an actual database.
  • No metrics table created / no store_metrics_table_in_cache call: Unlike all other anomaly tests, this test skips the metrics persistence step. Confirm this doesn't break the on_run_end hook or Elementary Cloud ingestion.
  • time_bucket hardcoded to {"period": "day", "count": 1}: This satisfies validate_mandatory_configuration but leaks into metric_properties and the test configuration dict even though it's unused. Consider whether to make time_bucket non-mandatory for this test type.
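To make the backfill concern above concrete: when bucket_start = bucket_end = row_timestamp, each row is its own "bucket", and the reused backfill filter reduces to a trailing window over row timestamps. A minimal sketch with hypothetical dates:

```python
from datetime import date, timedelta

# Sketch of the reused backfill filter under the novel pattern where
# bucket_start = bucket_end = row_timestamp (dates are hypothetical).
rows = [date(2026, 4, d) for d in range(1, 13)]  # one row timestamp per day

backfill_days = 2
max_bucket_end = max(rows)  # max(bucket_end) over all "buckets", i.e. rows

# The filter: bucket_end > max_bucket_end - backfill_days
kept = [ts for ts in rows if ts > max_bucket_end - timedelta(days=backfill_days)]
print(kept)  # only the trailing backfill window survives
```

Under this reading the filter simply re-scores the most recent backfill_days of rows; whether the buckets CTE join behaves the same way on a real warehouse is exactly what this checklist item asks a human to verify.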

Notes

  • The initial commit had a Jinja2 block-scoping bug in column validation, a surrogate-key collision risk, and inconsistent column quoting. All three were fixed in the follow-up commit, which uses get_column_obj_and_monitors, adds row_idx to the surrogate key, and uses column_obj.quoted.
  • get_column_obj_and_monitors is called with ["column_value"] as the monitor list. This isn't a real monitor type in Elementary's registry, so the returned monitors list will be empty — but the column object lookup still works correctly since it's based on column name matching, not monitor filtering.
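The surrogate-key fix in the notes above can be illustrated with a small sketch that approximates dbt_utils.generate_surrogate_key (an md5 hash over delimiter-joined key parts); the table and column names are hypothetical:

```python
import hashlib

def surrogate_key(*parts):
    # Approximates dbt_utils.generate_surrogate_key: md5 of a
    # delimiter-joined string built from the key parts.
    return hashlib.md5("-".join(str(p) for p in parts).encode()).hexdigest()

# Two genuinely distinct rows that share value and timestamp collide:
dup_a = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500)
dup_b = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500)
print(dup_a == dup_b)  # True: one key for two different rows

# Mixing a per-row index (row_idx) into the key restores uniqueness:
row_a = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500, 1)
row_b = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500, 2)
print(row_a != row_b)  # True
```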

Link to Devin session: https://app.devin.ai/sessions/97ebe636d57244fb82a7452e1521604e
Requested by: @arbiv

Summary by CodeRabbit

  • New Features

    • Added a column-value anomaly test for numeric time-series with configurable sensitivity, directional detection (spike/drop), seasonality bucketing, filtering, detection/training windowing, and backfill support.
  • Tests

    • Added integration tests covering normal baseline, outlier detection, directionality, filtering behavior, sensitivity thresholds, seasonality handling, and training-period windowing.
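The seasonality bucketing mentioned above (per day-of-week baselines) can be sketched in plain Python; the helper and data are hypothetical, not Elementary's implementation:

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

def seasonal_baselines(rows):
    """Group (timestamp, value) rows by day-of-week and compute a
    per-bucket (mean, stddev) baseline, as with seasonality='day_of_week'."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.weekday()].append(value)  # Monday=0 ... Sunday=6
    return {dow: (mean(vals), pstdev(vals)) for dow, vals in buckets.items()}

rows = [
    (date(2026, 4, 6), 100),   # Monday
    (date(2026, 4, 13), 110),  # Monday
    (date(2026, 4, 11), 500),  # Saturday
    (date(2026, 4, 18), 520),  # Saturday
]
baselines = seasonal_baselines(rows)
# A Saturday reading of 510 is scored against the Saturday baseline
# (mean 510) rather than the pooled mean (307.5), so routine weekend
# highs are not mistaken for spikes.
```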

devin-ai-integration bot and others added 2 commits April 12, 2026 11:01
…irst)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…er detection

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e529144b-6e93-4b3f-b7b4-1b1ce64dcaf9

📥 Commits

Reviewing files that changed from the base of the PR and between 073b456 and 3ce18ec.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_value_anomalies.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration_tests/tests/test_column_value_anomalies.py

📝 Walkthrough

Adds a new integration test module and a dbt test macro implementing z-score–based column value anomaly detection with configurable timestamping, sensitivity, direction, seasonality, and training/detection windows; tests exercise multiple detection scenarios and configuration permutations.

Changes

Integration Tests: integration_tests/tests/test_column_value_anomalies.py
Adds a new test module with nine top-level pytest functions that synthesize timestamped numeric series and invoke the dbt test elementary.column_value_anomalies, asserting pass/fail across: baseline, injected outliers, spike/drop directionality, where filtering, sensitivity thresholds, seasonality (day_of_week), and training_period windowing.

Dbt Test Macro: macros/edr/tests/test_column_value_anomalies.sql
Adds the {% test column_value_anomalies(...) %} macro. It validates execution gating and the model relation, enforces the required timestamp_column, verifies that the target column_name exists, computes detection/training windows (detection_delay, days_back, backfill_days, training_period, detection_period), builds seasonality bucketing, computes training baselines (avg, stddev, count) optionally by bucket, computes per-row anomaly_score and threshold bounds from anomaly_sensitivity/anomaly_direction, materializes an elementary test table, and persists/returns results via elementary helpers. New macro params include timestamp_column, column_name, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, days_back, backfill_days, seasonality, detection_delay, detection_period, training_period, exclude_detection_period_from_training, etc.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TestRunner as pytest/dbt_test_invoker
    participant dbt as dbt
    participant Macro as column_value_anomalies macro
    participant DB as Database
    participant Elementary as elementary storage

    TestRunner->>dbt: run test `elementary.column_value_anomalies`
    dbt->>Macro: invoke macro with model, column, timestamp, config
    Macro->>DB: resolve relation & validate column
    Macro->>DB: compute windows, seasonality buckets
    Macro->>DB: generate training & detection row sets
    Macro->>DB: compute baselines and anomaly_scores
    Macro->>DB: create elementary test table
    Macro->>Elementary: store_anomaly_test_results
    Elementary-->>dbt: return aggregated test results
    dbt-->>TestRunner: return pass/fail status
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I count the days and watch each score,
Timestamps hum and outliers roar.
Spike or drop, I nibble clues,
Buckets, windows, sensible hues.
A rabbit cheers when tests pass through.

🚥 Pre-merge checks | ✅ 3 passed

Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check: ✅ Passed. The title accurately describes the main change: adding a new dbt test macro for row-level outlier detection on column values.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).

…uoting, formatting

- Use get_column_obj_and_monitors() instead of manual loop (fixes Jinja2 block scoping bug)
- Add row_idx to surrogate key to prevent collisions for duplicate value+timestamp rows
- Use column_obj.quoted instead of adapter.quote(column_name) for consistent quoting
- Apply black and sqlfmt formatting

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
coderabbitai[bot]

This comment was marked as resolved.


@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around line 68-124: The test test_column_value_anomalies_spike_direction
currently sends both a spike and a drop in every run so it cannot verify
anomaly_direction; change the test to run two controlled scenarios (1) only a
high outlier (spike-only) with anomaly_direction="spike" and verify failure and
that the returned anomalous rows in test_result (from dbt_project.test) contain
the high value, and (2) only a low outlier (drop-only) with
anomaly_direction="drop" and verify failure and that the returned anomalous rows
contain the low value; alternatively add negative-control runs where spike
direction is run against drop-only data and drop direction against spike-only
data and assert those runs pass (check test_result["status"] and inspect
test_result's anomalous rows) so the test actually exercises the direction
filtering in dbt_project.test.
- Around line 234-290: The test function
test_column_value_anomalies_with_seasonality creates a dataset that is too
forgiving and may pass even when seasonality is ignored; fix it by adding either
(A) a control run that invokes dbt_project.test with no "seasonality" (e.g.,
build test_args_nonseasonal from DBT_TEST_ARGS without the "seasonality" key)
and assert its result["status"] == "fail", or (B) tighten the synthetic data in
the function so weekday values and weekend values are further apart (e.g., make
weekdays much smaller and weekends much larger) such that a non-seasonal
baseline would flag the detection row as anomalous while the seasonal run (using
test_args_seasonal and dbt_project.test) still asserts pass; reference
test_column_value_anomalies_with_seasonality, test_args_seasonal, DBT_TEST_ARGS,
and dbt_project.test to locate where to add the control run or adjust the
fixture.

In `@macros/edr/tests/test_column_value_anomalies.sql`:
- Around line 302-305: The read-anomaly SQL is being built without the test
context so it loses config like anomaly_direction; change the call to
get_read_anomaly_scores_query to pass the flattened test context (i.e., call
elementary.get_read_anomaly_scores_query(flattened_test)) so the
anomaly_scores_sql is generated using the same flattened_test used for
store_anomaly_test_results, ensuring both the query and the stored results share
the same test config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b69152d-f7d3-41d9-a369-a920aca620c3

📥 Commits

Reviewing files that changed from the base of the PR and between fbf557b and 0110563.

📒 Files selected for processing (2)
  • integration_tests/tests/test_column_value_anomalies.py
  • macros/edr/tests/test_column_value_anomalies.sql

Comment on lines +302 to +305
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
{% do elementary.store_anomaly_test_results(
flattened_test, anomaly_scores_sql

@coderabbitai coderabbitai bot Apr 12, 2026


⚠️ Potential issue | 🟠 Major

Pass flattened_test into the read-anomaly query.

Line 303 drops the same test context you pass at Line 308. Since this temp table does not materialize config like anomaly_direction, store_anomaly_test_results() can cache a status under different anomaly rules than the query this test finally returns.

Proposed fix
-        {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
+        {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query(
+            flattened_test
+        ) %}

devin-ai-integration bot (Contributor Author) replied:

This is not a bug — get_read_anomaly_scores_query() without flattened_test falls back to elementary.flatten_test(model) on line 73-75 of get_anomaly_query.sql, which correctly resolves the test config including anomaly_direction.

All existing anomaly tests use this same pattern:

The flattened_test parameter is only passed explicitly inside get_anomaly_query itself (line 7), which is the function called on line 308 of this macro. So anomaly_direction is correctly applied in both the store_anomaly_test_results and get_anomaly_query paths.

Skipped: comment is from another GitHub bot.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
integration_tests/tests/test_column_value_anomalies.py (1)

312-368: ⚠️ Potential issue | 🟠 Major

Seasonality case still needs a non-seasonal counterfactual.

This fixture can pass even if seasonality="day_of_week" is ignored, so it doesn’t reliably validate the feature. Add a paired run without seasonality that is expected to fail (or tighten the fixture so seasonal/non-seasonal outcomes must diverge).

Suggested test-shape update
     test_result = dbt_project.test(
         test_id,
         DBT_TEST_NAME,
         test_args_seasonal,
         data=data,
         test_column="amount",
     )
     assert test_result["status"] == "pass"
+
+    # Control run without seasonality should fail once fixture is tightened
+    test_args_nonseasonal = {
+        **DBT_TEST_ARGS,
+        "anomaly_sensitivity": 3,
+        "training_period": {"period": "day", "count": 60},
+    }
+    control_result = dbt_project.test(
+        test_id,
+        DBT_TEST_NAME,
+        test_args_nonseasonal,
+        test_column="amount",
+        test_vars={"force_metrics_backfill": True},
+    )
+    assert control_result["status"] == "fail"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_column_value_anomalies.py` around lines 312 -
368, The test test_column_value_anomalies_with_seasonality currently only
asserts the seasonal run passes, which would also pass if seasonality is
ignored; add a paired non-seasonal run that is expected to fail to validate the
seasonality behavior. Create test_args_nonseasonal by copying test_args_seasonal
but removing or setting "seasonality" to None/"" (and keep same
anomaly_sensitivity/training_period), call dbt_project.test(test_id,
DBT_TEST_NAME, test_args_nonseasonal, data=data, test_column="amount"), and
assert that its result["status"] == "fail"; this ensures the code must respect
"seasonality" to pass the first run (references:
test_column_value_anomalies_with_seasonality, test_args_seasonal,
dbt_project.test, DBT_TEST_NAME).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around line 371-426: The test test_column_value_anomalies_with_training_period
currently only asserts a pass for training_period count=7 and doesn't prove
windowing; add a contrasting control run (call dbt_project.test again with
either no training_period or a longer training_period, e.g.,
{"period":"day","count":999} or omit training_period so baseline uses old low
values) and assert that this control run returns "fail". Ensure the data balance
still forces divergent outcomes (keep the 30 days of low 95/100/105 and 7 days
of high 490/500/510 as-is or increase separation) so that the short
training_period run (test_args with anomaly_sensitivity and training_period)
passes while the control run (default/long window) deterministically fails; use
the same test_id, DBT_TEST_NAME, test_column and data and name the second result
variable (e.g., control_result) to assert its status == "fail".


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a83a427b-e94c-4de0-86f6-c130431105f5

📥 Commits

Reviewing files that changed from the base of the PR and between 0110563 and 073b456.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_value_anomalies.py

devin-ai-integration bot and others added 2 commits April 12, 2026 11:27
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>