
feat: add column_value_anomalies test for row-level outlier detection#982

Open
devin-ai-integration[bot] wants to merge 7 commits into master from devin/1775991629-column-value-anomalies
Conversation


@devin-ai-integration devin-ai-integration bot commented Apr 12, 2026

Summary

Adds a new elementary.column_value_anomalies dbt test that performs row-level outlier detection on raw column values, using each value's z-score against the historical distribution. This is fundamentally different from the existing column_anomalies test, which operates on aggregated metrics (min, max, avg, null_count) per time bucket.

How it works:

  1. Splits data into training and detection periods using timestamp_column
  2. Computes baseline statistics (mean + stddev) from training period values (optionally per day-of-week when seasonality is configured)
  3. For each row in the detection period, computes the z-score: (value - mean) / stddev
  4. Flags rows whose z-score exceeds the anomaly_sensitivity threshold (subject to the configured anomaly_direction)
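The four steps above can be sketched in plain Python. This is an illustrative model of the z-score logic, not the macro's generated SQL; the function name and data are hypothetical:

```python
from statistics import mean, pstdev

def detect_value_anomalies(training, detection, sensitivity=3.0, direction="both"):
    """Flag detection-period values whose z-score against the training
    baseline exceeds the sensitivity threshold (hypothetical helper)."""
    mu, sigma = mean(training), pstdev(training)
    anomalies = []
    for value in detection:
        if sigma == 0:
            continue  # constant baseline: z-score is undefined
        z = (value - mu) / sigma
        is_spike, is_drop = z > sensitivity, z < -sensitivity
        if (direction == "spike" and is_spike) \
                or (direction == "drop" and is_drop) \
                or (direction == "both" and (is_spike or is_drop)):
            anomalies.append((value, round(z, 2)))
    return anomalies

training = [100, 102, 98, 101, 99, 100, 103, 97]   # stable baseline, mean 100
print(detect_value_anomalies(training, [101, 250, 40]))  # flags 250 and 40
```

With direction set to "spike", only 250 would be flagged; with "drop", only 40, mirroring the directional detection the test exposes.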

New files:

  • macros/edr/tests/test_column_value_anomalies.sql — the test macro
  • integration_tests/tests/test_column_value_anomalies.py — integration tests (TDD-style, written first)

The test reuses the existing get_anomalies_test_configuration, get_read_anomaly_scores_query, store_anomaly_test_results, and get_anomaly_query infrastructure. It skips the intermediate "metrics" table and directly produces an "anomaly_scores" table since there are no bucketed aggregate metrics.

Related: docs PR at elementary-data/elementary#2183.

Review & Testing Checklist for Human

  • Run integration tests against at least Postgres to verify the generated SQL is valid and the test behavior matches expectations. The tests were written TDD-style but have not been executed against a real database yet. This is the highest-risk item.
  • Compatibility with get_read_anomaly_scores_query: The reused anomaly query expects bucket-based data. bucket_start = bucket_end = row_timestamp is a novel pattern that may interact unexpectedly with the backfill_days filter (bucket_end > max_bucket_end - backfill_days) and the buckets CTE join. Verify this works correctly on an actual database.
  • No metrics table created / no store_metrics_table_in_cache call: Unlike all other anomaly tests, this test skips the metrics persistence step. Confirm this doesn't break the on_run_end hook or Elementary Cloud ingestion.
  • time_bucket hardcoded to {"period": "day", "count": 1}: This satisfies validate_mandatory_configuration but leaks into metric_properties and the test configuration dict even though it's unused. Consider whether to make time_bucket non-mandatory for this test type.
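To make the backfill concern above concrete: when bucket_start = bucket_end = row_timestamp, each row is its own "bucket", and the reused backfill filter reduces to a trailing window over row timestamps. A minimal sketch with hypothetical dates:

```python
from datetime import date, timedelta

# Sketch of the reused backfill filter under the novel pattern where
# bucket_start = bucket_end = row_timestamp (dates are hypothetical).
rows = [date(2026, 4, d) for d in range(1, 13)]  # one row timestamp per day

backfill_days = 2
max_bucket_end = max(rows)  # max(bucket_end) over all "buckets", i.e. rows

# The filter: bucket_end > max_bucket_end - backfill_days
kept = [ts for ts in rows if ts > max_bucket_end - timedelta(days=backfill_days)]
print(kept)  # only the trailing backfill window survives
```

Under this reading the filter simply re-scores the most recent backfill_days of rows; whether the buckets CTE join behaves the same way on a real warehouse is exactly what this checklist item asks a human to verify.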

Notes

  • The initial commit had a Jinja2 block-scoping bug in column validation, a surrogate-key collision risk, and inconsistent column quoting. All three were fixed in the follow-up commit, which uses get_column_obj_and_monitors, adds row_idx to the surrogate key, and uses column_obj.quoted.
  • get_column_obj_and_monitors is called with ["column_value"] as the monitor list. This isn't a real monitor type in Elementary's registry, so the returned monitors list will be empty — but the column object lookup still works correctly since it's based on column name matching, not monitor filtering.
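The surrogate-key fix in the notes above can be illustrated with a small sketch that approximates dbt_utils.generate_surrogate_key (an md5 hash over delimiter-joined key parts); the table and column names are hypothetical:

```python
import hashlib

def surrogate_key(*parts):
    # Approximates dbt_utils.generate_surrogate_key: md5 of a
    # delimiter-joined string built from the key parts.
    return hashlib.md5("-".join(str(p) for p in parts).encode()).hexdigest()

# Two genuinely distinct rows that share value and timestamp collide:
dup_a = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500)
dup_b = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500)
print(dup_a == dup_b)  # True: one key for two different rows

# Mixing a per-row index (row_idx) into the key restores uniqueness:
row_a = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500, 1)
row_b = surrogate_key("orders", "amount", "2026-04-12 10:00:00", 500, 2)
print(row_a != row_b)  # True
```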

Link to Devin session: https://app.devin.ai/sessions/97ebe636d57244fb82a7452e1521604e
Requested by: @arbiv

Summary by CodeRabbit

  • New Features

    • Added a column-value anomaly test for numeric time-series with configurable sensitivity, directional detection (spike/drop), seasonality bucketing, filtering, detection/training windowing, and backfill support.
  • Tests

    • Added integration tests covering normal baseline, outlier detection, directionality, filtering behavior, sensitivity thresholds, seasonality handling, and training-period windowing.
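The seasonality bucketing mentioned above (per day-of-week baselines) can be sketched in plain Python; the helper and data are hypothetical, not Elementary's implementation:

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev

def seasonal_baselines(rows):
    """Group (timestamp, value) rows by day-of-week and compute a
    per-bucket (mean, stddev) baseline, as with seasonality='day_of_week'."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts.weekday()].append(value)  # Monday=0 ... Sunday=6
    return {dow: (mean(vals), pstdev(vals)) for dow, vals in buckets.items()}

rows = [
    (date(2026, 4, 6), 100),   # Monday
    (date(2026, 4, 13), 110),  # Monday
    (date(2026, 4, 11), 500),  # Saturday
    (date(2026, 4, 18), 520),  # Saturday
]
baselines = seasonal_baselines(rows)
# A Saturday reading of 510 is scored against the Saturday baseline
# (mean 510) rather than the pooled mean (307.5), so routine weekend
# highs are not mistaken for spikes.
```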

devin-ai-integration bot and others added 2 commits April 12, 2026 11:01
…irst)

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…er detection

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.


@github-actions

👋 @devin-ai-integration[bot]
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

coderabbitai bot commented Apr 12, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e529144b-6e93-4b3f-b7b4-1b1ce64dcaf9

📥 Commits

Reviewing files that changed from the base of the PR and between 073b456 and 3ce18ec.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_value_anomalies.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • integration_tests/tests/test_column_value_anomalies.py

📝 Walkthrough

Adds a new integration test module and a dbt test macro implementing z-score–based column value anomaly detection with configurable timestamping, sensitivity, direction, seasonality, and training/detection windows; tests exercise multiple detection scenarios and configuration permutations.

Changes

Integration Tests: integration_tests/tests/test_column_value_anomalies.py
Adds a new test module with nine top-level pytest functions that synthesize timestamped numeric series and invoke the dbt test elementary.column_value_anomalies, asserting pass/fail across: baseline, injected outliers, spike/drop directionality, where filtering, sensitivity thresholds, seasonality (day_of_week), and training_period windowing.

Dbt Test Macro: macros/edr/tests/test_column_value_anomalies.sql
Adds the {% test column_value_anomalies(...) %} macro. It validates execution gating and the model relation, enforces the required timestamp_column, verifies that the target column_name exists, computes detection/training windows (detection_delay, days_back, backfill_days, training_period, detection_period), builds seasonality bucketing, computes training baselines (avg, stddev, count) optionally by bucket, computes per-row anomaly_score and threshold bounds from anomaly_sensitivity/anomaly_direction, materializes an elementary test table, and persists/returns results via elementary helpers. New macro params include timestamp_column, column_name, where_expression, anomaly_sensitivity, anomaly_direction, min_training_set_size, days_back, backfill_days, seasonality, detection_delay, detection_period, training_period, exclude_detection_period_from_training, etc.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TestRunner as pytest/dbt_test_invoker
    participant dbt as dbt
    participant Macro as column_value_anomalies macro
    participant DB as Database
    participant Elementary as elementary storage

    TestRunner->>dbt: run test `elementary.column_value_anomalies`
    dbt->>Macro: invoke macro with model, column, timestamp, config
    Macro->>DB: resolve relation & validate column
    Macro->>DB: compute windows, seasonality buckets
    Macro->>DB: generate training & detection row sets
    Macro->>DB: compute baselines and anomaly_scores
    Macro->>DB: create elementary test table
    Macro->>Elementary: store_anomaly_test_results
    Elementary-->>dbt: return aggregated test results
    dbt-->>TestRunner: return pass/fail status
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I count the days and watch each score,
Timestamps hum and outliers roar.
Spike or drop, I nibble clues,
Buckets, windows, sensible hues.
A rabbit cheers when tests pass through.

🚥 Pre-merge checks | ✅ 3 passed

Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check: ✅ Passed. The title accurately describes the main change: adding a new dbt test macro for row-level outlier detection on column values.
Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient (required threshold: 80.00%).

…uoting, formatting

- Use get_column_obj_and_monitors() instead of manual loop (fixes Jinja2 block scoping bug)
- Add row_idx to surrogate key to prevent collisions for duplicate value+timestamp rows
- Use column_obj.quoted instead of adapter.quote(column_name) for consistent quoting
- Apply black and sqlfmt formatting

Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
coderabbitai[bot]

This comment was marked as resolved.


@devin-ai-integration devin-ai-integration bot left a comment


✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 5 additional findings.

Open in Devin Review

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around line 68-124: The test test_column_value_anomalies_spike_direction
currently sends both a spike and a drop in every run so it cannot verify
anomaly_direction; change the test to run two controlled scenarios (1) only a
high outlier (spike-only) with anomaly_direction="spike" and verify failure and
that the returned anomalous rows in test_result (from dbt_project.test) contain
the high value, and (2) only a low outlier (drop-only) with
anomaly_direction="drop" and verify failure and that the returned anomalous rows
contain the low value; alternatively add negative-control runs where spike
direction is run against drop-only data and drop direction against spike-only
data and assert those runs pass (check test_result["status"] and inspect
test_result's anomalous rows) so the test actually exercises the direction
filtering in dbt_project.test.
- Around line 234-290: The test function
test_column_value_anomalies_with_seasonality creates a dataset that is too
forgiving and may pass even when seasonality is ignored; fix it by adding either
(A) a control run that invokes dbt_project.test with no "seasonality" (e.g.,
build test_args_nonseasonal from DBT_TEST_ARGS without the "seasonality" key)
and assert its result["status"] == "fail", or (B) tighten the synthetic data in
the function so weekday values and weekend values are further apart (e.g., make
weekdays much smaller and weekends much larger) such that a non-seasonal
baseline would flag the detection row as anomalous while the seasonal run (using
test_args_seasonal and dbt_project.test) still asserts pass; reference
test_column_value_anomalies_with_seasonality, test_args_seasonal, DBT_TEST_ARGS,
and dbt_project.test to locate where to add the control run or adjust the
fixture.

In `@macros/edr/tests/test_column_value_anomalies.sql`:
- Around line 302-305: The read-anomaly SQL is being built without the test
context so it loses config like anomaly_direction; change the call to
get_read_anomaly_scores_query to pass the flattened test context (i.e., call
elementary.get_read_anomaly_scores_query(flattened_test)) so the
anomaly_scores_sql is generated using the same flattened_test used for
store_anomaly_test_results, ensuring both the query and the stored results share
the same test config.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7b69152d-f7d3-41d9-a369-a920aca620c3

📥 Commits

Reviewing files that changed from the base of the PR and between fbf557b and 0110563.

📒 Files selected for processing (2)
  • integration_tests/tests/test_column_value_anomalies.py
  • macros/edr/tests/test_column_value_anomalies.sql

Comment on lines +302 to +305
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
{% do elementary.store_anomaly_test_results(
flattened_test, anomaly_scores_sql

@coderabbitai coderabbitai bot Apr 12, 2026


⚠️ Potential issue | 🟠 Major

Pass flattened_test into the read-anomaly query.

Line 303 drops the same test context you pass at Line 308. Since this temp table does not materialize config like anomaly_direction, store_anomaly_test_results() can cache a status under different anomaly rules than the query this test finally returns.

Proposed fix
-        {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
+        {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query(
+            flattened_test
+        ) %}

devin-ai-integration bot (Contributor Author) replied:

This is not a bug — get_read_anomaly_scores_query() without flattened_test falls back to elementary.flatten_test(model) on line 73-75 of get_anomaly_query.sql, which correctly resolves the test config including anomaly_direction.

All existing anomaly tests use this same pattern:

The flattened_test parameter is only passed explicitly inside get_anomaly_query itself (line 7), which is the function called on line 308 of this macro. So anomaly_direction is correctly applied in both the store_anomaly_test_results and get_anomaly_query paths.

Skipped: comment is from another GitHub bot.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
integration_tests/tests/test_column_value_anomalies.py (1)

312-368: ⚠️ Potential issue | 🟠 Major

Seasonality case still needs a non-seasonal counterfactual.

This fixture can pass even if seasonality="day_of_week" is ignored, so it doesn’t reliably validate the feature. Add a paired run without seasonality that is expected to fail (or tighten the fixture so seasonal/non-seasonal outcomes must diverge).

Suggested test-shape update
     test_result = dbt_project.test(
         test_id,
         DBT_TEST_NAME,
         test_args_seasonal,
         data=data,
         test_column="amount",
     )
     assert test_result["status"] == "pass"
+
+    # Control run without seasonality should fail once fixture is tightened
+    test_args_nonseasonal = {
+        **DBT_TEST_ARGS,
+        "anomaly_sensitivity": 3,
+        "training_period": {"period": "day", "count": 60},
+    }
+    control_result = dbt_project.test(
+        test_id,
+        DBT_TEST_NAME,
+        test_args_nonseasonal,
+        test_column="amount",
+        test_vars={"force_metrics_backfill": True},
+    )
+    assert control_result["status"] == "fail"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@integration_tests/tests/test_column_value_anomalies.py` around lines 312 -
368, The test test_column_value_anomalies_with_seasonality currently only
asserts the seasonal run passes, which would also pass if seasonality is
ignored; add a paired non-seasonal run that is expected to fail to validate the
seasonality behavior. Create test_args_nonseasonal by copying test_args_seasonal
but removing or setting "seasonality" to None/"" (and keep same
anomaly_sensitivity/training_period), call dbt_project.test(test_id,
DBT_TEST_NAME, test_args_nonseasonal, data=data, test_column="amount"), and
assert that its result["status"] == "fail"; this ensures the code must respect
"seasonality" to pass the first run (references:
test_column_value_anomalies_with_seasonality, test_args_seasonal,
dbt_project.test, DBT_TEST_NAME).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around line 371-426: The test test_column_value_anomalies_with_training_period
currently only asserts a pass for training_period count=7 and doesn't prove
windowing; add a contrasting control run (call dbt_project.test again with
either no training_period or a longer training_period, e.g.,
{"period":"day","count":999} or omit training_period so baseline uses old low
values) and assert that this control run returns "fail". Ensure the data balance
still forces divergent outcomes (keep the 30 days of low 95/100/105 and 7 days
of high 490/500/510 as-is or increase separation) so that the short
training_period run (test_args with anomaly_sensitivity and training_period)
passes while the control run (default/long window) deterministically fails; use
the same test_id, DBT_TEST_NAME, test_column and data and name the second result
variable (e.g., control_result) to assert its status == "fail".


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a83a427b-e94c-4de0-86f6-c130431105f5

📥 Commits

Reviewing files that changed from the base of the PR and between 0110563 and 073b456.

📒 Files selected for processing (1)
  • integration_tests/tests/test_column_value_anomalies.py

devin-ai-integration bot and others added 2 commits April 12, 2026 11:27
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>