feat: add column_value_anomalies test for row-level outlier detection #982
devin-ai-integration[bot] wants to merge 7 commits into master from
Conversation
…irst) Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
…er detection Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
📝 Walkthrough
Adds a new integration test module and a dbt test macro implementing z-score based column value anomaly detection with configurable timestamping, sensitivity, direction, seasonality, and training/detection windows; the tests exercise multiple detection scenarios and configuration permutations.
…uoting, formatting
- Use get_column_obj_and_monitors() instead of manual loop (fixes Jinja2 block scoping bug)
- Add row_idx to surrogate key to prevent collisions for duplicate value+timestamp rows
- Use column_obj.quoted instead of adapter.quote(column_name) for consistent quoting
- Apply black and sqlfmt formatting
Co-Authored-By: Yosef Arbiv <yosef.arbiv@gmail.com>
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around lines 68-124: The test test_column_value_anomalies_spike_direction currently sends both a spike and a drop in every run, so it cannot verify anomaly_direction. Change the test to run two controlled scenarios: (1) only a high outlier (spike-only) with anomaly_direction="spike", verifying failure and that the anomalous rows returned in test_result (from dbt_project.test) contain the high value; and (2) only a low outlier (drop-only) with anomaly_direction="drop", verifying failure and that the anomalous rows contain the low value. Alternatively, add negative-control runs where the spike direction is run against drop-only data and the drop direction against spike-only data, and assert that those runs pass (check test_result["status"] and inspect test_result's anomalous rows), so the test actually exercises the direction filtering in dbt_project.test.
- Around lines 234-290: The test test_column_value_anomalies_with_seasonality creates a dataset that is too forgiving and may pass even when seasonality is ignored. Fix it by either (A) adding a control run that invokes dbt_project.test with no "seasonality" (e.g., build test_args_nonseasonal from DBT_TEST_ARGS without the "seasonality" key) and asserting its result["status"] == "fail", or (B) tightening the synthetic data so weekday and weekend values are further apart (e.g., make weekdays much smaller and weekends much larger), such that a non-seasonal baseline would flag the detection row as anomalous while the seasonal run (using test_args_seasonal and dbt_project.test) still asserts pass. Reference test_column_value_anomalies_with_seasonality, test_args_seasonal, DBT_TEST_ARGS, and dbt_project.test to locate where to add the control run or adjust the fixture.
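A quick back-of-envelope check (hypothetical numbers, not the repo's fixture) shows why option (B) needs care: with two-level weekday/weekend data, the pooled baseline's stddev absorbs the group gap, so the non-seasonal z-score of an ordinary weekend value stays bounded.

```python
# Hypothetical sketch: seasonal vs pooled z-score for a weekend reading.
from statistics import mean, stdev

weekday_values = [10] * 44                               # small weekday baseline
weekend_values = [990, 1000, 1010, 995, 1005, 1000, 998, 1002]
detection_value = 1000                                   # ordinary weekend value

# Seasonal baseline: score against the same day-of-week group only.
seasonal_z = abs((detection_value - mean(weekend_values)) / stdev(weekend_values))

# Non-seasonal baseline: score against the pooled history.
pooled = weekday_values + weekend_values
pooled_z = abs((detection_value - mean(pooled)) / stdev(pooled))

print(round(seasonal_z, 2), round(pooled_z, 2))  # seasonal ~0, pooled ~2.3
```

For the control run to fail while the seasonal run passes, the configured anomaly_sensitivity must sit between these two scores. With near-constant groups, the pooled z-score of a minority-group value is roughly bounded by sqrt((1-p)/p) where p is that group's share of the history, so widening the gap alone (option B) may not breach a sensitivity of 3; option (A)'s explicit control run, possibly with a lowered sensitivity, is the more robust fix.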
In `@macros/edr/tests/test_column_value_anomalies.sql`:
- Around lines 302-305: The read-anomaly SQL is built without the test context, so it loses config like anomaly_direction. Change the call to get_read_anomaly_scores_query to pass the flattened test context (i.e., call elementary.get_read_anomaly_scores_query(flattened_test)) so that anomaly_scores_sql is generated from the same flattened_test used for store_anomaly_test_results, ensuring the query and the stored results share the same test config.
📒 Files selected for processing (2):
- integration_tests/tests/test_column_value_anomalies.py
- macros/edr/tests/test_column_value_anomalies.sql
```jinja
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
{% do elementary.store_anomaly_test_results(
    flattened_test, anomaly_scores_sql
```
Pass flattened_test into the read-anomaly query.
Line 303 drops the same test context you pass at line 308. Since this temp table does not materialize config like anomaly_direction, store_anomaly_test_results() can cache a status under different anomaly rules than the query this test finally returns.
Proposed fix:
```diff
- {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query() %}
+ {% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query(
+     flattened_test
+ ) %}
```
📝 Committable suggestion:
```jinja
{% set flattened_test = elementary.flatten_test(elementary.get_test_model()) %}
{% set anomaly_scores_sql = elementary.get_read_anomaly_scores_query(
    flattened_test
) %}
{% do elementary.store_anomaly_test_results(
    flattened_test, anomaly_scores_sql
```
This is not a bug — get_read_anomaly_scores_query() without flattened_test falls back to elementary.flatten_test(model) on line 73-75 of get_anomaly_query.sql, which correctly resolves the test config including anomaly_direction.
All existing anomaly tests use this same pattern:
The flattened_test parameter is only passed explicitly inside get_anomaly_query itself (line 7), which is the function called on line 308 of this macro. So anomaly_direction is correctly applied in both the store_anomaly_test_results and get_anomaly_query paths.
Actionable comments posted: 1
♻️ Duplicate comments (1)
integration_tests/tests/test_column_value_anomalies.py (1)
Lines 312-368: ⚠️ Potential issue | 🟠 Major: Seasonality case still needs a non-seasonal counterfactual.
This fixture can pass even if seasonality="day_of_week" is ignored, so it doesn't reliably validate the feature. Add a paired run without seasonality that is expected to fail (or tighten the fixture so seasonal and non-seasonal outcomes must diverge).
Suggested test-shape update:
```diff
     test_result = dbt_project.test(
         test_id,
         DBT_TEST_NAME,
         test_args_seasonal,
         data=data,
         test_column="amount",
     )
     assert test_result["status"] == "pass"
+
+    # Control run without seasonality should fail once fixture is tightened
+    test_args_nonseasonal = {
+        **DBT_TEST_ARGS,
+        "anomaly_sensitivity": 3,
+        "training_period": {"period": "day", "count": 60},
+    }
+    control_result = dbt_project.test(
+        test_id,
+        DBT_TEST_NAME,
+        test_args_nonseasonal,
+        test_column="amount",
+        test_vars={"force_metrics_backfill": True},
+    )
+    assert control_result["status"] == "fail"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@integration_tests/tests/test_column_value_anomalies.py` around lines 312-368: the test test_column_value_anomalies_with_seasonality currently only asserts that the seasonal run passes, which would also pass if seasonality were ignored. Add a paired non-seasonal run that is expected to fail. Create test_args_nonseasonal by copying test_args_seasonal but removing "seasonality" (or setting it to None/""), keeping the same anomaly_sensitivity and training_period; call dbt_project.test(test_id, DBT_TEST_NAME, test_args_nonseasonal, data=data, test_column="amount") and assert that its result["status"] == "fail". This ensures the code must respect "seasonality" for the first run to pass (references: test_column_value_anomalies_with_seasonality, test_args_seasonal, dbt_project.test, DBT_TEST_NAME).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@integration_tests/tests/test_column_value_anomalies.py`:
- Around lines 371-426: The test test_column_value_anomalies_with_training_period currently only asserts a pass for training_period count=7 and doesn't prove windowing. Add a contrasting control run: call dbt_project.test again with either no training_period or a much longer one (e.g., {"period": "day", "count": 999}, so the baseline uses the old low values), and assert that this control run returns "fail". Ensure the data balance still forces divergent outcomes (keep the 30 days of low 95/100/105 and 7 days of high 490/500/510 as-is, or increase the separation) so that the short training_period run (test_args with anomaly_sensitivity and training_period) passes while the control run with the default/long window deterministically fails. Use the same test_id, DBT_TEST_NAME, test_column, and data, and name the second result variable (e.g., control_result) to assert its status == "fail".
Summary
Adds a new elementary.column_value_anomalies dbt test that performs row-level outlier detection on raw column values, using z-scores against the historical distribution. This is fundamentally different from the existing column_anomalies test, which operates on aggregated metrics (min, max, avg, null_count) per time bucket.
How it works:
- Rows are split into training and detection windows by timestamp_column
- Training values are partitioned by day of week when seasonality is configured
- Each detection value is scored as (value - mean) / stddev
- Rows whose score exceeds the anomaly_sensitivity threshold are flagged
New files:
- macros/edr/tests/test_column_value_anomalies.sql: the test macro
- integration_tests/tests/test_column_value_anomalies.py: integration tests (TDD-style, written first)
The test reuses the existing get_anomalies_test_configuration, get_read_anomaly_scores_query, store_anomaly_test_results, and get_anomaly_query infrastructure. It skips the intermediate "metrics" table and directly produces an "anomaly_scores" table, since there are no bucketed aggregate metrics.
Related: docs PR at elementary-data/elementary#2183.
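The flow described above can be sketched as plain Python. This is an illustrative model only: the real implementation is the SQL macro, and the function signature here is invented for the sketch.

```python
# Illustrative model of the test's flow: window split, optional day-of-week
# partitioning, z-scoring, direction filtering. Not the macro's actual SQL.
from datetime import date
from statistics import mean, stdev

def column_value_anomalies(rows, detection_start, sensitivity=3,
                           direction="both", seasonality=None):
    """rows: list of (timestamp, value). Returns flagged detection rows."""
    training = [(t, v) for t, v in rows if t < detection_start]
    detection = [(t, v) for t, v in rows if t >= detection_start]
    flagged = []
    for ts, value in detection:
        baseline = [v for t, v in training
                    if seasonality != "day_of_week" or t.weekday() == ts.weekday()]
        mu, sigma = mean(baseline), stdev(baseline)
        z = (value - mu) / sigma if sigma else 0.0
        is_spike = direction in ("spike", "both") and z > sensitivity
        is_drop = direction in ("drop", "both") and z < -sensitivity
        if is_spike or is_drop:
            flagged.append((ts, value, round(z, 2)))
    return flagged

# 28 days of stable values, then one extreme reading on day 29.
rows = [(date(2024, 1, day), [100, 102, 98][day % 3]) for day in range(1, 29)]
rows.append((date(2024, 1, 29), 500))
print(column_value_anomalies(rows, detection_start=date(2024, 1, 29)))
```

The extreme day-29 reading is flagged against the stable baseline, while a direction="drop" run over the same data would flag nothing, mirroring the anomaly_direction behavior discussed in the review.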
Review & Testing Checklist for Human
- get_read_anomaly_scores_query: The reused anomaly query expects bucket-based data. Setting bucket_start = bucket_end = row_timestamp is a novel pattern that may interact unexpectedly with the backfill_days filter (bucket_end > max_bucket_end - backfill_days) and the buckets CTE join. Verify this works correctly on an actual database.
- store_metrics_table_in_cache call: Unlike all other anomaly tests, this test skips the metrics persistence step. Confirm this doesn't break the on_run_end hook or Elementary Cloud ingestion.
- time_bucket hardcoded to {"period": "day", "count": 1}: This satisfies validate_mandatory_configuration but leaks into metric_properties and the test configuration dict even though it's unused. Consider whether to make time_bucket non-mandatory for this test type.
Notes
- (Review fixes applied: uses get_column_obj_and_monitors, adds row_idx to the surrogate key, uses column_obj.quoted.)
- get_column_obj_and_monitors is called with ["column_value"] as the monitor list. This isn't a real monitor type in Elementary's registry, so the returned monitors list will be empty, but the column object lookup still works correctly since it's based on column name matching, not monitor filtering.
Link to Devin session: https://app.devin.ai/sessions/97ebe636d57244fb82a7452e1521604e
Requested by: @arbiv