Skip to content

fix: include all failing dimension values in anomaly alert description#1032

Open
joostboon wants to merge 2 commits into
masterfrom
fix/dimension-anomaly-alert-description
Open

fix: include all failing dimension values in anomaly alert description#1032
joostboon wants to merge 2 commits into
masterfrom
fix/dimension-anomaly-alert-description

Conversation

@joostboon

@joostboon joostboon commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Anomaly alerts for dimension-based tests (e.g. column_anomalies with dimensions) previously showed only a single arbitrary dimension value in the result message
  • Now the alert lists all failing dimension values with their metric value and average, truncated to 5 with an "and N more" suffix when there are many failures
  • Example new format: In column INCIDENTS_OPENED, 2 anomalous max values for dimension account_name: Joejuice (214.0, avg 21.0), Acme (15.0, avg 6.3).
  • Non-dimension tests are unaffected (fall back to existing single-row description)

Test plan

  • Added assertions to test_column_anomalies_group_by to verify the description format for 1 and 2 failing dimension values
  • Verified locally against a real column_anomalies test with dimensions -- alert now shows all failing dimension values

before and after screenshots:
Screenshot 2026-06-28 at 14 14 54
Screenshot 2026-06-28 at 14 14 40

joostboon and others added 2 commits June 28, 2026 13:22
Previously, dimension-based anomaly tests only showed a single (arbitrary)
dimension value in the alert description. Now the description lists all
failing dimension values with their metric and average values, truncated
to 5 with an "and N more" suffix when there are many failures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…by tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown
Contributor

👋 @joostboon
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@coderabbitai

coderabbitai Bot commented Jun 28, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The SQL macro get_anomaly_test_result_row is updated to collect anomalous rows and render a dimension-aware test_results_description listing up to 5 anomalous dimension values with metric details. Integration tests add assertions verifying the new description format for grouped anomaly scenarios.

Changes

Dimension-aware anomaly description

Layer / File(s) Summary
Anomalous row collection and description rendering
macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql
Adds preprocessing to filter anomaly_scores_rows into filtered_anomaly_scores_rows and anomalous_rows, increments failures.data per anomalous row, and replaces the flat anomaly_description fallback with a conditional block that, when a dimension exists and anomalous rows are present, formats a description with up to 5 dimension entries showing dimension_value, rounded metric_value, and rounded training_avg, plus an "and N more" suffix.
Integration test assertions
integration_tests/tests/test_column_anomalies.py
Adds assertIn checks on test_results_description in test_column_anomalies_group_by for both sensitivity scenarios, verifying anomalous count ("1 anomalous"/"2 anomalous"), the "for dimension dimension" phrase, and expected dimension values ("dim1", "dim2").

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

A rabbit checks each dimension with care,
"dim1 and dim2—anomalies found there!"
The description now lists them, up to five,
With metric and average, keeping count alive.
🐇 Hop along, the tests all pass!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the main change: expanding anomaly alert descriptions to include all failing dimension values.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/dimension-anomaly-alert-description

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
integration_tests/tests/test_column_anomalies.py (1)

320-323: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add one case that exercises the max_shown / "and N more" branch.

These assertions only cover 1- and 2-dimension failures, but the new formatter has separate behavior once there are more than 5 anomalous dimensions. A >5 case would keep the truncation and suffix logic from regressing silently.

Also applies to: 343-347

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@integration_tests/tests/test_column_anomalies.py` around lines 320 - 323, The
current assertions in the anomaly description tests only cover the
small-dimension cases, so add a new test case in test_column_anomalies that
drives the formatter past the max_shown threshold and verifies the “and N more”
suffix. Use the existing description checks around
test_result["test_results_description"] to assert the truncated output and the
extra-count branch, ensuring the formatter logic in the test results description
path does not regress.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql`:
- Around line 79-107: The summary in store_anomaly_test_results.sql is counting
and listing duplicate dimension_value entries from anomalous_rows, which can
overstate failures for a single dimension. Update the test_results_description
construction to deduplicate anomalous_rows by dimension_value before generating
dim_parts and total, keeping the latest or highest-priority row per value. Then
apply the existing max_shown cap to the unique list so the reported count and
examples reflect distinct failing dimensions.

---

Nitpick comments:
In `@integration_tests/tests/test_column_anomalies.py`:
- Around line 320-323: The current assertions in the anomaly description tests
only cover the small-dimension cases, so add a new test case in
test_column_anomalies that drives the formatter past the max_shown threshold and
verifies the “and N more” suffix. Use the existing description checks around
test_result["test_results_description"] to assert the truncated output and the
extra-count branch, ensuring the formatter logic in the test results description
path does not regress.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 64f4da88-9811-415a-88a8-6e8bd30533e4

📥 Commits

Reviewing files that changed from the base of the PR and between 244f9db and df48e27.

📒 Files selected for processing (2)
  • integration_tests/tests/test_column_anomalies.py
  • macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql

Comment on lines +79 to +107
{% set anomalous_rows = [] %}
{% for row in anomaly_scores_rows %}
{% if row.anomaly_score is not none %}
{% do filtered_anomaly_scores_rows.append(row) %}
{% if row.is_anomalous %}
{% set failures.data = failures.data + 1 %}
{% do anomalous_rows.append(row) %}
{% endif %}
{% endif %}
{% endfor %}
{% set test_results_description %}
{% if rows_with_score %}
{% set first_scored_row = rows_with_score[0] %}
{% set dimension = elementary.insensitive_get_dict_value(first_scored_row, 'dimension') %}
{% if dimension and anomalous_rows %}
{% set max_shown = 5 %}
{% set shown_rows = anomalous_rows[:max_shown] %}
{% set dim_parts = [] %}
{% for row in shown_rows %}
{% set dim_val = elementary.insensitive_get_dict_value(row, 'dimension_value') %}
{% set dim_val_str = dim_val if dim_val is not none else 'NULL' %}
{% set m_val = elementary.insensitive_get_dict_value(row, 'metric_value') %}
{% set t_val = elementary.insensitive_get_dict_value(row, 'training_avg') %}
{% set m_str = (m_val | float | round(3)) if m_val is not none else 'N/A' %}
{% set t_str = (t_val | float | round(3)) if t_val is not none else 'N/A' %}
{% do dim_parts.append(dim_val_str ~ " (" ~ m_str ~ ", avg " ~ t_str ~ ")") %}
{% endfor %}
{% set total = anomalous_rows | length %}
{% if column_name %}In column {{ column_name | upper }}, {% endif %}{{ total }} anomalous {{ metric_name }} value{% if total > 1 %}s{% endif %} for dimension {{ dimension }}: {{ dim_parts | join(", ") }}{% if total > max_shown %}, and {{ total - max_shown }} more{% endif %}.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Deduplicate dimension_value before building the summary.

anomalous_rows collects every anomalous scored row in the (full_table_name, column_name, metric_name) group. Because those rows are bucket-based, the same dimension can appear multiple times here, so total and dim_parts can repeat a single failing dimension and overstate the “N anomalous … values” count. Build the description from unique dimension_values first (keeping the latest or highest-priority row per value), then calculate total and apply the 5-item cap.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@macros/edr/data_monitoring/anomaly_detection/store_anomaly_test_results.sql`
around lines 79 - 107, The summary in store_anomaly_test_results.sql is counting
and listing duplicate dimension_value entries from anomalous_rows, which can
overstate failures for a single dimension. Update the test_results_description
construction to deduplicate anomalous_rows by dimension_value before generating
dim_parts and total, keeping the latest or highest-priority row per value. Then
apply the existing max_shown cap to the unique list so the reported count and
examples reflect distinct failing dimensions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant