fix: flush OTEL spans between sequential studies and fix transform defaults by rdheekonda · Pull Request #33 · dreadnode/capabilities

rdheekonda · 2026-06-03T22:42:21Z

Summary

Two bugs fixed in the AI Red Teaming capability.

Bug 1: OTEL span loss in multi-study workflows

When running N+1 transform comparisons, multi-attack campaigns, or category sweeps, only the first study's traces reliably appeared on the platform. The BatchSpanProcessor buffers spans and exports them from a background thread, so later studies' spans were still in the buffer when dn.shutdown() raced against process teardown.

Fix: Added explicit force_flush(timeout_millis=10_000) between sequential assessment.run() calls in all 3 multi-study templates:

_TRANSFORM_STUDY_TEMPLATE (N+1 transform comparisons)
_CAMPAIGN_ATTACK_BLOCK (multi-attack campaigns)
Category attack template (multi-goal sweeps)
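
For reviewers, the flush-between-runs pattern can be sketched roughly as below. This is a minimal illustration, not the code in `scripts/attack_runner.py`: `run_studies_sequentially`, `Flushable`, and the idea of passing studies as plain callables are hypothetical names chosen for the example, and how the templates actually obtain the tracer provider is not shown in this PR. The only thing assumed about the provider is a `force_flush(timeout_millis)` method returning a bool, which is the shape the OpenTelemetry Python SDK's `TracerProvider` exposes.

```python
from typing import Callable, List, Protocol, Sequence


class Flushable(Protocol):
    """Any object with an OpenTelemetry-style force_flush method."""

    def force_flush(self, timeout_millis: int = 30_000) -> bool: ...


def run_studies_sequentially(
    studies: Sequence[Callable[[], object]],
    provider: Flushable,
    flush_timeout_millis: int = 10_000,
) -> List[object]:
    """Run each study in order and drain buffered spans after every run.

    `studies` stands in for the sequential assessment.run() calls
    (hypothetical shape; the real calls may be async).
    """
    results: List[object] = []
    for run_study in studies:
        results.append(run_study())
        # Block until queued spans are exported or the timeout elapses, so
        # this study's spans are not left in the buffer at process teardown.
        flushed = provider.force_flush(flush_timeout_millis)
        if not flushed:
            print("warning: span flush timed out; some traces may be missing")
    return results
```

Flushing after every study, rather than once at the end, keeps each study's export independent of how the final shutdown behaves.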

Bug 2: Unwanted baseline run with transforms

When a user said "run crescendo with Telugu", the agent forced compare_transforms=true, creating an N+1 study with an unrequested baseline run. The user only wanted the transform applied.

Fix: Changed the default to compare_transforms=false. Transforms are applied directly, with no baseline added. The N+1 comparison only triggers when the user explicitly asks to "compare" or "benchmark" transforms.
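
The intended behavior can be illustrated with a small sketch. Note that the real change lives in the agent prompt (`agents/ai-red-teaming-agent.md`), not in Python: `wants_comparison` and `plan_studies` are hypothetical helpers, the keyword check is a simplification of what the agent decides, and the example assumes that "N+1" means one baseline run plus one run per transform.

```python
from typing import List


def wants_comparison(request: str) -> bool:
    """Simplified stand-in for the agent's decision: opt in only on explicit wording."""
    text = request.lower()
    return "compare" in text or "benchmark" in text


def plan_studies(transforms: List[str], compare_transforms: bool = False) -> List[List[str]]:
    """Return the transform list for each study that would be run.

    An empty inner list denotes a baseline run with no transforms
    (assumed meaning of the extra "+1" study).
    """
    if not transforms:
        return [[]]
    if compare_transforms:
        # N+1 comparison: baseline first, then one study per transform.
        return [[]] + [[t] for t in transforms]
    # New default: apply the requested transforms directly, no baseline.
    return [list(transforms)]
```

With these helpers, `plan_studies(["adapt_language(Telugu)"], wants_comparison("run crescendo with Telugu"))` plans a single study with the transform applied, while a request containing "compare" or "benchmark" plans the baseline as well.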

Version bump: 1.3.2 → 1.3.5

Files changed (3 files, +40 / -4)

| File | Change |
| --- | --- |
| `agents/ai-red-teaming-agent.md` | Transform default: `compare_transforms=false` unless user asks to compare |
| `scripts/attack_runner.py` | OTEL `force_flush()` between sequential studies in 3 templates |
| `capability.yaml` | Version bump 1.3.2 → 1.3.5 |

Testing

Reproduced: Ran Crescendo N+1 with adapt_language(Telugu); only the baseline appeared on the platform
After fix: Re-ran with the OTEL flush; both studies exported successfully
Verified generated workflows include the flush block


rdheekonda merged commit d66ab2d into main Jun 3, 2026
5 checks passed

rdheekonda merged 1 commit into main from fix/otel-flush-and-transform-defaults
