fix(airt): normalize attacks arg in generate_category_attack #35

Merged
rdheekonda merged 3 commits into main from fix/airt-category-attacks-arg-parsing on Jun 4, 2026

@rdheekonda
Contributor

Problem

generate_category_attack's attacks parameter was iterated character-by-character when passed as a bare string, producing cryptic errors like Unknown attack: 't' / Unknown attack: '['. This made the entire category-sweep path effectively unusable — no format (bare string, full SDK name, JSON list) worked.

Root cause: the runner read attacks_raw = params.get("attacks", []) and looped for a in attacks_raw directly, so "tap" became ['t', 'a', 'p']. By contrast, generate_attack correctly does attack_type.split(",").
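
For reviewers, the failure can be reproduced in isolation. The snippet below is a minimal sketch, not the runner's actual code: KNOWN_ATTACKS and find_unknown are hypothetical stand-ins for the registry lookup, and only the params.get("attacks", []) read and the split(",") contrast come from the description above.

```python
# Hypothetical reduction of the bug. KNOWN_ATTACKS and find_unknown are
# illustrative names, not identifiers from scripts/attack_runner.py.
KNOWN_ATTACKS = {"tap", "goat"}


def find_unknown(attacks_raw):
    """Return every item the naive loop would reject as an unknown attack."""
    return [a for a in attacks_raw if a not in KNOWN_ATTACKS]


params = {"attacks": "tap"}
attacks_raw = params.get("attacks", [])

# Iterating a str yields single characters, so each one is rejected.
naive_unknown = find_unknown(attacks_raw)

# Splitting first (as generate_attack does) yields whole names.
split_unknown = find_unknown(attacks_raw.split(","))

print(naive_unknown)  # ['t', 'a', 'p']
print(split_unknown)  # []
```

The same loop applied to a stringified list such as "['tap']" rejects '[' first, which matches the second error signature quoted in the Problem section.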

Fix

  • scripts/attack_runner.py: add _normalize_attack_names() accepting a list (["tap", "goat"]), comma-separated string ("tap,goat"), single name ("tap"), or stringified-list noise ("['tap']"). Return a clear validation error instead of a single-character failure.
  • tools/attacks.py: widen the attacks annotation to list[str] | str.
  • tests/test_attack_runner.py: add TestNormalizeAttackNames regression coverage (10 cases + explicit "must not split to chars" assertion).
  • skills/error-troubleshooting/SKILL.md: document the single-character failure signature.
  • agents/ai-red-teaming-agent.md: add category-tool auto-fallback guidance.
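
The normalization described in the first bullet can be sketched roughly as follows. This is an assumption-laden illustration rather than the merged _normalize_attack_names(): the function name normalize_attack_names, the use of ast.literal_eval for stringified lists, and the exact error messages are all hypothetical choices made here for clarity.

```python
import ast


def normalize_attack_names(raw: object) -> list[str]:
    """Coerce list, comma-separated, or stringified-list input to attack names.

    Hypothetical sketch; the real helper in scripts/attack_runner.py may differ.
    """
    if isinstance(raw, str):
        text = raw.strip()
        if text.startswith("[") and text.endswith("]"):
            # Stringified-list noise such as "['tap']" or '["tap", "goat"]'.
            try:
                items = list(ast.literal_eval(text))
            except (ValueError, SyntaxError):
                # Unquoted form such as "[tap, goat]": strip brackets and split.
                items = text[1:-1].split(",")
        else:
            # Single name ("tap") or comma-separated string ("tap,goat").
            items = text.split(",")
    elif isinstance(raw, (list, tuple)):
        items = list(raw)
    else:
        raise ValueError(
            f"attacks must be a list or string, got {type(raw).__name__}"
        )

    # Trim whitespace and stray quotes, then drop empty entries.
    names = [str(item).strip().strip("'\"") for item in items]
    names = [name for name in names if name]
    if not names:
        raise ValueError("attacks must name at least one attack")
    return names


print(normalize_attack_names("tap,goat"))  # ['tap', 'goat']
print(normalize_attack_names("['tap']"))   # ['tap']
```

Raising ValueError up front is one way to surface a clear validation error; the key property is that a bare string is never iterated character by character.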

Verification

  • py_compile passes on all changed Python files.
  • Normalization logic validated against all parametrized cases.
  • End-to-end: re-ran the previously-failing generate_category_attack(attacks=["tap"], categories=["violence"], …) — it now parses correctly, loads bundled violence goals, and runs a real 3-goal sweep to completion.

The `attacks` parameter was iterated character-by-character when passed
as a bare string, producing cryptic "Unknown attack: 't'" errors and
making the entire category-sweep path unusable.

- Add _normalize_attack_names() to accept list, comma-separated string,
  single name, or stringified-list noise; mirrors generate_attack's
  attack_type.split(",") handling.
- Return a clear validation error instead of a single-character failure.
- Widen the tool annotation to list[str] | str.
- Add TestNormalizeAttackNames regression coverage.
- Document the failure signature in error-troubleshooting skill.
- Add category-tool auto-fallback guidance to the agent instructions.

attack_runner.py does not import typing as t, so the t.Any annotation
tripped ruff F821 (undefined name) in CI. Use the builtin object
annotation, which needs no import.
@rdheekonda rdheekonda merged commit b2803d9 into main Jun 4, 2026
5 checks passed