Azure AI Evaluation's `RedTeam.scan()` method decodes **all** encoded attack prompts when storing the result files

- **Package Name**: azure-ai-evaluation
- **Package Version**: 1.16.5
- **Operating System**: macOS
- **Python Version**: Python 3.12.13

**Describe the bug**
Azure AI Evaluation's `RedTeam.scan()` method decodes **all** encoded attack prompts before storing them in `evaluation_results.json` and `results.json`, regardless of encoding strategy (flip, base64, morse, etc.). This makes it impossible to verify what the target agent actually received. The `attack_technique` metadata is correctly set, but the conversation payloads in `attack_details[].conversation` are decoded back to their original form, losing fidelity regarding the actual attack surface.

**Screenshot**
Prompts captured at the target (entries 3 and 4 are flip-encoded):

<img width="1065" height="162" alt="Image" src="https://github.com/user-attachments/assets/fd5bdb96-0039-4d00-8986-8ed015e18d7e" />

Corresponding `evaluation_results.json`:

<img width="1444" height="972" alt="Image" src="https://github.com/user-attachments/assets/aedf1bdc-7425-486d-93f1-fde8291f8521" />

**To Reproduce**
Steps to reproduce the behavior:
1. Create a PyRIT `RedTeam` scan targeting any remote agent/endpoint with multiple encoding-based attack strategies (e.g., `[AttackStrategy.Flip, AttackStrategy.Base64, AttackStrategy.Morse]`)
2. Implement a custom target callback that logs/captures the raw prompts it receives before processing
3. Run the scan: `await red_team.scan(target=callback, attack_strategies=[AttackStrategy.Flip, ...])`
4. Save the raw received prompts to a reference file (e.g., JSONL)
5. Compare the raw prompts in your reference file with `attack_details[].conversation[].content` in `evaluation_results.json`
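
Steps 2 and 4 can be sketched roughly as below. This is a minimal illustration, not the exact code used for the screenshots: it assumes the simple string-in, string-out callback form, and `make_capturing_callback`, `load_captured`, the file name, and the canned reply are all hypothetical.

```python
import json
from pathlib import Path


def make_capturing_callback(capture_path):
    """Build a target callback that appends every raw prompt to a JSONL file.

    Assumption: the target is called with the prompt as a plain string and
    returns the response as a plain string.
    """
    path = Path(capture_path)

    def callback(query: str) -> str:
        # Record exactly what the target received, before any processing.
        with path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps({"prompt": query}) + "\n")
        # Placeholder reply; a real target would forward the prompt to the agent.
        return "I cannot help with that."

    return callback


def load_captured(capture_path):
    """Read the captured prompts back, in the order they were received."""
    with Path(capture_path).open(encoding="utf-8") as fh:
        return [json.loads(line)["prompt"] for line in fh if line.strip()]
```

The returned callback would be passed as `target=` in the `red_team.scan(...)` call from step 3, and the JSONL file then serves as the reference for step 5.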

**Expected behavior**
- If a flip-encoded prompt was sent to the target: `"...edocne txet si siht"`, it should appear in evaluation results as flip-encoded
- If a base64-encoded prompt was sent: `"...aW52YWxpZCBiYXNlNjQ="`, it should appear as base64-encoded
- If a morse-encoded prompt was sent: `".... . .-.. .-.. ---"`, it should appear in morse
- The `attack_details[].conversation` should contain the **exact payloads** that were sent to the target
- Attack technique metadata should correlate with the actual prompt encoding in the conversation
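
To make the expected payload shapes concrete, here is a small standard-library sketch of the three encodings. These helpers are illustrative stand-ins only; the converters actually used by the scan may differ in details such as separators or added instructions.

```python
import base64

# International Morse code for A-Z (illustrative subset: letters only).
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}


def flip(text: str) -> str:
    """Reverse the whole string character by character."""
    return text[::-1]


def to_base64(text: str) -> str:
    """Base64-encode the UTF-8 bytes of the text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def to_morse(text: str) -> str:
    """Encode letters as Morse; letters are space-separated, words use ' / '."""
    words = text.upper().split()
    return " / ".join(
        " ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words
    )
```

With these stand-ins, `to_morse("hello")` gives `.... . .-.. .-.. ---` and `to_base64("invalid base64")` gives `aW52YWxpZCBiYXNlNjQ=`, matching the shapes listed above; strings of this kind are what the stored conversation is expected to contain.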

**Current behavior**
- **All encoding strategies** (flip, base64, morse, etc.) are decoded back to plaintext in `evaluation_results.json`
- The raw encoded prompts appear correctly when logged at the target callback level
- The same prompts appear **decoded to plaintext** in `attack_details[].conversation` of `evaluation_results.json`
- The `attack_technique` labels are correct (flip, base64, morse), but the conversation content doesn't match the encoding
- This creates an audit/transparency gap: you cannot verify which variant the agent actually saw
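
The mismatch can be checked mechanically along these lines. This is a sketch under assumptions: it takes the `attack_details[].conversation[].content` layout described above, assumes each conversation turn carries a `role` field with `"user"` marking attack prompts, and assumes `attack_technique` sits on each `attack_details` entry. The helper names are hypothetical.

```python
def stored_user_prompts(results: dict) -> list:
    """Collect (attack_technique, content) for every user turn in the results.

    Assumptions: turns have a "role" key, and "attack_technique" is a key of
    each attack_details entry.
    """
    pairs = []
    for detail in results.get("attack_details", []):
        technique = detail.get("attack_technique", "unknown")
        for turn in detail.get("conversation", []):
            if turn.get("role") == "user":
                pairs.append((technique, turn.get("content", "")))
    return pairs


def missing_from_results(captured: list, results: dict) -> list:
    """Return captured prompts that do not appear verbatim in the results."""
    stored = {content for _, content in stored_user_prompts(results)}
    return [prompt for prompt in captured if prompt not in stored]
```

With the behavior reported here, every encoded prompt captured at the target would be returned by `missing_from_results`, while baseline prompts would not.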

**Impact**
- Impossible to audit/verify the attack surface post-scan
- Cannot debug why specific encoding variants succeeded/failed
- Cannot correlate target responses to the exact encoded prompts they received
- Breaks reproducibility of attack attempts (the encoding named in the metadata doesn't match the prompt stored in the results)

**Additional context**
- Issue occurs across all encoding attack strategies, not limited to specific ones
- Baseline (non-encoded) prompts remain unchanged (as expected)
- Only encoded prompts are affected; all of them are normalized back to plaintext before storage in the results JSON

