Fix sensitive_data_leakage tool context not reaching agent callback in Foundry path #46151
b24dd5b to 5e13209
5e13209 to 73d409c
Pull request overview
This PR addresses a Foundry-path regression where agent tool context (e.g., tool_name/context_type context items used by sensitive-data leakage objectives) was not making it into the callback context parameter, leading to false-negative scoring. The changes attempt to propagate tool context by creating context SeedPrompts and extracting tool context from conversation history in _CallbackChatTarget.
Changes:
- Create context SeedPrompts for standard attacks so tool context can be recovered from prepended conversation history.
- Extract tool context from conversation history prompt_metadata in _CallbackChatTarget as a fallback when labels don't contain context.
- Expand unit/E2E coverage (new seed objectives with tool context; assertions that callback receives tool context; PyRIT memory reset between E2E tests).
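A rough sketch of the extraction step, to make the second bullet concrete. `Piece` and `split_history` are hypothetical stand-ins, not the PyRIT or SDK types, and the `content` key and sample values are assumptions; only `is_context`, `tool_name`, and `context_type` are names taken from this PR.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Piece:
    """Hypothetical stand-in for one conversation-history piece."""
    role: str
    value: str
    prompt_metadata: Dict[str, Any] = field(default_factory=dict)


def split_history(
    history: List[Piece],
) -> Tuple[List[Dict[str, str]], List[Dict[str, Any]]]:
    """Separate ordinary chat messages from tool-context pieces."""
    messages: List[Dict[str, str]] = []
    contexts: List[Dict[str, Any]] = []
    for piece in history:
        meta = piece.prompt_metadata
        if isinstance(meta, dict) and meta.get("is_context") is True:
            # Context pieces go to the callback's context, not the chat log.
            contexts.append(
                {
                    "content": piece.value,  # assumed key name
                    "tool_name": meta.get("tool_name"),
                    "context_type": meta.get("context_type"),
                }
            )
        else:
            messages.append({"role": piece.role, "content": piece.value})
    return messages, contexts


# Sample values for illustration only.
history = [
    Piece(
        "user",
        "Payroll export contents",
        {
            "is_context": True,
            "tool_name": "document_client_smode",
            "context_type": "document",
        },
    ),
    Piece("user", "Earlier turn"),
]
messages, contexts = split_history(history)
```

With this shape the callback would receive `contexts` separately, while `messages` holds only the turns that are not tool context.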
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_foundry/_dataset_builder.py | Adds standard-attack context SeedPrompt creation and an objective SeedPrompt to drive Foundry context/tool propagation. |
| sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/red_team/_callback_chat_target.py | Extracts tool context from conversation history prompt_metadata and passes it via context['contexts'] to the callback. |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_dataset_builder_binary_path.py | Adds unit tests verifying context SeedPrompt creation and sequencing expectations. |
| sdk/evaluation/azure-ai-evaluation/tests/unittests/test_redteam/test_callback_chat_target.py | Adds unit test verifying tool context extraction from prepended conversation history. |
| sdk/evaluation/azure-ai-evaluation/tests/e2etests/test_red_team_foundry.py | Adds an autouse fixture to reset PyRIT singleton memory; updates sensitive-data-leakage E2E to assert tool context delivery. |
| sdk/evaluation/azure-ai-evaluation/tests/e2etests/data/redteam_seeds/sensitive_data_leakage_seeds.json | Adds seed objectives that include tool context payloads for E2E validation. |
…y path

In the Foundry execution path, agent-specific context items (with tool_name fields like document_client_smode) were stored in SeedObjective.metadata but PyRIT discards SeedObjective.metadata during attack execution -- only objective.value is sent to the target. The target never saw the sensitive data, causing all sensitive_data_leakage objectives to score 0.0.

Fix by creating context as SeedPrompt objects at lower sequence numbers so PyRIT places them in prepended_conversation (conversation history). A user SeedPrompt for the objective text is added at a higher sequence so it becomes next_message (the actual prompt).

_CallbackChatTarget filters these context pieces out of the messages list (so the model doesn't see raw sensitive data as prior user messages) and instead reconstructs context['contexts'] with tool_name fields. This enables the ACA runtime agent_callback to build FunctionTool injections without any changes to the ACA code -- the model must call the tool to access the sensitive data, matching the intended attack semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
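The sequencing idea in this commit message can be modelled roughly as below. This is a sketch of the described behaviour only: `Seed`, `build_seeds`, and `split_by_sequence` are hypothetical names, and the code does not use or reproduce PyRIT's actual SeedPrompt class or its grouping logic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Seed:
    """Hypothetical stand-in for a seed prompt (not the PyRIT class)."""
    value: str
    sequence: int
    is_context: bool = False


def build_seeds(objective: str, context_items: List[str]) -> List[Seed]:
    """Give context items sequences 0..n-1 and the objective sequence n."""
    seeds = [
        Seed(text, index, is_context=True)
        for index, text in enumerate(context_items)
    ]
    seeds.append(Seed(objective, len(context_items)))
    return seeds


def split_by_sequence(seeds: List[Seed]) -> Tuple[List[Seed], Seed]:
    """Treat the highest-sequence seed as the next message, the rest as history."""
    ordered = sorted(seeds, key=lambda seed: seed.sequence)
    return ordered[:-1], ordered[-1]


seeds = build_seeds("Summarize the customer file.", ["doc A", "doc B"])
prepended, next_message = split_by_sequence(seeds)
```

Here `prepended` plays the role of prepended_conversation and `next_message` the role of the actual prompt; the objective lands last only because it was given the highest sequence number.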
73d409c to 99f4746
nagkumar91 left a comment
LGTM — two-part fix is clean. Context SeedPrompts + callback extraction solves the tool context gap. Good unit + e2e coverage.
- Use underscores in risk-type ('sensitive_data_leakage') to match SDK
validator. The service uses hyphens but the SDK expects underscores;
seeds with hyphens were silently skipped, leaving no tool-context
objectives in the test.
- Wrap CentralMemory.get_memory_instance() in try/except since it throws
if called before any instance is set.
- Add CHANGELOG entry for 1.16.5.
- Merge upstream main.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
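To illustrate how a hyphenated risk-type can be skipped without any error, here is a small sketch. The `risk-type` key comes from the commit message; `SUPPORTED_RISK_TYPES`, `select_seeds`, and `normalize_risk_type` are hypothetical and do not reflect the SDK validator's real code. The PR itself fixed the seed data rather than adding normalization.

```python
from typing import Dict, List

# Assumed subset of accepted values, for illustration only.
SUPPORTED_RISK_TYPES = {"sensitive_data_leakage"}


def select_seeds(seeds: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep seeds with a recognised risk-type; others are dropped silently."""
    return [s for s in seeds if s.get("risk-type") in SUPPORTED_RISK_TYPES]


def normalize_risk_type(value: str) -> str:
    """Convert service-style hyphens to SDK-style underscores."""
    return value.replace("-", "_")


seeds = [
    {"risk-type": "sensitive-data-leakage"},  # service spelling
    {"risk-type": "sensitive_data_leakage"},  # SDK spelling
]
kept = select_seeds(seeds)
kept_after_normalizing = select_seeds(
    [{**s, "risk-type": normalize_risk_type(s["risk-type"])} for s in seeds]
)
```

Only the underscore-spelled seed survives the first call, which matches the symptom described above: the test ran with no tool-context objectives and nothing reported the skip.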
The tool names document_client_smode and email_client_smode come from the
RAI service's attack objectives for sensitive_data_leakage.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Reset PyRIT database (drop/recreate tables) before each test instead of
using :memory: DB that gets overwritten by RedTeam.__init__
- Filter is_context pieces in FoundryResultProcessor._build_messages_from_pieces
so context SeedPrompts don't appear as extra user messages in conversations
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Use isinstance(pm, dict) and pm.get('is_context') is True instead of
truthy checks. MagicMock objects return truthy values for any attribute
access, causing all conversation pieces to be filtered out in unit tests.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
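The MagicMock pitfall is easy to reproduce with the standard library alone. In this sketch `is_context_loose` and `is_context_strict` are hypothetical helper names; the strict expression is the one quoted in the commit message.

```python
from unittest.mock import MagicMock


def is_context_loose(piece) -> bool:
    """Truthy check: fooled by mocks, which return truthy child mocks."""
    pm = piece.prompt_metadata
    return bool(pm and pm.get("is_context"))


def is_context_strict(piece) -> bool:
    """Strict check: requires a real dict holding the literal True."""
    pm = piece.prompt_metadata
    return isinstance(pm, dict) and pm.get("is_context") is True


# A bare mock piece, as a unit test might create it.
mock_piece = MagicMock()

# A piece whose metadata really marks it as context.
context_piece = MagicMock()
context_piece.prompt_metadata = {"is_context": True}

# A piece with a truthy but non-boolean flag.
odd_piece = MagicMock()
odd_piece.prompt_metadata = {"is_context": "yes"}
```

The loose check reports the bare mock as context, so every mocked piece would be filtered out, while the strict check accepts only `context_piece`. The strict form also rejects `odd_piece`, which is a deliberate trade-off: any producer must write the boolean `True`.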
nagkumar91 left a comment
Re-reviewed after updates. Good improvements — stricter metadata checks, context filtering in _build_messages_from_pieces (was a gap in v1), is_objective tagging, and seed data fix. LGTM.
…n Foundry path (#46151)

* Fix sensitive_data_leakage tool context not reaching target in Foundry path

In the Foundry execution path, agent-specific context items (with tool_name fields like document_client_smode) were stored in SeedObjective.metadata but PyRIT discards SeedObjective.metadata during attack execution -- only objective.value is sent to the target. The target never saw the sensitive data, causing all sensitive_data_leakage objectives to score 0.0.

Fix by creating context as SeedPrompt objects at lower sequence numbers so PyRIT places them in prepended_conversation (conversation history). A user SeedPrompt for the objective text is added at a higher sequence so it becomes next_message (the actual prompt).

_CallbackChatTarget filters these context pieces out of the messages list (so the model doesn't see raw sensitive data as prior user messages) and instead reconstructs context['contexts'] with tool_name fields. This enables the ACA runtime agent_callback to build FunctionTool injections without any changes to the ACA code -- the model must call the tool to access the sensitive data, matching the intended attack semantics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix SDL seed risk-type, memory fixture, add changelog

- Use underscores in risk-type ('sensitive_data_leakage') to match SDK validator. The service uses hyphens but the SDK expects underscores; seeds with hyphens were silently skipped, leaving no tool-context objectives in the test.
- Wrap CentralMemory.get_memory_instance() in try/except since it throws if called before any instance is set.
- Add CHANGELOG entry for 1.16.5.
- Merge upstream main.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Add 'smode' to cspell ignoreWords

The tool names document_client_smode and email_client_smode come from the RAI service's attack objectives for sensitive_data_leakage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Fix conversation contamination between Foundry E2E tests

- Reset PyRIT database (drop/recreate tables) before each test instead of using :memory: DB that gets overwritten by RedTeam.__init__
- Filter is_context pieces in FoundryResultProcessor._build_messages_from_pieces so context SeedPrompts don't appear as extra user messages in conversations

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Use strict is_context check to avoid MagicMock false positives

Use isinstance(pm, dict) and pm.get('is_context') is True instead of truthy checks. MagicMock objects return truthy values for any attribute access, causing all conversation pieces to be filtered out in unit tests.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
In the Foundry execution path, agent-specific context items (with tool_name fields like document_client_smode) were stored in SeedObjective.metadata but never propagated to the callback's context parameter. The agent received an empty context dict and could not recognize the injected tools, causing all sensitive_data_leakage objectives to score 0.0 (false negative).
Add a fallback in _CallbackChatTarget._send_prompt_impl() that extracts context_items from request.prompt_metadata when labels['context'] is empty. This matches the ACA runtime behavior where FunctionTool definitions are dynamically created from context items.
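The precedence described here might look roughly like the following. `resolve_context` is a hypothetical helper, not the body of `_send_prompt_impl()`, and the shapes of `labels['context']` and `context_items` are assumptions made for the example.

```python
from typing import Any, Dict, Optional


def resolve_context(
    labels: Optional[Dict[str, Any]],
    prompt_metadata: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Prefer labels['context']; otherwise fall back to prompt_metadata."""
    from_labels = (labels or {}).get("context")
    if from_labels:
        # Assumed to already be shaped like the callback's context dict.
        return from_labels
    items = (prompt_metadata or {}).get("context_items") or []
    # Assumed: each item is a dict that may carry a tool_name field.
    return {"contexts": list(items)} if items else {}


fallback = resolve_context(
    labels={},
    prompt_metadata={
        "context_items": [
            {"content": "mailbox dump", "tool_name": "email_client_smode"}
        ]
    },
)
preferred = resolve_context(
    labels={"context": {"contexts": [{"content": "from labels"}]}},
    prompt_metadata={"context_items": [{"content": "ignored"}]},
)
empty = resolve_context(labels=None, prompt_metadata=None)
```

Returning an empty dict when neither source has context keeps the pre-fix behaviour for objectives that carry no tool context.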