Skip to content

Conversation

@ValbuenaVC
Copy link
Contributor

@ValbuenaVC ValbuenaVC commented Nov 10, 2025

Description

Adds a cybersecurity harms scenario to pyrit called the CyberScenario, which tests a model's willingness to generate malware via single-turn or multi-turn (red teaming) attack methods. Changes listed below:

  • Added CyberScenario and CyberStrategy classes
  • Added generic malware-oriented prompts to induce cyber harms as seed prompts
  • Added true/false scoring YAML for malware-oriented prompts
  • Added composite scoring logic in CyberScenario
  • Fixed minor typo in grounded.yaml
  • Added unit tests for CyberScenario

This PR is meant to be a starting point for additional cybersecurity harm scaffolding as there are still many places CyberScenario can be expanded on.

Tests and Documentation

Unit tests focus on initialization, attack generation, execution, and scenario properties, similarly to other scenarios.

@ValbuenaVC ValbuenaVC marked this pull request as ready for review November 12, 2025 00:23
@ValbuenaVC ValbuenaVC changed the title [DRAFT] FEAT: Cyber scenario FEAT: Cyber scenario Nov 12, 2025
@hannahwestra25
Copy link
Contributor

hannahwestra25 commented Nov 12, 2025

This looks good! i'm wondering if there are ways to incorporate like xpia attacks or converters (MaliciousQuestionGeneratorConverter, there might be more just at first glance) to be a bit more creative rather than just updating prompts

@ValbuenaVC
Copy link
Contributor Author

This looks good! i'm wondering if there are ways to incorporate like xpia attacks or converters (MaliciousQuestionGeneratorConverter, there might be more just at first glance) to be a bit more creative rather than just updating prompts

There definitely are! CyberStrategy is used very sparsely here, which I don't like, but I haven't found a way to reconcile the nature of cybersecurity harms (which are often sequential, iterative, and don't rely on conversions as much) with the tag-based system. But it's definitely something I want to drive in a second PR

Copy link
Contributor

@rlundeen2 rlundeen2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I recommend incorporating the changes first but they are small

ValbuenaVC and others added 2 commits November 19, 2025 17:09
Typo

Co-authored-by: hannahwestra25 <hannahwestra@microsoft.com>
@ValbuenaVC ValbuenaVC merged commit f02d3f2 into Azure:main Nov 20, 2025
37 of 38 checks passed
@ValbuenaVC ValbuenaVC deleted the cyber_scenario branch December 4, 2025 23:14
@ValbuenaVC ValbuenaVC restored the cyber_scenario branch December 4, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants