FEAT: Cyber scenario #1180
…into cyber_scenario Resolving merge conflict
This looks good! I'm wondering if there are ways to incorporate XPIA attacks or converters (MaliciousQuestionGeneratorConverter, and there might be more; that's just at first glance) to be a bit more creative rather than just updating prompts.
There definitely are! CyberStrategy is used very sparsely here, which I don't like, but I haven't found a way to reconcile the nature of cybersecurity harms (which are often sequential, iterative, and don't rely on conversions as much) with the tag-based system. But it's definitely something I want to drive in a second PR.
rlundeen2
left a comment
Looks great! I recommend incorporating the changes first, but they are small.
Typo Co-authored-by: hannahwestra25 <hannahwestra@microsoft.com>
Description
Adds a cybersecurity harms scenario to PyRIT called CyberScenario, which tests a model's willingness to generate malware via single-turn or multi-turn (red teaming) attack methods. Changes listed below:
This PR is meant to be a starting point for additional cybersecurity harm scaffolding as there are still many places CyberScenario can be expanded on.
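As a rough illustration of the single-turn versus multi-turn split described above, here is a minimal, self-contained sketch. It does not use PyRIT and is not the PR's implementation: `SketchStrategy`, `AttackPlan`, `build_attack_plans`, and the `max_turns` default are all hypothetical names and values chosen for this example, and the objectives are placeholders.

```python
from dataclasses import dataclass
from enum import Enum


class SketchStrategy(Enum):
    """Hypothetical strategy tags; NOT PyRIT's actual CyberStrategy."""
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"
    ALL = "all"  # aggregate tag that expands to both methods


@dataclass(frozen=True)
class AttackPlan:
    """Hypothetical record pairing one objective with one attack method."""
    objective: str
    method: SketchStrategy
    max_turns: int


def build_attack_plans(objectives, strategies, max_turns=5):
    """Expand strategy tags into one plan per (method, objective) pair."""
    # Resolve aggregate tags into concrete methods, dropping duplicates
    # while preserving the order in which methods first appear.
    methods = []
    for strategy in strategies:
        if strategy is SketchStrategy.ALL:
            expanded = [SketchStrategy.SINGLE_TURN, SketchStrategy.MULTI_TURN]
        else:
            expanded = [strategy]
        for method in expanded:
            if method not in methods:
                methods.append(method)

    plans = []
    for method in methods:
        for objective in objectives:
            # A single-turn attack sends exactly one prompt; a multi-turn
            # attack may iterate up to max_turns (an assumed default).
            turns = 1 if method is SketchStrategy.SINGLE_TURN else max_turns
            plans.append(AttackPlan(objective, method, turns))
    return plans


if __name__ == "__main__":
    for plan in build_attack_plans(["objective-1"], [SketchStrategy.ALL]):
        print(plan.method.value, plan.objective, plan.max_turns)
```

The sketch keeps strategy selection separate from execution, which is one way a tag-based system could accommodate sequential attacks; whether CyberScenario is structured this way is not stated in this thread.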
Tests and Documentation
Unit tests focus on initialization, attack generation, execution, and scenario properties, similar to the tests for other scenarios.