@ValbuenaVC ValbuenaVC commented Jan 30, 2026

Description

Adding more features to the Jailbreak scenario! Major changes:

  • JailbreakStrategy now supports multiple attack types via the ManyShot, PromptSending, Crescendo, and RedTeaming values.
  • The new attack strategies can be selected as groups using the SINGLE_TURN and MULTI_TURN aggregates; the PYRIT strategy has been deprecated.
  • The initializer now accepts k, n, and jailbreaks, which respectively select k jailbreaks at random, set how many times (n) to try each jailbreak, and name the specific jailbreaks to use. Note that k and jailbreaks are mutually exclusive.
  • A default adversarial target has been added to support the attack strategies that require one.
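The k / n / jailbreaks semantics described above can be sketched as a small selection helper. This is a minimal sketch under stated assumptions, not the PR's implementation: `select_jailbreaks` and the template names are hypothetical, and it assumes that omitting both `k` and `jailbreaks` means "use every available jailbreak".

```python
import random
from typing import List, Optional


def select_jailbreaks(
    available: List[str],
    k: Optional[int] = None,
    n: int = 1,
    jailbreaks: Optional[List[str]] = None,
) -> List[str]:
    """Return the jailbreak names to run, each repeated n times.

    Hypothetical helper illustrating the described parameters;
    it is not PyRIT's actual API.
    """
    # k (random subset) and jailbreaks (explicit names) cannot be combined.
    if k is not None and jailbreaks is not None:
        raise ValueError("k and jailbreaks are mutually exclusive")
    if n < 1:
        raise ValueError("n must be at least 1")

    if jailbreaks is not None:
        chosen = list(jailbreaks)  # explicit selection by template name
    elif k is not None:
        chosen = random.sample(available, k)  # k distinct jailbreaks at random
    else:
        chosen = list(available)  # assumed default: every jailbreak

    # Try each selected jailbreak n times.
    return [name for name in chosen for _ in range(n)]
```

For example, `select_jailbreaks(["template_a", "template_b"], jailbreaks=["template_b"], n=2)` returns `["template_b", "template_b"]`.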

Tests and Documentation

  • Expanded to cover the new strategies.

@ValbuenaVC ValbuenaVC changed the title Jailbreak Scenario Expansion [DRAFT] Jailbreak Scenario Expansion Feb 5, 2026
@ValbuenaVC ValbuenaVC changed the title [DRAFT] Jailbreak Scenario Expansion [DRAFT] FEAT: Jailbreak Scenario Expansion Feb 5, 2026
@ValbuenaVC ValbuenaVC marked this pull request as ready for review February 10, 2026 01:16
@ValbuenaVC ValbuenaVC changed the title [DRAFT] FEAT: Jailbreak Scenario Expansion FEAT: Jailbreak Scenario Expansion Feb 10, 2026

@fdubut fdubut left a comment

Added a few comments. The main one is the question mark around running multi-turn attacks on single-turn jailbreaks (which is what all of the jailbreaks in the main template directory are). It would be worth testing what that looks like and whether it even makes sense.


```diff
  @classmethod
- def get_all_jailbreak_templates(cls, n: Optional[int] = None) -> List[str]:
+ def get_all_jailbreak_templates(cls, k: Optional[int] = None) -> List[str]:
```

I'm not sure whether this function is called directly anywhere, which would make this a breaking change, but it would make sense to me to rename it get_jailbreak_templates, since by definition you are not always returning all of them.
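If the rename goes ahead, one common way to avoid a breaking change is to keep the old name as a thin alias that emits a `DeprecationWarning`. A minimal sketch, assuming a hypothetical `JailbreakTemplates` class with placeholder template names standing in for wherever the method actually lives:

```python
import random
import warnings
from typing import List, Optional


class JailbreakTemplates:
    """Hypothetical stand-in for the class that owns the template list."""

    # Placeholder names; the real template list is loaded elsewhere.
    _TEMPLATES = ["template_a", "template_b", "template_c"]

    @classmethod
    def get_jailbreak_templates(cls, k: Optional[int] = None) -> List[str]:
        """Return all template names, or k of them chosen at random."""
        if k is None:
            return list(cls._TEMPLATES)
        return random.sample(cls._TEMPLATES, k)

    @classmethod
    def get_all_jailbreak_templates(cls, k: Optional[int] = None) -> List[str]:
        """Deprecated alias kept so existing callers keep working."""
        warnings.warn(
            "get_all_jailbreak_templates is deprecated; use get_jailbreak_templates",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this alias
        )
        return cls.get_jailbreak_templates(k=k)
```

Callers of the old name keep working but see a warning, which gives downstream code a release or two to migrate.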

```python
# Strategies for tweaking jailbreak efficacy through attack patterns
ManyShot = ("many_shot", {"single_turn"})
PromptSending = ("prompt_sending", {"single_turn"})
Crescendo = ("crescendo", {"multi_turn"})
```

Did you try running these jailbreaks with multi-turn attacks? Most of the jailbreaks in PyRIT are single-turn jailbreaks; I'm not even quite sure what they would look like with a multi-turn attack...

```python
Strategy for jailbreak attacks.
"""

# Aggregate members (special markers that expand to strategies with matching tags)
```

What's the default strategy? Some other scenarios have a get_default_strategy function, but I don't see one here. I would recommend making PromptSending the default, because it's the one that makes the most sense for the jailbreaks we have in PyRIT so far.
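For illustration, here is a simplified, self-contained model of tag-based aggregates together with a `get_default_strategy` that returns PromptSending, as suggested above. It is a sketch under assumptions, not PyRIT's actual strategy base class: `JailbreakStrategySketch` and `AGGREGATE_TAGS` are hypothetical names, and tagging RedTeaming as `multi_turn` is an assumption, since only ManyShot, PromptSending, and Crescendo appear in the diff excerpt.

```python
from enum import Enum
from typing import FrozenSet, List

# Hypothetical: names that mark an aggregate rather than a concrete attack.
AGGREGATE_TAGS: FrozenSet[str] = frozenset({"single_turn", "multi_turn"})


class JailbreakStrategySketch(Enum):
    """Each value is (strategy name, tags); hypothetical simplified model."""

    # Aggregate members: expand to every concrete member with the same tag.
    SINGLE_TURN = ("single_turn", frozenset({"single_turn"}))
    MULTI_TURN = ("multi_turn", frozenset({"multi_turn"}))

    # Concrete attack strategies, tagged by turn type.
    ManyShot = ("many_shot", frozenset({"single_turn"}))
    PromptSending = ("prompt_sending", frozenset({"single_turn"}))
    Crescendo = ("crescendo", frozenset({"multi_turn"}))
    RedTeaming = ("red_teaming", frozenset({"multi_turn"}))  # tag assumed

    @property
    def strategy_name(self) -> str:
        return self.value[0]

    @property
    def tags(self) -> FrozenSet[str]:
        return self.value[1]

    @property
    def is_aggregate(self) -> bool:
        return self.strategy_name in AGGREGATE_TAGS

    def expand(self) -> List["JailbreakStrategySketch"]:
        """Return concrete strategies: itself, or all members sharing the aggregate's tag."""
        if not self.is_aggregate:
            return [self]
        return [
            member
            for member in type(self)
            if not member.is_aggregate and self.strategy_name in member.tags
        ]

    @classmethod
    def get_default_strategy(cls) -> "JailbreakStrategySketch":
        # Reviewer's suggestion: PromptSending suits single-turn templates best.
        return cls.PromptSending
```

With this model, `JailbreakStrategySketch.SINGLE_TURN.expand()` yields ManyShot and PromptSending, while a concrete member such as Crescendo expands only to itself.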

```diff
  scenario_result_id: Optional[str] = None,
- n_jailbreaks: Optional[int] = 3,
+ k: Optional[int] = None,
+ n: int = 1,
```

k and n are not very self-explanatory... How about we keep n_jailbreaks and add n_attempts, which reads more clearly in the code?

(By the way, I do agree with the new default value of None, which means all jailbreaks.)

```diff
- n_jailbreaks (Optional[int]): Choose n random jailbreaks rather than using all of them.
+ k (Optional[int]): Choose k random jailbreaks rather than using all of them.
+ n (Optional[int]): Number of times to try each jailbreak.
+ jailbreaks (Optional[int]): Dedicated list of jailbreaks to run.
```

  1. Should be Optional[List[str]].
  2. Can we clarify in the docstring (or wherever else you think is appropriate) that these are the names of the jailbreaks from our template list in PyRIT? At first glance I thought these were custom jailbreaks and that the strings were the full text of the jailbreaks.

```python
    jailbreaks (List[str]): List of jailbreak names.

Raises:
    ValueError: If jailbreaks not discovered.
```

I would recommend making it a bit more explicit (e.g. "if at least one provided jailbreak does not exist").
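A minimal sketch of the more explicit validation, assuming a hypothetical `validate_jailbreak_names` helper that receives the list of available template names. It uses the `Optional[List[str]]` type suggested above and names the offending entries in the error message:

```python
from typing import List, Optional


def validate_jailbreak_names(
    jailbreaks: Optional[List[str]], available: List[str]
) -> None:
    """Raise ValueError if at least one provided jailbreak does not exist.

    `jailbreaks` holds names of templates from the built-in template list,
    not the full text of custom jailbreaks. Hypothetical helper.
    """
    if jailbreaks is None:
        return  # nothing to validate; all templates will be used
    unknown = sorted(set(jailbreaks) - set(available))
    if unknown:
        raise ValueError(
            f"Unknown jailbreak template name(s): {unknown}. "
            f"Available names: {sorted(available)}"
        )
```

Listing the unknown names in the message tells the caller exactly which entry to fix, rather than only that validation failed.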

```python
    List[AtomicAttack]: List of atomic attacks to execute, one per jailbreak template.

Raises:
    ValueError: If self._jailbreaks is not a subset of all jailbreak templates.
```

Technically it's true, but if I understand correctly, it would be thrown by the TextJailbreakConverter initializer at this point, and it doesn't seem to be caught anywhere, so do we need to mention it here? Also, we've already done that validation as part of _validate_jailbreaks_subset, so I'm not sure it's worth mentioning again here.

```python
    Returns:
        DatasetConfiguration: Configuration with airt_harms dataset.
    """
    return DatasetConfiguration(dataset_names=["airt_harms"], max_dataset_size=4)
```

A bit of a random question: if we are now allowing n_attempts > 1, is it worth also relaxing max_dataset_size=4?
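A back-of-the-envelope way to frame this trade-off, assuming (hypothetically) that every combination of strategy, jailbreak, attempt, and objective becomes one attack run; `count_attack_runs` is an illustrative helper, not part of the PR:

```python
def count_attack_runs(
    n_jailbreaks: int, n_attempts: int, n_objectives: int, n_strategies: int = 1
) -> int:
    """Total attack runs if every combination is executed once (assumption)."""
    for value in (n_jailbreaks, n_attempts, n_objectives, n_strategies):
        if value < 0:
            raise ValueError("counts must be non-negative")
    return n_jailbreaks * n_attempts * n_objectives * n_strategies


# 3 jailbreaks, 2 attempts each, 4 objectives (max_dataset_size=4), 1 strategy
print(count_attack_runs(n_jailbreaks=3, n_attempts=2, n_objectives=4))  # 24
```

Under that assumption, the total grows multiplicatively: doubling n_attempts doubles the number of runs, so relaxing max_dataset_size at the same time would increase cost further.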

