gh-151819: Clarify backtracking semantics of (?(id/name)yes|no) in re docs (gh-151819) #151913

Open
Dodothereal wants to merge 1 commit into python:main from Dodothereal:docs/gh-151819-document-conditional-reroll

@Dodothereal commented Jun 22, 2026

Fixes #151819

Summary

The re-syntax section of the re documentation, in its entry for
(?(id/name)yes-pattern|no-pattern), contains two issues:

  1. The documented example pattern
    (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) is claimed to "not match with
    '<user@host.com'". That is not what the regex engine does under
    re.search(). With that input the engine first consumes < to satisfy
    group 1, cannot match the closing >, and backtracks. Backtracking
    drops the < capture, and the engine then matches 'user@host.com'
    starting at position 1, where group(1) is None.

  2. The general interaction between the conditional construct and
    backtracking is undocumented. The simplest illustration is
    re.search('(<)?\\w+(?(1)>)', '<3'): it returns a match for '3' at
    span=(1, 2), where group(1) is None.
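
Both points can be reproduced with a short script. This is a minimal sketch using only the standard re module; the variable names (EMAIL, SIMPLE, email_match, simple_match) are illustrative and do not come from the patch.

```python
import re

# Patterns quoted in the two items above.
EMAIL = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'
SIMPLE = r'(<)?\w+(?(1)>)'

# Item 1: re.search() still finds a match in '<user@host.com'.
# The match starts after the '<', so group 1 never participates.
email_match = re.search(EMAIL, '<user@host.com')
print(email_match.group(0), email_match.span(), email_match.group(1))
# user@host.com (1, 14) None

# Item 2: the minimal reproducer behaves the same way.
simple_match = re.search(SIMPLE, '<3')
print(simple_match.group(0), simple_match.span(), simple_match.group(1))
# 3 (1, 2) None
```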

This PR corrects the email example wording and adds a paragraph that
documents the backtracking-clears-the-capture behavior, plus a
regression test that locks in the visible behavior across the
relevant cases.

Test plan

  • New ReTests.test_re_conditional_drops_capture_on_backtrack in
    Lib/test/test_re.py exercises the issue reproducer and several
    closely related inputs. All four assertions were verified locally
    against the regex engine; the test will be run as part of the CPython
    CI test suite for this PR.
  • No C code or build-affecting changes. The PR touches only documentation and tests.
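
For readers without the diff at hand, a test of this kind might look roughly like the sketch below. It is not the test added in Lib/test/test_re.py: the class name ConditionalBacktrackSketch is hypothetical, and the inputs are illustrative cases that need not be the ones the PR asserts.

```python
import re
import unittest


class ConditionalBacktrackSketch(unittest.TestCase):
    """Hypothetical stand-in for the PR's regression test."""

    SIMPLE = r'(<)?\w+(?(1)>)'
    EMAIL = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

    def test_conditional_drops_capture_on_backtrack(self):
        # Issue reproducer: the match skips the '<' and group 1 is unset.
        m = re.search(self.SIMPLE, '<3')
        self.assertEqual(m.span(), (1, 2))
        self.assertIsNone(m.group(1))

        # With a closing '>', group 1 participates and the whole input matches.
        m = re.search(self.SIMPLE, '<3>')
        self.assertEqual(m.group(0), '<3>')
        self.assertEqual(m.group(1), '<')

        # Email example: an unbalanced '<' still yields a match at position 1.
        m = re.search(self.EMAIL, '<user@host.com')
        self.assertEqual(m.group(0), 'user@host.com')
        self.assertIsNone(m.group(1))

        # A stray trailing '>' gives no match, because '$' cannot succeed.
        self.assertIsNone(re.search(self.EMAIL, 'user@host.com>'))
```

Saved to a file, the sketch can be run with python -m unittest.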

AI assistance

This patch was drafted with the help of an AI coding assistant. The
diff was reviewed line by line before submission and the regex
behavior was verified independently against CPython's re engine.
This disclosure is offered in line with the CPython AI policy; it is
not required.

…ongh-151819)

Closes python#151819

The (?(id/name)yes-pattern|no-pattern) documentation claims an
example pattern (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) will not match
'<user@host.com'.  In fact the engine matches 'user@host.com'
because the leading < capture group is rerolled when the
yes-pattern cannot consume the trailing >.  The same backtracking
behavior occurs in simpler cases such as (<)?\w+(?(1)>) matching
only '3' from '<3'.

This change documents the backtracking semantics explicitly and
corrects the embedded example.  Adds a regression test that locks
in the visible behavior.
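
As a complementary check, it can help to compare an anchored call with an unanchored one. In the sketch below (variable names are illustrative, not from the patch), re.match() only attempts a match at position 0 and finds none, while re.search() also tries later start positions and succeeds from position 1 with group 1 unset.

```python
import re

EMAIL = r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)'

# Anchored at position 0: once '<' is captured the yes-pattern needs '>',
# and without the capture \w+ cannot start at '<'. No match.
anchored = re.match(EMAIL, '<user@host.com')

# Unanchored: the attempt starting at position 1 never sets group 1,
# so the no-pattern '$' is used and the match succeeds.
searched = re.search(EMAIL, '<user@host.com')

# Balanced brackets match in full, with group 1 set.
balanced = re.search(EMAIL, '<user@host.com>')

print(anchored)                               # None
print(searched.span(), searched.group(1))     # (1, 14) None
print(balanced.group(0), balanced.group(1))   # <user@host.com> <
```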

@bedevere-app (bot) added the "tests" (Tests in the Lib/test dir) label Jun 22, 2026
@python-cla-bot

The following commit authors need to sign the Contributor License Agreement:

CLA not signed

@read-the-docs-community

Documentation build overview

📚 cpython-previews | 🛠️ Build #33244964 | 📁 Comparing ba3cc47 against main (476b649)

  🔍 Preview build  

2 files changed
± library/re.html
± whatsnew/changelog.html

@StanFromIreland changed the title from "docs: clarify backtracking semantics of (?(id/name)yes|no) (gh-151819)" to "gh-151819: Clarify backtracking semantics of (?(id/name)yes|no) in re docs (gh-151819)" Jun 22, 2026
Labels

awaiting review, tests (Tests in the Lib/test dir)

Development

Successfully merging this pull request may close these issues.

Regex yes/no-pattern has undocumented implication
