
Commit ba3cc47

docs: document backtracking clears capture in conditional regex (gh-151819)
Closes #151819

The (?(id/name)yes-pattern|no-pattern) documentation claims that the example pattern (<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$) will not match '<user@host.com'. In fact, re.search() matches 'user@host.com' starting at position 1: when the yes-pattern cannot consume a trailing >, the engine backtracks, the optional (<)? group is left unset, and the match succeeds without the leading <. The same backtracking behaviour occurs in simpler cases, such as (<)?\w+(?(1)>) matching only '3' in '<3'.

This change documents the backtracking semantics explicitly and corrects the embedded example. It also adds a regression test that locks in the observable behaviour.
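The minimal case from the commit message can be reproduced with a short script. This is only a sketch: the helper name `describe` and the constant `PATTERN` are illustrative and not part of the commit. It also shows that the result depends on the entry point, since `re.match()` is anchored at position 0 and cannot skip the leading `<`.

```python
import re

# Pattern from the commit message: optional "<", a word, and ">" required
# only if group 1 took part in the match.
PATTERN = re.compile(r'(<)?\w+(?(1)>)')


def describe(m):
    """Summarise a match as (text, span, group 1), or None if no match."""
    return None if m is None else (m.group(), m.span(), m.group(1))


# search() may start the match at position 1, where group 1 is unset.
search_result = describe(PATTERN.search('<3'))

# match() is anchored at position 0, where "<" is not a word character.
match_result = describe(PATTERN.match('<3'))

# With a closing ">", the capture is kept.
balanced = describe(PATTERN.search('<body>'))

print(search_result)  # ('3', (1, 2), None)
print(match_result)   # None
print(balanced)       # ('<body>', (0, 6), '<')
```

The regression test in this commit uses `re.search()`, which is why it observes a match that skips the leading `<`.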
1 parent 476b649 commit ba3cc47

3 files changed

Lines changed: 46 additions & 3 deletions

File tree

Doc/library/re.rst

Lines changed: 11 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -508,9 +508,17 @@ The special characters are:
508508
Will try to match with ``yes-pattern`` if the group with given *id* or
509509
*name* exists, and with ``no-pattern`` if it doesn't. ``no-pattern`` is
510510
optional and can be omitted. For example,
511-
``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email matching pattern, which
512-
will match with ``'<user@host.com>'`` as well as ``'user@host.com'``, but
513-
not with ``'<user@host.com'`` nor ``'user@host.com>'``.
511+
``(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)`` is a poor email matching pattern,
512+
which will match with ``'<user@host.com>'`` as well as ``'user@host.com'``,
513+
and will not match with ``'user@host.com>'``.
514+
515+
Note that when ``yes-pattern`` fails to match after the referenced group
516+
has been set, backtracking can undo the capture (an optional group
517+
reverts to its unmatched state). For example, searching ``'<3'`` with
518+
``(<)?\w+(?(1)>)`` matches only ``'3'`` at
519+
position 1 with ``group(1) is None``: the engine first captures the
520+
leading ``<`` in group 1, fails to match ``>`` at position
521+
2, then backtracks and retries with group 1 unset.
514522

515523
.. versionchanged:: 3.12
516524
Group *id* can only contain ASCII digits.

Lib/test/test_re.py

Lines changed: 31 additions & 0 deletions
@@ -706,6 +706,37 @@ def test_re_groupref_exists_errors(self):
706706
self.checkPatternError(r'()(?(2)a)',
707707
"invalid group reference 2", 5)
708708

709+
def test_re_conditional_drops_capture_on_backtrack(self):
710+
# An optional group's capture is undone when the ``yes-pattern``
711+
# of a (?(id/name)yes|no) construct fails after the capture was
712+
# set and the engine backtracks. See:
713+
# https://github.com/python/cpython/issues/151819
714+
# Minimal reproduction from the issue:
715+
m = re.search(r'(<)?\w+(?(1)>)', '<3')
716+
self.assertEqual(m.group(), '3')
717+
self.assertEqual(m.span(), (1, 2))
718+
self.assertIsNone(m.group(1))
719+
720+
# The successful case keeps the capture intact:
721+
m = re.search(r'(<)?\w+(?(1)>)', '<body>')
722+
self.assertEqual(m.group(), '<body>')
723+
self.assertEqual(m.span(), (0, 6))
724+
self.assertEqual(m.group(1), '<')
725+
726+
# Same effect with an explicit character class and a longer input:
727+
m = re.search(r'(<)?[A-Za-z]+(?(1)>)', '<abcXYZ')
728+
self.assertEqual(m.group(), 'abcXYZ')
729+
self.assertEqual(m.span(), (1, 7))
730+
self.assertIsNone(m.group(1))
731+
732+
# The pattern documented in re.rst: when the yes-pattern fails,
733+
# the leading "<" is left out of the match.
734+
m = re.search(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)', '<user@host.com')
735+
self.assertEqual(m.group(), 'user@host.com')
736+
self.assertEqual(m.span(), (1, 14))
737+
self.assertIsNone(m.group(1))
738+
self.assertEqual(m.group(2), 'user@host.com')
739+
709740
def test_re_groupref_exists_validation_bug(self):
710741
for i in range(256):
711742
with self.subTest(code=i):
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
Clarify the :ref:`re-syntax` for ``(?(id/name)yes-pattern|no-pattern)``:
2+
add a paragraph documenting that backtracking can clear an optional
3+
capture group whose ``yes-pattern`` fails to match. Also correct the
4+
embedded example pattern's expected matches. Issue #151819.
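
The corrected expectations for the documented email pattern can be checked against all four inputs discussed in this commit. The sketch below is illustrative only: the names `EMAIL`, `samples`, and `matched_text` are assumptions introduced here. It contrasts `re.search()`, which the regression test uses, with `re.fullmatch()`, which must consume the whole string and therefore still rejects the unbalanced inputs.

```python
import re

# Example pattern from Doc/library/re.rst.
EMAIL = re.compile(r'(<)?(\w+@\w+(?:\.\w+)+)(?(1)>|$)')

samples = [
    '<user@host.com>',
    'user@host.com',
    '<user@host.com',
    'user@host.com>',
]


def matched_text(m):
    """Return the matched substring, or None when there is no match."""
    return m.group() if m else None


# search() may begin the match after the unbalanced "<".
search_hits = {s: matched_text(EMAIL.search(s)) for s in samples}

# fullmatch() has to account for every character of the input.
full_hits = {s: matched_text(EMAIL.fullmatch(s)) for s in samples}

for s in samples:
    print(f'{s!r:20} search={search_hits[s]!r:20} fullmatch={full_hits[s]!r}')
```

Under these assumptions, `'<user@host.com'` yields `'user@host.com'` with `search()` but no match with `fullmatch()`, while `'user@host.com>'` fails with both because nothing can match between the address and the end of the string.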
