fix: correct \c escape handling in replacements (#403)#449
Conversation
GNU sed takes the character after \c raw, with one exception: a backslash may be escaped as \\ to denote a literal backslash, so \c\\ yields Ctrl-\ (0x1c). Any other backslash escape after \c is rejected with 'recursive escaping after \c not allowed'. Previously \c\\ errored as an unterminated replacement and \c\d was silently accepted as 'd'. parse_char_escape now consumes the escaped backslash correctly and returns a compilation error for recursive escaping; its signature becomes UResult<Option<char>> so the error can propagate through all call sites (regex, replacement, transliteration, text commands, character class). Verified against GNU sed 4.10. Adds unit tests for \cx, \c\\, \c\d, and trailing-backslash cases. Closes uutils#403
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #449 +/- ##
==========================================
+ Coverage 82.20% 82.31% +0.10%
==========================================
Files 13 13
Lines 5542 5582 +40
Branches 310 313 +3
==========================================
+ Hits 4556 4595 +39
- Misses 983 984 +1
Partials 3 3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
dspinellis
left a comment
There was a problem hiding this comment.
Thanks. Please see the comments. When done please force-push the squashed commits.
|
|
||
| 'c' => { | ||
| // Control character escape: \cC | ||
| // Control character escape: \cX. The character after \c is taken |
There was a problem hiding this comment.
There is considerable logic here. Consider abstracting it into a new function parse_control_char() in delimited_parser.rs, as we do with parse_numeric_escape().
| // Nothing to control-escape; treat the lone 'c' literally. | ||
| Some('c') | ||
| } else if line.current() == '\\' { | ||
| line.advance(); // consume the first backslash |
There was a problem hiding this comment.
Write as complete sentence: Consume the first backslash.
| } | ||
| // \\ -> literal backslash as the control argument. | ||
| let decoded = create_control_char('\\'); | ||
| line.advance(); // consume the second backslash |
There was a problem hiding this comment.
See above regarding comment. Follow this in all comments.
| 'c' => { | ||
| // Control character escape: \cC | ||
| // Control character escape: \cX. The character after \c is taken | ||
| // raw (no further escape processing). GNU sed allows exactly one |
There was a problem hiding this comment.
Please specify GNU sed version.
| } | ||
|
|
||
| #[test] | ||
| fn test_control_escape_escaped_backslash() { |
There was a problem hiding this comment.
You also need to test at EOL, right?
Summary
Fixes the two
\creplacement-escape bugs in #403, verified against GNU sed 4.10.\cXtakes the character after\craw. GNU allows exactly one exception: a backslash escaped as\\denotes a literal backslash, so\c\\yields Ctrl-\(0x1c). Any other backslash escape after\cis rejected.Before:
After (matches GNU sed 4.10):
Implementation
\carm ofparse_char_escapenow consumes its argument correctly: a leading\must be followed by another\(literal backslash → control char); otherwise it raises the recursive-escaping error.parse_char_escapereturnsUResult<Option<char>>so the error propagates. All five call sites (regex, replacement, transliteration, character class, GNU text commands) threadlinesand?.\ckeeps its prior literal-cfallback (byte-level GNU behavior is out of scope for a char-based parser).Testing
cargo test: full suite green (285 unit + 181 integration).cargo clippy --all-targets -- -D warningsandcargo fmt --check: clean.\cx,\c\\,\c\d, and trailing-backslash.\cA,\c\\,\cx,\c\d, and mixed cases.Out of scope
\cimmediately followed by the closing delimiter (e.g.s/./\c/) is a separate, delimiter-dependent edge case (GNU emits a literal\); it already errored onmainand is not part of #403.Closes #403