Skip to content

Scrub invalid UTF-8 in FailureFormatter#to_s#403

Merged
acaron merged 1 commit intomainfrom
scrub-invalid-utf8-failure-formatter
Apr 30, 2026
Merged

Scrub invalid UTF-8 in FailureFormatter#to_s#403
acaron merged 1 commit intomainfrom
scrub-invalid-utf8-failure-formatter

Conversation

@acaron
Copy link
Copy Markdown
Member

@acaron acaron commented Apr 29, 2026

Problem

Minitest::Queue::FailureFormatter#to_s uses String#encode!(Encoding::UTF_8, invalid: :replace, undef: :replace) to sanitize test output before it gets JSON-serialized in BuildStatusReporter#write_failure_file. Per the Ruby docs, encode! is a no-op when the source and destination encodings are the same. When test output contains a UTF-8-tagged string with invalid byte sequences — common when WebMock surfaces a protobuf request body in an error message — the call silently leaves the bad bytes intact.

JSON.dump then crashes with Encoding::UndefinedConversionError, and the entire minitest-queue report step exits before writing any failure file. Builds with many failures lose every single failure record, breaking flake-triage workflows.

Fix

Add scrub! after encode! to catch the case encode! misses. encode! still handles transcoding from non-UTF-8 sources (ASCII-8BIT, ISO-8859-1, etc.), and scrub! cleans any remaining invalid byte sequences in UTF-8-tagged strings. Both use the same default replacement character (U+FFFD).

Reference

Original ASCII-8BIT fix: ed4ae26 — this PR closes the remaining gap.

Closes https://github.com/shop/issues/issues/39402

co-authored by AI

@acaron acaron force-pushed the scrub-invalid-utf8-failure-formatter branch from 82e79e0 to 546dc6b Compare April 29, 2026 23:11
`String#encode!(Encoding::UTF_8, invalid: :replace, ...)` is a no-op when
the source encoding is already UTF-8. When `body` interpolation produces
a UTF-8-tagged string with invalid byte sequences (e.g. binary protobuf
bodies surfaced through WebMock failure messages), the existing fix lets
those bytes through and `JSON.dump` later crashes in
`BuildStatusReporter#write_failure_file` with
`Encoding::UndefinedConversionError`, taking down the entire report step.

Switch to `force_encoding(Encoding::UTF_8) + scrub!` which handles both
the original ASCII-8BIT-source case and the new UTF-8-tagged-with-invalid-
bytes case. Adds a regression test for the second case alongside the
existing one.
@acaron acaron force-pushed the scrub-invalid-utf8-failure-formatter branch from 546dc6b to ac53686 Compare April 29, 2026 23:12
@acaron acaron merged commit c01d784 into main Apr 30, 2026
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants