Decode base64/quoted-printable a block at a time #62

Open
iliaal wants to merge 1 commit into php:master from iliaal:pr/bulk-decode

iliaal commented Jun 7, 2026

Transfer-decoding during extraction ran every input byte through two function pointers (the per-char filter function and its output callback) plus a smart_string append per output byte. For a multi-megabyte base64 or quoted-printable body that is millions of indirect calls.

This adds buffer-at-a-time decoders (mb_convert_filter_feed_block / _flush_block) that consume a whole input block in one tight loop and append decoded bytes straight to the work buffer, reusing the filter's status/cache as the carry state between blocks. The decoded output is byte-for-byte identical to the per-char path. php_mimepart_decoder_feed/_finish now use them, and the now-unused filter_into_work_buffer output callback is removed. The per-char encode path used by mailparse_stream_encode is unchanged.
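To illustrate the carry-state idea for base64, here is a minimal standalone sketch. It is not the code from this PR: the names `b64_state`, `b64_feed_block` and `b64_flush_block` are hypothetical, the state struct stands in for the filter's status/cache fields, and the handling of `=` and invalid bytes (simply skipped) is a simplifying assumption that may differ from the per-char filter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical carry state; the PR reuses the filter's status/cache instead. */
typedef struct {
    uint32_t acc;  /* bits collected from an unfinished quartet */
    int nbits;     /* valid bits in acc: 0, 6, 12 or 18 */
} b64_state;

/* Map a base64 alphabet character to its 6-bit value, or -1 if not in the alphabet. */
static int b64_value(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode one input block in a single loop.
 * out needs room for at least 3 * ((len + 3) / 4) bytes.
 * Returns the number of bytes written. */
static size_t b64_feed_block(b64_state *st, const unsigned char *in,
                             size_t len, unsigned char *out)
{
    assert(st != NULL && out != NULL);

    /* Keep the carry in locals so the loop touches no indirect state. */
    uint32_t acc = st->acc;
    int nbits = st->nbits;
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        int v = b64_value(in[i]);
        if (v < 0) {
            continue;  /* assumption: skip whitespace, '=' and invalid bytes */
        }
        acc = (acc << 6) | (uint32_t)v;
        nbits += 6;
        if (nbits == 24) {  /* full quartet: emit three bytes */
            out[n++] = (unsigned char)(acc >> 16);
            out[n++] = (unsigned char)(acc >> 8);
            out[n++] = (unsigned char)acc;
            acc = 0;
            nbits = 0;
        }
    }

    /* Save the partial quartet for the next block. */
    st->acc = acc;
    st->nbits = nbits;
    return n;
}

/* Emit whatever a trailing partial quartet still encodes (0 to 2 bytes). */
static size_t b64_flush_block(b64_state *st, unsigned char *out)
{
    size_t n = 0;

    if (st->nbits == 18) {         /* three characters: two bytes */
        out[n++] = (unsigned char)(st->acc >> 10);
        out[n++] = (unsigned char)(st->acc >> 2);
    } else if (st->nbits == 12) {  /* two characters: one byte */
        out[n++] = (unsigned char)(st->acc >> 4);
    }
    memset(st, 0, sizeof *st);
    return n;
}
```

A caller would zero-initialize one `b64_state`, call `b64_feed_block` once per input chunk regardless of where the chunk boundaries fall, and call `b64_flush_block` once at the end. Because a quartet split across two chunks is held in the state, the decoded bytes do not depend on how the input is chunked.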

Verified byte-identical output against the current master build across 61 base64/QP cases (padding variants, embedded whitespace, invalid chars, partial quartets, soft line breaks, lower/upper hex, all 256 byte values), plus 4000 random round-trips. Decoding a 16 MB base64 body in a debug build went from 349 ms to 197 ms (~1.8x).

Transfer-decoding during extraction ran every input byte through two
function pointers (the per-char filter function and its output callback)
plus a smart_string append for every output byte. For a multi-megabyte
base64 or quoted-printable body that is millions of indirect calls.

Add buffer-at-a-time decoders that consume a whole input block in one
loop and append decoded bytes straight to the work buffer, using the
filter's status/cache as the carry state between blocks. The output is
byte-for-byte identical to the per-char path. php_mimepart_decoder_feed
and _finish now use these; the now-unused filter_into_work_buffer
output callback is removed. The per-char encode path used by
mailparse_stream_encode is unchanged.