Fix Base45 dropping trailing bytes on non-ASCII input by gaoflow · Pull Request #45 · dhondta/python-codext

gaoflow · 2026-06-15T22:50:31Z

Problem

The Base45 codec silently drops trailing bytes when the input contains non-ASCII content, producing output that is too short and no longer round-trips.

>>> import codext
>>> codext.encode(b'\xcf\xb1\x1b', 'base45')
b'OBQ'          # should be b'OBQR0'
>>> codext.decode(codext.encode(b'\xcf\xb1\x1b', 'base45'), 'base45')
b'\xcf\xb1'      # the trailing 0x1b byte is lost

Cause

base45_encode/base45_decode iterate with range(0, len(text), step) but index into t = b(text). Because the codec layer converts a bytes input to str (UTF-8) before the codec runs, b(text) is longer than text whenever the content is non-ASCII, so len(text) ends the loop early and the final group is never emitted. For b'\xcf\xb1\x1b' (\xcf\xb1 decodes to the single character U+03F1), text has length 2 while the byte sequence has length 3, dropping the third byte.
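The length mismatch can be reproduced with the standard library alone. This sketch assumes that codext's b() helper re-encodes a str as UTF-8, so `text.encode("utf-8")` stands in for `b(text)` here:

```python
# Reproduce the str/bytes length mismatch behind the Base45 bug.
# Assumption: codext's b() helper re-encodes a str as UTF-8; here
# text.encode("utf-8") stands in for b(text).
raw = b"\xcf\xb1\x1b"
text = raw.decode("utf-8")   # the codec sees U+03F1 followed by U+001B
t = text.encode("utf-8")     # stand-in for t = b(text)

print(len(text), len(t))     # 2 3

# Start offsets of the 2-byte groups visited during encoding:
buggy = list(range(0, len(text), 2))  # bounded by the str length
fixed = list(range(0, len(t), 2))     # bounded by the byte length
print(buggy, fixed)          # [0] [0, 2]
# With the buggy bound only t[0:2] is encoded, so t[2:3] (0x1b) is lost.
```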

Fix

Iterate over len(t) (the actual byte sequence) in both functions. Encoded output now matches RFC 9285 and the reference base45 implementation, and encoding round-trips for arbitrary byte input.
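For reference, a minimal standalone Base45 encoder and decoder following RFC 9285 is sketched below. It is not codext's implementation, and the names `ALPHABET`, `b45_encode` and `b45_decode` are hypothetical; the point it illustrates is that a str input is converted to bytes first and every loop is then bounded by the length of the sequence it actually indexes into.

```python
from typing import Union

# RFC 9285 Base45 alphabet: index = digit value 0..44.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def b45_encode(data: Union[str, bytes]) -> str:
    """Hypothetical helper: encode bytes (or a UTF-8 str) as Base45."""
    if isinstance(data, str):
        data = data.encode("utf-8")  # convert first, as t = b(text) does
    out = []
    # Bound the loop by the byte length, never by a decoded str's length.
    for i in range(0, len(data), 2):
        chunk = data[i:i + 2]
        if len(chunk) == 2:
            # Two bytes n = a*256 + b become three digits c + d*45 + e*45^2.
            n = chunk[0] * 256 + chunk[1]
            c, rest = n % 45, n // 45
            d, e = rest % 45, rest // 45
            out.append(ALPHABET[c] + ALPHABET[d] + ALPHABET[e])
        else:
            # A trailing single byte becomes two digits c + d*45.
            n = chunk[0]
            out.append(ALPHABET[n % 45] + ALPHABET[n // 45])
    return "".join(out)


def b45_decode(s: str) -> bytes:
    """Hypothetical helper: decode a Base45 string back to bytes."""
    values = [ALPHABET.index(ch) for ch in s]  # ValueError on invalid char
    out = bytearray()
    # Bound the loop by the number of digit values actually present.
    for i in range(0, len(values), 3):
        group = values[i:i + 3]
        if len(group) == 3:
            n = group[0] + group[1] * 45 + group[2] * 45 * 45
            if n > 0xFFFF:
                raise ValueError("invalid Base45 triplet")
            out += n.to_bytes(2, "big")
        elif len(group) == 2:
            n = group[0] + group[1] * 45
            if n > 0xFF:
                raise ValueError("invalid Base45 pair")
            out.append(n)
        else:
            raise ValueError("invalid Base45 length")
    return bytes(out)


print(b45_encode(b"AB"))            # BB8
print(b45_encode(b"\xcf\xb1\x1b"))  # OBQR0
print(b45_decode("OBQR0"))          # b'\xcf\xb1\x1b'
```

With the loop bounded by the byte length, the regression value encodes to 'OBQR0' and decodes back to all three original bytes.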

Tests

Added test_codec_base45 covering the RFC 9285 vectors (AB, Hello!!, base-45), the exact regression value (b'\xcf\xb1\x1b' -> b'OBQR0'), and round-trips for binary inputs. The full test suite passes.

I worked on this with AI assistance under my direction and reviewed the change myself.

The Base45 encoder and decoder iterated with range(0, len(text), step) while indexing into t = b(text). codext converts a bytes input to str (UTF-8) before the codec runs, so for any non-ASCII content b(text) is longer than text and len(text) stops the loop early, silently dropping the trailing byte(s). For example, encode(b'\xcf\xb1\x1b') returned 'OBQ' instead of 'OBQR0', so the value no longer round-tripped. Iterate over len(t) (the actual byte sequence) instead. Output now matches RFC 9285 and the reference base45 implementation, and encoding round-trips for arbitrary byte input.

gaoflow wants to merge 1 commit into dhondta:main from gaoflow:fix-base45-multibyte-dataloss