Fix Base58 dropping leading zero bytes #44

Open
gaoflow wants to merge 1 commit into dhondta:main from gaoflow:fix-base58-leading-zeros

gaoflow commented Jun 15, 2026

Problem

Base58 (and the other big-integer base codecs) silently drop leading null bytes:

import codext
codext.encode(b"\x00abc", "base58")  # 'ZiCa'  -> should be '1ZiCa'
codext.encode(b"\x00",    "base58")  # ''      -> should be '1'

base_encode/base_decode in src/codext/base/_base.py convert the whole input to a single integer (s2i) and back via divmod, so leading 0x00 bytes, which are high-order zeros of that integer, vanish. Per the Base58 specification the codec cites (and reference implementations such as the base58 PyPI library and Bitcoin Core), each leading 0x00 byte must map to a leading charset[0] character ('1' for the bitcoin alphabet). The same loss also broke round-tripping for any value beginning with a null byte.
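To illustrate why the integer conversion loses the zeros, here is a minimal standalone sketch. It is not codext's code: `naive_encode` and `B58` are hypothetical names, and `int.from_bytes` stands in for the s2i step described above.

```python
# Illustration only: a naive big-integer encoder, NOT codext's actual implementation.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # bitcoin alphabet

def naive_encode(data: bytes, charset: str = B58) -> str:
    # Leading 0x00 bytes are high-order zeros, so they contribute nothing to n.
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, r = divmod(n, len(charset))
        out = charset[r] + out
    return out

# Both inputs collapse to the same integer, hence the same encoding.
print(int.from_bytes(b"\x00abc", "big") == int.from_bytes(b"abc", "big"))
print(naive_encode(b"\x00abc") == naive_encode(b"abc"))
# A lone null byte is the integer 0, so the loop never runs.
print(repr(naive_encode(b"\x00")))
```

Because the information is already gone once the input is an integer, the count of leading zeros has to be captured before the conversion.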

Fix

Preserve the leading-zero count on encode (prepend one charset[0] per leading \x00) and restore it on decode (prepend one \x00 per leading charset[0]). Both changes apply only to the byte-input path, so the integer recoding used internally is untouched.

codext.encode(b"\x00abc", "base58")        # '1ZiCa'
codext.decode("1ZiCa", "base58")           # b'\x00abc'
codext.encode(b"\x01\x00", "base58")       # '5R'  (internal/trailing zeros unaffected)
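The approach described in this section can be sketched in isolation as follows. This is a hedged illustration under stated assumptions, not the patch itself: `encode_b58`, `decode_b58` and `B58` are hypothetical names, and the real change lives in base_encode/base_decode with their own signatures.

```python
# Sketch of leading-zero preservation; names are hypothetical, not codext's API.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"  # bitcoin alphabet

def encode_b58(data: bytes, charset: str = B58) -> str:
    stripped = data.lstrip(b"\x00")
    zeros = len(data) - len(stripped)          # count of leading 0x00 bytes
    n = int.from_bytes(stripped, "big")
    out = ""
    while n > 0:
        n, r = divmod(n, len(charset))
        out = charset[r] + out
    return charset[0] * zeros + out            # one charset[0] per leading zero

def decode_b58(text: str, charset: str = B58) -> bytes:
    stripped = text.lstrip(charset[0])
    zeros = len(text) - len(stripped)          # count of leading charset[0] chars
    n = 0
    for ch in stripped:
        n = n * len(charset) + charset.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * zeros + body              # one 0x00 per leading charset[0]

for sample in (b"\x00abc", b"\x00", b"\x00\x00\x01", b"\x01\x00", b""):
    assert decode_b58(encode_b58(sample)) == sample
print(encode_b58(b"\x00abc"), encode_b58(b"\x01\x00"))
```

Only the leading run is handled specially; zeros that appear after the first non-zero byte are already represented in the integer and need no extra treatment.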

Verified against the base58 reference library: 0 mismatches and 0 round-trip failures across random inputs (every leading-zero input failed before).

Test

Extended test_codec_base58 in tests/test_base.py with leading-null-byte encode/decode/round-trip assertions (str and bytes paths). Verified red→green: the test fails without the source change (AssertionError) and passes with it; the full test suite stays green (103 passed).


Disclosure: I use AI assistance (under my direction) for my contributions; I review and verify every change before submitting.

The generic base_encode/base_decode convert the whole input to a single
integer, so leading null bytes (high-order zeros) were silently lost: e.g.
Base58 encoded b'\x00abc' to 'ZiCa' instead of '1ZiCa', and b'\x00' to an
empty string. Per the Base58 spec each leading 0x00 byte maps to a leading
charset[0] character. Preserve the leading-zero count on encode and restore
it on decode, so values round-trip and match reference implementations.