tscore: optional simdutf path for ats_base64 encode/decode#13166

Draft
phongn wants to merge 1 commit into apache:master from phongn:simdutf-base64

Collaborator

@phongn phongn commented May 14, 2026

Summary

Wire simdutf in as an opt-in SIMD backend for ats_base64_encode and ats_base64_decode (also exposed via the TSBase64Encode / TSBase64Decode plugin API). On the AVX2 hardware benchmarked below, the hybrid path is about 3x to 5x faster than the scalar path at a few hundred bytes and about 10x to 13x faster at 4 to 5 KB; behavior is preserved for every in-tree caller.

How it's wired

  • auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf) — default AUTO, same shape as HWLOC / UNWIND. Builds without simdutf installed are unaffected and fall back to the scalar path.
  • src/tscore/ink_base64.cc becomes a thin hybrid wrapper: scalar helpers in an anonymous namespace (always compiled), simdutf used only when inBufferSize exceeds an empirically chosen per-direction threshold. Tiny-input cases (e.g. the 8-byte SnowflakeID encode) stay on the scalar path to avoid simdutf's per-call dispatch overhead.
  • include/tscore/ink_config.h.cmake.in gains #cmakedefine01 TS_USE_SIMDUTF.

Performance (Xeon E5-2683 v4, AVX2)

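| Op | Size | Scalar only (ns) | simdutf only (ns) | Hybrid, this PR (ns) |
|--------|------------|------|------|------|
| encode | 8 B | 15.7 | 25.5 | 16.8 |
| encode | 32 B | 45.8 | 29.5 | 30.7 |
| encode | 200 B | 256 | 47.9 | 50.2 |
| encode | 4096 B | 5128 | 525 | 534 |
| decode | 12 B b64 | 21.8 | 66.5 | 22.5 |
| decode | 44 B b64 | 70.8 | 84.3 | 68.4 |
| decode | 268 B b64 | 385 | 94.1 | 113 |
| decode | 5464 B b64 | 7295 | 583 | 572 |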

Behavior

Both paths preserve the existing public contract:

  • Encode: standard +/= alphabet, no line breaks, trailing NUL written at outBuffer[length].
  • Decode: accepts both +/ and -_ in the same input, tolerates missing padding, truncates silently on invalid characters, trailing NUL written.
  • In-place decode (used by plugins/experimental/magick) is preserved.

One behavioral delta when the simdutf path is taken: simdutf silently skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the scalar path stops at the first whitespace byte. None of the in-tree callers feed whitespace to these functions; flagged in the file's header comment.

Test plan

  • Catch2 microbench tools/benchmark/benchmark_ink_base64 covers both correctness and performance, and locks the byte-exact fixture from InkAPITest.cc::SDK_API_ENCODING as a regression test.
  • 46 correctness assertions pass with ENABLE_SIMDUTF=AUTO (hybrid) and ENABLE_SIMDUTF=OFF (scalar-only).
  • cmake --build build -t format clean.
  • Jenkins CI green.
  • Manual smoke of traffic_server against a workload exercising OCSP stapling and the S3 origin_server_auth plugin (encode hot paths).

Notes for reviewers

  • Thresholds (BASE64_ENCODE_SIMD_THRESHOLD=24, BASE64_DECODE_SIMD_THRESHOLD=48) were chosen from the benchmark data and documented in the file. The crossover shifts on different cores but the thresholds are robust within an order of magnitude.
  • The scalar decoder contains a latent out-of-bounds read when inBufferSize is 1 or 2 (the existing inBuffer[-2] access in the trailing-bytes adjustment). I preserved this rather than smuggle in a behavior change. Worth a follow-up issue but out of scope here.
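As a sanity check on the thresholds, the benchmark rows from the Performance section can be scanned for the interval in which simdutf-only starts to beat scalar-only. The sketch below is not part of the PR; `BenchRow`, `Bracket`, and `crossover_bracket` are hypothetical names, and the numbers are copied from the table (in ns, rows sorted by size).

```cpp
#include <cstddef>
#include <vector>

struct BenchRow {
  std::size_t size; // input size in bytes
  double scalar_ns; // "Scalar only" column
  double simdutf_ns; // "simdutf only" column
};

struct Bracket {
  std::size_t last_scalar_win;   // largest measured size where scalar is still faster
  std::size_t first_simdutf_win; // smallest measured size where simdutf is faster (0 if none)
};

// Scan rows (assumed sorted by ascending size) for the crossover interval.
Bracket
crossover_bracket(const std::vector<BenchRow> &rows)
{
  Bracket b{0, 0};
  for (const BenchRow &r : rows) {
    if (r.simdutf_ns < r.scalar_ns) {
      b.first_simdutf_win = r.size;
      break;
    }
    b.last_scalar_win = r.size;
  }
  return b;
}

// Rows copied from the Performance table.
const std::vector<BenchRow> encode_rows = {
  {8,    15.7, 25.5},
  {32,   45.8, 29.5},
  {200,  256,  47.9},
  {4096, 5128, 525 },
};
const std::vector<BenchRow> decode_rows = {
  {12,   21.8, 66.5},
  {44,   70.8, 84.3},
  {268,  385,  94.1},
  {5464, 7295, 583 },
};
```

On these rows the encode crossover falls between 8 and 32 bytes and the decode crossover between 44 and 268 bytes of base64 text, so the chosen thresholds of 24 and 48 both sit inside the measured brackets.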

🤖 Generated with Claude Code

The hand-rolled base64 implementation in ink_base64.cc is a measurable
hotspot in places that encode or decode larger payloads (OCSP DER
requests, S3 auth HMACs, signed URL segments). simdutf provides
SIMD-accelerated kernels that run roughly an order of magnitude faster
on medium-and-larger inputs on AVX2/AVX-512 hardware.

Wire simdutf in as an opt-in dependency through the existing
auto_option machinery (ENABLE_SIMDUTF, default AUTO). When the package
is available, the wrapper dispatches to simdutf for inputs above an
empirically chosen threshold and keeps the scalar path for smaller
inputs, where simdutf's per-call overhead would otherwise be a
regression (notably the 8-byte SnowflakeID encode).

Both paths preserve the existing public contract: standard '+/=' encode
alphabet, accepts both '+/' and '-_' on decode in the same call,
tolerates missing padding, truncates silently on invalid input, and
always writes a trailing NUL. A new microbenchmark under tools/benchmark
locks the InkAPITest SDK_API_ENCODING fixture as a regression test and
provides the throughput numbers used to choose the thresholds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>