tscore: optional simdutf path for ats_base64 encode/decode #13166
Draft
phongn wants to merge 1 commit into
The hand-rolled base64 implementation in ink_base64.cc is a measurable hotspot in places that encode or decode larger payloads (OCSP DER requests, S3 auth HMACs, signed URL segments). simdutf provides SIMD-accelerated kernels that run roughly an order of magnitude faster on medium-and-larger inputs on AVX2/AVX-512 hardware.

Wire simdutf in as an opt-in dependency through the existing auto_option machinery (ENABLE_SIMDUTF, default AUTO). When the package is available, the wrapper dispatches to simdutf for inputs above an empirically chosen threshold and keeps the scalar path for smaller inputs, where simdutf's per-call overhead would otherwise be a regression (notably the 8-byte SnowflakeID encode).

Both paths preserve the existing public contract: standard '+/=' encode alphabet, accepts both '+/' and '-_' on decode in the same call, tolerates missing padding, truncates silently on invalid input, and always writes a trailing NUL.

A new microbenchmark under tools/benchmark locks the InkAPITest SDK_API_ENCODING fixture as a regression test and provides the throughput numbers used to choose the thresholds.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Wire simdutf in as an opt-in SIMD backend for `ats_base64_encode` and `ats_base64_decode` (also exposed via the `TSBase64Encode`/`TSBase64Decode` plugin API). Roughly an order-of-magnitude speedup on medium and larger inputs on AVX2 hardware; behavior-preserving for every in-tree caller.

How it's wired
- `auto_option(SIMDUTF FEATURE_VAR TS_USE_SIMDUTF PACKAGE_DEPENDS simdutf)`: default `AUTO`, same shape as `HWLOC`/`UNWIND`. Builds without simdutf installed are unaffected and fall back to the scalar path.
- `src/tscore/ink_base64.cc` becomes a thin hybrid wrapper: scalar helpers in an anonymous namespace (always compiled), simdutf used only when `inBufferSize` exceeds an empirically chosen per-direction threshold. Tiny-input cases (e.g. the 8-byte `SnowflakeID` encode) stay on the scalar path to avoid simdutf's per-call dispatch overhead.
- `include/tscore/ink_config.h.cmake.in` gains `#cmakedefine01 TS_USE_SIMDUTF`.

Performance (Xeon E5-2683 v4, AVX2)
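The speedup rests on the size-based dispatch described above: inputs at or below a threshold stay on the scalar path, larger ones go to simdutf. Below is a minimal C++ sketch of that idea, not the code in `src/tscore/ink_base64.cc`. The namespace `dispatch_sketch`, the helpers `use_simd_encode`, `scalar_encode` and `hybrid_encode`, and the `std::string` interface are hypothetical names chosen for illustration. The threshold value 24 is the one quoted in this PR. The `simdutf::binary_to_base64` branch is compiled only when `TS_USE_SIMDUTF` is nonzero, and its use here is an assumption about how such a wrapper might call the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// In the PR this macro comes from ink_config.h via #cmakedefine01. It is
// defaulted to 0 here (an assumption) so the sketch builds without simdutf.
#ifndef TS_USE_SIMDUTF
#define TS_USE_SIMDUTF 0
#endif

#if TS_USE_SIMDUTF
#include <simdutf.h>
#endif

namespace dispatch_sketch
{
// Threshold value quoted in the PR's notes for reviewers.
constexpr std::size_t BASE64_ENCODE_SIMD_THRESHOLD = 24;

// Hypothetical decision helper: take the SIMD path only when it is compiled
// in AND the input is strictly larger than the threshold.
constexpr bool
use_simd_encode(std::size_t in_size, bool simd_compiled_in = (TS_USE_SIMDUTF != 0))
{
  return simd_compiled_in && in_size > BASE64_ENCODE_SIMD_THRESHOLD;
}

// Plain scalar encoder: standard '+/' alphabet, '=' padding, no line breaks.
inline std::string
scalar_encode(std::string_view in)
{
  static constexpr char tbl[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  auto byte = [&](std::size_t k) { return static_cast<unsigned>(static_cast<unsigned char>(in[k])); };

  std::string out;
  out.reserve((in.size() + 2) / 3 * 4);
  std::size_t i = 0;
  for (; i + 2 < in.size(); i += 3) { // full 3-byte groups
    unsigned v = byte(i) << 16 | byte(i + 1) << 8 | byte(i + 2);
    out += tbl[v >> 18];
    out += tbl[(v >> 12) & 63];
    out += tbl[(v >> 6) & 63];
    out += tbl[v & 63];
  }
  std::size_t rem = in.size() - i;
  if (rem == 1) { // one trailing byte: two symbols plus "=="
    unsigned v = byte(i) << 16;
    out += tbl[v >> 18];
    out += tbl[(v >> 12) & 63];
    out += "==";
  } else if (rem == 2) { // two trailing bytes: three symbols plus "="
    unsigned v = byte(i) << 16 | byte(i + 1) << 8;
    out += tbl[v >> 18];
    out += tbl[(v >> 12) & 63];
    out += tbl[(v >> 6) & 63];
    out += '=';
  }
  assert(out.size() == (in.size() + 2) / 3 * 4); // padded length invariant
  return out;
}

// Hybrid entry point: SIMD for large inputs when available, scalar otherwise.
inline std::string
hybrid_encode(std::string_view in)
{
#if TS_USE_SIMDUTF
  if (use_simd_encode(in.size())) {
    std::string out(simdutf::base64_length_from_binary(in.size()), '\0');
    std::size_t n = simdutf::binary_to_base64(in.data(), in.size(), out.data());
    out.resize(n);
    return out;
  }
#endif
  return scalar_encode(in);
}
} // namespace dispatch_sketch
```

With `TS_USE_SIMDUTF` left at 0, `hybrid_encode` always takes the scalar branch, which mirrors how builds without simdutf fall back. An 8-byte input stays scalar even when simdutf is compiled in, because 8 is below the threshold.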
Behavior
Both paths preserve the existing public contract:
- Encode: standard `+/=` alphabet, no line breaks, trailing NUL written at `outBuffer[length]`.
- Decode: accepts both `+/` and `-_` in the same input, tolerates missing padding, truncates silently on invalid characters, trailing NUL written.
- […] `plugins/experimental/magick`) is preserved.

One behavioral delta when the simdutf path is taken: simdutf silently skips ASCII whitespace (space, tab, CR, LF, FF) inside the input, whereas the scalar path stops at the first whitespace byte. None of the in-tree callers feed whitespace to these functions; this is flagged in the file's header comment.
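The decode side of this contract can be illustrated in a few lines of C++. This is a sketch written for this description, not the scalar helper in `ink_base64.cc`; the namespace `decode_sketch` and the functions `sextet`, `decode` and `decode_to_string` are hypothetical. It treats `=` like any other byte outside the two alphabets and simply stops there, which is one way to get both "tolerates missing padding" and "truncates silently"; whether the real code is structured this way is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

namespace decode_sketch
{
// Map one character to its 6-bit value, accepting both the standard ('+', '/')
// and URL-safe ('-', '_') alphabets. Returns -1 for anything else, including
// '=' and whitespace.
inline int
sextet(unsigned char c)
{
  if (c >= 'A' && c <= 'Z') return c - 'A';
  if (c >= 'a' && c <= 'z') return c - 'a' + 26;
  if (c >= '0' && c <= '9') return c - '0' + 52;
  if (c == '+' || c == '-') return 62;
  if (c == '/' || c == '_') return 63;
  return -1;
}

// Decode until the first byte outside both alphabets, then NUL-terminate.
// Returns the number of decoded bytes (the NUL is not counted).
inline std::size_t
decode(const char *in, std::size_t in_len, unsigned char *out, std::size_t out_size)
{
  // Room for every possible decoded byte plus the trailing NUL.
  assert(out != nullptr && out_size > in_len * 3 / 4);
  unsigned    acc  = 0; // bit accumulator, never holds more than 12 live bits
  int         bits = 0; // number of live bits in acc
  std::size_t n    = 0;
  for (std::size_t i = 0; i < in_len; ++i) {
    int v = sextet(static_cast<unsigned char>(in[i]));
    if (v < 0) {
      break; // padding, whitespace or garbage: stop silently
    }
    acc   = ((acc << 6) | static_cast<unsigned>(v)) & 0xFFFu;
    bits += 6;
    if (bits >= 8) {
      bits    -= 8;
      out[n++] = static_cast<unsigned char>((acc >> bits) & 0xFFu);
    }
  }
  out[n] = '\0'; // trailing NUL is always written
  return n;
}

// Convenience wrapper for experiments and tests.
inline std::string
decode_to_string(std::string_view in)
{
  std::string out(in.size() * 3 / 4 + 1, '\0');
  std::size_t n = decode(in.data(), in.size(), reinterpret_cast<unsigned char *>(out.data()), out.size());
  out.resize(n);
  return out;
}
} // namespace decode_sketch
```

Because this sketch stops at the first non-alphabet byte, an input such as `"Zm9v YmFy"` yields only `"foo"`. That is the scalar behavior described above, and it is exactly where a whitespace-skipping decoder would produce a different result.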
Test plan
- `tools/benchmark/benchmark_ink_base64` covers both correctness and performance. Locks the byte-exact fixture from `InkAPITest.cc::SDK_API_ENCODING` as a regression test.
- `ENABLE_SIMDUTF=AUTO` (hybrid) and `ENABLE_SIMDUTF=OFF` (scalar-only).
- `cmake --build build -t format` clean.
- `traffic_server` against a workload exercising OCSP stapling and the S3 `origin_server_auth` plugin (encode hot paths).

Notes for reviewers
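For reviewers who want to see what a byte-exact fixture check of the kind listed in the test plan looks like, here is a minimal C++ sketch. It does not reproduce the `SDK_API_ENCODING` fixture or the code in `tools/benchmark`; as a stand-in it uses the base64 test vectors from RFC 4648, and the names `fixture_sketch`, `Case`, `cases`, `count_encode_mismatches` and `require_fixture_ok` are hypothetical. The encoder under test is passed in as a callable so the same table can be run against a scalar-only and a hybrid build.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

namespace fixture_sketch
{
// One locked input/output pair.
struct Case {
  std::string_view plain;
  std::string_view encoded;
};

// Stand-in fixture: the base64 test vectors from RFC 4648, section 10.
inline const std::vector<Case> &
cases()
{
  static const std::vector<Case> v = {
    {"",       ""        },
    {"f",      "Zg=="    },
    {"fo",     "Zm8="    },
    {"foo",    "Zm9v"    },
    {"foob",   "Zm9vYg=="},
    {"fooba",  "Zm9vYmE="},
    {"foobar", "Zm9vYmFy"},
  };
  return v;
}

// Run every case through `encode` (any callable taking a std::string_view and
// returning std::string) and count outputs that are not byte-identical.
template <typename EncodeFn>
std::size_t
count_encode_mismatches(EncodeFn &&encode)
{
  std::size_t bad = 0;
  for (const Case &c : cases()) {
    std::string got = encode(c.plain);
    if (got != c.encoded) {
      ++bad;
    }
  }
  return bad;
}

// Regression-test style wrapper: aborts (in debug builds) on any mismatch.
template <typename EncodeFn>
void
require_fixture_ok(EncodeFn &&encode)
{
  std::size_t bad = count_encode_mismatches(encode);
  assert(bad == 0);
  (void)bad; // silence the unused-variable warning when NDEBUG is defined
}
} // namespace fixture_sketch
```

Running the same table once per build configuration is a cheap way to confirm that both paths stay byte-identical on known inputs before looking at throughput numbers.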
- Thresholds (`BASE64_ENCODE_SIMD_THRESHOLD=24`, `BASE64_DECODE_SIMD_THRESHOLD=48`) were chosen from the benchmark data and are documented in the file. The crossover shifts on different cores, but the thresholds are robust within an order of magnitude.
- […] `inBufferSize` is 1 or 2 (the existing `inBuffer[-2]` access in the trailing-bytes adjustment). I preserved this rather than smuggle in a behavior change. Worth a follow-up issue but out of scope here.

🤖 Generated with Claude Code