feat(api): redactSecrets util for LLM input from observability data by alex-fedotyev · Pull Request #2188 · hyperdxio/hyperdx

alex-fedotyev · 2026-05-04T22:53:10Z

Summary

Adds a reusable best-effort secret redactor at packages/api/src/utils/redactSecrets.ts. Internal-only utility; no consumer in this PR. The next AI-summarize PR (HDX-3992 split) imports it; future LLM endpoints that ingest observability data should also.

The file header codifies the design rule for HyperDX AI endpoints:

Any LLM input derived from observability data passes through redactSecrets before leaving the API process. User-authored prose (the chart-builder assistant where the user types their own question) does NOT, because redacting the user's own input would strip exactly what they meant to ask.

Patterns covered

name	shape
`pem`	`-----BEGIN ... PRIVATE KEY-----` blocks (RSA, EC, DSA, OPENSSH, PKCS#8)
`basic-auth-url`	`https://user:pass@host`
`key-value`	`password=...`, `api_key=...`, `token=...`, etc.
`json-quoted`	`{"password":"..."}` and similar
`http-header`	`X-Api-Key:`, `X-Auth-Token:`, `Api-Key:`
`bearer`	`Authorization: Bearer ...`
`basic`	`Authorization: Basic ...`
`jwt`	`eyJ...` three-segment base64url
`aws-access-key`	`AKIA[16]`, `ASIA[16]`
`slack-token`	`xox[a-z]-...`
`github-token`	`ghp_`, `gho_`, `ghu_`, `ghs_`, `ghr_`

Known gaps (deferred to follow-ups)

URL percent-encoded values (beyond what query-string parsing already catches)
Vendor-specific tokens not listed above (Stripe, Twilio, Datadog)
Generic high-entropy hex blobs (too many false positives without surrounding context)

Why this is its own PR

Splits cleanly from the larger AI summarize work (HDX-3992 / #2108). Lands as a small, isolated, test-heavy change so review is fast and the util is in place before downstream consumers arrive.

Tests

38 cases in packages/api/src/utils/__tests__/redactSecrets.test.ts covering: each pattern with a positive case, a "looks similar but isn't" negative, and at least one multi-secret payload. Pattern-coverage assertion exposes the registry shape so future additions get a compile-time signal.

yarn jest src/utils/__tests__/redactSecrets.test.ts
# Test Suites: 1 passed, 1 total
# Tests:       38 passed, 38 total

All neighboring api utils tests still pass (8 suites, 61 tests).

No user-facing change

The util is internal API code with no production consumer in this PR. No changeset.

References

Linear: HDX-3992
Related (do-not-merge while we split): feat: AI summarize with extensible subjects, trace context, and security hardening #2108

Adds a reusable best-effort secret redactor with conservative allowlist patterns covering: PEM blocks, basic-auth URLs, key=value pairs, JSON-shaped secrets, HTTP secret headers, Bearer/Basic auth values, JWTs, AWS access keys, Slack tokens, and GitHub token shapes. Codifies the design rule for HyperDX AI endpoints in the file header: LLM input derived from observability data passes through redactSecrets; user-authored prose does not. Internal-only; no consumer in this commit. Imported by the upcoming /ai/summarize endpoint and any future LLM endpoints that ingest observability data. Refs HDX-3992.

vercel · 2026-05-04T22:53:18Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
hyperdx-oss	Ready	Preview, Comment	May 5, 2026 4:34am

changeset-bot · 2026-05-04T22:53:19Z

🦋 Changeset detected

Latest commit: 0829acd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages

Name	Type
@hyperdx/api	Patch
@hyperdx/app	Patch
@hyperdx/otel-collector	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

github-actions · 2026-05-04T22:53:37Z

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats

Production files changed: 1
Production lines changed: 144 (+ 359 in test files, excluded from tier calculation)
Branch: alex/HDX-3992-redact-secrets
Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

github-actions · 2026-05-04T22:55:50Z

PR Review

⚠️ auth in SECRET_KEY_TOKENS will redact auth=success, auth=failed, auth=pending etc. which are common non-secret log values → Consider removing auth from the list or making it require a longer suffix (e.g. require at least 8 non-word chars in the value, or use \bauth_?key\b instead of bare \bauth)
⚠️ Module-level PATTERNS array holds stateful RegExp objects (compiled with g flag). Currently safe because only String.prototype.replace() is used (which resets lastIndex), but any future caller adding .test() or .exec() on these pattern objects will see incorrect results due to stale lastIndex → Worth a short one-line comment on the exported array warning against direct regex use, or compile patterns fresh inside redactSecrets() (minor perf cost but eliminates the footgun entirely)
ℹ️ No input length guard — 11 sequential regex passes over a very large payload (e.g. a multi-MB span blob) could be slow. Not a blocker for this PR since the consumer isn't wired yet, but worth adding a max-length check (e.g. early return or truncation) before the downstream AI-summarize PR lands.

✅ No critical security bugs. PEM bounded lazy quantifier ({0,16000}?) correctly prevents ReDoS on unmatched BEGIN blocks, and the test verifies this explicitly. Test coverage is thorough.

github-actions · 2026-05-04T23:01:07Z

E2E Test Results

✅ All tests passed • 161 passed • 3 skipped • 1143s

Status	Count
✅ Passed	161
❌ Failed	0
⚠️ Flaky	2
⏭️ Skipped	3

Tests ran across 4 shards in parallel.

View full report →

Address review comments on #2188: - basic-auth-url now handles "@" in passwords. Previous regex stopped at the first "@", leaving any password tail before the host visible. New regex greedily consumes the password and backtracks to the last "@" before the host; host is captured and preserved in the replacement. New test: a password containing "@" must be fully redacted, with the host intact. - key-value pattern now matches shell-style quoted values: PASSWORD="hunter2 with spaces" and API_KEY='abc 123' are redacted. Previously the unquoted character class stopped at the leading quote, so neither pattern fired. Two new tests cover both quote styles. - pem pattern is bounded by {0,16000}? on the lazy match so an unmatched BEGIN does not scan an unbounded amount of trailing input. Real PEM blocks are well under 16KB; the API caps the whole request body at 50KB. New test asserts unchanged output and sub-500ms wall-clock on a 50KB unmatched-BEGIN payload. - Header "Known gaps" comment now mentions raw "@" in basic-auth usernames (ambiguous to parse without percent-encoding). 44 tests pass; eight new cases for the items above. No changes to the public surface. Refs HDX-3992.

alex-fedotyev · 2026-05-04T23:24:42Z

Thanks for the review. Pushed fixes in 9753dc1.

⚠️ basic-auth-url with @ in password — fixed. The pattern now greedily consumes the password and backtracks to the last @ before the host; the host is captured and kept in the replacement. The previous test was too weak (it asserted .toContain('[REDACTED]:[REDACTED]') and .not.toContain('svc:p@ss'), both of which passed even though ss was leaking). Replaced with a stricter assertion: a password containing @ must be fully gone and the host must be intact. Good catch.
⚠️ key-value misses quoted values — fixed. The value group now alternates over "…", '…', and the unquoted class, so PASSWORD="hunter2" and API_KEY='abc 123' both get redacted. Two new tests.
ℹ️ PEM lazy match unbounded — fixed. The [\s\S]*? is now bounded by {0,16000}?. Real PEM blocks are well under 16KB; the API also caps the whole request body at 50KB so this is purely defensive. New test asserts unchanged output and sub-500ms wall-clock on a 50KB unmatched-BEGIN payload.
Also added a note to the "Known gaps" header about raw @ in basic-auth usernames (ambiguous without percent-encoding).

44 tests pass; eight new cases.

The previous review-fix commit pushed prod lines from 139 to 153, just over the Tier 2 threshold (< 150 prod lines). Compressing the verbose comments on PEM, basic-auth-url, and key-value patterns brings prod back to 144. No behavior change.

Co-Authored-By: Claude Opus <model> <noreply@anthropic.com>

github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 4, 2026

vercel Bot deployed to Preview May 4, 2026 22:55 View deployment

github-actions Bot added review/tier-3 Standard — full human review required and removed review/tier-2 Low risk — AI review + quick human skim labels May 4, 2026

vercel Bot deployed to Preview May 4, 2026 23:27 View deployment

github-actions Bot added review/tier-2 Low risk — AI review + quick human skim and removed review/tier-3 Standard — full human review required labels May 4, 2026

vercel Bot deployed to Preview May 4, 2026 23:41 View deployment

alex-fedotyev marked this pull request as draft May 5, 2026 03:13

chore: add changeset for redactSecrets utility

0829acd

Co-Authored-By: Claude Opus <model> <noreply@anthropic.com>

alex-fedotyev marked this pull request as ready for review May 5, 2026 04:32

vercel Bot deployed to Preview May 5, 2026 04:34 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(api): redactSecrets util for LLM input from observability data#2188

feat(api): redactSecrets util for LLM input from observability data#2188
alex-fedotyev wants to merge 4 commits intomainfrom
alex/HDX-3992-redact-secrets

alex-fedotyev commented May 4, 2026

Uh oh!

vercel Bot commented May 4, 2026 •

edited

Loading

Uh oh!

changeset-bot Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 4, 2026 •

edited

Loading

Uh oh!

alex-fedotyev commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

alex-fedotyev commented May 4, 2026

Summary

Patterns covered

Known gaps (deferred to follow-ups)

Why this is its own PR

Tests

No user-facing change

References

Uh oh!

vercel Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔵 Tier 2 — Low Risk

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review

Uh oh!

github-actions Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

E2E Test Results

Uh oh!

alex-fedotyev commented May 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented May 4, 2026 •

edited

Loading

changeset-bot Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading

github-actions Bot commented May 4, 2026 •

edited

Loading