Skip to content

feat(api): redactSecrets util for LLM input from observability data#2188

Open
alex-fedotyev wants to merge 4 commits intomainfrom
alex/HDX-3992-redact-secrets
Open

feat(api): redactSecrets util for LLM input from observability data#2188
alex-fedotyev wants to merge 4 commits intomainfrom
alex/HDX-3992-redact-secrets

Conversation

@alex-fedotyev
Copy link
Copy Markdown
Contributor

Summary

Adds a reusable best-effort secret redactor at packages/api/src/utils/redactSecrets.ts. Internal-only utility; no consumer in this PR. The next AI-summarize PR (HDX-3992 split) imports it; future LLM endpoints that ingest observability data should also.

The file header codifies the design rule for HyperDX AI endpoints:

Any LLM input derived from observability data passes through redactSecrets before leaving the API process. User-authored prose (the chart-builder assistant where the user types their own question) does NOT, because redacting the user's own input would strip exactly what they meant to ask.

Patterns covered

name shape
pem -----BEGIN ... PRIVATE KEY----- blocks (RSA, EC, DSA, OPENSSH, PKCS#8)
basic-auth-url https://user:pass@host
key-value password=..., api_key=..., token=..., etc.
json-quoted {"password":"..."} and similar
http-header X-Api-Key:, X-Auth-Token:, Api-Key:
bearer Authorization: Bearer ...
basic Authorization: Basic ...
jwt eyJ... three-segment base64url
aws-access-key AKIA[16], ASIA[16]
slack-token xox[a-z]-...
github-token ghp_, gho_, ghu_, ghs_, ghr_

Known gaps (deferred to follow-ups)

  • URL percent-encoded values (beyond what query-string parsing already catches)
  • Vendor-specific tokens not listed above (Stripe, Twilio, Datadog)
  • Generic high-entropy hex blobs (too many false positives without surrounding context)

Why this is its own PR

Splits cleanly from the larger AI summarize work (HDX-3992 / #2108). Lands as a small, isolated, test-heavy change so review is fast and the util is in place before downstream consumers arrive.

Tests

38 cases in packages/api/src/utils/__tests__/redactSecrets.test.ts covering: each pattern with a positive case, a "looks similar but isn't" negative, and at least one multi-secret payload. Pattern-coverage assertion exposes the registry shape so future additions get a compile-time signal.

yarn jest src/utils/__tests__/redactSecrets.test.ts
# Test Suites: 1 passed, 1 total
# Tests:       38 passed, 38 total

All neighboring api utils tests still pass (8 suites, 61 tests).

No user-facing change

The util is internal API code with no production consumer in this PR. No changeset.

References

Adds a reusable best-effort secret redactor with conservative
allowlist patterns covering: PEM blocks, basic-auth URLs, key=value
pairs, JSON-shaped secrets, HTTP secret headers, Bearer/Basic auth
values, JWTs, AWS access keys, Slack tokens, and GitHub token shapes.

Codifies the design rule for HyperDX AI endpoints in the file header:
LLM input derived from observability data passes through redactSecrets;
user-authored prose does not.

Internal-only; no consumer in this commit. Imported by the upcoming
/ai/summarize endpoint and any future LLM endpoints that ingest
observability data.

Refs HDX-3992.
@vercel
Copy link
Copy Markdown

vercel Bot commented May 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
hyperdx-oss Ready Ready Preview, Comment May 5, 2026 4:34am

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 4, 2026

🦋 Changeset detected

Latest commit: 0829acd

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 3 packages
Name Type
@hyperdx/api Patch
@hyperdx/app Patch
@hyperdx/otel-collector Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions github-actions Bot added the review/tier-2 Low risk — AI review + quick human skim label May 4, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

🔵 Tier 2 — Low Risk

Small, isolated change with no API route or data model modifications.

Why this tier:

  • Standard feature/fix — introduces new logic or modifies core functionality

Review process: AI review + quick human skim (target: 5–15 min). Reviewer validates AI assessment and checks for domain-specific concerns.
SLA: Resolve within 4 business hours.

Stats
  • Production files changed: 1
  • Production lines changed: 144 (+ 359 in test files, excluded from tier calculation)
  • Branch: alex/HDX-3992-redact-secrets
  • Author: alex-fedotyev

To override this classification, remove the review/tier-2 label and apply a different review/tier-* label. Manual overrides are preserved on subsequent pushes.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Review

  • ⚠️ auth in SECRET_KEY_TOKENS will redact auth=success, auth=failed, auth=pending etc. which are common non-secret log values → Consider removing auth from the list or making it require a longer suffix (e.g. require at least 8 non-word chars in the value, or use \bauth_?key\b instead of bare \bauth)

  • ⚠️ Module-level PATTERNS array holds stateful RegExp objects (compiled with g flag). Currently safe because only String.prototype.replace() is used (which resets lastIndex), but any future caller adding .test() or .exec() on these pattern objects will see incorrect results due to stale lastIndex → Worth a short one-line comment on the exported array warning against direct regex use, or compile patterns fresh inside redactSecrets() (minor perf cost but eliminates the footgun entirely)

  • ℹ️ No input length guard — 11 sequential regex passes over a very large payload (e.g. a multi-MB span blob) could be slow. Not a blocker for this PR since the consumer isn't wired yet, but worth adding a max-length check (e.g. early return or truncation) before the downstream AI-summarize PR lands.

✅ No critical security bugs. PEM bounded lazy quantifier ({0,16000}?) correctly prevents ReDoS on unmatched BEGIN blocks, and the test verifies this explicitly. Test coverage is thorough.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

E2E Test Results

All tests passed • 161 passed • 3 skipped • 1143s

Status Count
✅ Passed 161
❌ Failed 0
⚠️ Flaky 2
⏭️ Skipped 3

Tests ran across 4 shards in parallel.

View full report →

Address review comments on #2188:

- basic-auth-url now handles "@" in passwords. Previous regex stopped
  at the first "@", leaving any password tail before the host visible.
  New regex greedily consumes the password and backtracks to the last
  "@" before the host; host is captured and preserved in the
  replacement. New test: a password containing "@" must be fully
  redacted, with the host intact.

- key-value pattern now matches shell-style quoted values:
  PASSWORD="hunter2 with spaces" and API_KEY='abc 123' are redacted.
  Previously the unquoted character class stopped at the leading
  quote, so neither pattern fired. Two new tests cover both quote
  styles.

- pem pattern is bounded by {0,16000}? on the lazy match so an
  unmatched BEGIN does not scan an unbounded amount of trailing
  input. Real PEM blocks are well under 16KB; the API caps the whole
  request body at 50KB. New test asserts unchanged output and
  sub-500ms wall-clock on a 50KB unmatched-BEGIN payload.

- Header "Known gaps" comment now mentions raw "@" in basic-auth
  usernames (ambiguous to parse without percent-encoding).

44 tests pass; eight new cases for the items above. No changes to the
public surface. Refs HDX-3992.
@alex-fedotyev
Copy link
Copy Markdown
Contributor Author

Thanks for the review. Pushed fixes in 9753dc1.

  • ⚠️ basic-auth-url with @ in password — fixed. The pattern now greedily consumes the password and backtracks to the last @ before the host; the host is captured and kept in the replacement. The previous test was too weak (it asserted .toContain('[REDACTED]:[REDACTED]') and .not.toContain('svc:p@ss'), both of which passed even though ss was leaking). Replaced with a stricter assertion: a password containing @ must be fully gone and the host must be intact. Good catch.

  • ⚠️ key-value misses quoted values — fixed. The value group now alternates over "…", '…', and the unquoted class, so PASSWORD="hunter2" and API_KEY='abc 123' both get redacted. Two new tests.

  • ℹ️ PEM lazy match unbounded — fixed. The [\s\S]*? is now bounded by {0,16000}?. Real PEM blocks are well under 16KB; the API also caps the whole request body at 50KB so this is purely defensive. New test asserts unchanged output and sub-500ms wall-clock on a 50KB unmatched-BEGIN payload.

  • Also added a note to the "Known gaps" header about raw @ in basic-auth usernames (ambiguous without percent-encoding).

44 tests pass; eight new cases.

@github-actions github-actions Bot added review/tier-3 Standard — full human review required and removed review/tier-2 Low risk — AI review + quick human skim labels May 4, 2026
The previous review-fix commit pushed prod lines from 139 to 153, just
over the Tier 2 threshold (< 150 prod lines). Compressing the verbose
comments on PEM, basic-auth-url, and key-value patterns brings prod
back to 144. No behavior change.
@github-actions github-actions Bot added review/tier-2 Low risk — AI review + quick human skim and removed review/tier-3 Standard — full human review required labels May 4, 2026
@alex-fedotyev alex-fedotyev marked this pull request as draft May 5, 2026 03:13
Co-Authored-By: Claude Opus <model> <noreply@anthropic.com>
@alex-fedotyev alex-fedotyev marked this pull request as ready for review May 5, 2026 04:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/tier-2 Low risk — AI review + quick human skim

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant