Sensitive data (credentials, PII, PHI, and other private information) ends up in logs more often than it should.
data-sanitization masks or removes sensitive field values before they leave your application.
Use it in log pipelines, request handlers, and error reporters to catch what might otherwise slip through.
It matches field names across objects, arrays, and strings, and lets you extend the built-in defaults with your own patterns for PII, PHI, or any domain-specific fields.
const input = {
username: 'mark',
password: 'super-secret',
api_key: 'sk_live_abc123',
};
sanitizeData(input);
// => { username: 'mark', password: '**********', api_key: '**********' }

- Zero runtime dependencies, with compiled JS and full TypeScript declarations
- Sanitizes nested structures at any depth, preserving types and class instances
- Matches sensitive field names across any data shape without requiring exact path declarations
- Detects circular references and throws without leaking input; never silently returns partial data
- Sanitization errors never expose the original input payload
- Drop-in adapters for pino and winston via data-sanitization-log-providers
Tools like fast-redact and
pino's built-in redaction are excellent choices when
you control your data shape. They require you to declare the exact paths to
redact upfront (user.password, req.headers.authorization) and compile
those paths into a specialized function at initialization, achieving near-zero
overhead.
The tradeoff is that you must know the shape of your data ahead of time. That works well for application-level logging where you own the data models, but falls short when sanitizing third-party library payloads, error objects with arbitrary attached metadata, or log entries assembled from sources you don't control.
data-sanitization takes a pattern-based approach instead. A single
'password' entry matches password, db_password, resetPasswordToken,
and any other key containing that substring, at any depth, in any structure,
without path declarations. The cost is a small per-call overhead versus
path-based tools; the benefit is that it works on data whose shape you don't
fully know.
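The pattern-based approach can be pictured with a small standalone sketch. This is not the library's implementation; `maskByKey`, `isSensitiveKey`, and `MASK` are hypothetical names, and the sketch assumes plain JSON-like input only. It shows how one substring pattern reaches keys at any depth without path declarations.

```typescript
// Hypothetical sketch of substring-based key matching; not the library's code.
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

const MASK = '**********';

// True when the key contains any pattern as a case-insensitive substring.
function isSensitiveKey(key: string, patterns: string[]): boolean {
  const lowered = key.toLowerCase();
  return patterns.some((pattern) => lowered.includes(pattern.toLowerCase()));
}

// Walk the value recursively, masking whatever sits under a sensitive key.
function maskByKey(value: Json, patterns: string[]): Json {
  if (Array.isArray(value)) {
    return value.map((item) => maskByKey(item, patterns));
  }
  if (value !== null && typeof value === 'object') {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      out[key] = isSensitiveKey(key, patterns)
        ? MASK
        : maskByKey(value[key], patterns);
    }
    return out;
  }
  return value; // primitives under non-sensitive keys pass through
}

const masked = maskByKey(
  {
    user: { name: 'mark', resetPasswordToken: 'abc' },
    connections: [{ db_password: 'hunter2', host: 'localhost' }],
  },
  ['password'],
);
// resetPasswordToken and db_password are masked; name and host are untouched.
```

A path-based tool would need both `user.resetPasswordToken` and `connections[*].db_password` declared up front; here the single `'password'` entry covers both.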
If you control your data shape exactly and need maximum throughput, reach for
fast-redact. If you need to sanitize data you don't fully control,
data-sanitization is the right tool.
data-sanitization is a best-effort defensive layer, not a security boundary or compliance
proof.
It will miss sensitive data when:
- A field name is not covered by the configured patterns
- Values appear in unsupported serialization formats (binary blobs, protocol buffers, custom encoding)
- Sensitive content is embedded inside values in ways the configured matchers cannot recognize
- The input arrives in a form sanitizeData cannot introspect (encrypted payloads, opaque strings)
Masking also leaks that a field is present and sensitive. If minimizing that signal matters, use
removeMatches: true rather than the default mask.
Use data-sanitization to catch accidental leakage in logs and request payloads, not as a
substitute for access controls, network security, or data-handling policies.
data-sanitization-log-providers
is a companion package with pre-built adapters that wire data-sanitization directly into your
logging pipeline:
| Adapter | Import path | How it works |
|---|---|---|
| Pino hook | data-sanitization-log-providers/pino-hook | Registers a pino.hooks.logMethod hook that sanitizes arguments before they reach pino |
| Pino transport | data-sanitization-log-providers/pino-transport | A pino-abstract-transport stream you can pass to pino({ transport: ... }) |
| Winston transport | data-sanitization-log-providers/winston | A winston-transport subclass that sanitizes each log entry before forwarding it |
Install the companion package alongside your logger:
npm install data-sanitization-log-providers

See the data-sanitization-log-providers README for usage examples and configuration options.
- data-sanitization: protect credentials and personal data from accidental exposure
npm install data-sanitization

yarn add data-sanitization

pnpm add data-sanitization

bun add data-sanitization

// ES modules, named exports
import { sanitizeData, DataSanitizationError } from 'data-sanitization';

// ES modules, default export
import sanitizeData from 'data-sanitization';

// CommonJS
const { sanitizeData } = require('data-sanitization');

Utility helpers for log middleware are available on a separate subpath; see docs/utils.md.
import {
diffSanitizedFields,
buildSanitizedWarning,
} from 'data-sanitization/utils';

import { sanitizeData } from 'data-sanitization';
const input = {
username: 'mark',
password: 'super-secret',
api_key: 'sk_live_abc123',
};
const result = sanitizeData(input);
// => { username: 'mark', password: '**********', api_key: '**********' }

Pass a string directly and sanitizeData returns a sanitized copy of it. This is useful for sanitizing serialized data before logging, such as a raw request body, a form-encoded payload, or a JSON string you have not yet parsed:
sanitizeData('{"password":"secret","username":"mark"}');
// => '{"password":"**********","username":"mark"}'
sanitizeData('password=secret&username=mark');
// => 'password=**********&username=mark'

By default, valid JSON object and array strings are parsed first and sanitized the same way an object would be. This correctly handles all value types, including numeric-valued sensitive fields:
sanitizeData('{"password":12345,"username":"mark"}');
// => '{"password":9999999999,"username":"mark"}'

Non-JSON strings fall back to text-based pattern matching automatically.
Note
Output is re-serialized with JSON.stringify, which does not preserve
original whitespace or formatting. Set parseJsonStrings: false to use
text-based matching instead when formatting fidelity is required or when
the input is never JSON:
sanitizeData('{"password":12345,"username":"mark"}', {
parseJsonStrings: false,
});
// => '{"password":12345,"username":"mark"}' (numeric value not masked on regex path)

If the string cannot be parsed as JSON, sanitizeData silently falls back to
text-based pattern matching. If you need strict behavior (fail or redact on
parse failure), open an issue; this is tracked for a future release.
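Until a strict mode exists, one possible workaround is to check on the caller's side whether a string parses as a JSON object or array before handing it over. The sketch below is only an illustration: `parsesAsJsonContainer` is a hypothetical helper, not part of data-sanitization, and it assumes the library's notion of a "valid JSON object or array string" agrees with `JSON.parse`.

```typescript
// Hypothetical caller-side pre-check; not part of data-sanitization.
function parsesAsJsonContainer(input: string): boolean {
  try {
    const parsed: unknown = JSON.parse(input);
    // Only objects and arrays count; bare strings, numbers, and null do not.
    return typeof parsed === 'object' && parsed !== null;
  } catch {
    return false; // JSON.parse throws a SyntaxError on malformed input
  }
}

const jsonBody = parsesAsJsonContainer('{"password":"secret"}'); // true
const formBody = parsesAsJsonContainer('password=secret&username=mark'); // false
const truncated = parsesAsJsonContainer('{"password":"sec'); // false
const bareNumber = parsesAsJsonContainer('12345'); // false
```

A caller that expects JSON could reject or fully redact the payload when the check returns false instead of relying on the text-based fallback.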
sanitizeData(
{ password: 'secret', token: 'abc', username: 'mark' },
{ removeMatches: true },
);
// => { username: 'mark' }

Use the exported piiPatterns and phiPatterns constants, or build your own
list, and pass them via customPatterns.
import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';
const patient = {
accountId: 'acct_123',
full_name: 'Avery Example',
email: 'avery@example.com',
phone: '+1-555-0100',
date_of_birth: '1989-04-12',
health_card: 'HC-1234-5678',
medications: ['example-medication'],
};
sanitizeData(patient, {
customPatterns: [...piiPatterns, ...phiPatterns],
useDefaultPatterns: false,
});
// => {
// accountId: 'acct_123',
// full_name: '**********',
// email: '**********',
// phone: '**********',
// date_of_birth: '**********',
// health_card: '**********',
// medications: '**********',
// }

Use removeMatches with the same patterns to remove those fields instead of
masking them.
sanitizeData(patient, {
customPatterns: [...piiPatterns, ...phiPatterns],
removeMatches: true,
useDefaultPatterns: false,
});
// => { accountId: 'acct_123' }

No configuration needed. Out of the box, sanitizeData covers common credential fields and HTTP
authentication headers:
sanitizeData({ password: 'secret', token: 'abc', username: 'mark' });
// => { password: '**********', token: '**********', username: 'mark' }

import { sanitizeData, piiPatterns } from 'data-sanitization';
sanitizeData(data, { customPatterns: piiPatterns });

import { sanitizeData, phiPatterns } from 'data-sanitization';
sanitizeData(data, { customPatterns: phiPatterns });

Masking reveals that a field is sensitive. Removal is more appropriate when field presence itself must not appear in logs:
import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';
sanitizeData(data, {
customPatterns: [...piiPatterns, ...phiPatterns],
removeMatches: true,
});

Enable sanitizeCollections: true to traverse Map and Set instances.
Each collection is sanitized and returned as a new instance; the original
is never mutated.
const session = new Map([
['token', 'abc123'],
['username', 'mark'],
]);
sanitizeData({ session }, { sanitizeCollections: true });
// => { session: Map { 'token' => '**********', 'username' => 'mark' } }

const tags = new Set(['api_key=hunter2', 'env=production']);
sanitizeData({ tags }, { sanitizeCollections: true });
// => { tags: Set { 'api_key=**********', 'env=production' } }

Tip
Map and Set are not JSON-serializable by default; JSON.stringify turns
both into {}. To include them in structured logs, convert them first:
// Map with string keys → plain object
JSON.stringify(Object.fromEntries(sanitizedMap));
// Map with mixed or object keys → entries array
JSON.stringify([...sanitizedMap.entries()]);
// Set → array
JSON.stringify([...sanitizedSet]);

| Option | Type | Default | Description |
|---|---|---|---|
| patternMask | string | ********** | String used to replace matched string field values |
| numericMask | number | 9999999999 | Number used to replace matched number field values |
| removeMatches | boolean | false | Remove matched fields entirely instead of masking |
| sanitizeCollections | boolean | false | Sanitize Map and Set instances by traversing their entries and returning a new sanitized copy. When false, these pass through unchanged like other non-plain object instances. |
| scanStringValues | boolean | true | Scan string values on non-sensitive keys for embedded patterns. Applies to object input and to string input parsed via parseJsonStrings; has no effect on raw string input. |
| parseJsonStrings | boolean | true | Parse valid JSON string inputs as structured data and sanitize by field name. Re-serializes with JSON.stringify, discarding whitespace. Set to false to use the regex path. |
| customPatterns | PatternEntry[] | [] | Additional field name patterns to match. Each entry is a pattern string (substring match) or { match: string; strict?: boolean } for an exact match. |
| customMatchers | DataSanitizationMatcher[] | [] | Additional regex matchers for custom string formats |
| useDefaultPatterns | boolean | true | Set to false to use only your custom patterns, ignoring the built-in defaults. |
| useDefaultMatchers | boolean | true | Set to false to use only your custom matchers, ignoring the built-in defaults. |
| ignorePatterns | string[] | [] | Patterns to exclude from the active set. Applied after defaults and customPatterns are merged. Use to prevent false positives from built-in substring matching. |

The following field name patterns are matched by default. All use case-insensitive substring matching unless noted as exact.
Credentials (credentialPatterns):
- apikey
- api_key
- password
- secret
- token
HTTP authentication headers (headerPatterns):
- authorization
- api-key
A field named db_password or x-authorization would also match because
these patterns match as substrings.
Two additional pattern groups are exported but not included by default:
- piiPatterns: Personally Identifiable Information: names, contact details, government IDs, and digital identifiers. Ambiguous single-word terms such as address, city, state, and zip use exact matching to avoid false positives (e.g. email_address is not masked when only address is in piiPatterns).
- phiPatterns: Protected Health Information under HIPAA: medical record identifiers, healthcare dates, clinical data, and biometrics.
Use them via customPatterns:
import { sanitizeData, piiPatterns, phiPatterns } from 'data-sanitization';
sanitizeData(patient, {
customPatterns: [...piiPatterns, ...phiPatterns],
});

Each pattern in customPatterns is a PatternEntry: either a plain string
(substring match) or an object with strict: true for an exact field-name
match.
// Substring: matches 'token', 'access_token', 'session_token', ...
sanitizeData(data, { customPatterns: ['token'] });
// Exact: matches only 'token', not 'access_token'
sanitizeData(data, { customPatterns: [{ match: 'token', strict: true }] });

Use exact matching when a pattern is a common English word that would produce
false positives as a substring; for example, state would otherwise mask
statement or stateCode.
ignorePatterns and exact matching: ignorePatterns is a string[] matched against the match string of each active pattern. To suppress an exact-match entry such as { match: 'state', strict: true }, pass ignorePatterns: ['state'].
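The interaction between substring entries, exact entries, and ignorePatterns can be sketched in a few lines. The `PatternEntry` shape comes from the options table; everything else (`matchString`, `activePatterns`, `keyMatches`, case-insensitive comparison of ignore entries, and treating an object entry without `strict: true` as a substring match) is an assumption made for illustration and may differ from the library's behavior.

```typescript
// PatternEntry shape is from the options table; the resolution logic is assumed.
type PatternEntry = string | { match: string; strict?: boolean };

function matchString(entry: PatternEntry): string {
  return typeof entry === 'string' ? entry : entry.match;
}

// Drop every entry whose match string appears in the ignore list.
function activePatterns(entries: PatternEntry[], ignore: string[]): PatternEntry[] {
  const ignored = new Set(ignore.map((p) => p.toLowerCase()));
  return entries.filter((entry) => !ignored.has(matchString(entry).toLowerCase()));
}

// Strict entries require equality; all others need only a substring hit.
function keyMatches(key: string, entries: PatternEntry[]): boolean {
  const lowered = key.toLowerCase();
  return entries.some((entry) => {
    const needle = matchString(entry).toLowerCase();
    const strict = typeof entry !== 'string' && entry.strict === true;
    return strict ? lowered === needle : lowered.includes(needle);
  });
}

const entries: PatternEntry[] = ['token', { match: 'state', strict: true }];

const exactHit = keyMatches('state', entries); // true
const exactMiss = keyMatches('statement', entries); // false
const substringHit = keyMatches('access_token', entries); // true
const afterIgnore = keyMatches('state', activePatterns(entries, ['state'])); // false
```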
Three matchers are included by default:
- JSON matcher: matches "fieldName":"value" patterns in JSON and JSON-like strings
- Escaped JSON matcher: matches \"fieldName\":\"value\" patterns in JSON embedded inside JSON string values
- Cookie and form-encoded matcher: matches fieldName=value and fieldName:value patterns in URL form-encoded strings and HTTP Cookie headers. Values stop at &, ;, \r, or \n so neither format's separator is consumed as part of a value.

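To make the stop-character rule concrete, here is a rough sketch of a matcher in the spirit of the cookie and form-encoded one. The regular expression is an assumption, not the built-in matcher's actual expression, and `formMatcher` and `maskField` are hypothetical names. For brevity it uses a single capture group and a replacer function rather than the two-group convention used for custom matchers, and it does not escape regex metacharacters in the pattern.

```typescript
// Assumed regex shape; the built-in matcher's real expression may differ.
function formMatcher(pattern: string): RegExp {
  // Group 1 keeps the field name and separator; the value is the run of
  // characters up to the next &, ;, CR, or LF (or end of input).
  return new RegExp(`(${pattern}[=:])[^&;\\r\\n]*`, 'gi');
}

function maskField(input: string, pattern: string, mask = '**********'): string {
  return input.replace(formMatcher(pattern), (_match, prefix: string) => prefix + mask);
}

const form = maskField('password=secret&username=mark', 'password');
// => 'password=**********&username=mark'

const cookie = maskField('session=1; token=abc123; theme=dark', 'token');
// => 'session=1; token=**********; theme=dark'
```

Because the value class excludes both `&` and `;`, the same expression handles form bodies and Cookie headers without swallowing the next field.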
Use customPatterns to add field names on top of the defaults, or use
useDefaultPatterns: false to replace the defaults entirely:
import { sanitizeData } from 'data-sanitization';
const data = {
username: 'mark',
ssn: '123-45-6789',
credit_card: '4111111111111111',
};
// Add to the built-in defaults
sanitizeData(data, {
customPatterns: ['ssn', 'credit_card'],
});
// => { username: 'mark', ssn: '**********', credit_card: '**********' }
// Use only specific patterns, ignoring the defaults
sanitizeData(data, {
customPatterns: ['ssn'],
useDefaultPatterns: false,
});
// => { username: 'mark', ssn: '**********', credit_card: '4111111111111111' }
// Use a different mask string
sanitizeData(data, {
customPatterns: ['ssn', 'credit_card'],
patternMask: '[REDACTED]',
});
// => { username: 'mark', ssn: '[REDACTED]', credit_card: '[REDACTED]' }

Use ignorePatterns to prevent a built-in pattern from matching field names
that are not sensitive in your application. The default token pattern, for
example, would also match tokenizer_config:
const data = {
tokenizer_config: 'bert-base-uncased',
api_key: 'sk-abc123',
username: 'mark',
};
// Without ignorePatterns: tokenizer_config is incorrectly masked
sanitizeData(data);
// => { tokenizer_config: '**********', api_key: '**********', username: 'mark' }
// With ignorePatterns: token pattern suppressed, other patterns still active
sanitizeData(data, { ignorePatterns: ['token'] });
// => { tokenizer_config: 'bert-base-uncased', api_key: '**********', username: 'mark' }

Note that ignorePatterns suppresses the entire substring pattern; any field
whose name matches the pattern will pass through unmasked. If you have a field
named token alongside tokenizer_config, both will be unmasked when token
is ignored. Use useDefaultPatterns: false with explicit customPatterns for
fine-grained per-field control.
Number-typed sensitive values are masked with numericMask to preserve the
field's type:
sanitizeData({ password: 12345, username: 'mark' });
// => { password: 9999999999, username: 'mark' }
sanitizeData({ password: 12345, username: 'mark' }, { numericMask: 0 });
// => { password: 0, username: 'mark' }

For custom data formats, provide a DataSanitizationMatcher, a function that
takes a pattern string and returns a global, case-insensitive RegExp. The
regex must use capture groups $1 and $2 to preserve the field name and
trailing delimiter while replacing the value.
import type { DataSanitizationMatcher } from 'data-sanitization';
const headerMatcher: DataSanitizationMatcher = (pattern) =>
new RegExp(`(${pattern}:\\s*).+?(\\n|$)`, 'gi');
sanitizeData('authorization: Bearer abc123\nuser: mark', {
customMatchers: [headerMatcher],
customPatterns: ['authorization'],
useDefaultMatchers: false,
});
// => 'authorization: **********\nuser: mark'

sanitizeData throws a DataSanitizationError when:
- The input is not a string, object, or null.
- An unexpected error occurs during sanitization.

import { sanitizeData, DataSanitizationError } from 'data-sanitization';
try {
sanitizeData(123 as any);
} catch (error) {
if (error instanceof DataSanitizationError) {
console.error(error.message); // 'Invalid data type'
console.error(error.details); // { inputType: 'number' }
}
}

Error details are limited to safe diagnostic metadata and do not include the original input payload.
sanitizeData dispatches on the input type and applies the configured patterns and matchers accordingly:
- String input: by default, valid JSON object and array strings are parsed and sanitized the same way as object input (see the next item), then re-serialized with JSON.stringify. Non-JSON strings, and strings when parseJsonStrings: false is set, are sanitized directly via regex replacement with the configured matchers.
- Object input is sanitized recursively by key name without JSON serialization. Sensitive keys are masked or removed regardless of whether their values are strings, numbers, arrays, objects, or other primitives.
- Plain nested objects and arrays are cloned as they are sanitized. Non-plain object instances are preserved without modification to avoid corrupting their prototypes. Enable sanitizeCollections: true to instead traverse Map and Set instances, producing a new sanitized copy.
- Object property names and Map string keys are used for pattern matching but are not themselves sanitized. If a property name or string Map key happens to contain sensitive data, it will appear unsanitized in the output. Map keys that are objects are recursed into and sanitized like any other nested object.
- Null input is accepted and returns null.
- For object input, each pattern is matched case-insensitively against key names. By default (scanStringValues: true), string values on non-sensitive keys are also scanned, which catches credentials embedded in log messages or other free-text fields.
- For string input, each pattern is tested against each matcher to find and replace sensitive values in the raw string directly.

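The distinction between plain containers (cloned) and non-plain instances (preserved) can be sketched as follows. This is an illustration under assumptions: `isPlainObject` and `cloneContainers` are hypothetical names, and the library's own definition of "plain" may differ from the prototype check used here. Masking is left out so the cloning rule stands on its own.

```typescript
// Hypothetical check; the library's own definition of "plain" may differ.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (value === null || typeof value !== 'object') return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

class Account {
  constructor(public id: string) {}
}

// Clone plain objects and arrays; hand back everything else by reference.
function cloneContainers(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => cloneContainers(item));
  if (isPlainObject(value)) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value)) out[key] = cloneContainers(value[key]);
    return out;
  }
  return value; // class instances, Date, Map, Set, and primitives
}

const account = new Account('acct_123');
const createdAt = new Date(0);
const original = { account, createdAt, tags: ['a'] };
const copy = cloneContainers(original) as typeof original;
// copy and copy.tags are new containers;
// copy.account and copy.createdAt are the same instances as in original.
```

Returning non-plain instances by reference keeps their prototype chain and methods intact, which a naive property-by-property copy would lose.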
sanitizeData is designed for in-process sanitization of log payloads,
request/response objects, and similar data before they leave your application.
It is not designed for streaming pipelines or bulk batch processing of large
files.
String-value scanning (scanStringValues: true, the default) adds overhead
on object workloads. The cost depends on how many non-sensitive string fields
the input has and how long they are. Rough throughput on a modern laptop
(Apple M-series, Node.js 22):
| Workload | ops/s | ms/call | scan overhead |
|---|---|---|---|
| Shallow object (1 sensitive key) | ~464,000 | ~0.002 | ~18% |
| Log object, stack trace with credentials | ~46,000 | ~0.022 | ~88% |
| Log object, clean stack trace | ~318,000 | ~0.003 | ~18% |
| Object with 10KB non-sensitive string | ~200,000 | ~0.005 | ~68% |
| Large flat object (50 fields, 1 sensitive key) | ~82,000 | ~0.012 | ~10% |
| Array (1,000 items, 1 sensitive key each) | ~2,161 | ~0.46 | ~5% |
| Array (1,000,000 items, 1 sensitive key each) | ~1.7 | ~574 | ~4% |
Array workloads pay ~3–5% overhead regardless of size. The per-item pre-filter cost is negligible. The cost is most visible on individual objects with long non-sensitive string values such as stack traces or large text fields; a single 10KB non-sensitive string value incurs ~68% overhead.
Tip
Set scanStringValues: false when you control your data structure and know
sensitive values only appear on sensitive-named keys. This recovers full pre-scanning throughput.
JSON string inputs are parsed and sanitized via objectReplacer by default,
which is 3–4× faster than the regex path and correctly masks numeric-valued
sensitive fields. Set parseJsonStrings: false to use the regex path instead.
On first call with a given set of options, sanitizeData compiles its regex
set and caches the result by option fingerprint. Subsequent calls with the same
options reuse the cache at no extra cost. This applies whether options are
passed inline or as a variable, as long as the content is the same.
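Caching by option fingerprint can be pictured with a small memoization sketch. Everything here is hypothetical: `CompileOptions`, `fingerprint`, `compiledFor`, and the fingerprint format are illustrative assumptions, not the library's internals. The point is only that the cache key depends on option content rather than object identity.

```typescript
// Hypothetical cache; the fingerprint format is an assumption.
interface CompileOptions {
  customPatterns?: string[];
  useDefaultPatterns?: boolean;
}

const DEFAULTS = ['password', 'token'];
const cache = new Map<string, RegExp[]>();
let compileCount = 0;

function fingerprint(options: CompileOptions): string {
  // Fixed field order so equal content always yields the same string.
  return JSON.stringify([
    options.customPatterns ?? [],
    options.useDefaultPatterns ?? true,
  ]);
}

function compiledFor(options: CompileOptions): RegExp[] {
  const key = fingerprint(options);
  const hit = cache.get(key);
  if (hit) return hit; // same content: reuse the compiled set

  compileCount += 1;
  const names = [
    ...(options.useDefaultPatterns === false ? [] : DEFAULTS),
    ...(options.customPatterns ?? []),
  ];
  const compiled = names.map((name) => new RegExp(name, 'i'));
  cache.set(key, compiled);
  return compiled;
}

const first = compiledFor({ customPatterns: ['ssn'] });
const second = compiledFor({ customPatterns: ['ssn'] }); // new object, same content
const third = compiledFor({ customPatterns: ['ssn', 'dob'] }); // different content
// first and second are the same cached array; compileCount is 2 after three calls.
```

A cache keyed this way grows with every distinct option set, which is one reason per-request pattern lists are costly.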
Warning
Building customPatterns dynamically per call from variable data causes a cache
miss on every call, so compilation runs on each request instead of being reused.
// Anti-pattern: patterns differ on every call, cache never hits
app.post('/log', (req) => {
sanitizeData(req.body, {
customPatterns: [...basePatterns, ...req.user.sensitiveFields],
});
});
// Correct: build options once at startup (or per stable configuration)
const sanitizerOptions = {
customPatterns: [...basePatterns, ...knownSensitiveFields],
};
app.post('/log', (req) => {
sanitizeData(req.body, sanitizerOptions);
});

If dynamic options are unavoidable, set scanStringValues: false. This skips
the string-scanning cache and avoids the fingerprinting overhead on every call.
When options must genuinely vary per call, each call pays the first-call compilation cost (~32× slower than a cached call).
For full benchmark tables, charts, and scaling analysis see docs/performance.md. To run the benchmarks:
yarn bench

Bug reports and pull requests are welcome. See CONTRIBUTING.md to get started.