Skip to content

diagnostics_channel: opt-in subscriber suppression #63623

@BridgeAR

Description

@BridgeAR

What is the problem this feature will solve?

APM agents and other tools that subscribe to diagnostics channels often
need to suppress nested instrumentation when their own internal code calls
into an instrumented library. Example: a tracer's HTTP exporter
sends a span over HTTP, the HTTP client is instrumented, the exporter
recurses unless the nested call is suppressed.

Every implementation I have looked at solves it the same way: a custom
AsyncLocalStorage carrying a { noop: true } marker, plus a wrapper
closure around every subscribe and bindStore registration that reads
the marker and short-circuits. Datadog's dd-trace-js does it and I believe the other major APMs do equivalent things. That wrapper runs on every publish, not
only on the suppression path. Thus, we pay one extra JS frame, one extra ALS lookup,
one extra branch per subscribed handler, on the hot path of every channel
in the process.

What is the feature you are proposing to solve the problem?

Move the suppression check inside Channel.prototype.publish and
Channel.prototype.runStores, gated by per-subscription opt-in:

const { channel, suppressed } = require('node:diagnostics_channel')

const kMyTracer = Symbol('my-tracer')
const ch = channel('apm:foo:bar')

ch.subscribe(handler, { suppressedBy: kMyTracer })
ch.bindStore(als, transform, { suppressedBy: kMyTracer })

suppressed(kMyTracer, () => instrumentedCall())

suppressedBy names the Symbol that identifies the suppression scope. suppressed(symbol, fn) enters that scope on the active AsyncContextFrame. Subscribers that opted in are skipped while the scope is active. Subscribers that did not opt in keep firing. That gives multi-APM coordination, and it does not change semantics for any existing caller.

Wins: The userland wrapper closure on every subscribed handler goes away. The check folds into the publish loop where diagnostic_channel already iterates subscribers, and V8 inlines it against the (always-empty in production) marker.
Vendors stop re-implementing the same ALS dance, each with their own subtle differences in how the marker propagates across setImmediate, queueMicrotask, and unhandled-rejection paths.

There are probably some details to be fledged out but I wanted to get some feedback about the overall idea first. @bengl @timfish @Qard @trentm and others: opinions?

What alternatives have you considered?

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    diagnostics_channelIssues and PRs related to diagnostics channelfeature requestIssues that request new features to be added to Node.js.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    Awaiting Triage

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions