fix(live-debugger): skip probe installation when sidecar is unavailable#3824
fix(live-debugger): skip probe installation when sidecar is unavailable#3824
Conversation
There was a problem hiding this comment.
This points to a deeper problem.
This testsuite never should have DDTRACE_G(sidecar) == NULL. How does that happen?
We never should install debugger probes before initializing the sidecar. And we probably should also skip installing them if connecting to the sidecar failed.
🎉 All green!❄️ No new flaky tests detected 🎯 Code Coverage (details) 🔗 Commit SHA: 299b215 | Docs | Datadog PR Page | Give us feedback! |
Benchmarks [ tracer ]Benchmark execution time: 2026-04-30 15:00:06 Comparing candidate commit 299b215 in PR branch Found 2 performance improvements and 1 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics. scenario:PDOBench/benchPDOOverhead
scenario:PDOBench/benchPDOOverheadWithDBM
scenario:SamplingRuleMatchingBench/benchGlobMatching2-opcache
|
DDTRACE_G(sidecar) is NULL until ddtrace_sidecar_ensure_active() runs in RINIT. In minimal install environments the sidecar may not be initialized when live debugger diagnostics are sent. Use the global sidecar fallback (ddtrace_sidecar_for_signal) and skip the call if both are NULL.
Instead of guarding each ddog_send_debugger_diagnostics call against a NULL sidecar, bail out of dd_init_live_debugger_probe early if the sidecar isn't ready. This ensures probes are never installed in a half-functional state — the RC machinery will re-apply them on the next request once the sidecar has connected. Async callbacks (dd_probe_mark_active, dd_probe_resolved) retain their null guards since they fire outside of request initialization order.
dd_probe_resolved and dd_probe_mark_active only fire after a probe was successfully installed, which now requires a non-NULL sidecar. A NULL sidecar in these callbacks indicates a lifecycle bug — surface it rather than silently skipping the diagnostic.
The RC state machine calls add_probe again for probes that are in RECEIVED state (hook installed but class not yet resolved). This produces a spurious second RECEIVED diagnostic for class-targeted probes, visible in debugger_span_probe_class.phpt. Guard dd_init_live_debugger_probe against re-installing a probe whose ID is already present in active_rc_hooks, returning the existing hook ID so the RC machinery can track it correctly.
…at probe install time
zai_hook_remove() frees def via dd_probe_dtor, but the entry in active_rc_hooks was left dangling. The deduplication guard in dd_init_live_debugger_probe iterates active_rc_hooks by probe_id to avoid reinstalling already-active probes; if a probe was removed and re-added (RC state machine retry after removal), the guard read the freed def, causing use-after-free detected by valgrind.
7a0fed1 to
299b215
Compare
Problem
DDTRACE_G(sidecar)can be NULL when RC callbacks fire probe installation (RC is initialized beforeddtrace_sidecar_rinit()), and also when the sidecar fails to connect entirely.Additionally,
ddog_process_remote_configs()is called inddtrace_sidecar_submit_root_span_data_direct(added by c46963f) on every span submission when the service/env hasn't changed. For class-targeted probes (e.g.Delayed::foo) that are in RECEIVED state — hook installed but class not yet loaded — the RC state machine sees them as unconfirmed and callsadd_probeagain for the same probe ID in the same request. This results in a spurious second RECEIVED diagnostic indebugger_span_probe_class.phpt.Fix
Remove all NULL sidecar guards from
ddog_send_debugger_diagnosticscall sites (dd_probe_resolved,dd_probe_mark_active,dd_init_live_debugger_probe). A NULL sidecar at probe installation time is a lifecycle bug, not an expected condition — surface it rather than silently no-op.Deduplicate probe installation in
dd_init_live_debugger_probe: before installing a new hook, check if a probe with the same ID is already inactive_rc_hooks. If it is, skip re-installation and return the existing hook ID. This is not losing information — the probe config is already tracked; the RC state machine is simply retrying an already-pending probe.