Skip to content

fix(live-debugger): skip probe installation when sidecar is unavailable#3824

Open
Leiyks wants to merge 7 commits intomasterfrom
leiyks/fix-ci-live-debugger-null-sidecar
Open

fix(live-debugger): skip probe installation when sidecar is unavailable#3824
Leiyks wants to merge 7 commits intomasterfrom
leiyks/fix-ci-live-debugger-null-sidecar

Conversation

@Leiyks
Copy link
Copy Markdown
Contributor

@Leiyks Leiyks commented Apr 24, 2026

Problem

DDTRACE_G(sidecar) can be NULL when RC callbacks fire probe installation (RC is initialized before ddtrace_sidecar_rinit()), and also when the sidecar fails to connect entirely.

Additionally, ddog_process_remote_configs() is called in ddtrace_sidecar_submit_root_span_data_direct (added by c46963f) on every span submission when the service/env hasn't changed. For class-targeted probes (e.g. Delayed::foo) that are in RECEIVED state — hook installed but class not yet loaded — the RC state machine sees them as unconfirmed and calls add_probe again for the same probe ID in the same request. This results in a spurious second RECEIVED diagnostic in debugger_span_probe_class.phpt.

Fix

  1. Remove all NULL sidecar guards from ddog_send_debugger_diagnostics call sites (dd_probe_resolved, dd_probe_mark_active, dd_init_live_debugger_probe). A NULL sidecar at probe installation time is a lifecycle bug, not an expected condition — surface it rather than silently no-op.

  2. Deduplicate probe installation in dd_init_live_debugger_probe: before installing a new hook, check if a probe with the same ID is already in active_rc_hooks. If it is, skip re-installation and return the existing hook ID. This is not losing information — the probe config is already tracked; the RC state machine is simply retrying an already-pending probe.

Copy link
Copy Markdown
Collaborator

@bwoebi bwoebi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points to a deeper problem.
This testsuite never should have DDTRACE_G(sidecar) == NULL. How does that happen?

We never should install debugger probes before initializing the sidecar. And we probably should also skip installing them if connecting to the sidecar failed.

@datadog-datadog-prod-us1-2
Copy link
Copy Markdown

datadog-datadog-prod-us1-2 Bot commented Apr 24, 2026

Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🎯 Code Coverage (details)
Patch Coverage: 100.00%
Overall Coverage: 60.68% (+0.00%)

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: 299b215 | Docs | Datadog PR Page | Give us feedback!

@pr-commenter
Copy link
Copy Markdown

pr-commenter Bot commented Apr 24, 2026

Benchmarks [ tracer ]

Benchmark execution time: 2026-04-30 15:00:06

Comparing candidate commit 299b215 in PR branch leiyks/fix-ci-live-debugger-null-sidecar with baseline commit 82abdf3 in branch master.

Found 2 performance improvements and 1 performance regressions! Performance is the same for 191 metrics, 0 unstable metrics.

scenario:PDOBench/benchPDOOverhead

  • 🟩 execution_time [-10.281µs; -8.230µs] or [-4.088%; -3.272%]

scenario:PDOBench/benchPDOOverheadWithDBM

  • 🟩 execution_time [-8.492µs; -6.635µs] or [-3.391%; -2.649%]

scenario:SamplingRuleMatchingBench/benchGlobMatching2-opcache

  • 🟥 execution_time [+56.449ns; +190.751ns] or [+2.147%; +7.255%]

@Leiyks Leiyks changed the title fix(live-debugger): guard against NULL sidecar in diagnostics calls fix(live-debugger): skip probe installation when sidecar is unavailable Apr 28, 2026
@Leiyks Leiyks requested a review from bwoebi April 28, 2026 12:24
@Leiyks Leiyks marked this pull request as ready for review April 28, 2026 12:24
@Leiyks Leiyks requested a review from a team as a code owner April 28, 2026 12:24
Leiyks added 7 commits April 30, 2026 15:36
DDTRACE_G(sidecar) is NULL until ddtrace_sidecar_ensure_active() runs
in RINIT. In minimal install environments the sidecar may not be
initialized when live debugger diagnostics are sent. Use the global
sidecar fallback (ddtrace_sidecar_for_signal) and skip the call if
both are NULL.
Instead of guarding each ddog_send_debugger_diagnostics call against a
NULL sidecar, bail out of dd_init_live_debugger_probe early if the
sidecar isn't ready. This ensures probes are never installed in a
half-functional state — the RC machinery will re-apply them on the
next request once the sidecar has connected.

Async callbacks (dd_probe_mark_active, dd_probe_resolved) retain their
null guards since they fire outside of request initialization order.
dd_probe_resolved and dd_probe_mark_active only fire after a probe was
successfully installed, which now requires a non-NULL sidecar. A NULL
sidecar in these callbacks indicates a lifecycle bug — surface it
rather than silently skipping the diagnostic.
The RC state machine calls add_probe again for probes that are in
RECEIVED state (hook installed but class not yet resolved). This
produces a spurious second RECEIVED diagnostic for class-targeted
probes, visible in debugger_span_probe_class.phpt.

Guard dd_init_live_debugger_probe against re-installing a probe whose
ID is already present in active_rc_hooks, returning the existing hook
ID so the RC machinery can track it correctly.
zai_hook_remove() frees def via dd_probe_dtor, but the entry in
active_rc_hooks was left dangling. The deduplication guard in
dd_init_live_debugger_probe iterates active_rc_hooks by probe_id to
avoid reinstalling already-active probes; if a probe was removed and
re-added (RC state machine retry after removal), the guard read the
freed def, causing use-after-free detected by valgrind.
@Leiyks Leiyks force-pushed the leiyks/fix-ci-live-debugger-null-sidecar branch from 7a0fed1 to 299b215 Compare April 30, 2026 13:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants