feat(otel): Add distributed tracing across app and workspace server#2668
feat(otel): Add distributed tracing across app and workspace server#2668pauldambra wants to merge 5 commits into
Conversation
Generated-By: PostHog Code
|
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
|
pauldambra
left a comment
There was a problem hiding this comment.
QA Swarm review complete — see inline comments. (Diff base note: an earlier pass was run against a stale local main; this review reflects the authoritative PR diff against origin/main. Two findings about an agent.ts error-masking change and a duplicate initializePostHog were stale-diff artifacts and have been dropped.)
|
Note 🤖 Automated comment by QA Swarm — not written by a human Multi-perspective review: qa-team, paul-reviewer, xp-reviewer, security-audit Verdict:
|
| Reviewer | Assessment |
|---|---|
| 🔍 qa-team | Correct gating + sampling; the shutdown-flush handling (server + renderer) is the one real defect — request changes on that, rest is LOW/NIT. |
| 👤 paul | No correctness blockers in his read; "don't let it drift" (dup provider setup, no init log) + a "confirm no header when disabled" test. Stamp-with-comments. |
| 📐 xp | Genuine DRY win; tighten withSpan/traceTrpcCall consistency and add the missing renderer test. "Make it right" polish on working code. |
| 🛡 security-audit | No exploitable findings. Bearer key is a public phc_ ingestion key; span attributes deliberately low-sensitivity; inbound traceparent is behind the shared-secret auth. Two LOW hardening notes only. |
Diff-base note: an earlier pass ran against a stale local
mainand surfaced two findings (anagent.tserror-masking change, a duplicateinitializePostHog) that are not in the authoritative PR diff — those were dropped from this review.
Automated by QA Swarm — not a human review
- serve.ts: await the span flush (bounded by the grace timer) before process.exit so a clean workspace-server shutdown no longer drops buffered spans - shared: buildTracesEndpoint() strips trailing slash and asserts https (localhost http allowed for dev); OTEL_BATCH_DELAY_MS shares the batch delay across node + web so the magic number lives once - align renderer gating to require both key AND host (matches node + the log transport) so a key-only build can't half-instrument - init log line (enabled/disabled) on all three tracer inits - harden renderer flush: also flush on visibilitychange -> hidden - withSpan now takes the tracer explicitly, matching traceTrpcCall - tests: buildTracesEndpoint, renderer otel-trace gating + navigation + profiler threshold, workspace-client no-traceparent-when-disabled Generated-By: PostHog Code Task-Id: afd9489f-f2f8-4582-ab17-ace0d331ca0b
Covers the happy path: tracer present, fn resolves, span ends, value returned unchanged, no exception/error-status recorded. Generated-By: PostHog Code Task-Id: afd9489f-f2f8-4582-ab17-ace0d331ca0b
Strip trailing slashes with a linear loop instead of /\/+$/, which CodeQL flagged as a polynomial-time ReDoS on uncontrolled input. Generated-By: PostHog Code Task-Id: afd9489f-f2f8-4582-ab17-ace0d331ca0b
react-doctor's Socket check flags vitest@4.0.10 (known CVEs); adding it to a package that had no test runner registered a new baseline diagnostic. Revert the test-runner addition and keep tracePropagationHeaders private. The no-op-when-disabled behaviour is guaranteed by OTel's default NoopTextMapPropagator regardless. Generated-By: PostHog Code Task-Id: afd9489f-f2f8-4582-ab17-ace0d331ca0b
Problem
We lacked end-to-end visibility into requests flowing between the Electron main process and the workspace server. This made it difficult to diagnose latency and trace tRPC calls across process boundaries.
Changes
otel-trace.tsutility in the main process exposinginitOtelTracing,getMainTracer, andshutdownOtelTracing, backed by the workspace server'snode-tracingmodule.main/index.tsand wired graceful shutdown of the trace exporter into the app lifecycle service.tracingMonitor) that wraps each call viatraceTrpcCall, composed alongside the existing call-rate monitor onpublicProcedure.VITE_POSTHOG_API_KEY/VITE_POSTHOG_API_HOST) and returnsnullwhen not configured.@opentelemetry/api,context-zone,exporter-trace-otlp-http,sdk-trace-base,sdk-trace-node,sdk-trace-web).otel-trace.test.ts) covering initialization gating, provider registration, and shutdown flush behavior.How did you test this?
apps/code/src/main/utils/otel-trace.test.tscovering:nullwhen the API key or host is missing/invalid.Automatic notifications
Created with PostHog Code