Skip to content

feat: emit agent spans in RealtimeSession for SDK tracing#3573

Open
jordanchendev wants to merge 8 commits into
openai:mainfrom
jordanchendev:feat/1845-realtime-agent-spans
Open

feat: emit agent spans in RealtimeSession for SDK tracing#3573
jordanchendev wants to merge 8 commits into
openai:mainfrom
jordanchendev:feat/1845-realtime-agent-spans

Conversation

@jordanchendev
Copy link
Copy Markdown

Summary

RealtimeSession never created any SDK-level trace spans, so users who wrapped their session in with trace("...") saw empty traces on the OpenAI dashboard — only the outer trace ID was exported, no agent or tool spans inside it.

This PR fixes that by:

  • Starting an agent_span for the initial agent inside __aenter__.
  • Finishing the span (and starting a fresh one for the new agent) on every handoff, matching the per-agent span lifecycle in the regular Runner.
  • Finishing the active span in _cleanup (called from close and __aexit__).
  • Respecting run_config["tracing_disabled"] = True — when set, agent_span is created with disabled=True and no spans are emitted.

The span records:

  • name: the agent's name
  • tools: list of FunctionTool names available to the agent
  • handoffs: names of handoff target agents

Test plan

New file tests/realtime/test_session_spans.py with 6 tests covering:

  • Span created on enter
  • Span name matches agent name
  • Span is finished after close
  • Tool names recorded
  • Handoff names recorded
  • tracing_disabled=True suppresses all spans

Two existing test fixtures (mock_agent in test_session.py, fake_agent in test_session_exceptions.py) updated to set name and tools on their mock agents, which _make_agent_span now reads.

uv run pytest tests/realtime/test_session_spans.py tests/realtime/test_session_exceptions.py tests/realtime/test_session.py::TestModelSettingsIntegration -q
# 16 passed

uv run pytest -q --ignore=tests/extensions/memory
# 4414 passed, 1 skipped (2 pre-existing MCP failures unrelated to this change)

Issue number

Closes #1845

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation
  • I've run make lint and make format
  • I've made sure tests pass

RealtimeSession.__aenter__ now creates and starts an agent_span for the
starting agent, and finishes it when the session closes. On handoff, the
outgoing agent's span is finished and a new span is started for the
incoming agent, matching the tracing behaviour of the regular Runner.

The span records the agent name, tool names, and handoff target names.
Passing tracing_disabled=True in run_config suppresses all spans.

Fixes openai#1845
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a2d639b6d2

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +832 to +833
self._current_agent_span = self._make_agent_span(self._current_agent)
self._current_agent_span.start(mark_as_current=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid resetting handoff span tokens across tasks

When a handoff function call is handled with the default async_tool_calls=True, this code runs inside the background task created by _enqueue_tool_call_task, so start(mark_as_current=True) stores a contextvars token for that task. Later close()/__aexit__ calls _cleanup() in the caller's context and finish(reset_current=True) on this same span, which raises ValueError: Token ... was created in a different Context and skips the rest of cleanup after any handoff.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit ed34935. _cleanup now uses reset_current=False, and the new agent span started during a handoff uses mark_as_current=False, avoiding cross-context token errors when async_tool_calls=True.

Comment thread src/agents/realtime/session.py Outdated
"""Create a new agent span for the given agent, respecting tracing_disabled."""
disabled: bool = bool(self._run_config.get("tracing_disabled", False))
handoff_names = [h.agent_name if isinstance(h, Handoff) else h.name for h in agent.handoffs]
tool_names = [t.name for t in agent.tools if isinstance(t, FunctionTool)]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Derive span tool names from resolved tools

For agents that use is_enabled on function tools or tools supplied through MCP, this reads only agent.tools instead of the resolved list from agent.get_all_tools(...) that is actually sent to the model. In those supported configurations, disabled function tools can be reported in the span and MCP/enabled dynamic tools can be omitted, making the new tracing metadata inaccurate.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit ed34935. _make_agent_span no longer reads agent.tools; instead, both aenter and the handoff handler call agent.get_all_tools() after obtaining model settings and write the resolved names to span_data.tools via get_tool_trace_name_for_tool(), matching the pattern used by Runner.

Two issues raised in automated review of the initial commit:

P1 (crash): When async_tool_calls=True (the default), handoffs execute
inside asyncio.create_task(). start(mark_as_current=True) stores a
contextvars token in the background task's context. _cleanup() runs in
the main task and cannot reset that token; finish(reset_current=True)
raises ValueError: Token was created in a different Context.

Fix: _cleanup uses reset_current=False (session is ending; no need to
restore the previous current span). The new agent span started during a
handoff uses mark_as_current=False for the same reason.

P2 (inaccurate metadata): _make_agent_span read agent.tools (static
list) instead of the resolved list from get_all_tools(), so disabled
function tools appeared in the span and MCP tools were omitted.

Fix: tool names are no longer set in _make_agent_span; both __aenter__
and the handoff handler call agent.get_all_tools() after obtaining model
settings and write the resolved names to span_data.tools via
get_tool_trace_name_for_tool(), matching the pattern used by Runner.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ed34935eea

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +1278 to +1279
if self._current_agent_span is not None:
self._current_agent_span.finish(reset_current=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reset the initial agent span on cleanup

Fresh evidence is that the initial session span is still started with mark_as_current=True, while cleanup now always calls finish(reset_current=False). After any successfully entered realtime session closes without a handoff, the ended agent span remains in the current context; later spans in the same task and trace, or in a subsequent trace started in that task, will be parented under this stale span (even across trace IDs) instead of the active trace/root. The cross-task handoff case needs care, but the initial span token created by __aenter__ should still be reset when cleanup runs in that same context.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit ceb50e0. aenter now calls span.start(mark_as_current=False) then Scope.set_current_span() explicitly, storing the context-var token in self._initial_span_token. _cleanup (always in the same task as aenter via aexit) calls Scope.reset_current_span() on that token after finishing the current span, so the stale span is no longer current in the context regardless of whether a handoff occurred.

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +210 to +212
# Start an agent span for the initial agent.
self._current_agent_span = self._make_agent_span(self._current_agent)
self._current_agent_span.start(mark_as_current=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Close the span if session entry fails

When any later step in __aenter__ fails after this span is started, such as resolving instructions/tools or model.connect(), Python does not call __aexit__ for the context manager. In that failure path the new span is left unfinished and current in the task context, so subsequent tracing can inherit a dangling parent and the failed attempt is never exported as an ended span. Wrap the fallible setup after span creation in a cleanup path or defer starting the span until entry can complete.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit ceb50e0. The fallible setup in aenter (get_updated_model_settings, get_all_tools, model.connect) is now wrapped in try/except BaseException. If any step raises, the handler calls span.finish(reset_current=False) and Scope.reset_current_span() before re-raising, so the span is always finished and the context-var is always reset even when aexit is never called.

Two issues fixed in RealtimeSession tracing:

P1 – After closing a session without a handoff, the agent span started in
__aenter__ with mark_as_current=True left its context-var token unreset
because _cleanup always used reset_current=False. Subsequent spans in the
same task were incorrectly parented under the stale (finished) span.

Fix: manage the context-var token explicitly via Scope.set_current_span() in
__aenter__ and store it in self._initial_span_token. _cleanup now calls
Scope.reset_current_span() on that token, which is always safe because
_cleanup runs via __aexit__ in the same task as __aenter__. The current span
(which may be a handoff span started with mark_as_current=False) is still
finished with reset_current=False to avoid cross-context token errors.

P2 – If __aenter__ raised an exception after span start (e.g. model.connect()
failed), __aexit__ was not called, leaving the span unfinished and current
in the context. Fix: wrap the fallible setup section in a try/except that
finishes the span and resets the token before re-raising.

Fixes openai#1845
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ceb50e07e9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
# already called finish() on this span from a background task.
self._current_agent_span = self._make_agent_span(self._current_agent)
self._current_agent_span.start(mark_as_current=False)
self._initial_span_token = Scope.set_current_span(self._current_agent_span)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid installing no-op spans as current

When the session is entered without an already-active trace (or with tracing_disabled=True), _make_agent_span() returns a no-op span, but this still stores it in the current span context for the whole session. If user code later starts a real trace() inside the session and creates any spans, the tracing provider treats the current no-op span as a no-op parent and drops those spans until _cleanup() resets the token. Only set the current span when a real agent span was created, or otherwise avoid poisoning the ambient tracing context.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b61f80f. aenter now checks isinstance(span, NoOpSpan) before calling Scope.set_current_span(). When no active trace exists (or tracing_disabled=True), _make_agent_span() returns a NoOpSpan and we skip installing it as current, so the ambient context is unchanged for the duration of the session.

Comment on lines +865 to +866
self._current_agent_span = self._make_agent_span(self._current_agent)
self._current_agent_span.start(mark_as_current=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Parent handoff spans to the active trace, not the old agent

After a handoff, the outgoing agent span is finished with reset_current=False, so the task's current span is still the outgoing agent when this new span is created. Because _make_agent_span() relies on the ambient current span, every incoming agent span is recorded as a child of the ended previous agent rather than as the next agent span under the trace/task like the regular runner does after handoffs, which makes multi-agent realtime traces misleading.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b61f80f. Before calling _make_agent_span() for the incoming agent, the handoff handler now temporarily sets the current span to None via Scope.set_current_span(None) and immediately restores it. This ensures provider.create_span() sees no current span and assigns parent_id=None, making the new agent span a sibling under the trace root rather than a child of the finished outgoing span.

Comment thread src/agents/realtime/session.py Outdated
asynchronously via get_all_tools() to include MCP tools and respect is_enabled.
"""
disabled: bool = bool(self._run_config.get("tracing_disabled", False))
handoff_names = [h.agent_name if isinstance(h, Handoff) else h.name for h in agent.handoffs]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Filter handoffs before recording span metadata

This records handoff names directly from agent.handoffs, but realtime only exposes handoffs after _get_handoffs() applies each handoff's is_enabled predicate. For agents with disabled or context-gated handoffs, the trace will report handoff tools that were never sent to the model, so the new span metadata is inaccurate in exactly the configurations where handoff availability is dynamic.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit b61f80f. _make_agent_span() no longer reads agent.handoffs. Both aenter and the handoff handler now call _get_handoffs(agent, context_wrapper) (which applies each Handoff.is_enabled predicate via asyncio.gather) and write the filtered result to span_data.handoffs, matching the pattern Runner uses for get_handoffs().

… filter handoffs

Three issues fixed per tracing conventions (local_dev/notes/tracing-conventions.md):

Convention 3 — NoOpSpan poisoning:
__aenter__ now skips Scope.set_current_span() when _make_agent_span() returns a
NoOpSpan (no active trace, or tracing_disabled). Installing a NoOpSpan as current
would cause provider.create_span() to return NoOpSpan for every subsequent child
span in the same context (_is_noop_span check in provider.py).

Convention 5 — Span parent fixed at create_span() call time:
Before creating the incoming agent span in the handoff handler (background task),
the current span context is temporarily cleared via Scope.set_current_span(None)
and immediately restored. Without this, the outgoing agent span (still "current"
in the background task's copied context because finish(reset_current=False) does
not reset the context var) would be used as parent_id for the new span, producing
incorrect parent-child nesting instead of sibling spans under the trace root.

Convention 7 — is_enabled filtering for handoffs:
_make_agent_span() no longer reads agent.handoffs directly. Both __aenter__ and
the handoff handler now call _get_handoffs() (which applies each Handoff.is_enabled
predicate) and write the filtered result to span_data.handoffs, matching the pattern
used by Runner for handoff metadata.

Tests added for all three behaviors:
- test_no_active_trace_does_not_poison_span_context
- test_disabled_handoff_excluded_from_span_metadata
- test_handoff_span_is_sibling_not_child_of_initial_span

Fixes openai#1845
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b61f80f557

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +1327 to +1328
if self._initial_span_token is not None:
Scope.reset_current_span(self._initial_span_token)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard span-token reset to the entry task

Fresh evidence is that _cleanup() is also reachable from __aiter__ when _stored_exception is set, and close() is public, so cleanup is not guaranteed to run in the same asyncio task that created _initial_span_token in __aenter__. In those cross-task paths, Scope.reset_current_span() raises ValueError: Token ... was created in a different Context, which stops cleanup before cancelling tasks, removing the listener, and closing the model.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0470b2d.

_cleanup() is reachable from __aexit__ (guaranteed main task), close() (any task), and __aiter__ (any task). The fix decomposes reset responsibility by task:

  • __aexit__ now resets _initial_span_token directly before calling close() — this path is always in the same task as __aenter__, so no cross-task issue.
  • _cleanup wraps the token reset in try/except ValueError as a safety net for calls from other tasks. A comment explains that __aexit__ will handle the actual reset from the correct task.

Regression test test_cleanup_from_different_task_does_not_raise was RED before this fix (ValueError propagated and left session open) and is GREEN now.

Comment thread src/agents/realtime/session.py Outdated
Comment on lines +242 to +246
resolved_tools = await self._current_agent.get_all_tools(self._context_wrapper)
self._current_agent_span.span_data.tools = [
n for t in resolved_tools if (n := get_tool_trace_name_for_tool(t)) is not None
] or None
enabled_handoffs = await self._get_handoffs(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reuse the resolved tools for span metadata

_get_updated_model_settings_from_agent() has already resolved the exact tools and handoffs used for the model configuration, but this block calls get_all_tools() and _get_handoffs() again just to populate the span. For agents with dynamic is_enabled predicates or MCP tool listings, the second resolution can have side effects or return a different set, so span_data can report tools/handoffs that were not the ones sent to the model; populate the span from the resolved lists already in the model settings instead.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 0470b2d.

_get_updated_model_settings_from_agent() now returns a 3-tuple (settings, tools, handoffs). Callers unpack the tuple and reuse the already-resolved lists:

initial_settings, resolved_tools, enabled_handoffs = (
    await self._get_updated_model_settings_from_agent(...)
)
# populate span from the same resolved lists, no second resolution
span_data.tools = [t.name for t in resolved_tools]
span_data.handoffs = [h.agent_name for h in enabled_handoffs] or None

The handoff handler and update_agent() also unpack the tuple, eliminating all duplicate get_all_tools() / _get_handoffs() calls. Test callers updated to unpack the tuple accordingly.

…_settings_from_agent

Eliminates duplicate get_all_tools() / _get_handoffs() calls by returning
already-computed results as a tuple. Callers in __aenter__, update_agent(),
and handoff handler now unpack the tuple; test callers updated to match.
Also guard _cleanup token reset against cross-task ValueError; defer span
context reset to __aexit__ which is guaranteed to run in the same task as
__aenter__. Add 5 new regression tests covering NoOpSpan poisoning, disabled
handoff filtering, sibling span hierarchy, cross-task cleanup, and direct
close() context reset.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0470b2d0e3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
updated_settings["tracing"] = None

return updated_settings
return updated_settings, list(tools or []), list(handoffs or [])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use final model settings for span metadata

When model_config["initial_model_settings"] overrides tools or handoffs, updated_settings is changed above and that final config is what is passed to model.connect, but this return still gives callers the pre-override agent-resolved tools/handoffs. In that supported setup the model can receive different or no tools while the emitted AgentSpanData reports the agent defaults, so the new tracing metadata is inaccurate; derive the returned metadata from the final settings after applying starting_settings.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9408465.

_get_updated_model_settings_from_agent now derives the returned tools and handoffs after starting_settings.update() is applied, so callers always get the final values that were sent to the model:

# Apply starting_settings (from model config) — may override tools and handoffs.
if starting_settings:
    updated_settings.update(starting_settings)

# Return the final tools/handoffs AFTER overrides so span metadata matches the model.
final_tools = list(updated_settings.get("tools") or [])
final_handoffs = list(updated_settings.get("handoffs") or [])
return updated_settings, final_tools, final_handoffs

New test test_span_tool_metadata_reflects_model_config_override verifies that when model_config["initial_model_settings"]["tools"] = [] overrides the agent's tools, the span shows the empty list (not the agent defaults).

Comment on lines +875 to +876
if self._current_agent_span is not None:
self._current_agent_span.finish(reset_current=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Clear finished agent span after handoff

With the default async_tool_calls=True, the handoff runs in a background task and this finishes the outgoing agent span while the task that entered the session still has that same span installed as its current span until __aexit__. Any SDK/custom span the application starts after the handoff but before closing the session is then parented under an already-ended previous agent because DefaultTraceProvider.create_span() reads Scope.get_current_span(), so handoff traces can mis-parent user spans to the wrong agent.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 9408465 with an architectural change.

The root cause is that asyncio tasks inherit a snapshot of the parent context — a background task cannot update the main task's ContextVar. So after a handoff runs in a bg task and finishes the outgoing span, the main task's ContextVar still holds the old finished span as current. There is no way to fix this while keeping the pattern of installing agent spans as the ContextVar current span.

The fix: never install agent spans as the ContextVar current span. Scope.set_current_span is removed from __aenter__, _cleanup, and the handoff handler. Agent spans are created, tracked, and finished via self._current_agent_span without touching the context var. Consequences:

  • No stale span: the main task's context is always the enclosing trace root, before and after any handoff.
  • Sibling relationship is now natural: the background task inherits trace-root as current at task-creation time, so the incoming agent span's parent is the trace root without needing the Scope.set_current_span(None) trick.
  • _initial_span_token and all associated cross-task ValueError handling are removed entirely.

…adata from final settings

Resolves two root-cause issues:

1. Stale current span after handoff (#3353121454): installing the initial agent
   span as the ContextVar current span in the main task causes it to remain
   current even after a handoff finishes that span in a background task.
   asyncio tasks inherit a snapshot of the parent context — the background task
   cannot update the main task's ContextVar. Remove all Scope.set_current_span
   calls; agent spans are created and finished without being set as current.
   This also eliminates _initial_span_token and all related cross-task token-
   reset complexity.

2. Span metadata mismatch after model_config override (#3353121446):
   _get_updated_model_settings_from_agent returned the pre-override tools and
   handoffs even when starting_settings had overridden them. Now returns
   updated_settings["tools"/"handoffs"] after the update() call so span metadata
   matches what was actually sent to the model.

Add test_span_tool_metadata_reflects_model_config_override to cover case 2.
Update docstrings on no-longer-ContextVar-dependent tests to reflect new design.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94084651fa

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/agents/realtime/session.py Outdated
Comment on lines 320 to 323
updated_settings, _, _ = await self._get_updated_model_settings_from_agent(
starting_settings=None,
agent=self._current_agent,
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Start a new span when update_agent switches agents

When an already-entered realtime session is switched via the public update_agent() API, this code updates _current_agent and the model settings but leaves _current_agent_span pointing at the previous agent until cleanup. In that scenario the trace records only the original agent span, so activity after the manual agent switch is attributed to the wrong agent and no span is emitted for the new agent; the handoff path handles this by finishing the old span and starting a new one, but this equivalent switch path does not.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. update_agent() now mirrors the handoff path exactly: it finishes the outgoing agent span (with reset_current=False), creates and starts a new span for the incoming agent (also mark_as_current=False), and populates span_data.tools / span_data.handoffs from the tuple returned by _get_updated_model_settings_from_agent().

Added test_update_agent_finishes_old_span_and_starts_new_one which asserts two agent spans are emitted (one per agent, both finished) and that the original agent span is closed before the session ends.

@seratch seratch added duplicate This issue or pull request already exists feature:realtime labels Jun 4, 2026
jordanchendev and others added 2 commits June 4, 2026 14:06
Add test_aenter_failure_finishes_span to cover the except BaseException cleanup
path in __aenter__ — the final missing entry from the tracing conventions test
checklist (Rule 9). Update the _make_agent_span docstring to reflect the current
design: callers now populate span_data from _get_updated_model_settings_from_agent
tuple return rather than calling get_all_tools() / _get_handoffs() separately.
update_agent() now mirrors the handoff path: finishes the outgoing
agent span (reset_current=False), creates and starts a new span for
the incoming agent (mark_as_current=False), and populates span
metadata from _get_updated_model_settings_from_agent() tuple.

Adds test_update_agent_finishes_old_span_and_starts_new_one to
verify both agent spans are emitted and the original span is closed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists feature:realtime

Projects

None yet

Development

Successfully merging this pull request may close these issues.

RealtimeAgent traces contain no spans (empty traces on dashboard)

2 participants