Skip to content

Commit 5b4359d

Browse files
authored
fix(harness): harden Claude Code + OpenAI taps and span tracing (#446)
1 parent 53ab8ef commit 5b4359d

14 files changed

Lines changed: 812 additions & 116 deletions

File tree

adk/docs/harness.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -198,3 +198,9 @@ result = await emitter.auto_send_turn(turn, created_at=workflow.now())
198198
# result.final_text — last text segment
199199
# result.usage — TurnUsage (tokens, cost, ...)
200200
```
201+
202+
---
203+
204+
## Migration
205+
206+
- [Migrating to `agentex-client` 0.16.0 / `agentex-sdk` 0.15.0](./migration-0.16.0.md) — removed LangGraph/Pydantic-AI tracing handlers (tracing is now derived from the canonical stream), private `_modules` path moves, the OpenAI harness facade relocation, and the new `run_turn` Temporal entry point.

adk/docs/migration-0.16.0.md

Lines changed: 272 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,272 @@
1+
# Migration Guide — `agentex-client` 0.16.0 / `agentex-sdk` 0.15.0
2+
3+
This release consolidates the LangGraph, Pydantic-AI, and OpenAI Agents harnesses
4+
onto the **unified harness surface** (`UnifiedEmitter` + `SpanDeriver`), introduces
5+
`run_turn` as the single Temporal entry point for OpenAI Agents, renders
6+
hosted/server-side tool calls in the Temporal streaming model, and ships new CLI
7+
init templates.
8+
9+
Most consumers only need to act on **section 1** (removed tracing handlers).
10+
Sections 2–3 only matter if you import private modules. Section 4 lists the new,
11+
opt-in capabilities. Section 5 documents the defect fixes shipped on top of the
12+
release.
13+
14+
---
15+
16+
## 1. Tracing handlers removed (LangGraph + Pydantic-AI) — **action required**
17+
18+
The bespoke tracing callback handlers are **gone** from the public
19+
`agentex.lib.adk` surface:
20+
21+
| Removed | |
22+
|---|---|
23+
| `agentex.lib.adk.create_langgraph_tracing_handler` | + class `AgentexLangGraphTracingHandler` |
24+
| `agentex.lib.adk.create_pydantic_ai_tracing_handler` | + class `AgentexPydanticAITracingHandler` |
25+
26+
Span tracing is now **derived automatically** from the canonical
27+
`StreamTaskMessage*` stream by `UnifiedEmitter`. You no longer construct or pass a
28+
callback handler — you wrap the run in the harness `*Turn` and drive delivery
29+
through the emitter, and spans fall out of the stream.
30+
31+
### LangGraph
32+
33+
**Before**
34+
35+
```python
36+
from agentex.lib import adk
37+
38+
handler = adk.create_langgraph_tracing_handler(
39+
trace_id=trace_id,
40+
parent_span_id=parent_span_id,
41+
)
42+
result = await graph.ainvoke(state, config={"callbacks": [handler]})
43+
```
44+
45+
**After**
46+
47+
```python
48+
from agentex.lib.adk import stream_langgraph_events # facade name unchanged
49+
50+
# Streaming delivery + tracing are handled for you; no callbacks wiring.
51+
async for event in stream_langgraph_events(graph, state, ...):
52+
...
53+
```
54+
55+
or, when you own the emitter directly:
56+
57+
```python
58+
from agentex.lib.adk import LangGraphTurn
59+
from agentex.lib.core.harness import UnifiedEmitter
60+
61+
emitter = UnifiedEmitter(...)
62+
await emitter.auto_send_turn(LangGraphTurn(...)) # or: emitter.yield_turn(...)
63+
```
64+
65+
### Pydantic-AI
66+
67+
**Before**
68+
69+
```python
70+
handler = adk.create_pydantic_ai_tracing_handler(trace_id=..., parent_span_id=...)
71+
```
72+
73+
**After**
74+
75+
```python
76+
from agentex.lib.adk import PydanticAITurn, stream_pydantic_ai_events
77+
from agentex.lib.core.harness import UnifiedEmitter
78+
79+
# Wrap in PydanticAITurn and drive UnifiedEmitter.yield_turn / auto_send_turn.
80+
await UnifiedEmitter(...).auto_send_turn(PydanticAITurn(...))
81+
```
82+
83+
The `agentex init` templates were migrated to this pattern. If you scaffolded
84+
from an older template, regenerate (or diff against a fresh template) for the
85+
canonical shape.
86+
87+
---
88+
89+
## 2. Private `_modules` import paths changed — **only if you import privates**
90+
91+
Each harness now exposes exactly `_<harness>_sync.py` + `_<harness>_turn.py` under
92+
`agentex.lib.adk._modules`. Several private modules were deleted and their
93+
functions relocated. If you imported the **public facade names** from
94+
`agentex.lib.adk`, **nothing changes**. Repoint only if you reached into the
95+
private modules directly:
96+
97+
| Old (deleted) private import | New location | Public facade (unchanged) |
98+
|---|---|---|
99+
| `_modules._langgraph_async.stream_langgraph_events` | `_modules._langgraph_turn` | `adk.stream_langgraph_events` |
100+
| `_modules._langgraph_messages.emit_langgraph_messages` | `_modules._langgraph_sync` | `adk.emit_langgraph_messages` |
101+
| `_modules._langgraph_tracing.*` | **removed** (see §1) ||
102+
| `_modules._pydantic_ai_async.stream_pydantic_ai_events` | `_modules._pydantic_ai_turn` | `adk.stream_pydantic_ai_events` |
103+
| `_modules._pydantic_ai_tracing.*` | **removed** (see §1) ||
104+
105+
✅ These facade names are unchanged and keep working:
106+
`stream_langgraph_events`, `emit_langgraph_messages`,
107+
`convert_langgraph_to_agentex_events`, `LangGraphTurn`,
108+
`stream_pydantic_ai_events`, `convert_pydantic_ai_to_agentex_events`,
109+
`PydanticAITurn`.
110+
111+
---
112+
113+
## 3. OpenAI harness moved into `adk/_modules` + facade export
114+
115+
The OpenAI Agents harness now lives alongside the others:
116+
117+
- `OpenAITurn`, `openai_usage_to_turn_usage``agentex.lib.adk._modules._openai_turn`
118+
- `convert_openai_to_agentex_events``agentex.lib.adk._modules._openai_sync`
119+
120+
New **public** facade exports (prefer these):
121+
122+
```python
123+
from agentex.lib.adk import (
124+
OpenAITurn,
125+
convert_openai_to_agentex_events,
126+
openai_usage_to_turn_usage,
127+
)
128+
```
129+
130+
Back-compat shims remain at
131+
`agentex.lib.adk.providers._modules.{openai_turn,sync_provider}` **for one
132+
release** — migrate to the facade names before the next minor.
133+
134+
---
135+
136+
## 4. New capabilities (opt-in, no migration required)
137+
138+
- **`run_turn` — unified Temporal entry point for OpenAI Agents.**
139+
140+
```python
141+
from agentex.lib.core.temporal.plugins.openai_agents import run_turn, OpenAIAgentsTurnResult
142+
143+
result = await run_turn(
144+
agent, input,
145+
task_id=task_id,
146+
trace_id=trace_id,
147+
parent_span_id=parent_span_id,
148+
)
149+
result.final_output # raw SDK final_output
150+
result.usage # normalized TurnUsage for the turn span
151+
```
152+
153+
It emits each tool call exactly once (the streaming model is the sole
154+
tool-**request** emitter; hooks emit tool **responses**), traces per-tool spans,
155+
normalizes token usage, and drains orphaned tool spans in a `finally` block if
156+
the run terminates mid-tool. Existing `TemporalStreamingHooks` callers keep
157+
working — `run_turn` is additive. If you pass your own `hooks` subclass, also
158+
set `emit_tool_requests=False` and forward `trace_id` / `parent_span_id`
159+
yourself (they are only auto-applied to the default hooks).
160+
161+
- **Hosted / server-side tool rendering** in `TemporalStreamingModel`:
162+
web_search, file_search, code_interpreter, image_generation, server-side mcp,
163+
computer, and local_shell calls now surface as ToolRequest/ToolResponse pairs.
164+
165+
- **New CLI init templates:** `default` / `sync` / `temporal` flavors of
166+
`claude-code` and `codex`, plus `default-openai-agents`.
167+
168+
---
169+
170+
## 5. Defect fixes shipped with this migration
171+
172+
These fixes harden the newly-added sync OpenAI converter
173+
(`convert_openai_to_agentex_events` / `OpenAITurn`) and the Temporal hosted-tool
174+
path. No API change — behavior only.
175+
176+
1. **Malformed tool arguments no longer abort the turn.** The converter now
177+
parses raw tool-call arguments through a defensive helper
178+
(`_safe_parse_arguments`): a non-decodable string is preserved under `raw`
179+
and a non-dict JSON value under `value`, instead of raising `JSONDecodeError`
180+
and killing the run before later output is delivered. This matches the
181+
Temporal streaming model's existing fallback.
182+
183+
2. **Reasoning messages are closed.** Completed reasoning content/summary items
184+
now emit a matching `StreamTaskMessageDone`. Previously the `Done` was
185+
skipped, so `UnifiedEmitter.auto_send` never released the context and the
186+
reasoning span could be marked incomplete (reasoning-model output appeared to
187+
hang).
188+
189+
3. **Text no longer collides with reasoning.** Every new text `item_id` now
190+
reserves a fresh message index (matching the increment-then-use convention of
191+
the reasoning/tool paths). Previously the first text item reused the current
192+
index, so on reasoning-model streams the final answer could overwrite the
193+
reasoning message, duplicate a `Start`, or route deltas into the wrong context.
194+
195+
4. **Hosted-tool response shape aligned.** Hosted/server-side tool responses in
196+
`TemporalStreamingModel` now emit `content` as a plain string, matching the
197+
function-tool response path (`on_tool_end`) so hosted and function tools
198+
render identically within the same flow.
199+
200+
5. **Reasoning text now appears in derived spans.** `SpanDeriver` opened reasoning
201+
spans with empty input and closed them with `output=None`, so reasoning/thinking
202+
text never reached the trace (spans showed blank — read as "0 reasoning traces").
203+
It now accumulates the `ReasoningContentDelta` / `ReasoningSummaryDelta` text (and
204+
any text seeded on the Start content) and records it as the span output. Affects
205+
every harness that streams reasoning, including the Claude Code tap.
206+
207+
6. **Claude Code: no more duplicate text messages.** The `stream-json` converter
208+
deduped streamed-vs-materialized blocks by numeric block index and reset that
209+
state after every materialized `assistant` envelope. A single streamed message
210+
that materializes as several envelopes (thinking, then text) lost the dedup
211+
marker between envelopes and re-emitted the text. Dedup is now **content-based**
212+
(match the streamed block's text, consume once), which a numeric index cannot do
213+
reliably.
214+
215+
> Action: if you adopted `OpenAITurn` for **reasoning models** (o1/o3/gpt-5) on
216+
> the sync path before these fixes, upgrade — fixes 2 and 3 are required for
217+
> correct reasoning rendering. Claude Code agents on the unified harness tap should
218+
> upgrade for fixes 5 and 6.
219+
220+
---
221+
222+
## 6. Legacy Temporal `claude_agents` plugin → unified harness tap
223+
224+
`agentex.lib.core.temporal.plugins.claude_agents` (`run_claude_agent_activity`,
225+
`create_streaming_hooks`, `TemporalStreamingHooks`, `ClaudeMessageHandler`) is the
226+
**original** Claude Code integration: it drives the Python `claude-agent-sdk`
227+
directly and hand-rolls its own streaming + tracing. It is **superseded** by the
228+
unified harness tap and slated for removal in a future release. It still works
229+
today, so this migration is **recommended, not yet required** — but new Claude Code
230+
agents should use the tap, and existing ones should plan to move.
231+
232+
Why migrate: the tap routes Claude Code through the same canonical
233+
`StreamTaskMessage*` stream as every other harness, so it gets central span
234+
derivation (tool **and** reasoning spans), the single delivery path
235+
(`UnifiedEmitter`), and fixes like the two above for free. The legacy plugin does
236+
not derive reasoning spans at all and duplicates the streaming/tracing logic.
237+
238+
**Before — legacy plugin activity:**
239+
240+
```python
241+
from agentex.lib.core.temporal.plugins.claude_agents import run_claude_agent_activity
242+
243+
# In the workflow:
244+
result = await workflow.execute_activity(
245+
run_claude_agent_activity,
246+
args=[prompt, workspace_path, allowed_tools, ...],
247+
start_to_close_timeout=...,
248+
)
249+
```
250+
251+
**After — unified harness tap.** Run the CLI yourself (`claude -p --output-format
252+
stream-json --include-partial-messages`), wrap its stdout in `ClaudeCodeTurn`, and
253+
deliver through `UnifiedEmitter`:
254+
255+
```python
256+
from agentex.lib.adk import ClaudeCodeTurn, UnifiedEmitter
257+
258+
# `stdout_lines` is an async iterator of the CLI's stdout lines (raw JSON strings
259+
# or pre-parsed dicts) — e.g. read from sandbox.exec() / a subprocess.
260+
turn = ClaudeCodeTurn(stdout_lines)
261+
262+
emitter = UnifiedEmitter(task_id=task_id, trace_id=trace_id, parent_span_id=parent_span_id)
263+
result = await emitter.auto_send_turn(turn, created_at=workflow.now())
264+
# result.final_text — last text segment
265+
# result.usage — TurnUsage (tokens, cost, num_reasoning_blocks, ...)
266+
```
267+
268+
The golden agent is the reference implementation
269+
(`teams/sgp/agents/golden_agent/project/harness/`): it spawns the CLI in a sandbox,
270+
yields stdout lines into `ClaudeCodeTurn`, and drives `auto_send_turn`. Known
271+
remaining consumers to migrate: the `090_claude_agents_sdk_mvp` tutorial and the
272+
`eval_dashboard_agent`.

0 commit comments

Comments
 (0)