[codex] make OpenAI streaming strict by default #74
Conversation
solderzzc
left a comment
Review: [codex] make OpenAI streaming strict by default
Overall: ✅ Approve with minor suggestions. The direction is correct — gating prefill_progress behind an opt-in header is the right call for OpenAI-compatibility. A few concurrency and API-design nits worth addressing before merging.
✅ What's Good
- Named SSE event (`event: prefill_progress`) is the right wire format. Strict clients skip unknown named events; they would choke on a pseudo-OpenAI `data:` chunk with `"object": "prefill_progress"`.
- Leaner payload — dropping `id`, `created`, `model`, and the nested `prefill` key mirrors llama-server's `slot_update` and is easier to parse.
- CORS header updated to expose `X-SwiftLM-Prefill-Progress` — easy to forget, good catch.
- Unit tests in `ServerSSETests` cover the truthy-header parser and the new SSE shape cleanly.
🟡 Concerns
1. activePrefillProgressHook race condition (existing + worsened)
```swift
activePrefillProgressHook = nil // ← reset
if emitPrefillProgress {
    activePrefillProgressHook = { … } // ← set
    Task { … } // ← separate Task reads it via the global
}
```
`activePrefillProgressHook` is a mutable global. Two concurrent requests will clobber each other's hook — the second `nil` assignment will silence the first request's heartbeat, or the first request's hook will fire into the second request's `cont`. This was already racy before this PR, but the new nil-then-set pattern makes the window explicit.
Suggestion: thread the hook through `PrefillState` or pass it as a closure parameter to `handleChatStreaming`/`handleTextStreaming` so it's scoped to the request.
2. `handleTextCompletion` missing `request:` parameter in one call site
The diff patches the `/v1/completions` route handler to pass `request:`, but double-check there isn't a second call site (e.g. an internal test helper) that invokes `handleTextCompletion` without the new label — that would surface as a compile error only on a full build, not in an incremental check.
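One way to audit this is a repo-wide grep for the call sites. The sketch below is self-contained (it builds a tiny sample tree with a hypothetical call site); in the actual repo you would point grep at `Sources/` and `tests/` instead.

```shell
# Self-contained sketch: enumerate Swift call sites of handleTextCompletion
# so each one can be checked for the new `request:` label. The sample file
# and its contents are illustrative, not from the PR.
WORKDIR=$(mktemp -d)
mkdir -p "$WORKDIR/Sources"
cat > "$WORKDIR/Sources/Routes.swift" <<'EOF'
let result = try await handleTextCompletion(request: req, body: body)
EOF
# Every hit should already carry the new `request:` label.
MATCHES=$(grep -rn --include='*.swift' 'handleTextCompletion(' "$WORKDIR")
echo "$MATCHES"
rm -rf "$WORKDIR"
```

Running this against the real tree before merging would catch any call site the diff missed.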
3. Heartbeat Task leaks if the client disconnects before prefill completes
```swift
Task {
    var elapsed = 0
    while await !prefillState.done { … } // loops until done
}
```
If the HTTP connection drops, `cont` will be cancelled but the Task continues looping until `prefillState.finish()` is called. Since `finish()` is only called on `firstToken` or `.info`, a request that errors out during prefill will keep spinning (one iteration every 2 s) until the server is restarted. Consider holding a reference to the Task and calling `.cancel()` in the stream's termination path.
4. `ssePrefillChunk` — `modelId` parameter removed but `sseChunk` still takes it
The function signature change (`modelId` removed) is consistent with the new lean payload, but the doc comment still refers to "llama-server's `slot_update` event", which includes a `model` field. If downstream consumers need to correlate which model is prefilling (relevant for multi-model deployments), they can't — but they can infer it from the stream's earlier chunks. Leaving this as-is is acceptable; just worth a doc-comment update to say `model` is intentionally omitted.
5. Test: `dropLast(4)` is fragile
```swift
let payload = String(chunk.dropFirst(prefix.count).dropLast(4))
```
`\r\n\r\n` is 4 bytes in UTF-8, but `dropLast` operates on `Character` values (extended grapheme clusters), not bytes. This works for ASCII but would silently mis-parse if the JSON ever contains multi-byte characters right before the terminator. Prefer:
```swift
XCTAssertTrue(chunk.hasSuffix("\r\n\r\n"))
let trimmed = chunk
    .dropFirst(prefix.count)
    .dropLast("\r\n\r\n".count)
```
…which is still correct because you already assert `hasSuffix` above it.
🔵 Minor / Nits
- The `on` truthy value (`parseTruthyHeaderValue`) is unusual for HTTP headers — `true`/`1`/`yes` is standard; `on` is an HTML form artifact. Low risk, but `on` could be dropped.
- Consider documenting `X-SwiftLM-Prefill-Progress` in the README alongside `--turbo-kv` and `kv_bits` (per the pattern established in PR #73).
Summary
The core change is sound. Before merging I'd prioritize item 1 (global hook race) and item 3 (Task leak) since both affect production correctness under concurrent load. Items 4–5 are polish.
… fix CORS/parallel test gaps
- Server.swift: add defer-based heartbeat cleanup in both handleChatStreaming and handleTextStreaming so heartbeatTask is always cancelled on any exit path (client disconnect during prefill no longer leaks the heartbeat task)
- ServerSSETests.swift: add missing `import Foundation` for Data/JSONSerialization
- test-server.sh Test 32: fail on empty curl response instead of false-passing
- test-server.sh Test 33: use conditional curl; fail if request fails entirely
- test-server.sh Test 34: redirect CORS preflight to CORS_PORT (the `--cors` server) instead of the main server, which has no CORS middleware
- test-server.sh Test 35: spin up a dedicated `--parallel 2` server so concurrent requests actually overlap and stress the global hook under real parallelism
- test-opencode.sh: capture the opencode exit code separately; classify parse errors vs acceptable non-zero exits to prevent false passes
…ch in Tests 32-33
The new conditional curl patterns in Tests 32 and 33, combined with the existing `set -euo pipefail`, caused the script to abort when grep found no match (exit 1) in the EVENT_DATA pipeline. All grep/jq calls that may produce no output now use `|| true` or are wrapped in if/else to prevent premature script exit.
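The interaction described above is easy to reproduce in isolation. A minimal sketch of the guarded pattern, with a stand-in `EVENT_DATA` value (not the PR's actual captured stream):

```shell
set -euo pipefail
EVENT_DATA='data: {"choices":[]}'   # stand-in for the captured SSE stream
# Without `|| true`, this grep exits 1 on no match and, under `set -e`,
# aborts the whole test script before any PASS/FAIL line is reported.
MATCHES=$(echo "$EVENT_DATA" | grep 'prefill_progress' || true)
if [ -n "$MATCHES" ]; then
  echo "prefill event present"
else
  echo "prefill event absent"
fi
```

The same `|| true` guard (or an explicit if/else around the pipeline) applies to any `jq` call that may legitimately produce no output.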
Pull request overview
This PR makes OpenAI-compatible streaming strict by default by preventing non-standard prefill-progress frames from appearing in /v1/chat/completions and /v1/completions, while preserving opt-in prefill observability via X-SwiftLM-Prefill-Progress: true using named SSE events.
Changes:
- Gate prefill progress emission behind a request header and emit it as `event: prefill_progress` with a lean JSON payload.
- Add Swift unit tests for header parsing, SSE formatting, and PrefillState behavior; extend shell integration tests (including CORS coverage).
- Add an optional CI modality to exercise OpenAI Python SDK + OpenCode CLI parsing.
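For reference, opting in from a client is a single extra header. This is a sketch against a locally running server; the port, model name, and request body are assumptions, and the request is skipped when no server is reachable:

```shell
URL="http://localhost:8080/v1/chat/completions"   # assumed local server address
BODY='{"model":"default","stream":true,"messages":[{"role":"user","content":"hi"}]}'
# Without the header the stream stays strictly OpenAI-shaped; with it, named
# `event: prefill_progress` frames may appear between the usual data chunks.
if curl -s -o /dev/null --max-time 2 "$URL" 2>/dev/null; then
  curl -N -s "$URL" \
    -H "Content-Type: application/json" \
    -H "X-SwiftLM-Prefill-Progress: true" \
    -d "$BODY"
else
  echo "server not reachable; skipping demo request"
fi
```

Clients that do not send the header observe no behavior change, which is the point of the PR.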
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| Sources/SwiftLM/Server.swift | Adds header-based opt-in, named SSE event formatting, and CORS allow-header update. |
| tests/SwiftLMTests/ServerSSETests.swift | New XCTest coverage for truthy header parsing, SSE payload shape, and PrefillState guards. |
| tests/test-server.sh | Adds integration checks for strict default streaming, opt-in SSE events, CORS preflight, and concurrency scenario. |
| tests/test-opencode.sh | New integration test script for OpenAI Python SDK + OpenCode CLI SSE parsing. |
| Package.swift | Adds SwiftLMTests as a SwiftPM test target. |
| .github/workflows/ci.yml | Runs SwiftLMTests in CI and adds an optional opencode integration modality. |
```shell
# Requires: python3, pip (installs openai package dynamically)
```
The header comment says this script only requires Python/pip, but it also uses `npm`/`npx` later. Suggested fix: document Node/npm as prerequisites (or gate the OpenCode CLI test behind a `command -v npm` check with a clear skip message).
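A minimal sketch of the suggested gate; the message wording is illustrative, not from the script:

```shell
# Gate the OpenCode CLI portion on npm being installed, so the script skips
# with a clear message instead of failing mid-run on machines without Node.
if command -v npm >/dev/null 2>&1; then
  NPM_STATUS="present"
  echo "npm found: running OpenCode CLI test"
  # the npm install / opencode invocation would follow here
else
  NPM_STATUS="missing"
  echo "SKIP: npm not found; the OpenCode CLI test requires Node/npm"
fi
```

The skip branch keeps the overall script exit status green, which matches how the other optional modalities in this PR are handled.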
```shell
log "Installing opencode-ai in isolated directory..."
mkdir -p /tmp/opencode_cli_test
cd /tmp/opencode_cli_test
npm install opencode-ai@latest --silent >/dev/null 2>&1
```
`npm install opencode-ai@latest` makes the test non-deterministic and more likely to break over time. Suggested fix: pin the package version (or source it from a repo-controlled variable/file) so CI failures reflect server changes rather than upstream releases.
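A sketch of the pinning approach; the version number below is a placeholder, not a known-good release, and the install command is only echoed here rather than executed:

```shell
# Pin the CLI version in one repo-controlled, CI-overridable variable so a
# test failure points at server changes rather than an upstream release.
OPENCODE_VERSION="${OPENCODE_VERSION:-0.1.0}"   # placeholder; pin a vetted release
INSTALL_CMD="npm install opencode-ai@${OPENCODE_VERSION} --silent"
echo "would run: $INSTALL_CMD"
```

In the real script the variable would replace the `@latest` tag in the existing `npm install` line.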
```swift
// We capture the hook in a local variable so that concurrent requests
// cannot clobber each other's hook via the global. The global is still
// written here because LLMModel.prepare() reads it, but the semaphore
// ensures only one generation runs at a time.
var heartbeatTask: Task<Void, Never>? = nil
activePrefillProgressHook = nil
if emitPrefillProgress {
    // Hook is scoped to this request: the local prefillState is the only
    // shared state, and it is actor-isolated.
    activePrefillProgressHook = { nPast, _ in
        Task { await prefillState.update(nPast: nPast) }
    }
    heartbeatTask = Task {
        var elapsed = 0
        while await !prefillState.done {
            try? await Task.sleep(for: .seconds(2))
            // Guard against Task cancellation on client disconnect.
            guard !Task.isCancelled else { break }
            if await !prefillState.done {
                elapsed += 2
                let nPast = await prefillState.nPast
                _ = cont.yield(ssePrefillChunk(
                    nPast: nPast,
                    promptTokens: promptTokenCount,
                    elapsedSeconds: elapsed))
            }
        }
    }
}
```
`activePrefillProgressHook` is a shared global, but this code assumes the semaphore means only one generation runs at a time. `AsyncSemaphore(limit: parallelSlots)` allows multiple concurrent generations when `--parallel > 1`, so concurrent requests can overwrite/clear each other's hook (including non-opt-in requests setting it to nil), leading to missing/misattributed progress events and potential races inside `LLMModel.prepare()`.
Suggested fix: don't write per-request closures into the global. Set the global once to forward updates into a thread-safe registry (e.g. an actor that multiplexes to per-request subscribers), or disable this opt-in feature when `parallelSlots > 1` and document the limitation.
Suggested change:
```diff
-// We capture the hook in a local variable so that concurrent requests
-// cannot clobber each other's hook via the global. The global is still
-// written here because LLMModel.prepare() reads it, but the semaphore
-// ensures only one generation runs at a time.
-var heartbeatTask: Task<Void, Never>? = nil
-activePrefillProgressHook = nil
-if emitPrefillProgress {
-    // Hook is scoped to this request: the local prefillState is the only
-    // shared state, and it is actor-isolated.
-    activePrefillProgressHook = { nPast, _ in
-        Task { await prefillState.update(nPast: nPast) }
-    }
-    heartbeatTask = Task {
-        var elapsed = 0
-        while await !prefillState.done {
-            try? await Task.sleep(for: .seconds(2))
-            // Guard against Task cancellation on client disconnect.
-            guard !Task.isCancelled else { break }
-            if await !prefillState.done {
-                elapsed += 2
-                let nPast = await prefillState.nPast
-                _ = cont.yield(ssePrefillChunk(
-                    nPast: nPast,
-                    promptTokens: promptTokenCount,
-                    elapsedSeconds: elapsed))
-            }
-        }
-    }
-}
+// Do not install a per-request closure into `activePrefillProgressHook`.
+// That hook is shared global state, and concurrent requests can run when
+// the server is configured with parallel generation slots. Writing it here
+// would allow one request to overwrite or clear another request's hook,
+// causing missing or misattributed progress events and introducing races
+// around `LLMModel.prepare()`.
+//
+// A safe implementation requires a single global forwarder plus a
+// concurrency-safe registry, or serialization of this feature. Until that
+// exists, leave the shared hook untouched in this request path.
+var heartbeatTask: Task<Void, Never>? = nil
+if emitPrefillProgress {
+    // Prefill progress reporting is intentionally disabled here because it
+    // cannot be implemented safely with the current shared global hook.
```
```shell
if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
    # Only fail if it was a curl failure (empty), not a missing event
    [ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"
```
The logic here can silently produce neither PASS nor FAIL when the stream is non-empty but missing the terminating `data: [DONE]` (the branch runs but only fails on empty output). That makes this check non-deterministic and can hide truncated-stream regressions.
Suggested fix: explicitly fail (or at least emit a WARN/PASS) when `[DONE]` is missing so the test result always reflects the outcome.
Suggested change:
```diff
-if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
-    # Only fail if it was a curl failure (empty), not a missing event
-    [ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"
+if [ -z "$STRICT_STREAM" ]; then
+    fail "Strict mode: stream was empty"
+elif ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
+    fail "Strict mode: stream missing terminating [DONE] event"
```
```shell
log "Setting up virtual environment with openai SDK..."
VENV_DIR="/tmp/opencode_venv"
python3 -m venv "$VENV_DIR"
"$VENV_DIR/bin/pip" install --quiet openai
```
The virtualenv path is hard-coded to `/tmp/opencode_venv` and the script doesn't clean it up. Re-running locally (or in environments where /tmp persists) can cause `python3 -m venv` to fail because the directory already exists.
Suggested fix: use `mktemp -d` for an isolated venv dir and delete it in `cleanup()`, or `rm -rf "$VENV_DIR"` before creating it.
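A sketch of the mktemp variant with cleanup wired into an EXIT trap; the venv creation and the pip step are both guarded so the sketch degrades gracefully on machines without `python3 -m venv` support or network access:

```shell
# Create the venv in a fresh temporary directory and remove it on exit, so a
# rerun never collides with a stale /tmp/opencode_venv from a prior run.
VENV_DIR="$(mktemp -d)"
trap 'rm -rf "$VENV_DIR"' EXIT
if python3 -m venv "$VENV_DIR/venv" 2>/dev/null; then
  "$VENV_DIR/venv/bin/pip" install --quiet openai 2>/dev/null \
    || echo "pip install failed (offline?); continuing without openai"
  echo "venv ready at $VENV_DIR/venv"
else
  echo "python3 -m venv unavailable; skipping"
fi
```

The `rm -rf "$VENV_DIR"` before creation alternative is even simpler if the fixed `/tmp/opencode_venv` path is kept.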
@solderzzc Thank you for finishing up the PR so fast 👍
@jankaderabek, Thank you for your bug report with PR. Let me know if there are any other issues.
This removes non-standard prefill progress `data:` frames from `/v1/chat/completions` and `/v1/completions` by default.

Previously, SwiftLM could emit custom `prefill_progress` payloads during long prompt prefill. Strict OpenAI-compatible clients such as OpenCode may reject those frames because they are neither standard OpenAI stream chunks nor error objects.

Changes:
- Opt in via the `X-SwiftLM-Prefill-Progress: true` request header
- Emit progress as a named SSE event (`event: prefill_progress`) instead of custom OpenAI-like chunks

This keeps default streaming wire-compatible while preserving optional prefill observability for custom clients.
X-SwiftLM-Prefill-Progress: trueevent: prefill_progress) instead of custom OpenAI-like chunksThis keeps default streaming wire-compatible while preserving optional prefill observability for custom clients.