
[codex] make OpenAI streaming strict by default #74

Merged
solderzzc merged 4 commits into SharpAI:main from
jankaderabek:codex/openai-streaming-compat
Apr 23, 2026

Conversation

@jankaderabek
Contributor

This removes the non-standard prefill-progress `data:` frames from /v1/chat/completions and /v1/completions by default.

Previously, SwiftLM could emit custom prefill_progress payloads during long prompt prefill. Strict OpenAI-compatible clients such as OpenCode may reject those frames because they are neither standard OpenAI stream chunks nor error objects.

Changes:

  • disable prefill progress frames by default on OpenAI-compatible streaming endpoints
  • add opt-in support via X-SwiftLM-Prefill-Progress: true
  • emit opt-in progress as named SSE events (event: prefill_progress) instead of custom OpenAI-like chunks
  • add focused tests for header parsing and SSE payload shape

This keeps default streaming wire-compatible while preserving optional prefill observability for custom clients.
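For reference, the opt-in frames are plain named SSE events. A minimal sketch of the frame formatting — the `prefill_progress` event name and the nPast/promptTokens/elapsedSeconds values come from this PR's code, while the JSON key spellings are assumptions:

```swift
// Sketch: a named SSE event for prefill progress. Strict OpenAI clients
// only consume unnamed `data:` lines, so they skip this frame entirely.
// The JSON key names here are assumed, not copied from the diff.
func prefillProgressFrame(nPast: Int, promptTokens: Int, elapsedSeconds: Int) -> String {
    let payload = "{\"n_past\":\(nPast),\"prompt_tokens\":\(promptTokens),\"elapsed_seconds\":\(elapsedSeconds)}"
    return "event: prefill_progress\r\ndata: \(payload)\r\n\r\n"
}
```

By contrast, the old behaviour wrapped the same information in an unnamed `data:` chunk shaped like an OpenAI completion chunk, which strict parsers reject.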

Member

@solderzzc solderzzc left a comment


Review: [codex] make OpenAI streaming strict by default

Overall: ✅ Approve with minor suggestions. The direction is correct — gating prefill_progress behind an opt-in header is the right call for OpenAI-compatibility. A few concurrency and API-design nits worth addressing before merging.


✅ What's Good

  • Named SSE event (event: prefill_progress) is the right wire format. Strict clients skip unknown named events; they would choke on a pseudo-OpenAI data: chunk with "object": "prefill_progress".
  • Leaner payload — dropping id, created, model, and the nested prefill key mirrors llama-server's slot_update and is easier to parse.
  • CORS header updated to expose X-SwiftLM-Prefill-Progress — easy to forget, good catch.
  • Unit tests in ServerSSETests cover the truthy-header parser and the new SSE shape cleanly.

🟡 Concerns

1. activePrefillProgressHook race condition (existing + worsened)

activePrefillProgressHook = nil          // ← reset
if emitPrefillProgress {
    activePrefillProgressHook = {  }    // ← set
    Task {  }                            // ← separate Task reads it via the global
}

activePrefillProgressHook is a mutable global. Two concurrent requests will clobber each other's hook — the second nil assignment will silence the first request's heartbeat, or the first request's hook will fire into the second request's cont. This was already racy before this PR, but the new nil-then-set pattern makes the window explicit.

Suggestion: thread the hook through PrefillState or pass it as a closure parameter to handleChatStreaming/handleTextStreaming so it's scoped to the request.
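A possible shape for the request-scoped alternative — the hook travels with the request instead of living in a global. All names other than nPast are hypothetical:

```swift
// Sketch: scope the prefill hook to the request. Concurrent requests each
// carry their own optional hook, so there is nothing shared to clobber.
typealias PrefillProgressHook = @Sendable (_ nPast: Int) -> Void

struct GenerationRequest {
    let prefillHook: PrefillProgressHook?   // nil in strict OpenAI mode

    func reportPrefill(nPast: Int) {
        prefillHook?(nPast)   // no-op when the client did not opt in
    }
}
```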

2. handleTextCompletion missing request parameter in one call site

The diff patches the /v1/completions route handler to pass request:, but double-check there isn't a second call site (e.g. an internal test helper) that invokes handleTextCompletion without the new label — that would only surface as a compile error on a full build.

3. Heartbeat Task leaks if the client disconnects before prefill completes

Task {
    var elapsed = 0
    while await !prefillState.done {  }   // loops until done
}

If the HTTP connection drops, cont will be cancelled but the Task continues looping until prefillState.finish() is called. Since finish() is only called on firstToken or .info, a request that errors out during prefill will spin for up to 2 s per iteration until the server is restarted. Consider holding a reference to the Task and calling .cancel() in the stream's termination path.
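One way to wire that cancellation, assuming an AsyncStream-based handler — the surrounding setup is illustrative; only cont corresponds to the diff:

```swift
// Sketch: tie the heartbeat Task's lifetime to the stream itself.
let events = AsyncStream<String> { cont in
    let heartbeatTask = Task {
        while !Task.isCancelled {
            try? await Task.sleep(for: .seconds(2))
            cont.yield("event: prefill_progress\r\ndata: {}\r\n\r\n")
        }
    }
    // Fires on client disconnect and on normal finish alike,
    // so the loop cannot outlive the request.
    cont.onTermination = { _ in heartbeatTask.cancel() }
}
// The response writer would then `for await frame in events { ... }`.
```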

4. ssePrefillChunk's modelId parameter removed but sseChunk still takes it

The function signature change (modelId removed) is consistent with the new lean payload, but the doc comment still refers to "llama-server's slot_update event" which includes a model field. If downstream consumers need to correlate which model is prefilling (relevant for multi-model deployments), they can't — but they can infer from the stream's earlier chunks. Leaving this as-is is acceptable; just worth a doc-comment update to say model is intentionally omitted.

5. Test: dropLast(4) is fragile

let payload = String(chunk.dropFirst(prefix.count).dropLast(4))

\r\n\r\n is 4 bytes in UTF-8, but dropLast operates on Swift Characters (extended grapheme clusters), not bytes — and Swift treats each \r\n pair as a single Character, so the terminator is 2 Characters, not 4. Hard-coding the magic number 4 couples the test to a coincidence between byte and Character counts and would silently mis-parse if the frame layout drifts. Prefer:

XCTAssertTrue(chunk.hasSuffix("\r\n\r\n"))
let trimmed = chunk
    .dropFirst(prefix.count)
    .dropLast("\r\n\r\n".count)

…which is still correct because you already assert hasSuffix above it.
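The Character-versus-byte distinction can be checked directly; CR LF counts as one Character because Unicode treats it as a single extended grapheme cluster:

```swift
let terminator = "\r\n\r\n"
print(terminator.utf8.count)  // 4 bytes in UTF-8
print(terminator.count)       // 2 Characters: each \r\n is one grapheme cluster
```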


🔵 Minor / Nits

  • Accepting on as a truthy value in parseTruthyHeaderValue is unusual for HTTP headers — true/1/yes is standard; on is an HTML form artifact. Low risk, but on could be dropped.
  • Consider documenting X-SwiftLM-Prefill-Progress in the README alongside --turbo-kv and kv_bits (per the pattern established in PR #73).
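A minimal sketch of a stricter parser along those lines — the function name parseTruthyHeaderValue comes from the PR, while the exact accepted set is the reviewer's suggestion, not the shipped behaviour:

```swift
import Foundation

// Accept only the conventional truthy spellings; `on` is dropped.
func parseTruthyHeaderValue(_ raw: String?) -> Bool {
    guard let raw else { return false }
    switch raw.trimmingCharacters(in: .whitespaces).lowercased() {
    case "true", "1", "yes": return true
    default: return false
    }
}
```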

Summary

The core change is sound. Before merging I'd prioritize item 1 (global hook race) and item 3 (Task leak) since both affect production correctness under concurrent load. Items 4–5 are polish.

… fix CORS/parallel test gaps

- Server.swift: add defer-based heartbeat cleanup in both handleChatStreaming and
  handleTextStreaming so heartbeatTask is always cancelled on any exit path
  (client disconnect during prefill no longer leaks the heartbeat task)
- ServerSSETests.swift: add missing import Foundation for Data/JSONSerialization
- test-server.sh Test 32: fail on empty curl response instead of false-passing
- test-server.sh Test 33: use conditional curl; fail if request fails entirely
- test-server.sh Test 34: redirect CORS preflight to CORS_PORT (--cors server)
  instead of the main server which has no CORS middleware
- test-server.sh Test 35: spin up a dedicated --parallel 2 server so concurrent
  requests actually overlap and stress the global hook under real parallelism
- test-opencode.sh: capture opencode exit code separately; classify parse errors
  vs acceptable non-zero exits to prevent false passes
…ch in Tests 32-33

The new conditional curl patterns in Tests 32 and 33 combined with the
existing set -euo pipefail caused the script to abort when grep found
no match (exit 1) in the EVENT_DATA pipeline. All grep/jq calls that
may produce no output now use || true or are wrapped in if/else to
prevent premature script exit.
@solderzzc solderzzc marked this pull request as ready for review April 23, 2026 03:29
Copilot AI review requested due to automatic review settings April 23, 2026 03:29
@solderzzc solderzzc merged commit 743b1a1 into SharpAI:main Apr 23, 2026
10 checks passed
Contributor

Copilot AI left a comment


Pull request overview

This PR makes OpenAI-compatible streaming strict by default by preventing non-standard prefill-progress frames from appearing in /v1/chat/completions and /v1/completions, while preserving opt-in prefill observability via X-SwiftLM-Prefill-Progress: true using named SSE events.

Changes:

  • Gate prefill progress emission behind a request header and emit it as event: prefill_progress with a lean JSON payload.
  • Add Swift unit tests for header parsing, SSE formatting, and PrefillState behavior; extend shell integration tests (including CORS coverage).
  • Add an optional CI modality to exercise OpenAI Python SDK + OpenCode CLI parsing.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Sources/SwiftLM/Server.swift Adds header-based opt-in, named SSE event formatting, and CORS allow-header update.
tests/SwiftLMTests/ServerSSETests.swift New XCTest coverage for truthy header parsing, SSE payload shape, and PrefillState guards.
tests/test-server.sh Adds integration checks for strict default streaming, opt-in SSE events, CORS preflight, and concurrency scenario.
tests/test-opencode.sh New integration test script for OpenAI Python SDK + OpenCode CLI SSE parsing.
Package.swift Adds SwiftLMTests as a SwiftPM test target.
.github/workflows/ci.yml Runs SwiftLMTests in CI and adds an optional opencode integration modality.


Comment thread tests/test-opencode.sh
Comment on lines +7 to +8
# Requires: python3, pip (installs openai package dynamically)


Copilot AI Apr 23, 2026


The header comment says this script only requires Python/pip, but it also uses npm/npx later. Suggested fix: document Node/npm as prerequisites (or gate the OpenCode CLI test behind a command -v npm check with a clear skip message).

Comment thread tests/test-opencode.sh
Comment on lines +123 to +127
log "Installing opencode-ai in isolated directory..."
mkdir -p /tmp/opencode_cli_test
cd /tmp/opencode_cli_test
npm install opencode-ai@latest --silent >/dev/null 2>&1


Copilot AI Apr 23, 2026


npm install opencode-ai@latest makes the test non-deterministic and more likely to break over time. Suggested fix: pin the package version (or source it from a repo-controlled variable/file) so CI failures reflect server changes rather than upstream releases.

Comment on lines +1397 to 1424
// We capture the hook in a local variable so that concurrent requests
// cannot clobber each other's hook via the global. The global is still
// written here because LLMModel.prepare() reads it, but the semaphore
// ensures only one generation runs at a time.
var heartbeatTask: Task<Void, Never>? = nil
activePrefillProgressHook = nil
if emitPrefillProgress {
    // Hook is scoped to this request: the local prefillState is the only
    // shared state, and it is actor-isolated.
    activePrefillProgressHook = { nPast, _ in
        Task { await prefillState.update(nPast: nPast) }
    }
    heartbeatTask = Task {
        var elapsed = 0
        while await !prefillState.done {
            try? await Task.sleep(for: .seconds(2))
            // Guard against Task cancellation on client disconnect.
            guard !Task.isCancelled else { break }
            if await !prefillState.done {
                elapsed += 2
                let nPast = await prefillState.nPast
                _ = cont.yield(ssePrefillChunk(
                    nPast: nPast,
                    promptTokens: promptTokenCount,
                    elapsedSeconds: elapsed))
            }
        }
    }

Copilot AI Apr 23, 2026


activePrefillProgressHook is a shared global, but this code assumes the semaphore means only one generation runs at a time. AsyncSemaphore(limit: parallelSlots) allows multiple concurrent generations when --parallel > 1, so concurrent requests can overwrite/clear each other’s hook (including non-opt-in requests setting it to nil), leading to missing/misattributed progress events and potential races inside LLMModel.prepare().

Suggested fix: don’t write per-request closures into the global. Set the global once to forward updates into a thread-safe registry (e.g., an actor that multiplexes to per-request subscribers), or disable this opt-in feature when parallelSlots > 1 and document the limitation.

Suggested change
// We capture the hook in a local variable so that concurrent requests
// cannot clobber each other's hook via the global. The global is still
// written here because LLMModel.prepare() reads it, but the semaphore
// ensures only one generation runs at a time.
var heartbeatTask: Task<Void, Never>? = nil
activePrefillProgressHook = nil
if emitPrefillProgress {
    // Hook is scoped to this request: the local prefillState is the only
    // shared state, and it is actor-isolated.
    activePrefillProgressHook = { nPast, _ in
        Task { await prefillState.update(nPast: nPast) }
    }
    heartbeatTask = Task {
        var elapsed = 0
        while await !prefillState.done {
            try? await Task.sleep(for: .seconds(2))
            // Guard against Task cancellation on client disconnect.
            guard !Task.isCancelled else { break }
            if await !prefillState.done {
                elapsed += 2
                let nPast = await prefillState.nPast
                _ = cont.yield(ssePrefillChunk(
                    nPast: nPast,
                    promptTokens: promptTokenCount,
                    elapsedSeconds: elapsed))
            }
        }
    }
// Do not install a per-request closure into `activePrefillProgressHook`.
// That hook is shared global state, and concurrent requests can run when
// the server is configured with parallel generation slots. Writing it here
// would allow one request to overwrite or clear another request's hook,
// causing missing or misattributed progress events and introducing races
// around `LLMModel.prepare()`.
//
// A safe implementation requires a single global forwarder plus a
// concurrency-safe registry, or serialization of this feature. Until that
// exists, leave the shared hook untouched in this request path.
var heartbeatTask: Task<Void, Never>? = nil
if emitPrefillProgress {
    // Prefill progress reporting is intentionally disabled here because it
    // cannot be implemented safely with the current shared global hook.
Comment thread tests/test-server.sh
Comment on lines +976 to +978
if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
    # Only fail if it was a curl failure (empty), not a missing event
    [ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"

Copilot AI Apr 23, 2026


The logic here can silently produce neither PASS nor FAIL when the stream is non-empty but missing the terminating data: [DONE] (the branch runs but only fails on empty output). The check then reports nothing at all for that case and can hide truncated-stream regressions.

Suggested fix: explicitly fail (or at least emit a WARN/PASS) when [DONE] is missing so the test result always reflects the outcome.

Suggested change
if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
# Only fail if it was a curl failure (empty), not a missing event
[ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"
if [ -z "$STRICT_STREAM" ]; then
    fail "Strict mode: stream was empty"
elif ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
    fail "Strict mode: stream missing terminating [DONE] event"

Comment thread tests/test-opencode.sh
Comment on lines +51 to +54
log "Setting up virtual environment with openai SDK..."
VENV_DIR="/tmp/opencode_venv"
python3 -m venv "$VENV_DIR"
"$VENV_DIR/bin/pip" install --quiet openai

Copilot AI Apr 23, 2026


The virtualenv path is hard-coded to /tmp/opencode_venv and the script doesn’t clean it up. Re-running locally (or in environments where /tmp persists) can cause python3 -m venv to fail because the directory already exists.

Suggested fix: use mktemp -d for an isolated venv dir and delete it in cleanup(), or rm -rf "$VENV_DIR" before creating it.

@jankaderabek
Contributor Author

@solderzzc Thank you for finishing up the PR so fast 👍

@solderzzc
Member

solderzzc commented Apr 23, 2026 via email
