
[codex] make OpenAI streaming strict by default #74

Merged
solderzzc merged 4 commits into SharpAI:main from
jankaderabek:codex/openai-streaming-compat
Apr 23, 2026

Conversation

@jankaderabek
Contributor

This removes the non-standard prefill-progress `data:` frames from /v1/chat/completions and /v1/completions by default.

Previously, SwiftLM could emit custom prefill_progress payloads during long prompt prefill. Strict OpenAI-compatible clients such as OpenCode may reject those frames because they are neither standard OpenAI stream chunks nor error objects.

Changes:

  • disable prefill progress frames by default on OpenAI-compatible streaming endpoints
  • add opt-in support via X-SwiftLM-Prefill-Progress: true
  • emit opt-in progress as named SSE events (event: prefill_progress) instead of custom OpenAI-like chunks
  • add focused tests for header parsing and SSE payload shape

This keeps default streaming wire-compatible while preserving optional prefill observability for custom clients.
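For reference, the opt-in frames are plain named SSE events. A minimal sketch of the frame formatting — the `prefill_progress` event name and the nPast/promptTokens/elapsedSeconds values come from this PR's code, while the JSON key spellings are assumptions:

```swift
// Sketch: a named SSE event for prefill progress. Strict OpenAI clients
// only consume unnamed `data:` lines, so they skip this frame entirely.
// The JSON key names here are assumed, not copied from the diff.
func prefillProgressFrame(nPast: Int, promptTokens: Int, elapsedSeconds: Int) -> String {
    let payload = "{\"n_past\":\(nPast),\"prompt_tokens\":\(promptTokens),\"elapsed_seconds\":\(elapsedSeconds)}"
    return "event: prefill_progress\r\ndata: \(payload)\r\n\r\n"
}
```

By contrast, the old behaviour wrapped the same information in an unnamed `data:` chunk shaped like an OpenAI completion chunk, which strict parsers reject.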

Member

@solderzzc solderzzc left a comment


Review: [codex] make OpenAI streaming strict by default

Overall: ✅ Approve with minor suggestions. The direction is correct — gating prefill_progress behind an opt-in header is the right call for OpenAI-compatibility. A few concurrency and API-design nits worth addressing before merging.


✅ What's Good

  • Named SSE event (event: prefill_progress) is the right wire format. Strict clients skip unknown named events; they would choke on a pseudo-OpenAI data: chunk with "object": "prefill_progress".
  • Leaner payload — dropping id, created, model, and the nested prefill key mirrors llama-server's slot_update and is easier to parse.
  • CORS header updated to expose X-SwiftLM-Prefill-Progress — easy to forget, good catch.
  • Unit tests in ServerSSETests cover the truthy-header parser and the new SSE shape cleanly.

🟡 Concerns

1. activePrefillProgressHook race condition (existing + worsened)

activePrefillProgressHook = nil          // ← reset
if emitPrefillProgress {
    activePrefillProgressHook = {  }    // ← set
    Task {  }                            // ← separate Task reads it via the global
}

activePrefillProgressHook is a mutable global. Two concurrent requests will clobber each other's hook — the second nil assignment will silence the first request's heartbeat, or the first request's hook will fire into the second request's cont. This was already racy before this PR, but the new nil-then-set pattern makes the window explicit.

Suggestion: thread the hook through PrefillState or pass it as a closure parameter to handleChatStreaming/handleTextStreaming so it's scoped to the request.
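A possible shape for the request-scoped alternative — the hook travels with the request instead of living in a global. All names other than nPast are hypothetical:

```swift
// Sketch: scope the prefill hook to the request. Concurrent requests each
// carry their own optional hook, so there is nothing shared to clobber.
typealias PrefillProgressHook = @Sendable (_ nPast: Int) -> Void

struct GenerationRequest {
    let prefillHook: PrefillProgressHook?   // nil in strict OpenAI mode

    func reportPrefill(nPast: Int) {
        prefillHook?(nPast)   // no-op when the client did not opt in
    }
}
```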

2. handleTextCompletion missing request parameter in one call site

The diff patches the /v1/completions route handler to pass request:, but double-check there isn't a second call site (e.g. an internal test helper) that invokes handleTextCompletion without the new label — that would only surface as a compile error on a full build.

3. Heartbeat Task leaks if the client disconnects before prefill completes

Task {
    var elapsed = 0
    while await !prefillState.done {  }   // loops until done
}

If the HTTP connection drops, cont will be cancelled but the Task continues looping until prefillState.finish() is called. Since finish() is only called on firstToken or .info, a request that errors out during prefill will spin for up to 2 s per iteration until the server is restarted. Consider holding a reference to the Task and calling .cancel() in the stream's termination path.
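One way to wire that cancellation, assuming an AsyncStream-based handler — the surrounding setup is illustrative; only cont corresponds to the diff:

```swift
// Sketch: tie the heartbeat Task's lifetime to the stream itself.
let events = AsyncStream<String> { cont in
    let heartbeatTask = Task {
        while !Task.isCancelled {
            try? await Task.sleep(for: .seconds(2))
            cont.yield("event: prefill_progress\r\ndata: {}\r\n\r\n")
        }
    }
    // Fires on client disconnect and on normal finish alike,
    // so the loop cannot outlive the request.
    cont.onTermination = { _ in heartbeatTask.cancel() }
}
// The response writer would then `for await frame in events { ... }`.
```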

4. ssePrefillChunk's modelId parameter removed but sseChunk still takes it

The function signature change (modelId removed) is consistent with the new lean payload, but the doc comment still refers to "llama-server's slot_update event" which includes a model field. If downstream consumers need to correlate which model is prefilling (relevant for multi-model deployments), they can't — but they can infer from the stream's earlier chunks. Leaving this as-is is acceptable; just worth a doc-comment update to say model is intentionally omitted.

5. Test: dropLast(4) is fragile

let payload = String(chunk.dropFirst(prefix.count).dropLast(4))

\r\n\r\n is 4 bytes in UTF-8, but dropLast operates on Swift Characters (extended grapheme clusters), not bytes — and Swift treats each \r\n pair as a single Character, so the terminator is 2 Characters, not 4. Hard-coding the magic number 4 couples the test to a coincidence between byte and Character counts and would silently mis-parse if the frame layout drifts. Prefer:

XCTAssertTrue(chunk.hasSuffix("\r\n\r\n"))
let trimmed = chunk
    .dropFirst(prefix.count)
    .dropLast("\r\n\r\n".count)

…which is still correct because you already assert hasSuffix above it.
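The Character-versus-byte distinction can be checked directly; CR LF counts as one Character because Unicode treats it as a single extended grapheme cluster:

```swift
let terminator = "\r\n\r\n"
print(terminator.utf8.count)  // 4 bytes in UTF-8
print(terminator.count)       // 2 Characters: each \r\n is one grapheme cluster
```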


🔵 Minor / Nits

  • Accepting on as a truthy value in parseTruthyHeaderValue is unusual for HTTP headers — true/1/yes is standard; on is an HTML form artifact. Low risk, but on could be dropped.
  • Consider documenting X-SwiftLM-Prefill-Progress in the README alongside --turbo-kv and kv_bits (per the pattern established in PR #73).
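A minimal sketch of a stricter parser along those lines — the function name parseTruthyHeaderValue comes from the PR, while the exact accepted set is the reviewer's suggestion, not the shipped behaviour:

```swift
import Foundation

// Accept only the conventional truthy spellings; `on` is dropped.
func parseTruthyHeaderValue(_ raw: String?) -> Bool {
    guard let raw else { return false }
    switch raw.trimmingCharacters(in: .whitespaces).lowercased() {
    case "true", "1", "yes": return true
    default: return false
    }
}
```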

Summary

The core change is sound. Before merging I'd prioritize item 1 (global hook race) and item 3 (Task leak) since both affect production correctness under concurrent load. Items 4–5 are polish.

… fix CORS/parallel test gaps

- Server.swift: add defer-based heartbeat cleanup in both handleChatStreaming and
  handleTextStreaming so heartbeatTask is always cancelled on any exit path
  (client disconnect during prefill no longer leaks the heartbeat task)
- ServerSSETests.swift: add missing import Foundation for Data/JSONSerialization
- test-server.sh Test 32: fail on empty curl response instead of false-passing
- test-server.sh Test 33: use conditional curl; fail if request fails entirely
- test-server.sh Test 34: redirect CORS preflight to CORS_PORT (--cors server)
  instead of the main server which has no CORS middleware
- test-server.sh Test 35: spin up a dedicated --parallel 2 server so concurrent
  requests actually overlap and stress the global hook under real parallelism
- test-opencode.sh: capture opencode exit code separately; classify parse errors
  vs acceptable non-zero exits to prevent false passes
…ch in Tests 32-33

The new conditional curl patterns in Tests 32 and 33 combined with the
existing set -euo pipefail caused the script to abort when grep found
no match (exit 1) in the EVENT_DATA pipeline. All grep/jq calls that
may produce no output now use || true or are wrapped in if/else to
prevent premature script exit.
@solderzzc solderzzc marked this pull request as ready for review April 23, 2026 03:29
Copilot AI review requested due to automatic review settings April 23, 2026 03:29
@solderzzc solderzzc merged commit 743b1a1 into SharpAI:main Apr 23, 2026
10 checks passed
Contributor

Copilot AI left a comment


Pull request overview

This PR makes OpenAI-compatible streaming strict by default by preventing non-standard prefill-progress frames from appearing in /v1/chat/completions and /v1/completions, while preserving opt-in prefill observability via X-SwiftLM-Prefill-Progress: true using named SSE events.

Changes:

  • Gate prefill progress emission behind a request header and emit it as event: prefill_progress with a lean JSON payload.
  • Add Swift unit tests for header parsing, SSE formatting, and PrefillState behavior; extend shell integration tests (including CORS coverage).
  • Add an optional CI modality to exercise OpenAI Python SDK + OpenCode CLI parsing.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
Sources/SwiftLM/Server.swift Adds header-based opt-in, named SSE event formatting, and CORS allow-header update.
tests/SwiftLMTests/ServerSSETests.swift New XCTest coverage for truthy header parsing, SSE payload shape, and PrefillState guards.
tests/test-server.sh Adds integration checks for strict default streaming, opt-in SSE events, CORS preflight, and concurrency scenario.
tests/test-opencode.sh New integration test script for OpenAI Python SDK + OpenCode CLI SSE parsing.
Package.swift Adds SwiftLMTests as a SwiftPM test target.
.github/workflows/ci.yml Runs SwiftLMTests in CI and adds an optional opencode integration modality.


Comment thread tests/test-opencode.sh
Comment on lines +7 to +8
# Requires: python3, pip (installs openai package dynamically)


Copilot AI Apr 23, 2026


The header comment says this script only requires Python/pip, but it also uses npm/npx later. Suggested fix: document Node/npm as prerequisites (or gate the OpenCode CLI test behind a command -v npm check with a clear skip message).

Comment thread tests/test-opencode.sh
Comment on lines +123 to +127
log "Installing opencode-ai in isolated directory..."
mkdir -p /tmp/opencode_cli_test
cd /tmp/opencode_cli_test
npm install opencode-ai@latest --silent >/dev/null 2>&1


Copilot AI Apr 23, 2026


npm install opencode-ai@latest makes the test non-deterministic and more likely to break over time. Suggested fix: pin the package version (or source it from a repo-controlled variable/file) so CI failures reflect server changes rather than upstream releases.

Comment on lines +1397 to 1424
// We capture the hook in a local variable so that concurrent requests
// cannot clobber each other's hook via the global. The global is still
// written here because LLMModel.prepare() reads it, but the semaphore
// ensures only one generation runs at a time.
var heartbeatTask: Task<Void, Never>? = nil
activePrefillProgressHook = nil
if emitPrefillProgress {
    // Hook is scoped to this request: the local prefillState is the only
    // shared state, and it is actor-isolated.
    activePrefillProgressHook = { nPast, _ in
        Task { await prefillState.update(nPast: nPast) }
    }
    heartbeatTask = Task {
        var elapsed = 0
        while await !prefillState.done {
            try? await Task.sleep(for: .seconds(2))
            // Guard against Task cancellation on client disconnect.
            guard !Task.isCancelled else { break }
            if await !prefillState.done {
                elapsed += 2
                let nPast = await prefillState.nPast
                _ = cont.yield(ssePrefillChunk(
                    nPast: nPast,
                    promptTokens: promptTokenCount,
                    elapsedSeconds: elapsed))
            }
        }
    }

Copilot AI Apr 23, 2026


activePrefillProgressHook is a shared global, but this code assumes the semaphore means only one generation runs at a time. AsyncSemaphore(limit: parallelSlots) allows multiple concurrent generations when --parallel > 1, so concurrent requests can overwrite/clear each other’s hook (including non-opt-in requests setting it to nil), leading to missing/misattributed progress events and potential races inside LLMModel.prepare().

Suggested fix: don’t write per-request closures into the global. Set the global once to forward updates into a thread-safe registry (e.g., an actor that multiplexes to per-request subscribers), or disable this opt-in feature when parallelSlots > 1 and document the limitation.

Suggested change
// We capture the hook in a local variable so that concurrent requests
// cannot clobber each other's hook via the global. The global is still
// written here because LLMModel.prepare() reads it, but the semaphore
// ensures only one generation runs at a time.
var heartbeatTask: Task<Void, Never>? = nil
activePrefillProgressHook = nil
if emitPrefillProgress {
    // Hook is scoped to this request: the local prefillState is the only
    // shared state, and it is actor-isolated.
    activePrefillProgressHook = { nPast, _ in
        Task { await prefillState.update(nPast: nPast) }
    }
    heartbeatTask = Task {
        var elapsed = 0
        while await !prefillState.done {
            try? await Task.sleep(for: .seconds(2))
            // Guard against Task cancellation on client disconnect.
            guard !Task.isCancelled else { break }
            if await !prefillState.done {
                elapsed += 2
                let nPast = await prefillState.nPast
                _ = cont.yield(ssePrefillChunk(
                    nPast: nPast,
                    promptTokens: promptTokenCount,
                    elapsedSeconds: elapsed))
            }
        }
    }
// Do not install a per-request closure into `activePrefillProgressHook`.
// That hook is shared global state, and concurrent requests can run when
// the server is configured with parallel generation slots. Writing it here
// would allow one request to overwrite or clear another request's hook,
// causing missing or misattributed progress events and introducing races
// around `LLMModel.prepare()`.
//
// A safe implementation requires a single global forwarder plus a
// concurrency-safe registry, or serialization of this feature. Until that
// exists, leave the shared hook untouched in this request path.
var heartbeatTask: Task<Void, Never>? = nil
if emitPrefillProgress {
    // Prefill progress reporting is intentionally disabled here because it
    // cannot be implemented safely with the current shared global hook.
Comment thread tests/test-server.sh
Comment on lines +976 to +978
if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
    # Only fail if it was a curl failure (empty), not a missing event
    [ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"

Copilot AI Apr 23, 2026


The logic here can silently produce neither PASS nor FAIL when the stream is non-empty but missing the terminating data: [DONE] (the branch runs but only fails on empty output). The check then reports nothing at all for that case and can hide truncated-stream regressions.

Suggested fix: explicitly fail (or at least emit a WARN/PASS) when [DONE] is missing so the test result always reflects the outcome.

Suggested change
if [ -z "$STRICT_STREAM" ] || ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
# Only fail if it was a curl failure (empty), not a missing event
[ -z "$STRICT_STREAM" ] && fail "Strict mode: stream was empty"
if [ -z "$STRICT_STREAM" ]; then
    fail "Strict mode: stream was empty"
elif ! echo "$STRICT_STREAM" | grep -q 'data: \[DONE\]'; then
    fail "Strict mode: stream missing terminating [DONE] event"

Comment thread tests/test-opencode.sh
Comment on lines +51 to +54
log "Setting up virtual environment with openai SDK..."
VENV_DIR="/tmp/opencode_venv"
python3 -m venv "$VENV_DIR"
"$VENV_DIR/bin/pip" install --quiet openai

Copilot AI Apr 23, 2026


The virtualenv path is hard-coded to /tmp/opencode_venv and the script doesn’t clean it up. Re-running locally (or in environments where /tmp persists) can cause python3 -m venv to fail because the directory already exists.

Suggested fix: use mktemp -d for an isolated venv dir and delete it in cleanup(), or rm -rf "$VENV_DIR" before creating it.

@jankaderabek
Contributor Author

@solderzzc Thank you for finishing up the PR so fast 👍

@solderzzc
Member

solderzzc commented Apr 23, 2026 via email
