Conversation
Co-authored-by: CodebuffAI <189203002+CodebuffAI@users.noreply.github.com>
Greptile SummaryThis PR introduces a freebuff waiting room — a queue-based admission system that gates free-tier CLI users before allowing them to send chat requests. It is implemented end-to-end across the server (new Key design decisions:
Notable issues found:
Confidence Score: 4/5Safe to merge behind the feature flag; the queued-row TTL gap is real but won't manifest until the feature is enabled in production. The implementation is thorough and well-tested with dependency injection throughout. The feature defaults to OFF so no existing request path is affected. The main gap — no TTL on queued rows — is a correctness concern once the feature goes live but does not break the primary admission or chat path. The two P2 issues are minor observability/cosmetic concerns. web/src/server/free-session/store.ts (queued-row TTL), web/src/server/free-session/public-api.ts (null instanceId path and non-transactional reads) Important Files Changed
Sequence DiagramsequenceDiagram
participant CLI
participant SessionAPI as /api/v1/freebuff/session
participant ChatAPI as /api/v1/chat/completions
participant DB as free_session (Postgres)
participant Ticker as AdmissionTicker
CLI->>SessionAPI: POST /session (on startup)
SessionAPI->>DB: joinOrTakeOver (UPSERT)
DB-->>SessionAPI: row {status: queued, instanceId}
SessionAPI-->>CLI: {status: queued, position, estimatedWaitMs}
loop Poll every 5s while queued
CLI->>SessionAPI: GET /session
SessionAPI->>DB: getSessionRow
DB-->>SessionAPI: row
SessionAPI-->>CLI: {status: queued | active}
end
Ticker->>DB: sweepExpired() + admitFromQueue() every 5s
Note over Ticker,DB: advisory lock prevents double-admission across pods
DB-->>Ticker: admitted rows
Note over CLI: bell plays on queued→active transition
CLI->>ChatAPI: POST /completions + freebuff_instance_id
ChatAPI->>DB: checkSessionAdmissible(userId, instanceId)
alt session active and instance matches
DB-->>ChatAPI: {ok: true}
ChatAPI-->>CLI: 200 stream
else queued / expired / superseded
DB-->>ChatAPI: {ok: false, code}
ChatAPI-->>CLI: 428/429/409/410
CLI->>SessionAPI: refreshFreebuffSession() or markSuperseded()
end
CLI->>SessionAPI: DELETE /session (on exit)
SessionAPI->>DB: endSession(userId)
|
| sql`(${schema.freeSession.queued_at}, ${schema.freeSession.user_id}) <= (${params.queuedAt}, ${params.userId})`, | ||
| ), | ||
| ) | ||
| return Number(rows[0]?.n ?? 0) | ||
| } | ||
|
|
||
| /** Remove rows whose active session has expired. Safe to call repeatedly. */ | ||
| export async function sweepExpired(now: Date): Promise<number> { | ||
| const deleted = await db |
There was a problem hiding this comment.
No TTL on queued rows — stale entries accumulate indefinitely
sweepExpired only removes rows where status = 'active' AND expires_at < now. Queued rows are never automatically removed. A user who joins the queue (POST /session) and then hard-crashes or loses network before their CLI can send the fire-and-forget DELETE will leave a stale status=queued row in the table forever. These phantom entries:
- Inflate
queueDepth()for all other users, making wait-time estimates wrong. - Keep would-be new users at artificially higher positions until a real admin runs cleanup.
Consider adding a queued_at + max_queue_wait expiry sweep alongside the active-session sweep, e.g.:
// In sweepExpired (store.ts): also remove queued rows idle for too long
await db.delete(schema.freeSession).where(
and(
eq(schema.freeSession.status, 'queued'),
lt(schema.freeSession.queued_at, new Date(now.getTime() - MAX_QUEUE_IDLE_MS)),
),
)Alternatively, track the last heartbeat time and sweep queued rows that haven't polled in N minutes.
| if (!row.expires_at || row.expires_at.getTime() <= now.getTime()) { | ||
| return { | ||
| ok: false, | ||
| code: 'session_expired', | ||
| message: 'Your free session has expired. Re-join the waiting room via POST /api/v1/freebuff/session.', | ||
| } | ||
| } | ||
|
|
||
| if (!params.claimedInstanceId || params.claimedInstanceId !== row.active_instance_id) { |
There was a problem hiding this comment.
Null
claimedInstanceId returns session_superseded instead of a distinct error
When claimedInstanceId is null or undefined (absent from the request), the code falls through to the session_superseded branch with the message "Another instance of freebuff has taken over this session." This conflates two separate conditions — a missing identifier vs. a rotated one — under the same 409 code.
In the current UI flow this is safe (the WaitingRoomScreen gate prevents users from sending messages before their instance ID is populated), but it makes server-side observability harder and would produce a confusing 409 for any future client that omits the field.
Consider splitting the two cases:
if (!params.claimedInstanceId) {
return {
ok: false,
code: 'waiting_room_required',
message: 'freebuff_instance_id missing from request. Call POST /api/v1/freebuff/session first.',
}
}
if (params.claimedInstanceId !== row.active_instance_id) {
return { ok: false, code: 'session_superseded', message: '...' }
}| | { ok: false; code: 'session_superseded'; message: string } | ||
| | { ok: false; code: 'session_expired'; message: string } | ||
|
|
||
| /** | ||
| * Called from the chat/completions hot path for free-mode requests. Either | ||
| * returns `{ ok: true }` (request may proceed) or a structured rejection | ||
| * the caller translates into a 4xx response. | ||
| * | ||
| * Never trusts client timestamps. The caller supplies `claimedInstanceId` |
There was a problem hiding this comment.
queuePositionFor and queueDepth are not fetched in a transaction
These two queries run in parallel outside a transaction. If users are admitted between the two reads, position could exceed depth (e.g. position=3, depth=2), which would be confusing on the client's waiting-room display. This is cosmetic rather than a correctness bug, but wrapping both in a single read-only transaction or a WITH CTE would eliminate the inconsistency.
Replicas=0 or no replicas metric at all (the deployment has been scaled to zero or dropped from the scrape) now flips that deployment's health to unhealthy unconditionally, so admission fails closed instead of funneling users to a backend that cannot serve traffic. Also drop generationQueueMs degraded 5000 -> 400 and ttftMs degraded 8000 -> 2000, and comment out the kimi deployment since only glm-5.1 is in production. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
FREEBUFF_MAX_CONCURRENT_SESSIONS is gone. Admission now runs purely as a drip (MAX_ADMITS_PER_TICK=1 every 15s) gated by the Fireworks health monitor — utilisation ramps up slowly and pauses the moment metrics degrade, so a static cap is redundant. Renamed SessionDeps' getMaxConcurrentSessions/getSessionLengthMs to getAdmissionTickMs/getMaxAdmitsPerTick (those are what the wait-time estimate actually needs now). estimateWaitMs is rewritten from the session-cycle model to the drip model: waitMs = ceil((position - 1) / maxAdmitsPerTick) * admissionTickMs Dropped the 'full' branch of AdmissionTickResult and the full-capacity admission test — the only reason admission skips now is health. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stdin is in raw mode on these screens, so SIGINT never fires — Ctrl+C had no effect and users had to kill the process. Now both screens hook Ctrl+C via OpenTUI's useKeyboard, flush analytics with a 1s cap, and exit. The waiting-room screen additionally sends a best-effort DELETE /api/v1/freebuff/session before exit so the seat frees up immediately instead of waiting on the server-side expiry sweep. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
p50 TTFT degraded 1000 → 1500ms and p50 generation queue degraded 200 → 300ms, so a healthy deployment running at steady-state 1s TTFT does not trip the admission gate. scripts/scrape-check.ts pulls the live Fireworks metrics and prints the same per-deployment health the admission gate sees — useful for tuning thresholds without guessing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep admitting requests for FREEBUFF_SESSION_GRACE_MS (default 30m) after a session's expires_at so in-flight agent runs can drain; hard cutoff past that. Also: replicas=0 → unhealthy, hoist chat/completions gate status map, fix stale threshold comment and a pre-existing free-mode test missing the checkSessionAdmissible override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the server flips the seat into `draining` (past `expires_at`) or past the hard cutoff, the chat input is replaced by a "Session ended" banner. While an agent is still streaming under the grace window, Enter is disabled and the banner shows "Agent is wrapping up. Rejoin the wait room after it's finished." — Esc still interrupts. Once idle, Enter re-POSTs /session to drop back into the waiting room. Adds a small countdown to the far right of the status bar (muted, turning soft warning in the final minute — no red) and schedules the next poll just after expires_at / gracePeriodEndsAt so the draining transition shows up promptly instead of stalling at 0 for a full interval. Moves getFreebuffInstanceId onto the session hook's module handle and deletes the now-vestigial freebuff-session-store. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Delete the 1.5k-LOC fireworks-monitor package (Prometheus scrape, health computation, admin endpoint, CLI scripts) in favor of a single-function reachability probe inline in free-session/admission.ts: GET the account metrics endpoint with a 5s timeout and fail closed on non-OK. The full-health-scoring machinery was load-bearing on nothing — admission only ever read the boolean gate, and reachability is what actually matters for halting during an outage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the waiting-room gate rejects a free-mode request and no freebuff_instance_id was sent, return 426 with a "please restart to upgrade" message. Old CLI versions render the message verbatim in their error banner; new clients still get the normal gate responses. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapse client-side draining/ended into a single ended state, move freebuff session state into zustand (replacing the module-level handle singleton), host the Fireworks probe inside admitFromQueue, and share the wire types between server and CLI. Drops ~150 lines net. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The CLI used to layer two client-only states (`superseded`, `ended`) on
top of the server's response and translate `draining` → `ended` in a
mapper. Replaces all of that with a single union the server emits
directly:
- GET /session now reads `X-Freebuff-Instance-Id` and returns
`{ status: 'superseded' }` when the active row's id no longer
matches. Previously this was inferred client-side from the chat
gate's 409.
- The wire's `draining` is renamed to `ended` and carries the same
grace fields. The CLI's post-grace synthetic `ended` (no
instanceId) reuses that shape.
Also drops the zustand `driver` indirection — imperative session
controls (refresh / mark superseded / mark ended) live as module-level
functions on the hook, talking to a private controller ref. Combines
`refreshFreebuffSession` and `rejoinFreebuffSession` into one with an
optional `{ resetChat }` flag. Inlines the constant getters in
`SessionDeps`/`AdmissionDeps` so tests pass plain numbers, drops the
`limit` parameter from `admitFromQueue` (always 1), and consolidates
the three 1-Hz UI tickers into a shared `useNow` hook.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ambient bar fills proportionally to time remaining so users have a passive sense of session progress; a bold warning-colored "X:XX" readout appears only for the final five minutes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
No description provided.