Commit f5f2f60

jahooma and claude committed
Drop MAX_CONCURRENT_SESSIONS; drip admission is sole concurrency control
FREEBUFF_MAX_CONCURRENT_SESSIONS is gone. Admission now runs purely as a drip (MAX_ADMITS_PER_TICK=1 every 15s) gated by the Fireworks health monitor — utilisation ramps up slowly and pauses the moment metrics degrade, so a static cap is redundant.

Renamed SessionDeps' getMaxConcurrentSessions/getSessionLengthMs to getAdmissionTickMs/getMaxAdmitsPerTick (those are what the wait-time estimate actually needs now). estimateWaitMs is rewritten from the session-cycle model to the drip model:

  waitMs = ceil((position - 1) / maxAdmitsPerTick) * admissionTickMs

Dropped the 'full' branch of AdmissionTickResult and the full-capacity admission test — the only reason admission skips now is health.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4a0efb8 commit f5f2f60
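
As a rough check on the "static cap is redundant" claim: with fixed-length sessions, a steady drip implies its own ceiling on active sessions even before the health gate intervenes. The sketch below assumes every tick admits its full quota and no session ends early; `impliedActiveCeiling` is a hypothetical helper for illustration and is not part of this commit.

```typescript
// Hypothetical helper (not in the commit): the most sessions that can be
// active at once if every tick admits its full quota and every session
// runs to its full length.
function impliedActiveCeiling(args: {
  sessionLengthMs: number
  admissionTickMs: number
  maxAdmitsPerTick: number
}): number {
  const { sessionLengthMs, admissionTickMs, maxAdmitsPerTick } = args
  if (sessionLengthMs <= 0 || admissionTickMs <= 0 || maxAdmitsPerTick <= 0) {
    return 0
  }
  // Number of ticks that fire while a single session is still alive.
  const ticksPerSession = Math.ceil(sessionLengthMs / admissionTickMs)
  return ticksPerSession * maxAdmitsPerTick
}

// Defaults from this commit: 1h sessions, 15s ticks, 1 admit per tick.
const ceiling = impliedActiveCeiling({
  sessionLengthMs: 3_600_000,
  admissionTickMs: 15_000,
  maxAdmitsPerTick: 1,
})
```

With these defaults the drip alone allows roughly 240 concurrent sessions, a much looser bound than the removed cap of 50, which is consistent with the commit's framing that the health gate is the control meant to bind.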

10 files changed, +92 -117 lines changed

docs/freebuff-waiting-room.md

Lines changed: 11 additions & 14 deletions
@@ -4,8 +4,8 @@
 
 The waiting room is the admission control layer for **free-mode** requests against the freebuff Fireworks deployment. It has three jobs:
 
-1. **Bound concurrency** — cap the number of simultaneously-active free users so one deployment does not degrade under load.
-2. **Gate on upstream health** — only admit new users while the Fireworks deployment is reporting `healthy` (via the separate monitor in `web/src/server/fireworks-monitor/`).
+1. **Drip-admit users** — admit at a steady trickle (default 1 per 15s) so load ramps up gradually rather than stampeding the deployment when the queue is long.
+2. **Gate on upstream health** — only admit new users while the Fireworks deployment is reporting `healthy` (via the separate monitor in `web/src/server/fireworks-monitor/`). Once metrics degrade, admission halts until they recover — this is the primary concurrency control, not a static cap.
 3. **One instance per account** — prevent a single user from running N concurrent freebuff CLIs to get N× throughput.
 
 Users who cannot be admitted immediately are placed in a FIFO queue and given an estimated wait time. Admitted users get a fixed-length session (default 1h) during which they can make free-mode requests subject to the existing per-user rate limits.
@@ -20,7 +20,6 @@ FREEBUFF_WAITING_ROOM_ENABLED=false
 
 # Other knobs (only read when enabled)
 FREEBUFF_SESSION_LENGTH_MS=3600000 # 1 hour
-FREEBUFF_MAX_CONCURRENT_SESSIONS=50
 ```
 
 Flipping the flag is safe at runtime: existing rows stay in the DB and will be admitted / expired correctly whenever the flag is flipped back on.
@@ -127,17 +126,15 @@ Each tick does (in order):
 
 1. **Sweep expired.** `DELETE FROM free_session WHERE status='active' AND expires_at < now()`. Runs regardless of upstream health so zombie sessions are cleaned up even during an outage.
 2. **Check upstream health.** `isFireworksAdmissible()` from the monitor. If not `healthy`, skip admission for this tick (queue grows; users see `status: 'queued'` with increasing position).
-3. **Measure capacity.** `capacity = min(MAX_CONCURRENT - activeCount, MAX_ADMITS_PER_TICK)`. `MAX_ADMITS_PER_TICK=20` caps thundering-herd admission when a large block of sessions expires simultaneously.
-4. **Admit.** `SELECT ... WHERE status='queued' ORDER BY queued_at, user_id LIMIT capacity FOR UPDATE SKIP LOCKED`, then `UPDATE` those rows to `status='active'` with `admitted_at=now()`, `expires_at=now()+sessionLength`.
+3. **Admit.** `SELECT ... WHERE status='queued' ORDER BY queued_at, user_id LIMIT MAX_ADMITS_PER_TICK FOR UPDATE SKIP LOCKED`, then `UPDATE` those rows to `status='active'` with `admitted_at=now()`, `expires_at=now()+sessionLength`. Staggering the queue at `MAX_ADMITS_PER_TICK=1` / 15s keeps Fireworks from getting hit by a thundering herd of newly-admitted CLIs; once metrics show the deployment is saturated, step 2 halts further admissions.
 
 ### Tunables
 
 | Constant | Location | Default | Purpose |
 |---|---|---|---|
-| `ADMISSION_TICK_MS` | `config.ts` | 5000 | How often the ticker fires |
-| `MAX_ADMITS_PER_TICK` | `config.ts` | 20 | Upper bound on admits per tick |
+| `ADMISSION_TICK_MS` | `config.ts` | 15000 | How often the ticker fires |
+| `MAX_ADMITS_PER_TICK` | `config.ts` | 1 | Upper bound on admits per tick |
 | `FREEBUFF_SESSION_LENGTH_MS` | env | 3_600_000 | Session lifetime |
-| `FREEBUFF_MAX_CONCURRENT_SESSIONS` | env | 50 | Global active-session cap |
 
 ## HTTP API
 
@@ -210,18 +207,18 @@ When the waiting room is disabled, the gate returns `{ ok: true, reason: 'disabl
 
 ## Estimated Wait Time
 
-Computed in `session-view.ts` as an **upper bound** that assumes uniform session expiry:
+Computed in `session-view.ts` from the drip-admission rate:
 
 ```
-waves = floor((position - 1) / maxConcurrent)
-waitMs = waves * sessionLengthMs
+ticksAhead = ceil((position - 1) / maxAdmitsPerTick)
+waitMs = ticksAhead * admissionTickMs
 ```
 
-- Position 1..`maxConcurrent` → 0 (next tick will admit them)
-- Position `maxConcurrent`+1..`2*maxConcurrent` → one full session length
+- Position 1 → 0 (next tick admits you)
+- Position `maxAdmitsPerTick` + 1 → one tick
 - and so on.
 
-Actual wait is usually shorter because users call `DELETE /session` on CLI exit and sessions turn over naturally. We show an upper bound because under-promising on wait time is better UX than surprise delays.
+This estimate **ignores health-gated pauses**: during a Fireworks incident admission halts entirely, so the actual wait can be longer. We choose to under-report here because showing "unknown" / "indefinite" is worse UX for the common case where the deployment is healthy.
 
 ## CLI Integration (frontend-side contract)
 
packages/internal/src/env-schema.ts

Lines changed: 0 additions & 2 deletions
@@ -42,7 +42,6 @@ export const serverEnvSchema = clientEnvSchema.extend({
     .default('false')
     .transform((v) => v === 'true'),
   FREEBUFF_SESSION_LENGTH_MS: z.coerce.number().int().positive().default(60 * 60 * 1000),
-  FREEBUFF_MAX_CONCURRENT_SESSIONS: z.coerce.number().int().positive().default(50),
 })
 export const serverEnvVars = serverEnvSchema.keyof().options
 export type ServerEnvVar = (typeof serverEnvVars)[number]
@@ -94,5 +93,4 @@ export const serverProcessEnv: ServerInput = {
   // Freebuff waiting room
   FREEBUFF_WAITING_ROOM_ENABLED: process.env.FREEBUFF_WAITING_ROOM_ENABLED,
   FREEBUFF_SESSION_LENGTH_MS: process.env.FREEBUFF_SESSION_LENGTH_MS,
-  FREEBUFF_MAX_CONCURRENT_SESSIONS: process.env.FREEBUFF_MAX_CONCURRENT_SESSIONS,
 }

web/src/app/api/v1/freebuff/session/__tests__/session.test.ts

Lines changed: 2 additions & 2 deletions
@@ -28,8 +28,8 @@ function makeSessionDeps(overrides: Partial<SessionDeps> = {}): SessionDeps & {
   return {
     rows,
     isWaitingRoomEnabled: () => true,
-    getMaxConcurrentSessions: () => 10,
-    getSessionLengthMs: () => 60 * 60_000,
+    getAdmissionTickMs: () => 15_000,
+    getMaxAdmitsPerTick: () => 1,
     now: () => now,
     getSessionRow: async (userId) => rows.get(userId) ?? null,
     queueDepth: async () => [...rows.values()].filter((r) => r.status === 'queued').length,

web/src/server/free-session/__tests__/admission.test.ts

Lines changed: 8 additions & 26 deletions
@@ -20,53 +20,36 @@ function makeAdmissionDeps(overrides: Partial<AdmissionDeps> = {}): AdmissionDep
       return Array.from({ length: limit }, (_, i) => ({ user_id: `u${i}` }))
     },
     isFireworksAdmissible: () => true,
-    getMaxConcurrentSessions: () => 10,
+    getMaxAdmitsPerTick: () => 1,
     getSessionLengthMs: () => 60 * 60 * 1000,
     now: () => NOW,
     ...overrides,
   }
 }
 
 describe('runAdmissionTick', () => {
-  test('admits up to (max - active) when healthy', async () => {
-    const deps = makeAdmissionDeps({
-      countActive: async () => 3,
-      getMaxConcurrentSessions: () => 10,
-    })
+  test('admits maxAdmitsPerTick when healthy', async () => {
+    const deps = makeAdmissionDeps({ getMaxAdmitsPerTick: () => 2 })
     const result = await runAdmissionTick(deps)
-    expect(result.admitted).toBe(7)
+    expect(result.admitted).toBe(2)
     expect(result.skipped).toBeNull()
   })
 
-  test('caps admits per tick at MAX_ADMITS_PER_TICK', async () => {
-    const deps = makeAdmissionDeps({
-      countActive: async () => 0,
-      getMaxConcurrentSessions: () => 1000,
-    })
+  test('defaults to 1 admit per tick', async () => {
+    const deps = makeAdmissionDeps()
     const result = await runAdmissionTick(deps)
-    expect(result.admitted).toBe(20)
+    expect(result.admitted).toBe(1)
   })
 
   test('skips admission when Fireworks not healthy', async () => {
     const deps = makeAdmissionDeps({
       isFireworksAdmissible: () => false,
-      countActive: async () => 0,
     })
     const result = await runAdmissionTick(deps)
     expect(result.admitted).toBe(0)
     expect(result.skipped).toBe('health')
   })
 
-  test('skips when at capacity', async () => {
-    const deps = makeAdmissionDeps({
-      countActive: async () => 10,
-      getMaxConcurrentSessions: () => 10,
-    })
-    const result = await runAdmissionTick(deps)
-    expect(result.admitted).toBe(0)
-    expect(result.skipped).toBe('full')
-  })
-
   test('sweeps expired sessions even when skipping admission', async () => {
     let swept = 0
     const deps = makeAdmissionDeps({
@@ -85,10 +68,9 @@ describe('runAdmissionTick', () => {
     const deps = makeAdmissionDeps({
       sweepExpired: async () => 2,
       countActive: async () => 5,
-      getMaxConcurrentSessions: () => 8,
     })
     const result = await runAdmissionTick(deps)
     expect(result.expired).toBe(2)
-    expect(result.admitted).toBe(3)
+    expect(result.admitted).toBe(1)
   })
 })

web/src/server/free-session/__tests__/public-api.test.ts

Lines changed: 4 additions & 3 deletions
@@ -11,7 +11,8 @@ import type { SessionDeps } from '../public-api'
 import type { InternalSessionRow } from '../types'
 
 const SESSION_LEN = 60 * 60 * 1000
-const MAX_CONC = 10
+const TICK_MS = 15_000
+const ADMITS_PER_TICK = 1
 
 function makeDeps(overrides: Partial<SessionDeps> = {}): SessionDeps & {
   rows: Map<string, InternalSessionRow>
@@ -35,8 +36,8 @@ function makeDeps(overrides: Partial<SessionDeps> = {}): SessionDeps & {
     },
     _now: () => currentNow,
     isWaitingRoomEnabled: () => true,
-    getMaxConcurrentSessions: () => MAX_CONC,
-    getSessionLengthMs: () => SESSION_LEN,
+    getAdmissionTickMs: () => TICK_MS,
+    getMaxAdmitsPerTick: () => ADMITS_PER_TICK,
     now: () => currentNow,
     getSessionRow: async (userId) => rows.get(userId) ?? null,
     endSession: async (userId) => {

web/src/server/free-session/__tests__/session-view.test.ts

Lines changed: 29 additions & 24 deletions
@@ -4,8 +4,8 @@ import { estimateWaitMs, toSessionStateResponse } from '../session-view'
 
 import type { InternalSessionRow } from '../types'
 
-const SESSION_LEN = 60 * 60 * 1000
-const MAX_CONC = 50
+const TICK_MS = 15_000
+const ADMITS_PER_TICK = 1
 
 function row(overrides: Partial<InternalSessionRow> = {}): InternalSessionRow {
   const now = new Date('2026-04-17T12:00:00Z')
@@ -23,35 +23,43 @@ function row(overrides: Partial<InternalSessionRow> = {}): InternalSessionRow {
 }
 
 describe('estimateWaitMs', () => {
-  test('position <= capacity → 0 wait', () => {
-    expect(estimateWaitMs({ position: 1, maxConcurrent: MAX_CONC, sessionLengthMs: SESSION_LEN })).toBe(0)
-    expect(estimateWaitMs({ position: MAX_CONC, maxConcurrent: MAX_CONC, sessionLengthMs: SESSION_LEN })).toBe(0)
+  test('position 1 → 0 wait (next tick picks you up)', () => {
+    expect(estimateWaitMs({ position: 1, admissionTickMs: TICK_MS, maxAdmitsPerTick: ADMITS_PER_TICK })).toBe(0)
   })
 
-  test('position in second wave → one full session length', () => {
-    expect(estimateWaitMs({ position: MAX_CONC + 1, maxConcurrent: MAX_CONC, sessionLengthMs: SESSION_LEN })).toBe(SESSION_LEN)
+  test('position N → (N-1) ticks ahead at 1 admit/tick', () => {
+    expect(estimateWaitMs({ position: 2, admissionTickMs: TICK_MS, maxAdmitsPerTick: 1 })).toBe(TICK_MS)
+    expect(estimateWaitMs({ position: 10, admissionTickMs: TICK_MS, maxAdmitsPerTick: 1 })).toBe(9 * TICK_MS)
   })
 
-  test('position in third wave → two full session lengths', () => {
-    expect(estimateWaitMs({ position: 2 * MAX_CONC + 1, maxConcurrent: MAX_CONC, sessionLengthMs: SESSION_LEN })).toBe(2 * SESSION_LEN)
+  test('batched admission divides wait', () => {
+    // 5 admits/tick: positions 2-6 all sit one tick ahead.
+    expect(estimateWaitMs({ position: 2, admissionTickMs: TICK_MS, maxAdmitsPerTick: 5 })).toBe(TICK_MS)
+    expect(estimateWaitMs({ position: 6, admissionTickMs: TICK_MS, maxAdmitsPerTick: 5 })).toBe(TICK_MS)
+    // Position 7 enters the second tick.
+    expect(estimateWaitMs({ position: 7, admissionTickMs: TICK_MS, maxAdmitsPerTick: 5 })).toBe(2 * TICK_MS)
   })
 
   test('degenerate inputs return 0', () => {
-    expect(estimateWaitMs({ position: 0, maxConcurrent: 10, sessionLengthMs: 1000 })).toBe(0)
-    expect(estimateWaitMs({ position: 5, maxConcurrent: 0, sessionLengthMs: 1000 })).toBe(0)
+    expect(estimateWaitMs({ position: 0, admissionTickMs: TICK_MS, maxAdmitsPerTick: 1 })).toBe(0)
+    expect(estimateWaitMs({ position: 5, admissionTickMs: 0, maxAdmitsPerTick: 1 })).toBe(0)
+    expect(estimateWaitMs({ position: 5, admissionTickMs: TICK_MS, maxAdmitsPerTick: 0 })).toBe(0)
   })
 })
 
 describe('toSessionStateResponse', () => {
   const now = new Date('2026-04-17T12:00:00Z')
+  const baseArgs = {
+    admissionTickMs: TICK_MS,
+    maxAdmitsPerTick: ADMITS_PER_TICK,
+  }
 
   test('returns null when row is null', () => {
     const view = toSessionStateResponse({
       row: null,
       position: 0,
       queueDepth: 0,
-      maxConcurrent: MAX_CONC,
-      sessionLengthMs: SESSION_LEN,
+      ...baseArgs,
       now,
     })
     expect(view).toBeNull()
@@ -60,18 +68,17 @@ describe('toSessionStateResponse', () => {
   test('queued row maps to queued response with position + wait estimate', () => {
     const view = toSessionStateResponse({
       row: row({ status: 'queued' }),
-      position: 51,
-      queueDepth: 100,
-      maxConcurrent: MAX_CONC,
-      sessionLengthMs: SESSION_LEN,
+      position: 3,
+      queueDepth: 10,
+      ...baseArgs,
       now,
     })
     expect(view).toEqual({
       status: 'queued',
       instanceId: 'inst-1',
-      position: 51,
-      queueDepth: 100,
-      estimatedWaitMs: SESSION_LEN,
+      position: 3,
+      queueDepth: 10,
+      estimatedWaitMs: 2 * TICK_MS,
       queuedAt: now.toISOString(),
     })
   })
@@ -83,8 +90,7 @@ describe('toSessionStateResponse', () => {
       row: row({ status: 'active', admitted_at: admittedAt, expires_at: expiresAt }),
       position: 0,
       queueDepth: 0,
-      maxConcurrent: MAX_CONC,
-      sessionLengthMs: SESSION_LEN,
+      ...baseArgs,
       now,
     })
     expect(view).toEqual({
@@ -101,8 +107,7 @@ describe('toSessionStateResponse', () => {
       row: row({ status: 'active', admitted_at: now, expires_at: new Date(now.getTime() - 1) }),
       position: 0,
       queueDepth: 0,
-      maxConcurrent: MAX_CONC,
-      sessionLengthMs: SESSION_LEN,
+      ...baseArgs,
       now,
     })
     expect(view).toBeNull()
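
The `estimateWaitMs` tests above pin down the drip-model formula from the commit message, but the implementation in `session-view.ts` is not part of the visible diff. A minimal sketch that satisfies those tests might look like the following; the function name `estimateWaitMsSketch` and the exact guard conditions are assumptions, chosen only so that the degenerate cases return 0 as the tests require.

```typescript
// Sketch only: the real estimateWaitMs lives in session-view.ts and is not
// shown in this diff. Argument names follow the updated tests.
interface EstimateWaitArgs {
  position: number // 1-based position in the queue
  admissionTickMs: number
  maxAdmitsPerTick: number
}

function estimateWaitMsSketch(args: EstimateWaitArgs): number {
  const { position, admissionTickMs, maxAdmitsPerTick } = args
  // Assumed guards: degenerate inputs yield 0 instead of NaN or Infinity.
  if (position < 1 || admissionTickMs <= 0 || maxAdmitsPerTick <= 0) {
    return 0
  }
  // Users ahead of this one, grouped into ticks of maxAdmitsPerTick.
  const ticksAhead = Math.ceil((position - 1) / maxAdmitsPerTick)
  return ticksAhead * admissionTickMs
}

// Position 10 at 1 admit per 15s tick: nine ticks ahead.
const waitAtTen = estimateWaitMsSketch({
  position: 10,
  admissionTickMs: 15_000,
  maxAdmitsPerTick: 1,
})
```

Because the formula counts only queue position and tick rate, it says nothing about time spent paused on the health gate, which matches the caveat in the updated docs.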

web/src/server/free-session/admission.ts

Lines changed: 13 additions & 17 deletions
@@ -1,7 +1,6 @@
 import {
   ADMISSION_TICK_MS,
   MAX_ADMITS_PER_TICK,
-  getMaxConcurrentSessions,
   getSessionLengthMs,
   isWaitingRoomEnabled,
 } from './config'
@@ -20,8 +19,8 @@ let state: AdmissionState | null = null
 
 /** Emit a `[FreeSessionAdmission] snapshot` log every N ticks even when
  * nothing changed, so dashboards / alerts have a reliable heartbeat of
- * queue depth and active count. At ADMISSION_TICK_MS=5s, 12 ticks = 1 min. */
-const SNAPSHOT_EVERY_N_TICKS = 12
+ * queue depth and active count. At ADMISSION_TICK_MS=15s, 10 ticks = 2.5 min. */
+const SNAPSHOT_EVERY_N_TICKS = 10
 
 export interface AdmissionDeps {
   sweepExpired: (now: Date) => Promise<number>
@@ -33,7 +32,7 @@ export interface AdmissionDeps {
     now: Date
   }) => Promise<{ user_id: string }[]>
   isFireworksAdmissible: () => boolean
-  getMaxConcurrentSessions: () => number
+  getMaxAdmitsPerTick: () => number
   getSessionLengthMs: () => number
   now?: () => Date
 }
@@ -44,7 +43,7 @@ const defaultDeps: AdmissionDeps = {
   queueDepth,
   admitFromQueue,
   isFireworksAdmissible,
-  getMaxConcurrentSessions,
+  getMaxAdmitsPerTick: () => MAX_ADMITS_PER_TICK,
   getSessionLengthMs,
 }
 
@@ -53,14 +52,19 @@ export interface AdmissionTickResult {
   admitted: number
   active: number
   queueDepth: number
-  skipped: 'health' | 'full' | null
+  skipped: 'health' | null
 }
 
 /**
  * Run a single admission tick:
  * 1. Expire sessions past their expires_at.
  * 2. If Fireworks is not 'healthy', skip admission (waiting queue grows).
- * 3. Admit up to (maxConcurrent - activeCount, MAX_ADMITS_PER_TICK) users.
+ * 3. Admit up to maxAdmitsPerTick queued users.
+ *
+ * There is no global concurrency cap — the Fireworks health monitor is the
+ * primary gate. Admission drips at (maxAdmitsPerTick / ADMISSION_TICK_MS),
+ * which drives utilization up slowly; once metrics degrade, step 2 halts
+ * admission until things recover.
  *
  * Returns counts for observability. Safe to call concurrently across pods —
  * the underlying admit query takes an advisory xact lock.
@@ -80,15 +84,8 @@ export async function runAdmissionTick(
   }
 
   const active = await deps.countActive(now)
-  const max = deps.getMaxConcurrentSessions()
-  const capacity = Math.min(Math.max(0, max - active), MAX_ADMITS_PER_TICK)
-  if (capacity === 0) {
-    const depth = await deps.queueDepth()
-    return { expired, admitted: 0, active, queueDepth: depth, skipped: 'full' }
-  }
-
   const admitted = await deps.admitFromQueue({
-    limit: capacity,
+    limit: deps.getMaxAdmitsPerTick(),
     sessionLengthMs: deps.getSessionLengthMs(),
     now,
   })
@@ -129,7 +126,6 @@ function runTick() {
           expired: result.expired,
           active: result.active,
           queueDepth: result.queueDepth,
-          maxConcurrent: getMaxConcurrentSessions(),
           skipped: result.skipped,
         },
         changed ? '[FreeSessionAdmission] tick' : '[FreeSessionAdmission] snapshot',
@@ -158,7 +154,7 @@ export function startFreeSessionAdmission(): boolean {
   state = { timer: null, inFlight: null, tickCount: 0 }
   runTick()
   logger.info(
-    { tickMs: ADMISSION_TICK_MS, maxConcurrent: getMaxConcurrentSessions() },
+    { tickMs: ADMISSION_TICK_MS, maxAdmitsPerTick: MAX_ADMITS_PER_TICK },
     '[FreeSessionAdmission] Started',
   )
   return true
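
To make the post-change tick flow easier to follow as a whole, here is a simplified, self-contained sketch of the three steps (sweep, health gate, drip admit) driven by hypothetical in-memory dependencies. The interface mirrors the `AdmissionDeps` fields visible in the diff, but `runTickSketch`, `demoTicks`, and the way `active` and `queueDepth` are read after admission are assumptions, not the code in `admission.ts`.

```typescript
// Simplified sketch; field names follow the diff, bodies are assumptions.
interface TickDeps {
  sweepExpired: (now: Date) => Promise<number>
  countActive: (now: Date) => Promise<number>
  queueDepth: () => Promise<number>
  admitFromQueue: (params: {
    limit: number
    sessionLengthMs: number
    now: Date
  }) => Promise<{ user_id: string }[]>
  isFireworksAdmissible: () => boolean
  getMaxAdmitsPerTick: () => number
  getSessionLengthMs: () => number
}

interface TickResult {
  expired: number
  admitted: number
  active: number
  queueDepth: number
  skipped: 'health' | null
}

async function runTickSketch(deps: TickDeps, now: Date): Promise<TickResult> {
  // 1. Always sweep, so expired sessions are cleaned up during outages too.
  const expired = await deps.sweepExpired(now)
  // 2. Health gate: the only remaining reason to skip admission.
  const healthy = deps.isFireworksAdmissible()
  // 3. Drip: admit at most maxAdmitsPerTick, with no capacity arithmetic.
  const admitted = healthy
    ? await deps.admitFromQueue({
        limit: deps.getMaxAdmitsPerTick(),
        sessionLengthMs: deps.getSessionLengthMs(),
        now,
      })
    : []
  return {
    expired,
    admitted: admitted.length,
    active: await deps.countActive(now),
    queueDepth: await deps.queueDepth(),
    skipped: healthy ? null : 'health',
  }
}

// Hypothetical in-memory deps: one healthy tick, then one unhealthy tick.
async function demoTicks(): Promise<TickResult[]> {
  const queue = ['u1', 'u2', 'u3']
  const active: string[] = []
  let healthy = true
  const deps: TickDeps = {
    sweepExpired: async () => 0,
    countActive: async () => active.length,
    queueDepth: async () => queue.length,
    admitFromQueue: async ({ limit }) => {
      const picked = queue.splice(0, limit)
      active.push(...picked)
      return picked.map((user_id) => ({ user_id }))
    },
    isFireworksAdmissible: () => healthy,
    getMaxAdmitsPerTick: () => 1,
    getSessionLengthMs: () => 3_600_000,
  }
  const now = new Date('2026-04-17T12:00:00Z')
  const first = await runTickSketch(deps, now)
  healthy = false
  const second = await runTickSketch(deps, now)
  return [first, second]
}
```

In this sketch the first tick admits one user and leaves two queued, while the second tick admits nobody and reports `skipped: 'health'`, which is the only skip reason left after the `'full'` branch was dropped.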

web/src/server/free-session/config.ts

Lines changed: 0 additions & 4 deletions
@@ -24,7 +24,3 @@ export function isWaitingRoomEnabled(): boolean {
 export function getSessionLengthMs(): number {
   return env.FREEBUFF_SESSION_LENGTH_MS
 }
-
-export function getMaxConcurrentSessions(): number {
-  return env.FREEBUFF_MAX_CONCURRENT_SESSIONS
-}
