
Support multiple Fireworks deployments independently #514

Closed
jahooma wants to merge 1 commit into main from jahooma/fireworks-multi-deploy

jahooma commented Apr 19, 2026

Summary

  • Waiting-room admission now admits if any Fireworks deployment is healthy (was worst-of across all). With one deployment per model — and per country in the future — a degraded deployment for one model shouldn't block users whose model routes elsewhere.
  • DEPLOYMENT_SCALING_UP cooldown is now per-deployment (keyed by deployment path), so one deployment's 503 no longer poisons routing for the others.
  • Replicas within a deployment need no handling: Fireworks aggregates them server-side via the :sum_by_deployment / :avg_by_deployment metric suffixes.
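
A minimal TypeScript sketch of the any-healthy rule described above, not the PR's actual code. The names `Health`, `bestOf`, and `shouldAdmit` are hypothetical, and treating an empty deployment list as healthy is an assumption.

```typescript
// Hypothetical names throughout; the PR's real types and signatures are not shown here.
type Health = 'healthy' | 'degraded' | 'unhealthy'

// Best-of aggregation: return the healthiest status seen across deployments.
function bestOf(statuses: Health[]): Health {
  // Assumption: with no deployments configured there is nothing to block on.
  if (statuses.length === 0) return 'healthy'
  if (statuses.some((s) => s === 'healthy')) return 'healthy'
  if (statuses.some((s) => s === 'degraded')) return 'degraded'
  return 'unhealthy'
}

// The waiting room admits only when the aggregate status is 'healthy'.
function shouldAdmit(statuses: Health[]): boolean {
  return bestOf(statuses) === 'healthy'
}
```

For the input `['unhealthy', 'healthy']`, `bestOf` returns `'healthy'`, whereas a worst-of rule would have returned `'unhealthy'` and kept everyone in the waiting room.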

Test plan

  • fireworks-health.test.ts — any-healthy, all-degraded, all-unhealthy cases
  • fireworks-deployment.test.ts — per-deployment cooldown isolation + existing fallback cases (21 tests)
  • tsc --noEmit clean

🤖 Generated with Claude Code

Admit from the waiting room if any deployment is healthy (was worst-of
across all). With one deployment per model — and per country in the
future — a degraded deployment for one model shouldn't block users whose
model routes elsewhere.

Also make the DEPLOYMENT_SCALING_UP cooldown per-deployment; one
deployment's 503 no longer poisons routing for the others.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

greptile-apps bot commented Apr 19, 2026

Greptile Summary

This PR refactors Fireworks deployment health checking and cooldown tracking to be per-deployment rather than global, so that a degraded or scaling-up deployment for one model doesn't block routing to other models.

Key changes:

  • fireworks-health.ts (classify): Switched from "worst-of" to "best-of" semantics — the waiting room now admits users if any deployment is healthy, and only blocks all users when every deployment is non-healthy.
  • fireworks.ts: Replaced the single global deploymentScalingUpUntil timestamp with a Map<string, number> keyed by deployment path, so the cooldown triggered by a DEPLOYMENT_SCALING_UP 503 on one deployment doesn't bleed into other deployments.
  • Tests: Added isolation tests for per-deployment cooldown and three new health classification scenarios (any-healthy, all-degraded, all-unhealthy).
  • .gitignore: Added .gstack/.
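
The per-deployment cooldown can be pictured with the sketch below. The three function names are the ones that appear elsewhere in this review, but their signatures, the injectable `now` parameter, and the 60-second `COOLDOWN_MS` value are assumptions made for illustration; the PR's own implementation may differ.

```typescript
// Hypothetical sketch. Only the function names come from the review;
// signatures, the clock parameter, and the cooldown length are assumed.
const COOLDOWN_MS = 60_000 // assumed duration, not taken from the PR

// Deployment path -> timestamp (ms) until which that deployment is skipped.
const scalingUpUntil = new Map<string, number>()

function markDeploymentScalingUp(deploymentId: string, now: number = Date.now()): void {
  scalingUpUntil.set(deploymentId, now + COOLDOWN_MS)
}

function isDeploymentCoolingDown(deploymentId: string, now: number = Date.now()): boolean {
  const until = scalingUpUntil.get(deploymentId)
  if (until === undefined) return false
  if (now >= until) {
    // Expired: drop the entry so the Map does not grow without bound.
    scalingUpUntil.delete(deploymentId)
    return false
  }
  return true
}

function resetDeploymentCooldown(deploymentId: string): void {
  scalingUpUntil.delete(deploymentId)
}
```

Because each entry is keyed by deployment path, marking one deployment leaves `isDeploymentCoolingDown` false for every other key, which is the isolation property the new tests are said to cover.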

Confidence Score: 5/5

Safe to merge — changes are well-scoped, backwards-compatible, and covered by new and updated tests.

The logic change in classify is straightforward and the best-of-any semantics align precisely with the deployment architecture (one deployment per model). The per-deployment cooldown Map is a clean, minimal refactor. tsc is clean, all 21 tests pass, and the three new health scenarios and two new cooldown-isolation tests provide solid coverage of the new behaviour. No security implications, no data-loss risk, and the fallback path in createFireworksRequestWithFallback already handles any individual deployment failure gracefully.
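
As a rough illustration of the fallback path mentioned above, the sketch below follows the routing shown in this review's flowchart. It is not the body of createFireworksRequestWithFallback: the `FallbackDeps` interface, the `ApiResult` shape, and the injected functions are all hypothetical, and returning non-5xx responses unchanged is an assumption.

```typescript
// Hypothetical sketch: request functions and cooldown hooks are injected so the
// routing logic can be read in isolation. None of these signatures come from the PR.
type ApiResult = { status: number; code?: string }

interface FallbackDeps {
  isCoolingDown: (deploymentId: string) => boolean
  markScalingUp: (deploymentId: string) => void
  sendToDeployment: (deploymentId: string) => Promise<ApiResult>
  sendToStandardApi: () => Promise<ApiResult>
}

async function requestWithFallback(
  deploymentId: string,
  deps: FallbackDeps,
): Promise<ApiResult> {
  // Skip the custom deployment entirely while it is cooling down.
  if (deps.isCoolingDown(deploymentId)) return deps.sendToStandardApi()

  const res = await deps.sendToDeployment(deploymentId)
  // Assumption: anything below 500 is returned to the caller unchanged.
  if (res.status < 500) return res

  // Only a scaling-up 503 starts a cooldown, and only for this deployment.
  if (res.status === 503 && res.code === 'DEPLOYMENT_SCALING_UP') {
    deps.markScalingUp(deploymentId)
  }
  // Every 5xx, scaling-up or not, falls back to the standard API.
  return deps.sendToStandardApi()
}
```

Keeping the cooldown decision behind `isCoolingDown` and `markScalingUp` means the router never needs to know whether cooldown state is global or per-deployment.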

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| web/src/server/free-session/fireworks-health.ts | Rewrites classify to best-of-any semantics; adds empty-list guard that complements the one already in probe(); logic is clean and well-reasoned. |
| web/src/llm-api/fireworks.ts | Replaces single global cooldown scalar with a per-deployment Map; !!deploymentModelId coerces to boolean for the composite boolean expression; all three cooldown functions updated consistently. |
| web/src/llm-api/tests/fireworks-deployment.test.ts | All existing call sites updated to pass deploymentId; two new isolation tests cover cross-deployment independence and selective resetDeploymentCooldown. |
| web/src/server/free-session/tests/fireworks-health.test.ts | Old worst-of test replaced with three focused scenarios; data setup is correct for each expected outcome. |
| .gitignore | Adds .gstack/ directory to gitignore; routine developer tooling exclusion. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Waiting Room: getFireworksHealth] --> B[probe: fetch Prometheus metrics]
    B --> C{deploymentIds empty?}
    C -- yes --> D[return 'healthy']
    C -- no --> E[classify samples, deploymentIds]

    E --> F{for each deploymentId\nclassifyOne}
    F --> G{any 'healthy'?}
    G -- yes --> D
    G -- no --> H{any 'degraded'?}
    H -- yes --> I[return 'degraded'\ndo NOT admit]
    H -- no --> J[return 'unhealthy'\ndo NOT admit]

    subgraph classifyOne
        K[KV blocks >= 0.98?] -- yes --> L[unhealthy]
        K -- no --> M[5xx rate >= 10%?]
        M -- yes --> L
        M -- no --> N[prefill p90 > 1000ms?]
        N -- yes --> O[degraded]
        N -- no --> P[KV blocks >= 0.80?]
        P -- yes --> O
        P -- no --> Q[healthy]
    end

    subgraph createFireworksRequestWithFallback
        R[isDeploymentCoolingDown deploymentId] -- cooling down --> S[standard Fireworks API]
        R -- not cooling down --> T[custom deployment request]
        T -- 503 DEPLOYMENT_SCALING_UP --> U[markDeploymentScalingUp deploymentId\ncooldown per-deployment Map]
        U --> S
        T -- other 5xx --> S
        T -- success --> V[return response]
    end
```
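
The classifyOne subgraph above reads as a short threshold cascade, sketched here in TypeScript. The thresholds are copied from the flowchart; the `DeploymentSample` shape, its field names, and the reading of "KV blocks" as a 0 to 1 utilisation fraction are assumptions, and `classifyOneSketch` is a hypothetical stand-in for the real function.

```typescript
// Hypothetical input shape; the real metric names and units may differ.
interface DeploymentSample {
  kvBlocksFraction: number // assumed: 0..1 share of KV-cache blocks in use
  errorRate5xx: number     // assumed: 0..1 share of requests answered with 5xx
  prefillP90Ms: number     // p90 prefill latency in milliseconds
}

type DeploymentHealth = 'healthy' | 'degraded' | 'unhealthy'

function classifyOneSketch(s: DeploymentSample): DeploymentHealth {
  // Hard limits first: either one marks the deployment unhealthy.
  if (s.kvBlocksFraction >= 0.98) return 'unhealthy'
  if (s.errorRate5xx >= 0.10) return 'unhealthy'
  // Soft limits: the deployment still serves traffic but is under pressure.
  if (s.prefillP90Ms > 1000) return 'degraded'
  if (s.kvBlocksFraction >= 0.80) return 'degraded'
  return 'healthy'
}
```

Checking the hard limits before the soft ones matters for overlapping ranges: a sample at 0.99 KV utilisation also satisfies the 0.80 check, and the ordering ensures it is reported as unhealthy rather than degraded.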


jahooma closed this Apr 20, 2026