fix(code-reviews): parallelize dispatch and fix stuck queued reviews by alex-alecu · Pull Request #1177 · Kilo-Org/cloud

alex-alecu · 2026-03-17T16:06:56Z

Summary

Fixes a crash-safety bug where reviews can get permanently stuck in queued status. Previously, the DB status was updated to queued before dispatching to the CF worker — if the process crashed between those two operations, the review would never be dispatched but could never be retried (it was no longer pending). The fix reorders these operations so the worker dispatch happens first; on failure the review stays pending and is retried on the next dispatch cycle.
Replaces sequential review dispatch with Promise.allSettled for parallel execution. Each review's dispatch is independent (own config, payload prep, worker call), so there's no ordering dependency. This reduces dispatch time for 20 reviews from ~60-100s to ~3-5s, eliminating the dashboard experience of "1 running, many queued for minutes."

Verification

pnpm typecheck — root project passes cleanly (pre-existing errors in cloud-agent-next/wrapper unrelated to changes)
pnpm run test — all 174 test suites pass, 2746 tests pass
No existing test file for dispatch-pending-reviews.ts; changes are mechanical (reorder two lines + replace for-loop with Promise.allSettled)

Visual Changes

N/A

Reviewer Notes

prepareReviewPayload() is the heaviest call per dispatch (~3-5s, multiple DB + GitHub/GitLab API calls). Parallelizing means all reviews hit these APIs concurrently — should be fine since each review targets different resources.
PostHog feature flag checks (isFeatureFlagEnabled) will also run concurrently across all reviews. No contention expected.
The CF worker client has a 10s timeout per dispatch. With 20 parallel dispatches, all 20 will be in-flight at once. Each creates its own DO (keyed by reviewId) so there's no contention on the worker side.
The DO's runReview() has a guard (status !== 'queued') that prevents double execution, so even in the unlikely event of a double-dispatch, the review won't run twice.

…eview Reorder so the CF worker dispatch happens before the DB status update. If dispatch fails, the review stays 'pending' and can be retried on the next dispatch cycle, eliminating the stuck-in-queued state that occurs when the process crashes between the DB write and the dispatch call.

Replace the sequential for-loop with Promise.allSettled so all pending reviews are dispatched concurrently. Each review has independent config, payload preparation, and worker dispatch — no ordering dependency. This reduces dispatch time from ~60-100s to ~3-5s for 20 reviews.

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts

kilo-code-bot · 2026-03-17T16:10:19Z

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

Severity	Count
CRITICAL	0
WARNING	1
SUGGESTION	0

Issue Details (click to expand)

CRITICAL

File	Line	Issue

WARNING

File	Line	Issue
`cloudflare-code-review-infra/src/code-review-orchestrator.ts`	378	Unconditionally reloading Durable Object storage on the duplicate-start path can overwrite live in-memory state and drop buffered events or usage data that have not been persisted yet.

SUGGESTION

File	Line	Issue

Fix these issues in Kilo Cloud

Other Observations (not in diff)

Issues found in unchanged code that cannot receive inline comments:

None.

Files Reviewed (1 files)

cloudflare-code-review-infra/src/code-review-orchestrator.ts - 1 issue

_{Reviewed by gpt-5.4-20260305 · 304,699 tokens}

…t races Add a conditional UPDATE ... WHERE status = 'pending' with RETURNING to atomically claim each review before dispatching. If another concurrent dispatcher already claimed the review, skip it. This prevents the double-dispatch race that the order swap alone introduced.

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts

Move the atomic claim to just before the worker call (after all prep work) to minimise the window where a crash leaves a review stuck in 'queued'. On dispatch failure, revert the claim back to 'pending'. As a safety net, the pending-reviews query now also picks up stale queued reviews (older than 5 minutes) that were abandoned by a crashed process, so they become eligible for re-dispatch.

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts

…, scope stale claim predicate Worker dispatch failures (timeout, 5xx) now revert the claim to 'pending' and return false instead of re-throwing — the outer handler only marks truly non-retriable errors (missing config, payload prep failure) as 'failed'. The atomic claim WHERE for stale-queued rows now includes the same staleness predicate (updated_at < cutoff) as the selector query, ensuring only one dispatcher can revive an abandoned row.

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts

Exclude stale queued rows from active slot accounting so abandoned claims can be reclaimed even when an owner appears to be at capacity. Keep ambiguous worker dispatch failures queued for stale recovery instead of reverting to pending, and persist agentVersion without rewriting status or failing a live review if the follow-up DB write errors.

kilo-code-bot · 2026-03-18T12:58:16Z

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts

+            eq(cloud_agent_code_reviews.status, 'pending'),
+            and(
+              eq(cloud_agent_code_reviews.status, 'queued'),
+              lt(cloud_agent_code_reviews.updated_at, staleCutoff)


WARNING: Stale queued rows are not guaranteed to be abandoned

CodeReviewOrchestrator.updateStatus('running') explicitly continues when the DB callback fails (cloudflare-code-review-infra/src/code-review-orchestrator.ts:261), so a live review can sit in Postgres as stale queued even though its Durable Object is already running. Re-dispatching it here sends a second POST /review for the same reviewId; start() then resets that existing DO back to queued (cloudflare-code-review-infra/src/index.ts:84, cloudflare-code-review-infra/src/code-review-orchestrator.ts:365), which can launch the review twice.

Fixed in c52af03.

…m:Kilo-Org/cloud into fix/parallelize-dispatch-fix-stuck-queued

kilo-code-bot · 2026-03-18T13:59:06Z

cloudflare-code-review-infra/src/code-review-orchestrator.ts

    agentVersion?: string;
    previousCloudAgentSessionId?: string;
  }): Promise<{ status: CodeReviewStatus }> {
+    await this.loadState();


WARNING: loadState() can clobber a live orchestrator state

start() is now the duplicate-dispatch path, so it can run while this Durable Object is already streaming a review. loadState() always replaces this.state with the last persisted snapshot before the early return, which drops any in-memory changes that have not been flushed yet (for example buffered events and token usage accumulated in processEventStream()). Guard the reload with if (!this.state) so duplicate starts stay idempotent without rewinding a live review.

alex-alecu added 2 commits March 17, 2026 17:37

kilo-code-bot bot reviewed Mar 17, 2026

View reviewed changes

src/lib/code-reviews/dispatch/dispatch-pending-reviews.ts Outdated Show resolved Hide resolved