Conversation
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
📝 WalkthroughWalkthroughintroduces a session supervisor system for managing multi-account quota rotation and resumable codex sessions. supervisor runs before real codex execution, probes account quota states, selects eligible accounts, monitors session binding metadata from jsonl files, and triggers child restarts when quota exhaustion approaches and session idle time permits. integrates with storage locking, abort-aware control flow, and optional prewarming. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI as CLI Entry
participant Supervisor as Supervisor<br/>(codex-supervisor.js)
participant AccountMgr as Account Manager<br/>(config/accounts)
participant QuotaProbe as Quota Probe<br/>(quota-probe)
participant Storage as Storage<br/>(persistent)
participant Child as Child Codex<br/>Process
participant SessionLog as Session Log<br/>(JSONL)
CLI->>Supervisor: runCodexSupervisorIfEnabled(args)
Note over Supervisor: Parse args, check if interactive
Supervisor->>Storage: Acquire supervisor lock<br/>(with TTL heartbeat)
Storage-->>Supervisor: Lock acquired
Supervisor->>AccountMgr: Get eligible accounts
AccountMgr-->>Supervisor: Account list
Supervisor->>QuotaProbe: Probe quota for account<br/>(with TTL cache)
QuotaProbe-->>Supervisor: Quota snapshot
alt Quota OK
Supervisor->>Storage: Store selected account<br/>state
else Quota Exhausted
Supervisor->>Storage: Mark account cooling down
Supervisor->>AccountMgr: Rotate to next account
end
Supervisor->>Child: Spawn child codex process
Note over Supervisor,Child: Interactive monitoring loop
loop Until child exits
Supervisor->>SessionLog: Poll session binding<br/>metadata (JSONL)
SessionLog-->>Supervisor: session_meta line
Supervisor->>QuotaProbe: Probe current account<br/>quota (with de-dup)
QuotaProbe-->>Supervisor: Quota pressure
alt Quota near exhaustion<br/>AND session idle
Supervisor->>Child: Send signal / restart
Child->>Child: Exit
Supervisor->>Child: Spawn replacement codex
else Continue
Supervisor->>Storage: Heartbeat lock TTL
end
end
Child-->>CLI: Exit code
Supervisor-->>CLI: Forwarded exit code
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes substantial new feature spanning 2448 lines in supervisor core logic with dense, multi-concern interactions: quota polling with in-memory ttl caching and in-flight de-duplication, persistent supervisor lock acquisition/heartbeat/stale-detection, abortable async control flow, session binding metadata polling with rollout-path caching, child process restart decisions, and platform-specific signal escalation. integration touch in 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/codex-routing.js`:
- Around line 37-69: The function findPrimaryCodexCommand currently returns the
lowercased token (normalizedArg); change its return shape to preserve the
original token by returning the raw token (use args[index] trimmed but not
lowercased) as command and add a separate normalized (or commandNormalized)
field with the lowercased value for comparisons; apply the same change to the
other occurrence around the split helper (splitCodexCommandArgs / the function
at lines ~109-123) so leadingArgs/trailingArgs keep original casing while
consumers get a normalized field for case-insensitive checks.
In `@scripts/codex-supervisor.js`:
- Around line 1921-1922: The wrapper currently calls onLaunch immediately (see
onLaunch = () => {}) before any resumable session is confirmed; change the
behavior so onLaunch is invoked only after a successful binding.sessionId is
observed (i.e., after findBinding() or waitForBinding() returns a binding with a
sessionId), and ensure spawnChild = spawnRealCodex remains the child spawner;
update the wrapper logic that currently marks “launched” at startup (lines
referencing the wrapper launch flag) to instead wait for the first successful
binding.sessionId capture before firing onLaunch, and add a regression test that
uses the real supervisor path (replacing the current stubs in the bin wrapper
test) to confirm the wrapper won't take post-session auto-sync if the child dies
immediately.
- Around line 153-155: getMaxSessionRestarts currently treats the env var
CODEX_AUTH_CLI_SESSION_MAX_RESTARTS as "max launches" because the first spawn
consumes one unit; change the logic so the env value represents allowed restarts
only: keep getMaxSessionRestarts as the parsed value but update the
spawning/resume logic that uses it (the code path around the initial
spawn/resume handling) to not decrement or consume a restart count on the very
first launch and only decrement when attempting to resume/restart a session;
ensure checks permit an initial launch when getMaxSessionRestarts() === 0 and
only block further resume attempts when the remaining restarts count is
exhausted.
In `@test/codex-supervisor.test.ts`:
- Around line 49-66: The test defines a duplicate directory-removal backoff
helper removeDirectoryWithRetry with its own retry codes/timing; instead,
extract and reuse the centralized windows cleanup retry helper used by the other
suite: move the shared logic into a test utility (e.g., export a function like
removeDirectoryWithRetry or reuse the existing helper function signature),
update this test to import and call that shared helper instead of its local
removeDirectoryWithRetry, and ensure the retryableCodes and backoff timing are
taken from the single shared implementation so both suites use identical
semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 807682f1-86cc-4607-936c-c74c54c45cb9
📒 Files selected for processing (6)
package.jsonscripts/codex-routing.jsscripts/codex-supervisor.jsscripts/codex.jstest/codex-bin-wrapper.test.tstest/codex-supervisor.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
test/**
⚙️ CodeRabbit configuration file
tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.
Files:
test/codex-bin-wrapper.test.tstest/codex-supervisor.test.ts
All review threads are resolved and later commits addressed this stale automated change request.
Summary
Validation
Notes
note: greptile review for oc-chatgpt-multi-auth. cite files like
lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.Greptile Summary
this PR completes the session supervisor feature by landing the full supervisor runtime (
scripts/codex-supervisor.js), wiring it into the existing codex wrapper (scripts/codex.js), and adding three new routing helpers (findPrimaryCodexCommand,hasTopLevelHelpOrVersionFlag,splitCodexCommandArgs) tocodex-routing.js. it also addsAbortSignalpropagation tolib/quota-probe.tsand backs everything with ~5k lines of focused test coverage.all six concerns from the previous review passes have been resolved:
writeSupervisorLockPayloadnow loops up toDEFAULT_WRITE_RETRY_ATTEMPTS(4) with exponential back-off, gated behindisWindowsRetryableWriteErrorbuildCodexChildEnvnormalises each key withkey.toUpperCase()before prefix matching, closing the lowercase-leak pathpendingRejecternow rejects withQuotaProbeUnavailableError { skipAccountCooldown: true }instead of a rawAbortError, so a concurrent waiter (caller B) no longer inherits a cool-down meant only for caller A's abortbeforeEachresets caches,afterEachfully restoresprocess.platformand all tracked env keysnormalizeAuthAliasandshouldHandleMultiAuthAuthtests are restored in dedicateddescribeblocks alongside the new helper testsone minor item remains: lines 1208-1211 of
codex-supervisor.js(theerror?.name === "AbortError"guard in the pending-entry catch) are unreachable after thependingRejecternormalisation fix. see inline comment.Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram participant CLI as codex.js main() participant Sup as runCodexSupervisorIfEnabled participant RT as loadSupervisorRuntime participant Gate as isSupervisorAccountGateBypassCommand participant Acct as ensureLaunchableAccount participant Probe as probeAccountSnapshot (cache) participant Sess as runInteractiveSupervision participant Child as spawnRealCodex (child) CLI->>Sup: { codexBin, rawArgs, forwardToRealCodex, onLaunch, syncBeforeLaunch } Sup->>RT: importIfPresent dist/lib/{config,accounts,quota-probe,storage}.js RT-->>Sup: runtime or null alt runtime is null Sup-->>CLI: null (supervisor disabled) CLI->>CLI: forwardToRealCodexWithStartupSync() else runtime loaded Sup->>Gate: isSupervisorAccountGateBypassCommand(rawArgs) alt bypass (auth/help/version/--help) Gate-->>Sup: true Sup->>CLI: forwardToRealCodex (with sync) else normal command Sup->>Acct: ensureLaunchableAccount(runtime, signal) Acct->>Probe: probeAccountSnapshot (batched, cached, abort-aware) Probe-->>Acct: snapshot / QuotaProbeUnavailableError{skipAccountCooldown} Acct-->>Sup: { ok, account, manager } or { aborted: true } alt aborted Sup-->>CLI: 130 else no healthy account Sup-->>CLI: 1 else non-interactive command Sup->>CLI: forwardToRealCodex (with sync) else interactive (resume/fork/bare) Sup->>Sess: runInteractiveSupervision(...) loop each launch / restart Sess->>Sess: syncBeforeLaunch() Sess->>Child: spawnRealCodex Child-->>Sess: exitCode Sess->>Probe: monitor quota pressure alt preemptive rotation triggered Sess->>Acct: prepareResumeSelection / ensureLaunchableAccount Sess->>Child: SIGTERM then relaunch with resume args end end Sess-->>Sup: exitCode Sup-->>CLI: exitCode alt supervisorDidLaunchSession && !syncedYet CLI->>CLI: autoSyncManagerActiveSelectionIfEnabled() end end end endPrompt To Fix All With AI
Last reviewed commit: "Tighten supervisor t..."