Skip to content

feat: add Codex session supervisor runtime#213

Open
ndycode wants to merge 7 commits intodevfrom
git-clean/pr147-supervisor-runtime
Open

feat: add Codex session supervisor runtime#213
ndycode wants to merge 7 commits intodevfrom
git-clean/pr147-supervisor-runtime

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 21, 2026

Summary

  • move the supervisor wrapper and monitor runtime out of feat: add experimental Codex session supervisor #207 into a stacked follow-up PR
  • wire the codex wrapper through the new routing helpers and supervisor entrypoint
  • add focused wrapper and supervisor smoke coverage for the runtime behavior

Validation

  • npm run test:session-supervisor:smoke
  • npm run typecheck
  • npm run build

Notes

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this PR completes the session supervisor feature by landing the full supervisor runtime (scripts/codex-supervisor.js), wiring it into the existing codex wrapper (scripts/codex.js), and adding three new routing helpers (findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, splitCodexCommandArgs) to codex-routing.js. it also adds AbortSignal propagation to lib/quota-probe.ts and backs everything with ~5k lines of focused test coverage.

all six concerns from the previous review passes have been resolved:

  • heartbeat write retrywriteSupervisorLockPayload now loops up to DEFAULT_WRITE_RETRY_ATTEMPTS (4) with exponential back-off, gated behind isWindowsRetryableWriteError
  • windows env-key strippingbuildCodexChildEnv normalises each key with key.toUpperCase() before prefix matching, closing the lowercase-leak path
  • abort cascade / spurious account cool-downpendingRejecter now rejects with QuotaProbeUnavailableError { skipAccountCooldown: true } instead of a raw AbortError, so a concurrent waiter (caller B) no longer inherits a cool-down meant only for caller A's abort
  • module-level singleton teardownbeforeEach resets caches, afterEach fully restores process.platform and all tracked env keys
  • legacy routing unit coveragenormalizeAuthAlias and shouldHandleMultiAuthAuth tests are restored in dedicated describe blocks alongside the new helper tests

one minor item remains: lines 1208-1211 of codex-supervisor.js (the error?.name === "AbortError" guard in the pending-entry catch) are unreachable after the pendingRejecter normalisation fix. see inline comment.

Confidence Score: 4/5

  • safe to merge; all prior blockers are resolved and the one remaining item is a dead-code style nit
  • all six concrete concerns from the previous review passes are addressed. windows filesystem safety (write retry + case-insensitive env stripping), abort-cascade propagation, and test teardown are all correct. the only remaining item is an unreachable branch (lines 1208-1211) that can be cleaned up in a follow-up — it poses no runtime risk.
  • scripts/codex-supervisor.js lines 1208-1211 (dead branch, nit only)

Important Files Changed

Filename Overview
scripts/codex-supervisor.js new 2480-line supervisor runtime; prior concerns (heartbeat write retry, case-sensitive env stripping, abort-cascade cool-down, beforeEach/afterEach teardown) are all addressed in this revision; one minor dead-code path remains at lines 1208-1211
test/codex-supervisor.test.ts 2839-line test suite; beforeEach resets caches, afterEach fully restores process.platform and all env keys; comprehensive coverage of lock lifecycle, abort propagation, account selection, and Windows retry paths
scripts/codex.js wires supervisor entrypoint; startup auto-sync deferred into callbacks; sentinel exit-130 path correctly skips sync; auto-sync-after-interactive guard is tested end-to-end
scripts/codex-routing.js adds findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, splitCodexCommandArgs; preserves original casing while exposing a lowercase-normalizedCommand; flag-value skipping and -- terminator handled correctly
lib/quota-probe.ts adds AbortSignal support; signal is checked before each model loop iteration and wired into the fetch AbortController; removeEventListener cleanup on finally is correct; new quota-probe tests cover pre-abort, mid-fetch abort, and post-instructions abort paths
test/codex-bin-wrapper.test.ts extracts removeDirectoryWithRetry to shared fs-test-utils; adds integration tests for supervisor-forwarded run, auto-sync once, abort-sentinel skip, and interactive post-launch sync; double-inject --config guards also added

Sequence Diagram

sequenceDiagram
    participant CLI as codex.js main()
    participant Sup as runCodexSupervisorIfEnabled
    participant RT as loadSupervisorRuntime
    participant Gate as isSupervisorAccountGateBypassCommand
    participant Acct as ensureLaunchableAccount
    participant Probe as probeAccountSnapshot (cache)
    participant Sess as runInteractiveSupervision
    participant Child as spawnRealCodex (child)

    CLI->>Sup: { codexBin, rawArgs, forwardToRealCodex, onLaunch, syncBeforeLaunch }
    Sup->>RT: importIfPresent dist/lib/{config,accounts,quota-probe,storage}.js
    RT-->>Sup: runtime or null
    alt runtime is null
        Sup-->>CLI: null (supervisor disabled)
        CLI->>CLI: forwardToRealCodexWithStartupSync()
    else runtime loaded
        Sup->>Gate: isSupervisorAccountGateBypassCommand(rawArgs)
        alt bypass (auth/help/version/--help)
            Gate-->>Sup: true
            Sup->>CLI: forwardToRealCodex (with sync)
        else normal command
            Sup->>Acct: ensureLaunchableAccount(runtime, signal)
            Acct->>Probe: probeAccountSnapshot (batched, cached, abort-aware)
            Probe-->>Acct: snapshot / QuotaProbeUnavailableError{skipAccountCooldown}
            Acct-->>Sup: { ok, account, manager } or { aborted: true }
            alt aborted
                Sup-->>CLI: 130
            else no healthy account
                Sup-->>CLI: 1
            else non-interactive command
                Sup->>CLI: forwardToRealCodex (with sync)
            else interactive (resume/fork/bare)
                Sup->>Sess: runInteractiveSupervision(...)
                loop each launch / restart
                    Sess->>Sess: syncBeforeLaunch()
                    Sess->>Child: spawnRealCodex
                    Child-->>Sess: exitCode
                    Sess->>Probe: monitor quota pressure
                    alt preemptive rotation triggered
                        Sess->>Acct: prepareResumeSelection / ensureLaunchableAccount
                        Sess->>Child: SIGTERM then relaunch with resume args
                    end
                end
                Sess-->>Sup: exitCode
                Sup-->>CLI: exitCode
                alt supervisorDidLaunchSession && !syncedYet
                    CLI->>CLI: autoSyncManagerActiveSelectionIfEnabled()
                end
            end
        end
    end
Loading

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 1204-1214

Comment:
**Dead branch after abort-cascade fix**

lines 1208-1211 check `error?.name === "AbortError"` on the value rejected by `pendingEntry.pending`. since `pendingRejecter` now always rejects with a `QuotaProbeUnavailableError` (name `"QuotaProbeUnavailableError"`, not `"AbortError"`), this branch can no longer be reached via the pending-entry path. the only way an `AbortError` reaches here today is if `signalB` itself fires, which is already caught at line 1205 before this check. harmless, but worth removing to avoid confusing future readers:

```suggestion
		} catch (error) {
			if (signal?.aborted) {
				throw error;
			}
			throw error;
		}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "Tighten supervisor t..."

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 21, 2026

📝 Walkthrough

Walkthrough

introduces a session supervisor system for managing multi-account quota rotation and resumable codex sessions. supervisor runs before real codex execution, probes account quota states, selects eligible accounts, monitors session binding metadata from jsonl files, and triggers child restarts when quota exhaustion approaches and session idle time permits. integrates with storage locking, abort-aware control flow, and optional prewarming.

Changes

Cohort / File(s) Summary
Argument Parsing Utilities
scripts/codex-routing.js
Added findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, and splitCodexCommandArgs functions to parse and normalize command-line arguments, including handling of option flags with values, -- termination, and configuration flag detection.
Session Supervisor Implementation
scripts/codex-supervisor.js
New comprehensive module implementing quota-aware session supervision: dynamically loads runtime config/accounts/quota-probe/storage modules, probes account quota with TTL caching and de-duplication, maintains supervisor lock with heartbeat TTL, polls session binding metadata from jsonl files, spawns child codex process, monitors quota pressure in loop, decides on child restart based on quota exhaustion and session idle time, supports prewarming and selection preparation, handles abort signals, and exposes test-only internal API.
CLI Integration
scripts/codex.js
Integrated supervisor into main codex entry point: replaced unconditional autoSyncManagerActiveSelectionIfEnabled() with conditional runCodexSupervisorIfEnabled(...), wrapped real codex forwarding in forwardToRealCodexWithStartupSync to handle startup sync, added state flags to track supervisor delegation (supervisorDidForward, supervisorDidLaunchSession, supervisorDidStartupSync), and conditionally re-run auto-sync for interactive supervisor commands.
Test Infrastructure & Coverage
test/codex-supervisor.test.ts
Comprehensive test suite (2805 lines) covering supervisor internals via test-only API: session metadata parsing, binding lookup with caching, quota probe caching/de-duplication, concurrent caller sharing, restart control flow with platform-specific SIG escalation, supervisor storage locking with TTL heartbeat and stale detection, critical-section serialization, interactive monitoring loop, cooldown pacing, and account gating bypass for auth/help/version.
Integration Test Updates
test/codex-bin-wrapper.test.ts
Extended wrapper fixture to copy codex-supervisor.js and generate runtime stubs (config.js, accounts.js, quota-probe.js, storage.js, codex-manager.js). Added tests validating supervisor non-interactive forwarding, auto-sync cardinality (exactly once per supervisor command), startup sync skip on abort sentinel 130, post-launch auto-sync for interactive runs, double-sync avoidance, and config flag injection edge cases.
Test Script Addition
package.json
Added npm script test:session-supervisor:smoke running vitest against supervisor and related test files.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI Entry
    participant Supervisor as Supervisor<br/>(codex-supervisor.js)
    participant AccountMgr as Account Manager<br/>(config/accounts)
    participant QuotaProbe as Quota Probe<br/>(quota-probe)
    participant Storage as Storage<br/>(persistent)
    participant Child as Child Codex<br/>Process
    participant SessionLog as Session Log<br/>(JSONL)

    CLI->>Supervisor: runCodexSupervisorIfEnabled(args)
    Note over Supervisor: Parse args, check if interactive
    
    Supervisor->>Storage: Acquire supervisor lock<br/>(with TTL heartbeat)
    Storage-->>Supervisor: Lock acquired
    
    Supervisor->>AccountMgr: Get eligible accounts
    AccountMgr-->>Supervisor: Account list
    
    Supervisor->>QuotaProbe: Probe quota for account<br/>(with TTL cache)
    QuotaProbe-->>Supervisor: Quota snapshot
    
    alt Quota OK
        Supervisor->>Storage: Store selected account<br/>state
    else Quota Exhausted
        Supervisor->>Storage: Mark account cooling down
        Supervisor->>AccountMgr: Rotate to next account
    end
    
    Supervisor->>Child: Spawn child codex process
    
    Note over Supervisor,Child: Interactive monitoring loop
    loop Until child exits
        Supervisor->>SessionLog: Poll session binding<br/>metadata (JSONL)
        SessionLog-->>Supervisor: session_meta line
        
        Supervisor->>QuotaProbe: Probe current account<br/>quota (with de-dup)
        QuotaProbe-->>Supervisor: Quota pressure
        
        alt Quota near exhaustion<br/>AND session idle
            Supervisor->>Child: Send signal / restart
            Child->>Child: Exit
            Supervisor->>Child: Spawn replacement codex
        else Continue
            Supervisor->>Storage: Heartbeat lock TTL
        end
    end
    
    Child-->>CLI: Exit code
    Supervisor-->>CLI: Forwarded exit code
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

substantial new feature spanning 2448 lines in supervisor core logic with dense, multi-concern interactions: quota polling with in-memory ttl caching and in-flight de-duplication, persistent supervisor lock acquisition/heartbeat/stale-detection, abortable async control flow, session binding metadata polling with rollout-path caching, child process restart decisions, and platform-specific signal escalation. integration touch in scripts/codex.js:46 restructures startup sync flow with three state flags. test coverage is comprehensive (2805 lines) but flag missing explicit regression tests for: (1) windows SIGKILL escalation timeout edge cases when child hangs during lock loss (scripts/codex-supervisor.js:~1900), (2) concurrent supervisor instances racing on lock acquisition—heartbeat renewal during long critical sections could lose lease mid-restart, (3) quota probe abortion semantics when caller aborts shared in-flight probe, (4) jsonl session binding corruption or truncated lines causing unbounded scan, and (5) stale lock cleanup on windows transient EPERM/EBUSY retry logic (test/codex-supervisor.test.ts:1200+) only covers synchronous happy path. concurrency risks: probe cache eviction under load, lock heartbeat failure modes during quota-driven restart cascades, and session log reader assumptions about atomic writes.

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive PR description is complete with summary, validation steps, and notes. However, it omits required docs/governance checklist items (README, docs/, SECURITY.md, CONTRIBUTING.md) and risk/rollback assessment. Add docs and governance checklist section and explicit risk level + rollback plan to match the repository template.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed Title follows conventional commits format with 'feat' type, lowercase summary, and is well under 72-char limit at 42 chars.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch git-clean/pr147-supervisor-runtime
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch git-clean/pr147-supervisor-runtime

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ndycode ndycode added the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 21, 2026
coderabbitai[bot]
coderabbitai bot previously requested changes Mar 21, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codex-routing.js`:
- Around line 37-69: The function findPrimaryCodexCommand currently returns the
lowercased token (normalizedArg); change its return shape to preserve the
original token by returning the raw token (use args[index] trimmed but not
lowercased) as command and add a separate normalized (or commandNormalized)
field with the lowercased value for comparisons; apply the same change to the
other occurrence around the split helper (splitCodexCommandArgs / the function
at lines ~109-123) so leadingArgs/trailingArgs keep original casing while
consumers get a normalized field for case-insensitive checks.

In `@scripts/codex-supervisor.js`:
- Around line 1921-1922: The wrapper currently calls onLaunch immediately (see
onLaunch = () => {}) before any resumable session is confirmed; change the
behavior so onLaunch is invoked only after a successful binding.sessionId is
observed (i.e., after findBinding() or waitForBinding() returns a binding with a
sessionId), and ensure spawnChild = spawnRealCodex remains the child spawner;
update the wrapper logic that currently marks “launched” at startup (lines
referencing the wrapper launch flag) to instead wait for the first successful
binding.sessionId capture before firing onLaunch, and add a regression test that
uses the real supervisor path (replacing the current stubs in the bin wrapper
test) to confirm the wrapper won't take post-session auto-sync if the child dies
immediately.
- Around line 153-155: getMaxSessionRestarts currently treats the env var
CODEX_AUTH_CLI_SESSION_MAX_RESTARTS as "max launches" because the first spawn
consumes one unit; change the logic so the env value represents allowed restarts
only: keep getMaxSessionRestarts as the parsed value but update the
spawning/resume logic that uses it (the code path around the initial
spawn/resume handling) to not decrement or consume a restart count on the very
first launch and only decrement when attempting to resume/restart a session;
ensure checks permit an initial launch when getMaxSessionRestarts() === 0 and
only block further resume attempts when the remaining restarts count is
exhausted.

In `@test/codex-supervisor.test.ts`:
- Around line 49-66: The test defines a duplicate directory-removal backoff
helper removeDirectoryWithRetry with its own retry codes/timing; instead,
extract and reuse the centralized windows cleanup retry helper used by the other
suite: move the shared logic into a test utility (e.g., export a function like
removeDirectoryWithRetry or reuse the existing helper function signature),
update this test to import and call that shared helper instead of its local
removeDirectoryWithRetry, and ensure the retryableCodes and backoff timing are
taken from the single shared implementation so both suites use identical
semantics.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 807682f1-86cc-4607-936c-c74c54c45cb9

📥 Commits

Reviewing files that changed from the base of the PR and between dd077ff and f14e7db.

📒 Files selected for processing (6)
  • package.json
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts

@ndycode ndycode changed the base branch from git-clean/pr147-supervisor-plumbing to dev March 21, 2026 19:57
@ndycode ndycode removed the needs-human-review Flagged by automated PR quality screening; maintainer review required. label Mar 22, 2026
@ndycode ndycode dismissed coderabbitai[bot]’s stale review March 22, 2026 15:50

All review threads are resolved and later commits addressed this stale automated change request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant