Skip to content

add session resume supervisor for faster codex rotation#139

Closed
ndycode wants to merge 25 commits intomainfrom
git-plan/05-fast-supervised-resume
Closed

add session resume supervisor for faster codex rotation#139
ndycode wants to merge 25 commits intomainfrom
git-plan/05-fast-supervised-resume

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 20, 2026

Summary

  • add an experimental session-resume supervisor for interactive Codex runs so account rotation can relaunch with codex resume instead of forcing a cold restart
  • keep the feature off by default behind codexCliSessionSupervisor
  • harden the supervisor and storage paths with the follow-up fixes raised during review

What Changed

  • added scripts/codex-supervisor.js and wired it through scripts/codex.js for interactive-session supervision
  • added supervisor command-routing helpers in scripts/codex-routing.js
  • added the codexCliSessionSupervisor config/schema/settings toggle
  • updated quota probing and preemptive quota scheduling to support supervisor prewarm behavior
  • tightened storage transaction handling in lib/storage.ts, including path-safety checks and backup snapshot selection fixes
  • added supervisor, wrapper, and storage regressions for the reviewed edge cases

Validation

  • npm run lint
  • npm run typecheck
  • npm test
  • npm test -- test/documentation.test.ts
  • npm run build
  • npm test -- --run test/codex-supervisor.test.ts test/codex-bin-wrapper.test.ts test/storage.test.ts

Docs and Governance Checklist

  • README updated (if user-visible behavior changed)
  • docs/getting-started.md updated (if onboarding flow changed)
  • docs/features.md updated (if capability surface changed)
  • relevant docs/reference/* pages updated (if commands/settings/paths changed)
  • docs/upgrade.md updated (if migration behavior changed)
  • SECURITY.md and CONTRIBUTING.md reviewed for alignment

Risk and Rollback

Additional Notes

  • this branch includes the review follow-up commit 71f5b3f and all live review threads were resolved before rerunning CodeRabbit
  • the supervisor remains experimental and disabled by default

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this PR lands the experimental session-resume supervisor — a new codex.js wrapper that monitors a running codex process, detects quota exhaustion via periodic probe, pre-warms a successor account, and relaunches with codex resume <sessionId> so the user's session survives account rotation. the feature is gated behind codexCliSessionSupervisor: false by default and exposed through the experimental settings menu. alongside the supervisor, the PR hardens lib/storage.ts transaction handling (storage path pinned at transaction open, state.active=false in finally, hard error on path mismatch during export) and fixes the latestValidSnapshot stable sort.

key changes

  • scripts/codex-supervisor.js (2205 lines): full supervisor with file-lock heartbeat, probe-snapshot cache, session JSONL binding, pre-warm selection, restart loop (max 16 restarts), Windows-safe safeUnlink with EPERM/EBUSY retry
  • scripts/codex-routing.js: clean arg-parser helpers (findPrimaryCodexCommand, splitCodexCommandArgs, hasTopLevelHelpOrVersionFlag) used by the supervisor to route commands
  • scripts/codex.js: supervisor wired into main path; autoSyncManagerActiveSelectionIfEnabled correctly scoped to avoid double-sync on supervised exits
  • lib/storage.ts: storagePath snapshotted at transaction start and threaded through to all save/load calls; exportAccounts now throws on path mismatch instead of silently loading from a different path; latestValidSnapshot stable sort by original index breaks mtimeMs ties deterministically
  • lib/quota-probe.ts: AbortSignal support added to fetchCodexQuotaSnapshot — per-model abort check, proper listener cleanup in finally
  • lib/config.ts / lib/schemas.ts / lib/preemptive-quota-scheduler.ts: codexCliSessionSupervisor toggle plumbed through config, schema, and defaults; DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H exported for shared use

issues found

  • getSessionBindingEntryPasses second pass still reads every JSONL file when sessionId is known but the rollout path is not yet cached — Windows antivirus risk during initial session capture (mitigated by the listing cache, but second-pass dedup would eliminate redundant file opens)
  • module-level snapshotProbeCache can hold TTL-valid entries from a rotated-away account if the cache window overlaps the pre-warm/rotation boundary; cleared correctly in markAccountUnavailable but not flushed between outer restart iterations
  • when all probe candidates return QuotaProbeUnavailableError (full network outage), ensureLaunchableAccount spins through all 32 attempts with no feedback and no early exit — up to ~80s of silent blocking; no vitest coverage for this path
  • npm test and npm run build are unchecked in the PR checklist; the latestValidSnapshot tiebreaker and the all-unavailable probe path lack targeted test coverage

Confidence Score: 3/5

  • safe to merge with the feature flag off; the supervisor path has remaining edge cases that need test coverage and small hardening before enabling in production
  • the storage.ts fixes and quota-probe abort support are clean and well-tested. the supervisor itself addresses every concern raised in the previous review round (coolingDownUntil field, safeUnlink retry, probe catch dead-code, autoSync double-sync, path safety). however: npm test and npm run build are unchecked, the all-accounts-probe-unavailable spin loop has no coverage or early exit, the second-pass JSONL scan is still O(N) on first binding capture (windows antivirus risk), and the module-level probe cache is not flushed between restart iterations. none of these are regressions from main since the feature defaults to off, but they should be addressed before the flag is turned on for users.
  • scripts/codex-supervisor.js — getSessionBindingEntryPasses second-pass dedup, probe-unavailable spin loop, and cache flush between restarts; test/codex-supervisor.test.ts — missing coverage for all-unavailable and latestValidSnapshot tiebreaker paths

Important Files Changed

Filename Overview
scripts/codex-supervisor.js 2205-line new file implementing the session-resume supervisor. core logic is well-structured with proper abort signal propagation, safeUnlink retry, and coolingDownUntil field name fix. key concerns: module-level probe cache can hold stale entries across restarts; getSessionBindingEntryPasses second pass still reads all JSONL files when rollout path is unknown; probe-unavailable spin loop can block up to 80s under network outage; npm test was not run end-to-end.
lib/storage.ts tightened transaction handling: storagePath snapshotted at transaction start and passed through to saveAccountsUnlocked/loadAccountsInternal, state.active=false in finally, exportAccounts now throws on path mismatch instead of silent fallback. latestValidSnapshot stable-sort fix is correct but missing targeted vitest coverage for the tie-break case.
scripts/codex-routing.js adds findPrimaryCodexCommand, hasTopLevelHelpOrVersionFlag, and splitCodexCommandArgs — clean arg-parsing helpers with correct -- separator handling and COMMAND_FLAGS_WITH_VALUE skip logic. no issues found.
scripts/codex.js wires supervisor into the main path via runCodexSupervisorIfEnabled; autoSyncManagerActiveSelectionIfEnabled moved inside forwardToRealCodexWithStartupSync to avoid double-sync on supervised exits. logic for the supervisorDidForward guard is correct. minor: requestChildRestart is called without signal in the SIGINT abort handler (intentional but undocumented).
lib/config.ts adds getCodexCliSessionSupervisor accessor, codexCliSessionSupervisor default, and aligns preemptiveQuotaRemainingPercent5h fallback with DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H constant. clean, no issues.
lib/quota-probe.ts adds AbortSignal support to fetchCodexQuotaSnapshot: abort checked at loop start and after each fetch attempt, onAbort listener properly removed in finally, createAbortError helper consistent with supervisor. clean implementation.
lib/codex-manager/settings-hub.ts adds codexCliSessionSupervisor toggle to experimental settings menu; hotkey numbering shifted (1→Supervisor, 2→Sync, 3→Backup, 4→Guard) and UI_COPY updated consistently. no issues, but the shift means any users with muscle-memory for old key bindings will need to re-learn.
test/codex-supervisor.test.ts 1729-line new test file covering session binding, probe cache, account selection, and restart loop. FakeManager now correctly uses coolingDownUntil. afterEach clears module-level caches and temp dirs with Windows-safe retry. missing: all-accounts-probe-unavailable path and latestValidSnapshot tiebreaker case.
test/storage.test.ts adds 422 lines covering path-mismatch export throw, Windows-style path equivalence, transaction-handler throw/success state reset, and path-drift persist behaviour. coverage is thorough for the new exportAccounts and state.active=false changes.

Sequence Diagram

sequenceDiagram
    participant U as User (CLI)
    participant CJ as codex.js
    participant SUP as codex-supervisor.js
    participant RT as Runtime (dist/)
    participant AM as AccountManager
    participant LOCK as .supervisor.lock
    participant CH as Child Codex Process
    participant FS as ~/.codex/sessions/

    U->>CJ: codex [args]
    CJ->>SUP: runCodexSupervisorIfEnabled()
    SUP->>RT: loadSupervisorRuntime()
    RT-->>SUP: runtime (config, AccountManager, probe)
    SUP->>RT: getCodexCliSessionSupervisor(pluginConfig)
    alt supervisor disabled
        SUP-->>CJ: null
        CJ->>CH: forwardToRealCodex()
    else supervisor enabled
        SUP->>LOCK: withSupervisorStorageLock()
        LOCK-->>SUP: lease + heartbeat
        SUP->>AM: loadFromDisk()
        AM-->>SUP: manager + currentAccount
        LOCK-->>SUP: release lock
        SUP->>RT: ensureLaunchableAccount()
        RT->>AM: getProbeCandidateBatch()
        RT->>RT: probeAccountSnapshot() × batch
        RT->>LOCK: withLockedManager (mark unavailable / persist)
        RT-->>SUP: {ok, account, manager}
        SUP->>CH: spawnRealCodex(codexBin, args)
        loop monitor poll (300ms)
            SUP->>FS: waitForSessionBinding / listJsonlFiles
            FS-->>SUP: sessionId + rolloutPath
            SUP->>LOCK: withLockedManager → loadFromDisk
            LOCK-->>SUP: freshManager
            SUP->>RT: probeAccountSnapshot(currentAccount)
            RT-->>SUP: quotaSnapshot
            alt quota near exhaustion (prewarm)
                SUP->>SUP: maybeStartPreparedResumeSelection()
                SUP-->>SUP: preparedResumeSelectionPromise (background)
            end
            alt quota exhausted (rotate)
                SUP->>CH: requestChildRestart (SIGINT→SIGTERM→SIGKILL)
                CH-->>SUP: exit
            end
        end
        CH-->>SUP: exit(code)
        SUP->>LOCK: markCurrentAccountForRestart
        SUP->>SUP: commitPreparedSelection / ensureLaunchableAccount
        SUP->>CJ: buildResumeArgs(sessionId, args)
        Note over SUP,CH: restart loop (max 16 times)
        SUP-->>CJ: final exitCode
        CJ->>CJ: autoSyncManagerActiveSelectionIfEnabled()
    end
Loading

Comments Outside Diff (8)

  1. lib/preemptive-quota-scheduler.ts, line 27-29 (link)

    default 5h threshold silently doubled for all users

    DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H is raised from 5 to 10, and all three callsites that previously hard-coded 5 now pull this constant: PreemptiveQuotaScheduler constructor, DEFAULT_PLUGIN_CONFIG.preemptiveQuotaRemainingPercent5h, and the fallback in getPreemptiveQuotaRemainingPercent5h. this means any user who has never explicitly set preemptiveQuotaRemainingPercent5h will now start preemptive rotation at 10% remaining instead of 5% — a 2× change in aggressiveness that is not gated behind codexCliSessionSupervisor and will affect every existing deployment on upgrade.

    the docs/features.md checklist entry is unchecked; if this is intentional, it should be documented and the upgrade guide should call it out explicitly. if the higher threshold is only needed by the supervisor's prewarm logic, consider keeping the scheduler default at 5 and only using the 10% constant inside the supervisor's computeQuotaPressure call, so the two behaviors remain independently tunable.

  2. scripts/codex-supervisor.js, line 695-700 (link)

    fn() outlives the lock after heartbeat rejection

    try {
        return await Promise.race([fn(), heartbeat.promise]);
    } finally {
        await heartbeat.stop();
        await safeUnlinkOwnedSupervisorLock(lockPath, ownerId);
    }

    when heartbeat.promise rejects first (e.g. because Windows antivirus held the lock file long enough for the write in writeSupervisorLockPayload to fail, setting failedError), Promise.race rejects and the finally block deletes the lock. but fn() is still running in the background — there is no way to cancel it because it receives no signal.

    if fn() is markCurrentAccountForRestart or the commit step inside ensureLaunchableAccount (both of which call manager.saveToDisk() after doing mutations), a second supervisor instance can acquire the lock and also call saveToDisk() concurrently. the two writes race against the same JSON file and one will silently clobber the other.

    AGENTS.md requires concurrency issues to be explicitly called out. the minimum fix is to propagate the signal into fn so it can abort in-flight storage writes when the heartbeat loses the lease:

    async function withLockedManager(runtime, mutate, signal) {
        return withSupervisorStorageLock(runtime, async (lockSignal) => {
            const manager = await runtime.AccountManager.loadFromDisk();
            return mutate(manager, lockSignal);
        }, signal);
    }

    and thread lockSignal through to every saveToDisk() path inside the mutator.

  3. scripts/codex-supervisor.js, line 501-511 (link)

    two-pass path re-scans the same recent files

    sortedEntries is a superset of recentEntries — every file in the first pass also appears in the second. the calling loop in findSessionBinding runs matchSessionBindingEntry (which opens a ReadStream + readline interface) for every entry in each pass, so files from the first pass get opened a second time immediately afterward.

    on Windows, re-opening the same file handle within the same tick/event loop tick can hit antivirus EBUSY locks, and AGENTS.md calls this exact risk out for rapid file-open patterns. with thousands of session files this creates visible lag even on unix.

    fix: filter overlap from the second pass so recently-scanned entries are not re-read:

    return [recentEntries, sortedEntries.filter((e) => !recentEntries.includes(e))];

    no existing vitest case catches this because the test at line 424 only validates the array structure (lengths) of the returned passes, not that the two arrays are disjoint.

  4. scripts/codex-supervisor.js, line 2276-2315 (link)

    child SIGINT-exit triggers restart probe when supervisor signal is not aborted

    result.exitCode is 130 and result.signal is "SIGINT" when the child was killed by SIGINT (e.g. kill -INT <child_pid> from a script, or the child re-raises SIGINT). in that case the supervisor's own signal may not be aborted (the process.once("SIGINT", abort) only fires if the supervisor process itself receives the signal from the terminal process group). the restart block runs, acquires the storage lock, and probes the current account — treating an operator-killed child as a quota-exhaustion event.

    adding a result.signal guard closes this gap:

    if (!restartDecision && result.exitCode !== 0 && result.signal !== "SIGINT" && knownSessionId) {

    no vitest coverage for the case where result.signal === "SIGINT" but signal.aborted === false.

  5. scripts/codex-supervisor.js, line 2434-2467 (link)

    unexpected storage errors propagate as uncaught exceptions instead of a non-zero exit code

    the top-level catch only intercepts AbortError and re-throws everything else. if withLockedManager (inside the monitor loop or the post-exit restart probe) throws a non-abort error — e.g. storage path resolution fails, AccountManager.loadFromDisk() throws EACCES, or a disk-full condition hits the lock-file write — the exception propagates through runCodexSupervisorWithRuntime, exits the try block here with the throw error branch, and crashes the node process as an unhandled rejection rather than returning exit code 1 and printing a diagnostic.

    codex.js's main() caller has no outer try/catch around runCodexSupervisorIfEnabled, so the process terminates with a stack trace visible to the user.

    consider wrapping with a fallback:

    } catch (error) {
      if (error?.name === "AbortError") {
        return 130;
      }
      relaunchNotice(
        `session supervisor failed unexpectedly: ${
          error instanceof Error ? error.message : String(error)
        }`,
      );
      return null; // fall through to the normal non-supervised path
    }

    no vitest coverage for this path.

  6. scripts/codex-supervisor.js, line 1261-1263 (link)

    TOCTOU race: concurrent lock acquisition survives stale-lock cleanup

    between isSupervisorLockStale returning true and safeUnlink removing the file, a concurrent supervisor can:

    1. try fs.open(lockPath, "wx") → fails (old file still present)
    2. call isSupervisorLockStale → reads the same stale payload, also returns true
    3. race A unlinks the stale file → succeeds
    4. race B acquires the lock with a fresh ownerId and expiresAt (via fs.open("wx"))
    5. race A calls safeUnlinkENOENT → returns true (treated as success by the code at line 965–967)
    6. race A retries fs.open("wx")succeeds — now both A and B believe they hold the lock

    the fix is to re-read and verify the payload ownerId hasn't changed before unlinking, e.g. read the lock again after isSupervisorLockStale and only unlink if the ownerId still matches the stale process (or the file is still expired). this is a concurrency issue explicitly called out in our AGENTS.md guidance.

    if (await isSupervisorLockStale(lockPath, ttlMs)) {
      const stalePayload = await readSupervisorLockPayload(lockPath);
      const removed = stalePayload?.ownerId
        ? await safeUnlinkOwnedSupervisorLock(lockPath, stalePayload.ownerId)
        : await safeUnlink(lockPath);
      if (removed) continue;
    }
  7. scripts/codex-supervisor.js, line 824-826 (link)

    Fallback ignores env-var override — supervisor silently disabled when dist is absent

    the getCodexCliSessionSupervisor fallback closure on line 826 checks only pluginConfig.codexCliSessionSupervisor === true. the real implementation in lib/config.ts calls resolveBooleanSetting(...) which also reads the corresponding environment variable. if dist/lib/config.js is absent or stale (e.g. first install before npm run build), the fallback ignores the env-var entirely. a user who sets the env-var in their shell to opt in gets the supervisor silently skipped with no diagnostic output.

    the same gap exists in the getRetryAllAccountsRateLimited and getPreemptiveQuotaEnabled fallbacks. at minimum, log a warning when falling back so users can diagnose stale builds, or replicate the env-var check inside each fallback lambda using parseBooleanEnv.

  8. lib/storage.ts, line 827-838 (link)

    latestValidSnapshot fix may silently return undefined for callers expecting a snapshot

    the old implementation returned BackupSnapshotMetadata | undefined but the new one chains [0]?.snapshot. this is safe when the filtered list is empty (both return undefined). however, every call-site that previously assumed the return was always a full BackupSnapshotMetadata object when truthy will continue to work — but the type now explicitly goes through an intermediate {snapshot, index} object and the ?. on line ending is the only guard.

    the real risk: if a future caller adds const latest = latestValidSnapshot(snapshots); if (latest) doSomethingWith(latest.valid)latest is now the raw BackupSnapshotMetadata (unwrapped via ?.snapshot) so .valid still works. this is fine. but the PR checklist shows npm test was not run — so no test has validated the stable-sort tiebreaker behaviour end-to-end. the getBackupMetadata export added to test/storage.test.ts would allow a targeted unit test for the tie scenario (two valid snapshots with identical mtimeMs). missing vitest coverage for this specific path.

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 211-222

Comment:
**second-pass still reads ALL JSONL files when rollout path is unknown**

`getSessionBindingEntryPasses` fixes the previous concern for the case where `hasKnownRolloutPath` is `true` — great. but when `sessionId` is set, `hasKnownRolloutPath` is `false`, and `recentEntries` is non-empty but smaller than `sortedEntries`, the function returns `[recentEntries, sortedEntries]`. the second pass (`sortedEntries`) includes every file from the cached listing, and `findSessionBinding` calls `matchSessionBindingEntry` (which opens and reads each file) for all of them.

in `waitForSessionBinding` the listing is cached for `listingRefreshMs ≈ 2400ms`, which prevents repeated `listJsonlFiles` walks — that's the right fix. but within each cache window, every poll tick that doesn't find a match in `recentEntries` falls through to the full `sortedEntries` pass and issues one read per file. on a windows machine with 6 months of daily codex sessions (thousands of `.jsonl` files), the antivirus will flag rapid file-open sequences in `~/.codex/sessions/`.

once `rememberSessionBinding` caches the path the fast path is taken for all subsequent calls, so this only matters during the initial binding-not-yet-captured phase. consider keeping a per-pass seen-set so the second pass only reads files that were not already matched in the first pass, halving redundant file opens:

```js
function getSessionBindingEntryPasses(entries, sinceMs, sessionId, hasKnownRolloutPath) {
  const sorted = [...entries].sort((l, r) => r.mtimeMs - l.mtimeMs);
  const recent = sorted.filter(e => e.mtimeMs >= sinceMs - 2_000);
  if (!sessionId || hasKnownRolloutPath) return [recent];
  if (recent.length === 0 || recent.length === sorted.length) return [sorted];
  const recentPaths = new Set(recent.map(e => e.filePath));
  return [recent, sorted.filter(e => !recentPaths.has(e.filePath))];
}
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 48-52

Comment:
**module-level singletons leak state across supervised restarts**

`snapshotProbeCache` and `sessionRolloutPathById` are module-level `Map`s that survive the entire process lifetime. for normal single-session usage this is fine — but consider the restart loop in `runInteractiveSupervision`: after 16 restarts across different accounts, `snapshotProbeCache` may still hold TTL-valid entries from an account that was explicitly rotated away. an account that was marked unavailable and cooled-down could receive a cached `{rotate: false}` probe result during the next pre-warm window if the TTL (default `1500ms`) hasn't expired yet.

`clearProbeSnapshotCache(account)` is called in `markAccountUnavailable`, so rotated accounts are cleared correctly. but `clearAllProbeSnapshotCache` is only exposed through `TEST_ONLY_API`, which means in production there's no reset boundary between restarts. if the TTL window overlaps a pre-warm and a rotation in rapid succession, the stale cache entry survives into the next restart's `ensureLaunchableAccount` check.

calling `clearAllProbeSnapshotCache()` at the top of each outer `while` iteration in `runInteractiveSupervision` (after the child exits) would be a cheap safety flush with no correctness downside.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: lib/storage.ts
Line: 827-838

Comment:
**`latestValidSnapshot` fix may silently return `undefined` for callers expecting a snapshot**

the old implementation returned `BackupSnapshotMetadata | undefined` but the new one chains `[0]?.snapshot`. this is safe when the filtered list is empty (both return `undefined`). however, every call-site that previously assumed the return was always a full `BackupSnapshotMetadata` object when truthy will continue to work — but the type now explicitly goes through an intermediate `{snapshot, index}` object and the `?.` on line ending is the only guard.

the real risk: if a future caller adds `const latest = latestValidSnapshot(snapshots); if (latest) doSomethingWith(latest.valid)``latest` is now the raw `BackupSnapshotMetadata` (unwrapped via `?.snapshot`) so `.valid` still works. this is fine. but the PR checklist shows `npm test` was **not** run — so no test has validated the stable-sort tiebreaker behaviour end-to-end. the `getBackupMetadata` export added to `test/storage.test.ts` would allow a targeted unit test for the tie scenario (two valid snapshots with identical `mtimeMs`). missing vitest coverage for this specific path.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 1160-1267

Comment:
**probe-unavailable accounts not persisted — silent spin under full network outage**

when ALL accounts in a batch return `QuotaProbeUnavailableError`, the inner `withLockedManager` callback adds them all to `probeUnavailableAccounts`, calls no `markAccountUnavailable`, keeps `dirty = false`, and returns `{ kind: "retry" }`. on the next outer-loop iteration, `getProbeCandidateBatch` re-selects the same accounts (they have no `coolingDownUntil` set) and the cycle repeats.

with `MAX_ACCOUNT_SELECTION_ATTEMPTS = 32` and `DEFAULT_SELECTION_PROBE_TIMEOUT_MS = 2_500ms`, a complete network outage can cause `ensureLaunchableAccount` to block for up to ~80 seconds before returning `{ ok: false }`. the user sees the terminal freeze with no feedback. no vitest coverage exists for the all-accounts-unavailable path in the test suite.

two improvements to consider:
1. after two consecutive all-unavailable batches, return `{ ok: false }` early and surface a "network unreachable" notice via `relaunchNotice`.
2. add a vitest case that feeds all probes as `QuotaProbeUnavailableError` and asserts the function returns `{ ok: false }` in fewer than 3 iterations.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: scripts/codex-supervisor.js
Line: 1702-1706

Comment:
**`spawnRealCodex` passes full `process.env` — token material in child env**

`env: process.env` forwards every environment variable to the spawned codex process, including any `OPENAI_API_KEY`, refresh-token helpers, or internal auth overrides that may be set in the parent process. this is intentional for a transparent wrapper, but it means any future env-var that this tool sets for internal coordination (e.g. a lock-owner ID or session hint) will also bleed into the child.

worth noting in a comment so callers adding new coordinator env-vars know to strip them before forwarding:

```suggestion
function spawnRealCodex(codexBin, args) {
	// Forward the full environment — ensure coordinator-only vars are stripped here
	// if this list grows (e.g. CODEX_AUTH_INTERNAL_*).
	return spawn(process.execPath, [codexBin, ...args], {
		stdio: "inherit",
		env: process.env,
	});
}
```

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix remaining review..."

@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

walkthrough

adds an interactive session supervisor for the codex cli, a backend toggle and plugin flag, abort-signal support for quota probing, storage transaction path threading and tie-break fixes, routing helpers, and many tests plus integration into the codex wrapper. (≤50 words)

changes

Cohort / File(s) Summary
settings & ui
lib/codex-manager/settings-hub.ts, lib/ui/copy.ts, lib/schemas.ts
added codexCliSessionSupervisor backend toggle and experimental menu entry; updated help text and zod schema to accept optional boolean. (see lib/codex-manager/settings-hub.ts:1, lib/ui/copy.ts:1, lib/schemas.ts:1)
plugin config & defaults
lib/config.ts, lib/preemptive-quota-scheduler.ts
added codexCliSessionSupervisor default and exported getCodexCliSessionSupervisor() env-first getter; introduced exported DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H and split secondary defaulting logic. (see lib/config.ts:1, lib/preemptive-quota-scheduler.ts:1)
quota probing / abort
lib/quota-probe.ts
added signal?: AbortSignal to probe options; per-request AbortController wiring, pre-request abort checks, listener cleanup, and abort-specific error propagation. (see lib/quota-probe.ts:1)
storage & transactions
lib/storage.ts
added storage-path equivalence helpers, made internal load/save accept storagePath, threaded storagePath through transactions, deterministic mtime tie-break using original index, and ensured state.active = false cleared in finally. (see lib/storage.ts:1)
session supervisor core
scripts/codex-supervisor.js, scripts/codex.js
new runCodexSupervisorIfEnabled() and isInteractiveCommand() with lock acquisition, account selection (quota probing, caching, cooldown), session resume binding, monitor loop with quota-driven restart/prewarm/commit, and integration into scripts/codex.js startup/forward flow. exports test-only hooks under test env. (see scripts/codex-supervisor.js:1, scripts/codex.js:1)
routing helpers
scripts/codex-routing.js
added findPrimaryCodexCommand(), hasTopLevelHelpOrVersionFlag(), and splitCodexCommandArgs() to parse/split codex command args and detect top-level help/version flags. (see scripts/codex-routing.js:1)
tests: supervisor & integration
test/codex-supervisor.test.ts, test/codex-bin-wrapper.test.ts
large new supervisor test suite plus wrapper tests and fixtures; covers lock robustness, concurrency, selection/prewarm/commit, restart escalation, and integration with auto-sync. (see test/codex-supervisor.test.ts:1, test/codex-bin-wrapper.test.ts:1)
tests: unit & regressions
test/plugin-config.test.ts, test/quota-probe.test.ts, test/settings-hub-utils.test.ts, test/preemptive-quota-scheduler.test.ts, test/storage.test.ts
added tests for env precedence of CLI supervisor flag, quota-probe abort behavior, settings toggle behavior, preemptive thresholds, async rm retry helper, and transaction-path drift / mtime tie-break cases. (see test/plugin-config.test.ts:1, test/quota-probe.test.ts:1, test/storage.test.ts:1)
package & scripts
package.json
added npm script test:session-supervisor:smoke targeting supervisor-related tests. (see package.json:1)

sequence diagram(s)

sequenceDiagram
    participant cli as cli invocation
    participant sup as session supervisor
    participant lock as storage lock
    participant mgr as account manager
    participant probe as quota probe
    participant child as codex child process
    participant sess as session store
    participant monitor as monitor loop

    cli->>sup: runCodexSupervisorIfEnabled(rawArgs)
    alt supervisor enabled & interactive
        sup->>lock: acquire supervisor lock
        lock-->>sup: lock acquired
        sup->>mgr: select launchable account
        mgr->>probe: fetchCodexQuotaSnapshot(signal)
        probe-->>mgr: quota snapshot
        mgr-->>sup: chosen account
        sup->>child: spawn codex with resume args
        child-->>sess: append session .jsonl
        sup->>sess: scan session metadata / bind session
        loop monitor
            monitor->>probe: probe current account quota
            probe-->>monitor: quota status
            alt quota near exhaustion
                monitor->>child: request restart (SIGINT→SIGTERM→SIGKILL)
                child-->>monitor: exit
                monitor->>mgr: prepare/select next account (prewarm)
                monitor->>child: spawn resumed child
            end
        end
        child-->>sup: exit code
        sup->>lock: release lock
        lock-->>sup: released
        sup-->>cli: return exit code
    else supervisor disabled or non-interactive
        sup-->>cli: null (fallback to direct forward)
    end
Loading

estimated code review effort

🎯 5 (critical) | ⏱️ ~120 minutes

flagged issues (explicit):

  • concurrency risks: transaction active-flag lifecycle and storage-path threading require careful review for nested/overlapping transactions. see lib/storage.ts:1.
  • windows edge cases: lock unlink/EPERM retry and signal escalation differences on windows need dedicated regression tests. see scripts/codex-supervisor.js:1 and lib/storage.ts:1.
  • abort & listener leaks: quota probe abort wiring must remove listeners and preserve original error semantics without swallowing. see lib/quota-probe.ts:1.
  • supervisor complexity and async orchestration: prewarm/commit race windows, in-flight probe deduplication, and restart escalation need deep scrutiny. see scripts/codex-supervisor.js:1.
  • codex integration edge: ensure no duplicate startup sync runs across supervisor-forward vs fallback paths. see scripts/codex.js:1.
  • missing regression tests: add explicit tests for windows-style stale-lock cleanup under concurrent EPERM unlink and for nested concurrent withAccountStorageTransaction to reproduce active-flag races. reference test/codex-supervisor.test.ts:1 and test/storage.test.ts:1.
  • test-only exposure check: verify __testOnly is gated to test env only and cannot leak into production runtime. see scripts/codex-supervisor.js:1.
  • settings ui toggle resolution: verify persisted toggle respects env override at startup and across prompts. see lib/config.ts:1 and lib/codex-manager/settings-hub.ts:1.
🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.07% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive PR description covers summary, what changed, and validation/risk checklist. however, the 'Validation' section shows npm test and npm run build unchecked, contradicting the claim that review follow-up commit 71f5b3f is complete. docs and governance checklist items remain largely unchecked. before merge, either complete the validation checklist (npm test, npm run build) and docs updates, or explicitly document why they are deferred (e.g. 'feature disabled by default, docs deferred to feature-enablement PR'). clarify the state of the branch.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed title clearly summarizes the main change: adding a session resume supervisor feature for faster account rotation.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch git-plan/05-fast-supervised-resume
✨ Simplify code
  • Create PR with simplified code
  • Commit simplified code in branch git-plan/05-fast-supervised-resume

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lib/storage.ts (3)

2324-2337: ⚠️ Potential issue | 🟡 Minor

storage path comparison correctly prevents stale snapshot reuse after rotation, but missing regression test for path-mismatch scenario.

the dual check at lib/storage.ts:2331-2333 handles the supervisor rotation scenario correctly:

  1. transactionState?.active — transaction still in progress
  2. transactionState.storagePath === activeStoragePath — storage path hasn't changed since transaction start

if setStoragePath() is called during a supervisor-driven account rotation, subsequent exportAccounts calls correctly fall back to a fresh read instead of returning data from the old path.

however: no regression test exists for this scenario. test/storage.test.ts has basic exportAccounts tests (file exists, no accounts) but none that verify correct behavior when storage path changes mid-transaction. add a test that:

  • starts a transaction via withAccountStorageTransaction
  • calls setStoragePath() to a different path
  • calls exportAccounts from within the handler or after
  • verify it reads from the new path, not the stale snapshot

the deadlock concern about withAccountStorageTransaction((current) => Promise.resolve(current)) fallback is valid only if exportAccounts is called from within an existing transaction handler. the current codebase doesn't do this, but lib/storage.ts lacks re-entrancy guards on withStorageLock (lines 174-181), which could allow this pattern to deadlock if introduced. this risk should be documented or guarded explicitly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2324 - 2337, Add a regression test that verifies
exportAccounts does not return a stale snapshot when the storage path changes
mid-transaction: write a test that calls withAccountStorageTransaction (to
create an active transaction via withAccountStorageTransaction), then calls
setStoragePath(...) to a different path, and then calls exportAccounts (either
inside the transaction handler or immediately after) and asserts the returned
data is read from the new path; ensure the test exercises the code path that
uses transactionSnapshotContext, getStoragePath, and the fallback to
withAccountStorageTransaction so the conditional transactionState?.active &&
transactionState.storagePath === activeStoragePath is validated. Also add a
short note or additional test to cover potential deadlock re-entrancy by
exercising exportAccounts while inside a withStorageLock-held handler (using
withStorageLock) to document/guard the risk.

2331-2337: ⚠️ Potential issue | 🟡 Minor

potential deadlock if storagePath changes during active transaction before exportAccounts is called.

exportAccounts safely avoids deadlock when called from inside withAccountStorageTransaction via a storagePath check: if transactionState.active && transactionState.storagePath === activeStoragePath, it reuses the snapshot without re-acquiring the mutex. however, if supervisor rotation triggers setStoragePath() to a new path while inside a transaction callback, and then exportAccounts is called from that handler (as demonstrated in test/named-backup-export.test.ts:347-362), the fallback to withAccountStorageTransaction() would deadlock trying to re-acquire storageMutex.

current usage is safe because setStoragePath() is only called at command start, never mid-execution. but the code is fragile: if future auth rotation logic changes storagePath during an active transaction, callers would hang.

either:

  1. document explicitly that setStoragePath() must never be called during active withAccountStorageTransaction callbacks
  2. add a withoutLock variant for internal reads when already holding the mutex (lib/storage.ts)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2331 - 2337, The current exportAccounts path can
deadlock if setStoragePath changes during an active transaction because it falls
back to withAccountStorageTransaction and re-acquires storageMutex; add an
internal "withoutLock" read path to avoid re-locking when already in a
transaction: implement a new helper (e.g., readAccountStorageWithoutLock or
withAccountStorageSnapshotNoLock) that returns the current
transactionState.snapshot or the current storage snapshot when not in a
transaction without acquiring storageMutex, update exportAccounts to use that
helper when transactionState.active is true (or when storagePath equality check
indicates the caller already holds the snapshot), and mark the helper
internal/private and document that it must not acquire the mutex so callers
inside withAccountStorageTransaction use it instead of
withAccountStorageTransaction to avoid re-entrancy deadlocks; keep
withAccountStorageTransaction unchanged for external callers.

1909-1935: ⚠️ Potential issue | 🟠 Major

missing regression test for try/finally cleanup fix.

the try/finally at lib/storage.ts:1927-1933 correctly ensures state.active = false even if handler throws. this fixes the stale snapshot bug in exportAccounts at lib/storage.ts:2331 which reuses snapshot when transactionState?.active is true.

however, per coding guidelines for lib/**, this change must cite affected regression tests (vitest). no test exists covering: handler throws → state.active still cleared → subsequent exportAccounts does fresh read. add this to test/storage.test.ts.

also check withAccountAndFlaggedStorageTransaction at lib/storage.ts:1989 — it has the same pattern and likely needs the same test coverage.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 1909 - 1935, Add a vitest regression test in
test/storage.test.ts that verifies when withAccountStorageTransaction's handler
throws, the finally block clears state.active so subsequent calls to
exportAccounts perform a fresh load (i.e., do not reuse a stale snapshot);
specifically, simulate a handler that throws inside
withAccountStorageTransaction and then call exportAccounts to confirm it reads
updated data, and add the same test scenario for
withAccountAndFlaggedStorageTransaction to ensure both transaction
implementations (transactionSnapshotContext, state.active) are correctly cleared
after exceptions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/config.ts`:
- Line 159: The scheduler's default (DEFAULT_REMAINING_PERCENT_THRESHOLD) is out
of sync with the config value (preemptiveQuotaRemainingPercent5h) so update
lib/preemptive-quota-scheduler.ts to read the configured value via
getPreemptiveQuotaRemainingPercent5h() (or call the config accessor you have)
instead of hardcoding DEFAULT_REMAINING_PERCENT_THRESHOLD, and initialize
primaryRemainingPercentThreshold and secondaryRemainingPercentThreshold from
that accessor; alternatively, if you prefer constants, change
DEFAULT_REMAINING_PERCENT_THRESHOLD to 10 to match lib/config.ts:159 and
add/adjust tests for preemptive quota rotation to cover both code paths that set
thresholds so regression is caught.

In `@scripts/codex-supervisor.js`:
- Around line 413-419: The deduplication key currently includes
item.refreshToken which is unstable; change the key construction used with
seen.has to exclude refreshToken and use only stable identifiers (item.index,
item.accountId, item.email) so that accounts with refreshed tokens aren't
treated as distinct; update the code that builds the key variable and any
related checks (seen.has, seen.add) to reference the new key composed from those
stable fields only.

In `@scripts/codex.js`:
- Around line 529-540: The supervisor path intentionally calls
runCodexSupervisorIfEnabled followed by autoSyncManagerActiveSelectionIfEnabled
so auto-sync runs unconditionally even when the supervisor returns early; add a
regression test that enables the supervisor and exercises the early-return case
(where runCodexSupervisorIfEnabled returns a non-null supervisedExitCode) and
assert that autoSyncManagerActiveSelectionIfEnabled (and underlying
codex-manager autoSyncActiveAccountToCodex / queuedRefresh behavior) was invoked
before the function returns, and that forwardToRealCodex is not called in that
path; use mocks/stubs for runCodexSupervisorIfEnabled,
autoSyncManagerActiveSelectionIfEnabled, and forwardToRealCodex to verify the
call ordering and side effects.

In `@test/codex-bin-wrapper.test.ts`:
- Around line 596-610: Add a regression test that covers the interactive command
path: create a fixture (use createWrapperFixture and
writeSupervisorRuntimeFixture), create the fake codex binary with
createFakeCodexBin, then call runWrapper with no args (simulating an interactive
`codex` invocation) and CODEX_AUTH_CLI_SESSION_SUPERVISOR=1; assert that the
supervisor behavior differs from the non-interactive path (per
scripts/codex-supervisor.js) — e.g., verify the wrapper invokes the
resume/interactive supervisor wrapper instead of forwarding with the file auth
store injection used in the existing test (check stdout/stderr or exit status
accordingly).

In `@test/codex-supervisor.test.ts`:
- Around line 1-6: Add a regression test that exercises concurrent lock
acquisition for withSupervisorStorageLock: create a temporary storage dir,
import the test API (__testOnly as supervisorTestApi) and spawn two async
callers that race to call supervisorTestApi.withSupervisorStorageLock; assert
only one caller holds the lock at a time (e.g., the second waits until the first
releases) and that a stale lock file is detected/cleaned by simulating an old
lock file (set mtime/contents) before calling; ensure the test cleans up temp
files and uses deterministic timing/Promises to avoid flakiness.
- Around line 91-111: The test currently uses platform "win32" which skips the
SIGINT step; update or add a test that calls
supervisorTestApi?.requestChildRestart with a non-win32 platform (e.g., "linux")
and the same AbortController flow, then assert the child.kill mock was called
with "SIGINT" as well as "SIGTERM" and "SIGKILL" in the expected sequence;
reference the FakeChild class, requestChildRestart call, controller.abort(), and
the child.kill expectations so the test covers the non-win32 SIGINT → SIGTERM →
SIGKILL sequence.

In `@test/plugin-config.test.ts`:
- Around line 991-1001: Add a test case to verify the config-only override for
getCodexCliSessionSupervisor: ensure CODEX_AUTH_CLI_SESSION_SUPERVISOR is unset,
call getCodexCliSessionSupervisor({ codexCliSessionSupervisor: true }) and
expect true, and then clean up by ensuring the env var remains unset; this
mirrors the pattern used for getFastSession and complements the existing default
and env-override tests.

In `@test/quota-probe.test.ts`:
- Around line 173-205: Add a test that verifies fetchCodexQuotaSnapshot
immediately rejects when passed a signal that's already aborted: create an
AbortController, call controller.abort() before invoking
fetchCodexQuotaSnapshot, stub global fetch (vi.stubGlobal("fetch", ...)) to a
mock, then assert the promise rejects with an abort error and that fetch was
never called; this covers the early-return branch in fetchCodexQuotaSnapshot
(the pre-aborted signal check in lib/quota-probe.ts).

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 2324-2337: Add a regression test that verifies exportAccounts does
not return a stale snapshot when the storage path changes mid-transaction: write
a test that calls withAccountStorageTransaction (to create an active transaction
via withAccountStorageTransaction), then calls setStoragePath(...) to a
different path, and then calls exportAccounts (either inside the transaction
handler or immediately after) and asserts the returned data is read from the new
path; ensure the test exercises the code path that uses
transactionSnapshotContext, getStoragePath, and the fallback to
withAccountStorageTransaction so the conditional transactionState?.active &&
transactionState.storagePath === activeStoragePath is validated. Also add a
short note or additional test to cover potential deadlock re-entrancy by
exercising exportAccounts while inside a withStorageLock-held handler (using
withStorageLock) to document/guard the risk.
- Around line 2331-2337: The current exportAccounts path can deadlock if
setStoragePath changes during an active transaction because it falls back to
withAccountStorageTransaction and re-acquires storageMutex; add an internal
"withoutLock" read path to avoid re-locking when already in a transaction:
implement a new helper (e.g., readAccountStorageWithoutLock or
withAccountStorageSnapshotNoLock) that returns the current
transactionState.snapshot or the current storage snapshot when not in a
transaction without acquiring storageMutex, update exportAccounts to use that
helper when transactionState.active is true (or when storagePath equality check
indicates the caller already holds the snapshot), and mark the helper
internal/private and document that it must not acquire the mutex so callers
inside withAccountStorageTransaction use it instead of
withAccountStorageTransaction to avoid re-entrancy deadlocks; keep
withAccountStorageTransaction unchanged for external callers.
- Around line 1909-1935: Add a vitest regression test in test/storage.test.ts
that verifies when withAccountStorageTransaction's handler throws, the finally
block clears state.active so subsequent calls to exportAccounts perform a fresh
load (i.e., do not reuse a stale snapshot); specifically, simulate a handler
that throws inside withAccountStorageTransaction and then call exportAccounts to
confirm it reads updated data, and add the same test scenario for
withAccountAndFlaggedStorageTransaction to ensure both transaction
implementations (transactionSnapshotContext, state.active) are correctly cleared
after exceptions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 888f1f7f-29bf-4a9d-8cd2-035939b924a3

📥 Commits

Reviewing files that changed from the base of the PR and between b918aac and 36e6070.

📒 Files selected for processing (13)
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/quota-probe.ts
  • lib/schemas.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/plugin-config.test.ts
  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/ui/copy.ts
  • lib/schemas.ts
  • lib/codex-manager/settings-hub.ts
  • lib/storage.ts
  • lib/config.ts
  • lib/quota-probe.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/settings-hub-utils.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/quota-probe.test.ts
  • test/plugin-config.test.ts
🔇 Additional comments (37)
lib/storage.ts (2)

168-172: type extension for storagePath tracking looks correct.

the added storagePath: string field enables downstream checks to detect when the storage path has changed since the transaction started. this prevents stale snapshot reuse during supervisor-driven account rotation where setStoragePath() may be called.


1937-1993: consistent try/finally pattern applied here as well.

the same state cleanup fix is correctly applied to the dual-storage transaction wrapper. the rollback logic inside persist() can throw AggregateError at line 1969, and the try/finally ensures state.active is still cleared in that case.

lib/quota-probe.ts (4)

308-315: abort signal support looks clean.

the interface extension and helper are straightforward. createAbortError follows the standard pattern for abort errors with name = "AbortError".


341-343: early abort check before each model probe is correct.

checking options.signal?.aborted before starting a probe iteration prevents unnecessary network calls when cancellation is requested between retries.


368-387: abort signal wiring to per-request controller is well-structured.

the sequence handles the race correctly: check if already aborted before adding listener, then cleanup in finally block. the { once: true } on addEventListener prevents listener leaks.


426-435: abort error propagation preserves original error context.

rethrowing the original error when signal.aborted is true (line 427) keeps the stack trace intact. the final check after exhausting probes (lines 433-435) ensures abort is surfaced even if all models failed for other reasons first.

lib/ui/copy.ts (1)

75-75: ui label for session supervisor toggle looks good.

matches the setting key codexCliSessionSupervisor and is consistent with other experimental setting labels in this file.

lib/schemas.ts (1)

47-47: schema field for supervisor toggle is correctly defined.

optional boolean matches the pattern of other feature flags in the schema.

lib/codex-manager/settings-hub.ts (5)

187-199: toggle key addition to union type is correct.

codexCliSessionSupervisor is added alongside other backend toggle keys. note that it's intentionally not added to any BACKEND_CATEGORY_OPTIONS since it's experimental-only.


275-284: experimental settings action type extended correctly.

toggle-session-supervisor follows the naming pattern of other experimental actions like toggle-refresh-guardian.


292-296: backend toggle option definition is complete.

includes key, label, and description matching the pattern of other toggle options.


2540-2544: menu item wiring uses correct fallback chain.

draft.codexCliSessionSupervisor ?? BACKEND_DEFAULTS.codexCliSessionSupervisor ?? false resolves correctly since BACKEND_DEFAULTS.codexCliSessionSupervisor is explicitly false per lib/config.ts:148.


2601-2611: toggle handler inverts value correctly.

the negation logic !(draft.codexCliSessionSupervisor ?? BACKEND_DEFAULTS.codexCliSessionSupervisor ?? false) matches the display logic, ensuring consistent behavior.

test/settings-hub-utils.test.ts (3)

137-154: async cleanup with retry is good for windows filesystem quirks.

the retry logic handles ENOTEMPTY, EPERM, EBUSY with linear backoff. this addresses windows file locking issues during test cleanup.


205-209: separate async afterEach for directory cleanup is correct.

vitest runs multiple afterEach hooks in registration order. separating async cleanup from synchronous env restoration is cleaner.


712-723: test for session supervisor toggle is deterministic and complete.

queues toggle action followed by save, then asserts the config value flipped from false to true. follows the pattern of other experimental settings tests in this file.

test/codex-bin-wrapper.test.ts (3)

60-63: fixture now includes supervisor script.

copying codex-supervisor.js to the fixture ensures the wrapper can import it during tests.


67-110: supervisor runtime fixture stubs are minimal and correct.

the stubs provide just enough functionality for the supervisor to run:

  • config.js returns supervisor enabled with quota settings
  • accounts.js provides a mock AccountManager with required methods
  • quota-probe.js returns null (no quota data needed for this test path)

note: lib/codex-bin-wrapper.test.ts:93 returns a mock account with email: "healthy@example.com" which is acceptable for test fixtures.


410-413: windows decoy test updated with supervisor script.

mirrors the change in createWrapperFixture() to keep fixtures consistent.

test/plugin-config.test.ts (2)

141-144: lgtm - default config assertion updated correctly.

the new default values preemptiveQuotaRemainingPercent5h: 10 and codexCliSessionSupervisor: false are consistently applied across all config assertion blocks in this file.


974-974: lgtm - default threshold change from 5 to 10.

the assertion at test/plugin-config.test.ts:974 correctly reflects the new default of 10 for preemptiveQuotaRemainingPercent5h, matching the change in lib/config.ts:159.

lib/config.ts (2)

861-878: lgtm - new accessor follows established pattern.

the getCodexCliSessionSupervisor accessor correctly uses resolveBooleanSetting with env var CODEX_AUTH_CLI_SESSION_SUPERVISOR, config key codexCliSessionSupervisor, and default false. the jsdoc accurately describes the behavior.


148-148: lgtm - default config entry added.

codexCliSessionSupervisor: false is correctly added to DEFAULT_PLUGIN_CONFIG, matching the accessor's default and the test assertions.

test/codex-supervisor.test.ts (3)

10-27: lgtm - retry-based cleanup handles windows edge cases.

the removeDirectoryWithRetry helper correctly handles ENOTEMPTY, EPERM, and EBUSY errors with exponential backoff, which addresses windows filesystem locking behavior.


43-67: lgtm - boundary test validates scan limit correctly.

the test creates exactly 199 preamble lines plus the session_meta line, placing it at position 200 (the limit). this correctly validates the SESSION_META_SCAN_LINE_LIMIT = 200 boundary in scripts/codex-supervisor.js:20.


69-89: lgtm - over-limit test validates scan cutoff.

placing session_meta at line 201 (after 200 preamble lines) correctly tests that the scanner stops at the limit and returns null.

scripts/codex-supervisor.js (11)

8-30: lgtm - timing constants are reasonable.

the reduced timings (poll=750ms, idle=400ms, probe=2-4s) support faster rotation detection. the platform-specific DEFAULT_SIGNAL_TIMEOUT_MS (125ms on win32 vs 500ms) correctly accounts for windows signal handling differences.


33-53: lgtm - sleep correctly handles abort signals.

the sleep implementation properly cleans up the abort listener and returns false on abort, true on completion. the guard against double-settle is correct.


378-381: potential TOCTOU race in stale lock cleanup.

at scripts/codex-supervisor.js:378-380, between checking isSupervisorLockStale and calling safeUnlink, another process could acquire the lock. this is a benign race (the new lock holder will simply retry), but worth noting. the current implementation is acceptable given the TTL fallback.


498-539: lgtm - quota evaluation logic is correct.

the function correctly prioritizes 429 rate-limit detection, then checks preemptive quota thresholds when enabled. the remaining percentage calculation 100 - usedPercent is correct.


773-815: lgtm - session meta extraction is well-bounded.

the scanner correctly limits to 200 non-empty lines, handles malformed JSON, and cleans up readline/stream resources in the finally block.


890-914: child.kill return value not checked.

child.kill("SIGTERM") returns false if the process doesn't exist, but this is not checked. while the subsequent exitCode !== null check handles this case, the code could be clearer. this is minor since the behavior is correct.


1010-1011: manager reassignment may lose state.

at scripts/codex-supervisor.js:1010, if currentState.manager is null (which shouldn't happen given withLockedManager always returns the manager), the outer manager is retained. however, if withLockedManager returns a newer manager with updated state, and then fails on a subsequent call returning the stale outer manager, state could be inconsistent. the current implementation looks safe since withLockedManager reloads from disk each time.


184-186: lgtm - no token or pii leakage in logs.

relaunchNotice only logs session IDs, wait times, and rotation reasons. no access tokens, refresh tokens, or email addresses are included in log output.


750-752: lgtm - symlink handling prevents traversal issues.

skipping symbolic links at scripts/codex-supervisor.js:750-752 prevents potential infinite loops or symlink-based path traversal attacks.


765-771: lgtm - cwd normalization handles windows case-insensitivity.

normalizeCwd correctly lowercases paths on win32 and normalizes path separators, enabling reliable cwd matching across platforms.


1183-1245: lgtm - entry point has proper signal cleanup.

the runCodexSupervisorIfEnabled function correctly registers and unregisters SIGINT/SIGTERM handlers, returns null when supervisor is disabled (allowing fallback to non-supervised path), and handles abort errors gracefully.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

♻️ Duplicate comments (1)
test/codex-supervisor.test.ts (1)

334-374: ⚠️ Potential issue | 🟠 Major

add the actual two-caller lock race here.

this only covers stale-lock cleanup. it still never runs two withLockedManager callers against the same storage path, so a regression in the mutual-exclusion path in scripts/codex-supervisor.js:322-395 would still pass. add a deterministic race where the first holder blocks on a promise and the second caller can only enter after release.

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-supervisor.test.ts` around lines 334 - 374, Add a deterministic
two-caller race to this test: create a deferred Promise (resolve/reject
handlers) and use supervisorTestApi.withLockedManager with two concurrent
callers against the same runtime/storage path—have the first caller (using
FakeManager and createFakeRuntime) acquire the lock and await the deferred
Promise to simulate a held lock, then start the second caller which should block
until you resolve the deferred Promise; after resolving, assert the second
caller runs only after the first releases (e.g., order of results or timestamps)
and keep the existing stale-lock/EPERM unlink behavior; reference
withLockedManager, supervisorTestApi, FakeManager, createFakeRuntime and the
unlinkSpy mock to place the new concurrent-caller logic in this test while
keeping it deterministic via the deferred Promise and vitest async assertions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/preemptive-quota-scheduler.ts`:
- Line 27: Split the single DEFAULT_REMAINING_PERCENT_THRESHOLD into two
distinct constants (e.g., DEFAULT_REMAINING_PERCENT_THRESHOLD_PRIMARY and
DEFAULT_REMAINING_PERCENT_THRESHOLD_SECONDARY) and update the
PreemptiveQuotaScheduler constructor so it seeds the primary and secondary
windows with their respective defaults instead of the single shared value; also
ensure the config default named remainingPercentThresholdSecondary remains 5 and
adjust code that falls back to use the new secondary constant. Add a Vitest
regression adjacent to the existing preemptive-quota-scheduler tests (near
test/preemptive-quota-scheduler.test.ts) that verifies omitting
remainingPercentThresholdSecondary keeps the 7d fallback at 5 and does not
change behavior when PRIMARY is 10, and mention the changed symbols
DEFAULT_REMAINING_PERCENT_THRESHOLD_PRIMARY,
DEFAULT_REMAINING_PERCENT_THRESHOLD_SECONDARY, PreemptiveQuotaScheduler
constructor, and remainingPercentThresholdSecondary in your commit message.

In `@lib/storage.ts`:
- Around line 1916-1933: The transaction binds to a mutable global storage path,
but loadAccountsInternal and saveAccountsUnlocked inside the transaction still
call helpers that read getStoragePath() dynamically, allowing cross-path writes;
fix by capturing storagePath and passing it into loadAccountsInternal and into
the persist closure so both load and persist use the captured path (or
alternatively make persist immediately throw if getStoragePath() !==
state.storagePath), i.e., update the call sites around
transactionSnapshotContext.run (including the loadAccountsInternal invocation
and the persist closure that calls saveAccountsUnlocked) to accept/forward the
captured storagePath and ensure exportAccounts() and the mismatch check see
consistent paths; update/annotate tests in test/storage.test.ts (959-1097) to
cover the cross-path write case per lib/** guidelines.

In `@scripts/benchmark-session-supervisor.mjs`:
- Around line 202-267: The benchmark loop reuses probe snapshots so later
iterations see cache hits and report artificially low durations; fix by
disabling or clearing the snapshot cache before each iteration — e.g. set
process.env.CODEX_AUTH_CLI_SESSION_SNAPSHOT_CACHE_TTL_MS = "0" (or call the
test-only cache-clear helper on the imported api if available) at the top of the
for loop that uses buildRuntime / api (the loop that measures serialDurations,
overlapDurations, prewarmedDurations), ensuring the environment var or
cache-clear happens before each buildRuntime/api call so every iteration
measures a fresh probe.

In `@scripts/codex-supervisor.js`:
- Around line 1334-1358: The escalation from SIGTERM to SIGKILL in
requestChildRestart is redundant on Windows because SIGTERM/SIGKILL both
forcibly terminate; update requestChildRestart to either skip the final SIGKILL
step when platform === "win32" (i.e., only attempt SIGINT/SIGTERM there) or add
a clear comment in the function explaining the Windows behavior and why the
SIGKILL branch is intentionally left (to avoid unnecessary waits); make the
change around the kill calls (child.kill("SIGTERM") / child.kill("SIGKILL")) so
reviewers can see the platform-specific behavior documented or the SIGKILL call
gated by platform.
- Around line 78-94: The abortablePromise function currently leaves the abort
listener attached when the input promise resolves first; update abortablePromise
to always remove the onAbort listener when the race settles: keep the existing
onAbort handler and add a cleanup step that calls
signal.removeEventListener("abort", onAbort) when the original promise resolves
or rejects (e.g., attach a finally/then/catch on the input promise before
returning the race), ensuring the listener is removed on normal completion as
well as when the abort fires; reference the abortablePromise function and the
onAbort listener to locate where to add the cleanup.
- Around line 796-867: probeAccountSnapshot has a race where two callers can
both miss snapshotProbeCache.pending between the read of pendingEntry and the
later set of fetchPromise; to fix, allocate and insert a placeholder pending
entry into snapshotProbeCache for the cacheKey before starting the actual async
fetch so subsequent callers will see it and await it: create a deferred/promise
placeholder, set snapshotProbeCache.set(cacheKey, { snapshot: current?.snapshot
?? null, expiresAt: current?.expiresAt ?? 0, pending: placeholder }), then start
the async work that resolves/rejects that placeholder and in its finally replace
the cache entry with snapshot/expiresAt (clearing pending) as done in the
current finally block; update references to fetchPromise/pendingEntry
accordingly so all callers await the same pending promise.

In `@test/codex-bin-wrapper.test.ts`:
- Around line 613-627: The test currently only asserts the generic forwarded
output and can pass even when the supervisor path isn't used; modify the test
(the "uses the supervisor wrapper for non-interactive commands when enabled"
case that calls createWrapperFixture, writeSupervisorRuntimeFixture,
createFakeCodexBin and runWrapper) to assert a supervisor-only
marker/interaction—for example assert result.stdout contains a distinct token
emitted only by the supervisor path (e.g. "SUPERVISOR:FORWARDED" or similar) and
update createFakeCodexBin to emit that marker when
CODEX_AUTH_CLI_SESSION_SUPERVISOR is set so the test fails if scripts/codex.js
doesn't actually use the supervisor branch. Ensure the added assertion is
deterministic and uses the existing runWrapper/ createFakeCodexBin/
writeSupervisorRuntimeFixture helpers.

In `@test/codex-supervisor.test.ts`:
- Around line 475-529: Replace the brittle wall-clock assertions comparing
serialElapsedMs/overlapElapsedMs with explicit synchronization: remove
performance.now measurements and the expect(..toBeLessThan(..)) checks and
instead instrument the test flow (using promise latches/Deferreds or vitest fake
timers) to drive and observe the probe/restart phases deterministically; for
example, add latches inside your FakeManager/createFakeRuntime or FakeChild to
signal "probe started", "probe blocked", "restart requested" and "probe
finished", then coordinate supervisorTestApi?.requestChildRestart and
supervisorTestApi?.ensureLaunchableAccount so the test awaits those latches and
asserts ordering/concurrency (that the overlap path completes its
ensureLaunchableAccount before the serial path unblocks) using promise resolves
rather than elapsed ms. Ensure both occurrences (around
requestChildRestart/ensureLaunchableAccount pairs) are updated similarly.

In `@test/quota-probe.test.ts`:
- Around line 174-205: The test currently relies on a microtask hop (await
Promise.resolve()) which is brittle; change the fetch mock and test for
fetchCodexQuotaSnapshot to use an explicit "fetch started" latch: have the mock
expose or resolve a Promise as soon as it wires init.signal.addEventListener
(i.e., when the handler for "abort" is registered), await that latch in the test
before calling controller.abort(), then assert the pending promise rejects; this
ensures deterministic ordering regardless of internal awaits in
fetchCodexQuotaSnapshot and removes the fragile microtask race.

In `@test/storage.test.ts`:
- Around line 997-1044: The test only covers the rejection path; add a companion
test that verifies withAccountStorageTransaction reloads fresh storage after a
successful (non-throwing) transaction to catch a post-success AsyncLocalStorage
leak: create a test that uses saveAccounts to seed an initial account, call
withAccountStorageTransaction with a handler that calls persist(...) and returns
normally (no throw), then call saveAccounts again with a different account,
export via exportAccounts(exportPath) and assert the
exported.accounts[0].accountId is the post-success value; use the same helpers
(saveAccounts, withAccountStorageTransaction, exportAccounts, exportPath) and
vitest async patterns to keep the test deterministic.

---

Duplicate comments:
In `@test/codex-supervisor.test.ts`:
- Around line 334-374: Add a deterministic two-caller race to this test: create
a deferred Promise (resolve/reject handlers) and use
supervisorTestApi.withLockedManager with two concurrent callers against the same
runtime/storage path—have the first caller (using FakeManager and
createFakeRuntime) acquire the lock and await the deferred Promise to simulate a
held lock, then start the second caller which should block until you resolve the
deferred Promise; after resolving, assert the second caller runs only after the
first releases (e.g., order of results or timestamps) and keep the existing
stale-lock/EPERM unlink behavior; reference withLockedManager,
supervisorTestApi, FakeManager, createFakeRuntime and the unlinkSpy mock to
place the new concurrent-caller logic in this test while keeping it
deterministic via the deferred Promise and vitest async assertions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3118c672-3ac1-4af7-a929-61a9e9811578

📥 Commits

Reviewing files that changed from the base of the PR and between 36e6070 and a756824.

📒 Files selected for processing (11)
  • lib/preemptive-quota-scheduler.ts
  • lib/storage.ts
  • package.json
  • scripts/benchmark-session-supervisor.mjs
  • scripts/codex-supervisor.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/plugin-config.test.ts
  • test/preemptive-quota-scheduler.test.ts
  • test/quota-probe.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/preemptive-quota-scheduler.ts
  • lib/storage.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/preemptive-quota-scheduler.test.ts
  • test/quota-probe.test.ts
  • test/storage.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/plugin-config.test.ts
  • test/codex-supervisor.test.ts
🔇 Additional comments (10)
scripts/codex-supervisor.js (10)

504-508: lgtm: dedup key fix applied.

the dedup key now correctly excludes refreshToken and uses only stable identifiers (index, accountId, email). this addresses the prior review concern about token refresh causing duplicate entries.


1367-1381: lgtm: main supervision loop is well-bounded and properly cleans up.

the loop is bounded by MAX_SESSION_RESTARTS, abort signals are checked at each iteration, and the monitor controller is properly aborted before awaiting the monitor promise. good defensive design.


1747-1774: lgtm: test-only exports properly gated.

the __testOnly export is correctly guarded by NODE_ENV === "test", allowing comprehensive unit testing of internal functions without exposing them in production.


413-486: lgtm: file-based lock implementation is robust.

the lock acquisition loop at line 435 is properly bounded by abort signal checks and deadline enforcement. stale lock detection at lines 469-472 handles edge cases gracefully. the exclusive wx flag at line 441 provides atomic lock creation.


1175-1202: lgtm: safe directory traversal implementation.

listJsonlFiles correctly skips symbolic links at line 1189-1190, preventing potential symlink attacks. the iterative approach avoids stack overflow on deeply nested directories.


869-895: lgtm: account marking with proper fallbacks.

markAccountUnavailable correctly clears the probe cache at line 870 before marking, ensuring fresh data on reselection. the fallback from markRateLimitedWithReason to markAccountCoolingDown handles different manager API versions gracefully.


1212-1254: lgtm: safe session metadata extraction with proper cleanup.

extractSessionMeta correctly cleans up the stream and readline interface in the finally block (lines 1248-1251). the scan limit at line 1227 prevents excessive I/O on large session files.


1-40: regression tests already cover the key supervisor scenarios.

test/codex-supervisor.test.ts includes coverage for all major risks: line 334 handles stale lock cleanup with windows unlink failures (EPERM/EBUSY retries), line 446 validates snapshot cache ttl expiration, line 399 tests near-limit account skipping, and lines 505+ exercise overlapped handoff timing. line 614 also verifies probe batching doesn't add serial pause for degraded accounts, and line 660 confirms failed probe degradation. solid foundation.


39-39: cache isolation is already in place. test/codex-supervisor.test.ts:230 clears snapshotProbeCache in the afterEach hook, preventing cross-test pollution. no action needed.


1706-1708: no issue found. runCodexSupervisorIfEnabled is async at line 1683, so returning forwardToRealCodex(codexBin, initialArgs) at line 1707 is correct—it returns a Promise that resolves to an exit code. the caller at scripts/codex.js:529 properly awaits it, so supervisedExitCode receives the resolved number, not the Promise object. this is standard async/await pattern.

			> Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/storage.ts (2)

1767-1776: ⚠️ Potential issue | 🟠 Major

legacy migration still escapes the captured storage binding.

lib/storage.ts:1767-1776 captures storagePath for the main load, but migrateLegacyProjectStorageIfNeeded() at lib/storage.ts:1004-1102 still re-reads getStoragePathState() after cleanupStaleRotatingBackupArtifacts(path) has already awaited, and loadAccounts() at lib/storage.ts:1603-1605 still passes raw saveAccounts. if the storage path flips while loadAccountsInternal() is in flight, migration can load/delete files from one project and persist them into another. thread the captured storage state/path into the migration helper too, and add a vitest path-drift regression around test/storage.test.ts:2427-2894.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 1767 - 1776, The migration helper currently
re-reads storage state causing path drift: update
migrateLegacyProjectStorageIfNeeded to accept the captured storagePath/state
(from loadAccountsInternal) instead of calling getStoragePathState() internally,
thread that new parameter through loadAccountsInternal and its caller
loadAccounts (which currently passes saveAccounts) and ensure
cleanupStaleRotatingBackupArtifacts(path) and
getIntentionalResetMarkerPath(path) use the same captured path; update all call
sites to pass the captured path/state and adjust types/signatures accordingly,
and add a vitest regression (in test/storage.test.ts around the 2427-2894 range)
that simulates a storagePath flip while loadAccountsInternal is in-flight to
assert migration uses the original captured path and does not move files between
projects.

2124-2145: ⚠️ Potential issue | 🔴 Critical

combined transactions still route flagged writes through the mutable global path.

lib/storage.ts:2124-2145 binds account saves to the captured storagePath, but saveFlaggedAccountsUnlocked() still resolves its target from getFlaggedAccountsPath() at lib/storage.ts:2401-2405. if setStoragePathDirect() drifts inside withAccountAndFlaggedStorageTransaction(), the account file stays on project a while the flagged file is written into project b, so this helper no longer gives you an atomic paired update. capture the flagged path up front as well, and add a vitest drift regression next to test/storage.test.ts:1153-1256.

possible direction
 async function saveFlaggedAccountsUnlocked(
-	storage: FlaggedAccountStorageV1,
+	storage: FlaggedAccountStorageV1,
+	storagePath = getFlaggedAccountsPath(),
 ): Promise<void> {
-	const path = getFlaggedAccountsPath();
+	const path = storagePath;
 	const markerPath = getIntentionalResetMarkerPath(path);
 	// ...
 }
 	return withStorageLock(async () => {
 		const storagePath = getStoragePath();
+		const flaggedStoragePath = getFlaggedAccountsPath();
 		const state = {
 			snapshot: await loadAccountsInternal(
 				(storage) => saveAccountsUnlocked(storage, storagePath),
 				storagePath,
 			),
@@
-			await saveFlaggedAccountsUnlocked(flaggedStorage);
+			await saveFlaggedAccountsUnlocked(flaggedStorage, flaggedStoragePath);

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2124 - 2145, The combined-transaction helper
binds account file writes to the captured storagePath but still lets
saveFlaggedAccountsUnlocked() resolve its path via getFlaggedAccountsPath() at
write time, allowing setStoragePathDirect() to drift the flagged file to a
different project and breaking atomicity; fix by capturing the flagged path
up-front inside withAccountAndFlaggedStorageTransaction (capture result of
getFlaggedAccountsPath() alongside storagePath when calling
loadAccountsInternal/saveAccountsUnlocked and pass that captured flaggedPath
into saveFlaggedAccountsUnlocked instead of letting it resolve internally),
update functions saveFlaggedAccountsUnlocked/getFlaggedAccountsPath usages to
accept an explicit path parameter, and add a vitest regression next to
test/storage.test.ts:1153-1256 that exercises setStoragePathDirect() drift
during the persist() code path to ensure paired writes remain atomic.
♻️ Duplicate comments (3)
test/codex-bin-wrapper.test.ts (1)

637-672: ⚠️ Potential issue | 🟠 Major

assert a supervisor-only side effect in these two cases.

these assertions are still satisfied by the normal wrapper path in scripts/codex.js:528-542: one forwarded invocation and one auto-sync. that means test/codex-bin-wrapper.test.ts:637-672 can pass even if runCodexSupervisorIfEnabled() is bypassed. reuse the supervisor\n marker check from test/codex-bin-wrapper.test.ts:618-635, or add another supervisor-only signal here.

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-bin-wrapper.test.ts` around lines 637 - 672, Tests assert a
supervisor-only side effect but currently pass via the normal wrapper path;
update the two cases in test/codex-bin-wrapper.test.ts (the "still auto-syncs
once..." and "supports interactive..." tests) to require a supervisor-only
marker so they fail if runCodexSupervisorIfEnabled() is bypassed—either reuse
the "supervisor\n" marker check used earlier in the file or add a new
supervisor-only signal written only by the supervisor wrapper, then assert that
marker (e.g., readFileSync(markerPath) or a new supervisor marker file) is
present in addition to the existing forwarded/auto-sync checks to ensure the
supervisor path actually executed.
test/codex-supervisor.test.ts (2)

870-914: ⚠️ Potential issue | 🟠 Major

drop wall-clock performance assertions; assert overlap/batching via explicit probe ordering.

lines 899-914 (test/codex-supervisor.test.ts:899-914) assert elapsedMs < 170 using performance.now(). this is brittle on noisy runners and does not prove batching semantics directly.

proposed deterministic pattern
- const startedAt = performance.now();
- const result = await supervisorTestApi?.ensureLaunchableAccount(
+ const d1 = createDeferred<void>();
+ const d2 = createDeferred<void>();
+ const d3 = createDeferred<void>();
+ const started: string[] = [];
+ const resultPromise = supervisorTestApi?.ensureLaunchableAccount(
    runtime,
    {},
    undefined,
    { probeTimeoutMs: 250 },
  );
- const elapsedMs = performance.now() - startedAt;
+
+ // assert concurrent probe startup/ordering via latches or onFetchStart hooks
+ // then release all degraded gates together
+ d1.resolve(); d2.resolve(); d3.resolve();
+ const result = await resultPromise;
@@
- expect(elapsedMs).toBeLessThan(170);

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-supervisor.test.ts` around lines 870 - 914, The test is brittle
because it asserts wall-clock timing (performance.now elapsedMs < 170); instead
remove the elapsedMs/performance.now assertions and prove batching
deterministically by instrumenting the probe path (e.g., FakeManager or the
probe handler used by createFakeRuntime) to record probe start/finish order or
sequence IDs for each account, then assert that the probes for "degraded-1",
"degraded-2", and "degraded-3" were started concurrently/batched (their start
sequence IDs are the same batch or their start timestamps fall within one
logical batch window) and that "healthy" was selected as the result; update the
test that calls supervisorTestApi?.ensureLaunchableAccount (and uses
quotaProbeDelayMs/probeTimeoutMs) to assert on those recorded probe
order/sequence markers rather than elapsedMs.

529-584: ⚠️ Potential issue | 🟠 Major

replace real sleep-based lock coordination with fake-timer or latch-driven sequencing.

line 574 (test/codex-supervisor.test.ts:574) uses sleep(40) as a correctness gate. this is scheduler-sensitive and can pass/fail for timing reasons instead of lock correctness.

proposed deterministic fix
 it("serializes concurrent callers behind the supervisor storage lock", async () => {
+  vi.useFakeTimers();
   process.env.CODEX_AUTH_CLI_SESSION_LOCK_WAIT_MS = "1000";
   process.env.CODEX_AUTH_CLI_SESSION_LOCK_POLL_MS = "10";
@@
-  await supervisorTestApi?.sleep(40);
+  await vi.advanceTimersByTimeAsync(40);
   expect(secondHasLock).toBe(false);
   releaseFirst?.();
+  await vi.runAllTimersAsync();

As per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-supervisor.test.ts` around lines 529 - 584, The test uses a real
sleep (supervisorTestApi.sleep(40)) to gate correctness; replace that
timing-based check with a deterministic latch check: after scheduling the second
with supervisorTestApi.withLockedManager, assert the secondEntered promise is
still pending (e.g. await Promise.race([secondEntered,
Promise.resolve("not-entered")]) and expect the result to be "not-entered")
instead of sleeping, then releaseFirst and await secondEntered; locate the code
around supervisorTestApi.withLockedManager, the firstEntered/secondEntered
promises and releaseFirst latch to implement this deterministic pending-check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/storage.ts`:
- Around line 816-822: The comparator that chooses latest valid snapshot
currently falls back to descending index, preferring older backups on equal
mtime; change the tie-breaker in the sorting used to compute the latest snapshot
(the map/filter/sort chain in lib/storage.ts that produces the [0]?.snapshot
result) to use ascending index (i.e., prefer lower index so primary/.bak take
precedence) and add a vitest that mirrors the existing storage tests (next to
test/storage.test.ts:3170-3370) which creates multiple snapshots with identical
mtimeMs and asserts that the chosen latestValidPath prefers the earlier-priority
entries (primary, wal, .bak, .bak.1, .bak.2) to lock in correct behavior. Ensure
the test runs deterministically on Windows/coarse-mtime filesystems.

In `@lib/ui/copy.ts`:
- Line 101: The experimental help text is now misnumbered because the new toggle
experimentalSessionSupervisor was inserted before the existing items; update the
static help string (near the entry that lists "1 sync | 2 backup | 3 guard") in
lib/ui/copy.ts so the numbering reflects the new ordering (e.g., insert the new
item's number or renumber subsequent items), or replace the hardcoded sequence
with a generated/templated sequence if preferred; locate the help string and the
experimentalSessionSupervisor key to adjust the copy consistently.

In `@scripts/codex-supervisor.js`:
- Around line 1618-1637: The prewarm task started by
maybeStartPreparedResumeSelection is using the outer supervisor signal and can
continue touching storage after runInteractiveSupervision returns; ensure the
prewarm work is tied to the child/session lifetime by creating and passing a
session-specific cancellation signal (or AbortController) into
maybeStartPreparedResumeSelection (use the same session-bound signal that
runInteractiveSupervision watches), and/or explicitly cancel
preparedResumeSelectionPromise when runInteractiveSupervision is about to return
with no restartDecision; update references to preparedResumeSelectionStarted and
preparedResumeSelectionPromise to respect and propagate that session signal so
the prewarm task is aborted when the session exits without rotating.
- Around line 1592-1601: The catch branch that handles error?.name ===
"QuotaProbeUnavailableError" currently continues immediately and causes a hot
loop; change it to await a backoff delay before retrying and add a
consecutive-failure guard: introduce a consecutiveProbeFailures counter
(increment on QuotaProbeUnavailableError, reset to 0 on successful probe), use
an exponential or capped backoff (e.g., await sleep(Math.min(base *
2**consecutiveProbeFailures, max))) instead of immediate continue, and when
consecutiveProbeFailures exceeds a safe threshold log an error and either
escalate (throw or break) or apply a longer cooldown; keep checks for
monitorController.signal.aborted and AbortError as-is. Ensure the sleep
implementation is used the same way as the normal poll sleep that occurs later
in this loop so you don't bypass the intended I/O pacing.
- Around line 1873-1894: The code returns the promise from
runCodexSupervisorWithRuntime without awaiting, so the finally block removes
SIGINT/SIGTERM handlers (abort) before the supervision finishes; change the
return to await the call to runCodexSupervisorWithRuntime({ codexBin, rawArgs,
buildForwardArgs, forwardToRealCodex, runtime, signal: controller.signal }) so
the try block waits for the supervision to settle and the finally uninstalls
handlers only after completion; ensure loadSupervisorRuntime() and
controller.signal usage remain unchanged and keep the AbortError early-return
behavior.

In `@scripts/codex.js`:
- Around line 529-540: The supervisor exit should not always fall through to
autoSync; change the wrapper around
runCodexSupervisorIfEnabled()/autoSyncManagerActiveSelectionIfEnabled() so that
if runCodexSupervisorIfEnabled returns the abort sentinel (e.g.
supervisedExitCode === 130) you immediately return without calling
autoSyncManagerActiveSelectionIfEnabled(), and for paths where the supervisor
forwarded to the real binary via the forwardToRealCodex callback move or invoke
the pre-sync logic inside that callback instead of running it unconditionally
afterwards; update references to runCodexSupervisorIfEnabled,
autoSyncManagerActiveSelectionIfEnabled, forwardToRealCodex and
supervisedExitCode accordingly and add two vitest regressions in
test/codex-bin-wrapper.test.ts asserting (a) the supervisor early-return/forward
path does not trigger autoSync and (b) the abort (130) path exits before any
sync is performed.

In `@test/codex-bin-wrapper.test.ts`:
- Around line 67-115: The tests leak a real-user lock because
writeSupervisorRuntimeFixture does not stub dist/lib/storage.js; update
writeSupervisorRuntimeFixture to create a dist/lib/storage.js that exports
getStoragePath() (returning a path under the fixtureRoot) so the supervisor uses
a fixture-local lock, or alternatively set CODEX_HOME/HOME to fixtureRoot for
the spawned worker processes; modify the fixture writer to produce the
storage.js stub (export function getStoragePath() { ... }) or ensure environment
variables are forced to fixtureRoot before tests that spawn
scripts/codex-supervisor.js so locks remain hermetic.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 1767-1776: The migration helper currently re-reads storage state
causing path drift: update migrateLegacyProjectStorageIfNeeded to accept the
captured storagePath/state (from loadAccountsInternal) instead of calling
getStoragePathState() internally, thread that new parameter through
loadAccountsInternal and its caller loadAccounts (which currently passes
saveAccounts) and ensure cleanupStaleRotatingBackupArtifacts(path) and
getIntentionalResetMarkerPath(path) use the same captured path; update all call
sites to pass the captured path/state and adjust types/signatures accordingly,
and add a vitest regression (in test/storage.test.ts around the 2427-2894 range)
that simulates a storagePath flip while loadAccountsInternal is in-flight to
assert migration uses the original captured path and does not move files between
projects.
- Around line 2124-2145: The combined-transaction helper binds account file
writes to the captured storagePath but still lets saveFlaggedAccountsUnlocked()
resolve its path via getFlaggedAccountsPath() at write time, allowing
setStoragePathDirect() to drift the flagged file to a different project and
breaking atomicity; fix by capturing the flagged path up-front inside
withAccountAndFlaggedStorageTransaction (capture result of
getFlaggedAccountsPath() alongside storagePath when calling
loadAccountsInternal/saveAccountsUnlocked and pass that captured flaggedPath
into saveFlaggedAccountsUnlocked instead of letting it resolve internally),
update functions saveFlaggedAccountsUnlocked/getFlaggedAccountsPath usages to
accept an explicit path parameter, and add a vitest regression next to
test/storage.test.ts:1153-1256 that exercises setStoragePathDirect() drift
during the persist() code path to ensure paired writes remain atomic.

---

Duplicate comments:
In `@test/codex-bin-wrapper.test.ts`:
- Around line 637-672: Tests assert a supervisor-only side effect but currently
pass via the normal wrapper path; update the two cases in
test/codex-bin-wrapper.test.ts (the "still auto-syncs once..." and "supports
interactive..." tests) to require a supervisor-only marker so they fail if
runCodexSupervisorIfEnabled() is bypassed—either reuse the "supervisor\n" marker
check used earlier in the file or add a new supervisor-only signal written only
by the supervisor wrapper, then assert that marker (e.g.,
readFileSync(markerPath) or a new supervisor marker file) is present in addition
to the existing forwarded/auto-sync checks to ensure the supervisor path
actually executed.

In `@test/codex-supervisor.test.ts`:
- Around line 870-914: The test is brittle because it asserts wall-clock timing
(performance.now elapsedMs < 170); instead remove the elapsedMs/performance.now
assertions and prove batching deterministically by instrumenting the probe path
(e.g., FakeManager or the probe handler used by createFakeRuntime) to record
probe start/finish order or sequence IDs for each account, then assert that the
probes for "degraded-1", "degraded-2", and "degraded-3" were started
concurrently/batched (their start sequence IDs are the same batch or their start
timestamps fall within one logical batch window) and that "healthy" was selected
as the result; update the test that calls
supervisorTestApi?.ensureLaunchableAccount (and uses
quotaProbeDelayMs/probeTimeoutMs) to assert on those recorded probe
order/sequence markers rather than elapsedMs.
- Around line 529-584: The test uses a real sleep (supervisorTestApi.sleep(40))
to gate correctness; replace that timing-based check with a deterministic latch
check: after scheduling the second with supervisorTestApi.withLockedManager,
assert the secondEntered promise is still pending (e.g. await
Promise.race([secondEntered, Promise.resolve("not-entered")]) and expect the
result to be "not-entered") instead of sleeping, then releaseFirst and await
secondEntered; locate the code around supervisorTestApi.withLockedManager, the
firstEntered/secondEntered promises and releaseFirst latch to implement this
deterministic pending-check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 8ca62a89-c0a1-482b-a271-33f360279fc9

📥 Commits

Reviewing files that changed from the base of the PR and between a756824 and 9d84cf8.

⛔ Files ignored due to path filters (1)
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (16)
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/preemptive-quota-scheduler.ts
  • lib/schemas.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • package.json
  • scripts/benchmark-session-supervisor.mjs
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/preemptive-quota-scheduler.test.ts
  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/schemas.ts
  • lib/preemptive-quota-scheduler.ts
  • lib/ui/copy.ts
  • lib/storage.ts
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/preemptive-quota-scheduler.test.ts
  • test/storage.test.ts
  • test/codex-supervisor.test.ts

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/storage.ts (1)

2132-2145: ⚠️ Potential issue | 🟠 Major

capture the flagged path in the combined transaction too.

lib/storage.ts:2138-2140 pins account writes to the opening storagePath, but flagged writes still go through saveFlaggedAccountsUnlocked(), which resolves getFlaggedAccountsPath() at call time. if a caller flips setStoragePathDirect() inside withAccountAndFlaggedStorageTransaction(), the account file is written under the original root and the flagged file under the new root, so the transaction is no longer atomic. capture the flagged path up front and pass it through the flagged save call, then add a vitest beside test/storage.test.ts:1154-1257 for this drift case.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2132 - 2145, Persist currently writes accounts
to the captured storagePath but calls saveFlaggedAccountsUnlocked() which
resolves getFlaggedAccountsPath() at call-time, allowing
setStoragePathDirect()/withAccountAndFlaggedStorageTransaction() to flip roots
and break atomicity; fix by capturing the flagged path at the start of persist
(call getFlaggedAccountsPath() once into flaggedPath) and pass that flaggedPath
into a new or overloaded saveFlaggedAccountsUnlocked(flaggedStorage,
flaggedPath) so both account and flagged writes use the same root for both
commit and rollback; update withAccountAndFlaggedStorageTransaction(), persist,
and any helpers to accept the explicit flaggedPath, add a vitest beside
test/storage.test.ts covering the drift case (flip storage path between account
and flagged save) and ensure the new save/queue logic handles EBUSY/429 retry
semantics as per lib/** IO/concurrency guidelines and reference the affected
tests in the change.
♻️ Duplicate comments (1)
lib/storage.ts (1)

1767-1776: ⚠️ Potential issue | 🟠 Major

pin legacy migration to the captured transaction root.

lib/storage.ts:1774-1776 still calls migrateLegacyProjectStorageIfNeeded() with no path context. that helper reads getStoragePathState() internally, so a concurrent setStoragePath() can load and delete legacy files from one root while the bound persistMigration writes into another. thread the captured storage path state into the migration helper too, and add a vitest near test/storage.test.ts:1047-1106 that reproduces path drift during transaction startup.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 1767 - 1776, The migration call in
loadAccountsInternal is unsafe because migrateLegacyProjectStorageIfNeeded()
reads getStoragePathState() itself; change loadAccountsInternal to pass the
captured storage path/state into migrateLegacyProjectStorageIfNeeded (update the
helper signature to accept the path or storage state) and use that value instead
of letting the helper call getStoragePathState(), so concurrent setStoragePath()
cannot cause path drift; update references to
migrateLegacyProjectStorageIfNeeded and ensure persistMigration is invoked
against the provided path/state (functions: loadAccountsInternal,
migrateLegacyProjectStorageIfNeeded, getStoragePathState, setStoragePath,
persistMigration) and add a vitest near test/storage.test.ts:1047-1106 that
reproduces concurrent path drift during transaction startup to assert the
migration used the captured path; also run/update related tests to handle
Windows IO/EAGAIN/EBUSY concurrency behaviors as noted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codex-supervisor.js`:
- Around line 1348-1363: The scan is unbounded when sessionId is set but
rolloutPathHint is missing because the files filter uses (sessionId ? true :
entry.mtimeMs >= sinceMs - 2_000), causing
listJsonlFiles/readSessionBindingEntry to iterate every JSONL; add a cached
mapping from sessionId→rolloutPath so subsequent supervisor iterations can
restrict the file list: introduce a module-level Map (e.g.,
sessionIdToRolloutPath) and, when matchSessionBindingEntry finds a binding with
a rollout path, store sessionId→rolloutPath; then, in the files
construction/filtering (the code building files using
listJsonlFiles/readSessionBindingEntry and variables sessionEntries, sinceMs,
rolloutPathHint, sessionId), short-circuit to only read files under the cached
rolloutPath (or persist the mapping in the binding state) when rolloutPathHint
is null to avoid full directory scans.

In `@scripts/codex.js`:
- Around line 68-72: The isSupervisorInteractiveCommand function currently only
inspects rawArgs[0]; modify it to iterate rawArgs from start and skip leading
option tokens (those starting with "-" or "--"), correctly handling flags that
consume a value (e.g., "--config", "-c" and any other known value-consuming
flags) by skipping their next token, then check the first non-option token and
return true if it equals "resume" or "fork"; update the same logic used around
the post-supervisor sync path (the block at ~545-551) so it uses this new
scanner instead of rawArgs[0], and add a vitest in
test/codex-bin-wrapper.test.ts (adjacent to lines 690-707) that verifies a
command like "codex -c foo=bar resume session-123" (and a long‑option variant)
is treated as interactive.

In `@test/codex-supervisor.test.ts`:
- Around line 958-1002: The test "batches probe work so multiple degraded
accounts do not add serial pause" currently uses a fragile wall-clock assertion
(elapsedMs < 170); change the test to avoid real-time timing by either enabling
vi.useFakeTimers()/fake timers around the call to
supervisorTestApi?.ensureLaunchableAccount and advancing timers
deterministically based on quotaProbeDelayMs and probeTimeoutMs, or remove the
elapsedMs assertion and instead assert parallelism by instrumenting FakeManager
(e.g., a concurrent-fetch counter inside FakeManager.fetch or createFakeRuntime)
to verify all degraded accounts were probed concurrently and that
manager.activeIndex ends up at 3; update the test to rely on these deterministic
checks rather than real scheduler timing.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 2132-2145: Persist currently writes accounts to the captured
storagePath but calls saveFlaggedAccountsUnlocked() which resolves
getFlaggedAccountsPath() at call-time, allowing
setStoragePathDirect()/withAccountAndFlaggedStorageTransaction() to flip roots
and break atomicity; fix by capturing the flagged path at the start of persist
(call getFlaggedAccountsPath() once into flaggedPath) and pass that flaggedPath
into a new or overloaded saveFlaggedAccountsUnlocked(flaggedStorage,
flaggedPath) so both account and flagged writes use the same root for both
commit and rollback; update withAccountAndFlaggedStorageTransaction(), persist,
and any helpers to accept the explicit flaggedPath, add a vitest beside
test/storage.test.ts covering the drift case (flip storage path between account
and flagged save) and ensure the new save/queue logic handles EBUSY/429 retry
semantics as per lib/** IO/concurrency guidelines and reference the affected
tests in the change.

---

Duplicate comments:
In `@lib/storage.ts`:
- Around line 1767-1776: The migration call in loadAccountsInternal is unsafe
because migrateLegacyProjectStorageIfNeeded() reads getStoragePathState()
itself; change loadAccountsInternal to pass the captured storage path/state into
migrateLegacyProjectStorageIfNeeded (update the helper signature to accept the
path or storage state) and use that value instead of letting the helper call
getStoragePathState(), so concurrent setStoragePath() cannot cause path drift;
update references to migrateLegacyProjectStorageIfNeeded and ensure
persistMigration is invoked against the provided path/state (functions:
loadAccountsInternal, migrateLegacyProjectStorageIfNeeded, getStoragePathState,
setStoragePath, persistMigration) and add a vitest near
test/storage.test.ts:1047-1106 that reproduces concurrent path drift during
transaction startup to assert the migration used the captured path; also
run/update related tests to handle Windows IO/EAGAIN/EBUSY concurrency behaviors
as noted.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 080af616-f227-4b8d-80be-b179df052d4c

📥 Commits

Reviewing files that changed from the base of the PR and between 9d84cf8 and c080cd0.

📒 Files selected for processing (9)
  • lib/config.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • package.json
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/ui/copy.ts
  • lib/config.ts
  • lib/storage.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/storage.test.ts
  • test/codex-supervisor.test.ts
  • test/codex-bin-wrapper.test.ts
🔇 Additional comments (20)
scripts/codex-supervisor.js (10)

1-7: lgtm on imports and module structure.

clean separation of node builtins. the dynamic import pattern at scripts/codex-supervisor.js:169-183 handles missing modules gracefully for environments where the dist build isn't present.


41-100: abort handling and cleanup look solid now.

the sleep function at lines 41-61 properly removes the abort listener in finish(). abortablePromise at lines 78-100 now calls cleanup() in finally on the original promise, fixing the memory leak flagged in previous reviews.


328-353: safeUnlink retry logic handles windows antivirus edge cases well.

the EPERM/EBUSY retry with exponential backoff at scripts/codex-supervisor.js:341-350 mirrors common windows filesystem contention patterns. the max attempts (4) and base delay (25ms) seem reasonable for transient locks.


419-492: supervisor lock implementation looks robust.

TTL-based stale detection at lines 391-417, retry on EEXIST/EPERM/EBUSY at lines 471-478, and proper cleanup in finally at line 464. good foundation for cross-process coordination.


510-520: dedup key no longer includes refreshToken.

previous review flagged that including refreshToken in the dedup key could cause duplicates during token refresh. now using only stable identifiers (index, accountId, email) at lines 510-514.


802-872: race window in probe cache is closed.

the placeholder promise is now set at lines 826-837 before starting async work, preventing duplicate fetches under concurrency. pendingResolver/pendingRejecter propagate results to waiters correctly.


1424-1449: signal escalation now documents the windows behavior.

line 1446 explains that SIGTERM is already forceful on windows but SIGKILL is kept as the unix fallback. the comment addresses the previous review feedback about redundant waits.


1548-1684: monitor loop paces probe outages correctly.

lines 1605-1613 now sleep for pollMs after QuotaProbeUnavailableError before continuing, fixing the hot-loop concern from the previous review. the sleep also respects the monitor abort signal.


1631-1651: prewarm is cancelled when session exits without rotating.

preparedResumeSelectionController is created per-launch at line 1511 and connected to the outer signal at lines 1512-1516. line 1638 passes preparedResumeSelectionController.signal to the prewarm. on clean exit without restart, line 1764 aborts the controller and line 1766 awaits the promise to ensure cleanup.


1879-1912: signal handlers now stay installed for the full supervision duration.

line 1895 uses return await runCodexSupervisorWithRuntime(...) so the finally block at lines 1908-1911 only removes SIGINT/SIGTERM handlers after supervision settles. this fixes the critical issue from the previous review.

test/codex-supervisor.test.ts (10)

1-7: imports and test setup look clean.

vitest runner, node:events for fake child, perf_hooks for timing assertions. the __testOnly import pattern keeps the test surface explicit.


24-41: retry logic for temp dir cleanup handles windows filesystem contention.

ENOTEMPTY/EPERM/EBUSY retries at test/codex-supervisor.test.ts:25-40 with exponential backoff prevent flaky test teardown on windows when antivirus or indexing holds handles. as per coding guidelines, tests should handle windows filesystem behavior.


49-55: deferred helper enables deterministic async coordination.

the createDeferred<T>() pattern at lines 49-55 is used throughout for promise latches, avoiding wall-clock timing dependencies in concurrency tests.


267-313: boundary tests for 200-line scan limit are solid.

test at lines 267-291 verifies metadata is found on the 200th line; test at lines 293-313 verifies it's missed on line 201. good edge case coverage.


394-433: signal escalation tests cover both windows and non-windows paths.

test at lines 394-414 verifies win32 skips SIGINT and goes straight to SIGTERM→SIGKILL. test at lines 416-433 verifies linux sends SIGINT before escalating. addresses the missing non-win32 coverage from previous review.


529-584: concurrent lock serialization test is deterministic.

uses promise latches (resolveFirstEntered, releaseFirst, secondEntered) at lines 536-544 instead of wall-clock sleeps. asserts ordering via order array at line 583. this addresses the previous review request for concurrent lock regression testing.


656-742: test verifies prewarm abort signal fires on clean exit.

at test/codex-supervisor.test.ts:741, the test asserts preparedProbeSignal?.aborted is true after a clean child exit without rotation. this confirms the fix for the cancelled-prewarm concern raised in the past review.


773-812: in-flight probe deduplication test uses latches correctly.

nearLimitGate and probeStarted deferreds at lines 777-778 coordinate the test without wall-clock timing. asserts that calls array shows only one fetch despite two concurrent callers.


1114-1166: pacing test validates probe unavailable backoff.

fake timers at line 1118, poll interval set to 20ms at line 1119, advances 100ms at line 1163. the test verifies the supervisor completes without hot-looping. good regression for the previously flagged issue.


1057-1112: monitor failure test has 10s timeout which is long.

the { timeout: 10_000 } at line 1059 suggests the test might occasionally be slow. if the test consistently finishes quickly, consider lowering this or adding a note. otherwise, lgtm on the assertion that child exit code is preserved when the monitor throws.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codex-supervisor.js`:
- Around line 465-537: The lock currently writes a fixed expiresAt in
withSupervisorStorageLock (written via handle.writeFile) but never refreshes it
while fn() runs, so implement a heartbeat that periodically updates the lock's
expiresAt (or re-writes the lock file contents with a new acquiredAt/expiresAt)
for the duration of the critical section, or alternatively perform an ownership
check before removing the lock in isSupervisorLockStale/safeUnlink to ensure the
process still owns it; run the heartbeat in a background timer started before
invoking fn() and cleared in the finally that unlinks the lock so the lease
never appears stale while fn() is executing.
- Around line 1740-1769: The monitorFailure is only logged but the supervisor
still returns the child's exit code (variable result), allowing a successful
exit code despite monitor faults; after detecting monitorFailure (the block
using monitorFailure, monitorController, monitorActive, monitorPromise and
relaunchNotice) override the returned result to indicate failure instead of
leaving the child's exitCode intact — e.g. set result.exitCode to a non-zero
value and attach the error (or set a result.error) so the supervised run fails
when monitorFailure exists and signal?.aborted is false; update the handling
where result is constructed (the Promise that listens on child 'error' and
'exit') so callers consuming result or normalizeExitCode see the
supervisor-failure state.
- Around line 147-166: The bypass logic in isSupervisorAccountGateBypassCommand
currently returns false if no primary command is found and also scans all tokens
(allowing "codex exec -- --version" to bypass); change it to only inspect
top-level flags that appear before the primary command (and before any "--"
separator) or, if no primary exists, only the initial tokens (leading flags) for
"--help", "-h", or "--version". Use findPrimaryCodexCommand to locate the
primary command and its index (or treat index as rawArgs.length if missing),
stop scanning when you hit "--" or at the primary index, and only check tokens
in that prefix; keep the explicit primaryCommand name checks for
"auth","help","version"; add regression tests for both "codex --help"/"codex
--version" and "codex exec -- --version" shapes.
- Around line 1072-1085: The catch block that collapses probe failures into an
account-level "probe-error" evaluation must not mark accounts unhealthy when the
quota probe itself is down; detect QuotaProbeUnavailableError (or whatever
sentinel the probe throws) in the catch inside the quota probe invocation and
treat it like the monitor loop: do not return an evaluation that triggers
markAccountUnavailable/rotate, but instead surface the probe-unavailable
condition to the supervisor retry/backoff logic (either by rethrowing the
QuotaProbeUnavailableError or returning a non-mutating evaluation such as
rotate: false and a distinct reason like "probe-unavailable"); update the catch
around the probe (the try/catch that currently returns { evaluation: { rotate:
true, reason: "probe-error" } }) to special-case QuotaProbeUnavailableError, and
add a vitest regression that simulates a transient probe outage ensuring
markAccountUnavailable is not called.
- Around line 328-330: The current buildResumeArgs(buildForwardArgs)
implementation discards the caller's original CLI flags by returning only
["resume", sessionId]; instead, preserve and reuse the original argv and splice
in the new session id at the command boundary provided by codex-routing.js, then
pass that reconstructed argv into buildForwardArgs so caller-supplied flags are
kept for resumed children. Locate buildResumeArgs and the rotated relaunch site
that calls it, retrieve the original argv used for the child launch, use the
command boundary logic from codex-routing.js (the command boundary helper in
that module) to replace the session token with the new sessionId (and append any
required restart marker), and return buildForwardArgs(reconstructedArgv).

In `@test/storage.test.ts`:
- Around line 1164-1267: The tests that exercise
withAccountAndFlaggedStorageTransaction need to also verify flagged-storage was
refreshed: after each combined transaction case (both the throwing and
successful handlers) call loadFlaggedAccounts (or saveFlaggedAccounts/readback)
and assert the flagged accounts/state reflects the post-transaction write (or
was reloaded) rather than stale data; update the two specs that currently only
use saveAccounts/exportAccounts to include assertions using loadFlaggedAccounts
and/or saveFlaggedAccounts to deterministically prove flagged storage refresh
behavior for withAccountAndFlaggedStorageTransaction.
- Around line 964-1006: Add a deterministic vitest regression that proves the
mismatch guard in lib/storage.ts does not treat windows-equivalent paths as
different: write a test using withAccountStorageTransaction,
setStoragePathDirect and exportAccounts that starts a transaction on one path
(testStoragePath), then within the transaction call setStoragePathDirect with a
windows-equivalent alias of the same path (e.g., different case and backslashes
vs slashes or using path.win32.join/normalize/resolve to produce the alias),
then call exportAccounts and assert it does NOT throw and that export file
exists; ensure the test uses deterministic temp paths (testWorkDir/exportPath)
and mirrors the existing fail-fast test pattern so it verifies the guard uses
path normalization/canonicalization rather than raw string inequality.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4f79ba2a-e9bb-4921-9d31-213760265479

📥 Commits

Reviewing files that changed from the base of the PR and between c080cd0 and 34db83e.

📒 Files selected for processing (8)
  • lib/config.ts
  • lib/preemptive-quota-scheduler.ts
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/preemptive-quota-scheduler.ts
  • lib/config.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/storage.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
🪛 ast-grep (0.41.1)
test/storage.test.ts

[warning] 998-1002: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
storage path mismatch: transaction path is ${escapeRegExp( testStoragePath, )}, active path is ${escapeRegExp(secondaryStoragePath)},
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (5)
test/storage.test.ts (1)

3383-3415: nice deterministic tie-break regression for equal mtime snapshots.

test/storage.test.ts:3383-3415 cleanly validates the intended tie-break behavior implemented in lib/storage.ts:812-823. this is stable and deterministic.

lib/preemptive-quota-scheduler.ts (1)

27-29: good split between the 5h and 7d defaults.

lib/preemptive-quota-scheduler.ts:27-29 and lib/preemptive-quota-scheduler.ts:153-154 now keep the aggressive 5h threshold separate from the 7d fallback, which removes the earlier config drift.

Also applies to: 153-154

scripts/codex.js (1)

535-557: good sync placement around supervisor forwarding.

scripts/codex.js:536-557 runs startup sync inside the real forwarding callback and gates the post-supervisor sync on supervisorDidForward. that keeps the abort fast-path intact while avoiding the earlier double-sync ordering problem.

test/codex-bin-wrapper.test.ts (1)

116-124: good fixture-local storage path for spawned supervisor tests.

test/codex-bin-wrapper.test.ts:116-124 stubs getStoragePath() under the fixture root, so the supervisor lock file stays hermetic instead of touching the real user home during wrapper tests.

test/codex-supervisor.test.ts (1)

481-630: good windows/concurrency lock coverage.

test/codex-supervisor.test.ts:481-630 adds stale-lock cleanup, transient eperm/ebusy, and concurrent-caller coverage against the supervisor lock. that materially improves the regression net around the new cross-process coordination path.

@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lib/storage.ts (2)

1949-1961: ⚠️ Potential issue | 🟠 Major

pin .gitignore writes to the captured project root too.

lib/storage.ts:1961 now saves to the captured file path, but ensureGitignore() still prefers the ambient currentProjectRoot at lib/storage.ts:298-309. if a transaction starts on project a and then calls setStoragePath(projectB) before persist(), the account file goes back to project a while .gitignore is updated in project b. pass a captured project-root hint through saveAccountsUnlocked()/ensureGitignore() and add a vitest that drifts between two git roots before persist().

possible fix
-async function ensureGitignore(storagePath: string): Promise<void> {
-	const state = getStoragePathState();
-	if (!state.currentStoragePath) return;
+async function ensureGitignore(
+	storagePath: string,
+	projectRootHint?: string | null,
+): Promise<void> {
 	const configDir = dirname(storagePath);
 	const inferredProjectRoot = dirname(configDir);
-	const candidateRoots = [state.currentProjectRoot, inferredProjectRoot].filter(
+	const candidateRoots = [projectRootHint, inferredProjectRoot].filter(
 		(root): root is string => typeof root === "string" && root.length > 0,
 	);
 async function saveAccountsUnlocked(
 	storage: AccountStorageV3,
 	storagePath = getStoragePath(),
+	projectRootHint?: string | null,
 ): Promise<void> {
 	const path = storagePath;
 	...
-		await ensureGitignore(path);
+		await ensureGitignore(path, projectRootHint);
 return withStorageLock(async () => {
 	const storagePath = getStoragePath();
+	const projectRootHint = getStoragePathState().currentProjectRoot;
 	const state = {
 		snapshot: await loadAccountsInternal(
-			(storage) => saveAccountsUnlocked(storage, storagePath),
+			(storage) =>
+				saveAccountsUnlocked(storage, storagePath, projectRootHint),
 			storagePath,
 		),
 	};
 	...
-		await saveAccountsUnlocked(storage, storagePath);
+		await saveAccountsUnlocked(storage, storagePath, projectRootHint);

as per coding guidelines, lib/**: focus on auth rotation, windows filesystem io, and concurrency. verify every change cites affected tests (vitest) and that new queues handle ebusy/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 1949 - 1961, saveAccountsUnlocked writes account
files to the captured storage path but calls ensureGitignore() which still uses
the ambient currentProjectRoot; this causes .gitignore to be written to a
different project when setStoragePath changes before persist. Fix by threading
the resolved project-root/path into ensureGitignore (add an optional projectRoot
or targetDir parameter) and update saveAccountsUnlocked to call
ensureGitignore(path) instead of the ambient version; update any callers
(persist, setStoragePath flows) to propagate the captured path and adjust
getStoragePath usage in saveAccountsUnlocked to use the passed-in storagePath
consistently. Add a vitest that switches between two temp git roots before
persist to assert both the account file and .gitignore are written to the same
captured root, and ensure tests cover Windows IO/EBUSY/429 retry behavior per
lib/** guidelines.

2136-2155: ⚠️ Potential issue | 🔴 Critical

pin flagged storage to the transaction root too.

lib/storage.ts:2154 still calls saveFlaggedAccountsUnlocked(flaggedStorage), and lib/storage.ts:2418-2419 resolves that path from the ambient getStoragePath(). if a caller switches projects inside withAccountAndFlaggedStorageTransaction(), account storage stays pinned to the captured storagePath but flagged storage moves to the new project root. that splits what should be one atomic auth-rotation update. capture the flagged path alongside storagePath and add a vitest in test/storage.test.ts that drifts the path before persist() and asserts both files stay under the original transaction root.

possible fix
 async function saveFlaggedAccountsUnlocked(
 	storage: FlaggedAccountStorageV1,
+	storagePath = getFlaggedAccountsPath(),
 ): Promise<void> {
-	const path = getFlaggedAccountsPath();
+	const path = storagePath;
 	const markerPath = getIntentionalResetMarkerPath(path);
 	...
 }
 return withStorageLock(async () => {
 	const storagePath = getStoragePath();
+	const flaggedStoragePath = join(
+		dirname(storagePath),
+		FLAGGED_ACCOUNTS_FILE_NAME,
+	);
 	const state = {
 		snapshot: await loadAccountsInternal(
 			(storage) => saveAccountsUnlocked(storage, storagePath),
 			storagePath,
 		),
 		active: true,
 		storagePath,
 	};
 	...
 	const persist = async (
 		accountStorage: AccountStorageV3,
 		flaggedStorage: FlaggedAccountStorageV1,
 	): Promise<void> => {
 		...
 		await saveAccountsUnlocked(nextAccounts, storagePath);
 		try {
-			await saveFlaggedAccountsUnlocked(flaggedStorage);
+			await saveFlaggedAccountsUnlocked(flaggedStorage, flaggedStoragePath);
 			state.snapshot = nextAccounts;

as per coding guidelines, lib/**: focus on auth rotation, windows filesystem io, and concurrency. verify every change cites affected tests (vitest) and that new queues handle ebusy/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2136 - 2155, Capture the flagged-storage path at
the start of the transaction and use that pinned path when persisting flagged
accounts so both account and flagged files stay under the same transaction root:
in withAccountAndFlaggedStorageTransaction()/persist, alongside the
already-captured storagePath (from getStoragePath() used with
loadAccountsInternal and saveAccountsUnlocked), capture the flaggedStoragePath
(call the same resolver used elsewhere for flagged files) and replace the
ambient saveFlaggedAccountsUnlocked(flaggedStorage) call with a pinned-save that
writes to flaggedStoragePath; update any helper signatures (persist,
loadAccountsInternal usage) to accept the pinned flaggedStoragePath and ensure
cloneAccountStorageForPersistence/saveAccountsUnlocked usage is unchanged. Add a
vitest in test/storage.test.ts that calls
withAccountAndFlaggedStorageTransaction(), mutates the global/project storage
path between transaction start and calling persist(), and asserts both account
and flagged files were written under the original transaction root (also include
retries/handling for Windows EBUSY and transient 429 behaviors per IO
guidelines).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/codex-supervisor.js`:
- Around line 471-515: When the lock heartbeat loses the lease (in
createSupervisorLockHeartbeat) we must fail fast instead of only debug-logging:
when readSupervisorLockPayload returns falsy, when payload.ownerId !== ownerId,
or when writeSupervisorLockPayload throws, set stopped = true,
clearInterval(timer) and reject/promise-reject the pendingRefresh with a new
Error describing the condition so callers observing the heartbeat promise can
fail immediately; ensure the returned stop() still awaits pendingRefresh so the
rejection propagates, and include the lockPath and ownerId in the Error message
to aid diagnosis.
- Around line 2066-2092: The child process wait in the interactive path (the
child.once("exit") wait in the block around runCodexSupervisorWithRuntime / the
interactive monitor) must be raced against the outer controller.signal: replace
the plain child.once("exit") Promise with abortablePromise(child.once("exit"),
controller.signal) (or equivalent) so the wrapper stops waiting when the outer
signal aborts; on abort, ensure you call child.kill() (and on platforms where
needed use the proper signal handling fallback) and also call
monitorController.abort() to stop any monitor loops. Update
runCodexSupervisorWithRuntime/interactive monitor logic to accept the outer
controller.signal and to clean up by killing the child and aborting the monitor
controller when that signal fires. Add a regression test at
test/codex-supervisor.test.ts that spawns a never-exiting child, sends SIGTERM
to the wrapper, and asserts the wrapper exits and the child is killed (account
for Windows signal differences in the test).
- Around line 403-419: The getSupervisorStoragePath function currently swallows
exceptions and empty/whitespace results from runtime.getStoragePath(), causing
the supervisor to lock a default file while AccountManager (which calls
lib/storage.ts's getStoragePath) uses a different path; change
getSupervisorStoragePath to propagate failures instead of falling back: if
runtime.getStoragePath() throws, rethrow that error (or wrap and throw a
descriptive Error referencing getSupervisorStoragePath/runtime.getStoragePath)
and treat empty/whitespace return values as an error to be thrown rather than
returning the default path; update any call sites that expect a fallback to
handle the thrown error, and add a regression test in
test/codex-supervisor.test.ts that covers runtime.getStoragePath() throwing and
returning empty/whitespace to ensure the supervisor does not acquire the default
lock when storage-path resolution fails (reference functions
getSupervisorStoragePath, AccountManager.loadFromDisk, and lib/storage.ts
getStoragePath).

In `@test/codex-supervisor.test.ts`:
- Around line 926-959: The current test path (using
supervisorTestApi.prepareResumeSelection → markCurrentAccountForRestart →
commitPreparedSelection) keeps the same account object/token throughout and
misses the token-refresh/stale-object regression; add a deterministic vitest
test that calls prepareResumeSelection, then simulates a token refresh by
replacing the stored account token (the object returned by
manager.getCurrentAccountForFamily() or the supervisor's account store) before
calling commitPreparedSelection, then call requestChildRestart(new FakeChild(),
"win32") and commitPreparedSelection; assert the logical account is resolved
exactly once (e.g., check the same accountId "healthy" and that calls array
length is 1), and use vitest deterministic controls (fake timers/promises or
synchronous mocks) so the refresh happens strictly between prepare and commit.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 1949-1961: saveAccountsUnlocked writes account files to the
captured storage path but calls ensureGitignore() which still uses the ambient
currentProjectRoot; this causes .gitignore to be written to a different project
when setStoragePath changes before persist. Fix by threading the resolved
project-root/path into ensureGitignore (add an optional projectRoot or targetDir
parameter) and update saveAccountsUnlocked to call ensureGitignore(path) instead
of the ambient version; update any callers (persist, setStoragePath flows) to
propagate the captured path and adjust getStoragePath usage in
saveAccountsUnlocked to use the passed-in storagePath consistently. Add a vitest
that switches between two temp git roots before persist to assert both the
account file and .gitignore are written to the same captured root, and ensure
tests cover Windows IO/EBUSY/429 retry behavior per lib/** guidelines.
- Around line 2136-2155: Capture the flagged-storage path at the start of the
transaction and use that pinned path when persisting flagged accounts so both
account and flagged files stay under the same transaction root: in
withAccountAndFlaggedStorageTransaction()/persist, alongside the
already-captured storagePath (from getStoragePath() used with
loadAccountsInternal and saveAccountsUnlocked), capture the flaggedStoragePath
(call the same resolver used elsewhere for flagged files) and replace the
ambient saveFlaggedAccountsUnlocked(flaggedStorage) call with a pinned-save that
writes to flaggedStoragePath; update any helper signatures (persist,
loadAccountsInternal usage) to accept the pinned flaggedStoragePath and ensure
cloneAccountStorageForPersistence/saveAccountsUnlocked usage is unchanged. Add a
vitest in test/storage.test.ts that calls
withAccountAndFlaggedStorageTransaction(), mutates the global/project storage
path between transaction start and calling persist(), and asserts both account
and flagged files were written under the original transaction root (also include
retries/handling for Windows EBUSY and transient 429 behaviors per IO
guidelines).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 62207bdd-c6c7-45cd-912e-52509921507a

📥 Commits

Reviewing files that changed from the base of the PR and between c080cd0 and 71f5b3f.

📒 Files selected for processing (9)
  • lib/config.ts
  • lib/preemptive-quota-scheduler.ts
  • lib/storage.ts
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/preemptive-quota-scheduler.ts
  • lib/config.ts
  • lib/storage.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/storage.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
🪛 ast-grep (0.41.1)
test/storage.test.ts

[warning] 998-1002: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
storage path mismatch: transaction path is ${escapeRegExp( testStoragePath, )}, active path is ${escapeRegExp(secondaryStoragePath)},
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (1)
scripts/codex-routing.js (1)

15-16: verify this scanner skips every value-taking codex option.

scripts/codex-routing.js:15-16 and scripts/codex-routing.js:55-66 still hard-code only -c/--config as consuming a value. if the upstream codex cli accepts any other top-level value-taking flag, that value becomes the "command" here and resume/fork will be missed. please validate the allowlist against current codex help and add a vitest for one supported non-config flag before relying on this parser.

Also applies to: 55-66

@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

@ndycode — on it. triggering a full review of pr #139 now, covering scripts/codex-supervisor.js, lib/storage.ts, lib/quota-probe.ts, scripts/codex.js, scripts/codex-routing.js, and the associated test files.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

@ndycode — on it. triggering a full review of pr #139 now. will cover:

  • scripts/codex-supervisor.js — listener accumulation, unbounded jsonl scans, lock concurrency, windows i/o edge cases
  • lib/storage.ts — transaction snapshot pinning, path drift, exportAccounts error paths
  • lib/quota-probe.ts — abort-signal propagation correctness
  • scripts/codex.js / scripts/codex-routing.js — double-sync prevention, supervisor forwarding paths
  • test/codex-supervisor.test.ts and related — regression coverage gaps flagged in greptile notes
✅ Actions performed

Full review triggered.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/storage.ts (1)

2146-2159: ⚠️ Potential issue | 🟠 Major

capture the flagged storage path at transaction start.

lib/storage.ts:2154 still resolves the flagged file through getFlaggedAccountsPath(). if a handler calls setStoragePathDirect() before persist(), accounts are written back to the captured root, but flagged accounts follow the new root. that splits one combined transaction across two storage trees and makes rollback partial. please thread a captured flagged path through saveFlaggedAccountsUnlocked() and add the same drift regression for withAccountAndFlaggedStorageTransaction() next to test/storage.test.ts:1095-1345.

possible fix
 async function saveFlaggedAccountsUnlocked(
 	storage: FlaggedAccountStorageV1,
+	flaggedStoragePath = getFlaggedAccountsPath(),
 ): Promise<void> {
-	const path = getFlaggedAccountsPath();
+	const path = flaggedStoragePath;
 	// ...
 }

 return withStorageLock(async () => {
 	const storagePath = getStoragePath();
+	const flaggedStoragePath = join(
+		dirname(storagePath),
+		FLAGGED_ACCOUNTS_FILE_NAME,
+	);
 	// ...
-	await saveFlaggedAccountsUnlocked(flaggedStorage);
+	await saveFlaggedAccountsUnlocked(flaggedStorage, flaggedStoragePath);

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/storage.ts` around lines 2146 - 2159, In persist (and the surrounding
transaction flow), capture the flagged accounts file path at the start (instead
of calling getFlaggedAccountsPath() later) and pass that captured path into
saveFlaggedAccountsUnlocked so flagged writes/rollbacks use the same root as the
accounts write; update saveFlaggedAccountsUnlocked signature/usages to accept a
flaggedPath parameter and use it instead of resolving getFlaggedAccountsPath(),
and add a regression test alongside withAccountAndFlaggedStorageTransaction
tests (test/storage.test.ts:1095-1345) that calls setStoragePathDirect()
mid-transaction to assert both accounts and flagged accounts are
written/rolled-back to the originally captured paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/codex-manager/settings-hub.ts`:
- Around line 2578-2582: Adding the new supervisor row shifts numeric hotkeys;
update the numeric-to-action mapping so the new first item (value: { type:
"toggle-session-supervisor" }, label built with formatDashboardSettingState and
UI_COPY.settings.experimentalSessionSupervisor) is bound to key "1" and all
subsequent shortcuts increment by one. Locate the hotkey dispatch table in
settings-hub (the block that currently dispatches 1/2/3 around the previous
305-313 area) and renumber the entries to account for the inserted supervisor
row, update the help copy in lib/ui/copy.ts to reflect the new numbering, and
update the failing test test/settings-hub-utils.test.ts to assert the revised
key-to-action mapping (ensure the supervisor toggle is tested for key "1" and
other actions shifted accordingly).

In `@lib/preemptive-quota-scheduler.ts`:
- Around line 27-29: The comment above
DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H and
DEFAULT_SECONDARY_REMAINING_PERCENT_THRESHOLD is inaccurate because it claims
the 7d window is less aggressive while both constants are currently set to 5;
update the comment to state that these are two separate thresholds (5h and
secondary/7d) which are currently equal rather than implying a more/less
aggressive relationship, and ensure the comment references the two symbols
DEFAULT_PREEMPTIVE_QUOTA_REMAINING_PERCENT_5H and
DEFAULT_SECONDARY_REMAINING_PERCENT_THRESHOLD so future changes remain clear.

In `@package.json`:
- Line 54: The smoke npm script "test:session-supervisor:smoke" currently runs
several supervisor/wrapper tests but omits the high-risk storage regressions
located in test/storage.test.ts (the path-binding and async-local test cases
covering regressions in lib/storage.ts around the path-binding and async-local
sections); update the script to either add those specific storage tests to the
command (include test/storage.test.ts or a new trimmed smoke file containing
just the path-binding and async-local cases) or extract the relevant test blocks
from test/storage.test.ts into a new smaller file (e.g.,
test/storage-smoke.test.ts) and add that filename to the
"test:session-supervisor:smoke" entry so the new storage regressions run in the
fast validation gate.

In `@test/settings-hub-utils.test.ts`:
- Around line 163-179: The removeDirectoryWithRetry helper currently only
retries ENOTEMPTY, EPERM, and EBUSY; update the retryableCodes Set inside
removeDirectoryWithRetry to also include "EACCES" and "EAGAIN" so transient
Windows deletion failures are retried (match behavior in
lib/codex-manager/settings-hub.ts); ensure the rest of the retry logic (attempt
count, backoff, final throw) remains unchanged.

---

Outside diff comments:
In `@lib/storage.ts`:
- Around line 2146-2159: In persist (and the surrounding transaction flow),
capture the flagged accounts file path at the start (instead of calling
getFlaggedAccountsPath() later) and pass that captured path into
saveFlaggedAccountsUnlocked so flagged writes/rollbacks use the same root as the
accounts write; update saveFlaggedAccountsUnlocked signature/usages to accept a
flaggedPath parameter and use it instead of resolving getFlaggedAccountsPath(),
and add a regression test alongside withAccountAndFlaggedStorageTransaction
tests (test/storage.test.ts:1095-1345) that calls setStoragePathDirect()
mid-transaction to assert both accounts and flagged accounts are
written/rolled-back to the originally captured paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3312a3d7-dbb0-4f5c-a76a-26910b3d9d96

📥 Commits

Reviewing files that changed from the base of the PR and between 1be5e95 and 0c7846d.

📒 Files selected for processing (18)
  • lib/codex-manager/settings-hub.ts
  • lib/config.ts
  • lib/preemptive-quota-scheduler.ts
  • lib/quota-probe.ts
  • lib/schemas.ts
  • lib/storage.ts
  • lib/ui/copy.ts
  • package.json
  • scripts/codex-routing.js
  • scripts/codex-supervisor.js
  • scripts/codex.js
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
  • test/plugin-config.test.ts
  • test/preemptive-quota-scheduler.test.ts
  • test/quota-probe.test.ts
  • test/settings-hub-utils.test.ts
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/schemas.ts
  • lib/quota-probe.ts
  • lib/codex-manager/settings-hub.ts
  • lib/preemptive-quota-scheduler.ts
  • lib/ui/copy.ts
  • lib/storage.ts
  • lib/config.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/preemptive-quota-scheduler.test.ts
  • test/plugin-config.test.ts
  • test/settings-hub-utils.test.ts
  • test/quota-probe.test.ts
  • test/storage.test.ts
  • test/codex-bin-wrapper.test.ts
  • test/codex-supervisor.test.ts
🪛 ast-grep (0.41.1)
test/storage.test.ts

[warning] 998-1002: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(
storage path mismatch: transaction path is ${escapeRegExp( testStoragePath, )}, active path is ${escapeRegExp(secondaryStoragePath)},
)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (33)
test/codex-bin-wrapper.test.ts (4)

68-125: fixture setup is now hermetic with storage.js stub.

the writeSupervisorRuntimeFixture at test/codex-bin-wrapper.test.ts:116-124 properly stubs getStoragePath() to return a fixture-local path. this addresses the previous concern about supervisor locks leaking to real user paths on windows.

the stub modules for config, accounts, quota-probe, and storage are minimal but sufficient for the wrapper routing tests.


632-649: supervisor marker assertion validates the supervisor path is actually taken.

the test at test/codex-bin-wrapper.test.ts:648 asserts the marker file contains "supervisor\n", which is only written by the stub AccountManager.loadFromDisk() in writeSupervisorRuntimeFixture. this ensures the test fails if scripts/codex.js stops honoring the supervisor flag.


669-691: exit code 130 sentinel behavior is correctly tested.

test/codex-bin-wrapper.test.ts:688-690 validates that:

  • exit code is 130
  • no FORWARDED: output (supervisor handled it)
  • no auto-sync marker file exists

this covers the abort-sentinel path in scripts/codex.js.


712-741: double-sync prevention test looks good.

test/codex-bin-wrapper.test.ts:740 asserts exactly one "sync\n" line in the marker file after a supervisor-forwarded interactive command. this validates the auto-sync deduplication logic.

scripts/codex-supervisor.js (14)

90-112: abortablePromise cleanup is correct.

scripts/codex-supervisor.js:103 attaches .finally(cleanup) to the resolved promise, ensuring the abort listener is removed on normal completion. this addresses the previous memory leak concern.


47-67: sleep implementation handles abort and cleanup correctly.

the finish callback at scripts/codex-supervisor.js:55-61 ensures both the timer and event listener are cleaned up regardless of how the sleep ends.


410-436: storage path resolution now fails fast on errors.

scripts/codex-supervisor.js:416-427 throws descriptive errors when runtime.getStoragePath() throws or returns empty, preventing the supervisor from locking a different file than AccountManager uses. this addresses the previous cross-lock concern.


487-566: heartbeat fail-fast on lost lease is implemented.

scripts/codex-supervisor.js:517-534 detects when the lock file disappears or changes owner and calls stopWithError() which rejects the heartbeat promise. the caller in withSupervisorStorageLock at line 649-654 catches this and aborts the linked controller, propagating failure to the critical section.


1042-1054: probe cache now reserves the slot before async work.

scripts/codex-supervisor.js:1042-1053 creates and stores a pendingPromise in the cache before starting the actual fetch. this prevents duplicate fetches when concurrent callers race for the same account.


1847-1856: monitor loop backs off on probe unavailability.

scripts/codex-supervisor.js:1848-1855 sleeps for pollMs before continuing when the quota probe throws QuotaProbeUnavailableError. this prevents the hot-loop that would hammer storage and the probe endpoint.


1928-1956: child wait is properly raced against outer signal.

scripts/codex-supervisor.js:1928-1944 wraps the child exit promise in abortablePromise(...) using the outer signal. when the outer signal aborts, the catch block at lines 1945-1956 kills the child and returns exit code 130.


2138-2171: signal handlers stay installed during supervision.

scripts/codex-supervisor.js:2154 uses await before runCodexSupervisorWithRuntime, ensuring the finally block at lines 2168-2169 only removes SIGINT/SIGTERM handlers after supervision completes. this addresses the previous concern about premature handler removal.


726-731: dedup key excludes refreshToken.

scripts/codex-supervisor.js:726-730 builds the dedup key from index, accountId, and email only. this prevents the same logical account from appearing twice when its token refreshes between snapshots.


330-338: resume args now preserve caller cli options.

scripts/codex-supervisor.js:331-337 uses splitCodexCommandArgs to extract leading config args (e.g., -c profile="dev") and trailing args, then rebuilds the resume command while preserving them. this addresses the previous concern about dropped flags.


1691-1692: windows signal behavior documented.

scripts/codex-supervisor.js:1691 adds a comment explaining that SIGTERM is already forceful on windows, clarifying why the SIGKILL step is retained as a unix fallback.


1961-1966: monitor failure overrides success exit code.

scripts/codex-supervisor.js:1965 returns 1 when the child exited with 0 but the monitor failed. this ensures the supervisor doesn't report success when quota monitoring crashed mid-session.


157-168: bypass detection restricted to top-level flags.

scripts/codex-supervisor.js:158 uses hasTopLevelHelpOrVersionFlag from codex-routing.js which only scans tokens before the primary command or -- separator. this prevents codex exec -- --version from bypassing account gating.


1316-1323: probe-unavailable accounts are not cooled down.

scripts/codex-supervisor.js:1316-1318 collects accounts with reason: "probe-unavailable" into a separate list without calling markAccountUnavailable. only accounts with actual probe errors (line 1321) are marked. this prevents transient probe outages from cooling down every account.

test/codex-supervisor.test.ts (15)

1-54: test setup is clean with proper env var capture/restore.

test/codex-supervisor.test.ts:19-21 captures original env values and 249-260 restores them in afterEach. the createDeferred helper at lines 48-54 enables deterministic synchronization.


56-150: FakeManager covers the supervisor's manager interface.

test/codex-supervisor.test.ts:56-150 implements all methods the supervisor calls: getAccountsSnapshot, getCurrentOrNextForFamilyHybrid, markRateLimitedWithReason, etc. the coolingDownUntil tracking enables quota exhaustion tests.


152-247: createFakeRuntime enables fine-grained control over quota probing.

test/codex-supervisor.test.ts:165-167 provides hooks for onFetch, onFetchStart, and waitForFetchByAccountId that allow tests to coordinate and assert on probe ordering without wall-clock timing.


575-630: lock serialization test uses deterministic coordination.

test/codex-supervisor.test.ts:585-630 uses deferred promises instead of sleep/timing to verify that the second caller waits until the first releases. the order array at line 629 provides a clear assertion on serialization.


481-573: windows lock retry behavior is covered.

test/codex-supervisor.test.ts:481-521 and 523-573 test that the supervisor retries after transient EPERM/EBUSY errors on unlink and open. this covers the windows filesystem edge cases required by the coding guidelines.


967-1043: token refresh race regression is covered.

test/codex-supervisor.test.ts:1024-1025 mutates the stored account's refreshToken and access between prepareResumeSelection and commitPreparedSelection. the assertion at lines 1034-1042 verifies the same logical account is resolved with the refreshed credentials.


1485-1543: heartbeat fail-fast test exercises lost lease detection.

test/codex-supervisor.test.ts:1524 deletes the lock file mid-section, and line 1537-1538 asserts the critical section rejects with the expected "lost lease" error. the second section at line 1540 confirms lock recovery.


1629-1701: probe outage pacing test bounds fetch attempts.

test/codex-supervisor.test.ts:1691-1693 asserts exactly 1 fetch after startup, then waits 20ms and confirms still 1 fetch. this verifies the backoff sleep is taken before the second probe attempt, addressing the previous concern about only checking exit code.


860-907: overlap test uses promise coordination instead of timing.

test/codex-supervisor.test.ts:869-876 uses nearLimitGate and probeStarted deferreds to verify that selection probing starts before restart finishes. no performance.now() or elapsed time assertions.


1155-1223: batch parallelism test uses explicit coordination.

test/codex-supervisor.test.ts:1207-1214 polls startedAccounts.size using setImmediate to confirm all 3 degraded probes started in parallel before any resolved. no wall-clock elapsedMs assertions.


1402-1439: storage path error propagation is tested.

test/codex-supervisor.test.ts:1402-1439 covers both throwing and empty-return cases for runtime.getStoragePath(). lines 1431-1437 verify the error propagates and the default lock file is not created.


1545-1600: monitor failure exit code test validates the override.

test/codex-supervisor.test.ts:1592 asserts the result is 1 even though the child exited with 0, and line 1593-1594 verifies the expected stderr output about monitor failure.


1331-1377: signal propagation test verifies child is killed on abort.

test/codex-supervisor.test.ts:1373-1375 asserts the supervisor exits with 130 and the child's kill method was called when the outer signal aborts. the HangingChild class at lines 1337-1348 only exits when explicitly killed.


1281-1302: nested --version bypass regression is covered.

test/codex-supervisor.test.ts:1287-1301 verifies that codex exec -- --version does NOT bypass account gating. the test expects ensureLaunchableAccount reached to be thrown, confirming the supervisor attempts account selection.


1-1729: comprehensive supervisor test suite.

the test file covers:

  • lock serialization (test/codex-supervisor.test.ts:575-630)
  • windows filesystem retries (test/codex-supervisor.test.ts:481-573)
  • token refresh race (test/codex-supervisor.test.ts:967-1043)
  • heartbeat fail-fast (test/codex-supervisor.test.ts:1485-1543)
  • probe outage pacing (test/codex-supervisor.test.ts:1629-1701)
  • signal propagation (test/codex-supervisor.test.ts:1331-1377)

all tests use deterministic coordination via deferred promises or explicit mocking instead of wall-clock timing.

Comment on lines +2578 to +2582
{
label: `${formatDashboardSettingState(draft.codexCliSessionSupervisor ?? BACKEND_DEFAULTS.codexCliSessionSupervisor ?? false)} ${UI_COPY.settings.experimentalSessionSupervisor}`,
value: { type: "toggle-session-supervisor" },
color: "yellow",
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

renumber the experimental hotkeys with this new first item.

adding the supervisor row at lib/codex-manager/settings-hub.ts:2578-2582 shifts the numeric shortcuts, but lib/codex-manager/settings-hub.ts:305-313 still dispatches 1/2/3 as sync/backup/guard. with the updated help in lib/ui/copy.ts:97-101, pressing 1 now runs sync instead of toggling the supervisor, and test/settings-hub-utils.test.ts:754-791 still locks in the stale mapping.

possible fix
 function mapExperimentalMenuHotkey(
 	raw: string,
 ): ExperimentalSettingsAction | undefined {
-	if (raw === "1") return { type: "sync" };
-	if (raw === "2") return { type: "backup" };
-	if (raw === "3") return { type: "toggle-refresh-guardian" };
+	if (raw === "1") return { type: "toggle-session-supervisor" };
+	if (raw === "2") return { type: "sync" };
+	if (raw === "3") return { type: "backup" };
+	if (raw === "4") return { type: "toggle-refresh-guardian" };
 	if (raw === "[" || raw === "-") return { type: "decrease-refresh-interval" };
 	if (raw === "]" || raw === "+") return { type: "increase-refresh-interval" };

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
{
label: `${formatDashboardSettingState(draft.codexCliSessionSupervisor ?? BACKEND_DEFAULTS.codexCliSessionSupervisor ?? false)} ${UI_COPY.settings.experimentalSessionSupervisor}`,
value: { type: "toggle-session-supervisor" },
color: "yellow",
},
function mapExperimentalMenuHotkey(
raw: string,
): ExperimentalSettingsAction | undefined {
if (raw === "1") return { type: "toggle-session-supervisor" };
if (raw === "2") return { type: "sync" };
if (raw === "3") return { type: "backup" };
if (raw === "4") return { type: "toggle-refresh-guardian" };
if (raw === "[" || raw === "-") return { type: "decrease-refresh-interval" };
if (raw === "]" || raw === "+") return { type: "increase-refresh-interval" };
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/codex-manager/settings-hub.ts` around lines 2578 - 2582, Adding the new
supervisor row shifts numeric hotkeys; update the numeric-to-action mapping so
the new first item (value: { type: "toggle-session-supervisor" }, label built
with formatDashboardSettingState and
UI_COPY.settings.experimentalSessionSupervisor) is bound to key "1" and all
subsequent shortcuts increment by one. Locate the hotkey dispatch table in
settings-hub (the block that currently dispatches 1/2/3 around the previous
305-313 area) and renumber the entries to account for the inserted supervisor
row, update the help copy in lib/ui/copy.ts to reflect the new numbering, and
update the failing test test/settings-hub-utils.test.ts to assert the revised
key-to-action mapping (ensure the supervisor toggle is tested for key "1" and
other actions shifted accordingly).

@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

@coderabbitai review
@greptile @cubic review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

@ndycode — on it. triggering a full review of pr #139 now. key areas being covered:

  • scripts/codex-supervisor.js — lock concurrency, unbounded jsonl scans, listener accumulation, windows i/o edge cases, restart safety bounds
  • lib/storage.ts — transaction snapshot pinning, areEquivalentStoragePaths windows normalization, exportAccounts error paths, state.active finally-block correctness
  • lib/quota-probe.ts — abort-signal propagation, per-model listener cleanup, pre-aborted signal short-circuit
  • scripts/codex.js / scripts/codex-routing.js — double-sync prevention, supervisor forwarding paths, supervisorDidForward flag correctness
  • test/codex-supervisor.test.ts, test/codex-bin-wrapper.test.ts, test/storage.test.ts — regression coverage gaps, concurrency test correctness, windows-specific assertions
✅ Actions performed

Full review triggered.

@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

Superseded by the split cleanup PRs:

Closing this bloated branch so review can continue on the narrowed replacements.

@ndycode ndycode closed this Mar 20, 2026
@ndycode ndycode reopened this Mar 20, 2026
@ndycode
Copy link
Owner Author

ndycode commented Mar 20, 2026

Superseded by #147. I rebuilt the session-supervisor work from current main on a fresh single branch so the review stays in one clean PR with only the intended keep-set.

@ndycode ndycode closed this Mar 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant