Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 10 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -106,5 +106,15 @@ This project uses the **mfbt MCP server**, which exposes a virtual filesystem (V
- **Ralph in TUI:** Integrated into main TUI (`r` key), not a standalone app. Ralph runs *in place* inside the unified `FeatureListPanel` (`tui/screens/feature_list.py`) in `#main-content` — there is no separate Ralph panel — with `PreflightModal` for agent checks. Standalone `tui_app.py` and the old `ralph_panel.py` are deleted.
- **Ralph subcommand:** `src/mfbt/commands/ralph/` — orchestrator, display (console), tui_display (Textual adapter), ralph_widgets (TUI widgets), agent runner, progress (API), prompt builder, types.
- **Display duck typing:** `RalphOrchestrator.display` is typed as `Any` — both `RalphDisplay` and `RalphTUIDisplay` are structurally compatible (same display protocol).
- **Ralph sandboxing (two-mechanism, hard-fail — current design):** Exactly **two** sandbox mechanisms, each a **hard requirement** on its platform: `sandbox-exec` on macOS, `bwrap` (bubblewrap) on Linux. There is **no** unsandboxed mode, no `docker`/`firejail`, no config override, no `--sandbox`/`--yes` flag, no consent/feedback layer. If the platform's mechanism is unavailable (or the platform is neither macOS nor Linux), Ralph **aborts before any agent subprocess** with an actionable message.
- **`sandbox/` package:** `models.py` (`SandboxMechanism`={`SANDBOX_EXEC`,`BWRAP`} only; `SandboxConfig`, `SandboxPermissions`, `resolve_paths`), `detect.py`, `exceptions.py`, `wrapper.py`, `adapters/` (`sandbox_exec.py`, `bwrap.py`, `__init__.py`). No `models` `SandboxResult`/`SandboxType`/`CandidateOutcome`/`SandboxDetectionReport`; no `feedback.py`; no `docker.py`/`firejail.py`. `models.py` imports the sibling leaf `exceptions` (acyclic, still no external deps / no import-time side effects) and does **fail-loud, filesystem-free `__post_init__` validation**: `SandboxPermissions` rejects a writable filesystem-root or bare `$HOME` (incl. `project_dir`) → `SandboxSetupError`; `SandboxConfig` rejects `project_dir != permissions.project_dir` (the bound `permissions` is stored as-is, no silent replace). `SandboxConfig.base_url` is **inert** (no adapter consumes it; never a confinement input).
- **Detection:** one function `detect_sandbox(which_fn=shutil.which, system_fn=platform.system, *, run_fn=subprocess.run) -> SandboxMechanism`. macOS→`SANDBOX_EXEC` (binary-presence check), Linux→`BWRAP` (functional liveness probe `bwrap --ro-bind / / true`), else→raise `SandboxDetectionError`. No `detect_sandbox_with_report`, no `explicit=`, no config loader. `detect.py` does **not** import `mfbt.config` (cycle severed; `config.validate_config` has no sandbox block; no `_VALID_SANDBOX_TYPES`/`_CONFIG_OVERRIDE_KEY`).
- **Adapters:** each is pure `build_prefix(config: SandboxConfig) -> list[str]` + `classify_exit(rc, sandbox_stderr) -> SandboxError|None`. `adapters/__init__.py` `get_adapter`/`get_classifier` are strict 2-entry dispatch (unknown mechanism → `SandboxError`, no identity/no-op). SBPL asset `commands/ralph/profiles/claude.sb` (MCLI-126 write-confine: `(allow default)` + global write-deny + per-writable re-allow + a **curated `/dev` device-node allow-list** — `/dev/null|zero|random|urandom|tty|dtracehelper|ptmx`, `^/dev/ttys[0-9]+$`, `(subpath "/dev/fd")` — **not** a blanket `(subpath "/dev")`; kept a single line so the comment-breakout test's balanced-sexpr invariant holds; network open) loaded via `importlib.resources`; hatchling auto-bundles non-`.py` package files. `classify_exit` re-attributes a non-zero exit to the sandbox **only** on the sandbox's own **line-anchored** stderr signature (`[Sandbox] ` wrapper prefix stripped first; sandbox_exec: `deny(\d+)` regex anywhere on a line OR a line starting `sandbox-exec:`; bwrap: a line starting `bwrap:`); else `None` = Claude's own exit, returned as `AgentResult`.
- **`SandboxWrapper`:** `wrap_command(cmd)` canonical (`wrap(cmd)` alias); `prepare()` is a **no-arg no-op** lifecycle hook; `cleanup()` removes the sandbox-exec SBPL temp profile in `run()`'s `finally`. `mechanism=None`→`detect_sandbox()` opt-in (orchestrator passes a concrete mechanism; bad value→`SandboxError`). Sandbox stderr: pure `format_sandbox_stderr` (`[Sandbox] ` prefix) + `note_stderr_line` (always attributes) / `get_sandbox_stderr`; sandbox tool + `claude` share ONE stderr fd by design (attribution by `[Sandbox] ` marker, not OS-fd — do **not** split processes). `AgentResult.sandbox_stderr` defaulted.
- **`AgentRunner`:** `__init__(*, project_dir=None, base_url=None)` (no `assume_yes`/`presented`). `sandbox_wrapper is None`→auto-detect once; injected wrapper used as-is. The no-wrapper default `base_url` is sourced from `mfbt.config.DEFAULT_CONFIG["base_url"]` (single source — no hand-synced `_DEFAULT_BASE_URL` constant). `build_command` **always** inserts `--dangerously-skip-permissions` at argv[1] (a real OS sandbox is always the boundary; headless `claude -p`/`stdin=DEVNULL` can't answer tool prompts). The subprocess `Popen` is launched with **`cwd=` pinned to the sandbox's own realpath'd `project_dir`** (single source = the wrapper's `SandboxConfig`; resolved exactly as the adapters resolve it) so the run location is always inside the confined/bound area, never the inherited parent CWD — `cwd` is part of the frozen Popen signature. After the timed `stderr` join, `run()` logs (WARNING) if the drain was truncated, and on a non-zero exit not attributed to a denial logs INFO (WARNING if truncated) so a possibly-missed denial is never fully silent. `run()` = a `try/except SandboxError` setup phase (build→`prepare()`→`wrap_command`) + the subprocess `try/…/finally cleanup()` phase; ordered `except` clauses (base `SandboxError` last) each `_emit_sandbox_error(exc); raise` (re-raise, **not** `SystemExit`). Popen `OSError` EPERM/EACCES→`SandboxDeniedError`. No `_handle_none_sandbox_fallback`/`acknowledge_unsandboxed`/`_emit_sandbox_note`/`_NONE_SANDBOX_WARNING`.
- **Exceptions:** `SandboxError(message, *, mechanism=None)` (`__str__`=`"[<mechanism>] <message>"`); subclasses `SandboxDetectionError`, `SandboxSetupError`, `SandboxDeniedError`, `SandboxCompatibilityError` (reserved, defensively handled, never raised by current paths). All 5 re-exported from `sandbox/__init__`.
- **Hard-fail wiring (TWO chokepoints — construction AND run):** (1) construction: `RalphOrchestrator.__init__` calls `detect_sandbox()`, logs+re-raises `SandboxDetectionError`. (2) post-construction: `orchestrator._implement_feature` has an `except SandboxError: logger.error(...); raise` **before** its broad `except Exception` so a `SandboxSetupError`/`SandboxDeniedError` from `agent.run()` is **never** downgraded to a per-feature `FAILED` (which would loop on a broken sandbox and mask a denial) — it propagates out of `run()`. CLI `commands/ralph/__init__.py`: `except SandboxDetectionError`→`display.error`+`typer.Exit(1)`; `display.sandbox_status(...)` before `orchestrator.run()`; `except SandboxError`→`display.error("sandbox failure — Ralph aborted: …")`+`typer.Exit(1)`; `summary.failed>0`→`typer.Exit(1)`. TUI `feature_list.py::_run_orchestrator`: `except SandboxDetectionError` (now `logger.error(exc_info=True)` — parity with CLI) wrapping the `RalphOrchestrator(...)` construction → notify(error) + `_return_to_browsing`; a separate `except SandboxError` around `run()` → notify("Sandbox failure — Ralph aborted: …", error) + `_return_to_browsing` (both fail closed visibly). `RalphConfig` has no `assume_yes`/`sandbox`; the pre-run `typer.confirm` prompt is gone; `--status` only conflicts with `--quiet`. `RalphHeader._sandbox_label()` always shows the mechanism (no UNSANDBOXED branch). `RalphDisplay`/`RalphTUIDisplay` keep `sandbox_status` (stdout/log, quiet-suppressed, enum `.value` verbatim); `sandbox_warning` removed.
- **Debug:** module-scope `_debug_enabled = False` in `wrapper.py` AND `detect.py` (literal `# TODO: decision needed — wire to --<flag-name> and <ENV_VAR_NAME>`, intentionally unwired; tests `monkeypatch.setattr(<mod>, "_debug_enabled", True)`). `[sandbox-debug] ` via `wrapper.format_sandbox_debug`/`emit_debug_sandbox_stderr()`/`detect._probe_liveness`.
- **Test gotchas:** `__init__`'s `logger` param shadows the module logger — use `logging.getLogger(__name__)`. `test_orchestrator.py` autouse fixture patches `orchestrator.detect_sandbox` (NOT `_with_report`) returning a bare `SandboxMechanism`. `test_agent.py` autouse fixture patches `agent.detect_sandbox` + `agent.SandboxWrapper`; `_stub_wrapper` defaults to `SANDBOX_EXEC` (a real sandbox) and sets `classify_exit.return_value=None`, so `build_command` includes `--dangerously-skip-permissions` at index 1 — assert presence, never absence; mock wrappers need `wrap_command.side_effect` (not `.wrap`). `test_detect.py::test_import_has_no_side_effects` must keep the parent-package `detect` attr save/restore in `finally` (full-suite-order hermeticity). Real-wrapper integration tests in `test_agent.py` exercise BWRAP/SANDBOX_EXEC through `run()` mocking only `Popen`/`which`.
- **Key files:** `auth_flow.py` (shared OAuth), `coding_agents.py` (agent pre-flight checks), `tui/screens/phase_list.py`, `tui/screens/preflight_modal.py`, `tui/screens/feature_list.py` (unified browse + in-place Ralph panel), `commands/ralph/ralph_widgets.py`.
- **TUI shortcuts:** `r` = Ralph, `ctrl+r` = Refresh, `d` = Describe, `enter` = Open/Detail, `esc` = Back, `?` = Help, `q` = Quit.
40 changes: 27 additions & 13 deletions src/mfbt/commands/ralph/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,12 +119,6 @@ def ralph(
"-q",
help="Suppress live output; only show errors and final summary",
),
yes: bool = typer.Option(
False,
"--yes",
"-y",
help="Skip confirmation prompt and start immediately",
),
instructions: Optional[str] = typer.Option(
None,
"--instructions",
Expand All @@ -148,9 +142,9 @@ def ralph(
)
from mfbt.commands.ralph.types import AgentType, OrchestrationMode, PhaseInfo, RalphConfig

if status and (quiet or yes):
if status and quiet:
console.print(
"[red]Error:[/red] --status is mutually exclusive with --quiet and --yes."
"[red]Error:[/red] --status is mutually exclusive with --quiet."
)
raise typer.Exit(code=1)

Expand Down Expand Up @@ -244,24 +238,44 @@ def ralph(
if status:
raise typer.Exit(code=0)

if not yes:
typer.confirm("Continue?", abort=True)

from mfbt.commands.ralph.log_capture import RalphLogger
from mfbt.config import get_mfbt_dir

log_base_dir = get_mfbt_dir() / "logs"
ralph_logger = RalphLogger(log_base_dir, agent_type)

orchestrator = RalphOrchestrator(
ralph_config, client, display, logger=ralph_logger
from mfbt.commands.ralph.sandbox import (
SandboxDetectionError,
SandboxError,
)

try:
orchestrator = RalphOrchestrator(
ralph_config, client, display, logger=ralph_logger
)
except SandboxDetectionError as exc:
display.error(str(exc))
raise typer.Exit(code=1) from exc

# Surface the active sandbox (quiet-suppressed) before the run
# loop. Detection already hard-failed above if no sandbox is
# available, so this is purely informational.
if not quiet:
display.sandbox_status(orchestrator.sandbox_mechanism.value)

try:
summary = orchestrator.run()
except AgentNotFoundError as exc:
display.error(str(exc))
raise typer.Exit(code=1) from exc
except SandboxError as exc:
# A post-construction sandbox failure (setup/denial) is a hard
# abort, not a per-feature failure: the orchestrator re-raises
# it out of run() rather than continuing the loop. Surface the
# attributed message and exit non-zero, same as the
# construction-time SandboxDetectionError above.
display.error(f"sandbox failure — Ralph aborted: {exc}")
raise typer.Exit(code=1) from exc

exit_code = 1 if summary.failed > 0 else 0
raise typer.Exit(code=exit_code)
Expand Down
Loading
Loading