feat: add Sprites as a sandbox provider #3041
Adds Sprites (https://sprites.dev) as a first-class sandbox backend, peer to E2B/Modal/Vercel/etc. The provider implements the full BaseSandboxClient and BaseSandboxSession contracts: lifecycle (ephemeral + named-attach), exec via the sprites-py async ControlConnection, filesystem read/write, PTY parity with E2B, exposed-port resolution against the sprite's single public service URL, and tar-based workspace persistence.

Includes a SpritesPlatformContext capability that injects /.sprite/llm.txt into the agent's instructions so it learns sprite-env services, checkpoints, and URL routing from the platform itself.

Adds the [sprites] optional extra (sprites-py>=0.0.1rc37,<0.2), a mypy override, parametrized entries in tests/sandbox/test_compatibility_guards.py for all six guards, a unit suite at tests/extensions/test_sandbox_sprites.py, a sprites_runner.py example, the docs/sandbox/clients.md provider table entries, and the docs/ref/extensions/sandbox/sprites/sandbox.md ref page.
Adds two opt-in capabilities to agents.extensions.sandbox.sprites for platform-specific affordances that the in-VM CLI cannot reach:

- SpritesUrlAccess(allow_public=False): exposes a set_sprite_url_visibility(visibility="public" | "sprite") tool that calls Sprite.update_url_settings via the SDK's authenticated client. Default-deny on "public": apps must explicitly set allow_public=True to expose the option to the agent. Closes the loop where models fumble between unauthenticated `sprite update` and `sprite-env curl` attempts when asked to make the URL public.
- SpritesCheckpoints(allow_restore=False): exposes create_sprite_checkpoint(comment), list_sprite_checkpoints(), and (when allow_restore=True) restore_sprite_checkpoint(id). Wraps the sync NDJSON-streaming sprites-py iterator in asyncio.to_thread and filters the platform's "Current" sentinel from the create result so the model gets the actual saved snapshot id (e.g. "v1") back. Restore is destructive (replaces the workspace) so it stays gated off until the application opts in.

Capability.bind() receives the runtime SandboxSession wrapper, so the shared _resolve_sprite_handle helper now steps through ``_inner`` to reach the SpritesSandboxSession before reading ``_sprite``.

Updates compat-guard exports, package re-exports, and adds 11 unit tests. mypy / ruff / 110-test focused suite all green.
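The checkpoint wrapping described above (a blocking iterator drained in a worker thread, with the "Current" sentinel filtered out) could look roughly like the following. This is a minimal sketch, not the provider's code: `fake_checkpoint_stream`, `drain_for_saved_id`, `create_checkpoint_id`, and the event shape are all hypothetical stand-ins, since the real sprites-py iterator is not shown in this PR text.

```python
from __future__ import annotations

import asyncio
from collections.abc import Iterable, Iterator

CURRENT_SENTINEL = "Current"


def fake_checkpoint_stream() -> Iterator[dict[str, str]]:
    # Hypothetical stand-in for the SDK's blocking NDJSON iterator; the
    # event shape here is invented for illustration.
    yield {"id": "Current"}
    yield {"id": "v1"}


def drain_for_saved_id(events: Iterable[dict[str, str]]) -> str | None:
    # Runs in a worker thread: consume the blocking iterator and remember
    # the last id that is not the sentinel.
    saved: str | None = None
    for event in events:
        event_id = event.get("id")
        if event_id and event_id != CURRENT_SENTINEL:
            saved = event_id
    return saved


async def create_checkpoint_id() -> str | None:
    # asyncio.to_thread keeps the event loop free while the iterator blocks.
    return await asyncio.to_thread(drain_for_saved_id, fake_checkpoint_stream())
```

Draining the whole iterator inside one `to_thread` call avoids hopping between threads per event, at the cost of not streaming progress back to the caller.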
When attaching to an existing sprite (created_by_us=False), SpritesSandboxSession now skips the eager wait-for-running poll and lets the first I/O operation drive the wake-up via a new _ensure_warm() guard. The platform auto-wakes a paused sprite when traffic arrives, so this is essentially free, and it avoids paying 1–10s of polling latency just to hand back a session handle.

The created-by-us path is unchanged: a fresh sprite still needs a provisioning poll, and we set _warmth_verified=True after. A new _invalidate_warmth() hook lets recovery flows force a re-poll after a transport error.

Live timing of named-attach:

- create(): 3s → 0.01s
- first exec: adds the wake-up roundtrip (formerly paid eagerly)
- subsequent I/O: unchanged (cached warm flag)

3 new lazy-warm tests; the _attach test helper now also marks _warmth_verified=True so existing tests that bypass startup keep working.
Extends the lazy wait-for-running pattern from named-attach to the created_by_us=True path. Sprites can transition between cold/warm/running freely with auto-wake-on-traffic, and ``create_sprite`` already raises eagerly on platform rejection, so the eager poll on the ephemeral path was paying ~1.5s per session just to confirm something the platform implicitly handles.

Now both paths skip the poll in ``_ensure_sprite``; ``_ensure_warm`` runs once on first I/O. ``_resolve_exposed_port`` calls ``_ensure_warm`` explicitly because it needs ``Sprite.url`` populated (which happens during the post-poll refresh).

TUI startup goes from ~7.6s to ~6.0s. The ~1.5s saved is exactly the poll we now skip; the wake-up cost shifts to the agent's first exec, where it overlaps with model thinking time and is invisible to the user.

Updates ``test_create_ephemeral_sprite`` to assert no eager get_sprite poll fires, mirroring the named-attach assertion.
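The lazy-warm pattern from these two commits can be sketched as a guard that runs the readiness poll at most once, on first I/O, plus an invalidation hook. The class below is a hypothetical illustration: `LazyWarmSession`, `_wait_for_running`, and the `polls` counter are assumptions that only borrow the `_ensure_warm` / `_invalidate_warmth` names from the text.

```python
import asyncio


class LazyWarmSession:
    """Toy session: the readiness poll is deferred until the first I/O call."""

    def __init__(self) -> None:
        self._warmth_verified = False
        self.polls = 0  # counts how often the (simulated) poll ran

    async def _wait_for_running(self) -> None:
        # Placeholder for the real wait-for-running poll against the platform.
        self.polls += 1
        await asyncio.sleep(0)

    async def _ensure_warm(self) -> None:
        if self._warmth_verified:
            return  # cached: later I/O pays nothing
        await self._wait_for_running()
        self._warmth_verified = True

    def _invalidate_warmth(self) -> None:
        # Recovery flows call this after a transport error to force a re-poll.
        self._warmth_verified = False

    async def exec(self, command: str) -> str:
        await self._ensure_warm()
        return f"ran: {command}"
```

Constructing the session does no network work in this sketch, which mirrors why create() returns almost immediately and the first exec absorbs the wake-up roundtrip.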
Adds an idle-close watcher to SpritesSandboxSession that closes pooled control connections after ``idle_close_seconds`` of no I/O. Once the last WS closes, the sprite drops back to ``warm`` and stops accruing running-state cost; the next I/O reopens a connection and the platform auto-wakes the sprite on traffic arrival (~1s wake-up).

The watcher is a single asyncio Task scheduled on first activity. Each I/O hook calls _touch_activity(), which (a) updates the last-activity timestamp and (b) respawns the watcher if it has previously exited. The loop sleeps until the deadline, re-checks (since activity may have shifted the deadline forward), and exits after firing the close so the next activity gets a fresh task. PTY operations skip the close (their connections must stay open). Shutdown cancels and awaits the watcher before tearing down.

New public knob:

- SpritesSandboxClientOptions.idle_close_seconds (default 60.0)
- SpritesSandboxSessionState.idle_close_seconds

Set to 0 (or negative) to disable. Field appended at end of both classes; compat-guard parametrize entries updated.

4 new unit tests cover the watcher closing, the disabled mode, the PTY-active skip, and the activity-resets-deadline behavior.
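A watcher with the shape described in this commit (single task, deadline re-check after sleeping, exit after firing, respawn on next activity) might be sketched as follows. `IdleCloser` and its `closed_count` counter are hypothetical; the counter stands in for closing pooled control connections, and PTY handling is left out to keep the sketch short.

```python
import asyncio
import time
from typing import Optional


class IdleCloser:
    def __init__(self, idle_close_seconds: float) -> None:
        self.idle_close_seconds = idle_close_seconds
        self.closed_count = 0  # stand-in for "pooled connections were closed"
        self._last_activity = time.monotonic()
        self._watcher: Optional[asyncio.Task] = None

    def touch_activity(self) -> None:
        # Called from every I/O hook; must run inside a running event loop.
        self._last_activity = time.monotonic()
        if self.idle_close_seconds <= 0:
            return  # zero or negative disables the watcher
        if self._watcher is None or self._watcher.done():
            self._watcher = asyncio.create_task(self._watch())

    async def _watch(self) -> None:
        while True:
            deadline = self._last_activity + self.idle_close_seconds
            remaining = deadline - time.monotonic()
            if remaining > 0:
                await asyncio.sleep(remaining)
                continue  # re-check: activity may have moved the deadline
            self.closed_count += 1
            return  # exit; the next touch_activity() spawns a fresh task

    async def shutdown(self) -> None:
        if self._watcher is not None and not self._watcher.done():
            self._watcher.cancel()
            try:
                await self._watcher
            except asyncio.CancelledError:
                pass
```

Sleeping for exactly the remaining time and then re-reading the timestamp means activity never has to cancel or reschedule the task; it only writes a float.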
``Capability.clone`` runs every agent turn and resets per-instance attribute state, so the previous ``_cached_text`` PrivateAttr never hit — every turn re-executed ``cat /.sprite/llm.txt`` to inject the platform doc, waking the sprite even when the model never made a tool call. Promotes the cache to module scope keyed by ``(sprite_name, path)``. Survives across all clones for the same sprite, so the file lands exactly once per sprite for the life of the process. Adds ``clear_platform_context_cache`` for applications that need to force a re-fetch (e.g. after a sprite image upgrade). Adds an autouse fixture in the test suite that clears the cache between tests so state doesn't leak. 2 new regression tests cover the cross-clone caching and the explicit clear path.
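The move from a per-instance attribute to a module-scope cache keyed by ``(sprite_name, path)`` can be illustrated with a small sketch. The names `_CONTEXT_CACHE`, `get_platform_context`, and `clear_cache_sketch` are hypothetical, and the `fetch` callable stands in for running ``cat`` inside the sprite; the real ``clear_platform_context_cache`` signature is not shown in this PR text.

```python
from typing import Callable, Dict, Tuple

# Module-level, so it outlives any per-instance state that cloning resets.
_CONTEXT_CACHE: Dict[Tuple[str, str], str] = {}


def get_platform_context(sprite_name: str, path: str, fetch: Callable[[str], str]) -> str:
    # fetch(path) is only invoked on the first request for this sprite and path.
    key = (sprite_name, path)
    if key not in _CONTEXT_CACHE:
        _CONTEXT_CACHE[key] = fetch(path)
    return _CONTEXT_CACHE[key]


def clear_cache_sketch() -> None:
    # Forces the next lookup to re-fetch, e.g. after a sprite image upgrade.
    _CONTEXT_CACHE.clear()
```

Because the cache is process-global, tests that exercise it need to clear it between cases, which is what the autouse fixture mentioned above is for.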
Conflict was uv.lock only — regenerated with `uv lock` against the merged pyproject.toml so all extras (sprites + upstream's recent ones) lock consistently. pyproject.toml auto-merged: our [sprites] extra and the upstream 0.14.6→0.14.7 version bump and compaction changes coexist. Verified: 725 sandbox+sprites tests pass, mypy/ruff clean repo-wide.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30df3ddecd
deadline_s = max(0.0, float(self.state.timeout_ms or 0) / 1000.0) or float(
    DEFAULT_SPRITES_WAIT_FOR_RUNNING_TIMEOUT_S
Honor wait_for_running_timeout_s during warm-up polling
SpritesSandboxClientOptions exposes wait_for_running_timeout_s as the readiness-poll timeout, but warm-up currently computes deadline_s from state.timeout_ms instead. Because wait_for_running_timeout_s is never consumed here, callers cannot control the poll window as documented (they always get the default or an unrelated timeout_ms value), which can cause unexpected startup delays/timeouts in production.
timeout_ms=options.timeout_ms,
workspace_persistence=options.workspace_persistence,
)
Propagate idle_close_seconds into created session state
create() builds SpritesSandboxSessionState without passing options.idle_close_seconds, so every new session silently falls back to the model default (60.0s) regardless of caller configuration. This breaks the options contract and prevents applications from disabling or tightening idle close behavior for cost/control reasons.
await self._close_idle_control_connections()
# Watcher exits; the next I/O calls ``_touch_activity`` which
# will respawn it.
return
Keep idle watcher alive when PTY sessions are active
The idle watcher exits unconditionally right after _close_idle_control_connections(). When a PTY is active, _close_idle_control_connections() returns early by design, but the watcher still terminates; after that PTY ends, there is no running watcher left to close pooled control connections, so the sprite can remain in a billable running state until another I/O event happens.
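One way to address this, sketched under assumptions rather than taken from the PR, is for the watcher to defer and re-check while a PTY is active instead of returning. `PtyAwareIdleWatcher`, `pty_active`, and `closed_count` are hypothetical names; the counter stands in for closing pooled control connections.

```python
import asyncio
import time


class PtyAwareIdleWatcher:
    def __init__(self, idle_close_seconds: float) -> None:
        self.idle_close_seconds = idle_close_seconds
        self.last_activity = time.monotonic()
        self.pty_active = False
        self.closed_count = 0  # stand-in for closing pooled connections

    async def run(self) -> None:
        while True:
            deadline = self.last_activity + self.idle_close_seconds
            remaining = deadline - time.monotonic()
            if remaining > 0:
                await asyncio.sleep(remaining)
                continue
            if self.pty_active:
                # Defer rather than exit: wait another idle window, then
                # re-check, so a close still happens after the PTY ends.
                await asyncio.sleep(self.idle_close_seconds)
                continue
            self.closed_count += 1
            return
```

An alternative with the same effect would be to have the PTY teardown path call the activity hook, which respawns the watcher; the loop variant above just keeps the logic in one place.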
Thanks; just emailed re integration testing credentials
inner = SpritesSandboxSession.from_state(state, token=self._token, base_url=self._base_url)
try:
    await inner._ensure_sprite()
    inner._set_start_state_preserved(True)
Could you distinguish “reattached existing sprite” from “created a replacement sprite” here?
For created_by_us=True, _ensure_sprite() creates a new sprite when _sprite is empty. If this is a replacement after the original ephemeral sprite was deleted, resume() still calls _set_start_state_preserved(True). With NoopSnapshot and state.workspace_root_ready=True from the previous run, BaseSandboxSession.start() can treat the fresh sprite as preserved and skip the full durable manifest apply.
I think replacement should clear workspace_root_ready and not mark start state as preserved; only a true reattach to an existing sprite should do that.
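The distinction requested here could be sketched as a resume helper that reports whether the start state was actually preserved. Everything below is hypothetical (`ResumeStateSketch`, `resume_start_state_preserved`, and the `live_sprites` set that stands in for a platform lookup); it only illustrates the branch, not the provider's real resume() flow.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class ResumeStateSketch:
    sprite_name: str
    workspace_root_ready: bool = True


def resume_start_state_preserved(state: ResumeStateSketch, live_sprites: Set[str]) -> bool:
    """Return True only for a true reattach to a sprite that still exists."""
    if state.sprite_name in live_sprites:
        return True  # reattached: the workspace on the sprite is intact
    # Replacement: the old sprite is gone, so the fresh one has no workspace.
    state.workspace_root_ready = False
    live_sprites.add(state.sprite_name)  # stand-in for creating the new sprite
    return False
```

The caller would pass the returned flag to the start-state-preserved setter instead of hard-coding True, so a replacement sprite gets the full manifest apply.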
env=dict(options.env or {}) or None,
timeout_ms=options.timeout_ms,
workspace_persistence=options.workspace_persistence,
)
It looks like wait_for_running_timeout_s and idle_close_seconds are exposed on SpritesSandboxClientOptions, but not copied into SpritesSandboxSessionState here. _wait_for_sprite_running() also uses timeout_ms/default rather than wait_for_running_timeout_s.
Can you either wire these through fully, including tests from options -> state -> behavior, or remove the public knobs until they are implemented? Right now callers cannot actually tune the advertised wait timeout or disable idle close through client options.
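The options -> state -> behavior wiring asked for here might look like the sketch below. The dataclasses are simplified hypothetical stand-ins for the real options and state models, and the default of 30.0 seconds is an assumed value, since the actual constant is not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_WAIT_FOR_RUNNING_TIMEOUT_S = 30.0  # assumed value for illustration


@dataclass
class OptionsSketch:
    timeout_ms: Optional[int] = None
    wait_for_running_timeout_s: Optional[float] = None
    idle_close_seconds: float = 60.0


@dataclass
class StateSketch:
    timeout_ms: Optional[int] = None
    wait_for_running_timeout_s: Optional[float] = None
    idle_close_seconds: float = 60.0


def state_from_options(options: OptionsSketch) -> StateSketch:
    # Copy every public knob so none silently falls back to a model default.
    return StateSketch(
        timeout_ms=options.timeout_ms,
        wait_for_running_timeout_s=options.wait_for_running_timeout_s,
        idle_close_seconds=options.idle_close_seconds,
    )


def warmup_deadline_s(state: StateSketch) -> float:
    # The poll window comes from its own knob, not from the exec timeout_ms.
    configured = state.wait_for_running_timeout_s
    if configured is not None and configured > 0:
        return float(configured)
    return DEFAULT_WAIT_FOR_RUNNING_TIMEOUT_S
```

A test in this style can assert the full chain in one place: set a value on the options, build the state, and check the computed deadline.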
try:
    control = await self._ensure_control()
    try:
        op_conn = await control.start_op(
I do not see manifest.environment being resolved or applied for Sprites exec/PTY/create. Other hosted backends pass manifest env at sandbox creation or inject it into command execution, so manifests that rely on env vars will behave differently here.
Can Sprites apply the merged env, or explicitly reject/document env as unsupported? Silent no-op behavior seems hard to debug.
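If the env is injected at command execution, one possible shape is to merge the two sources and prefix the argv with the standard `env` utility. This is a sketch under assumptions: `merge_env` and `with_env` are hypothetical helpers, and letting client options override the manifest is an assumed precedence, not something this thread specifies.

```python
from typing import Dict, List, Mapping, Optional


def merge_env(
    manifest_env: Optional[Mapping[str, str]],
    options_env: Optional[Mapping[str, str]],
) -> Dict[str, str]:
    # Assumed precedence: client options win over manifest values.
    merged = dict(manifest_env or {})
    merged.update(options_env or {})
    return merged


def with_env(argv: List[str], env: Mapping[str, str]) -> List[str]:
    # Prefix with `env KEY=VALUE ...` so the variables reach the command
    # without relying on the transport to carry an environment field.
    if not env:
        return list(argv)
    assignments = [f"{key}={value}" for key, value in sorted(env.items())]
    return ["env", *assignments, *argv]
```

Passing the assignments as separate argv entries avoids shell quoting issues for values that contain spaces.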
Could you add cloud mount support for Sprites as part of this provider? Please add a Sprites mount strategy and focused tests for the common bucket flows, or link a clearly tracked follow-up if cloud mounts should land separately.
The platform-context framing now tells the agent that ``sprite-env services create`` ignores the cwd of the calling shell — services launch with the host's home directory as cwd by default, which silently breaks any service the agent tries to expose unless it's pointed at the workspace explicitly. The framing also picks up the actual ``manifest.root`` so the example uses the agent's real workspace path. Two new tests cover the warning text and that the framing reflects a non-default manifest root.
…ounts

Adds an rclone-backed mount strategy for AWS S3 / Cloudflare R2 / Google Cloud Storage / Azure Blob / Box mounts on Sprites sandboxes. Mirrors the E2B / Daytona / Runloop pattern: lazy-installs ``rclone`` and the ``fuse`` package via ``sudo -n apt-get`` if they aren't preinstalled in the sprite image, writes a per-session rclone config, runs ``rclone mount`` in daemon mode, and tears down via ``fusermount -u`` on session/snapshot stop.

Sprite VMs run as the unprivileged ``sprite`` user with passwordless sudo; ``SpritesSandboxSession.exec`` rejects ``user=`` kwargs, so the install path prefixes ``sudo -n`` rather than escalating through the framework.

Two SDK-side workarounds for current platform behavior:

1. **Stdout sentinels for tool detection.** sprite-env's WS control protocol currently ships ``op.complete`` with no ``exitCode`` field, so ``ExecResult.ok()`` always reports success. The strategy's detection commands wrap the conditional in ``if … then echo __SPRITES_PRESENT__; else echo __SPRITES_MISSING__; fi`` so stdout drives the decision instead of the (currently unreliable) exit code. The platform-side fix is tracked in sprite-env#446; once that lands, ``ExecResult.ok()`` becomes accurate again, and the stdout-sentinel approach remains valid (it's strictly more robust than exit-code checks anyway).
2. **Post-mount dir-cache warmup.** ``rclone mount --daemon`` forks and the parent returns immediately; FUSE's first ``readdir`` on the mountpoint root then races the daemon's first remote listing fetch and can briefly observe an empty directory. ``_verify_mount_active`` uses ``mountpoint -q`` (with a 5s poll) to confirm the bind landed, then issues a throw-away ``ls`` to prime rclone's dir cache before handing control back to the caller.

Discriminator: ``"sprites_cloud_bucket"``. Registered through the polymorphic ``MountStrategyBase`` registry. Compat-guard parametrize entries pin the public export and the discriminator string.
Twenty unit tests in ``tests/extensions/test_sandbox_sprites.py`` cover the rclone-installed path, the lazy-install path, the silent-no-op recovery path, FUSE-support detection, the per-session pattern adjustment, the session-type guard, the post-mount verification (mounted + not-mounted + warmup-ordering), and JSON roundtrip through the manifest and registry. ``docs/sandbox/clients.md`` storage-entries matrix updated to show ✓ for S3 / R2 / GCS / Azure / Box on Sprites. Verified end-to-end against a Tigris (S3-compatible) read-only bucket.
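The stdout-sentinel workaround from this commit can be sketched as a pair of helpers: one builds the detection command, the other decides from stdout alone. The helper names are hypothetical, and `command -v` is an assumed probe, since the commit message elides the actual conditional.

```python
import shlex

PRESENT = "__SPRITES_PRESENT__"
MISSING = "__SPRITES_MISSING__"


def detection_command(tool: str) -> str:
    # `command -v` is an assumed probe; the real conditional is not shown.
    quoted = shlex.quote(tool)
    return (
        f"if command -v {quoted} >/dev/null 2>&1; "
        f"then echo {PRESENT}; else echo {MISSING}; fi"
    )


def tool_present(stdout: str) -> bool:
    # Decide from stdout only, so a missing or wrong exit code cannot mislead.
    lines = [line.strip() for line in stdout.splitlines()]
    if PRESENT in lines:
        return True
    if MISSING in lines:
        return False
    raise ValueError("detection output contained no sentinel")
```

Raising when neither sentinel appears keeps a truncated or failed exec from being read as "tool missing", which would otherwise trigger a needless install.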
8768979 to 0d8fc85
This PR adds Sprites (Fly.io's sandbox-VM-as-a-service platform) as a first-class sandbox backend, peer to E2B / Modal / Vercel / Daytona / Runloop / Blaxel / Cloudflare. It uses the same BaseSandboxClient / BaseSandboxSession contract, the same WorkspaceShellCapability / Filesystem() / apply_patch / PTY surface, and the same opt-in extra (pip install "openai-agents[sprites]").

The new module lives at agents.extensions.sandbox.sprites and ships:

- SpritesSandboxClient / SpritesSandboxClientOptions: token from SPRITES_API_TOKEN env var or kwarg, base URL from SPRITES_API_URL.
- SpritesSandboxSession / SpritesSandboxSessionState: exec runs over the multiplexed ControlConnection WebSocket from sprites-py (no sync thread blocking); fs read/write via SpriteFilesystem; PTY parity with E2B; tar-based workspace persistence.
- …options.sprite_name is set"; both modes verified end-to-end.
- SpritesPlatformContext(): auto-injects /.sprite/llm.txt into the system prompt so the model knows about sprite-env services, checkpoints, URL routing, security rules. Module-cached per sprite so purely-conversational turns don't re-execute cat.
- SpritesUrlAccess(allow_public=False): adds a set_sprite_url_visibility(visibility) tool. Default-deny on "public": apps must explicitly opt in.
- SpritesCheckpoints(allow_restore=False): adds create_sprite_checkpoint, list_sprite_checkpoints, and (when allow_restore=True) restore_sprite_checkpoint tools. Wraps sprites-py's iterator-streaming checkpoint API in asyncio.to_thread.
- _ensure_sprite() skips the wait-for-running poll; the first I/O operation drives the wake-up via _ensure_warm().
- idle_close_seconds=60.0 (configurable, set to 0 to disable): closes pooled control connections so the sprite drops back to warm and stops accruing running-state cost. Next I/O reopens.
- …warm state while it's not needed (after 60 seconds of inactivity), and quickly wakes up as soon as the agent tries to use it.

Compatibility

Pure additive. New optional extra ([sprites]), new sub-package, no changes to RunState schema, no reordering of existing public dataclass fields. All six entries in tests/sandbox/test_compatibility_guards.py parametrized for the new types.

Test plan

- tests/extensions/test_sandbox_sprites.py covering: options/state roundtrip, env-var auth resolution, ephemeral and named-attach lifecycles, exec mapping (success / timeout / transport error), PTY (start / write_stdin / terminate / finalize), exposed-port resolution and validation, fs read/write with error mapping, tar-based persistence, the three capabilities (default-deny paths, opt-in paths, edge cases), lazy-warm behavior across both paths, the idle-close watcher, and the platform-context cache.
- tests/sandbox/test_compatibility_guards.py parametrized for the new options / state / discriminator strings / positional-field-order tests.
- make format, make lint, make typecheck, make tests all green. Coverage holds.
- …shell + apply_patch, manifest materializes, sprite is deleted on shutdown. Also verified the capability flow (made the sprite URL public, snapshotted the workspace, listed checkpoints).