feat: add Sprites as a sandbox provider#3041

Draft
vaurdan wants to merge 9 commits into openai:main from vaurdan:feat/sprites-sandbox

@vaurdan vaurdan commented Apr 28, 2026

This PR adds Sprites (Fly.io's sandbox-VM-as-a-service platform) as a first-class sandbox backend, peer to E2B / Modal / Vercel / Daytona / Runloop / Blaxel / Cloudflare. It implements the same BaseSandboxClient / BaseSandboxSession contract, exposes the same WorkspaceShellCapability / Filesystem() / apply_patch / PTY surface, and follows the same opt-in extra pattern (pip install "openai-agents[sprites]").

The new module lives at agents.extensions.sandbox.sprites and ships:

  • SpritesSandboxClient / SpritesSandboxClientOptions — token from SPRITES_API_TOKEN env var or kwarg, base URL from SPRITES_API_URL.
  • SpritesSandboxSession / SpritesSandboxSessionState — exec runs over the multiplexed ControlConnection WebSocket from sprites-py (no sync thread blocking); fs read/write via SpriteFilesystem; PTY parity with E2B; tar-based workspace persistence.
  • Lifecycle is "ephemeral by default, named-attach when options.sprite_name is set" — both modes verified end-to-end.
  • Three opt-in capabilities for Sprites-specific affordances:
    • SpritesPlatformContext() — auto-injects /.sprite/llm.txt into the system prompt so the model knows about sprite-env services, checkpoints, URL routing, security rules. Module-cached per sprite so purely conversational turns don't re-execute cat.
    • SpritesUrlAccess(allow_public=False) — adds a set_sprite_url_visibility(visibility) tool. Default-deny on "public" — apps must explicitly opt in.
    • SpritesCheckpoints(allow_restore=False) — adds create_sprite_checkpoint, list_sprite_checkpoints, and (when allow_restore=True) restore_sprite_checkpoint tools. Wraps sprites-py's iterator-streaming checkpoint API in asyncio.to_thread.
  • Lazy wake-up across both lifecycle paths: _ensure_sprite() skips the wait-for-running poll; the first I/O operation drives the wake-up via _ensure_warm().
  • Idle close after idle_close_seconds=60.0 (configurable, set to 0 to disable): closes pooled control connections so the sprite drops back to warm and stops accruing running-state cost. Next I/O reopens.
  • Together, these keep the sprite in the warm state once it has been idle for idle_close_seconds (60 s by default), and wake it quickly as soon as the agent uses it again.
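A minimal sketch of the credential lookup in the first bullet, assuming an explicit kwarg takes precedence over the environment and that an unset base URL is left for sprites-py to default. `resolve_sprites_auth` and `SpritesAuth` are hypothetical names, not the PR's API.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class SpritesAuth:
    """Hypothetical container for resolved credentials."""

    token: str
    base_url: str | None  # None: assume the SDK falls back to its own default


def resolve_sprites_auth(
    token: str | None = None,
    base_url: str | None = None,
    env: Mapping[str, str] | None = None,
) -> SpritesAuth:
    """Resolve credentials: explicit kwargs first (assumption), then env vars."""
    source = os.environ if env is None else env
    resolved_token = token or source.get("SPRITES_API_TOKEN")
    if not resolved_token:
        raise ValueError("no Sprites token: pass token= or set SPRITES_API_TOKEN")
    resolved_url = base_url or source.get("SPRITES_API_URL")
    return SpritesAuth(token=resolved_token, base_url=resolved_url)
```

Passing `env` explicitly keeps the lookup testable without mutating `os.environ`.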

Compatibility

Purely additive: a new optional extra ([sprites]), a new sub-package, no changes to the RunState schema, and no reordering of existing public dataclass fields. All six entries in tests/sandbox/test_compatibility_guards.py are parametrized for the new types.

Test plan

  • 119 new unit tests in tests/extensions/test_sandbox_sprites.py covering: options/state roundtrip, env-var auth resolution, ephemeral and named-attach lifecycles, exec mapping (success / timeout / transport error), PTY (start / write_stdin / terminate / finalize), exposed-port resolution and validation, fs read/write with error mapping, tar-based persistence, the three capabilities (default-deny paths, opt-in paths, edge cases), lazy-warm behavior across both paths, the idle-close watcher, and the platform-context cache.
  • All six entries in tests/sandbox/test_compatibility_guards.py parametrized for the new options / state / discriminator strings / positional-field-order tests.
  • make format, make lint, make typecheck, and make tests are all green; coverage holds.
  • Smoke-tested end-to-end against a real Sprites org: ephemeral session creates a sprite, agent uses native shell + apply_patch, manifest materializes, sprite is deleted on shutdown. Also verified the capability flow (made the sprite URL public, snapshotted the workspace, listed checkpoints).

vaurdan added 6 commits April 27, 2026 14:24
Adds Sprites (https://sprites.dev) as a first-class sandbox backend, peer
to E2B/Modal/Vercel/etc. The provider implements the full BaseSandboxClient
and BaseSandboxSession contracts: lifecycle (ephemeral + named-attach),
exec via sprites-py async ControlConnection, filesystem read/write,
PTY parity with E2B, exposed-port resolution against the sprite's single
public service URL, and tar-based workspace persistence. Includes a
SpritesPlatformContext capability that injects /.sprite/llm.txt into
the agent's instructions so it learns sprite-env services/checkpoints/
URL routing from the platform itself.

Adds the [sprites] optional extra (sprites-py>=0.0.1rc37,<0.2), a mypy
override, parametrized entries in tests/sandbox/test_compatibility_guards.py
for all six guards, a unit suite at tests/extensions/test_sandbox_sprites.py,
a sprites_runner.py example, the docs/sandbox/clients.md provider table
entries, and the docs/ref/extensions/sandbox/sprites/sandbox.md ref page.

Adds two opt-in capabilities to agents.extensions.sandbox.sprites for
platform-specific affordances that the in-VM CLI cannot reach:

- SpritesUrlAccess(allow_public=False): exposes a
  set_sprite_url_visibility(visibility="public" | "sprite") tool that
  calls Sprite.update_url_settings via the SDK's authenticated client.
  Default-deny on "public" — apps must explicitly set allow_public=True
  to expose the option to the agent. This stops models from fumbling
  between unauthenticated `sprite update` and `sprite-env curl`
  attempts when asked to make the URL public.

- SpritesCheckpoints(allow_restore=False): exposes
  create_sprite_checkpoint(comment), list_sprite_checkpoints(), and
  (when allow_restore=True) restore_sprite_checkpoint(id). Wraps the
  sync NDJSON-streaming sprites-py iterator in asyncio.to_thread and
  filters the platform's "Current" sentinel from the create result so
  the model gets the actual saved snapshot id (e.g. "v1") back.
  Restore is destructive (replaces the workspace) so it stays gated
  off until the application opts in.

Capability.bind() receives the runtime SandboxSession wrapper, so the
shared _resolve_sprite_handle helper now steps through ``_inner`` to
reach the SpritesSandboxSession before reading ``_sprite``.

Updates compat-guard exports, package re-exports, and adds 11 unit
tests. mypy / ruff / 110-test focused suite all green.

When attaching to an existing sprite (created_by_us=False),
SpritesSandboxSession now skips the eager wait-for-running poll
and lets the first I/O operation drive the wake-up via a new
_ensure_warm() guard. The platform auto-wakes a paused sprite when
traffic arrives, so this is essentially free — and avoids paying
1–10s of polling latency just to hand back a session handle.

The created-by-us path is unchanged: a fresh sprite still needs a
provisioning poll, and we set _warmth_verified=True after.

A new _invalidate_warmth() hook lets recovery flows force a re-poll
after a transport error.

Live timing of named-attach:
- create():       3s → 0.01s
- first exec:     adds the wake-up roundtrip (formerly paid eagerly)
- subsequent I/O: unchanged (cached warm flag)

3 new lazy-warm tests; the _attach test helper now also marks
_warmth_verified=True so existing tests that bypass startup keep
working.
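The lazy-warm pattern in this commit reduces to a cached flag checked on every I/O path. A minimal sketch, assuming an injected `poll_until_running` coroutine in place of the real get_sprite poll; the class is illustrative, and the lock that collapses concurrent first calls into one poll is an assumption, not necessarily what the PR does.

```python
import asyncio
from collections.abc import Awaitable, Callable


class LazyWarmSession:
    """Hypothetical sketch of the _ensure_warm / _invalidate_warmth pattern."""

    def __init__(self, poll_until_running: Callable[[], Awaitable[None]]) -> None:
        self._poll_until_running = poll_until_running
        self._warmth_verified = False
        self._warm_lock = asyncio.Lock()

    async def _ensure_warm(self) -> None:
        if self._warmth_verified:
            return  # fast path: cached flag, no network round trip
        async with self._warm_lock:
            # Re-check under the lock: a concurrent caller may already have polled.
            if not self._warmth_verified:
                await self._poll_until_running()
                self._warmth_verified = True

    def _invalidate_warmth(self) -> None:
        # Recovery flows call this after a transport error to force a re-poll.
        self._warmth_verified = False

    async def exec(self, command: str) -> str:
        await self._ensure_warm()  # the first I/O pays the wake-up cost
        return f"ran: {command}"  # placeholder for the real exec
```

Creating the session costs nothing; the poll happens at most once until a recovery path invalidates the flag.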

Extends the lazy wait-for-running pattern from named-attach to the
created_by_us=True path. Sprites can transition between cold/warm/
running freely with auto-wake-on-traffic, and ``create_sprite`` already
raises eagerly on platform rejection — so the eager poll on the
ephemeral path was paying ~1.5s per session just to confirm something
the platform implicitly handles.

Now both paths skip the poll in ``_ensure_sprite``; ``_ensure_warm``
runs once on first I/O. ``_resolve_exposed_port`` calls ``_ensure_warm``
explicitly because it needs ``Sprite.url`` populated (which happens
during the post-poll refresh).

TUI startup goes from ~7.6s to ~6.0s. The ~1.5s saved is exactly the
poll we now skip; the wake-up cost shifts to the agent's first exec,
where it overlaps with model thinking time and is invisible to the user.

Updates ``test_create_ephemeral_sprite`` to assert no eager get_sprite
poll fires, mirroring the named-attach assertion.

Adds an idle-close watcher to SpritesSandboxSession that closes pooled
control connections after ``idle_close_seconds`` of no I/O. Once the
last WS closes, the sprite drops back to ``warm`` and stops accruing
running-state cost; the next I/O reopens a connection and the platform
auto-wakes the sprite on traffic arrival (~1s wake-up).

The watcher is a single asyncio Task scheduled on first activity. Each
I/O hook calls _touch_activity(), which (a) updates the last-activity
timestamp and (b) respawns the watcher if it has previously exited.
The loop sleeps until the deadline, re-checks (since activity may have
shifted the deadline forward), and exits after firing the close so the
next activity gets a fresh task.

PTY operations skip the close (their connections must stay open).
Shutdown cancels and awaits the watcher before tearing down.

New public knob:
- SpritesSandboxClientOptions.idle_close_seconds (default 60.0)
- SpritesSandboxSessionState.idle_close_seconds
Set to 0 (or negative) to disable. Field appended at end of both
classes; compat-guard parametrize entries updated.

4 new unit tests cover the watcher closing, the disabled mode, the
PTY-active skip, and the activity-resets-deadline behavior.
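A minimal sketch of the watcher as this commit describes it, with the close callback and the PTY check injected as hypothetical callables. It mirrors the described control flow, including returning after the close attempt whether or not an active PTY blocked it; `IdleCloseWatcher` and its method names are illustrative, not the PR's code.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable


class IdleCloseWatcher:
    """Hypothetical sketch of a single-task idle-close watcher."""

    def __init__(
        self,
        idle_close_seconds: float,
        close_connections: Callable[[], Awaitable[None]],
        pty_active: Callable[[], bool],
    ) -> None:
        self.idle_close_seconds = idle_close_seconds
        self._close_connections = close_connections
        self._pty_active = pty_active
        self._last_activity = time.monotonic()
        self._task: asyncio.Task[None] | None = None

    def touch_activity(self) -> None:
        """Called from every I/O hook."""
        if self.idle_close_seconds <= 0:
            return  # 0 or negative disables idle close
        self._last_activity = time.monotonic()
        if self._task is None or self._task.done():
            self._task = asyncio.create_task(self._watch())  # respawn if exited

    async def _watch(self) -> None:
        while True:
            remaining = self._last_activity + self.idle_close_seconds - time.monotonic()
            if remaining > 0:
                await asyncio.sleep(remaining)
                continue  # re-check: activity may have moved the deadline
            if not self._pty_active():  # PTY connections must stay open
                await self._close_connections()
            return  # exit; the next touch_activity() starts a fresh task

    async def shutdown(self) -> None:
        task, self._task = self._task, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
```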

``Capability.clone`` runs every agent turn and resets per-instance
attribute state, so the previous ``_cached_text`` PrivateAttr never
hit — every turn re-executed ``cat /.sprite/llm.txt`` to inject
the platform doc, waking the sprite even when the model never made
a tool call.

Promotes the cache to module scope keyed by ``(sprite_name, path)``.
Survives across all clones for the same sprite, so the file lands
exactly once per sprite for the life of the process. Adds
``clear_platform_context_cache`` for applications that need to force
a re-fetch (e.g. after a sprite image upgrade).

Adds an autouse fixture in the test suite that clears the cache
between tests so state doesn't leak. 2 new regression tests cover
the cross-clone caching and the explicit clear path.
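The module-level cache can be sketched as below, assuming an injected `read_file` coroutine standing in for the `cat` exec. `load_platform_context` is a hypothetical name; `clear_platform_context_cache` matches the name in the commit, but its real signature may differ (this version clears every entry).

```python
from collections.abc import Awaitable, Callable

# Module scope: survives Capability.clone(), keyed per sprite and file path.
_PLATFORM_CONTEXT_CACHE: dict[tuple[str, str], str] = {}


async def load_platform_context(
    sprite_name: str,
    path: str,
    read_file: Callable[[str], Awaitable[str]],
) -> str:
    key = (sprite_name, path)
    cached = _PLATFORM_CONTEXT_CACHE.get(key)
    if cached is not None:
        return cached  # cache hit: no exec, so the sprite is not woken
    text = await read_file(path)
    _PLATFORM_CONTEXT_CACHE[key] = text
    return text


def clear_platform_context_cache() -> None:
    """Force the next load to re-read the file (e.g. after an image upgrade)."""
    _PLATFORM_CONTEXT_CACHE.clear()
```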
@github-actions github-actions Bot added dependencies documentation Improvements or additions to documentation enhancement New feature or request feature:extensions feature:sandboxes project labels Apr 28, 2026
Conflict was uv.lock only — regenerated with `uv lock` against the merged
pyproject.toml so all extras (sprites + upstream's recent ones) lock
consistently. pyproject.toml auto-merged: our [sprites] extra and the
upstream 0.14.6→0.14.7 version bump and compaction changes coexist.

Verified: 725 sandbox+sprites tests pass, mypy/ruff clean repo-wide.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 30df3ddecd

Comment on lines +495 to +496
    deadline_s = max(0.0, float(self.state.timeout_ms or 0) / 1000.0) or float(
        DEFAULT_SPRITES_WAIT_FOR_RUNNING_TIMEOUT_S

P2 Badge Honor wait_for_running_timeout_s during warm-up polling

SpritesSandboxClientOptions exposes wait_for_running_timeout_s as the readiness-poll timeout, but warm-up currently computes deadline_s from state.timeout_ms instead. Because wait_for_running_timeout_s is never consumed here, callers cannot control the poll window as documented (they always get the default or an unrelated timeout_ms value), which can cause unexpected startup delays/timeouts in production.
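One way to address this, sketched under assumptions: the readiness deadline comes from the dedicated option and falls back to a default, never to `timeout_ms`. The helper names, the `"running"` status string, and the default's value are hypothetical; only the constant's name appears in the PR.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable

# Placeholder value for this sketch; the PR's actual default is not shown here.
DEFAULT_SPRITES_WAIT_FOR_RUNNING_TIMEOUT_S = 30.0


def resolve_wait_timeout_s(wait_for_running_timeout_s: float | None) -> float:
    """Prefer the dedicated option; fall back to the default, never to timeout_ms."""
    if wait_for_running_timeout_s is not None and wait_for_running_timeout_s > 0:
        return float(wait_for_running_timeout_s)
    return DEFAULT_SPRITES_WAIT_FOR_RUNNING_TIMEOUT_S


async def wait_for_running(
    get_status: Callable[[], Awaitable[str]],
    timeout_s: float,
    interval_s: float = 0.5,
) -> None:
    deadline = time.monotonic() + timeout_s
    while True:
        if await get_status() == "running":
            return
        now = time.monotonic()
        if now >= deadline:
            raise TimeoutError(f"sprite not running after {timeout_s:.1f}s")
        await asyncio.sleep(min(interval_s, deadline - now))  # never overshoot
```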

Comment on lines +1305 to +1307
        timeout_ms=options.timeout_ms,
        workspace_persistence=options.workspace_persistence,
    )

P2 Badge Propagate idle_close_seconds into created session state

create() builds SpritesSandboxSessionState without passing options.idle_close_seconds, so every new session silently falls back to the model default (60.0s) regardless of caller configuration. This breaks the options contract and prevents applications from disabling or tightening idle close behavior for cost/control reasons.
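One way to make the options contract hold, sketched with hypothetical stand-in dataclasses (the real classes have more fields, and whether the state carries `wait_for_running_timeout_s` is an assumption): copy every knob explicitly, and add a guard that compares shared field names so the next forgotten field also shows up in tests.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class OptionsSketch:
    """Hypothetical subset of SpritesSandboxClientOptions."""

    timeout_ms: int | None = None
    wait_for_running_timeout_s: float | None = None
    idle_close_seconds: float = 60.0


@dataclass
class StateSketch:
    """Hypothetical subset of SpritesSandboxSessionState."""

    timeout_ms: int | None = None
    wait_for_running_timeout_s: float | None = None
    idle_close_seconds: float = 60.0


def state_from_options(options: OptionsSketch) -> StateSketch:
    return StateSketch(
        timeout_ms=options.timeout_ms,
        wait_for_running_timeout_s=options.wait_for_running_timeout_s,
        idle_close_seconds=options.idle_close_seconds,  # the field create() dropped
    )


def unpropagated_fields(options: OptionsSketch) -> list[str]:
    """Guard for tests: shared field names whose values did not carry over."""
    state = state_from_options(options)
    shared = {f.name for f in fields(OptionsSketch)} & {f.name for f in fields(StateSketch)}
    return sorted(name for name in shared if getattr(state, name) != getattr(options, name))
```

The guard only catches a dropped field when the test passes non-default values, so the test options should differ from every default.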

Comment on lines +436 to +439
    await self._close_idle_control_connections()
    # Watcher exits; the next I/O calls ``_touch_activity`` which
    # will respawn it.
    return

P1 Badge Keep idle watcher alive when PTY sessions are active

The idle watcher exits unconditionally right after _close_idle_control_connections(). When a PTY is active, _close_idle_control_connections() returns early by design, but the watcher still terminates; after that PTY ends, there is no running watcher left to close pooled control connections, so the sprite can remain in a billable running state until another I/O event happens.
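A minimal sketch of one possible fix: when an active PTY blocks the close, the loop re-checks later instead of exiting. The function and its injected callables are hypothetical, not the PR's code.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def idle_watch_loop(
    last_activity: Callable[[], float],
    idle_close_seconds: float,
    pty_active: Callable[[], bool],
    close_connections: Callable[[], Awaitable[None]],
) -> None:
    """Hypothetical watcher loop that stays alive while a PTY blocks the close."""
    while True:
        remaining = last_activity() + idle_close_seconds - time.monotonic()
        if remaining > 0:
            await asyncio.sleep(remaining)
            continue
        if pty_active():
            # Do not exit here: re-check later so pooled connections still get
            # closed once the PTY ends, even if no further I/O arrives.
            await asyncio.sleep(idle_close_seconds)
            continue
        await close_connections()
        return
```

An alternative is to call `_touch_activity` when a PTY terminates, which respawns the watcher. Either way, the watcher should not be started when `idle_close_seconds <= 0`, since this loop would then spin.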

@alfozan alfozan self-requested a review April 28, 2026 21:30
Collaborator

alfozan commented Apr 28, 2026

Thanks; just emailed re integration testing credentials

    inner = SpritesSandboxSession.from_state(state, token=self._token, base_url=self._base_url)
    try:
        await inner._ensure_sprite()
        inner._set_start_state_preserved(True)
Collaborator

@alfozan alfozan Apr 28, 2026

Could you distinguish “reattached existing sprite” from “created a replacement sprite” here?

For created_by_us=True, _ensure_sprite() creates a new sprite when _sprite is empty. If this is a replacement after the original ephemeral sprite was deleted, resume() still calls _set_start_state_preserved(True). With NoopSnapshot and state.workspace_root_ready=True from the previous run, BaseSandboxSession.start() can treat the fresh sprite as preserved and skip the full durable manifest apply.

I think replacement should clear workspace_root_ready and not mark start state as preserved; only a true reattach to an existing sprite should do that.
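The distinction the review asks for can be sketched as a small decision helper, assuming `_ensure_sprite()` is changed to report whether it reattached or created a replacement. The `reattached_existing` flag and both names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResumeStateSketch:
    """Hypothetical subset of session state relevant to resume()."""

    workspace_root_ready: bool


def start_state_preserved_after_resume(state: ResumeStateSketch, reattached_existing: bool) -> bool:
    """Return what resume() should pass to _set_start_state_preserved()."""
    if reattached_existing:
        # The original sprite and its filesystem survived: skipping re-apply is safe.
        return True
    # A replacement sprite starts empty, so force a full durable manifest apply.
    state.workspace_root_ready = False
    return False
```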

        env=dict(options.env or {}) or None,
        timeout_ms=options.timeout_ms,
        workspace_persistence=options.workspace_persistence,
    )
Collaborator

@alfozan alfozan Apr 28, 2026

It looks like wait_for_running_timeout_s and idle_close_seconds are exposed on SpritesSandboxClientOptions, but not copied into SpritesSandboxSessionState here. _wait_for_sprite_running() also uses timeout_ms/default rather than wait_for_running_timeout_s.

Can you either wire these through fully, including tests from options -> state -> behavior, or remove the public knobs until they are implemented? Right now callers cannot actually tune the advertised wait timeout or disable idle close through client options.

    try:
        control = await self._ensure_control()
        try:
            op_conn = await control.start_op(
Collaborator

I do not see manifest.environment being resolved or applied for Sprites exec/PTY/create. Other hosted backends pass manifest env at sandbox creation or inject it into command execution, so manifests that rely on env vars will behave differently here.

Can Sprites apply the merged env, or explicitly reject/document env as unsupported? Silent no-op behavior seems hard to debug.
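If Sprites exec takes an argv list, one way to apply the merged manifest env without platform support is to prefix the command with POSIX `env NAME=value ...`. A minimal sketch, assuming per-call values override manifest values; whether sprites-py can pass env natively is not established in this thread, and `wrap_with_env` is a hypothetical helper.

```python
from __future__ import annotations

import re
from collections.abc import Mapping, Sequence

_ENV_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def wrap_with_env(
    command: Sequence[str],
    manifest_env: Mapping[str, str] | None,
    call_env: Mapping[str, str] | None = None,
) -> list[str]:
    """Prefix argv with `env NAME=value ...` so the merged env reaches the process."""
    merged = {**(manifest_env or {}), **(call_env or {})}  # per-call values win (assumption)
    for name in merged:
        if not _ENV_NAME.fullmatch(name):
            raise ValueError(f"invalid environment variable name: {name!r}")
    if not merged:
        return list(command)
    return ["env", *(f"{name}={value}" for name, value in merged.items()), *command]
```

For a shell-string exec, `shlex.join` quotes the result. Values passed this way appear in the process's argv, so secrets may warrant a different channel.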

Collaborator

alfozan commented Apr 28, 2026

Could you add cloud mount support for Sprites as part of this provider?

Please add a Sprites mount strategy and focused tests for the common bucket flows, or link a clearly tracked follow-up if cloud mounts should land separately.

@seratch seratch changed the title feat: Add Sprites as a sandbox provider. feat: Add Sprites as a sandbox provider Apr 28, 2026
@seratch seratch changed the title feat: Add Sprites as a sandbox provider feat: add Sprites as a sandbox provider Apr 28, 2026
@seratch seratch marked this pull request as draft April 29, 2026 01:39
vaurdan added 2 commits April 29, 2026 15:32
The platform-context framing now tells the agent that ``sprite-env services
create`` ignores the cwd of the calling shell — services launch with the
host's home directory as cwd by default, which silently breaks any service
the agent tries to expose unless it's pointed at the workspace explicitly.
The framing also picks up the actual ``manifest.root`` so the example uses
the agent's real workspace path. Two new tests cover the warning text and
that the framing reflects a non-default manifest root.

…ounts

Adds an rclone-backed mount strategy for AWS S3 / Cloudflare R2 / Google
Cloud Storage / Azure Blob / Box mounts on Sprites sandboxes. Mirrors the
E2B / Daytona / Runloop pattern: lazy-installs ``rclone`` and the ``fuse``
package via ``sudo -n apt-get`` if they aren't preinstalled in the sprite
image, writes a per-session rclone config, runs ``rclone mount`` in daemon
mode, and tears down via ``fusermount -u`` on session/snapshot stop.

Sprite VMs run as the unprivileged ``sprite`` user with passwordless sudo;
``SpritesSandboxSession.exec`` rejects ``user=`` kwargs, so the install
path prefixes ``sudo -n`` rather than escalating through the framework.
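The install-and-mount flow above can be sketched as pure command builders. These helpers are hypothetical, not the PR's code; the remote name and paths are placeholders. `type`, `provider`, and `secret_access_key` are standard rclone S3 backend options, and `rclone mount --config … --daemon` and `fusermount -u` are the real CLI forms.

```python
import configparser
import io


def install_command() -> str:
    # `sudo -n` fails immediately instead of prompting if passwordless sudo is missing.
    return "sudo -n apt-get update && sudo -n apt-get install -y rclone fuse"


def render_rclone_config(remote: str, settings: dict[str, str]) -> str:
    # interpolation=None so secrets containing '%' are written verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    parser[remote] = settings
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


def mount_argv(config_path: str, remote: str, bucket: str, mountpoint: str) -> list[str]:
    # --daemon backgrounds the mount; --config points at the per-session file.
    return ["rclone", "mount", f"{remote}:{bucket}", mountpoint, "--config", config_path, "--daemon"]


def unmount_argv(mountpoint: str) -> list[str]:
    return ["fusermount", "-u", mountpoint]
```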

Two SDK-side workarounds for current platform behavior:

1. **Stdout sentinels for tool detection.** sprite-env's WS control
   protocol currently ships ``op.complete`` with no ``exitCode`` field, so
   ``ExecResult.ok()`` always reports success. The strategy's detection
   commands wrap the conditional in ``if … then echo __SPRITES_PRESENT__;
   else echo __SPRITES_MISSING__; fi`` so stdout drives the decision
   instead of the (currently unreliable) exit code. The platform-side
   fix is tracked in sprite-env#446; once that lands, ``ExecResult.ok()``
   becomes accurate again, and the stdout-sentinel approach remains valid
   (it's strictly more robust than exit-code checks anyway).

2. **Post-mount dir-cache warmup.** ``rclone mount --daemon`` forks and
   the parent returns immediately; FUSE's first ``readdir`` on the
   mountpoint root then races the daemon's first remote listing fetch
   and can briefly observe an empty directory. ``_verify_mount_active``
   uses ``mountpoint -q`` (with a 5s poll) to confirm the bind landed,
   then issues a throw-away ``ls`` to prime rclone's dir cache before
   handing control back to the caller.
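Both workarounds can be sketched together, assuming a hypothetical `run` coroutine that executes a shell command in the sprite and returns only stdout (exit codes ignored, matching the protocol behavior in item 1). The 5 s window follows item 2; the poll interval and all helper names are assumptions.

```python
import asyncio
import shlex
import time
from collections.abc import Awaitable, Callable

PRESENT = "__SPRITES_PRESENT__"
MISSING = "__SPRITES_MISSING__"

# Hypothetical executor: runs a shell command in the sprite and returns stdout.
RunShell = Callable[[str], Awaitable[str]]


def sentinel_check(condition: str) -> str:
    return f"if {condition}; then echo {PRESENT}; else echo {MISSING}; fi"


def parse_sentinel(stdout: str) -> bool:
    tokens = stdout.split()
    if PRESENT in tokens:
        return True
    if MISSING in tokens:
        return False
    raise RuntimeError(f"no sentinel in command output: {stdout!r}")


async def tool_installed(run: RunShell, tool: str) -> bool:
    return parse_sentinel(await run(sentinel_check(f"command -v {shlex.quote(tool)} >/dev/null 2>&1")))


async def verify_mount_active(
    run: RunShell, mountpoint: str, timeout_s: float = 5.0, interval_s: float = 0.25
) -> None:
    quoted = shlex.quote(mountpoint)
    deadline = time.monotonic() + timeout_s
    while not parse_sentinel(await run(sentinel_check(f"mountpoint -q {quoted}"))):
        if time.monotonic() >= deadline:
            raise RuntimeError(f"{mountpoint} not mounted after {timeout_s}s")
        await asyncio.sleep(interval_s)
    # Throw-away listing to prime rclone's directory cache before callers read.
    await run(f"ls {quoted} >/dev/null 2>&1")
```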

Discriminator: ``"sprites_cloud_bucket"``. Registered through the
polymorphic ``MountStrategyBase`` registry. Compat-guard parametrize
entries pin the public export and the discriminator string.

Twenty unit tests in ``tests/extensions/test_sandbox_sprites.py`` cover
the rclone-installed path, the lazy-install path, the silent-no-op
recovery path, FUSE-support detection, the per-session pattern
adjustment, the session-type guard, the post-mount verification (mounted
+ not-mounted + warmup-ordering), and JSON roundtrip through the
manifest and registry. ``docs/sandbox/clients.md`` storage-entries
matrix updated to show ✓ for S3 / R2 / GCS / Azure / Box on Sprites.
Verified end-to-end against a Tigris (S3-compatible) read-only bucket.
@vaurdan vaurdan force-pushed the feat/sprites-sandbox branch from 8768979 to 0d8fc85 Compare April 29, 2026 15:33