1 change: 1 addition & 0 deletions Cargo.lock

2 changes: 1 addition & 1 deletion architecture/build-containers.md
@@ -9,7 +9,7 @@ The gateway runs the control plane API server. It is deployed as a StatefulSet i
 - **Docker target**: `gateway` in `deploy/docker/Dockerfile.images`
 - **Registry**: `ghcr.io/nvidia/openshell/gateway:latest`
 - **Pulled when**: Cluster startup (the Helm chart triggers the pull)
-- **Entrypoint**: `openshell-gateway --port 8080` (gRPC + HTTP, mTLS)
+- **Entrypoint**: `openshell-gateway --bind-address 0.0.0.0 --port 8080` (gRPC + HTTP, mTLS)

 ## Cluster (`openshell/cluster`)

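The entrypoint change above matters because this PR also moves the default bind address to loopback. A minimal sketch of that behavior, assuming the gateway combines the `--bind-address` value with `--port` into a single socket address; `resolve_bind` is a hypothetical helper for illustration, not a function from the codebase:

```rust
use std::net::{AddrParseError, IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical helper: combine an optional `--bind-address` value with a port.
/// Assumption: when no address is supplied, the gateway falls back to loopback,
/// matching the new `127.0.0.1` default introduced in this PR.
fn resolve_bind(bind_address: Option<&str>, port: u16) -> Result<SocketAddr, AddrParseError> {
    let ip: IpAddr = match bind_address {
        // An explicit value such as "0.0.0.0" is parsed as an IP address.
        Some(raw) => raw.parse()?,
        // No flag and no env var: stay on loopback.
        None => IpAddr::V4(Ipv4Addr::LOCALHOST),
    };
    Ok(SocketAddr::new(ip, port))
}
```

Under this sketch, a container that omits `--bind-address 0.0.0.0` would listen only on loopback and be unreachable through the pod network, which is presumably why the deployed entrypoint now passes the flag explicitly.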
4 changes: 2 additions & 2 deletions architecture/gateway-security.md
@@ -304,9 +304,9 @@ Traffic flows through several layers from the host to the gateway process:
 | Container | `30051` | Hardcoded in `crates/openshell-bootstrap/src/docker.rs` |
 | k3s NodePort | `30051` | `deploy/helm/openshell/values.yaml` (`service.nodePort`) |
 | k3s Service | `8080` | `deploy/helm/openshell/values.yaml` (`service.port`) |
-| Server bind | `8080` | `--port` flag / `OPENSHELL_SERVER_PORT` env var |
+| Server bind | `0.0.0.0:8080` in deployed containers | `--bind-address 0.0.0.0 --port 8080` / `OPENSHELL_BIND_ADDRESS` + `OPENSHELL_SERVER_PORT` |

-Docker maps `host_port → 30051/tcp`. Inside k3s, the NodePort service maps `30051 → 8080 (pod port)`. The server binds `0.0.0.0:8080`.
+Docker maps `host_port → 30051/tcp`. Inside k3s, the NodePort service maps `30051 → 8080 (pod port)`. The deployed gateway container binds `0.0.0.0:8080` explicitly.

 ## Security Model Summary

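The layered port mapping in this hunk can be traced mechanically. The following is an illustrative sketch only: the `Hop` type and the helper names are hypothetical, and the host port is whatever the caller chooses; only the `30051` and `8080` values come from the table above.

```rust
/// One forwarding hop on the path from the host to the gateway process.
#[derive(Debug, Clone, PartialEq)]
struct Hop {
    layer: &'static str,
    from: u16,
    to: u16,
}

/// Build the hop list for a caller-chosen host port.
fn gateway_path(host_port: u16) -> Vec<Hop> {
    vec![
        Hop { layer: "Docker port mapping", from: host_port, to: 30051 },
        Hop { layer: "k3s NodePort service", from: 30051, to: 8080 },
    ]
}

/// Follow the hops and return the port the server must bind,
/// or `None` if the list is empty or two consecutive hops do not line up.
fn final_port(hops: &[Hop]) -> Option<u16> {
    let mut current = hops.first()?.from;
    for hop in hops {
        if hop.from != current {
            return None;
        }
        current = hop.to;
    }
    Some(current)
}

/// Render the path for logs or documentation.
fn describe(hops: &[Hop]) -> String {
    hops.iter()
        .map(|h| format!("{}: {} -> {}", h.layer, h.from, h.to))
        .collect::<Vec<_>>()
        .join("; ")
}
```

The final hop lands on port `8080` inside the pod, so the server must accept traffic on a non-loopback address there; with the new loopback default that requires the explicit `0.0.0.0` bind.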
18 changes: 12 additions & 6 deletions architecture/gateway.md
@@ -107,23 +107,25 @@ The gateway boots in `cli::run_cli` (`crates/openshell-server/src/cli.rs`) and p
    - `docker` constructs `openshell-driver-docker` in-process and manages local containers labeled with the configured sandbox namespace.
    - `vm` spawns the standalone `openshell-driver-vm` binary as a local compute-driver process, resolves it from `--driver-dir`, conventional libexec install paths, or a sibling of the gateway binary, connects to it over a Unix domain socket, and keeps the libkrun/rootfs runtime out of the gateway binary.
 3. Build `ServerState` (shared via `Arc<ServerState>` across all handlers), including a fresh `SupervisorSessionRegistry`.
-4. **Spawn background tasks**:
+4. Resume persisted sandboxes that were stopped during the previous gateway shutdown.
+5. **Spawn background tasks**:
    - `ComputeRuntime::spawn_watchers` -- consumes the compute-driver watch stream, republishes platform events, and runs a periodic `ListSandboxes` snapshot reconcile.
    - `ssh_tunnel::spawn_session_reaper` -- sweeps expired or revoked SSH session tokens from the store hourly.
    - `supervisor_session::spawn_relay_reaper` -- sweeps orphaned pending relay channels every 30 seconds.
-5. Create `MultiplexService`.
-6. Bind `TcpListener` on `config.bind_address`.
-7. Optionally create `TlsAcceptor` from cert/key files.
-8. Enter the accept loop: for each connection, spawn a tokio task that optionally performs a TLS handshake, then calls `MultiplexService::serve()`.
+6. Create `MultiplexService`.
+7. Bind the primary gateway listener and any compute-driver requested listeners. Docker requests the Docker bridge gateway address with the normal gateway port, so sandbox containers can call back over the bridge without joining the host network.
+8. Bind optional health and metrics listeners.
+9. Optionally create `TlsAcceptor` from cert/key files.
+10. Spawn a task per gateway listener. Each accepted connection optionally performs a TLS handshake, then calls `MultiplexService::serve()`.

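New step 7 binds the primary listener plus any driver-requested listeners, and the Docker driver notes in this PR state that an extra address already covered by a wildcard primary bind is not bound a second time. A minimal sketch of that planning step, assuming a wildcard bind covers every address of the same IP family on the same port; `covers` and `plan_listeners` are hypothetical names, and the bridge address used in the usage note is an example value:

```rust
use std::net::SocketAddr;

/// True when a listener on `bound` already accepts traffic addressed to `extra`.
/// Assumption: an unspecified (wildcard) address covers all addresses of the
/// same IP family on the same port.
fn covers(bound: SocketAddr, extra: SocketAddr) -> bool {
    if bound.port() != extra.port() {
        return false;
    }
    if bound.ip() == extra.ip() {
        return true;
    }
    bound.ip().is_unspecified() && bound.is_ipv4() == extra.is_ipv4()
}

/// Return the addresses to bind: the primary first, then every requested
/// address that no earlier entry already covers.
fn plan_listeners(primary: SocketAddr, requested: &[SocketAddr]) -> Vec<SocketAddr> {
    let mut plan = vec![primary];
    for &addr in requested {
        if !plan.iter().any(|bound| covers(*bound, addr)) {
            plan.push(addr);
        }
    }
    plan
}
```

With a loopback primary (`127.0.0.1:8080`) and a bridge request such as `172.18.0.1:8080`, the plan contains both addresses; with a `0.0.0.0:8080` primary, the bridge request is dropped because the wildcard bind already covers it.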
 ## Configuration

 All configuration is via CLI flags with environment variable fallbacks. The `--db-url` flag is required.

 | Flag | Env Var | Default | Description |
 |------|---------|---------|-------------|
+| `--bind-address` | `OPENSHELL_BIND_ADDRESS` | `127.0.0.1` | IP address for gateway, health, and metrics listeners. Container deployments pass `0.0.0.0` explicitly. |
 | `--port` | `OPENSHELL_SERVER_PORT` | `8080` | TCP listen port |
-| `--bind-address` | `OPENSHELL_BIND_ADDRESS` | `0.0.0.0` | Address for the main gateway listener |
 | `--log-level` | `OPENSHELL_LOG_LEVEL` | `info` | Tracing log level filter |
 | `--tls-cert` | `OPENSHELL_TLS_CERT` | None | Path to PEM certificate file |
 | `--tls-key` | `OPENSHELL_TLS_KEY` | None | Path to PEM private key file |
@@ -136,6 +138,7 @@ All configuration is via CLI flags with environment variable fallbacks. The `--d
 | `--sandbox-image` | `OPENSHELL_SANDBOX_IMAGE` | None | Default container image for sandbox pods |
 | `--grpc-endpoint` | `OPENSHELL_GRPC_ENDPOINT` | None | gRPC endpoint reachable from within the cluster (for supervisor callbacks) |
 | `--drivers` | `OPENSHELL_DRIVERS` | `kubernetes` | Compute backend to use. Current options are `kubernetes`, `docker`, and `vm`. |
+| `--docker-network-name` | `OPENSHELL_DOCKER_NETWORK_NAME` | `openshell-docker` | Docker bridge network that local Docker sandboxes join |
 | `--vm-driver-state-dir` | `OPENSHELL_VM_DRIVER_STATE_DIR` | `target/openshell-vm-driver` | Host directory for VM sandbox rootfs, console logs, and runtime state |
 | `--driver-dir` | `OPENSHELL_DRIVER_DIR` | unset | Override directory for `openshell-driver-vm`. When unset, the gateway searches `~/.local/libexec/openshell`, `/usr/libexec/openshell`, `/usr/local/libexec/openshell`, `/usr/local/libexec`, then a sibling binary. |
 | `--vm-krun-log-level` | `OPENSHELL_VM_KRUN_LOG_LEVEL` | `1` | libkrun log level for VM helper processes |
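The `--driver-dir` row describes a search order for the `openshell-driver-vm` binary. A rough sketch of that order, assuming that an explicit `--driver-dir` replaces the search entirely (the table calls it an override, but the real resolution logic is not shown in this diff); `driver_candidates` and its parameters are hypothetical:

```rust
use std::path::{Path, PathBuf};

const DRIVER_BINARY: &str = "openshell-driver-vm";

/// Hypothetical helper: list the paths to probe for the VM driver, in order.
/// Assumption: when `driver_dir` is set, it is the only location consulted.
fn driver_candidates(
    driver_dir: Option<&Path>,
    home: Option<&Path>,
    gateway_exe: &Path,
) -> Vec<PathBuf> {
    if let Some(dir) = driver_dir {
        return vec![dir.join(DRIVER_BINARY)];
    }
    let mut candidates = Vec::new();
    // `~/.local/libexec/openshell` only applies when a home directory is known.
    if let Some(home) = home {
        candidates.push(home.join(".local/libexec/openshell").join(DRIVER_BINARY));
    }
    for dir in [
        "/usr/libexec/openshell",
        "/usr/local/libexec/openshell",
        "/usr/local/libexec",
    ] {
        candidates.push(Path::new(dir).join(DRIVER_BINARY));
    }
    // Last resort: a sibling of the gateway binary itself.
    if let Some(parent) = gateway_exe.parent() {
        candidates.push(parent.join(DRIVER_BINARY));
    }
    candidates
}
```

A caller would probe each candidate in order and use the first one that exists and is executable.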
@@ -608,6 +611,9 @@ The gateway reaches the sandbox exclusively through the supervisor-initiated `Co
 The Docker driver (`crates/openshell-driver-docker/src/lib.rs`) is an in-process compute backend for local standalone gateways. It creates one Docker container per sandbox, labels each container with `openshell.ai/managed-by=openshell`, `openshell.ai/sandbox-id`, `openshell.ai/sandbox-name`, and `openshell.ai/sandbox-namespace`, and bind-mounts a Linux `openshell-sandbox` supervisor binary into the container.

 - **Create**: Pulls or validates the sandbox image according to `sandbox_image_pull_policy`, creates a labeled container, mounts the supervisor binary and optional TLS material, and starts the container with the supervisor as entrypoint.
+- **Bridge networking**: Ensures a local Docker bridge network exists (`openshell-docker` by default) and starts every sandbox container on that network instead of using `network_mode=host`.
+- **Gateway callback routing**: On native Linux Docker, injects `host.openshell.internal` with the bridge gateway IP and reports that bridge gateway IP plus the normal gateway port to `run_server()` as an extra listener. If the primary listener already binds the wildcard address for that port, the extra address is covered and is not bound a second time. On Docker Desktop, the bridge gateway IP belongs to Docker Desktop's VM rather than the macOS/Windows host, so the driver maps `host.openshell.internal` to Docker's `host-gateway` alias and does not request an extra listener. `OPENSHELL_ENDPOINT` inside Docker sandboxes uses the configured scheme and points at `host.openshell.internal:<gateway-port>` in both cases.
+- **Environment ownership**: Merges template and spec environment first, then overwrites driver-owned supervisor variables, including `PATH`, `OPENSHELL_ENDPOINT`, `OPENSHELL_SANDBOX_ID`, `OPENSHELL_SSH_SOCKET_PATH`, and `OPENSHELL_SANDBOX_COMMAND`. This keeps privileged supervisor setup from resolving helper binaries through a user-controlled search path.
 - **List/Get/Watch**: Reads labeled containers in the configured sandbox namespace and derives driver-native sandbox status from Docker state plus supervisor relay readiness.
 - **Stop**: Stops the matching labeled container without deleting it.
 - **Delete**: Force-removes the matching labeled container.
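The environment ownership rule is a layered merge in which the driver's values are applied last. A minimal sketch, assuming the spec environment takes precedence over the template (the bullet only says the two are merged first); `build_env`, `env_of`, and the example values are hypothetical and not taken from the driver source:

```rust
use std::collections::BTreeMap;

type Env = BTreeMap<String, String>;

/// Convenience constructor for examples and tests.
fn env_of(pairs: &[(&str, &str)]) -> Env {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Merge user-controlled layers first, then let driver-owned variables win.
/// Assumption: spec entries override template entries with the same key.
fn build_env(template: &Env, spec: &Env, driver_owned: &Env) -> Env {
    let mut env = template.clone();
    env.extend(spec.iter().map(|(k, v)| (k.clone(), v.clone())));
    // Applied last so a template or spec cannot redirect `PATH`,
    // `OPENSHELL_ENDPOINT`, or the other supervisor variables.
    env.extend(driver_owned.iter().map(|(k, v)| (k.clone(), v.clone())));
    env
}
```

Because the driver layer is written last, a template that sets `PATH=/tmp/evil` ends up with the driver's `PATH`, while unrelated user variables pass through unchanged.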
11 changes: 5 additions & 6 deletions architecture/sandbox-custom-containers.md
@@ -9,7 +9,7 @@ The `--from` flag accepts four kinds of input:
 | Input | Example | Behavior |
 |-------|---------|----------|
 | **Community sandbox name** | `--from openclaw` | Resolves to `ghcr.io/nvidia/openshell-community/sandboxes/openclaw:latest` |
-| **Dockerfile path** | `--from ./Dockerfile` | Builds the image, pushes it into the cluster, then creates the sandbox |
+| **Dockerfile path** | `--from ./Dockerfile` | Builds the image into the host Docker daemon, then creates the sandbox |
 | **Directory with Dockerfile** | `--from ./my-sandbox/` | Uses the directory as the build context |
 | **Full image reference** | `--from myregistry.com/img:tag` | Uses the image directly |

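The four `--from` input kinds in the table can be thought of as a classification step. The sketch below is speculative: the diff does not show how the CLI actually distinguishes the cases, so the detection heuristics (filesystem probes, then a `/` or `:` check) are assumptions, and `FromSource` and `classify_from` are hypothetical names. Only the community registry expansion is taken from the table.

```rust
use std::path::{Path, PathBuf};

#[derive(Debug, PartialEq)]
enum FromSource {
    /// Bare name, expanded to the community registry image reference.
    Community(String),
    /// Path to an existing Dockerfile.
    Dockerfile(PathBuf),
    /// Existing directory used as the build context.
    Directory(PathBuf),
    /// Anything else that looks like a registry image reference.
    Image(String),
}

/// Assumed heuristics: existing directory, then existing file, then
/// "contains `/` or `:`" for image references, else a community name.
fn classify_from(input: &str) -> FromSource {
    let path = Path::new(input);
    if path.is_dir() {
        FromSource::Directory(path.to_path_buf())
    } else if path.is_file() {
        FromSource::Dockerfile(path.to_path_buf())
    } else if input.contains('/') || input.contains(':') {
        FromSource::Image(input.to_string())
    } else {
        FromSource::Community(format!(
            "ghcr.io/nvidia/openshell-community/sandboxes/{input}:latest"
        ))
    }
}
```

After this PR, the `Dockerfile` and `Directory` cases lead to a local build only, so the resulting tag is usable by gateways that share the host Docker image store.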
@@ -33,8 +33,7 @@ The community registry prefix defaults to `ghcr.io/nvidia/openshell-community/sa
 When `--from` points to a Dockerfile or directory, the CLI:

 1. Builds the image locally via the Docker daemon (respecting `.dockerignore`).
-2. Pushes it into the cluster's containerd runtime using `docker save` / `ctr import`.
-3. Creates the sandbox with the resulting image tag.
+2. Creates the sandbox with the resulting local image tag.

 ## How It Works

@@ -107,18 +106,18 @@ The `openshell-sandbox` supervisor adapts to arbitrary environments:
 |----------|-----------|
 | Unified `--from` flag | Single entry point for community names, Dockerfiles, directories, and image refs — removes the need to know registry paths |
 | Community name resolution | Bare names like `openclaw` expand to the GHCR community registry, making the common case simple |
-| Auto build+push for Dockerfiles | Eliminates the two-step `image push` + `create` workflow for local development |
+| Auto build for Dockerfiles | Eliminates the two-step `docker build` + `create` workflow for local Docker-backed development |
 | `OPENSHELL_COMMUNITY_REGISTRY` env var | Allows organizations to host their own community sandbox registry |
 | hostPath side-load | Supervisor binary lives on the node filesystem — no init container, no emptyDir, no extra image pull. Faster pod startup. |
 | Read-only mount in agent | The supervisor binary is mounted read-only, and the startup seccomp prelude blocks the remount syscalls that would otherwise reopen it for writes once privileged bootstrap has completed. |
 | Command override | Ensures `openshell-sandbox` is the entrypoint regardless of the image's default CMD |
 | Clear `run_as_user/group` for custom images | Prevents startup failure when the image lacks the default `sandbox` user |
 | Non-fatal log file init | `/var/log/openshell.log` may be unwritable in arbitrary images; falls back to stdout |
-| `docker save` / `ctr import` for push | Avoids requiring a registry for local dev; images land directly in the k3s containerd store |
+| Host Docker image store | Dockerfile sources build into the host Docker daemon and are referenced by local image tag. |
 | Optional `iptables` for bypass detection | Core network isolation works via routing alone (`iproute2`); `iptables` only adds fast-fail (`ECONNREFUSED`) and diagnostic LOG entries. Making it optional avoids hard failures in minimal images that lack `iptables` while giving better UX when it is available. |

 ## Limitations

 - Distroless / `FROM scratch` images are not supported (the supervisor needs glibc and `/proc`)
 - Missing `iproute2` (or required capabilities) blocks startup in proxy mode because namespace isolation is mandatory
-- The supervisor binary must be present on the k3s node at `/opt/openshell/bin/openshell-sandbox` (embedded in the cluster image at build time)
+- Kubernetes gateways require images to be available to the cluster runtime. Dockerfile sources build into the host Docker daemon only; use a registry image reference when the selected gateway cannot access the host Docker image store.
36 changes: 4 additions & 32 deletions crates/openshell-bootstrap/src/build.rs
@@ -1,11 +1,10 @@
 // SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 // SPDX-License-Identifier: Apache-2.0

-//! Build and push container images into a k3s gateway.
+//! Build container images from Dockerfiles.
 //!
 //! This module wraps bollard's `build_image()` API to build a container image
-//! from a Dockerfile and build context, then reuses the existing push pipeline
-//! to import the image into the gateway's containerd runtime.
+//! from a Dockerfile and build context into the local Docker daemon.

 use std::collections::HashMap;
 use std::path::Path;
@@ -15,48 +14,21 @@ use bollard::query_parameters::BuildImageOptionsBuilder;
 use futures::StreamExt;
 use miette::{IntoDiagnostic, Result, WrapErr};

-use crate::constants::container_name;
-use crate::push::push_local_images;
-
-/// Build a container image from a Dockerfile and push it into the gateway.
-///
-/// This is used by `openshell sandbox create --from <Dockerfile>`. It:
-/// 1. Creates a tar archive of the build context directory.
-/// 2. Sends it to the local Docker daemon via `build_image()`.
-/// 3. Pushes the resulting image into the gateway's containerd via the
-/// existing `push_local_images()` pipeline.
+/// Build a container image from a Dockerfile into the local Docker daemon.
 #[allow(clippy::implicit_hasher)]
-pub async fn build_and_push_image(
+pub async fn build_local_image(
     dockerfile_path: &Path,
     tag: &str,
     context_dir: &Path,
-    gateway_name: &str,
     build_args: &HashMap<String, String>,
     on_log: &mut impl FnMut(String),
 ) -> Result<()> {
-    // 1. Build the image locally.
     on_log(format!(
         "Building image {tag} from {}",
         dockerfile_path.display()
     ));
     build_image(dockerfile_path, tag, context_dir, build_args, on_log).await?;
     on_log(format!("Built image {tag}"));
-
-    // 2. Push into the gateway.
-    on_log(format!(
-        "Pushing image {tag} into gateway \"{gateway_name}\""
-    ));
-    // Use the long-timeout Docker client so `docker save` of multi-GB images
-    // doesn't trip the 120s bollard default mid-stream. Override with
-    // OPENSHELL_DOCKER_TIMEOUT_SECS=<secs>.
-    let local_docker = crate::docker::connect_local_for_large_transfers()
-        .into_diagnostic()
-        .wrap_err("failed to connect to local Docker daemon")?;
-    let container = container_name(gateway_name);
-    let images: Vec<&str> = vec![tag];
-    push_local_images(&local_docker, &local_docker, &container, &images, on_log).await?;
-
-    on_log(format!("Image {tag} is available in the gateway."));
     Ok(())
 }

5 changes: 2 additions & 3 deletions crates/openshell-cli/src/run.rs
@@ -2877,19 +2877,18 @@ async fn build_from_dockerfile(
         eprintln!(" {msg}");
     };

-    openshell_bootstrap::build::build_and_push_image(
+    openshell_bootstrap::build::build_local_image(
         dockerfile,
         &tag,
         context,
-        gateway_name,
         &HashMap::new(),
         &mut on_log,
     )
     .await?;

     eprintln!();
     eprintln!(
-        "{} Image {} is available in the gateway.",
+        "{} Image {} is available in the local Docker daemon.",
         "✓".green().bold(),
         tag.cyan(),
     );
11 changes: 10 additions & 1 deletion crates/openshell-core/src/config.rs
@@ -31,6 +31,9 @@ pub const DEFAULT_SSH_HANDSHAKE_SKEW_SECS: u64 = 300;
 /// Default Podman bridge network name.
 pub const DEFAULT_NETWORK_NAME: &str = "openshell";

+/// Default Docker bridge network name for local sandboxes.
+pub const DEFAULT_DOCKER_NETWORK_NAME: &str = "openshell-docker";
+
 /// Default OCI image for the openshell-sandbox supervisor binary.
 pub const DEFAULT_SUPERVISOR_IMAGE: &str = "openshell/supervisor:latest";

@@ -515,7 +518,7 @@ impl Config {
 }

 fn default_bind_address() -> SocketAddr {
-    "0.0.0.0:8080".parse().expect("valid default address")
+    "127.0.0.1:8080".parse().expect("valid default address")
 }

 fn default_log_level() -> String {
@@ -589,6 +592,12 @@ mod tests {
         assert!(err.contains("unsupported compute driver 'firecracker'"));
     }

+    #[test]
+    fn config_defaults_to_loopback_bind_address() {
+        let expected: SocketAddr = "127.0.0.1:8080".parse().expect("valid address");
+        assert_eq!(Config::new(None).bind_address, expected);
+    }
+
     #[test]
     fn config_new_disables_health_bind_by_default() {
         let cfg = Config::new(None);
1 change: 1 addition & 0 deletions crates/openshell-driver-docker/Cargo.toml
@@ -22,6 +22,7 @@ bytes = { workspace = true }
 bollard = { version = "0.20" }
 tar = "0.4"
 tempfile = "3"
+url = { workspace = true }

 [lints]
 workspace = true