
feat(rpm): add RPM packaging with Packit/COPR and GHA release publishing #1126

Open

maxamillion wants to merge 18 commits into NVIDIA:main from maxamillion:rpm


@maxamillion
Collaborator

Summary

Add RPM packaging for OpenShell using Fedora's Packit/COPR ecosystem and integrate RPM build artifacts into GitHub Actions release pipelines. This provides native Fedora/EPEL RPM packages for the CLI, gateway server (with Podman driver and mTLS), and Python SDK.

Changes

RPM Packaging Foundation

  • Add openshell.spec producing three sub-packages: openshell (CLI), openshell-gateway (server + systemd user unit + PKI scripts), and python3-openshell (SDK)
  • Add .packit.yaml for Packit service integration with COPR builds on PRs, main commits, and releases
  • Add deploy/rpm/init-pki.sh for auto-generating mTLS PKI on first gateway start
  • Add deploy/rpm/init-gateway-env.sh for auto-generating gateway environment config
  • Add man pages: openshell.1, openshell-gateway.8, openshell-gateway.env.5
  • Add RPM-specific documentation: QUICKSTART.md, CONFIGURATION.md, TROUBLESHOOTING.md

GHA Release Integration

  • Add .github/workflows/rpm-package.yml, a reusable workflow that runs packit build locally inside Fedora containers
  • Integrate RPM builds into release-dev.yml and release-tag.yml alongside existing deb packages
  • RPMs are built independently from source (no dependency on pre-built binary jobs), uploaded as GHA artifacts, and published in GitHub Releases

Podman Driver Enhancements

  • Enable mTLS by default for RPM gateway packaging
  • Rename tls_ca/cert/key to guest_tls_ca/cert/key for consistency with the Docker and VM drivers
  • Add client_lifecycle_managed field to GatewayMetadata for correct destroy/stop behavior on non-Docker systems
  • Make gateway bind address configurable via --host / OPENSHELL_BIND_HOST
  • Fix gateway destroy failing on non-Docker systems because it looked for docker.sock
  • Skip cert extraction when TLS certs already exist on disk
  • Warn on http:// registration when mTLS certs exist
  • Resolve mTLS cert path mismatch and SELinux bind-mount denial
  • Consolidate supervisor and supervisor-sideload into a single image

Testing

  • Validated locally with packit build locally in a Fedora container
  • cargo check passes
  • cargo clippy passes
  • cargo fmt passes
  • E2E tests (requires CI run)

Checklist

  • Follows Conventional Commits
  • Architecture docs updated
  • Commits are signed off (DCO)

maxamillion added 18 commits May 1, 2026 14:02
Add .packit.yaml and openshell.spec for building RPMs via Fedora COPR.
Produces three sub-packages: openshell (CLI), openshell-gateway (server
with system and user systemd units), and python3-openshell (SDK). The
user unit supports rootless Podman operation out of the box.

- Restrict sysconfig to 0640 (contains SSH handshake secret)
- Auto-generate SSH handshake secret in %post on fresh install
- Add security warning about TLS-disabled exposure in sysconfig
- Add systemd hardening (NoNewPrivileges, ProtectSystem, PrivateTmp,
  RestrictAddressFamilies) to both system and user units
- Make user unit self-contained with inline Environment= defaults
  instead of reading the system sysconfig (which contains secrets)

The user unit was missing OPENSHELL_SSH_HANDSHAKE_SECRET since it no
longer reads the system sysconfig. Add an ExecStartPre that generates
a random secret into ~/.config/openshell/gateway.env on first start,
and an EnvironmentFile directive to read it back.
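A minimal Rust sketch of the first-start logic described above, assuming a Linux host (it reads /dev/urandom). The function name ensure_handshake_secret, the 32-byte hex-encoded secret, and the 0600 file mode are illustrative assumptions; the commit itself does this with a shell ExecStartPre step. The property that matters is idempotence: an existing gateway.env is never overwritten, so the secret survives restarts.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Hypothetical helper mirroring the ExecStartPre step: write a random
/// OPENSHELL_SSH_HANDSHAKE_SECRET to `env_path` only if the file does not exist.
/// Returns Ok(true) when a new file was written, Ok(false) when it already existed.
pub fn ensure_handshake_secret(env_path: &Path) -> io::Result<bool> {
    if env_path.exists() {
        // Already provisioned: never rotate the secret on restart.
        return Ok(false);
    }
    if let Some(dir) = env_path.parent() {
        fs::create_dir_all(dir)?;
    }

    // 32 random bytes from the kernel CSPRNG, hex-encoded to 64 characters.
    let mut bytes = [0u8; 32];
    File::open("/dev/urandom")?.read_exact(&mut bytes)?;
    let secret: String = bytes.iter().map(|b| format!("{b:02x}")).collect();

    // create_new fails with AlreadyExists if another process created the file
    // in the meantime; mode 0600 keeps the secret private to the user.
    let mut f = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(env_path)?;
    writeln!(f, "OPENSHELL_SSH_HANDSHAKE_SECRET={secret}")?;
    Ok(true)
}
```

Called on every start, the first call creates the file and later calls return Ok(false) without touching it.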
Socket activation was unreliable on some Fedora systems, causing the
gateway to fail with 'Connection refused' on the podman socket. Switch
to Requires=podman.service + After=podman.service so the podman API
server is running before the gateway starts.

Introduce supervisor-sideload image alongside the existing supervisor
image in CI workflows, docker build scripts, and RPM packaging. The
sideload variant (FROM scratch) is intended for Podman image-volume
mounts used by the RPM-packaged gateway.

…ND_HOST

The gateway previously hardcoded 0.0.0.0 for all listeners. Add a
--host / OPENSHELL_BIND_HOST parameter so the bind address can be
set at runtime. The binary default remains 0.0.0.0 for backward
compatibility, but the RPM sysconfig and systemd units default to
127.0.0.1 since the RPM targets single-host deployments that should
not expose the unauthenticated API on the network.
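The precedence described above (explicit --host, then OPENSHELL_BIND_HOST, then the 0.0.0.0 default) can be sketched as follows. resolve_bind_addr is a hypothetical helper, not the gateway's actual code; it takes the environment value as a parameter so it stays testable, and it only accepts IP literals, not hostnames.

```rust
use std::net::{AddrParseError, IpAddr, Ipv4Addr, SocketAddr};

/// Hypothetical resolver for the listener address: an explicit --host value
/// wins, then the OPENSHELL_BIND_HOST value, then the backward-compatible
/// 0.0.0.0 default.
pub fn resolve_bind_addr(
    cli_host: Option<&str>,
    env_host: Option<&str>,
    port: u16,
) -> Result<SocketAddr, AddrParseError> {
    let ip = match cli_host.or(env_host) {
        // Accepts IPv4 or IPv6 literals such as "127.0.0.1" or "::1".
        Some(raw) => raw.parse::<IpAddr>()?,
        // Binary default: listen on all IPv4 interfaces.
        None => IpAddr::V4(Ipv4Addr::UNSPECIFIED),
    };
    Ok(SocketAddr::new(ip, port))
}
```

A caller would pass std::env::var("OPENSHELL_BIND_HOST").ok().as_deref() as env_host, so the RPM's 127.0.0.1 default only applies when the unit sets the variable.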
…ker.sock

gateway add registrations on loopback were misclassified as embedded
Docker containers, causing destroy/stop to attempt Docker socket
connections on systems using Podman or other external runtimes.

Add client_lifecycle_managed field to GatewayMetadata to distinguish
gateway start deployments (client-managed) from gateway add
registrations (externally-managed). Destroy/stop now skips container
operations for externally-managed gateways. Legacy metadata without
the field preserves existing behavior.

Also gitignore RPM build artifacts (*.src.rpm, *.tar.gz, *.tar.xz).
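One way to express the destroy/stop decision, assuming client_lifecycle_managed is an Option<bool> (the review excerpt in this PR sets it to Some(false)) and assuming the legacy fallback is the old local-versus-remote heuristic. The struct below is a trimmed, hypothetical stand-in for GatewayMetadata, not its real definition.

```rust
/// Trimmed stand-in for the CLI's gateway metadata; only the fields this
/// decision needs are modeled (the real struct has more).
#[derive(Debug, Default)]
pub struct GatewayMetadata {
    pub is_remote: bool,
    pub client_lifecycle_managed: Option<bool>,
}

/// Whether destroy/stop should operate on containers for this gateway.
/// - Some(true): created by `gateway start`, so the client owns the container.
/// - Some(false): registered with `gateway add`, so it is managed externally.
/// - None: legacy metadata; assumed to fall back to "local means managed".
pub fn should_manage_containers(meta: &GatewayMetadata) -> bool {
    meta.client_lifecycle_managed.unwrap_or(!meta.is_remote)
}
```

Using Option rather than a plain bool is what lets old metadata files keep their previous behavior while new registrations record their origin explicitly.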
… single image

The supervisor-sideload image (FROM scratch, binary only) was a separate
build artifact for the Podman driver's image-volume mount. The Ubuntu-based
supervisor image served the Docker driver's binary extraction path.

Both use cases work with a single FROM scratch image: the Docker driver
extracts the binary via the container archive API (never starts the
container), and the Podman driver mounts the image filesystem directly.

Consolidate to a single supervisor image (FROM scratch) and remove all
supervisor-sideload references from CI workflows, build tasks, RPM spec,
e2e tests, and architecture docs. The supervisor-output build target is
retained as a backward-compat alias.

The Podman driver previously ran with TLS disabled, and the RPM bound
the gateway to 127.0.0.1 to limit exposure. However, loopback-bound
services are unreachable from Podman containers in both bridge and
pasta networking modes, breaking sandbox-to-gateway communication.

Instead of fighting the container networking model, solve the security
concern directly: bind to 0.0.0.0 with mTLS enabled by default.

PKI bootstrap:
- Add deploy/rpm/init-pki.sh that generates a self-signed CA, server
  cert, and client cert using OpenSSL on first gateway start
- Called from ExecStartPre in the systemd user unit (idempotent)
- Client certs copied to CLI auto-discovery directory so the CLI
  connects with mTLS without manual setup

Podman driver TLS injection:
- Add tls_ca/tls_cert/tls_key config fields to PodmanComputeConfig
- Bind-mount client certs into sandbox containers at
  /etc/openshell/tls/client/ (read-only)
- Set OPENSHELL_TLS_CA/CERT/KEY env vars so the supervisor connects
  with mTLS
- Auto-detect endpoint scheme (http vs https) based on TLS config
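The TLS injection steps above roughly amount to the following mapping. GuestTls, SandboxTls, sandbox_tls_setup, and the in-container file names (ca.crt, tls.crt, tls.key) are assumptions for illustration; the mount directory /etc/openshell/tls/client/ and the OPENSHELL_TLS_CA/CERT/KEY variable names come from the commit message. The bind-mount strings use podman run -v syntax; the real driver may describe mounts through the Podman API instead.

```rust
use std::path::PathBuf;

/// Host-side client CA, cert, and key paths (illustrative stand-in for the
/// driver's TLS config fields).
pub struct GuestTls {
    pub ca: PathBuf,
    pub cert: PathBuf,
    pub key: PathBuf,
}

/// What the driver would hand to a sandbox container.
pub struct SandboxTls {
    pub scheme: &'static str,       // scheme the supervisor uses to reach the gateway
    pub mounts: Vec<String>,        // "host:guest:ro" bind-mount specs
    pub env: Vec<(String, String)>, // env vars the supervisor reads
}

const GUEST_TLS_DIR: &str = "/etc/openshell/tls/client";

pub fn sandbox_tls_setup(tls: Option<&GuestTls>) -> SandboxTls {
    let Some(tls) = tls else {
        // No client certs configured: plain HTTP and nothing to mount.
        return SandboxTls { scheme: "http", mounts: Vec::new(), env: Vec::new() };
    };
    let mut mounts = Vec::new();
    let mut env = Vec::new();
    for (var, host_path, file_name) in [
        ("OPENSHELL_TLS_CA", &tls.ca, "ca.crt"),
        ("OPENSHELL_TLS_CERT", &tls.cert, "tls.crt"),
        ("OPENSHELL_TLS_KEY", &tls.key, "tls.key"),
    ] {
        let guest_path = format!("{}/{}", GUEST_TLS_DIR, file_name);
        // Read-only so the sandbox cannot modify its credentials.
        mounts.push(format!("{}:{}:ro", host_path.display(), guest_path));
        env.push((var.to_string(), guest_path));
    }
    SandboxTls { scheme: "https", mounts, env }
}
```

Deriving the scheme from the same Option that drives the mounts keeps the two from drifting apart: the supervisor is only told to use https when it also receives certificates.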

RPM packaging:
- Revert OPENSHELL_BIND_HOST from 127.0.0.1 to 0.0.0.0
- Remove OPENSHELL_DISABLE_TLS=true from defaults
- Add TLS cert path env vars for gateway and Podman driver
- Add init-pki.sh to /usr/libexec/openshell/
- Add GATEWAY-CONFIG.md documentation to /usr/share/doc/
- Add openssl as a dependency

Align the Podman driver's TLS config field names with the Docker and
VM drivers, which use guest_tls_ca/guest_tls_cert/guest_tls_key for
client certificates injected into sandbox containers.

gateway add --local tries to extract TLS certificates from a Docker
container, which fails on RPM/systemd deployments where the gateway
runs as a native service (not in a container). When init-pki.sh has
already provisioned client certs to the CLI auto-discovery directory,
skip the Docker-based extraction and use the existing certs.

…date connectivity

gateway add previously stored metadata without verifying the gateway
was reachable, and silently accepted http:// endpoints even when mTLS
client certs were already on disk (e.g., from RPM init-pki.sh).

Add two validations to gateway_add:

1. When registering an http:// endpoint and mTLS certs exist for the
   gateway name, warn the user and suggest the https:// equivalent
   (with --local for loopback endpoints).

2. After storing metadata, perform a non-fatal health check against
   the gateway. If unreachable, print a warning. This catches scheme
   mismatches, wrong ports, and unreachable hosts at registration
   time rather than at first sandbox create.
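Validation 1 can be sketched as a pure function. http_registration_warning and its message text are hypothetical, and the non-fatal health check from validation 2 is not modeled because it needs a live gateway. The caller is assumed to have already decided whether mTLS certs exist for the endpoint.

```rust
/// Extracts the host from "host:port/path" or "[v6]:port/path".
fn host_of(authority_and_path: &str) -> &str {
    let authority = authority_and_path.split('/').next().unwrap_or("");
    match authority.strip_prefix('[') {
        Some(v6) => v6.split(']').next().unwrap_or(""),
        None => authority.split(':').next().unwrap_or(""),
    }
}

/// Returns a warning when an http:// endpoint is registered even though mTLS
/// client certs are already on disk; loopback hosts also get a --local hint.
pub fn http_registration_warning(endpoint: &str, certs_on_disk: bool) -> Option<String> {
    // Only plain-HTTP registrations are suspicious.
    let rest = endpoint.strip_prefix("http://")?;
    if !certs_on_disk {
        return None;
    }
    let loopback = matches!(host_of(rest), "localhost" | "127.0.0.1" | "::1");
    let local_hint = if loopback { " with --local" } else { "" };
    Some(format!(
        "warning: mTLS client certificates exist for this gateway; \
         consider registering https://{rest}{local_hint} instead"
    ))
}
```

Returning an Option keeps the check non-fatal: the caller prints the message if present and still stores the registration.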
The cert-on-disk checks in gateway_add used the raw URL-derived
hostname (e.g., 'localhost') to look up client certs, but init-pki.sh
writes them under the 'openshell' gateway name. The TLS resolver in
tls.rs maps localhost/127.0.0.1 to 'openshell' — the cert detection
was not applying the same mapping.

Extract mtls_certs_exist_for_endpoint() which applies the loopback ->
'openshell' name mapping, replacing two identical inline cert checks.
This fixes the mTLS warning not firing for http://localhost:8080 and
the cert extraction skip not working for https://localhost:8080.
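A sketch of the loopback-to-'openshell' mapping and the cert check, assuming certs live under &lt;config_dir&gt;/gateways/&lt;name&gt;/mtls/ (the layout init-pki.sh writes to) with hypothetical file names ca.crt, tls.crt, and tls.key. gateway_name_for_host and mtls_certs_exist_for_host are illustrative names, not the PR's actual signatures.

```rust
use std::path::{Path, PathBuf};

/// Loopback hosts map to the canonical "openshell" gateway name used by
/// init-pki.sh; any other host is used as the gateway name unchanged.
pub fn gateway_name_for_host(host: &str) -> &str {
    match host {
        "localhost" | "127.0.0.1" | "::1" => "openshell",
        other => other,
    }
}

/// True only when every client cert file exists for the mapped gateway name.
pub fn mtls_certs_exist_for_host(config_dir: &Path, host: &str) -> bool {
    let dir: PathBuf = config_dir
        .join("gateways")
        .join(gateway_name_for_host(host))
        .join("mtls");
    ["ca.crt", "tls.crt", "tls.key"]
        .into_iter()
        .all(|f| dir.join(f).is_file())
}
```

Routing both checks through one mapping function is what prevents the mismatch the commit describes, where detection looked under gateways/localhost/ while the certs sat under gateways/openshell/.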
…enial

Two fixes for RPM/systemd gateway deployments on Fedora/RHEL:

1. Gateway name derivation: loopback endpoints (localhost, 127.0.0.1,
   ::1) now derive the canonical 'openshell' gateway name when no
   --name is provided. This matches the convention used by init-pki.sh,
   default_tls_dir, mtls_certs_exist_for_endpoint, and bootstrap.
   Previously the raw hostname was used (e.g. 'localhost'), causing
   tls_dir_for_gateway to look in gateways/localhost/mtls/ while
   init-pki.sh placed certs in gateways/openshell/mtls/.

2. SELinux bind-mount relabeling: TLS cert bind-mounts now include the
   'z' (shared relabel) option when SELinux is enabled. Without this,
   SELinux MAC policy denies the container process access to the
   bind-mounted cert files, causing 'failed to read CA cert' errors
   on Fedora/RHEL where SELinux is enforcing by default. Detection
   checks /sys/fs/selinux presence to cover both enforcing and
   permissive modes, matching Podman's own behavior.
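The SELinux handling in fix 2 can be sketched like this. The detection follows the commit (selinuxfs present at /sys/fs/selinux, which covers enforcing and permissive modes); the mount string uses podman run -v option syntax, where z requests a shared relabel of the host files. The helper names are hypothetical.

```rust
use std::path::Path;

/// SELinux counts as enabled whenever selinuxfs is mounted at /sys/fs/selinux,
/// regardless of whether it is enforcing or permissive.
pub fn selinux_enabled() -> bool {
    Path::new("/sys/fs/selinux").exists()
}

/// Builds a read-only bind-mount spec; with SELinux enabled, the `z` option
/// relabels the host files with a shared container label so the sandbox
/// process is allowed to read them.
pub fn tls_bind_mount(host: &str, guest: &str, selinux: bool) -> String {
    if selinux {
        format!("{host}:{guest}:ro,z")
    } else {
        format!("{host}:{guest}:ro")
    }
}
```

The shared z label (rather than the private Z label) lets several sandbox containers read the same client cert files at once.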
…ployment

Remove the system unit and /etc/sysconfig file -- the RPM now ships
only the systemd user unit for rootless Podman operation. Replace the
single GATEWAY-CONFIG.md with three focused guides (QUICKSTART,
CONFIGURATION, TROUBLESHOOTING) covering prerequisites, gateway
registration, provider setup, CLI compatibility, remote access,
air-gap deployment, and upgrade procedures.

Add init-gateway-env.sh to generate a well-commented gateway.env with
an auto-generated SSH handshake secret on first start, replacing the
inline ExecStartPre one-liner. Fix the systemd unit dependency from
podman.service to podman.socket for proper socket activation.

Add man pages for openshell(1), openshell-gateway(8), and
openshell-gateway.env(5) built from pandoc markdown sources in
deploy/man/, shared across packaging formats.

Fix GatewayMetadata struct literals that were auto-merged without
the OIDC fields added in main. Use ..Default::default() consistently.
Also fix clippy map_unwrap_or lint in mTLS cert detection and remove
extra blank line in cli.rs from conflict resolution.

Integrate cargo-rpm-macros >= 25 to automatically generate
Provides: bundled(crate(...)) metadata required for Fedora
package review. Replaces manual .cargo/config.toml with
%cargo_prep -v vendor and adds %cargo_vendor_manifest and
%cargo_license macros to produce cargo-vendor.txt and
LICENSE.dependencies at build time.

Build RPMs from source using packit build locally in a Fedora
container, mirroring the Debian package workflow pattern. RPMs are
built independently (no dependency on pre-built binary jobs), uploaded
as GHA artifacts, and included in GitHub Releases alongside debs.

The existing Packit/COPR build path (.packit.yaml) is unchanged and
continues to serve Fedora repository consumers independently.
maxamillion requested a review from a team as a code owner May 1, 2026 22:05
copy-pr-bot (bot) commented May 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

gateway_endpoint: endpoint.to_string(),
is_remote: true,
auth_mode: Some("cloudflare_jwt".to_string()),
client_lifecycle_managed: Some(false),
Collaborator

Making a mental note here: we'll remove this once we remove the openshell gateway bootstrapping commands.

Comment on lines -322 to +329:
Before: Gateway (host, port 8080)
After: Gateway (host, 0.0.0.0:8080, mTLS enabled)
Collaborator

As discussed offline, I'll see if we can follow up and replace the 0.0.0.0 bind with something more restricted by default: the gateway would listen on two interfaces, 127.0.0.1 and another for the sandbox network.
