From 13e3a02a12c5ae002b4ded8259ac8aec84cf7c2f Mon Sep 17 00:00:00 2001 From: David O'Keeffe Date: Sat, 16 May 2026 22:18:37 +1000 Subject: [PATCH 1/2] ci(release): generate signed SBOM on each release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds supply-chain provenance to every GitHub Release so enterprise security teams (PCI-DSS / ISO 27001 / APRA CPS 234) can verify what shipped and prove it came from this repo's workflow. What's attached to each release now: - coda-sbom.cdx.json — CycloneDX SBOM (Python + npm deps via syft) - coda-sbom.cdx.json.cosign.bundle — cosign keyless signature bundle (cert + signature + Rekor inclusion proof) Signing uses GitHub OIDC — no long-lived keys. The signing identity is anchored to this workflow path and the release tag, and a public transparency-log entry is recorded in Rekor. Workflow changes: - Added `id-token: write` permission (required for OIDC keyless signing) - Added anchore/sbom-action step (SHA-pinned, format=cyclonedx-json) - Added sigstore/cosign-installer + sign-blob + in-workflow verify - Extended softprops/action-gh-release `files:` to attach both artefacts README changes: - New "Verifying release provenance" subsection with the cosign verify-blob command operators can run. Co-authored-by: Isaac --- .github/workflows/release.yml | 45 +++++++++++++++++++++++++++++++++++ README.md | 22 +++++++++++++++++ 2 files changed, 67 insertions(+) diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index e9973c9..7ba7d2f 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -14,6 +14,9 @@ jobs: runs-on: databrickslabs-protected-runner-group permissions: contents: write + # id-token: write is required for cosign keyless signing via GitHub OIDC. + # Without it, cosign falls back to interactive auth and the workflow hangs. 
+ id-token: write steps: - name: Checkout @@ -111,6 +114,45 @@ jobs: git tag -a "$TAG" -m "Release $TAG" git push origin "$TAG" + # ----- Supply-chain provenance: SBOM + cosign keyless signature --------- + # Generates a CycloneDX SBOM from the repo (Python + npm package metadata), + # then signs it with cosign using a short-lived OIDC token from GitHub. + # Verifiers can confirm the SBOM came from this workflow at this tag via: + # cosign verify-blob --bundle coda-sbom.cdx.json.cosign.bundle \ + # --certificate-identity-regexp 'https://github.com/databrickslabs/coding-agents-databricks-apps/.+' \ + # --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + # coda-sbom.cdx.json + - name: Generate CycloneDX SBOM + uses: anchore/sbom-action@9f7302141466aa6482940f15371237e9d9f4c34a # v0.20.5 + with: + path: . + format: cyclonedx-json + output-file: coda-sbom.cdx.json + # Don't auto-upload; we attach via softprops below for one consistent release. + upload-artifact: false + upload-release-assets: false + + - name: Install cosign + uses: sigstore/cosign-installer@d7d6e07ee54d2049ce5cdfc7eed4d6a6ccd80f5b # v3.5.0 + with: + cosign-release: v2.4.1 + + - name: Sign SBOM with cosign (keyless OIDC) + run: | + # --yes auto-confirms the Sigstore transparency log entry (Rekor). + # The resulting bundle contains the signature + certificate + Rekor + # inclusion proof in one self-contained file — easier for downstream + # verifiers than separate .sig/.cert files. + cosign sign-blob --yes \ + --bundle coda-sbom.cdx.json.cosign.bundle \ + coda-sbom.cdx.json + # Sanity check: verify what we just signed before publishing. 
+ cosign verify-blob \ + --bundle coda-sbom.cdx.json.cosign.bundle \ + --certificate-identity-regexp 'https://github.com/${{ github.repository }}/.+' \ + --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + coda-sbom.cdx.json + - name: Create GitHub Release uses: softprops/action-gh-release@b4309332981a82ec1c5618f44dd2e27cc8bfbfda # v3 with: @@ -118,3 +160,6 @@ jobs: name: "${{ steps.version.outputs.TAG }}" body: ${{ steps.notes.outputs.NOTES }} prerelease: ${{ inputs.prerelease }} + files: | + coda-sbom.cdx.json + coda-sbom.cdx.json.cosign.bundle diff --git a/README.md b/README.md index f179bed..2227586 100644 --- a/README.md +++ b/README.md @@ -292,6 +292,28 @@ This template repo opens that vision up for every Databricks user — no IDE set Single-user app — the owner is resolved via the app's service principal and Apps API (`app.creator`), with no PAT required at deploy time. Authorization checks `X-Forwarded-Email` against `app.creator`. On first terminal session, the user pastes a short-lived PAT interactively. Tokens auto-rotate every 10 minutes (15-minute lifetime), with old tokens proactively revoked. On restart, the user re-pastes (no persistence by design). +### Verifying release provenance + +Each GitHub Release ships with: + +- `coda-sbom.cdx.json` — CycloneDX SBOM of every Python + npm dependency (generated by [syft](https://github.com/anchore/syft)). +- `coda-sbom.cdx.json.cosign.bundle` — [Sigstore](https://www.sigstore.dev/) keyless signature bundle (cert + signature + Rekor inclusion proof in one file). 
+ +To verify a release came from this repo's release workflow: + +```bash +TAG=v1.0.0 # the release you downloaded +gh release download "$TAG" -p 'coda-sbom.cdx.json*' + +cosign verify-blob \ + --bundle coda-sbom.cdx.json.cosign.bundle \ + --certificate-identity-regexp 'https://github.com/databrickslabs/coding-agents-databricks-apps/.+' \ + --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + coda-sbom.cdx.json +``` + +Signing uses GitHub's OIDC token — no long-lived signing keys exist. The signing identity is anchored to the workflow path + tag ref, and a public transparency-log entry is recorded in Rekor. + ### Gunicorn Production uses `workers=1` (PTY state is process-local), `threads=16` (concurrent polling + WebSocket), `gthread` worker class, `timeout=60` (long-lived WebSocket connections). From 1f084a8ef3bc956f0cc35241f168516c55fd594c Mon Sep 17 00:00:00 2001 From: mpkrass7 Date: Tue, 19 May 2026 16:52:19 -0400 Subject: [PATCH 2/2] docs: move SBOM verification to docs/SECURITY.md, drop stale plans MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Keeps the README lean — release-provenance details now live in docs/SECURITY.md where security reviewers expect them. Also removes docs/plans/, which held historical design/implementation notes for features that have already shipped. 
Co-authored-by: Isaac --- README.md | 22 +- docs/SECURITY.md | 23 + .../plans/2025-02-03-bundled-skills-design.md | 170 ---- docs/plans/2026-02-02-web-terminal-design.md | 294 ------ .../2026-02-02-web-terminal-implementation.md | 398 -------- .../2026-02-03-session-timeout-design.md | 113 --- .../plans/2026-02-03-workspace-sync-design.md | 194 ---- .../2026-03-08-multi-tab-terminals-design.md | 103 -- ...3-08-multi-tab-terminals-implementation.md | 946 ------------------ ...-11-litellm-empty-content-blocks-design.md | 156 --- ...-03-27-pat-auto-rotation-implementation.md | 510 ---------- .../2026-03-28-session-detach-reconnect.md | 119 --- docs/plans/PLAN-issue-8.md | 119 --- 13 files changed, 24 insertions(+), 3143 deletions(-) create mode 100644 docs/SECURITY.md delete mode 100644 docs/plans/2025-02-03-bundled-skills-design.md delete mode 100644 docs/plans/2026-02-02-web-terminal-design.md delete mode 100644 docs/plans/2026-02-02-web-terminal-implementation.md delete mode 100644 docs/plans/2026-02-03-session-timeout-design.md delete mode 100644 docs/plans/2026-02-03-workspace-sync-design.md delete mode 100644 docs/plans/2026-03-08-multi-tab-terminals-design.md delete mode 100644 docs/plans/2026-03-08-multi-tab-terminals-implementation.md delete mode 100644 docs/plans/2026-03-11-litellm-empty-content-blocks-design.md delete mode 100644 docs/plans/2026-03-27-pat-auto-rotation-implementation.md delete mode 100644 docs/plans/2026-03-28-session-detach-reconnect.md delete mode 100644 docs/plans/PLAN-issue-8.md diff --git a/README.md b/README.md index 2227586..4fd39c6 100644 --- a/README.md +++ b/README.md @@ -292,27 +292,7 @@ This template repo opens that vision up for every Databricks user — no IDE set Single-user app — the owner is resolved via the app's service principal and Apps API (`app.creator`), with no PAT required at deploy time. Authorization checks `X-Forwarded-Email` against `app.creator`. 
On first terminal session, the user pastes a short-lived PAT interactively. Tokens auto-rotate every 10 minutes (15-minute lifetime), with old tokens proactively revoked. On restart, the user re-pastes (no persistence by design). -### Verifying release provenance - -Each GitHub Release ships with: - -- `coda-sbom.cdx.json` — CycloneDX SBOM of every Python + npm dependency (generated by [syft](https://github.com/anchore/syft)). -- `coda-sbom.cdx.json.cosign.bundle` — [Sigstore](https://www.sigstore.dev/) keyless signature bundle (cert + signature + Rekor inclusion proof in one file). - -To verify a release came from this repo's release workflow: - -```bash -TAG=v1.0.0 # the release you downloaded -gh release download "$TAG" -p 'coda-sbom.cdx.json*' - -cosign verify-blob \ - --bundle coda-sbom.cdx.json.cosign.bundle \ - --certificate-identity-regexp 'https://github.com/databrickslabs/coding-agents-databricks-apps/.+' \ - --certificate-oidc-issuer https://token.actions.githubusercontent.com \ - coda-sbom.cdx.json -``` - -Signing uses GitHub's OIDC token — no long-lived signing keys exist. The signing identity is anchored to the workflow path + tag ref, and a public transparency-log entry is recorded in Rekor. +Each GitHub Release ships a signed CycloneDX SBOM — see [docs/SECURITY.md](./docs/SECURITY.md) for verification steps. ### Gunicorn diff --git a/docs/SECURITY.md b/docs/SECURITY.md new file mode 100644 index 0000000..56ab729 --- /dev/null +++ b/docs/SECURITY.md @@ -0,0 +1,23 @@ +# Security + +## Verifying release provenance + +Each GitHub Release ships with: + +- `coda-sbom.cdx.json` — CycloneDX SBOM of every Python + npm dependency (generated by [syft](https://github.com/anchore/syft)). +- `coda-sbom.cdx.json.cosign.bundle` — [Sigstore](https://www.sigstore.dev/) keyless signature bundle (cert + signature + Rekor inclusion proof in one file). 
+ +To verify a release came from this repo's release workflow: + +```bash +TAG=v1.0.0 # the release you downloaded +gh release download "$TAG" -p 'coda-sbom.cdx.json*' + +cosign verify-blob \ + --bundle coda-sbom.cdx.json.cosign.bundle \ + --certificate-identity-regexp 'https://github.com/databrickslabs/coding-agents-databricks-apps/.+' \ + --certificate-oidc-issuer https://token.actions.githubusercontent.com \ + coda-sbom.cdx.json +``` + +Signing uses GitHub's OIDC token — no long-lived signing keys exist. The signing identity is anchored to the workflow path + tag ref, and a public transparency-log entry is recorded in Rekor. diff --git a/docs/plans/2025-02-03-bundled-skills-design.md b/docs/plans/2025-02-03-bundled-skills-design.md deleted file mode 100644 index ee9f95f..0000000 --- a/docs/plans/2025-02-03-bundled-skills-design.md +++ /dev/null @@ -1,170 +0,0 @@ -# Pre-bundled Databricks Skills & Superpowers Plugin - -**Date:** 2025-02-03 -**Status:** Approved - -## Overview - -Bundle Databricks skills and the superpowers plugin into the Claude Code on Databricks app so users have immediate access to Databricks-specific knowledge and development workflows. - -## Goals - -- Users get 16 Databricks skills out of the box (no manual installation) -- Users get the full superpowers plugin (TDD, debugging, brainstorming, etc.) 
-- Skills are version-controlled with the app -- Welcome message shows available capabilities - -## Directory Structure - -``` -xterm-experiment/ -├── .claude/ -│ ├── skills/ # Databricks skills (16) -│ │ ├── agent-bricks/ -│ │ ├── aibi-dashboards/ -│ │ ├── asset-bundles/ -│ │ ├── databricks-app-apx/ -│ │ ├── databricks-app-python/ -│ │ ├── databricks-config/ -│ │ ├── databricks-docs/ -│ │ ├── databricks-genie/ -│ │ ├── databricks-jobs/ -│ │ ├── databricks-python-sdk/ -│ │ ├── databricks-unity-catalog/ -│ │ ├── mlflow-evaluation/ -│ │ ├── model-serving/ -│ │ ├── spark-declarative-pipelines/ -│ │ ├── synthetic-data-generation/ -│ │ └── unstructured-pdf-generation/ -│ │ -│ └── plugins/ -│ └── superpowers/ # Full superpowers plugin -│ ├── .claude-plugin/ -│ │ └── plugin.json -│ ├── skills/ # 14 skills -│ ├── commands/ -│ ├── hooks/ -│ ├── agents/ -│ └── ... -├── setup_claude.py # Modified to register plugin -├── app.py # Modified to start PTY in ~/projects/ -├── CLAUDE.md # Welcome message -└── README.md # Updated documentation -``` - -## Implementation Details - -### 1. Bundle Databricks Skills - -Copy all 16 skills from [ai-dev-kit](https://github.com/databricks-solutions/ai-dev-kit) `databricks-skills/` to `.claude/skills/`: - -| Category | Skills | -|----------|--------| -| AI & Agents | agent-bricks, databricks-genie, mlflow-evaluation, model-serving | -| Analytics | aibi-dashboards, databricks-unity-catalog | -| Data Engineering | spark-declarative-pipelines, databricks-jobs, synthetic-data-generation | -| Development | asset-bundles, databricks-app-apx, databricks-app-python, databricks-python-sdk, databricks-config | -| Reference | databricks-docs, unstructured-pdf-generation | - -### 2. Bundle Superpowers Plugin - -Copy full plugin from [superpowers](https://github.com/obra/superpowers) to `.claude/plugins/superpowers/`: - -- 14 skills (brainstorming, TDD, systematic-debugging, etc.) -- Commands (/commit, etc.) -- Hooks -- Agents - -### 3. 
Register Plugin in setup_claude.py - -```python -# 6. Register bundled superpowers plugin -plugins_dir = claude_dir / "plugins" -plugins_dir.mkdir(exist_ok=True) - -installed_plugins = { - "version": 2, - "plugins": { - "superpowers@bundled": [ - { - "scope": "user", - "installPath": str(home / ".claude" / "plugins" / "superpowers"), - "version": "4.0.3", - "installedAt": "2025-01-01T00:00:00.000Z", - "lastUpdated": "2025-01-01T00:00:00.000Z" - } - ] - } -} - -plugins_json_path = plugins_dir / "installed_plugins.json" -plugins_json_path.write_text(json.dumps(installed_plugins, indent=2)) -print("Superpowers plugin registered") -``` - -### 4. Start PTY in ~/projects/ - -Modify `app.py` to start shell sessions in the projects directory: - -```python -# In create_session(), when spawning the PTY: -projects_dir = os.path.expanduser("~/projects") -os.makedirs(projects_dir, exist_ok=True) - -pid, fd = pty.fork() -if pid == 0: - os.chdir(projects_dir) # Start in projects/ - os.execvpe('/bin/bash', ['/bin/bash', '-l'], env) -``` - -### 5. Welcome Message (CLAUDE.md) - -Create `CLAUDE.md` at repo root: - -```markdown -# Claude Code on Databricks - -Welcome! This environment comes pre-configured with: - -## Databricks Skills (16) -- **AI & Agents**: agent-bricks, databricks-genie, mlflow-evaluation, model-serving -- **Analytics**: aibi-dashboards, databricks-unity-catalog -- **Data Engineering**: spark-declarative-pipelines, databricks-jobs, synthetic-data-generation -- **Development**: asset-bundles, databricks-app-apx, databricks-app-python, databricks-python-sdk, databricks-config -- **Reference**: databricks-docs, unstructured-pdf-generation - -## Superpowers Plugin -- brainstorming, test-driven-development, systematic-debugging, writing-plans, and more - -## Quick Start -- Projects sync to Databricks Workspace on git commit -- Use `/commit` for guided commits -- Ask "help me create a dashboard" to see skills in action -``` - -### 6. 
README Update - -Document bundled skills with credits to source repositories: - -- [databricks-solutions/ai-dev-kit](https://github.com/databricks-solutions/ai-dev-kit) - Databricks skills -- [obra/superpowers](https://github.com/obra/superpowers) - Development workflow plugin - -Include update instructions for keeping skills current. - -## Updating Skills - -Since skills are bundled (not downloaded at startup), updates require: - -1. Pull latest from ai-dev-kit repo -2. Copy updated skills to `.claude/skills/` -3. Redeploy the app - -## Trade-offs - -| Approach | Chosen | Reason | -|----------|--------|--------| -| Bundled vs Download at startup | Bundled | Faster startup, no network dependency, predictable | -| All skills vs Subset | All 16 | Comprehensive coverage | -| Skills location | `.claude/skills/` | Standard location, auto-loaded | -| Superpowers full vs skills-only | Full plugin | Get commands, hooks, agents too | -| HOME vs working dir change | Working dir | Keep .claude/ separate, only projects sync | diff --git a/docs/plans/2026-02-02-web-terminal-design.md b/docs/plans/2026-02-02-web-terminal-design.md deleted file mode 100644 index e0ca60c..0000000 --- a/docs/plans/2026-02-02-web-terminal-design.md +++ /dev/null @@ -1,294 +0,0 @@ -# Web Terminal for Databricks Apps with Claude Code - -**Date:** 2026-02-02 -**Status:** Approved - -## Overview - -A web-based terminal emulator deployed as a Databricks App that provides shell access to the container, with Claude Code pre-configured for vibe coding. 
- -## Architecture - -``` -┌─────────────────────────────────────────────────────────────┐ -│ Databricks App Container │ -│ │ -│ ┌──────────────┐ WebSocket ┌──────────────────┐ │ -│ │ │◄────────────────────►│ │ │ -│ │ Flask App │ │ PTY Process │ │ -│ │ (Backend) │ │ (bash shell) │ │ -│ │ │ │ │ │ -│ └──────┬───────┘ │ ┌────────────┐ │ │ -│ │ │ │Claude Code │ │ │ -│ │ serves │ │ (CLI) │ │ │ -│ ▼ │ └────────────┘ │ │ -│ ┌──────────────┐ └──────────────────┘ │ -│ │ xterm.js │ │ -│ │ (Frontend) │ │ -│ └──────────────┘ │ -└─────────────────────────────────────────────────────────────┘ - ▲ - │ HTTPS - ▼ -┌─────────────────┐ -│ User Browser │ -└─────────────────┘ -``` - -## Components - -### Backend (Flask + WebSocket + PTY) - -- **Flask** serves the static frontend -- **flask-socketio** handles WebSocket connections -- **ptyprocess** spawns bash shells -- Each connection gets its own PTY session - -### Frontend (xterm.js) - -- Terminal emulator in the browser -- Loaded from CDN -- Socket.IO client for WebSocket communication -- Auto-resizes to viewport - -### Claude Code Configuration - -Uses Databricks model serving instead of direct Anthropic API: - -| File | Purpose | -|------|---------| -| `~/.claude/settings.json` | Databricks model serving config | -| `~/.claude.json` | Skip onboarding prompt (v2.0.65+ fix) | - -## Project Structure - -``` -xterm-experiment/ -├── app.py # Flask + WebSocket + PTY -├── setup_claude.py # Pre-configures Claude for Databricks -├── requirements.txt -├── app.yaml -└── static/ - └── index.html -``` - -## Files - -### requirements.txt - -``` -flask>=2.0 -flask-socketio>=5.0 -gevent>=21.0 -gevent-websocket>=0.10 -ptyprocess>=0.7 -claude-agent-sdk -``` - -### app.yaml - -```yaml -command: - - bash - - -c - - "python setup_claude.py && python app.py" -env: - - name: DATABRICKS_HOST - value: https://fevm-serverless-9cefok.cloud.databricks.com - - name: DATABRICKS_TOKEN - valueFrom: DATABRICKS_TOKEN -``` - -### app.py - -```python -import os 
-import pty -import select -import subprocess -from flask import Flask, send_from_directory -from flask_socketio import SocketIO, emit, request - -app = Flask(__name__) -socketio = SocketIO(app, cors_allowed_origins="*", async_mode="gevent") - -# Store PTY file descriptors per session -sessions = {} - -@app.route("/") -def index(): - return send_from_directory("static", "index.html") - -@socketio.on("connect") -def handle_connect(): - """Spawn a new PTY bash shell for this connection.""" - try: - master_fd, slave_fd = pty.openpty() - pid = subprocess.Popen( - ["/bin/bash"], - stdin=slave_fd, - stdout=slave_fd, - stderr=slave_fd, - preexec_fn=os.setsid - ).pid - sessions[request.sid] = {"master_fd": master_fd, "pid": pid} - socketio.start_background_task(read_pty_output, request.sid, master_fd) - except Exception as e: - emit("output", f"\x1b[31mError spawning shell: {e}\x1b[0m\r\n") - -@socketio.on("input") -def handle_input(data): - """Forward user input to the PTY.""" - fd = sessions.get(request.sid, {}).get("master_fd") - if fd: - os.write(fd, data.encode()) - -@socketio.on("disconnect") -def handle_disconnect(): - """Clean up PTY on disconnect.""" - session = sessions.pop(request.sid, None) - if session: - os.close(session["master_fd"]) - -def read_pty_output(sid, fd): - """Read PTY output and send to browser.""" - while sid in sessions: - if select.select([fd], [], [], 0.1)[0]: - try: - output = os.read(fd, 1024).decode(errors="replace") - socketio.emit("output", output, to=sid) - except OSError: - socketio.emit("output", "\r\n\x1b[31mShell disconnected.\x1b[0m\r\n", to=sid) - break - -if __name__ == "__main__": - socketio.run(app, host="0.0.0.0", port=8000) -``` - -### setup_claude.py - -```python -import os -import json -from pathlib import Path - -# Create ~/.claude directory -claude_dir = Path.home() / ".claude" -claude_dir.mkdir(exist_ok=True) - -# 1. 
Write settings.json for Databricks model serving -settings = { - "env": { - "ANTHROPIC_MODEL": "databricks-claude-sonnet-4-5", - "ANTHROPIC_BASE_URL": f"{os.environ['DATABRICKS_HOST']}/serving-endpoints/anthropic", - "ANTHROPIC_AUTH_TOKEN": os.environ["DATABRICKS_TOKEN"], - "ANTHROPIC_CUSTOM_HEADERS": "x-databricks-use-coding-agent-mode: true" - } -} - -settings_path = claude_dir / "settings.json" -settings_path.write_text(json.dumps(settings, indent=2)) - -# 2. Write ~/.claude.json to skip onboarding (v2.0.65+ fix) -claude_json = { - "hasCompletedOnboarding": True -} - -claude_json_path = Path.home() / ".claude.json" -claude_json_path.write_text(json.dumps(claude_json, indent=2)) - -print(f"Claude configured: {settings_path}") -print(f"Onboarding skipped: {claude_json_path}") -``` - -### static/index.html - -```html - - - - Terminal - - - - -
- - - - - - - -``` - -## Deployment - -1. **Create the app:** - ```bash - databricks apps create xterm-terminal - ``` - -2. **Set the token secret:** - ```bash - databricks secrets create-scope xterm-terminal - databricks secrets put-secret xterm-terminal DATABRICKS_TOKEN - ``` - -3. **Deploy:** - ```bash - databricks apps deploy xterm-terminal --source-code-path . - ``` - -## Known Limitations - -| Limitation | Impact | Workaround | -|------------|--------|------------| -| No persistence | Files lost on redeploy | Mount workspace volume (future) | -| Single user per session | Each tab = new shell | Expected behavior | -| 12hr Databricks session limit | Long sessions timeout | User reconnects | -| No terminal resize signaling | Fixed size initially | Can add SIGWINCH handling | -| Container resources | Limited CPU/memory | Use for coding, not heavy compute | - -## Security Considerations - -- Shell runs as app user (not root) -- Databricks token scoped to model serving -- No network egress restrictions by default (Claude can `curl`, `git clone`, etc.) - -## Future Enhancements - -- Persistent workspace via mounted volumes -- Multi-user authentication -- Terminal resize signaling (SIGWINCH) -- Session recording/playback diff --git a/docs/plans/2026-02-02-web-terminal-implementation.md b/docs/plans/2026-02-02-web-terminal-implementation.md deleted file mode 100644 index ecf4d5f..0000000 --- a/docs/plans/2026-02-02-web-terminal-implementation.md +++ /dev/null @@ -1,398 +0,0 @@ -# Web Terminal Implementation Plan - -> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. - -**Goal:** Build a deployable web terminal for Databricks Apps with Claude Code pre-configured. - -**Architecture:** Flask backend spawns PTY shells per WebSocket connection; xterm.js frontend renders terminal in browser; setup script configures Claude Code for Databricks model serving. 
- -**Tech Stack:** Flask, flask-socketio, gevent, ptyprocess, xterm.js, Socket.IO - ---- - -## Task 1: Create Project Dependencies - -**Files:** -- Create: `requirements.txt` - -**Step 1: Create requirements.txt** - -``` -flask>=2.0 -flask-socketio>=5.0 -gevent>=21.0 -gevent-websocket>=0.10 -ptyprocess>=0.7 -claude-agent-sdk -``` - -**Step 2: Commit** - -```bash -git add requirements.txt -git commit -m "feat: add Python dependencies for web terminal" -``` - ---- - -## Task 2: Create Claude Configuration Script - -**Files:** -- Create: `setup_claude.py` - -**Step 1: Create setup_claude.py** - -```python -import os -import json -from pathlib import Path - -# Create ~/.claude directory -claude_dir = Path.home() / ".claude" -claude_dir.mkdir(exist_ok=True) - -# 1. Write settings.json for Databricks model serving -settings = { - "env": { - "ANTHROPIC_MODEL": "databricks-claude-sonnet-4-5", - "ANTHROPIC_BASE_URL": f"{os.environ['DATABRICKS_HOST']}/serving-endpoints/anthropic", - "ANTHROPIC_AUTH_TOKEN": os.environ["DATABRICKS_TOKEN"], - "ANTHROPIC_CUSTOM_HEADERS": "x-databricks-use-coding-agent-mode: true" - } -} - -settings_path = claude_dir / "settings.json" -settings_path.write_text(json.dumps(settings, indent=2)) - -# 2. 
Write ~/.claude.json to skip onboarding (v2.0.65+ fix) -claude_json = { - "hasCompletedOnboarding": True -} - -claude_json_path = Path.home() / ".claude.json" -claude_json_path.write_text(json.dumps(claude_json, indent=2)) - -print(f"Claude configured: {settings_path}") -print(f"Onboarding skipped: {claude_json_path}") -``` - -**Step 2: Test script runs without error (mock env vars)** - -```bash -DATABRICKS_HOST=https://example.databricks.com DATABRICKS_TOKEN=test python setup_claude.py -``` - -Expected: Prints paths, creates files in home directory - -**Step 3: Verify files created** - -```bash -cat ~/.claude/settings.json -cat ~/.claude.json -``` - -Expected: JSON files with correct structure - -**Step 4: Commit** - -```bash -git add setup_claude.py -git commit -m "feat: add Claude Code configuration script for Databricks" -``` - ---- - -## Task 3: Create Frontend HTML - -**Files:** -- Create: `static/index.html` - -**Step 1: Create static directory** - -```bash -mkdir -p static -``` - -**Step 2: Create static/index.html** - -```html - - - - Terminal - - - - -
- - - - - - - -``` - -**Step 3: Commit** - -```bash -git add static/index.html -git commit -m "feat: add xterm.js frontend for web terminal" -``` - ---- - -## Task 4: Create Flask Backend - -**Files:** -- Create: `app.py` - -**Step 1: Create app.py** - -```python -import os -import pty -import select -import subprocess -from flask import Flask, send_from_directory -from flask_socketio import SocketIO, emit, request - -app = Flask(__name__) -socketio = SocketIO(app, cors_allowed_origins="*", async_mode="gevent") - -# Store PTY file descriptors per session -sessions = {} - - -@app.route("/") -def index(): - return send_from_directory("static", "index.html") - - -@socketio.on("connect") -def handle_connect(): - """Spawn a new PTY bash shell for this connection.""" - try: - master_fd, slave_fd = pty.openpty() - pid = subprocess.Popen( - ["/bin/bash"], - stdin=slave_fd, - stdout=slave_fd, - stderr=slave_fd, - preexec_fn=os.setsid - ).pid - sessions[request.sid] = {"master_fd": master_fd, "pid": pid} - socketio.start_background_task(read_pty_output, request.sid, master_fd) - except Exception as e: - emit("output", f"\x1b[31mError spawning shell: {e}\x1b[0m\r\n") - - -@socketio.on("input") -def handle_input(data): - """Forward user input to the PTY.""" - fd = sessions.get(request.sid, {}).get("master_fd") - if fd: - os.write(fd, data.encode()) - - -@socketio.on("disconnect") -def handle_disconnect(): - """Clean up PTY on disconnect.""" - session = sessions.pop(request.sid, None) - if session: - os.close(session["master_fd"]) - - -def read_pty_output(sid, fd): - """Read PTY output and send to browser.""" - while sid in sessions: - if select.select([fd], [], [], 0.1)[0]: - try: - output = os.read(fd, 1024).decode(errors="replace") - socketio.emit("output", output, to=sid) - except OSError: - socketio.emit("output", "\r\n\x1b[31mShell disconnected.\x1b[0m\r\n", to=sid) - break - - -if __name__ == "__main__": - socketio.run(app, host="0.0.0.0", port=8000) -``` - -**Step 2: 
Commit** - -```bash -git add app.py -git commit -m "feat: add Flask backend with WebSocket PTY handling" -``` - ---- - -## Task 5: Create Databricks App Configuration - -**Files:** -- Create: `app.yaml` - -**Step 1: Create app.yaml** - -```yaml -command: - - bash - - -c - - "python setup_claude.py && python app.py" -env: - - name: DATABRICKS_HOST - value: https://fevm-serverless-9cefok.cloud.databricks.com - - name: DATABRICKS_TOKEN - valueFrom: DATABRICKS_TOKEN -``` - -**Step 2: Commit** - -```bash -git add app.yaml -git commit -m "feat: add Databricks App deployment configuration" -``` - ---- - -## Task 6: Local Testing - -**Step 1: Install dependencies** - -```bash -uv pip install -r requirements.txt -``` - -**Step 2: Run the app locally** - -```bash -python app.py -``` - -Expected: Server starts on http://0.0.0.0:8000 - -**Step 3: Test in browser** - -Open http://localhost:8000 in browser. - -Expected: -- Dark terminal appears -- Green "Connected" message shows -- Can type commands (ls, pwd, etc.) -- Output renders correctly - -**Step 4: Test terminal functionality** - -In the web terminal, run: -```bash -echo "hello world" -ls -la -pwd -``` - -Expected: Commands execute and output displays - -**Step 5: Stop the server** - -Press Ctrl+C in terminal running app.py - ---- - -## Task 7: Final Commit - -**Step 1: Verify all files present** - -```bash -ls -la -ls -la static/ -``` - -Expected structure: -``` -xterm-experiment/ -├── app.py -├── app.yaml -├── requirements.txt -├── setup_claude.py -├── static/ -│ └── index.html -└── docs/ - └── plans/ - └── 2026-02-02-web-terminal-design.md -``` - -**Step 2: Final commit if any uncommitted changes** - -```bash -git status -``` - -If changes exist: -```bash -git add -A -git commit -m "chore: finalize web terminal implementation" -``` - ---- - -## Deployment (Manual - After Local Testing) - -Once local testing passes, deploy to Databricks: - -```bash -# 1. 
Create the app (if not exists) -databricks apps create xterm-terminal - -# 2. Set up secrets (one-time) -databricks secrets create-scope xterm-terminal -databricks secrets put-secret xterm-terminal DATABRICKS_TOKEN - -# 3. Deploy -databricks apps deploy xterm-terminal --source-code-path . -``` - ---- - -## Summary - -| Task | Description | Files | -|------|-------------|-------| -| 1 | Dependencies | requirements.txt | -| 2 | Claude config | setup_claude.py | -| 3 | Frontend | static/index.html | -| 4 | Backend | app.py | -| 5 | App config | app.yaml | -| 6 | Local test | - | -| 7 | Final commit | - | diff --git a/docs/plans/2026-02-03-session-timeout-design.md b/docs/plans/2026-02-03-session-timeout-design.md deleted file mode 100644 index b4ed7f7..0000000 --- a/docs/plans/2026-02-03-session-timeout-design.md +++ /dev/null @@ -1,113 +0,0 @@ -# Session Timeout Design - -## Problem - -When frontend tabs close unexpectedly (browser crash, network drop, force quit), the backend PTY sessions remain open indefinitely. The `beforeunload` beacon cleanup only works for graceful tab closes. - -## Solution - -Use the existing 100ms polling as an implicit heartbeat. If `/api/output` hasn't been called for 60 seconds, assume the frontend is gone and terminate the session gracefully. 
- -## Design - -### Configuration Constants - -```python -SESSION_TIMEOUT_SECONDS = 60 # No poll for 60s = dead session -CLEANUP_INTERVAL_SECONDS = 30 # How often to check for stale sessions -GRACEFUL_SHUTDOWN_WAIT = 3 # Seconds to wait after SIGHUP before SIGKILL -``` - -### Data Model Changes - -Add `last_poll_time` to session structure: - -```python -sessions[session_id] = { - "master_fd": master_fd, - "pid": pid, - "output_buffer": deque(maxlen=1000), - "last_poll_time": time.time(), # NEW - "created_at": time.time() # NEW -} -``` - -Update timestamp on every poll in `/api/output`: - -```python -sessions[session_id]["last_poll_time"] = time.time() -``` - -### Cleanup Thread - -Background thread runs every 30 seconds: - -```python -def cleanup_stale_sessions(): - while True: - time.sleep(CLEANUP_INTERVAL_SECONDS) - - now = time.time() - stale_sessions = [] - - with sessions_lock: - for session_id, session in sessions.items(): - if now - session["last_poll_time"] > SESSION_TIMEOUT_SECONDS: - stale_sessions.append((session_id, session["pid"], session["master_fd"])) - - for session_id, pid, master_fd in stale_sessions: - terminate_session(session_id, pid, master_fd) -``` - -### Graceful Termination - -SIGHUP first, wait 3 seconds, then SIGKILL if still alive: - -```python -def terminate_session(session_id, pid, master_fd): - try: - os.kill(pid, signal.SIGHUP) - time.sleep(GRACEFUL_SHUTDOWN_WAIT) - - try: - os.kill(pid, 0) # Check if still alive - os.kill(pid, signal.SIGKILL) - except OSError: - pass # Already dead - - os.close(master_fd) - except OSError: - pass - - with sessions_lock: - sessions.pop(session_id, None) -``` - -### Thread Startup - -Add before `app.run()`: - -```python -cleanup_thread = threading.Thread(target=cleanup_stale_sessions, daemon=True) -cleanup_thread.start() -``` - -## Behavior - -| Scenario | Result | -|----------|--------| -| Browser open, user idle | Polling continues, session stays alive | -| Browser closed gracefully | Beacon fires, 
immediate cleanup | -| Browser crash / force quit | Polling stops, cleanup after 60s | -| Network disconnect | Polling stops, cleanup after 60s | -| Tab force-closed | Polling stops, cleanup after 60s | - -## Files to Modify - -- `app.py` - All backend changes (data model, cleanup thread, termination logic) - -## Not In Scope - -- Input-based idle timeout (killing sessions where user hasn't typed) -- Maximum session limits -- Session persistence/reconnection diff --git a/docs/plans/2026-02-03-workspace-sync-design.md b/docs/plans/2026-02-03-workspace-sync-design.md deleted file mode 100644 index 56add54..0000000 --- a/docs/plans/2026-02-03-workspace-sync-design.md +++ /dev/null @@ -1,194 +0,0 @@ -# Git-Based Workspace Sync Design - -**Goal:** Auto-sync user projects from the container to Databricks Workspace on git commit. - -**Architecture:** Git post-commit hook triggers `databricks sync` to upload project files to `/Workspace/Users//projects/`. - ---- - -## Overview - -When users create projects in the `~/projects` folder and commit with git, their code automatically syncs to their Databricks Workspace. This ensures work persists even when the container restarts. 
- -``` -Container Databricks Workspace -┌─────────────────────┐ ┌──────────────────────────────────┐ -│ ~/projects/ │ git commit │ /Workspace/Users// │ -│ my-app/ │ ────────────► │ projects/ │ -│ .git/hooks/ │ post-commit │ my-app/ │ -│ post-commit │ triggers │ (synced files) │ -└─────────────────────┘ └──────────────────────────────────┘ -``` - ---- - -## Implementation Plan - -### Files to Create/Modify - -| File | Action | Purpose | -|------|--------|---------| -| `sync_to_workspace.py` | Create | Sync script called by git hook | -| `setup_claude.py` | Modify | Add projects folder + git template setup | -| `requirements.txt` | Modify | Add databricks-sdk | -| `static/index.html` | Modify | Update welcome message | - ---- - -### Task 1: Create sync_to_workspace.py - -```python -#!/usr/bin/env python3 -"""Sync a project directory to Databricks Workspace.""" -import os -import sys -import subprocess -from pathlib import Path - -def get_user_email(): - """Get current user's email from Databricks token.""" - from databricks.sdk import WorkspaceClient - w = WorkspaceClient() - return w.current_user.me().user_name - -def sync_project(project_path: Path): - """Sync project to user's Workspace.""" - try: - user_email = get_user_email() - workspace_dest = f"/Workspace/Users/{user_email}/projects/{project_path.name}" - - result = subprocess.run( - ["databricks", "sync", str(project_path), workspace_dest, "--watch=false"], - capture_output=True, - text=True - ) - - if result.returncode == 0: - print(f"✓ Synced to {workspace_dest}") - else: - print(f"⚠ Sync warning: {result.stderr}", file=sys.stderr) - - except Exception as e: - # Log error but don't block the commit - error_log = Path.home() / ".sync-errors.log" - with open(error_log, "a") as f: - f.write(f"{project_path}: {e}\n") - print(f"⚠ Sync failed (logged to ~/.sync-errors.log)", file=sys.stderr) - -if __name__ == "__main__": - if len(sys.argv) > 1: - sync_project(Path(sys.argv[1])) - else: - sync_project(Path.cwd()) 
-``` - ---- - -### Task 2: Update setup_claude.py - -Add after existing code: - -```python -# 4. Create projects directory -projects_dir = home / "projects" -projects_dir.mkdir(exist_ok=True) -print(f"Projects directory: {projects_dir}") - -# 5. Set up git template with post-commit hook -git_template_hooks = home / ".git-templates" / "hooks" -git_template_hooks.mkdir(parents=True, exist_ok=True) - -post_commit_hook = git_template_hooks / "post-commit" -post_commit_hook.write_text('''#!/bin/bash -# Auto-sync to Databricks Workspace on commit -python3 /app/python/source_code/sync_to_workspace.py "$(pwd)" & -''') -post_commit_hook.chmod(0o755) - -# Configure git to use template for new repos -subprocess.run( - ["git", "config", "--global", "init.templateDir", str(home / ".git-templates")], - capture_output=True -) -print("Git post-commit hook template configured") -``` - ---- - -### Task 3: Update requirements.txt - -Add: -``` -databricks-sdk>=0.20.0 -``` - ---- - -### Task 4: Update welcome message in static/index.html - -Change the welcome message to: -```javascript -term.write('\x1b[32mConnected. Type "claude" to start coding.\x1b[0m\r\n'); -term.write('\x1b[90mProjects in ~/projects auto-sync to Workspace on git commit.\x1b[0m\r\n\r\n'); -``` - ---- - -## User Workflow - -```bash -# 1. User connects to terminal -# 2. Navigate to projects folder -cd ~/projects - -# 3. Create a new project -mkdir my-app && cd my-app - -# 4. Initialize git (post-commit hook auto-installed) -git init - -# 5. Write code with Claude... - -# 6. Commit triggers sync -git add . 
&& git commit -m "initial" -# Output: ✓ Synced to /Workspace/Users/user@company.com/projects/my-app -``` - ---- - -## Configuration - -- **Sync destination:** `/Workspace/Users//projects/` -- **User email:** Derived from Databricks token via SDK at runtime -- **Trigger:** Git post-commit hook (only on commits) - ---- - -## Error Handling - -- Sync failures are logged to `~/.sync-errors.log` -- Errors don't block git commits -- Failed syncs retry on next commit - ---- - -## Verification - -1. Deploy the updated app -2. Connect to terminal -3. Run: - ```bash - cd ~/projects - mkdir test-sync && cd test-sync - git init - echo "# Test" > README.md - git add . && git commit -m "test" - ``` -4. Check Databricks Workspace: `/Workspace/Users//projects/test-sync/` -5. Verify README.md appears - ---- - -## Dependencies - -- `databricks-sdk>=0.20.0` (add to requirements.txt) diff --git a/docs/plans/2026-03-08-multi-tab-terminals-design.md b/docs/plans/2026-03-08-multi-tab-terminals-design.md deleted file mode 100644 index 0720b38..0000000 --- a/docs/plans/2026-03-08-multi-tab-terminals-design.md +++ /dev/null @@ -1,103 +0,0 @@ -# Multi-Tab Terminal Support - -**Date:** 2026-03-08 -**Status:** Approved - -## Overview - -Add browser-style tabs to the web terminal, where each tab owns its own independent split-pane layout. This lets users run multiple concurrent sessions (e.g., Claude in one tab, logs in another, git in a third) without losing the existing split-pane feature. 
- -## Architecture - -``` -┌─[Shell 1]──[Shell 2]──[+ ]──────────────────────────────────┐ -│ │ -│ Tab 1's pane container (hidden when tab inactive) │ -│ ┌─────────────────────┐ | ┌─────────────────────┐ │ -│ │ Pane 1 │ | │ Pane 2 │ │ -│ └─────────────────────┘ | └─────────────────────┘ │ -│ │ -│ Tab 2's pane container (display:none when inactive) │ -│ ┌─────────────────────────────────────────────────┐ │ -│ │ Pane 1 (full width) │ │ -│ └─────────────────────────────────────────────────┘ │ -└───────────────────────────────────────────────────────────────┘ -``` - -## Data Model - -```javascript -tabs = [ - { - id: "tab-1", - label: "Shell 1", // double-click to rename - panes: [ // each tab owns 1-2 panes - { id, element, term, fitAddon, searchAddon, sessionId } - ], - activePaneId: "pane-1", - paneContainer:
<div>, // per-tab container element - divider: <div>
| null // per-tab divider (if split) - } -] -activeTabId = "tab-1" -``` - -## Constraints - -- **Max 5 tabs** (up to 2 panes each = max 10 PTY sessions) -- **Default label:** "Shell N" — double-click to rename -- **No persistence:** tabs and sessions are lost on page refresh -- **Backend unchanged:** tabs are purely frontend; each pane still calls `/api/session` - -## Tab Bar UI - -- 32px height, positioned above the pane container -- Translucent/blurred background matching existing toolbar aesthetic -- Theme-aware (adapts to light/dark) -- Each tab shows: label + close "x" (visible on hover or when active) -- "+" button at end (hidden when 5 tabs reached) -- Active tab has a subtle bottom border accent - -## Keyboard Shortcuts - -### Tab shortcuts (new) -| Shortcut | Action | -|----------|--------| -| `Ctrl+Shift+T` | New tab | -| `Ctrl+Shift+[` | Previous tab | -| `Ctrl+Shift+]` | Next tab | -| `Ctrl+Shift+1-5` | Jump to tab by number | - -### Pane shortcuts (moved from Ctrl+Shift to Alt+Shift) -| Shortcut | Action | -|----------|--------| -| `Alt+Shift+D` | Split pane within active tab | -| `Alt+Shift+W` | Close pane (closes tab if last pane) | -| `Alt+Shift+[` | Previous pane within tab | -| `Alt+Shift+]` | Next pane within tab | - -## Tab Lifecycle - -1. **Create:** "+" button or `Ctrl+Shift+T`. Creates tab with one pane, spawns PTY, switches to it. -2. **Switch:** Click tab or `Ctrl+Shift+[/]`. Hides current container, shows target, refits panes, focuses active pane. -3. **Rename:** Double-click label, inline edit, Enter to confirm, Escape to cancel. -4. **Close:** Click "x" or close last pane via `Alt+Shift+W`. Terminates all PTY sessions in that tab. If last tab, auto-creates a new "Shell 1". 
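The lifecycle rules above can be sketched as a small DOM-free state model. This is a minimal illustration, not the code in `static/index.html`: `createTabState`, `addTab`, and `removeTab` are hypothetical names. Two behaviors are assumptions, since the design does not spell them out: resetting the counter so the recreated tab is labeled "Shell 1", and activating the last remaining tab after a close.

```javascript
// Hypothetical DOM-free model of the tab lifecycle; all names are illustrative.
const MAX_TABS = 5;

function createTabState() {
  return { tabs: [], activeTabId: null, counter: 0 };
}

// Create: refuse past the cap, otherwise append the tab and switch to it.
function addTab(state) {
  if (state.tabs.length >= MAX_TABS) return null;
  state.counter += 1;
  const tab = { id: 'tab-' + state.counter, label: 'Shell ' + state.counter };
  state.tabs.push(tab);
  state.activeTabId = tab.id;
  return tab;
}

// Close: drop the tab. If none are left, reset numbering (assumption) and
// recreate "Shell 1"; otherwise activate the last remaining tab (assumption).
function removeTab(state, id) {
  const idx = state.tabs.findIndex(t => t.id === id);
  if (idx === -1) return;
  state.tabs.splice(idx, 1);
  if (state.tabs.length === 0) {
    state.counter = 0;
    addTab(state);
  } else if (state.activeTabId === id) {
    state.activeTabId = state.tabs[state.tabs.length - 1].id;
  }
}
```

Keeping these rules separate from the DOM makes the 5-tab cap and the auto-recreate behavior checkable without a browser; the real implementation would also need to terminate each pane's PTY session on close.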
- -## Implementation Scope - -### Modified files -- `static/index.html` — tab bar HTML/CSS, JS refactored to wrap panes inside tabs - -### Unchanged files -- `app.py` — backend has no tab concept -- `static/poll-worker.js` — already supports multiple panes by paneId - -### Estimated changes -- ~40 lines CSS (tab bar styling) -- ~150 lines net JS change (tab management functions, refactored pane logic) - -## Out of Scope (YAGNI) -- Drag-to-reorder tabs -- Tab persistence across page reloads -- Tab-specific themes or settings -- Session reconnection / PTY resumption (separate project) diff --git a/docs/plans/2026-03-08-multi-tab-terminals-implementation.md b/docs/plans/2026-03-08-multi-tab-terminals-implementation.md deleted file mode 100644 index c7517bf..0000000 --- a/docs/plans/2026-03-08-multi-tab-terminals-implementation.md +++ /dev/null @@ -1,946 +0,0 @@ -# Multi-Tab Terminals Implementation Plan - -> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. - -**Goal:** Add browser-style tabs to the web terminal where each tab owns its own independent split-pane layout, with renamable labels and a 5-tab cap. - -**Architecture:** Purely frontend change to `static/index.html`. The existing flat `panes[]` array and `activePaneId` get wrapped inside a `tabs[]` array. Each tab owns a dedicated pane container DOM element. Switching tabs toggles CSS `display` on these containers. The backend and poll-worker are unchanged. 
- -**Tech Stack:** Vanilla JS, xterm.js, existing CSS conventions (translucent/blurred, theme-aware) - ---- - -### Task 1: Add Tab Bar HTML and CSS - -**Files:** -- Modify: `static/index.html:14-17` (pane container CSS) -- Modify: `static/index.html:199-255` (HTML body, before pane-container) - -**Step 1: Add tab bar CSS** - -Insert after line 12 (`#status` rule) and before line 14 (`/* Pane container */`): - -```css - /* Tab bar */ - #tab-bar { - display: flex; align-items: center; height: 32px; width: 100vw; - background: rgba(255,255,255,0.04); - border-bottom: 1px solid rgba(255,255,255,0.08); - backdrop-filter: blur(12px); -webkit-backdrop-filter: blur(12px); - overflow-x: auto; overflow-y: hidden; - user-select: none; flex-shrink: 0; - } - .tab { - display: flex; align-items: center; gap: 6px; - padding: 0 12px; height: 100%; cursor: pointer; - font-size: 12px; white-space: nowrap; - border-right: 1px solid rgba(255,255,255,0.06); - transition: background 0.15s; - position: relative; - } - .tab:hover { background: rgba(255,255,255,0.06); } - .tab.active { - background: rgba(255,255,255,0.08); - border-bottom: 2px solid rgba(100,150,255,0.6); - } - .tab-label { - outline: none; border: none; background: none; - color: inherit; font: inherit; padding: 0; - min-width: 30px; max-width: 120px; - cursor: inherit; - } - .tab-label:focus { - cursor: text; - border-bottom: 1px solid rgba(100,150,255,0.5); - } - .tab-close { - opacity: 0; font-size: 10px; padding: 2px 4px; - border-radius: 3px; border: none; background: none; - color: inherit; cursor: pointer; transition: opacity 0.15s, background 0.15s; - line-height: 1; - } - .tab:hover .tab-close, .tab.active .tab-close { opacity: 0.6; } - .tab-close:hover { opacity: 1 !important; background: rgba(255,255,255,0.1); } - #new-tab-btn { - padding: 0 10px; height: 100%; border: none; background: none; - color: inherit; font-size: 16px; cursor: pointer; - opacity: 0.5; transition: opacity 0.15s; - } - #new-tab-btn:hover 
{ opacity: 1; } - #new-tab-btn:disabled { opacity: 0.2; cursor: default; } -``` - -**Step 2: Update pane container height** - -Change line 15 from: -```css - #pane-container { display: flex; flex-direction: row; height: 100vh; width: 100vw; } -``` -to: -```css - .tab-pane-container { display: flex; flex-direction: row; height: calc(100vh - 33px); width: 100vw; } - .tab-pane-container.hidden { display: none; } -``` - -Note: The old `#pane-container` ID is no longer used. Each tab creates its own `.tab-pane-container` div dynamically. - -**Step 3: Update pane divider CSS** - -Change line 20-27 from `#pane-divider` to class-based: -```css - .pane-divider { - flex: 0 0 4px; cursor: col-resize; - background: rgba(128,128,128,0.15); - transition: background 0.15s; - z-index: 1; - } - .pane-divider:hover, .pane-divider.dragging { - background: rgba(100,150,255,0.5); - } -``` - -**Step 4: Add tab bar HTML** - -Replace line 255 (`
`) with: -```html -
- -
- -``` - -**Step 5: Verify the page loads without errors** - -Open in browser, confirm the tab bar strip renders (empty, with just the "+" button). Terminal won't work yet because the JS still references the old `#pane-container`. - -**Step 6: Commit** - -```bash -git add static/index.html -git commit -m "feat: add tab bar HTML and CSS" -``` - ---- - -### Task 2: Refactor State Model — Introduce Tabs Array - -**Files:** -- Modify: `static/index.html:356-383` (State and Pane Object Model sections) - -**Step 1: Replace flat pane state with tabs model** - -Replace the Pane Object Model section (lines 367-383): -```javascript - // ── Pane Object Model ───────────────────────────────────────── - // Each pane: { id, element, term, fitAddon, searchAddon, sessionId } - let panes = []; - let activePaneId = null; - let paneIdCounter = 0; - - function getActivePane() { - return panes.find(p => p.id === activePaneId) || panes[0]; - } - - function focusPane(id) { - activePaneId = id; - panes.forEach(p => { - p.element.classList.toggle('active', p.id === id); - if (p.id === id) p.term.focus(); - }); - } -``` - -With: -```javascript - // ── Tab & Pane Object Model ─────────────────────────────────── - // Tab: { id, label, panes[], activePaneId, paneContainer, divider } - // Pane: { id, element, term, fitAddon, searchAddon, sessionId } - const MAX_TABS = 5; - let tabs = []; - let activeTabId = null; - let tabIdCounter = 0; - let paneIdCounter = 0; - - function getActiveTab() { - return tabs.find(t => t.id === activeTabId) || tabs[0]; - } - - function getActivePane() { - const tab = getActiveTab(); - if (!tab) return null; - return tab.panes.find(p => p.id === tab.activePaneId) || tab.panes[0]; - } - - function getAllPanes() { - return tabs.flatMap(t => t.panes); - } - - function focusPane(id) { - const tab = getActiveTab(); - if (!tab) return; - tab.activePaneId = id; - tab.panes.forEach(p => { - p.element.classList.toggle('active', p.id === id); - if (p.id === id) p.term.focus(); 
- }); - } -``` - -**Step 2: Verify no syntax errors** - -Page will be broken (functions reference old `panes` global). That's expected — we fix the references in the next tasks. - -**Step 3: Commit** - -```bash -git add static/index.html -git commit -m "refactor: introduce tabs array data model" -``` - ---- - -### Task 3: Update Theme, Font, and Refit Functions for Tabs - -**Files:** -- Modify: `static/index.html` — `applyTheme`, `setFontSize`, `setFontFamily`, `refitAllPanes` functions - -**Step 1: Update applyTheme** - -Replace line 403 (`panes.forEach(...)`) with: -```javascript - getAllPanes().forEach(p => { p.term.options.theme = preset.theme; }); -``` - -**Step 2: Update setFontSize** - -Replace line 424 (`panes.forEach(...)`) with: -```javascript - getAllPanes().forEach(p => { p.term.options.fontSize = currentFontSize; }); -``` - -**Step 3: Update setFontFamily** - -Replace line 434 (`panes.forEach(...)`) with: -```javascript - getAllPanes().forEach(p => { p.term.options.fontFamily = family; }); -``` - -**Step 4: Update refitAllPanes to only refit the active tab's panes** - -Replace the `refitAllPanes` function: -```javascript - function refitAllPanes() { - const tab = getActiveTab(); - if (!tab) return; - tab.panes.forEach(p => { - p.fitAddon.fit(); - if (p.sessionId) sendResize(p.term.cols, p.term.rows, p.sessionId); - }); - } -``` - -**Step 5: Commit** - -```bash -git add static/index.html -git commit -m "refactor: update theme/font/refit to use tabs model" -``` - ---- - -### Task 4: Rewrite createPane to Accept a Parent Tab - -**Files:** -- Modify: `static/index.html` — `createPane` function (lines ~773-838) - -**Step 1: Rewrite createPane** - -Replace the entire `createPane` function with: -```javascript - async function createPane(tab) { - const id = 'pane-' + (++paneIdCounter); - const container = tab.paneContainer; - const element = document.createElement('div'); - element.className = 'pane'; - element.id = id; - - // Add divider before second pane - 
if (tab.panes.length === 1) { - const divider = document.createElement('div'); - divider.className = 'pane-divider'; - container.appendChild(divider); - tab.divider = divider; - setupDividerDrag(divider, tab); - } - - container.appendChild(element); - - const term = new Terminal({ - cursorBlink: true, - fontSize: currentFontSize, - fontFamily: fontFamilies[currentFontFamily] || 'monospace', - theme: themes[currentThemeName].theme - }); - - const fitAddon = new FitAddon.FitAddon(); - term.loadAddon(fitAddon); - term.loadAddon(new WebLinksAddon.WebLinksAddon()); - - let searchAddon = null; - if (typeof SearchAddon !== 'undefined') { - searchAddon = new SearchAddon.SearchAddon(); - term.loadAddon(searchAddon); - } - - if (typeof ImageAddon !== 'undefined' && ImageAddon.ImageAddon) { - term.loadAddon(new ImageAddon.ImageAddon({ - sixelSupport: true, - sixelScrolling: true, - iipSupport: true, - enableSizeReports: true, - storageLimit: 128 - })); - } - - term.open(element); - fitAddon.fit(); - - const sid = await createSession(); - await sendResize(term.cols, term.rows, sid); - - term.write('\x1b[32mConnected. 
Type "claude" to start coding.\x1b[0m\r\n'); - term.write('\x1b[90mProjects in ~/projects auto-sync to Workspace on git commit.\x1b[0m\r\n'); - term.write('\x1b[90mCtrl+Shift+T new tab \u2502 Alt+Shift+D split pane \u2502 Alt+Shift+W close pane\x1b[0m\r\n\r\n'); - - const pane = { id, element, term, fitAddon, searchAddon, sessionId: sid }; - term.onData(data => sendInput(data, pane.sessionId)); - pollWorker.postMessage({ type: 'start_poll', paneId: id, sessionId: sid }); - - // Click to focus - element.addEventListener('mousedown', () => focusPane(id)); - - tab.panes.push(pane); - focusPane(id); - - return pane; - } -``` - -**Step 2: Commit** - -```bash -git add static/index.html -git commit -m "refactor: createPane now accepts parent tab" -``` - ---- - -### Task 5: Implement Tab Management Functions (createTab, switchTab, closeTab, renameTab) - -**Files:** -- Modify: `static/index.html` — add new functions after createPane - -**Step 1: Add createTab function** - -Insert after the `createPane` function: -```javascript - // ── Tab Management ────────────────────────────────────────────── - async function createTab() { - if (tabs.length >= MAX_TABS) return null; - - const id = 'tab-' + (++tabIdCounter); - const label = 'Shell ' + tabIdCounter; - - // Create per-tab pane container - const paneContainer = document.createElement('div'); - paneContainer.className = 'tab-pane-container'; - paneContainer.id = id + '-panes'; - document.body.appendChild(paneContainer); - - const tab = { - id, - label, - panes: [], - activePaneId: null, - paneContainer, - divider: null - }; - - tabs.push(tab); - - // Render tab in the tab bar - renderTabBar(); - - // Switch to new tab (hides others) - switchTab(id); - - // Create first pane - await createPane(tab); - - updateTabButtons(); - return tab; - } - - function switchTab(id) { - const prevTab = getActiveTab(); - activeTabId = id; - - // Toggle pane container visibility - tabs.forEach(t => { - t.paneContainer.classList.toggle('hidden', 
t.id !== id); - }); - - // Update tab bar active state - renderTabBar(); - - // Refit panes in the newly visible tab and focus - const tab = getActiveTab(); - if (tab && tab.panes.length > 0) { - requestAnimationFrame(() => { - refitAllPanes(); - const ap = tab.panes.find(p => p.id === tab.activePaneId) || tab.panes[0]; - if (ap) ap.term.focus(); - }); - } - } - - function closeTab(id) { - const tab = tabs.find(t => t.id === id); - if (!tab) return; - - // Cleanup all panes in this tab - tab.panes.forEach(p => { - cleanupPane(p); - p.term.dispose(); - }); - - // Remove DOM - tab.paneContainer.remove(); - - // Remove from array - tabs = tabs.filter(t => t.id !== id); - - // If we closed the active tab, switch to the last tab - if (activeTabId === id) { - if (tabs.length > 0) { - switchTab(tabs[tabs.length - 1].id); - } - } - - // If no tabs left, create a new one - if (tabs.length === 0) { - tabIdCounter = 0; - createTab(); - return; - } - - renderTabBar(); - updateTabButtons(); - } - - function startRenameTab(id) { - const labelEl = document.querySelector(`#tab-bar .tab[data-tab-id="${id}"] .tab-label`); - if (!labelEl) return; - labelEl.contentEditable = 'true'; - labelEl.focus(); - - // Select all text - const range = document.createRange(); - range.selectNodeContents(labelEl); - window.getSelection().removeAllRanges(); - window.getSelection().addRange(range); - - function finishRename() { - labelEl.contentEditable = 'false'; - const newLabel = labelEl.textContent.trim(); - const tab = tabs.find(t => t.id === id); - if (tab && newLabel) { - tab.label = newLabel; - } else if (tab) { - labelEl.textContent = tab.label; // revert empty - } - labelEl.removeEventListener('blur', finishRename); - labelEl.removeEventListener('keydown', handleKey); - // Refocus terminal - const ap = getActivePane(); - if (ap) ap.term.focus(); - } - - function handleKey(e) { - if (e.key === 'Enter') { - e.preventDefault(); - finishRename(); - } - if (e.key === 'Escape') { - 
e.preventDefault(); - const tab = tabs.find(t => t.id === id); - if (tab) labelEl.textContent = tab.label; - finishRename(); - } - } - - labelEl.addEventListener('blur', finishRename); - labelEl.addEventListener('keydown', handleKey); - } -``` - -**Step 2: Add renderTabBar function** - -```javascript - function renderTabBar() { - const tabBar = document.getElementById('tab-bar'); - const newTabBtn = document.getElementById('new-tab-btn'); - - // Remove old tab elements (keep the + button) - tabBar.querySelectorAll('.tab').forEach(el => el.remove()); - - // Insert tabs before the + button - tabs.forEach((tab, index) => { - const tabEl = document.createElement('div'); - tabEl.className = 'tab' + (tab.id === activeTabId ? ' active' : ''); - tabEl.dataset.tabId = tab.id; - - const label = document.createElement('span'); - label.className = 'tab-label'; - label.textContent = tab.label; - tabEl.appendChild(label); - - const closeBtn = document.createElement('button'); - closeBtn.className = 'tab-close'; - closeBtn.textContent = '\u00D7'; - closeBtn.title = 'Close tab'; - closeBtn.addEventListener('click', (e) => { - e.stopPropagation(); - closeTab(tab.id); - }); - tabEl.appendChild(closeBtn); - - // Click to switch - tabEl.addEventListener('click', () => switchTab(tab.id)); - - // Double-click to rename - tabEl.addEventListener('dblclick', (e) => { - e.preventDefault(); - startRenameTab(tab.id); - }); - - tabBar.insertBefore(tabEl, newTabBtn); - }); - - // Update + button state - newTabBtn.disabled = tabs.length >= MAX_TABS; - } - - function updateTabButtons() { - // Update toolbar pane buttons for active tab - const tab = getActiveTab(); - const multi = tab && tab.panes.length > 1; - document.getElementById('close-pane-btn').style.display = multi ? '' : 'none'; - document.getElementById('next-pane-btn').style.display = multi ? '' : 'none'; - document.getElementById('split-btn').style.display = (tab && tab.panes.length >= 2) ? 
'none' : ''; - } -``` - -**Step 3: Commit** - -```bash -git add static/index.html -git commit -m "feat: implement createTab, switchTab, closeTab, renameTab" -``` - ---- - -### Task 6: Rewrite splitPane, closeActivePane, cyclePaneFocus for Tab Context - -**Files:** -- Modify: `static/index.html` — replace `splitPane`, `closeActivePane`, `cyclePaneFocus` functions - -**Step 1: Replace splitPane** - -```javascript - async function splitPane() { - const tab = getActiveTab(); - if (!tab || tab.panes.length >= 2) return; - status.textContent = 'Splitting...'; - status.style.display = ''; - try { - await createPane(tab); - // Reset flex for even split - tab.panes.forEach(p => { p.element.style.flex = '1'; }); - refitAllPanes(); - updateTabButtons(); - status.style.display = 'none'; - } catch (e) { - status.textContent = 'Split failed: ' + e.message; - status.style.color = '#ff5555'; - } - } -``` - -**Step 2: Replace closeActivePane** - -```javascript - function closeActivePane() { - const tab = getActiveTab(); - if (!tab) return; - - // If only one pane, close the whole tab - if (tab.panes.length <= 1) { - closeTab(tab.id); - return; - } - - const ap = tab.panes.find(p => p.id === tab.activePaneId) || tab.panes[0]; - if (!ap) return; - - cleanupPane(ap); - ap.term.dispose(); - ap.element.remove(); - - // Remove divider - if (tab.divider) { - tab.divider.remove(); - tab.divider = null; - } - - tab.panes = tab.panes.filter(p => p.id !== ap.id); - - // Reset remaining pane to full width - if (tab.panes.length === 1) { - tab.panes[0].element.style.flex = '1'; - } - - focusPane(tab.panes[0].id); - refitAllPanes(); - updateTabButtons(); - } -``` - -**Step 3: Replace cyclePaneFocus** - -```javascript - function cyclePaneFocus(direction) { - const tab = getActiveTab(); - if (!tab || tab.panes.length <= 1) return; - const idx = tab.panes.findIndex(p => p.id === tab.activePaneId); - const next = direction === 'next' - ? 
(idx + 1) % tab.panes.length - : (idx - 1 + tab.panes.length) % tab.panes.length; - focusPane(tab.panes[next].id); - } -``` - -**Step 4: Add cycleTabFocus** - -```javascript - function cycleTabFocus(direction) { - if (tabs.length <= 1) return; - const idx = tabs.findIndex(t => t.id === activeTabId); - const next = direction === 'next' - ? (idx + 1) % tabs.length - : (idx - 1 + tabs.length) % tabs.length; - switchTab(tabs[next].id); - } - - function jumpToTab(number) { - // number is 1-indexed - if (number >= 1 && number <= tabs.length) { - switchTab(tabs[number - 1].id); - } - } -``` - -**Step 5: Commit** - -```bash -git add static/index.html -git commit -m "refactor: pane operations now scoped to active tab" -``` - ---- - -### Task 7: Update Divider Drag for Per-Tab Dividers - -**Files:** -- Modify: `static/index.html` — `setupDividerDrag` function - -**Step 1: Update setupDividerDrag to accept tab parameter** - -Replace the function: -```javascript - function setupDividerDrag(divider, tab) { - let dragging = false; - - divider.addEventListener('mousedown', e => { - e.preventDefault(); - dragging = true; - divider.classList.add('dragging'); - document.body.style.cursor = 'col-resize'; - document.body.style.userSelect = 'none'; - }); - - document.addEventListener('mousemove', e => { - if (!dragging || tab.panes.length < 2) return; - const rect = tab.paneContainer.getBoundingClientRect(); - let pct = ((e.clientX - rect.left) / rect.width) * 100; - pct = Math.max(15, Math.min(85, pct)); - tab.panes[0].element.style.flex = `0 0 ${pct}%`; - tab.panes[1].element.style.flex = '1 1 0'; - refitAllPanes(); - }); - - document.addEventListener('mouseup', () => { - if (dragging) { - dragging = false; - divider.classList.remove('dragging'); - document.body.style.cursor = ''; - document.body.style.userSelect = ''; - refitAllPanes(); - } - }); - } -``` - -**Step 2: Commit** - -```bash -git add static/index.html -git commit -m "refactor: divider drag scoped to parent tab" -``` - 
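The clamp inside the `mousemove` handler in Task 7 can be pulled out into a pure function, which makes the 15%-85% limits easy to check without a browser. This is a sketch under stated assumptions: `dividerPercent` is a hypothetical helper name, and the zero-width fallback is an addition for the case where a hidden tab's container reports no width.

```javascript
// Hypothetical helper: converts a pointer x-position into the left pane's
// width as a percentage of the container, clamped to [min, max].
function dividerPercent(clientX, rectLeft, rectWidth, min = 15, max = 85) {
  // Assumed fallback: a container with display:none reports zero width,
  // which would otherwise produce NaN or Infinity.
  if (rectWidth <= 0) return 50;
  const pct = ((clientX - rectLeft) / rectWidth) * 100;
  return Math.max(min, Math.min(max, pct));
}
```

The handler could then set `tab.panes[0].element.style.flex` from the returned value, using `rect.left` and `rect.width` from `tab.paneContainer.getBoundingClientRect()`.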
---- - -### Task 8: Update Keyboard Shortcuts - -**Files:** -- Modify: `static/index.html` — the `document.addEventListener('keydown', ...)` block - -**Step 1: Replace the shortcut block** - -Replace the entire keyboard shortcuts section (lines 633-674) with: -```javascript - // ── Global Keyboard Shortcuts ────────────────────────────────── - document.addEventListener('keydown', e => { - // Ctrl+= : increase font - if (e.ctrlKey && !e.altKey && !e.shiftKey && (e.key === '=' || e.key === '+')) { - e.preventDefault(); setFontSize(currentFontSize + 1); return; - } - // Ctrl+- : decrease font - if (e.ctrlKey && !e.altKey && !e.shiftKey && e.key === '-') { - e.preventDefault(); setFontSize(currentFontSize - 1); return; - } - // Ctrl+0 : reset font - if (e.ctrlKey && !e.altKey && !e.shiftKey && e.key === '0') { - e.preventDefault(); setFontSize(DEFAULT_FONT_SIZE); return; - } - // Ctrl+Shift+F : toggle search - if (e.ctrlKey && e.shiftKey && e.key === 'F') { - e.preventDefault(); toggleSearch(); return; - } - // Alt+V (Option+V) : toggle voice dictation - if (e.altKey && !e.ctrlKey && !e.shiftKey && e.code === 'KeyV') { - e.preventDefault(); - if (dictationActive) closeDictation(); - else startDictation(); - return; - } - - // ── Tab shortcuts (Ctrl+Shift) ── - // Ctrl+Shift+T : new tab - if (e.ctrlKey && e.shiftKey && e.key === 'T') { - e.preventDefault(); createTab(); return; - } - // Ctrl+Shift+W : close active pane (closes tab if last pane) - if (e.ctrlKey && e.shiftKey && e.key === 'W') { - e.preventDefault(); closeActivePane(); return; - } - // Ctrl+Shift+] : next tab - if (e.ctrlKey && e.shiftKey && e.code === 'BracketRight') { - e.preventDefault(); cycleTabFocus('next'); return; - } - // Ctrl+Shift+[ : prev tab - if (e.ctrlKey && e.shiftKey && e.code === 'BracketLeft') { - e.preventDefault(); cycleTabFocus('prev'); return; - } - // Ctrl+Shift+1-5 : jump to tab - if (e.ctrlKey && e.shiftKey && e.code >= 'Digit1' && e.code <= 'Digit5') { - e.preventDefault(); 
jumpToTab(parseInt(e.code.slice(-1))); return; - } - - // ── Pane shortcuts (Alt+Shift) ── - // Alt+Shift+D : split pane - if (e.altKey && e.shiftKey && e.key === 'D') { - e.preventDefault(); splitPane(); return; - } - // Alt+Shift+W : close pane - if (e.altKey && e.shiftKey && e.key === 'W') { - e.preventDefault(); closeActivePane(); return; - } - // Alt+Shift+] : next pane - if (e.altKey && e.shiftKey && e.code === 'BracketRight') { - e.preventDefault(); cyclePaneFocus('next'); return; - } - // Alt+Shift+[ : prev pane - if (e.altKey && e.shiftKey && e.code === 'BracketLeft') { - e.preventDefault(); cyclePaneFocus('prev'); return; - } - }); -``` - -**Step 2: Update toolbar button tooltips** - -Update line 224 to reflect new shortcut: -```html - - - -``` - -**Step 3: Commit** - -```bash -git add static/index.html -git commit -m "feat: add tab keyboard shortcuts, move pane shortcuts to Alt+Shift" -``` - ---- - -### Task 9: Update Toolbar Button Wiring and Cleanup Functions - -**Files:** -- Modify: `static/index.html` — toolbar button listeners, cleanupAllPanes, pagehide, updatePaneButtons - -**Step 1: Wire the new-tab button** - -Add after the existing toolbar button listeners: -```javascript - document.getElementById('new-tab-btn').addEventListener('click', () => createTab()); -``` - -**Step 2: Update cleanupAllPanes to iterate all tabs** - -Replace: -```javascript - function cleanupAllPanes() { - panes.forEach(p => cleanupPane(p)); - } -``` -With: -```javascript - function cleanupAllPanes() { - getAllPanes().forEach(p => cleanupPane(p)); - } -``` - -**Step 3: Update pagehide beacon to iterate all tabs** - -Replace the `pagehide` listener: -```javascript - window.addEventListener('pagehide', () => { - getAllPanes().forEach(p => { - if (p.sessionId) { - navigator.sendBeacon( - '/api/heartbeat', - new Blob([JSON.stringify({ session_id: p.sessionId })], { type: 'application/json' }) - ); - } - }); - }); -``` - -**Step 4: Replace the old updatePaneButtons function** - 
-The `updatePaneButtons` function was already rewritten in Task 5 as `updateTabButtons`. Remove the old one (lines 930-935) if it still exists. - -**Step 5: Commit** - -```bash -git add static/index.html -git commit -m "feat: wire new-tab button, update cleanup for tabs" -``` - ---- - -### Task 10: Update init() to Create First Tab Instead of First Pane - -**Files:** -- Modify: `static/index.html` — `init` function - -**Step 1: Replace init** - -```javascript - async function init() { - try { - status.textContent = 'Initializing terminal...'; - - if (typeof Terminal === 'undefined') throw new Error('xterm.js not loaded'); - if (typeof FitAddon === 'undefined') throw new Error('FitAddon not loaded'); - - await createTab(); - - status.textContent = 'Connected!'; - setTimeout(() => { status.style.display = 'none'; }, 1000); - - window.addEventListener('resize', () => refitAllPanes()); - window.addEventListener('beforeunload', () => cleanupAllPanes()); - - } catch (e) { - status.textContent = 'Error: ' + e.message; - status.style.color = '#ff5555'; - console.error(e); - } - } -``` - -**Step 2: Remove the old `
`** - -This was already replaced in Task 1, but verify it's gone. The `createTab` function now creates per-tab pane containers dynamically. - -**Step 3: Verify end-to-end** - -Open the page in a browser. Verify: -- Tab bar appears at top with "Shell 1" tab and "+" button -- Terminal renders and works below the tab bar -- Click "+" creates "Shell 2" with its own terminal session -- Clicking tabs switches between them -- Double-click a tab label to rename it -- Click "x" on a tab to close it -- `Ctrl+Shift+T` creates a new tab -- `Ctrl+Shift+[/]` cycles tabs -- `Alt+Shift+D` splits the active tab's pane -- `Alt+Shift+W` closes a pane (or tab if last pane) -- Closing the last tab auto-creates a new "Shell 1" -- Max 5 tabs, "+" button disables at cap - -**Step 4: Commit** - -```bash -git add static/index.html -git commit -m "feat: init creates first tab, multi-tab terminals complete" -``` - ---- - -### Task 11: Remove Dead Code and Final Cleanup - -**Files:** -- Modify: `static/index.html` - -**Step 1: Remove any remaining references to the old global `panes` variable** - -Search for `panes.forEach`, `panes.find`, `panes.length`, `panes.filter`, `panes.push`, `panes.pop`, `panes[` in the file. All should now reference `tab.panes` or `getAllPanes()`. Remove any dead code. - -**Step 2: Remove the old `#pane-container` and `#pane-divider` CSS rules if still present** - -They've been replaced by `.tab-pane-container` and `.pane-divider`. - -**Step 3: Verify no console errors** - -Open browser dev tools, check console is clean. 
- -**Step 4: Commit** - -```bash -git add static/index.html -git commit -m "chore: remove dead pane code, cleanup" -``` diff --git a/docs/plans/2026-03-11-litellm-empty-content-blocks-design.md b/docs/plans/2026-03-11-litellm-empty-content-blocks-design.md deleted file mode 100644 index 745def2..0000000 --- a/docs/plans/2026-03-11-litellm-empty-content-blocks-design.md +++ /dev/null @@ -1,156 +0,0 @@ -# Design: LiteLLM Local Proxy for Empty Content Block Sanitization - -**Date:** 2026-03-11 -**Branch:** `fix/litellm-empty-content-blocks` -**Related:** OpenCode [#5028](https://github.com/sst/opencode/issues/5028), LiteLLM [PR #20384](https://github.com/BerriAI/litellm/pull/20384) - -## Problem - -OpenCode intermittently sends malformed messages containing empty text content blocks -(`{"type": "text", "text": ""}`) to the Databricks Foundation Model API. This occurs during: - -1. **Streaming** — empty text blocks appear between thinking blocks in conversation history -2. **Compaction** — `/compact` command produces empty or whitespace-only blocks -3. **Model switching** — switching between models (e.g., Gemini to Claude) generates whitespace-only chunks - -The Databricks Foundation Model API strictly rejects these with: -``` -Bad Request: {"message":"messages: text content blocks must be non-empty"} -``` - -Once a corrupted message enters the conversation history, **every subsequent request fails** — -the session is permanently bricked. This is OpenCode issue -[#5028](https://github.com/sst/opencode/issues/5028), still open as of March 2026. - -## Why Not PR #52's Approach - -[PR #52](https://github.com/datasciencemonkey/coding-agents-databricks-apps/pull/52) proposes -forking OpenCode (`dgokeeffe/opencode`) to add a native Databricks provider. After analysis: - -1. **Does not fix the root cause** — The fork's `feat/databricks-ai-sdk-provider` branch - has no commits that sanitize empty content blocks. 
The bug originates in OpenCode's core - agent loop (conversation history management), not the provider layer. A native provider - sends whatever the core gives it. - -2. **Fork maintenance burden** — Must track upstream OpenCode releases indefinitely. - When upstream fixes #5028, the fork may conflict. - -3. **Scope creep** — PR #52 bundles the fork with a spawner app, GitHub CLI setup, - and performance fixes. These are independent concerns that should be separate PRs. - -4. **Fragile coupling** — Tightly couples our project to a fork that may diverge from - upstream, creating long-term maintenance risk for a demo/tool project. - -### What to cherry-pick from PR #52 (separately) - -PR #52 contains valuable changes that are **independent of the fork** and should be -extracted into their own PRs: - -- **Performance fixes** — `select()` timeout reduction (500ms → 50ms), lock contention - fixes in `get_output_batch()` and `cleanup_stale_sessions()`, poll-worker interval - reduction (100ms → 50ms). These are changes to `app.py` and `static/poll-worker.js`. - -- **WebSocket detection fix** — Correct Socket.IO transport detection that checks - `socket.io.engine.transport.name` instead of trusting `connected=true`. This is a - change to `static/index.html`. - -- **GitHub CLI setup** — Automated `gh` install with xterm.js-safe auth wrapper. - Standalone setup script. - -These should be reviewed and merged independently — they don't require the OpenCode fork. - -## Our Approach: LiteLLM Local Proxy - -Run a lightweight LiteLLM instance **inside the same container** on an internal port. -It intercepts requests from OpenCode, strips empty content blocks via the sanitization -logic added in [LiteLLM PR #20384](https://github.com/BerriAI/litellm/pull/20384), -and forwards clean messages to Databricks AI Gateway. - -### Architecture - -In the current setup, **OpenCode** talks directly to the **Databricks AI Gateway**. 
-Because OpenCode sends malformed "empty text blocks," the Gateway rejects them -immediately with a 400 error. - -By introducing **LiteLLM**, we change the traffic flow inside the container: - -``` -Users → port 8000 (Flask/xterm.js UI) - ↓ spawns PTY - OpenCode → localhost:4000 (LiteLLM) → Databricks AI Gateway → Claude/Gemini -``` - -1. **OpenCode** (the agent) sends the request to `http://localhost:4000` (the **LiteLLM Proxy**). -2. **LiteLLM** intercepts the request *before* it leaves the container. -3. **LiteLLM** applies the sanitization logic (stripping the `{"type": "text", "text": ""}` blocks). -4. **LiteLLM** then forwards the "cleaned" request to the **Databricks AI Gateway**. -5. **Databricks** receives a perfectly valid request and processes it. - -So, while the traffic eventually reaches Databricks, it is "washed" by LiteLLM locally -first. This ensures that the Databricks Gateway never sees the malformed data that causes -it to throw an error. - -- **Port 8000** — Flask/Gunicorn (exposed to users via Databricks Apps) -- **Port 4000** — LiteLLM proxy (internal only, never exposed externally) -- Databricks Apps only routes external traffic to port 8000 - -When upstream OpenCode eventually fixes #5028, LiteLLM becomes a no-op (nothing to -strip) — it degrades gracefully. At that point, remove `setup_litellm.py`, revert the -baseURL in `setup_opencode.py`, and drop the dependency. - -### Implementation Plan - -#### 1. Add `litellm` to `requirements.txt` - -``` -litellm>=1.60 -``` - -#### 2. Create `setup_litellm.py` - -New setup script that: -- Writes a LiteLLM config YAML pointing to Databricks AI Gateway -- Starts LiteLLM as a background process on `localhost:4000` -- Waits for the health endpoint to confirm it's ready -- Maps each Databricks model to the `databricks/` prefix so the sanitization path activates - -#### 3. 
Update `setup_opencode.py` - -Change OpenCode's `baseURL` from the Databricks Gateway URL to `http://localhost:4000` -so all requests route through LiteLLM first. The model names and auth stay the same. - -#### 4. Add `litellm` setup step to `app.py` - -Add a new step in `run_setup()` that runs **before** the parallel agent setup -(LiteLLM must be running before OpenCode starts using it): - -```python -# Sequential: LiteLLM proxy must be running before agents that use it -_run_step("litellm", ["python", "setup_litellm.py"]) - -# Then parallel agent setup... -``` - -#### 5. Health check - -`setup_litellm.py` should poll `http://localhost:4000/health` before returning success, -ensuring the proxy is ready before OpenCode sends its first request. - -### Trade-offs - -| Aspect | Impact | -|--------|--------| -| Added dependency | `litellm` package (~small footprint as proxy) | -| Added latency | Negligible — localhost hop, no network | -| Startup time | ~2-3s for LiteLLM to start (sequential, before agents) | -| Maintenance | Zero — LiteLLM is a well-maintained OSS project | -| Graceful degradation | When #5028 is fixed upstream, proxy strips nothing | -| Governance preserved | AI Gateway, MLflow tracing, Unity Catalog all intact | - -### Testing - -1. Deploy to Databricks Apps -2. Launch OpenCode with `databricks-claude-opus-4-6` -3. Run 10+ iterations including `/compact` — verify no 400 errors -4. Check MLflow traces — confirm requests still flow through AI Gateway -5. 
Verify LiteLLM is NOT accessible from outside the container (port 4000 not exposed) diff --git a/docs/plans/2026-03-27-pat-auto-rotation-implementation.md b/docs/plans/2026-03-27-pat-auto-rotation-implementation.md deleted file mode 100644 index df55204..0000000 --- a/docs/plans/2026-03-27-pat-auto-rotation-implementation.md +++ /dev/null @@ -1,510 +0,0 @@ -# PAT Auto-Rotation Implementation Plan - -> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task. - -**Goal:** Implement automatic PAT rotation with 2-hour short-lived tokens, rotating every 90 minutes, with persistence to app secrets for restart survival. Fixes #81. - -**Architecture:** New `pat_rotator.py` module with a `PATRotator` class that runs a background daemon thread. Uses current PAT to mint new PAT, persists to Secrets API via SP credentials, writes to `~/.databrickscfg`, revokes old PAT. Integrated into `initialize_app()`. - -**Tech Stack:** Python, Flask, databricks-sdk, requests, threading - ---- - -### Task 1: Create PATRotator module with tests - -**Files:** -- Create: `pat_rotator.py` -- Create: `tests/test_pat_rotator.py` - -**Step 1: Write the failing tests** - -```python -# tests/test_pat_rotator.py -"""Tests for PAT auto-rotation — short-lived tokens with background refresh.""" - -import os -import time -import threading -from unittest import mock - -import pytest - - -class TestPATRotation: - """Core rotation logic.""" - - def test_rotate_mints_new_token(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com", rotation_interval=5400, token_lifetime=7200) - rotator._current_token = "old-pat" - rotator._current_token_id = "old-id" - - mock_response_create = mock.MagicMock() - mock_response_create.status_code = 200 - mock_response_create.json.return_value = { - "token_value": "new-pat", - "token_info": {"token_id": "new-id", "expiry_time": int(time.time() + 7200) * 1000} - } - 
mock_response_delete = mock.MagicMock() - mock_response_delete.status_code = 200 - - with mock.patch("pat_rotator.requests.post") as mock_post: - mock_post.side_effect = [mock_response_create, mock_response_delete] - with mock.patch.object(rotator, "_persist_token"): - result = rotator._rotate_once() - - assert result is True - assert rotator._current_token == "new-pat" - assert rotator._current_token_id == "new-id" - - def test_rotate_revokes_old_token(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._current_token = "old-pat" - rotator._current_token_id = "old-id" - - mock_response_create = mock.MagicMock() - mock_response_create.status_code = 200 - mock_response_create.json.return_value = { - "token_value": "new-pat", - "token_info": {"token_id": "new-id", "expiry_time": int(time.time() + 7200) * 1000} - } - mock_response_delete = mock.MagicMock() - mock_response_delete.status_code = 200 - - with mock.patch("pat_rotator.requests.post") as mock_post: - mock_post.side_effect = [mock_response_create, mock_response_delete] - with mock.patch.object(rotator, "_persist_token"): - rotator._rotate_once() - - # Second call should be the delete with the OLD token id - delete_call = mock_post.call_args_list[1] - assert "token/delete" in delete_call[0][0] - assert delete_call[1]["json"]["token_id"] == "old-id" - - def test_rotate_fails_gracefully_on_create_error(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._current_token = "old-pat" - rotator._current_token_id = "old-id" - - mock_response = mock.MagicMock() - mock_response.status_code = 403 - mock_response.text = "Forbidden" - - with mock.patch("pat_rotator.requests.post", return_value=mock_response): - result = rotator._rotate_once() - - assert result is False - assert rotator._current_token == "old-pat" # Unchanged - - def test_rotate_continues_if_revoke_fails(self): - from pat_rotator import 
PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._current_token = "old-pat" - rotator._current_token_id = "old-id" - - mock_create = mock.MagicMock() - mock_create.status_code = 200 - mock_create.json.return_value = { - "token_value": "new-pat", - "token_info": {"token_id": "new-id", "expiry_time": int(time.time() + 7200) * 1000} - } - mock_delete = mock.MagicMock() - mock_delete.status_code = 500 - - with mock.patch("pat_rotator.requests.post") as mock_post: - mock_post.side_effect = [mock_create, mock_delete] - with mock.patch.object(rotator, "_persist_token"): - result = rotator._rotate_once() - - # New token should still be active even if old revocation failed - assert result is True - assert rotator._current_token == "new-pat" - - -class TestTokenPersistence: - """Writing token to ~/.databrickscfg.""" - - def test_writes_databrickscfg(self, tmp_path): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._databrickscfg_path = str(tmp_path / ".databrickscfg") - rotator._write_databrickscfg("test-token") - - content = (tmp_path / ".databrickscfg").read_text() - assert "test-token" in content - assert "https://test.databricks.com" in content - - def test_databrickscfg_permissions(self, tmp_path): - import stat - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._databrickscfg_path = str(tmp_path / ".databrickscfg") - rotator._write_databrickscfg("test-token") - - mode = os.stat(str(tmp_path / ".databrickscfg")).st_mode - assert stat.S_IMODE(mode) == 0o600 - - def test_updates_env_var(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - with mock.patch.object(rotator, "_write_databrickscfg"): - with mock.patch.object(rotator, "_persist_to_secret"): - rotator._persist_token("new-token-value") - assert os.environ.get("DATABRICKS_TOKEN") == "new-token-value" - - -class 
TestSecretPersistence: - """Persisting rotated token to app secret via SP.""" - - def test_persist_to_secret_calls_sdk(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com", - secret_scope="my-scope", secret_key="DATABRICKS_TOKEN") - - with mock.patch("pat_rotator.WorkspaceClient") as mock_ws: - rotator._persist_to_secret("new-token") - mock_ws.return_value.secrets.put_secret.assert_called_once_with( - scope="my-scope", key="DATABRICKS_TOKEN", string_value="new-token" - ) - - def test_persist_skipped_when_no_scope_configured(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com", - secret_scope=None, secret_key=None) - - with mock.patch("pat_rotator.WorkspaceClient") as mock_ws: - rotator._persist_to_secret("new-token") - mock_ws.return_value.secrets.put_secret.assert_not_called() - - -class TestRotatorLifecycle: - """Start/stop the background thread.""" - - def test_start_creates_daemon_thread(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com", rotation_interval=9999) - rotator._current_token = "test-pat" - with mock.patch.object(rotator, "_rotation_loop"): - rotator.start() - assert rotator._thread is not None - assert rotator._thread.daemon is True - rotator.stop() - - def test_no_start_without_token(self): - from pat_rotator import PATRotator - rotator = PATRotator(host="https://test.databricks.com") - rotator._current_token = None - rotator.start() - assert rotator._thread is None -``` - -**Step 2: Run tests to verify they fail** - -Run: `uv run pytest tests/test_pat_rotator.py -v` -Expected: FAIL — `ModuleNotFoundError: No module named 'pat_rotator'` - -**Step 3: Write implementation** - -```python -# pat_rotator.py -"""Auto-rotate short-lived PATs in the background. 
- -Mints a new 2-hour PAT every 90 minutes, persists to app secret -(survives restart), writes to ~/.databrickscfg (immediate CLI/SDK use), -and revokes the old PAT. Fixes #81. -""" - -import os -import time -import threading -import logging - -import requests -from databricks.sdk import WorkspaceClient - -from utils import ensure_https - -logger = logging.getLogger(__name__) - -# Defaults -DEFAULT_TOKEN_LIFETIME = 7200 # 2 hours -DEFAULT_ROTATION_INTERVAL = 5400 # 90 minutes - - -class PATRotator: - """Background PAT rotation with secret persistence.""" - - def __init__(self, host=None, rotation_interval=DEFAULT_ROTATION_INTERVAL, - token_lifetime=DEFAULT_TOKEN_LIFETIME, - secret_scope=None, secret_key=None): - self._host = ensure_https(host or os.environ.get("DATABRICKS_HOST", "")) - self._rotation_interval = rotation_interval - self._token_lifetime = token_lifetime - self._secret_scope = secret_scope - self._secret_key = secret_key - self._current_token = os.environ.get("DATABRICKS_TOKEN", "").strip() or None - self._current_token_id = None - self._lock = threading.Lock() - self._thread = None - self._stop_event = threading.Event() - self._databrickscfg_path = os.path.join( - os.environ.get("HOME", "/app/python/source_code"), - ".databrickscfg" - ) - - @property - def token(self): - with self._lock: - return self._current_token - - def start(self): - """Start the background rotation thread.""" - if not self._current_token: - logger.warning("No PAT configured — rotation thread not started") - return - if self._thread and self._thread.is_alive(): - return - self._stop_event.clear() - self._thread = threading.Thread(target=self._rotation_loop, daemon=True, - name="pat-rotation") - self._thread.start() - logger.info(f"PAT rotation started (interval={self._rotation_interval}s, " - f"lifetime={self._token_lifetime}s)") - - def stop(self): - """Signal the rotation thread to stop.""" - self._stop_event.set() - - def _rotation_loop(self): - """Background loop: sleep, 
rotate, repeat.""" - while not self._stop_event.is_set(): - self._stop_event.wait(timeout=self._rotation_interval) - if self._stop_event.is_set(): - break - try: - self._rotate_once() - except Exception as e: - logger.error(f"PAT rotation failed unexpectedly: {e}") - - def _rotate_once(self): - """Mint new PAT, persist, revoke old. Returns True on success.""" - if not self._current_token: - return False - - # 1. Mint new token - try: - resp = requests.post( - f"{self._host}/api/2.0/token/create", - headers={"Authorization": f"Bearer {self._current_token}"}, - json={ - "lifetime_seconds": self._token_lifetime, - "comment": "coda-auto-rotated" - }, - timeout=30 - ) - except requests.RequestException as e: - logger.error(f"PAT rotation: create request failed: {e}") - return False - - if resp.status_code != 200: - logger.error(f"PAT rotation: create failed ({resp.status_code}): {resp.text}") - return False - - data = resp.json() - new_token = data["token_value"] - new_token_id = data["token_info"]["token_id"] - - old_token_id = self._current_token_id - - # 2. Persist new token (secret + file + env) - with self._lock: - self._current_token = new_token - self._current_token_id = new_token_id - self._persist_token(new_token) - logger.info(f"PAT rotated successfully (new_id={new_token_id})") - - # 3. 
Revoke old token (best-effort — old token expires in 2h anyway) - if old_token_id: - try: - resp = requests.post( - f"{self._host}/api/2.0/token/delete", - headers={"Authorization": f"Bearer {new_token}"}, - json={"token_id": old_token_id}, - timeout=30 - ) - if resp.status_code == 200: - logger.info(f"Old PAT revoked (id={old_token_id})") - else: - logger.warning(f"Old PAT revocation failed ({resp.status_code})") - except requests.RequestException as e: - logger.warning(f"Old PAT revocation request failed: {e}") - - return True - - def _persist_token(self, token): - """Write rotated token to all persistence layers.""" - os.environ["DATABRICKS_TOKEN"] = token - self._write_databrickscfg(token) - self._persist_to_secret(token) - - def _write_databrickscfg(self, token): - """Write token to ~/.databrickscfg for CLI/SDK tools.""" - content = ( - "[DEFAULT]\n" - f"host = {self._host}\n" - f"token = {token}\n" - ) - try: - with open(self._databrickscfg_path, "w") as f: - f.write(content) - os.chmod(self._databrickscfg_path, 0o600) - except OSError as e: - logger.warning(f"Could not write .databrickscfg: {e}") - - def _persist_to_secret(self, token): - """Persist token to Databricks app secret (survives restart).""" - if not self._secret_scope or not self._secret_key: - return - try: - w = WorkspaceClient() - w.secrets.put_secret(scope=self._secret_scope, key=self._secret_key, - string_value=token) - logger.info("Rotated PAT persisted to app secret") - except Exception as e: - logger.warning(f"Could not persist PAT to secret: {e}") -``` - -**Step 4: Run tests** - -Run: `uv run pytest tests/test_pat_rotator.py -v` -Expected: All PASS - -**Step 5: Commit** - -```bash -git add pat_rotator.py tests/test_pat_rotator.py -git -c user.email=datasciencemonkey@gmail.com -c user.name="Sathish Gangichetty" commit -m "feat: add PATRotator for short-lived token auto-rotation (#81)" -``` - ---- - -### Task 2: Integrate PATRotator into app.py - -**Files:** -- Modify: `app.py` 
(initialize_app, ~line 917) - -**Step 1: Write failing test** - -```python -# tests/test_pat_rotation_integration.py -"""Integration test: PATRotator wired into app.""" - -from unittest import mock - -def test_app_has_pat_rotator(): - with mock.patch("app.initialize_app"): - import app as app_module - assert hasattr(app_module, "pat_rotator") -``` - -**Step 2: Run test — should fail** - -Run: `uv run pytest tests/test_pat_rotation_integration.py -v` - -**Step 3: Modify app.py** - -Add import near top (after existing imports): -```python -from pat_rotator import PATRotator -``` - -Add module-level instance: -```python -# PAT auto-rotation (short-lived tokens, background refresh) -pat_rotator = PATRotator( - secret_scope=os.environ.get("PAT_SECRET_SCOPE"), - secret_key=os.environ.get("PAT_SECRET_KEY", "DATABRICKS_TOKEN"), -) -``` - -In `initialize_app()`, after the setup thread start, add: -```python - # Start PAT auto-rotation if a PAT is configured - pat_rotator.start() -``` - -**Step 4: Run all tests** - -Run: `uv run pytest tests/ -v` - -**Step 5: Commit** - -```bash -git add app.py tests/test_pat_rotation_integration.py -git -c user.email=datasciencemonkey@gmail.com -c user.name="Sathish Gangichetty" commit -m "feat: wire PATRotator into app startup (#81)" -``` - ---- - -### Task 3: Update app.yaml with secret resource and rotation env vars - -**Files:** -- Modify: `app.yaml` - -**Step 1: Update app.yaml** - -```yaml -command: - - gunicorn - - app:app -env: - - name: HOME - value: /app/python/source_code - - name: DATABRICKS_TOKEN - valueFrom: DATABRICKS_TOKEN - - name: PAT_SECRET_SCOPE - value: coda-app - - name: PAT_SECRET_KEY - value: DATABRICKS_TOKEN - - name: ANTHROPIC_MODEL - value: databricks-claude-opus-4-6 - - name: GEMINI_MODEL - value: databricks-gemini-3-1-pro - - name: CODEX_MODEL - value: databricks-gpt-5-2 - - name: DATABRICKS_GATEWAY_HOST - valueFrom: DATABRICKS_GATEWAY_HOST - - name: CLAUDE_CODE_DISABLE_AUTO_MEMORY - value: 0 -resources: - - 
name: pat-token - secret: - scope: coda-app - key: DATABRICKS_TOKEN - permission: WRITE -``` - -**Step 2: Commit** - -```bash -git add app.yaml -git -c user.email=datasciencemonkey@gmail.com -c user.name="Sathish Gangichetty" commit -m "chore: add secret resource with WRITE for PAT rotation (#81)" -``` - ---- - -### Task 4: Run full test suite and commit plan - -**Step 1: Run tests** - -Run: `uv run pytest tests/ -v` -Expected: All PASS - -**Step 2: Commit plan doc** - -```bash -git add docs/plans/2026-03-27-pat-auto-rotation-implementation.md -git -c user.email=datasciencemonkey@gmail.com -c user.name="Sathish Gangichetty" commit -m "docs: PAT auto-rotation implementation plan (#81)" -``` diff --git a/docs/plans/2026-03-28-session-detach-reconnect.md b/docs/plans/2026-03-28-session-detach-reconnect.md deleted file mode 100644 index da22bda..0000000 --- a/docs/plans/2026-03-28-session-detach-reconnect.md +++ /dev/null @@ -1,119 +0,0 @@ -# Session Detach & Reconnect - -**Date:** 2026-03-28 -**Context:** Coding agent sessions (claude, opencode, gemini) should survive tab closure. Only `exit` in the shell kills a session. - ---- - -## Problem - -Closing a browser tab kills the PTY process immediately via `sendBeacon('/api/session/close')`. For a coding agent mid-task, this destroys work in progress. The user didn't intend to kill the session — they just closed a tab. - -## Design - -### Principle: Detach, Don't Kill - -- **Tab/pane close = detach.** Frontend disconnects, PTY keeps running. -- **`exit` in shell = the only kill.** PTY EOF detection triggers cleanup. -- **24-hour reaper = safety net.** Orphaned sessions die after 24h with no heartbeat. - -### Changes - -#### 1. Frontend — `cleanupPane()` stops killing - -Remove `sendBeacon('/api/session/close')` from `cleanupPane()`. Keep poll stop, WS room leave, and xterm disposal. The `beforeunload` handler still calls `cleanupAllPanes()` but it no longer kills anything. `pagehide` already just sends a heartbeat. 
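
A minimal sketch of the detach-only cleanup described above, assuming a pane object with `id`, `sessionId`, and `term` fields and an injected `deps` object with `stopPolling` and `leaveRoom` helpers. These names are hypothetical and chosen for illustration; the real structures in `static/index.html` may differ.

```javascript
// Hypothetical sketch: `pane` fields and the `deps` helpers are assumptions,
// not the actual code in static/index.html.
function cleanupPane(pane, deps) {
  // Stop output polling for this pane.
  deps.stopPolling(pane.id);

  // Leave the WebSocket room so the server stops pushing output here.
  if (pane.sessionId) {
    deps.leaveRoom(pane.sessionId);
  }

  // Dispose the xterm instance to release DOM and listeners.
  if (pane.term) {
    pane.term.dispose();
    pane.term = null;
  }

  // Deliberately no sendBeacon('/api/session/close') here:
  // the PTY keeps running, and sessionId is kept so the session
  // can be reattached later.
}
```

Passing the helpers in as `deps` keeps the sketch testable without a browser. The point it illustrates is only that no close request is sent on cleanup.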
- -#### 2. Backend — `GET /api/sessions` - -Returns active sessions with process detection: - -```json -[ - { - "session_id": "abc-123", - "created_at": 1743120382.5, - "last_poll_time": 1743120982.5, - "exited": false, - "process": "claude", - "idle_seconds": 342 - } -] -``` - -Process detection: `ps --ppid {pid} -o comm=` to find the child process of the shell. Falls back to "bash" if no child. - -Added to auth skip list alongside `/api/pat-status`. - -#### 3. Backend — `POST /api/session/attach` - -Reattach to an existing session: - -- Input: `{ session_id }` -- Validates session exists and not exited -- Resets `last_poll_time` (restarts 24h idle clock) -- Returns output buffer (last ~1000 lines) for replay -- Returns metadata (process name, created_at) - -```json -{ - "session_id": "abc-123", - "output": ["line1\r\n", "line2\r\n"], - "process": "claude", - "created_at": 1743120382.5 -} -``` - -#### 4. Frontend — Session picker on return visit - -The picker only appears when PAT is already valid (return visit). First-time PAT flow always creates a new session. - -``` -createPane() - → /api/pat-status - → invalid → PAT prompt → setup → create new session - → valid → GET /api/sessions - → 0 sessions → create new - → 1 session → auto-reattach (replay buffer) - → N sessions → show picker -``` - -**Picker UI** (rendered in xterm with mouse support): - -``` - Existing sessions: - - claude (running, 2h ago) [Attach] [✕] - opencode (running, 45m ago) [Attach] [✕] - bash (idle, 3h ago) [Attach] [✕] - - [+ New session] -``` - -- Click **Attach** or session row → `POST /api/session/attach`, replay buffer, join WS room, start polling -- Click **✕** → `POST /api/session/close` for that session, re-render picker -- Click **+ New session** → `POST /api/session` as today -- One session → skip picker, auto-reattach - -#### 5. Exited session cleanup - -When `read_pty_output()` detects EOF (user typed `exit`), call `terminate_session()` immediately to remove from dict. 
No zombie sessions in the picker. - -Session picker also filters out `exited: true` (defensive, race condition guard). - ---- - -## Files to Modify - -| File | Change | -|------|--------| -| `app.py` | Add `GET /api/sessions`, `POST /api/session/attach`. Update auth skip list. Update `read_pty_output()` to call `terminate_session()` on EOF. Add `_get_session_process(pid)` helper. | -| `static/index.html` | Remove `sendBeacon('/api/session/close')` from `cleanupPane()`. Add session picker flow in `createPane()`. Add mouse click handling for picker UI. | - -## What Doesn't Change - -- `POST /api/session/close` endpoint stays — used by EOF cleanup path -- `terminate_session()` stays — core kill logic unchanged -- 24-hour timeout stays — safety net for orphans -- `pagehide` heartbeat stays — already correct -- WebSocket disconnect behavior stays — already doesn't kill PTY -- PAT rotation, session awareness — unchanged (sessions still count) diff --git a/docs/plans/PLAN-issue-8.md b/docs/plans/PLAN-issue-8.md deleted file mode 100644 index ad4af7a..0000000 --- a/docs/plans/PLAN-issue-8.md +++ /dev/null @@ -1,119 +0,0 @@ -# Issue #8: Frontend Keep-Alive, Reconnection & Web Worker Polling - -## Context - -The frontend polling is fragile. A single `setInterval` at 100ms calls `/api/output` — any non-200 response immediately kills the session with no retry. Browsers throttle background tab timers, so switching tabs easily causes polls to stall past the 300s timeout. The current workaround (bumping timeout from 60s to 300s) masks the problem but doesn't fix it. 
- -**Branch:** Create `feat/frontend-keepalive` off `main` - -## Architecture - -``` -Main Thread (index.html) Web Worker (poll-worker.js) Backend (app.py) -───────────────────────── ────────────────────────── ──────────────── -- xterm.js / DOM - Output polling (100ms fg) - /api/output (existing) -- visibilitychange handler ←──→ - Heartbeat polling (30s bg) ──→ - /api/heartbeat (NEW) -- pagehide sendBeacon - Retry/backoff state - /api/session/close -- Input/resize sending - Per-pane state map -``` - -Web Workers are NOT throttled by browsers in background tabs — this is the key benefit. - -## Changes - -### 1. Backend: Add `/api/heartbeat` endpoint - -**File:** `app.py` (insert after `/api/output` at line 530) - -```python -@app.route("/api/heartbeat", methods=["POST"]) -def heartbeat(): - """Lightweight keep-alive — resets timeout without draining output buffer.""" - data = request.json - session_id = data.get("session_id") - with sessions_lock: - if session_id not in sessions: - return jsonify({"error": "Session not found"}), 404 - session = sessions[session_id] - session["last_poll_time"] = time.time() - timeout_warning = session.pop("timeout_warning", False) - return jsonify({"status": "ok", "timeout_warning": timeout_warning}) -``` - -Critical: does NOT touch `output_buffer` — output is only drained by `/api/output`. - -### 2. New file: `static/poll-worker.js` - -Web Worker handling all HTTP polling and retry logic (~120 lines). 
-
-**Per-pane state:**
-```javascript
-const panes = new Map();
-// Each: { sessionId, pollTimerId, heartbeatTimerId, retryCount, mode: 'foreground'|'background' }
-```
-
-**Message protocol (main → worker):**
-- `{ type: 'start_poll', paneId, sessionId }` — begin polling for a pane
-- `{ type: 'stop_poll', paneId }` — stop polling on close
-- `{ type: 'visibility_change', hidden: bool }` — switch fg/bg mode
-
-**Message protocol (worker → main):**
-- `{ type: 'output', paneId, data }` — terminal output + flags
-- `{ type: 'session_ended', paneId, reason }` — 'exited' | 'auth_expired' | 'shutting_down'
-- `{ type: 'connection_status', paneId, status, attempt, maxAttempts }` — reconnecting/connected
-- `{ type: 'session_dead', paneId }` — retries exhausted
-
-**Retry strategy:** Capped exponential backoff with jitter
-- Base: 500ms, multiplier: 2x, max delay: 10s, max attempts: 5
-- Schedule: ~500ms → ~1s → ~2s → ~4s → ~8s (~15.5s total)
-- 403 (auth) and `exited` flag: no retry (permanent)
-- 404, 5xx, network error: full retry with backoff
-
-**Visibility modes:**
-- Foreground: output poll every 100ms, no heartbeat
-- Background: no output poll, heartbeat every 30s
-
-### 3. Modify `static/index.html`
-
-**Remove:**
-- `pollOutput(pane)` function (lines 704-738)
-- `setInterval(() => pollOutput(pane), 100)` (line 809)
-
-**Add:**
-- Worker init: `const pollWorker = new Worker('/static/poll-worker.js');`
-- `handleWorkerMessage(event)` — routes worker messages to xterm writes per pane
-- `visibilitychange` listener → sends `visibility_change` to worker
-- `pagehide` listener → `navigator.sendBeacon('/api/heartbeat', ...)` for all active panes
-
-**Modify:**
-- `createPane()`: replace `setInterval` with `pollWorker.postMessage({ type: 'start_poll', ... })`
-- `cleanupPane(pane)`: replace `clearInterval` with `pollWorker.postMessage({ type: 'stop_poll', ... })`
-- Remove `pollInterval` from pane object (no longer needed)
-
-### 4. New test: `tests/test_heartbeat.py`
-
-- Heartbeat with valid session returns 200, resets `last_poll_time`
-- Heartbeat with unknown session returns 404
-- Heartbeat does NOT drain output buffer (critical invariant)
-- Heartbeat returns and clears `timeout_warning` flag
-
-## Edge Cases Handled
-
-| Scenario | Behavior |
-|----------|----------|
-| Background tab | Worker switches to 30s heartbeat; resumes 100ms polling on return |
-| Laptop sleep (>5min) | Session expires server-side; on wake, retry exhaustion → "Connection lost" |
-| Backend restart/deploy | `shutting_down` flag warns client; retries handle brief downtime |
-| Auth expired (403) | No retry, immediate "refresh page" message |
-| Network blip | Backoff retries recover transparently |
-| Multiple panes | Independent per-pane state in Worker |
-| `pagehide` (tab close) | sendBeacon fires heartbeat as safety net before Worker dies |
-
-## Verification
-
-1. `uv run --with pytest pytest tests/test_heartbeat.py -v` — heartbeat tests pass
-2. `uv run --with pytest pytest tests/ -v` — all existing tests still pass
-3. Manual: open terminal, verify output works at 100ms (Network tab)
-4. Manual: background tab 30s → return → session alive, buffered output appears
-5. Manual: background tab >5min → return → clean "session expired" message
-6. Manual: check Network tab shows `/api/heartbeat` every ~30s when backgrounded
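
Reviewer note (not part of the deleted file): the retry schedule quoted in the removed plan (base 500ms, 2x multiplier, 10s cap, 5 attempts, about 15.5s total) can be checked with a few lines of arithmetic. The sketch below is in Python rather than the worker's JavaScript, purely to verify the numbers. The `backoff_delays` helper, its parameter names, and the symmetric ±20% jitter are assumptions for illustration; the plan only says "with jitter" and does not specify its shape.

```python
import random


def backoff_delays(base=0.5, multiplier=2.0, max_delay=10.0, max_attempts=5,
                   jitter=0.0, rng=random.random):
    """Return the wait in seconds before each retry attempt.

    Hypothetical helper: defaults follow the plan's stated values
    (500ms base, 2x multiplier, 10s cap, 5 attempts).
    """
    delays = []
    for attempt in range(max_attempts):
        # Exponential growth, capped at max_delay.
        delay = min(base * multiplier ** attempt, max_delay)
        # Symmetric jitter: scale by a factor in [1 - jitter, 1 + jitter].
        # This jitter shape is an assumption, not taken from the plan.
        delay *= 1 + jitter * (2 * rng() - 1)
        delays.append(delay)
    return delays


nominal = backoff_delays()             # jitter disabled
print(nominal, sum(nominal))           # [0.5, 1.0, 2.0, 4.0, 8.0] 15.5
jittered = backoff_delays(jitter=0.2)  # each value within 20% of nominal
```

With five attempts the 10s cap is never reached; it would only apply from a sixth attempt onward (16s capped to 10s).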