diff --git a/.agents/workflows/review-github-pr.md b/.agents/workflows/review-github-pr.md
new file mode 100644
index 00000000..3a874535
--- /dev/null
+++ b/.agents/workflows/review-github-pr.md
@@ -0,0 +1,216 @@
+---
+description: Review a GitHub Issue or PR for SharpAI/SwiftLM — fetch, analyze, implement fixes, address review comments, and push back to the correct branch
+---
+
+# Review GitHub Issue / PR
+
+This workflow guides end-to-end handling of a GitHub Issue or Pull Request for the
+`SharpAI/SwiftLM` repository: from fetching context, through implementing or
+reviewing code changes, to pushing a clean commit back to the correct fork branch.
+
+---
+
+## Prerequisites
+
+- `gh` CLI path on macOS: **`/opt/homebrew/bin/gh`**
+  ```bash
+  export PATH="/opt/homebrew/bin:$PATH"
+  which gh   # → /opt/homebrew/bin/gh
+  ```
+- `gh` must be authenticated (`gh auth status`)
+- Working directory: `/Users/simba/workspace/mlx-server`
+- Remote `fork` may need to be added if pushing to a contributor's fork:
+  ```bash
+  git remote add fork https://github.com/<username>/SwiftLM.git
+  ```
+
+---
+
+## Steps
+
+### 1. Fetch the Issue or PR
+
+Determine whether the user supplied an **Issue number** or a **PR number**, then
+pull the full context using `gh`:
+
+```bash
+# For a PR
+gh pr view <pr-number> --repo SharpAI/SwiftLM \
+  --json number,title,body,state,baseRefName,headRefName,headRepository,commits,files
+
+# For an Issue
+gh issue view <issue-number> --repo SharpAI/SwiftLM \
+  --json number,title,body,state,labels,comments
+```
+
+Note the **`headRepository`** field — if it is not `SharpAI/SwiftLM`, the PR comes
+from a fork. You must push back to the fork's branch (see Step 6).
+
+---
+
+### 2. Understand the Scope
+
+Read the PR/Issue body and associated comments carefully. Identify:
+
+- **Category** — bug fix, feature, test improvement, CI/CD, documentation.
+- **Files touched** — run `gh pr diff <pr-number> --repo SharpAI/SwiftLM` or read
+  the `files` field.
+- **CI status** — check the latest run:
+  ```bash
+  gh run list --repo SharpAI/SwiftLM --branch <branch> --limit 3
+  ```
+- **Review comments** — if Copilot or a human left inline review comments, read
+  them all before writing a single line of code:
+  ```bash
+  gh pr view <pr-number> --repo SharpAI/SwiftLM --comments
+  ```
+
+---
+
+### 3. Check Out the Branch Locally
+
+```bash
+# If the PR is from SharpAI directly
+git fetch origin
+git checkout <branch>
+
+# If the PR is from a fork
+git remote add fork https://github.com/<username>/SwiftLM.git   # once only
+git fetch fork
+git checkout -b <branch> fork/<branch>
+```
+
+Verify you are on the correct branch:
+```bash
+git status
+git log --oneline -5
+```
+
+---
+
+### 4. Triage Review Comments (for PRs)
+
+For each Copilot or human review comment:
+
+1. **Classify** the severity:
+   - 🔴 **Must fix** — correctness bugs, resource leaks, race conditions, broken CI.
+   - 🟡 **Should fix** — test coverage gaps, false-pass logic, missing imports.
+   - 🟢 **Optional** — style, wording, architecture refactors beyond the PR scope.
+
+2. **Implement** all 🔴 and 🟡 items. For 🟢 items, document them as follow-up
+   work in a code comment or GitHub comment but do not expand the PR scope.
+
+3. **Key patterns learned from SwiftLM history**:
+   - Shell scripts use `set -euo pipefail` — every `grep`, `jq`, or pipeline that
+     may produce no output **must** be guarded with `|| true` or placed inside an
+     `if` condition to prevent silent script abort.
+   - Heartbeat / background `Task` objects in Swift **must** be cancelled via
+     `defer { task?.cancel() }` so all exit paths (including client disconnect)
+     are covered — not just the happy path.
+   - CORS-related shell tests must target the dedicated `--cors` server instance,
+     not the main server started without the flag.
+   - Concurrent-request tests must use `--parallel N` (N ≥ 2) to actually exercise
+     parallel code paths.
+   - When adding new Swift test files that use `Data` / `JSONSerialization`,
+     always add `import Foundation` — XCTest does not re-export it in all SPM environments.
+
+---
+
+### 5. Verify Locally
+
+Build and run the relevant test suite before pushing:
+
+```bash
+# Swift unit tests
+swift test --filter SwiftLMTests
+
+# Integration tests (server)
+./tests/test-server.sh .build/release/SwiftLM 15413
+
+# OpenCode / SDK compatibility test
+./tests/test-opencode.sh .build/release/SwiftLM 15414
+```
+
+If CI previously failed with a specific test number, reproduce it locally first:
+```bash
+gh run view <run-id> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|Test [0-9]+"
+```
+
+---
+
+### 6. Commit and Push to the Correct Remote
+
+> [!IMPORTANT]
+> Always push to the **fork's branch** when updating a fork-originated PR.
+> Pushing to `origin` (SharpAI) creates a new branch and does NOT update the PR.
+
+```bash
+git add <files>
+git commit -m "<type>(<scope>): <summary>
+
+<details>"
+
+# PR from a fork → push to fork
+git push fork <local-branch>:<remote-branch>
+
+# PR from SharpAI directly → push to origin
+git push origin <branch>
+```
+
+Verify the PR was updated:
+```bash
+gh pr view <pr-number> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'
+```
+
+---
+
+### 7. Monitor CI
+
+After pushing, monitor the triggered workflow:
+
+```bash
+# List recent runs on the branch
+gh run list --repo SharpAI/SwiftLM --branch <branch> --limit 5
+
+# Stream logs for the latest run
+gh run view <run-id> --repo SharpAI/SwiftLM --log
+
+# Pull only failed steps
+gh run view <run-id> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|exit code"
+```
+
+If tests fail, go back to Step 4. Iterate until CI is green.
+
+---
+
+### 8. Respond to Reviewers (Optional)
+
+If a human or Copilot reviewer left inline comments that you have addressed,
+leave a reply comment summarising what was changed and why each item was handled
+(or deferred):
+
+```bash
+gh pr comment <pr-number> --repo SharpAI/SwiftLM \
+  --body "Addressed all 🔴/🟡 review comments in commit <sha>:
+- heartbeat leak: added defer cleanup in both streaming handlers
+- import Foundation: added to ServerSSETests.swift
+- CORS test: redirected to CORS_PORT server
+- parallel test: dedicated --parallel 2 server on PORT+3
+- set -e trap: guarded grep/jq pipelines with || true"
+```
+
+---
+
+## Quick Reference
+
+| Task | Command |
+|------|---------|
+| View PR | `gh pr view <pr-number> --repo SharpAI/SwiftLM` |
+| View PR diff | `gh pr diff <pr-number> --repo SharpAI/SwiftLM` |
+| View PR comments | `gh pr view <pr-number> --repo SharpAI/SwiftLM --comments` |
+| View Issue | `gh issue view <issue-number> --repo SharpAI/SwiftLM` |
+| List CI runs | `gh run list --repo SharpAI/SwiftLM --branch <branch>` |
+| Failed CI logs | `gh run view <run-id> --repo SharpAI/SwiftLM --log-failed` |
+| Push to fork | `git push fork <local-branch>:<remote-branch>` |
+| Push to SharpAI | `git push origin <branch>` |
+| Verify PR commits | `gh pr view <pr-number> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'` |
diff --git a/Sources/SwiftLM/Server.swift b/Sources/SwiftLM/Server.swift
index 00c9c850..07e1bea3 100644
--- a/Sources/SwiftLM/Server.swift
+++ b/Sources/SwiftLM/Server.swift
@@ -301,6 +301,22 @@ struct MLXServer: AsyncParsableCommand {
         // Resolve model directory for profiling (checks HuggingFace cache)
         let modelDirectory = resolveModelDirectory(modelId: modelId)
+        // ── Fix #72: Compute draft model footprint ONCE (Copilot review) ──────
+        // Resolved before the streamExperts block so the exact byte count can be
+        // reused for the early cap, both strategy branches, and logging without
+        // repeating the filesystem walk.
+        // Use weightFileSizeBytes (exact bytes) instead of
+        // weightMemoryGB * 1_073_741_824 to avoid the ~7% GiB/GB
+        // mismatch flagged in Copilot review (weightMemoryGB = bytes / 1e9, not /2^30).
+        let draftFootprintBytes: Int
+        if self.streamExperts,
+           let draftPath = self.draftModel,
+           let draftDir = resolveModelDirectory(modelId: draftPath),
+           let draftProfile = ModelProfiler.profile(modelDirectory: draftDir, modelId: draftPath) {
+            draftFootprintBytes = draftProfile.weightFileSizeBytes
+        } else {
+            draftFootprintBytes = 0
+        }
+
         if self.streamExperts, let modelDir = modelDirectory {
             setenv("EXPERIMENTAL_SSD_STREAM", modelDir.path, 1)
             // Activate the modern Swift ExpertStreamingConfig so Load.swift can:
@@ -314,6 +330,24 @@ struct MLXServer: AsyncParsableCommand {
             // Cap Metal command buffer size to avoid the 5s Apple GPU Watchdog.
             setenv("MLX_MAX_OPS_PER_BUFFER", "50", 1)
             print("[SwiftLM] Enabled Async SSD Streaming on directory: \(modelDir.lastPathComponent)")
+
+            // ── Fix #72: Apply SSD memory cap EARLY (before any model loads) ──
+            // Both the main model and draft model must load under the budget.
+            // The sentinel memoryLimit bypasses MLX eval_impl's spin-wait loop.
+            // Also address Copilot comment: apply the cap even when modelDirectory
+            // is nil (first-run download) so downloads also respect the budget.
+            let system = ModelProfiler.systemProfile()
+            if draftFootprintBytes > 0 {
+                print("[SwiftLM] 📦 Draft model footprint: \(String(format: "%.2f", Double(draftFootprintBytes) / 1e9))GB reserved from SSD budget")
+            }
+            Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
+            Memory.memoryLimit = 200 * 1024 * 1024 * 1024   // 200 GB sentinel
+        } else if self.streamExperts {
+            // modelDirectory is nil — model not yet downloaded (first-run).
+            // Still apply the SSD memory cap so the download itself is bounded.
+ let system = ModelProfiler.systemProfile() + Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes) + Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200 GB sentinel } var partitionPlan: PartitionPlan? @@ -338,7 +372,8 @@ struct MLXServer: AsyncParsableCommand { if self.streamExperts { // SSD Streaming: expert weights are mmap'd from SSD via the OS page cache. // No swap involved — the page cache evicts stale expert pages cleanly. - let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024) + // draftFootprintBytes pre-computed once above (Copilot review). + let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes) Memory.cacheLimit = physicalBudget Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)") @@ -349,7 +384,8 @@ struct MLXServer: AsyncParsableCommand { } case .layerPartitioned: if self.streamExperts { - let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024) + // draftFootprintBytes pre-computed once above (Copilot review). + let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes) Memory.cacheLimit = physicalBudget Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)") @@ -476,6 +512,11 @@ struct MLXServer: AsyncParsableCommand { } else { draftConfig = ModelConfiguration(id: draftModelPath) } + // Fix #72: mirror lazyLoad so the draft model's weights are mmap'd + // (not eagerly paged into unified RAM) when SSD streaming is active. 
+            if self.streamExperts {
+                draftConfig.lazyLoad = true
+            }
             let draftDownloader = HubDownloader(hub: HubApi(downloadBase: cacheRoot))
             let draftContainer = try await LLMModelFactory.shared.loadContainer(
                 from: draftDownloader,
@@ -833,6 +874,24 @@ struct ServerConfig: Sendable {
     let turboKV: Bool
 }
 
+// ── SSD Memory Budget ────────────────────────────────────────────────────────
+
+/// Compute the page-cache budget (bytes) for SSD streaming mode.
+///
+/// Formula: `totalRAM × 0.85 − osHeadroom − draftWeightBytes`, floored at 2 GiB.
+///
+/// - Parameters:
+///   - totalRAMBytes: Physical RAM reported by the OS (e.g. `system.totalRAMBytes`).
+///   - draftWeightBytes: Weight size (bytes) of the draft model, or 0 if none.
+///     Subtracted so the draft model's resident pages don't push the main model's
+///     page cache over the physical limit and trigger swap (Issue #72).
+/// - Returns: The recommended `Memory.cacheLimit` value in bytes.
+func computeSSDMemoryBudget(totalRAMBytes: UInt64, draftWeightBytes: Int = 0) -> Int {
+    let osHeadroom = 4 * 1024 * 1024 * 1024   // 4 GiB for OS + system processes
+    let raw = Int(Double(totalRAMBytes) * 0.85) - osHeadroom - draftWeightBytes
+    return max(raw, 2 * 1024 * 1024 * 1024)   // floor at 2 GiB
+}
+
 // ── Model Directory Resolution ───────────────────────────────────────────────
 
 /// Resolve a model ID to its local directory (if already downloaded).
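As a quick sanity check of the budget formula introduced in this diff, the arithmetic of `computeSSDMemoryBudget` can be mirrored in a small standalone bash sketch. This is illustrative only and not part of the PR: the shell function name and sample machine sizes are assumptions, and the `0.85` factor is approximated with integer math (`* 85 / 100`), which matches Swift's truncating `Int(Double * 0.85)` for the sizes shown.

```shell
#!/usr/bin/env bash
# Mirror of the computeSSDMemoryBudget formula from the diff above:
#   budget = totalRAM * 0.85 - 4 GiB OS headroom - draft weights, floored at 2 GiB.
compute_ssd_memory_budget() {
  local total_ram_bytes=$1
  local draft_weight_bytes=${2:-0}
  local headroom=$((4 * 1024 * 1024 * 1024))   # 4 GiB reserved for the OS
  local floor=$((2 * 1024 * 1024 * 1024))      # never go below 2 GiB
  local raw=$((total_ram_bytes * 85 / 100 - headroom - draft_weight_bytes))
  if (( raw > floor )); then echo "$raw"; else echo "$floor"; fi
}

# 128 GiB Mac, no draft model:
compute_ssd_memory_budget $((128 * 1024 * 1024 * 1024))            # → 112528143155
# 16 GiB machine with a 9 GB draft model bottoms out at the floor:
compute_ssd_memory_budget $((16 * 1024 * 1024 * 1024)) 9000000000  # → 2147483648
```

The second call shows why the floor matters: on a small machine, subtracting the OS headroom and a sizeable draft model would otherwise drive the page-cache budget negative.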