Merged
216 changes: 216 additions & 0 deletions .agents/workflows/review-github-pr.md
@@ -0,0 +1,216 @@
---
description: Review a GitHub Issue or PR for SharpAI/SwiftLM — fetch, analyze, implement fixes, address review comments, and push back to the correct branch
---

# Review GitHub Issue / PR

This workflow guides end-to-end handling of a GitHub Issue or Pull Request for the
`SharpAI/SwiftLM` repository: from fetching context, through implementing or
reviewing code changes, to pushing a clean commit back to the correct fork branch.

---

## Prerequisites

- `gh` CLI path on macOS: **`/opt/homebrew/bin/gh`**
```bash
export PATH="/opt/homebrew/bin:$PATH"
which gh # → /opt/homebrew/bin/gh
```
- `gh` must be authenticated (`gh auth status`)
- Working directory: `/Users/simba/workspace/mlx-server`
Comment on lines +15 to +21

Copilot AI Apr 23, 2026

This workflow hard-codes machine-specific paths (e.g. /opt/homebrew/bin/gh, /Users/simba/workspace/mlx-server). That makes the workflow non-portable for other contributors and CI environments. Consider replacing these with placeholders and/or environment-variable based guidance (e.g. "ensure gh is on PATH" and "run from your local repo checkout").

Suggested change:

````diff
-- `gh` CLI path on macOS: **`/opt/homebrew/bin/gh`**
-  ```bash
-  export PATH="/opt/homebrew/bin:$PATH"
-  which gh # → /opt/homebrew/bin/gh
-  ```
-- `gh` must be authenticated (`gh auth status`)
-- Working directory: `/Users/simba/workspace/mlx-server`
+- Ensure the `gh` CLI is installed and available on your `PATH`
+  ```bash
+  which gh
+  gh --version
+  ```
+- `gh` must be authenticated (`gh auth status`)
+- Run this workflow from your local checkout of the SharpAI/SwiftLM repository
````
- Remote `fork` may need to be added if pushing to a contributor's fork:
```bash
git remote add fork https://github.com/<contributor>/SwiftLM.git
```

---

## Steps

### 1. Fetch the Issue or PR

Determine whether the user supplied an **Issue number** or a **PR number**, then
pull the full context using `gh`:

```bash
# For a PR
gh pr view <NUMBER> --repo SharpAI/SwiftLM \
--json number,title,body,state,baseRefName,headRefName,headRepository,commits,files

# For an Issue
gh issue view <NUMBER> --repo SharpAI/SwiftLM \
--json number,title,body,state,labels,comments
```

Note the **`headRepository`** field — if it is not `SharpAI/SwiftLM`, the PR comes
from a fork. You must push back to the fork's branch (see Step 6).
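The fork check above can be sketched as a small shell helper. The `head_owner` value is hard-coded here for illustration; in a real run it would come from something like `gh pr view <NUMBER> --repo SharpAI/SwiftLM --json headRepositoryOwner --jq '.headRepositoryOwner.login'` (field name assumed from the `gh` JSON schema):

```bash
# Sketch: decide the push remote from the PR's head-repository owner.
# "some-contributor" is a placeholder standing in for the gh query above.
head_owner="some-contributor"
if [ "$head_owner" = "SharpAI" ]; then
  push_remote="origin"   # PR branch lives in the main repo
else
  push_remote="fork"     # PR branch lives in a contributor fork
fi
echo "push to: $push_remote"
```

With the placeholder owner, this prints `push to: fork`, which is the Step 6 branch of the decision.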

---

### 2. Understand the Scope

Read the PR/Issue body and associated comments carefully. Identify:

- **Category** — bug fix, feature, test improvement, CI/CD, documentation.
- **Files touched** — run `gh pr diff <NUMBER> --repo SharpAI/SwiftLM` or read
the `files` field.
- **CI status** — check the latest run:
```bash
gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 3
```
- **Review comments** — if Copilot or a human left inline review comments, read
them all before writing a single line of code:
```bash
gh pr view <NUMBER> --repo SharpAI/SwiftLM --comments
```

---

### 3. Check Out the Branch Locally

```bash
# If the PR is from SharpAI directly
git fetch origin
git checkout <headRefName>

# If the PR is from a fork
git remote add fork https://github.com/<forkOwner>/SwiftLM.git # once only
git fetch fork <headRefName>
git checkout -b <headRefName> fork/<headRefName>
```

Verify you are on the correct branch:
```bash
git status
git log --oneline -5
```

---

### 4. Triage Review Comments (for PRs)

For each Copilot or human review comment:

1. **Classify** the severity:
- 🔴 **Must fix** — correctness bugs, resource leaks, race conditions, broken CI.
- 🟡 **Should fix** — test coverage gaps, false-pass logic, missing imports.
- 🟢 **Optional** — style, wording, architecture refactors beyond the PR scope.

2. **Implement** all 🔴 and 🟡 items. For 🟢 items, document them as follow-up
work in a code comment or GitHub comment but do not expand the PR scope.

3. **Key patterns learned from SwiftLM history**:
- Shell scripts use `set -euo pipefail` — every `grep`, `jq`, or pipeline that
may produce no output **must** be guarded with `|| true` or placed inside an
`if` condition to prevent silent script abort.
- Heartbeat / background `Task` objects in Swift **must** be cancelled via
`defer { task?.cancel() }` so all exit paths (including client disconnect)
are covered — not just the happy path.
- CORS-related shell tests must target the dedicated `--cors` server instance,
not the main server started without the flag.
- Concurrent-request tests must use `--parallel N` (N ≥ 2) to actually exercise
parallel code paths.
- When adding new Swift test files that use `Data` / `JSONSerialization`,
always add `import Foundation` — XCTest does not re-export it in all SPM environments.
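The `set -euo pipefail` guard from the first bullet looks like this in practice (the log contents here are made up for illustration):

```bash
set -euo pipefail
log="Test 1 passed
Test 2 passed"
# Without '|| true', grep exits non-zero when there are zero matches,
# and 'set -e' silently aborts the whole script at this line.
fail_count=$(printf '%s\n' "$log" | grep -c "FAIL" || true)
echo "failures: $fail_count"
```

`grep -c` still prints `0` on no match, so `fail_count` is `0` and the script continues instead of dying.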

---

### 5. Verify Locally

Build and run the relevant test suite before pushing:

```bash
# Swift unit tests
swift test --filter SwiftLMTests

# Integration tests (server)
./tests/test-server.sh .build/release/SwiftLM 15413

# OpenCode / SDK compatibility test
./tests/test-opencode.sh .build/release/SwiftLM 15414
```

If CI previously failed with a specific test number, reproduce it locally first:
```bash
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|Test [0-9]+"
```

---

### 6. Commit and Push to the Correct Remote

> [!IMPORTANT]
> Always push to the **fork's branch** when updating a fork-originated PR.
> Pushing to `origin` (SharpAI) creates a new branch and does NOT update the PR.

```bash
git add <files>
git commit -m "<type>(<scope>): <summary>

<body: what changed and why>"

# PR from a fork β†’ push to fork
git push fork <headRefName>:<headRefName>

# PR from SharpAI directly β†’ push to origin
git push origin <headRefName>
```

Verify the PR was updated:
```bash
gh pr view <NUMBER> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'
```

---

### 7. Monitor CI

After pushing, monitor the triggered workflow:

```bash
# List recent runs on the branch
gh run list --repo SharpAI/SwiftLM --branch <headRefName> --limit 5

# Stream logs for the latest run
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log

# Pull only failed steps
gh run view <RUN_ID> --repo SharpAI/SwiftLM --log-failed 2>&1 | grep -E "FAIL|error|exit code"
```

If tests fail, go back to Step 4. Iterate until CI is green.

---

### 8. Respond to Reviewers (Optional)

If a human or Copilot reviewer left inline comments that you have addressed,
leave a reply comment summarising what was changed and why each item was handled
(or deferred):

```bash
gh pr comment <NUMBER> --repo SharpAI/SwiftLM \
--body "Addressed all 🔴/🟡 review comments in commit <SHA>:
- heartbeat leak: added defer cleanup in both streaming handlers
- import Foundation: added to ServerSSETests.swift
- CORS test: redirected to CORS_PORT server
- parallel test: dedicated --parallel 2 server on PORT+3
- set -e trap: guarded grep/jq pipelines with || true"
```

---

## Quick Reference

| Task | Command |
|------|---------|
| View PR | `gh pr view <N> --repo SharpAI/SwiftLM` |
| View PR diff | `gh pr diff <N> --repo SharpAI/SwiftLM` |
| View PR comments | `gh pr view <N> --repo SharpAI/SwiftLM --comments` |
| View Issue | `gh issue view <N> --repo SharpAI/SwiftLM` |
| List CI runs | `gh run list --repo SharpAI/SwiftLM --branch <branch>` |
| Failed CI logs | `gh run view <ID> --repo SharpAI/SwiftLM --log-failed` |
| Push to fork | `git push fork <branch>:<branch>` |
| Push to SharpAI | `git push origin <branch>` |
| Verify PR commits | `gh pr view <N> --repo SharpAI/SwiftLM --json commits --jq '.commits[].messageHeadline'` |
63 changes: 61 additions & 2 deletions Sources/SwiftLM/Server.swift
@@ -301,6 +301,22 @@ struct MLXServer: AsyncParsableCommand {
// Resolve model directory for profiling (checks HuggingFace cache)
let modelDirectory = resolveModelDirectory(modelId: modelId)

// ── Fix #72: Compute draft model footprint ONCE (Copilot review) ──────
// Resolved before the streamExperts block so the exact byte count can be
// reused for the early cap, both strategy branches, and logging without
// repeating the filesystem walk. Use weightFileSizeBytes (exact bytes)
// instead of weightMemoryGB * 1_073_741_824 to avoid the ~7% GiB/GB
// mismatch flagged in Copilot review (weightMemoryGB = bytes / 1e9, not /2^30).
let draftFootprintBytes: Int
if self.streamExperts,
let draftPath = self.draftModel,
let draftDir = resolveModelDirectory(modelId: draftPath),
let draftProfile = ModelProfiler.profile(modelDirectory: draftDir, modelId: draftPath) {
draftFootprintBytes = draftProfile.weightFileSizeBytes
} else {
draftFootprintBytes = 0
}

if self.streamExperts, let modelDir = modelDirectory {
setenv("EXPERIMENTAL_SSD_STREAM", modelDir.path, 1)
// Activate the modern Swift ExpertStreamingConfig so Load.swift can:
@@ -314,6 +330,24 @@
// Cap Metal command buffer size to avoid the 5s Apple GPU Watchdog.
setenv("MLX_MAX_OPS_PER_BUFFER", "50", 1)
print("[SwiftLM] Enabled Async SSD Streaming on directory: \(modelDir.lastPathComponent)")

// ── Fix #72: Apply SSD memory cap EARLY (before any model loads) ──
// Both the main model and draft model must load under the budget.
// The sentinel memoryLimit bypasses MLX eval_impl's spin-wait loop.
// Also address Copilot comment: apply the cap even when modelDirectory
// is nil (first-run download) so downloads also respect the budget.
let system = ModelProfiler.systemProfile()
if draftFootprintBytes > 0 {
print("[SwiftLM] 📦 Draft model footprint: \(String(format: "%.2f", Double(draftFootprintBytes) / 1e9))GB reserved from SSD budget")
}
Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200 GB sentinel
} else if self.streamExperts {
// modelDirectory is nil — model not yet downloaded (first-run).
// Still apply the SSD memory cap so the download itself is bounded.
let system = ModelProfiler.systemProfile()
Memory.cacheLimit = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200 GB sentinel
}

var partitionPlan: PartitionPlan?
Expand All @@ -338,7 +372,8 @@ struct MLXServer: AsyncParsableCommand {
if self.streamExperts {
// SSD Streaming: expert weights are mmap'd from SSD via the OS page cache.
// No swap involved — the page cache evicts stale expert pages cleanly.
-let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024)
+// draftFootprintBytes pre-computed once above (Copilot review).
+let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
Memory.cacheLimit = physicalBudget
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop
print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)")
@@ -349,7 +384,8 @@
}
case .layerPartitioned:
if self.streamExperts {
-let physicalBudget = Int(Double(system.totalRAMBytes) * 0.85) - (4 * 1024 * 1024 * 1024)
+// draftFootprintBytes pre-computed once above (Copilot review).
+let physicalBudget = computeSSDMemoryBudget(totalRAMBytes: system.totalRAMBytes, draftWeightBytes: draftFootprintBytes)
Memory.cacheLimit = physicalBudget
Memory.memoryLimit = 200 * 1024 * 1024 * 1024 // 200GB sentinel to bypass MLX eval_impl spin loop
print("[SwiftLM] 💾 Memory strategy: SSD STREAMING (page-cache managed, \(physicalBudget / (1024*1024*1024))GB RAM budget, no swap)")
@@ -476,6 +512,11 @@
} else {
draftConfig = ModelConfiguration(id: draftModelPath)
}
// Fix #72: mirror lazyLoad so the draft model's weights are mmap'd
// (not eagerly paged into unified RAM) when SSD streaming is active.
if self.streamExperts {
draftConfig.lazyLoad = true
}
let draftDownloader = HubDownloader(hub: HubApi(downloadBase: cacheRoot))
let draftContainer = try await LLMModelFactory.shared.loadContainer(
from: draftDownloader,
@@ -833,6 +874,24 @@ struct ServerConfig: Sendable {
let turboKV: Bool
}

// ── SSD Memory Budget ────────────────────────────────────────────────────────

/// Compute the page-cache budget (bytes) for SSD streaming mode.
///
/// Formula: `totalRAM × 0.85 − osHeadroom − draftWeightBytes`, floored at 2 GB.
///
/// - Parameters:
/// - totalRAMBytes: Physical RAM reported by the OS (e.g. `system.totalRAMBytes`).
/// - draftWeightBytes: Weight size (bytes) of the draft model, or 0 if none.
/// Subtracted so the draft model's resident pages don't push the main model's
/// page cache over the physical limit and trigger swap (Issue #72).
/// - Returns: The recommended `Memory.cacheLimit` value in bytes.
func computeSSDMemoryBudget(totalRAMBytes: UInt64, draftWeightBytes: Int = 0) -> Int {
let osHeadroom = 4 * 1024 * 1024 * 1024 // 4 GB for OS + system processes
let raw = Int(Double(totalRAMBytes) * 0.85) - osHeadroom - draftWeightBytes
return max(raw, 2 * 1024 * 1024 * 1024) // floor at 2 GB
}
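As a worked example of the budget formula (assumed inputs: 64 GB of physical RAM and an 8 GB draft model), the same arithmetic as a shell sketch:

```bash
gb=$((1024 * 1024 * 1024))
total=$((64 * gb))        # assumed physical RAM
draft=$((8 * gb))         # assumed draft model weight size
headroom=$((4 * gb))      # OS + system processes
floor=$((2 * gb))
raw=$((total * 85 / 100 - headroom - draft))
budget=$raw
if [ "$raw" -lt "$floor" ]; then budget=$floor; fi
echo "budget_gb: $((budget / gb))"   # 64*0.85 - 4 - 8 ≈ 42 GB
```

This mirrors `computeSSDMemoryBudget` above, except the Swift version operates on the exact byte count from `weightFileSizeBytes` rather than a round GB figure.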

// ── Model Directory Resolution ───────────────────────────────────────────────

/// Resolve a model ID to its local directory (if already downloaded).