Merged
3 changes: 0 additions & 3 deletions .github/workflows/ci.yml
@@ -57,9 +57,6 @@ jobs:
      - name: Smoke test production CLI and MCP
        run: pnpm smoke

      - name: Verify generated dist is committed
        run: git diff --exit-code -- packages/ragmir-core/dist packages/ragmir-tts/dist

      - name: Verify npm package metadata
        run: pnpm package:check

3 changes: 3 additions & 0 deletions .gitignore
@@ -12,7 +12,10 @@ private/**
*.tgz
release-artifacts/
*.pid
*dist

packages/ragmir-core/dist/
packages/ragmir-tts/dist/
packages/ragmir-app/dist/
packages/ragmir-app/src-tauri/target/
packages/ragmir-app/src-tauri/gen/
10 changes: 6 additions & 4 deletions AGENTS.md
@@ -187,9 +187,10 @@
- `packages/ragmir-core/examples/library-api-demo` is the local library-API smoke (`pnpm example`). It
`import`s `@jcode.labs/ragmir` via Node self-referencing so it always exercises the local
`packages/ragmir-core/dist` build, never the npm-published package, and it reuses the
`sovereign-rag-demo` synthetic corpus rather than adding a second one. Testing local changes must
use this local build (or `node packages/ragmir-core/dist/cli.js`), not `npx ragmir`, which would
resolve the released npm version.
`sovereign-rag-demo` synthetic corpus rather than adding a second one. `dist/` is a gitignored build
output: build it first with `pnpm build` (or run `pnpm example`, which builds first), then run
`node packages/ragmir-core/dist/cli.js`. Never use `npx ragmir`, which would resolve the released npm
version.
- Use Context7 before changing dependencies or public APIs that rely on external libraries.
- Run `pnpm validate` before opening a release pull request or publishing. It covers
Biome, dependency security audit, TypeScript, Vitest, build output, production CLI/MCP smoke
@@ -234,7 +235,8 @@ General principles (KISS, DRY, YAGNI, SOLID) as applied in this codebase. Match
`openRowsTable`, `redactText`, `supportedExtensions`, `recordAccess`); extract instead of copying.
`embedText` delegating to `embedTexts` is the reference pattern.
- No dead or obsolete code. Delete replaced code, unused exports, and commented-out blocks in the
same change; a deletion must cover both source and the regenerated package `dist/`.
same change. `dist/` is gitignored build output: regenerate it locally with `pnpm build` before
running CLI/MCP smoke or the library-API demo, but do not commit it.
- No magic strings or numbers. Name meaningful literals as constants, and put shared paths, provider
defaults, and ignore constants in `packages/ragmir-core/src/defaults.ts` rather than copying them across
modules.
16 changes: 9 additions & 7 deletions CLAUDE.md
@@ -40,13 +40,15 @@ Run only the TTS package tests: `pnpm --filter @jcode.labs/ragmir-tts test`

Tests are colocated as `packages/*/src/*.test.ts` and run on the TypeScript sources.

## Committed `dist/` — critical

`packages/ragmir-core/dist/` and `packages/ragmir-tts/dist/` are checked into Git. CI enforces
`git diff --exit-code -- packages/ragmir-core/dist packages/ragmir-tts/dist`. After any change under
`packages/ragmir-core/src/` or `packages/ragmir-tts/src/`, run `pnpm build` and commit the regenerated
output in the same commit, or CI fails. This is the single easiest mistake to make in this repo.
`packages/ragmir-app/dist/` and `packages/ragmir-landing/dist/` are build artifacts and stay ignored.
## `dist/` is gitignored build output — critical

All `packages/*/dist/` directories (`ragmir-core`, `ragmir-tts`, `ragmir-app`, `ragmir-landing`,
`ragmir-license-webhook`) are gitignored build output and are NOT checked into Git. Build them locally
with `pnpm build` before running the CLI, MCP smoke, the library-API demo, or `pnpm validate`. CI
rebuilds `dist/` from source in the `Build` step before smoke tests, and the release pipeline rebuilds
it again (`scripts/semantic-release-prepare.mjs` runs `pnpm --filter @jcode.labs/ragmir build`) before
`pnpm pack`/`publish`, so the published npm tarball always contains freshly built output. Never commit
`dist/`; a clean clone has none until `pnpm build` runs.

## Naming map (the package has several names on purpose)

15 changes: 9 additions & 6 deletions README.md
@@ -994,8 +994,9 @@ retrieval mode, so they run without downloading an embedding or chat model, and
documents.

> Testing local changes: use the repository's own build, not `npx`. Inside this repo `npx ragmir`
> resolves to the **published** npm package, not your working copy — so it would not exercise your
> local edits. The examples below run the local `dist/` build instead.
> resolves to the **published** npm package, not your working copy—so it would not exercise your
> local edits. Build once with `pnpm build`, then run the examples against the local `dist/` build.
> (`dist/` is gitignored build output, so a clean clone has none until `pnpm build` runs.)

### CLI workspace (`sovereign-rag-demo`)

@@ -1048,16 +1049,18 @@ pnpm --filter @jcode.labs/ragmir build
pnpm --filter @jcode.labs/ragmir-tts build
```

`packages/ragmir-core/dist/` and `packages/ragmir-tts/dist/` are committed. `packages/ragmir-app/dist/`
and `packages/ragmir-landing/dist/` are ignored build artifacts. After changing TypeScript sources in
published packages, run:
All `packages/*/dist/` directories (`ragmir-core`, `ragmir-tts`, `ragmir-app`, `ragmir-landing`) are
gitignored build output — they are not checked into Git. After changing TypeScript sources in published
packages, build and validate locally:

```bash
pnpm build
pnpm validate
```

CI checks that generated `dist/` files match the source.
CI rebuilds `dist/` from source before smoke tests, and the release pipeline rebuilds it again before
`pnpm pack`/`publish`, so the published tarball always carries fresh output. A clean clone has no
`dist/` until `pnpm build` runs.

The root package is private and only orchestrates workspace tasks. npm publishing is handled by the
protected `Release npm` GitHub Actions workflow on `main`. semantic-release derives the version from
3 changes: 2 additions & 1 deletion packages/ragmir-core/examples/library-api-demo/README.md
@@ -25,7 +25,8 @@ That builds Ragmir Core, then runs the demo. It reuses the committed synthetic c
`sovereign-rag-demo`, so it needs no private documents and writes only to that example's gitignored
`.ragmir/storage`.

To run it directly against the already-built `dist/` without rebuilding:
To run it directly against the already-built `dist/` without rebuilding (run `pnpm build` first;
`dist/` is gitignored, so a clean clone has none):

```bash
node packages/ragmir-core/examples/library-api-demo/run.mjs
40 changes: 39 additions & 1 deletion packages/ragmir-core/src/access-log.test.ts
@@ -1,4 +1,4 @@
import { appendFile, mkdtemp, rm } from "node:fs/promises"
import { appendFile, mkdtemp, rm, stat, writeFile } from "node:fs/promises"
import os from "node:os"
import path from "node:path"
import { afterEach, describe, expect, it } from "vitest"
@@ -59,3 +59,41 @@ describe("accessLogUsageReport", () => {
)
})
})

describe("recordAccess retention", () => {
  it("trims the access log when it exceeds the size cap, keeping the most recent lines", async () => {
    const root = await mkdtemp(path.join(os.tmpdir(), "ragmir-access-log-trim-"))
    tempDirs.push(root)
    await initProject(root)
    const config = await loadConfig(root)

    // Pre-grow the log past the 10 MB cap with filler lines, then append one
    // real event. The assertions below check that the retention trim shrinks
    // the file below the cap; they do not inspect which lines survive.
    const fillerLine = '{"action":"ingest","timestamp":"2024-01-01T00:00:00.000Z"}'
    const fillerCount = Math.ceil((11 * 1024 * 1024) / (fillerLine.length + 1))
    const filler = `${Array(fillerCount).fill(fillerLine).join("\n")}\n`
    await writeFile(config.accessLogPath, filler, "utf8")
    const sizeBefore = (await stat(config.accessLogPath)).size
    expect(sizeBefore).toBeGreaterThan(10 * 1024 * 1024)

    await recordAccess(config, { action: "search", resultCount: 1 })

    const sizeAfter = (await stat(config.accessLogPath)).size
    expect(sizeAfter).toBeLessThan(sizeBefore)
    expect(sizeAfter).toBeLessThan(10 * 1024 * 1024)
  })

  it("does not write a log entry when access logging is disabled", async () => {
    const root = await mkdtemp(path.join(os.tmpdir(), "ragmir-access-log-disabled-"))
    tempDirs.push(root)
    await initProject(root)
    const config = await loadConfig(root)
    const disabledConfig = { ...config, accessLog: false }

    await recordAccess(disabledConfig, { action: "search", resultCount: 1 })

    // No log file should exist because the disabled path returns before any append.
    await expect(stat(config.accessLogPath)).rejects.toThrow("ENOENT")
  })
})
36 changes: 35 additions & 1 deletion packages/ragmir-core/src/access-log.ts
@@ -1,6 +1,6 @@
import { createHash } from "node:crypto"
import { existsSync } from "node:fs"
import { appendFile, mkdir, readFile } from "node:fs/promises"
import { appendFile, mkdir, readFile, stat, writeFile } from "node:fs/promises"
import path from "node:path"
import { loadConfig } from "./config.js"
import type {
@@ -36,6 +36,14 @@ const ACCESS_LOG_ACTIONS: AccessLogAction[] = [
const ACCESS_LOG_ACTION_SET = new Set<string>(ACCESS_LOG_ACTIONS)
const DEFAULT_USAGE_REPORT_DAYS = 7
const MILLISECONDS_PER_DAY = 24 * 60 * 60 * 1000
/**
 * Soft cap above which the access log is trimmed to its most recent lines.
 * Keeps the file bounded so long-lived installations (MCP server) do not grow
 * it without limit, and so usage reports do not load an unbounded file.
 */
const MAX_ACCESS_LOG_BYTES = 10 * 1024 * 1024
/** Number of most recent lines retained when the log exceeds the byte cap. */
const TRIMMED_ACCESS_LOG_LINES = 50_000

export async function recordAccess(config: Config, event: AccessLogEvent): Promise<void> {
  if (!config.accessLog) {
@@ -44,12 +52,38 @@ export async function recordAccess(config: Config, event: AccessLogEvent): Promi

  try {
    await mkdir(path.dirname(config.accessLogPath), { recursive: true })
    await trimAccessLogIfNeeded(config.accessLogPath)
    await appendFile(config.accessLogPath, `${JSON.stringify(toLogLine(event))}\n`, "utf8")
  } catch {
    // Access logging is best-effort so read-only workspaces do not block local use.
  }
}

async function trimAccessLogIfNeeded(accessLogPath: string): Promise<void> {
  let size = 0
  try {
    size = (await stat(accessLogPath)).size
  } catch (error) {
    if (isNodeError(error) && error.code === "ENOENT") {
      return
    }
    throw error
  }

  if (size <= MAX_ACCESS_LOG_BYTES) {
    return
  }

  const content = await readFile(accessLogPath, "utf8")
  const lines = content.split("\n").filter((line) => line.length > 0)
  const kept = lines.slice(Math.max(0, lines.length - TRIMMED_ACCESS_LOG_LINES))
  await writeFile(accessLogPath, `${kept.join("\n")}\n`, "utf8")
}

function isNodeError(error: unknown): error is NodeJS.ErrnoException {
  return error instanceof Error && "code" in error
}

export async function accessLogUsageReport(
options: AccessLogUsageOptions = {},
): Promise<AccessLogUsageReport> {
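Reviewer sketch (not part of the diff): the trim in `trimAccessLogIfNeeded` is a split, filter, slice, and join over the file content, so it can be illustrated as a pure function over a string. The names `keepRecentLines` and `averageLineBudget` are hypothetical and exist only for this illustration. The second helper does the arithmetic behind the two constants: it assumes nothing about what `toLogLine` emits, and only shows the average line length above which 50,000 retained lines would still exceed the 10 MB cap.

```typescript
// Hypothetical helper: mirrors the split/filter/slice/join steps of the trim,
// but operates on a string instead of a file so it can be tested in isolation.
function keepRecentLines(content: string, maxLines: number): string {
  // Drop empty segments (including the one after the trailing newline).
  const lines = content.split("\n").filter((line) => line.length > 0)
  // Keep at most the last `maxLines` entries; shorter inputs are kept whole.
  const kept = lines.slice(Math.max(0, lines.length - maxLines))
  return `${kept.join("\n")}\n`
}

// Hypothetical helper: average bytes per line (newline included) above which
// `maxLines` retained lines would still be larger than `maxBytes`.
function averageLineBudget(maxBytes: number, maxLines: number): number {
  return maxBytes / maxLines
}

const trimmed = keepRecentLines("a\nb\n\nc\n", 2)
const budget = averageLineBudget(10 * 1024 * 1024, 50_000)
```

With the constants in this diff the budget works out to roughly 209 bytes per line. If real log lines averaged more than that, a trimmed file would still sit above the cap and the next `recordAccess` call would read and rewrite it again; whether that can happen depends on the event payloads, which this diff does not show.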
121 changes: 121 additions & 0 deletions packages/ragmir-core/src/cli-options.ts
@@ -0,0 +1,121 @@
import { isTtsLanguage, TTS_LANGUAGES, type TtsLanguage } from "@jcode.labs/ragmir-tts"
import type { AgentInstallMode, AgentInstallScope } from "./skill.js"

/**
 * Pure option-parsing and validation helpers for the Ragmir CLI. Kept separate
 * from `cli.ts` (which wires Commander and side effects) so they can be unit
 * tested without spawning a process or importing commander.
 *
 * Each helper either returns the validated value or throws an Error with a
 * user-facing message; the CLI surfaces the message in red on stderr.
 */

export type AudioEngine = "auto" | "edge" | "transformers"

export interface AudioOptions {
  out?: string
  engine?: string
  lang?: string
  offline?: boolean
  allowRemoteModels?: boolean
}

/** Parse and validate a positive integer CLI argument. */
export function parsePositiveInt(value: string): number {
  const parsed = Number(value)
  // Use Number() (not parseInt) so fractional input like "1.5" is rejected
  // instead of silently truncating to 1, and so non-numeric strings become NaN.
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error("Expected a positive integer.")
  }
  return parsed
}

/** Parse and validate a finite number CLI argument. */
export function parseNumber(value: string): number {
  const parsed = Number.parseFloat(value)
  if (!Number.isFinite(parsed)) {
    throw new Error("Expected a number.")
  }
  return parsed
}

/** Parse and validate a recall threshold CLI argument in the inclusive range 0..1. */
export function parseRecallThreshold(value: string): number {
  const trimmed = value.trim()
  const parsed = Number(trimmed)
  if (trimmed.length === 0 || !Number.isFinite(parsed) || parsed < 0 || parsed > 1) {
    throw new Error("Expected a recall threshold between 0 and 1.")
  }
  return parsed
}

/**
 * Resolve the `allowRemoteModels` flag from audio options. Offline mode forces
 * remote model loading off; an explicit opt-in enables it; otherwise undefined
 * lets the TTS package apply its own offline-by-default behaviour.
 */
export function audioAllowRemoteModels(options: AudioOptions): boolean | undefined {
  if (options.offline) {
    return false
  }
  if (options.allowRemoteModels) {
    return true
  }
  return undefined
}

/**
 * Resolve the spoken language from `--lang`. Throws on an unsupported value so
 * the operator is told which languages are available.
 */
export function audioLanguage(options: AudioOptions): TtsLanguage | undefined {
  if (options.lang === undefined) {
    return undefined
  }
  if (isTtsLanguage(options.lang)) {
    return options.lang
  }
  throw new Error(`Expected --lang to be one of: ${TTS_LANGUAGES.join(", ")}.`)
}

/**
 * Resolve the TTS engine from audio options. Offline always forces the local
 * Transformers.js renderer. An MP3 output without an explicit `--engine edge`
 * is rejected as a confidentiality guard: MP3 requires the online Edge TTS
 * service, so the operator must opt in knowingly rather than leak narration
 * text by accident.
 */
export function audioEngine(options: AudioOptions): AudioEngine {
  if (options.offline) {
    return "transformers"
  }
  if (options.engine === undefined) {
    if (options.out?.toLowerCase().endsWith(".mp3")) {
      throw new Error(
        "MP3 output uses online Edge TTS. Re-run with `--engine edge` only when sending narration text to Edge TTS is acceptable.",
      )
    }
    return "transformers"
  }
  if (options.engine === "auto" || options.engine === "edge" || options.engine === "transformers") {
    return options.engine
  }
  throw new Error("Expected --engine to be auto, edge, or transformers.")
}

/** Parse and validate the `--scope` agent-install argument. */
export function parseAgentInstallScope(value: string | undefined): AgentInstallScope {
  if (value === "project" || value === "user") {
    return value
  }
  throw new Error("Expected --scope to be project or user.")
}

/** Parse and validate the `--mode` agent-install argument. */
export function parseAgentInstallMode(value: string | undefined): AgentInstallMode {
  if (value === "link" || value === "copy") {
    return value
  }
  throw new Error("Expected --mode to be link or copy.")
}
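Reviewer sketch (not part of the diff): the helpers in `cli-options.ts` mix `Number()` and `Number.parseFloat()`, and the two treat edge-case input differently. The hypothetical `compareParsers` function below is only an illustration of standard JavaScript behaviour; it is not taken from the repository.

```typescript
// Hypothetical comparison helper: runs one input through the three built-in
// parsers so their differences on edge cases are visible side by side.
interface ParserComparison {
  viaNumber: number
  viaParseFloat: number
  viaParseInt: number
}

function compareParsers(value: string): ParserComparison {
  return {
    viaNumber: Number(value), // whole-string conversion; trailing garbage gives NaN
    viaParseFloat: Number.parseFloat(value), // parses the longest numeric prefix
    viaParseInt: Number.parseInt(value, 10), // parses the integer prefix, base 10
  }
}

const trailing = compareParsers("12abc")
const fractional = compareParsers("1.5")
const empty = compareParsers("")
```

Two consequences follow from these semantics. `Number("")` is `0`, which is why `parseRecallThreshold` checks `trimmed.length === 0` explicitly before accepting the value. And because `parseNumber` uses `Number.parseFloat`, an input such as `"12abc"` would parse as `12` rather than being rejected; the diff does not say whether that leniency is intended.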