Merged
11 changes: 8 additions & 3 deletions AGENTS.md
@@ -32,9 +32,9 @@
- Keep first-run UX centered on `mimir setup` for full onboarding and `mimir doctor --fix` for safe
repairs. `mimir init`, `mimir install-skill`, and `mimir ingest` remain available as explicit
lower-level commands.
- Keep monorepo source onboarding simple: `.mimir/sources.txt` accepts paths, glob patterns, and
`!` exclusions, and `mimir sources add/list` is the CLI surface for updating it without manual
editing.
- Keep monorepo source onboarding simple: the `sources` array in `.mimir/config.json` accepts paths,
glob patterns, and `!` exclusions. The legacy `.mimir/sources.txt` file (managed by `mimir sources
add/list`) is still read and merged when present, but `mimir init` no longer creates it.
- Keep product documentation canonical in the root `README.md`. Package README files under
`packages/*/README.md` are intentionally minimal npm entrypoints and must link clearly to the
GitHub root README because npm displays package README files separately.
@@ -53,6 +53,11 @@
under real Mimir domains, private documents, generated `.pid` files, committed secrets, internal
GTM/pricing ledgers, or wording that presents tracked MIT source as proprietary or closed source.
`pnpm public:smoke` enforces the cheap checks.
- The public-surface secret scanner (`scripts/public-surface-smoke.mjs`) runs over every tracked
file, tests included. Never write literal secret-shaped strings in source — PEM `PRIVATE KEY`
headers, `ghp_`/`github_pat_`/`sk_live_`/`sk_test_` tokens, or real checkout URLs. When a test
needs one to exercise redaction or skipping, build it at runtime from parts (e.g. interpolate the
`PRIVATE KEY` label from a variable) so no scannable literal is committed.
- Root `llms.txt` (the [llms.txt](https://llmstxt.org/) convention) and `context7.json` are the
LLM/Context7-facing doc index for this repository. Update `llms.txt` when adding or removing a
top-level `docs/*.md` file worth surfacing to agents, and keep `context7.json`'s
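
The scanner rule in the AGENTS.md change above can be followed with small fixture helpers. Below is a minimal TypeScript sketch. It assumes a test needs a PEM-shaped block and a GitHub-token-shaped string. The helpers `fakePemBlock` and `fakeGithubToken` are hypothetical and do not exist in the repository. The filler lengths are arbitrary and are not taken from the scanner's rules.

```typescript
// Hypothetical fixture helpers: each secret-shaped string is assembled at
// runtime from parts, so no scannable literal appears in the source text.

// The PEM label is joined from separate words instead of written verbatim.
const KEY_LABEL: string = ["PRIVATE", "KEY"].join(" ");

function fakePemBlock(body = "fake-key-body-for-tests"): string {
  const begin = `-----BEGIN ${KEY_LABEL}-----`;
  const end = `-----END ${KEY_LABEL}-----`;
  return [begin, body, end].join("\n");
}

// The token prefix is split so the literal prefix never occurs in source.
const TOKEN_PREFIX: string = "gh" + "p_";

function fakeGithubToken(): string {
  // Arbitrary filler: 36 characters after the 4-character prefix (40 total).
  return TOKEN_PREFIX + "x".repeat(36);
}
```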
94 changes: 80 additions & 14 deletions README.md
@@ -353,8 +353,7 @@ index rebuild when supported files are present and the privacy posture has no wa
Manual initialization is still available:

```plain text
.mimir/config.json # local config
.mimir/sources.txt # optional extra source paths
.mimir/config.json # local config (add extra paths to the "sources" array)
.mimir/raw/ # raw documents to ingest
.gitignore # ignores .mimir/
```
@@ -368,22 +367,28 @@ Put supported files under `.mimir/raw/`:
requirements.docx
```

For monorepos or downloaded local folders, list extra paths or glob patterns in `.mimir/sources.txt`.
Relative entries resolve from the Mimir project root, and `!` excludes matched files:
For monorepos or downloaded local folders, add extra paths or glob patterns to the `sources` array in
`.mimir/config.json`. Relative entries resolve from the Mimir project root, and `!` excludes matched files:

```json
{
"sources": [
"../apps/*/README.md",
"../apps/*/docs/**/*.{md,mdx}",
"../packages/*/architecture/**/*.md",
"!../apps/**/node_modules/**"
]
}
```

The legacy `.mimir/sources.txt` file (one entry per line) is still read when present and can be managed
from the CLI:

```bash
npx mimir sources add "../apps/*/README.md" "../apps/*/docs/**/*.{md,mdx}"
npx mimir sources add "!../apps/**/node_modules/**"
npx mimir sources list
```

```plain text
../apps/*/README.md
../apps/*/docs/**/*.{md,mdx}
../packages/*/architecture/**/*.md
!../apps/**/node_modules/**
```
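
Relative entries resolve from the Mimir project root, and a leading `!` marks an exclusion. Below is a minimal TypeScript sketch of that resolution step. It assumes POSIX-style paths and an absolute project root. `resolveSourceEntry` is a hypothetical helper, not Mimir's API. Glob matching itself is left to a glob library.

```typescript
import * as path from "node:path";

// Hypothetical shape for one parsed source entry.
interface SourceEntry {
  exclude: boolean; // true when the entry started with "!"
  pattern: string; // absolute path or glob, POSIX separators
}

// Resolves one entry against the project root. Glob characters such as "*"
// and "{" pass through path.posix.resolve unchanged because they are ordinary
// path characters; ".." segments are normalized away.
function resolveSourceEntry(projectRoot: string, entry: string): SourceEntry {
  const trimmed = entry.trim();
  const exclude = trimmed.startsWith("!");
  const raw = exclude ? trimmed.slice(1) : trimmed;
  // Absolute entries are kept as-is; relative ones are anchored at projectRoot.
  const pattern = path.posix.resolve(projectRoot, raw);
  return { exclude, pattern };
}
```

For example, with a project root of `/repo/site`, the entry `!../apps/**/node_modules/**` becomes an exclusion for `/repo/apps/**/node_modules/**`.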

### Team Workflow With A Shared Private Corpus

For a team of 10 developers, keep Git as the reproducible setup layer and keep the corpus in an
@@ -632,6 +637,7 @@ preload Transformers.js-compatible model files with non-sensitive text, then ren
npx mimir audio /tmp/MIMIR-SUMMARY-project.txt \
--engine transformers \
--offline \
--lang fr \
--model-path .mimir/models/tts \
--out .mimir/audio/project-summary.wav
```
@@ -645,8 +651,10 @@ npx mimir-tts render /tmp/MIMIR-SUMMARY-project.txt \
--out .mimir/audio/project-summary.mp3
```

The default standalone engine is `transformers`. The default Transformers.js model is
`Xenova/mms-tts-fra`. Override it with `--model` or `MIMIR_TTS_MODEL`.
The default standalone engine is `transformers` and the default language is `fr`. Pass
`--lang en|es|fr` (or `MIMIR_TTS_LANG`) to switch language: it selects the matching self-contained
offline model (`Xenova/mms-tts-eng`, `Xenova/mms-tts-spa`, or `Xenova/mms-tts-fra`) and, on the Edge
path, a native neural voice. Override the model directly with `--model` or `MIMIR_TTS_MODEL`.
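
Below is a minimal sketch of how the language and offline model could be resolved. Three points are assumptions and are not confirmed by the README:

- The precedence order is flag, then environment variable, then default.
- `resolveTts` is a hypothetical helper.
- Edge voice selection is left out because the voices are not named here.

```typescript
// Hypothetical resolver for the TTS language and offline model.
// Assumed precedence: CLI flag, then environment variable, then default.
type TtsLang = "en" | "es" | "fr";

// Language-to-model mapping as listed in the README.
const OFFLINE_MODELS: Record<TtsLang, string> = {
  en: "Xenova/mms-tts-eng",
  es: "Xenova/mms-tts-spa",
  fr: "Xenova/mms-tts-fra",
};

interface TtsFlags {
  lang?: string; // value passed to --lang
  model?: string; // value passed to --model
}

function resolveTts(
  flags: TtsFlags,
  env: Record<string, string | undefined>,
): { lang: TtsLang; model: string } {
  const requested = flags.lang ?? env["MIMIR_TTS_LANG"] ?? "fr";
  // Own-property check so inherited keys such as "toString" are rejected.
  if (!Object.prototype.hasOwnProperty.call(OFFLINE_MODELS, requested)) {
    throw new Error(`Unsupported TTS language "${requested}" (expected en, es, or fr)`);
  }
  const lang = requested as TtsLang;
  // An explicit model override bypasses the per-language default.
  const model = flags.model ?? env["MIMIR_TTS_MODEL"] ?? OFFLINE_MODELS[lang];
  return { lang, model };
}
```

With an empty environment, `resolveTts({ lang: "en" }, {})` selects `Xenova/mms-tts-eng`.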

See [`docs/offline-tts-preload.md`](./docs/offline-tts-preload.md) for the exact preload and
offline-check workflow.
@@ -787,6 +795,7 @@ Default `.mimir/config.json` for a fresh project:
"rawDir": ".mimir/raw",
"storageDir": ".mimir/storage",
"sourcesFile": ".mimir/sources.txt",
"sources": [],
"accessLogPath": ".mimir/access.log",
"embeddingModelPath": ".mimir/models",
"tableName": "chunks",
@@ -816,6 +825,63 @@ Default `.mimir/config.json` for a fresh project:
}
```

Every field, its default, and what it controls:

| Field | Default | Purpose |
| --- | --- | --- |
| `rawDir` | `.mimir/raw` | Local corpus folder, indexed recursively. The primary place to drop documents. |
| `sources` | `[]` | Extra file, directory, and glob paths (plus `!` exclusions) to index, resolved from the project root. See below. |
| `sourcesFile` | `.mimir/sources.txt` | Legacy one-path-per-line file; still read and merged with `sources` when present. |
| `storageDir` | `.mimir/storage` | LanceDB vector store location. |
| `accessLogPath` | `.mimir/access.log` | Query access log (stores hashes/metadata only). |
| `embeddingModelPath` | `.mimir/models` | Local cache for the Transformers.js embedding model. |
| `tableName` | `chunks` | LanceDB table name. |
| `embeddingProvider` | `local-hash` | `local-hash` (offline lexical, not semantic) or `transformers` (semantic). Switching requires `mimir ingest --rebuild`. |
| `embeddingModel` | `mixedbread-ai/mxbai-embed-xsmall-v1` | Model used when `embeddingProvider` is `transformers`. |
| `transformersAllowRemoteModels` | `false` | Allow downloading the embedding model at runtime. |
| `redaction.enabled` | `true` | Strip secrets/PII before anything is embedded. |
| `redaction.builtIn` | `true` | Apply the built-in secret/PII patterns. |
| `redaction.patterns` | `[]` | Extra `{ name, pattern, flags?, replacement? }` redaction rules. |
| `accessLog` | `true` | Record query metadata to `accessLogPath`. |
| `mcpMaxTopK` | `10` | Hard cap on results any MCP tool may return. |
| `topK` | `8` | Default number of passages returned by `search`/`ask`. |
| `chunkSize` | `1200` | Characters per chunk. |
| `chunkOverlap` | `200` | Overlapping characters between chunks (must be `< chunkSize`). |
| `maxFileBytes` | `50000000` | Skip files larger than this. |
| `ingestConcurrency` | `4` | Files processed in parallel during ingest. |
| `embeddingBatchSize` | `32` | Chunks embedded per batch. |
| `includeExtensions` | `[]` | Extra file extensions to treat as indexable text. |
| `pdfOcrCommand`, `imageOcrCommand`, `legacyWordCommand` | `[]` | Opt-in external extractors (see below). |
| `pdfOcrTimeoutMs`, `imageOcrTimeoutMs`, `legacyWordTimeoutMs` | `120000` | Timeouts for the external extractors. |
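
The `redaction.patterns` row gives the rule shape but not how rules are applied. Below is a minimal sketch. It assumes each `pattern` is a JavaScript regular-expression source and that every match is replaced. `applyRedactions` and the `[REDACTED:<name>]` fallback text are hypothetical.

```typescript
// Hypothetical application of `redaction.patterns` entries, using the
// { name, pattern, flags?, replacement? } shape from the table.
interface RedactionRule {
  name: string;
  pattern: string; // assumed: a JavaScript RegExp source string
  flags?: string;
  replacement?: string;
}

function applyRedactions(text: string, rules: RedactionRule[]): string {
  let out = text;
  for (const rule of rules) {
    // Force the "g" flag so every occurrence is replaced, not just the first.
    const flags = rule.flags ?? "";
    const re = new RegExp(rule.pattern, flags.includes("g") ? flags : flags + "g");
    // Invented fallback label when the rule has no explicit replacement.
    out = out.replace(re, rule.replacement ?? `[REDACTED:${rule.name}]`);
  }
  return out;
}
```

Per the `redaction.enabled` row, a step like this would run before any text is embedded.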

### Extra source paths (`sources`)

Mimir always indexes everything under `rawDir` (`.mimir/raw/`). To pull in files that live elsewhere —
sibling packages in a monorepo, a shared docs folder, a downloaded directory — add them straight to the
`sources` array in `.mimir/config.json`. No separate file is needed:

```json
{
"sources": [
"../packages/*/README.md",
"../docs",
"./NOTES.md",
"!../packages/**/node_modules/**"
]
}
```

Each entry is one of:

- a **file** or **directory** path — relative paths resolve from the project root; directories are indexed recursively;
- a **glob** pattern — any entry containing `*`, `?`, `[`, or `{`;
- an **exclusion** — starts with `!` and filters the glob matches.
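
The three entry kinds above can be told apart from the string alone. The exception is files versus directories, which still need a filesystem check. Below is a minimal sketch; `classifyEntry` is hypothetical.

```typescript
// Hypothetical classifier mirroring the three entry kinds listed above.
type EntryKind = "exclusion" | "glob" | "file-or-dir";

// Any of *, ?, [ or { makes an entry a glob pattern.
const GLOB_CHARS = /[*?\[{]/;

function classifyEntry(entry: string): EntryKind {
  // Checked first: an exclusion usually contains glob characters as well.
  if (entry.startsWith("!")) return "exclusion";
  if (GLOB_CHARS.test(entry)) return "glob";
  // A plain path; a stat call would decide between file and directory.
  return "file-or-dir";
}
```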

> **Legacy `sources.txt`.** Paths listed one per line in `.mimir/sources.txt` are still read when the
> file exists, and `mimir sources add` / `mimir sources list` continue to manage it. Entries from both
> the `sources` array and `sources.txt` are merged, so existing projects keep working unchanged. New
> projects should prefer the `sources` array — `mimir init` no longer creates a `sources.txt`.
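
Below is a minimal sketch of the merge described in the note above. `mergeSources` is hypothetical, and none of these details is confirmed by the README. It assumes:

- `sources.txt` holds one entry per line, and blank lines are ignored.
- Array entries come first.
- Exact duplicates are dropped.

```typescript
import { existsSync, readFileSync } from "node:fs";

// Hypothetical parser for the legacy file: one entry per line, surrounding
// whitespace trimmed, blank lines skipped, CRLF line endings tolerated.
function parseSourcesTxt(contents: string): string[] {
  return contents
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Hypothetical merge: `sources` array entries first, then legacy entries.
// A Set keeps first-seen order while dropping exact duplicates.
function mergeSources(configSources: string[], sourcesFilePath: string): string[] {
  const legacy = existsSync(sourcesFilePath)
    ? parseSourcesTxt(readFileSync(sourcesFilePath, "utf8"))
    : [];
  return Array.from(new Set([...configSources, ...legacy]));
}
```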

Environment overrides:

- `MIMIR_RAW_DIR`
3 changes: 2 additions & 1 deletion docs/cli-reference.md
@@ -11,7 +11,7 @@ Mimir ships two CLIs:
| --- | --- |
| `mimir setup` | Initialize Mimir, install the agent kit, run doctor, and ingest when safe. |
| `mimir setup --semantic` | Run first setup and explicitly download the configured Transformers.js embedding model for higher-quality semantic retrieval. |
| `mimir init` | Create `.mimir/config.json`, `.mimir/sources.txt`, `.mimir/raw/`, and Git ignore rules. |
| `mimir init` | Create `.mimir/config.json` (with a `sources` array), `.mimir/raw/`, and Git ignore rules. |
| `mimir doctor` | Diagnose setup, index freshness, security warnings, and the next command to run. |
| `mimir doctor --fix` | Create missing scaffolding, install skills/MCP config, and update stale indexes when safe. |
| `mimir models pull` | Download the configured Transformers.js embedding model into `embeddingModelPath`. |
@@ -78,6 +78,7 @@ Mimir ships two CLIs:
| `--offline` | `audio`, `mimir-tts render` | Disable remote model downloads and force the local Transformers.js path. |
| `--allow-remote-models` | `audio`, `mimir-tts render` | Explicitly allow model downloads for Transformers.js. |
| `--engine edge` | `audio`, `mimir-tts render` | Use online Edge TTS for MP3 output. |
| `--lang <en\|es\|fr>` | `audio`, `mimir-tts render` | Select the TTS language; picks the offline model and Edge voice. Default `fr`. |

See [`offline-tts-preload.md`](./offline-tts-preload.md) before using `--offline` on a fully
air-gapped machine.
20 changes: 17 additions & 3 deletions packages/mimir-core/dist/cli.js


2 changes: 1 addition & 1 deletion packages/mimir-core/dist/cli.js.map


2 changes: 1 addition & 1 deletion packages/mimir-core/dist/config.d.ts.map


6 changes: 3 additions & 3 deletions packages/mimir-core/dist/config.js
