Open-source local RAG library, CLI, and MCP server. Ragmir indexes your specs, docs, and code locally and gives your AI agents only the useful cited passages, over MCP, without burning tokens on your whole repo.
Build from your requirements, keep everything on your machine, and let Claude, Codex, Kimi, OpenCode, Cline, or any MCP client answer from your real sources. Ragmir installs into any Node.js repository, stores vectors locally with LanceDB, and runs fully offline by default, with built-in local-hash retrieval or optional Transformers.js semantic embeddings.
Ragmir Core returns cited retrieval context. Answer synthesis belongs to the AI agent, LLM, or local model runtime you choose around it, so every answer stays grounded in your real evidence.
Created by Jean-Baptiste Thery and published under the JCode Labs npm scope.
Ragmir is designed for agent-assisted development when the useful context is local, private, and spread across repositories, specifications, exports, and synced folders.
| Use case | What it enables |
|---|---|
| Index a repository's documentation | Ask Claude Code, Codex, Kimi Code CLI, OpenCode, Cline, or another agent to implement features from local README files, architecture notes, API contracts, ADRs, and runbooks. |
| Code from a specification or cahier des charges | Turn a local PRD, tender response, client brief, or engineering spec into an implementation plan, acceptance checklist, and cited change guidance. |
| Work from a downloaded Google Drive folder | Point Ragmir at files synced locally through Google Drive for desktop, then let the agent retrieve context without uploading the corpus to a hosted RAG service. |
| Onboard to a legacy codebase | Ask where a flow is implemented, which modules own a responsibility, which docs explain a behavior, and what to read before changing risky code. |
| Turn a dense document into a listenable mini-learning | Generate a short spoken summary (MP3/WAV) from cited passages with ragmir audio, to review a spec, architecture doc, or research pass hands-free instead of only reading dense text. |
| Keep multiple agents on the same evidence | Install the same project skills and MCP server for Claude Code, Codex, Kimi Code CLI, OpenCode, and Cline so each tool retrieves from the same local index. |
| Research before implementation | Run an audit-backed multi-query pass over specs, docs, and code references before asking an agent to plan a feature, migration, or review. |
| Prepare implementation and review work | Generate cited task breakdowns, migration notes, release checklists, QA plans, and code-review context from the same local sources the team uses. |
| Audit local knowledge coverage | Check which supported files were indexed, which formats were skipped, whether secrets are likely present, and whether golden queries still retrieve expected evidence. |
The workflow stays simple: keep files on disk, run ragmir ingest, connect your coding agent through
MCP or portable skills, then ask it to work from cited local passages.
Ragmir is the local evidence layer for AI agents: put documents in a repository, index them locally, then let your CLI, MCP-compatible agent, or bundled skills retrieve cited passages without uploading the corpus to a hosted RAG service.
flowchart TD
subgraph Workspace["Your repository"]
Docs["Local files<br/>docs, specs, code, PDFs"]
Config[".ragmir/config.json<br/>.ragmir/raw/"]
Index[".ragmir/storage<br/>local LanceDB index"]
end
subgraph Ragmir["Ragmir Core"]
Ingest["ragmir ingest<br/>parse, redact, chunk"]
Retrieve["ragmir search / ask / research<br/>rank cited evidence"]
Audit["doctor, audit,<br/>security-audit, evaluate"]
end
subgraph Agents["Developer tools"]
CLI["Terminal"]
MCP["MCP server"]
Skills["Portable agent skills"]
LLM["Claude, Codex,<br/>or your trusted model"]
end
Docs --> Ingest
Config --> Ingest
Ingest --> Index
Index --> Retrieve
Index --> Audit
Retrieve --> CLI
Retrieve --> MCP
Skills --> MCP
MCP --> LLM
The fastest useful path is to install Ragmir in the repository, wire it into the coding agent you already use, then ask that agent questions grounded in local files:
npm install --save-dev @jcode.labs/ragmir
npx ragmir setup
# Optional: download a Transformers.js embedding model once and enable higher-quality semantic retrieval.
npx ragmir setup --semantic
npx ragmir install-agent --agents claude,codex,kimi,opencode,cline
npx ragmir doctor --fix
npx ragmir research "release readiness and risks" --compact
# Claude Code
claude mcp add-json --scope local ragmir "$(cat .ragmir/claude-mcp-server.json)"
# Codex
cat .ragmir/codex-mcp.toml
# Kimi Code CLI
kimi --mcp-config-file .ragmir/kimi-mcp.json
# OpenCode
cat .ragmir/opencode.jsonc
# Cline
cat .ragmir/cline-mcp.json
Use it when an agent needs grounded context over private specs, codebases, legal dossiers, tenders, course material, project archives, or meeting notes, but the files should remain on your machine.
This root README is the canonical product documentation for the public npm packages.
| Package | Role |
|---|---|
| @jcode.labs/ragmir | Ragmir Core: CLI, library, MCP server, bundled agent skills, and synthetic examples. |
| @jcode.labs/ragmir-tts | Ragmir add-on for Edge-quality MP3 and offline Transformers.js WAV rendering through ragmir audio. |
| @jcode.labs/ragmir-ui | Unpublished workspace UI package adapted from the WorkoutGen design foundation for Ragmir surfaces. |
| @jcode.labs/ragmir-landing | Unpublished Astro static landing package. Product-facing titles stay Ragmir. |
| @jcode.labs/ragmir-app | Unpublished Tauri desktop/mobile shell package. Native builds are explicit app commands. Core integration uses a bounded native command around the ragmir CLI, with packaged sidecar distribution still planned. |
| @jcode.labs/ragmir-license-webhook | Unpublished, undeployed MIT-licensed Cloudflare Worker handler for future Lemon Squeezy webhooks and local RAGMIR1 license issuance. |
The package README files are intentionally short because npm displays each package README separately. They point npm readers back to this GitHub documentation.
The product name visible to users is Ragmir. The technical core package is Ragmir Core and now
lives under packages/ragmir-core; the public npm package name remains @jcode.labs/ragmir.
The public source and commercial distribution boundary is tracked in
docs/source-boundary.md and
docs/commercial-distribution.md. No checkout URL, production
download URL, customer data, or license secret is committed to this repository.
Use this README as the entrypoint, then jump into the focused docs when you need command tables, agent wiring, API shapes, security details, or app packaging rules:
| Document | Use it for |
|---|---|
| docs/cli-reference.md | Complete ragmir and ragmir-tts command reference. |
| docs/api-reference.md | Public TypeScript API, setup options, semantic model preload, and MCP tool inputs. |
| docs/agent-integration.md | Claude Code, Codex, Kimi Code CLI, OpenCode, and Cline setup. |
| docs/troubleshooting.md | Empty indexes, weak search, strict security audit warnings, and audio preload fixes. |
| SECURITY-HARDENING.md | Threat model, offline operation, release verification, and higher-assurance deployment notes. |
| docs/offline-tts-preload.md | Preload and verify the offline Transformers.js TTS cache. |
| docs/fr-eu-sovereign-positioning.md | Bounded FR/EU sovereignty, GDPR, AI Act, and legal-vertical positioning. |
| docs/source-boundary.md | What the public MIT repository contains and what must stay outside Git. |
| docs/commercial-distribution.md | Public-safe commercial distribution rules for signed builds, licenses, and support. |
| docs/app-sidecar-architecture.md | Desktop app sidecar and native bridge constraints. |
| docs/app-distribution.md | Direct-download native app packaging and release preflight. |
| docs/payment-webhook-architecture.md | Future checkout, webhook, and local-license architecture. |
| llms.txt | LLM-oriented documentation index for tools such as Context7. |
Ragmir is a public open-source project under the MIT License. It is designed to be inspectable, forkable, and usable without a JCode Labs account.
Every tracked package in this repository is visible source. Commercial Ragmir app distribution can gate official signed builds, support, updates, and hosted license delivery, but it does not make the tracked Tauri app or webhook source proprietary.
Contributions are welcome through pull requests. Start with CONTRIBUTING.md.
Security reports should stay private and follow SECURITY.md.
Ragmir stays MIT open source. Sponsorship helps fund maintenance, issue triage, documentation, and practical agent-workflow improvements.
Sponsor the project through GitHub Sponsors.
Suggested GitHub Sponsors tiers:
- EUR 5/month: support the project.
- EUR 15/month: active sponsor.
- EUR 49/month: priority on issues and questions.
- EUR 199/month: company sponsor and light advisory support.
Early public package. APIs may evolve before 1.0.0.
Ragmir Core is the open-source product you can use today through the CLI, library, MCP server, and portable agent skills.
A cross-platform Ragmir desktop/mobile client is being developed in packages/ragmir-app. Its goal is
to make local confidential workspaces easier for non-CLI workflows: register a local dossier, run
setup and ingest, ask questions with cited local passages, inspect privacy posture, and preload
embedding models explicitly. Google Drive support is implemented as an opt-in local-sync folder flow
over files already present on disk, not as a default cloud API integration.
The native client is not released, signed, or commercially distributed yet. There is no checkout, waitlist, or hosted account flow in this repository. When released, it is planned for direct downloads and sideloadable installers, not App Store or Play Store distribution.
The canonical landing and future direct-download release URL is
ragmir.jcode.works. It is prepared as a Cloudflare Workers Static Assets
site, but public deployment remains a separate release action.
- Build a local RAG knowledge base inside any repository.
- Analyze confidential datasets while keeping raw files and generated indexes local.
- Give Claude, Codex, Kimi, OpenCode, Cline, internal assistants, or other MCP-compatible tools the same private retrieval layer.
- Retrieve grounded local evidence through CLI, library calls, MCP tools, or bundled agent skills.
- Optionally create listenable MP3/WAV summaries or cited Markdown reports with bundled skills.
- Prepare legal-dossier summaries, chronologies, clause reviews, and professional-review handoffs with the optional bundled legal skill.
Ragmir is not a hosted SaaS, not a remote vector database, and not a certified high-assurance system. For regulated or state-grade environments, pair it with encrypted disks, controlled machines, release verification, and an external security review.
- Node.js 20 or newer.
- pnpm, npm, yarn, or bun.
- A repository where generated local folders can be ignored by Git.
- No model runtime is required for the default embeddingProvider: "local-hash" mode.
- Optional semantic embeddings use Transformers.js with local model files under .ragmir/models by default. Use ragmir models pull when remote model download is acceptable, then keep transformersAllowRemoteModels false for confidential indexing.
- Generated answers are intentionally outside Ragmir core. Use Claude, Codex, OpenAI, a local model MCP server, or another trusted model runtime to synthesize from Ragmir's cited context.
- Optional audio summaries use @jcode.labs/ragmir-tts. For highest-quality MP3, install the external edge-tts CLI and render with --engine edge. For confidential or air-gapped content, use the Transformers.js WAV path with --engine transformers --offline; it does not require Python, ffmpeg, Piper, XTTS, or a local server.
- Optional Markdown reports use the bundled ragmir-markdown-report skill and should stay under ignored .ragmir/reports/ unless explicitly sanitized for sharing.
The package is public. Users do not need a JCode Labs account or npm token to install it.
With npm:
npm install --save-dev @jcode.labs/ragmir
With pnpm:
pnpm add -D @jcode.labs/ragmir
Install the standalone TTS package only when you want to use it directly:
npm install --save-dev @jcode.labs/ragmir-tts
Maintainer tokens are only needed to publish new versions.
Initialize a repository, install the portable agent kit, run readiness checks, and ingest documents when supported files are already present:
# Fast start: no model download, fully local lexical/hash retrieval.
npx ragmir setup
# Higher-quality natural-language retrieval: one-time Transformers.js model download,
# then remote model loading stays disabled for normal confidential indexing.
npx ragmir setup --semantic
Fresh setup keeps local state under one ignored .ragmir/ folder:
.ragmir/config.json # local config
.ragmir/sources.txt # optional extra source paths
.ragmir/raw/ # raw documents to ingest
.ragmir/storage/ # generated LanceDB index after ingest
.ragmir/access.log # metadata-only access log after use
.ragmir/skills/ragmir/SKILL.md # portable agent skill
.ragmir/skills/ragmir-audio-summary/SKILL.md
.ragmir/skills/ragmir-markdown-report/SKILL.md
.ragmir/skills/ragmir-legal-dossier/SKILL.md
.ragmir/mcp.json # generic MCP server config snippet
.ragmir/claude-mcp-server.json # Claude Code add-json payload
.ragmir/codex-mcp.toml # Codex config.toml snippet with MCP and skills.config
.ragmir/kimi-mcp.json # Kimi Code CLI MCP config
.ragmir/opencode.jsonc # OpenCode config snippet
.ragmir/cline-mcp.json # Cline MCP config
.ragmir/agent-setup.md # agent-specific setup guide
.gitignore # ignores .ragmir/
It detects the repository package manager and writes the MCP helper files with the right command:
npx ragmir serve-mcp, pnpm exec ragmir serve-mcp, yarn exec ragmir serve-mcp, or bunx ragmir serve-mcp.
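The mapping from package manager to launch command can be pictured with the sketch below. It is illustrative only: detecting the package manager from lockfile names is an assumption about how such detection commonly works, and `detectPackageManager` and `mcpCommandFor` are hypothetical helper names, not part of Ragmir's API.

```typescript
// Hypothetical sketch: helper names and the lockfile heuristic are assumptions.
type PackageManager = "npm" | "pnpm" | "yarn" | "bun";

// Guess the package manager from the file names found at the repository root.
function detectPackageManager(rootFiles: readonly string[]): PackageManager {
  const files = new Set(rootFiles);
  if (files.has("pnpm-lock.yaml")) return "pnpm";
  if (files.has("yarn.lock")) return "yarn";
  if (files.has("bun.lockb") || files.has("bun.lock")) return "bun";
  return "npm"; // package-lock.json, or no lockfile at all
}

// The four launch commands listed in this README.
function mcpCommandFor(pm: PackageManager): string {
  switch (pm) {
    case "pnpm":
      return "pnpm exec ragmir serve-mcp";
    case "yarn":
      return "yarn exec ragmir serve-mcp";
    case "bun":
      return "bunx ragmir serve-mcp";
    case "npm":
      return "npx ragmir serve-mcp";
  }
}
```

Falling back to npm when no lockfile is found keeps the sketch total; the real setup command may decide differently.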
When a repository needs a wrapper script or only a subset of agent helpers, make that explicit during
setup:
npx ragmir setup --agents claude,codex --mcp-name project-docs --mcp-command ./scripts/serve-mcp.sh
For the usual agent-first workflow, expose Ragmir to the coding assistants used in the repository:
npx ragmir install-agent --agents claude,codex,kimi,opencode,cline
Then wire the agent you use. Claude Code, Codex, and Cline follow the standard MCP shapes from their
public docs; Kimi and OpenCode use the generated helper files that Ragmir writes under .ragmir/.
# Claude Code: registers the local MCP server for this repository.
claude mcp add-json --scope local ragmir "$(cat .ragmir/claude-mcp-server.json)"
# Codex: review and merge the generated MCP and skills config.
cat .ragmir/codex-mcp.toml
# Kimi Code CLI: launch Kimi with the generated Ragmir MCP config.
kimi --mcp-config-file .ragmir/kimi-mcp.json
# OpenCode: review and merge the generated OpenCode JSONC snippet.
cat .ragmir/opencode.jsonc
# Cline: add the generated JSON under Cline's mcpServers configuration.
cat .ragmir/cline-mcp.json
From the agent, ask naturally, for example: "Use Ragmir to find what this repository says about deployment." The agent calls the MCP tools and uses the bundled skills to work with cited local context.
Check readiness at any time:
npx ragmir doctor
If indexed files are missing or stale, or the setup is incomplete, run:
npx ragmir doctor --fix
doctor --fix performs safe repairs: missing scaffolding, Git ignore entries, agent kit install, and
index rebuild when supported files are present and the privacy posture has no warnings.
Manual initialization is still available:
.ragmir/config.json # local config (add extra paths to the "sources" array)
.ragmir/raw/ # raw documents to ingest
.gitignore # ignores .ragmir/
Put supported files under .ragmir/raw/:
.ragmir/raw/
policy.md
meeting-notes.pdf
requirements.docx
For monorepos or downloaded local folders, add extra paths or glob patterns to the sources array in
.ragmir/config.json. Relative entries resolve from the Ragmir project root, and ! excludes matched files:
{
"sources": [
"../apps/*/README.md",
"../apps/*/docs/**/*.{md,mdx}",
"../packages/*/architecture/**/*.md",
"!../apps/**/node_modules/**"
]
}
The legacy .ragmir/sources.txt file (one entry per line) is still read when present and can be managed
from the CLI:
npx ragmir sources add "../apps/*/README.md" "../apps/*/docs/**/*.{md,mdx}"
npx ragmir sources list
For a team of 10 developers, keep Git as the reproducible setup layer and keep the corpus in an approved private source. Each developer materializes the same corpus locally, then builds their own local Ragmir index.
Git repository
README.md
ragmir.config.example.json
ragmir-sources.example.txt
scripts/sync-corpus.sh
Ignored local state on each developer machine
.ragmir/config.json
.ragmir/sources.txt
.ragmir/raw/ or data/private-corpus/
.ragmir/storage/
.ragmir/access.log
.ragmir/models/
If your team uses Google Drive, Dropbox, SharePoint, S3, rsync, an encrypted ZIP, or another private source, write a small project script that syncs into an ignored local folder and then ingests:
#!/usr/bin/env bash
set -euo pipefail
mkdir -p .ragmir/raw
# Example only: replace this with your approved private sync command.
# rclone copy "team-drive:Project Knowledge" .ragmir/raw --drive-export-formats docx,xlsx,pptx,pdf
npx ragmir ingest
npx ragmir doctor
Commit the script and instructions, not the synced files. The same pattern works without Google
Drive: every developer downloads the same approved archive or mirror into the same ignored path, then
runs npx ragmir ingest. Ragmir compares checksums and reuses unchanged rows, so refreshes stay
incremental.
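The checksum comparison behind an incremental refresh can be sketched as follows. This is a minimal illustration, not Ragmir's implementation: SHA-256 as the checksum, the `ChecksumMap` type, and the `planIngest` helper are all assumptions introduced here.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: Ragmir's real checksum algorithm and row layout may differ.
type ChecksumMap = Map<string, string>; // file path -> content checksum

// SHA-256 is an assumption; any stable content hash works for change detection.
function checksum(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

interface IngestPlan {
  reuse: string[]; // unchanged files: keep the rows already stored
  reindex: string[]; // new or modified files: parse, chunk, and embed again
  remove: string[]; // files that disappeared from the corpus
}

function planIngest(previous: ChecksumMap, current: ChecksumMap): IngestPlan {
  const plan: IngestPlan = { reuse: [], reindex: [], remove: [] };
  for (const [path, sum] of current) {
    (previous.get(path) === sum ? plan.reuse : plan.reindex).push(path);
  }
  for (const path of previous.keys()) {
    if (!current.has(path)) plan.remove.push(path);
  }
  return plan;
}
```

Only the files in `reindex` would need to be parsed and embedded again, which is why a refresh over a mostly unchanged corpus stays cheap.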
Build the local index:
npx ragmir ingest
npx ragmir doctor
When the index is ready, ragmir doctor prints ready=true. ragmir ingest and ragmir audit also report
files that were discovered but not indexed because the type is unsupported, the file is too large,
or the file name looks like a secret/private key.
List skipped paths explicitly:
npx ragmir audit --unsupported
Summarize recent metadata-only usage without exposing raw queries or local paths:
npx ragmir usage-report --days 7
Retrieve exact passages:
npx ragmir search "approval for offline operation"
Return cited retrieval context for an agent or model:
npx ragmir ask "What evidence supports offline operation?"
Run an audit-backed multi-query research pass before a broad synthesis or implementation task:
npx ragmir research "release readiness and risks" --compact
Measure recall against a golden query file:
npx ragmir evaluate --golden golden-queries.json
For private dogfooding, keep the real corpus and golden query file outside Git or under an ignored local path, then use a threshold that matches the evaluation phase:
npx ragmir --project-root /path/to/workspace ingest
npx ragmir --project-root /path/to/workspace evaluate --golden .ragmir/evaluations/golden-queries.json --fail-under 0.8 --json
The JSON report includes the active embeddingProvider and embeddingModel, so you can compare
default local-hash recall with a private Transformers semantic run without storing the report in Git.
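To make the recall gate concrete, here is a small sketch of how a recall score and a --fail-under threshold could interact. The `GoldenCase` shape and the hit definition (at least one expected source retrieved) are assumptions for illustration; the actual golden file format and metric are defined by ragmir evaluate and documented in docs/cli-reference.md.

```typescript
// Hypothetical sketch: the golden-case shape and the recall definition are assumptions.
interface GoldenCase {
  query: string;
  expectedSources: string[]; // sources that should be retrieved for this query
  retrievedSources: string[]; // sources actually returned in the top-K results
}

// A case counts as a hit when at least one expected source was retrieved.
function recall(cases: readonly GoldenCase[]): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) =>
    c.expectedSources.some((source) => c.retrievedSources.includes(source)),
  ).length;
  return hits / cases.length;
}

// Mirrors the idea of --fail-under: the gate fails when recall drops below the threshold.
function passesGate(recallValue: number, failUnder: number): boolean {
  return recallValue >= failUnder;
}
```

With four golden queries and three hits, recall is 0.75, which would fail a 0.8 gate; fixing one more query would pass it.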
Ragmir does not synthesize an LLM answer. It returns cited local passages; your chosen agent or model does the writing around those passages.
With pnpm, use pnpm exec after installing the package:
pnpm exec ragmir setup
pnpm exec ragmir doctor
pnpm exec ragmir search "approval for offline operation"
Ragmir has two embedding modes.
Use this when you want a fully local, no-model smoke test or a dependency-light setup. Retrieval is lexical/hash-based, not semantic.
.ragmir/config.json:
{
"embeddingProvider": "local-hash"
}
Commands:
npx ragmir ingest
npx ragmir search "offline retrieval approval"
npx ragmir ask "What evidence supports offline operation?"
ragmir ask always returns cited retrieved passages instead of a generated synthesis. You can pass those
passages to any LLM or agent you trust.
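To see why hash-based retrieval is lexical and not semantic, consider the general hashing-trick technique sketched below. It is not Ragmir's local-hash implementation: the 256-dimension vector, the ASCII tokenizer, and the FNV-1a-style hash are assumptions chosen to keep the example short.

```typescript
// Hypothetical sketch of the hashing trick; not Ragmir's actual local-hash code.
function fnv1a(token: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Map each token to a bucket, count occurrences, then L2-normalize.
function hashEmbed(text: string, dims = 256): number[] {
  const vector = new Array<number>(dims).fill(0);
  const tokens = text.toLowerCase().match(/[a-z0-9]+/g) ?? []; // ASCII-only for brevity
  for (const token of tokens) vector[fnv1a(token) % dims] += 1;
  const norm = Math.hypot(...vector);
  return norm === 0 ? vector : vector.map((v) => v / norm);
}

// For unit-length vectors the dot product equals the cosine similarity.
function cosine(a: readonly number[], b: readonly number[]): number {
  return a.reduce((sum, value, i) => sum + value * b[i], 0);
}
```

Two texts score above zero only when they share tokens (or collide in a bucket), so a synonym such as "authorization" for "approval" contributes nothing. That gap is what the semantic mode is meant to close.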
Use this when you want better semantic retrieval while keeping Ragmir core free of an LLM server.
.ragmir/config.json:
{
"embeddingProvider": "transformers",
"embeddingModel": "mixedbread-ai/mxbai-embed-xsmall-v1",
"embeddingModelPath": ".ragmir/models",
"transformersAllowRemoteModels": false
}
Commands:
npx ragmir setup --semantic
# Or later:
npx ragmir models pull --enable
npx ragmir ingest
npx ragmir ask "Which passages support offline operation?"
ragmir setup --semantic is the first-run shortcut. It intentionally allows a one-time download from
Hugging Face into embeddingModelPath, switches .ragmir/config.json to embeddingProvider: "transformers", and leaves transformersAllowRemoteModels false for normal confidential indexing.
Use ragmir models pull --enable when you want to make the same choice later. Re-run
ragmir ingest --rebuild after changing embedding provider or model so stored vectors match the
active configuration.
Ragmir ships with portable agent skills and a standard MCP server.
Use ragmir setup for the normal path, or install only the agent layer later:
npx ragmir install-skill
npx ragmir install-skill --agents claude,codex --mcp-command ./scripts/serve-mcp.sh
npx ragmir install-agent --agents claude,codex,kimi,opencode,cline
Main agent examples:
# Claude Code
claude mcp add-json --scope local ragmir "$(cat .ragmir/claude-mcp-server.json)"
# Codex
cat .ragmir/codex-mcp.toml
# Kimi Code CLI
kimi --mcp-config-file .ragmir/kimi-mcp.json
# OpenCode
cat .ragmir/opencode.jsonc
# Cline
cat .ragmir/cline-mcp.json
Start the MCP server from the repository root when a compatible agent needs tool access:
npx ragmir serve-mcp
The MCP server exposes ragmir_status, ragmir_search, ragmir_ask, ragmir_research,
ragmir_audit, ragmir_evaluate, ragmir_usage_report, and ragmir_security_audit. The LLM does not
need to know about LanceDB or the raw file layout; it asks Ragmir for ranked passages, cited context,
audit-backed research, local recall gates, or metadata-only usage summaries and uses the returned
citations.
Per-agent setup details live in docs/agent-integration.md.
Ragmir includes a plug-and-play text-to-speech path for listenable summaries.
For the same quality path as the global Voice Forge skill, install edge-tts and render MP3:
npx ragmir audio --doctor
pipx install edge-tts
npx ragmir audio /tmp/RAGMIR-SUMMARY-project.txt \
--engine edge \
--out .ragmir/audio/project-summary.mp3
The Edge path uses the online Microsoft Edge TTS service through the edge-tts CLI. Use it only
when sending the narration text to that service is acceptable. MP3 output requires explicit
--engine edge for this reason.
By default, ragmir audio uses the Transformers.js WAV path. For confidential or air-gapped work,
preload Transformers.js-compatible model files with non-sensitive text, then render WAV offline:
npx ragmir audio /tmp/RAGMIR-SUMMARY-project.txt \
--engine transformers \
--offline \
--lang fr \
--model-path .ragmir/models/tts \
--out .ragmir/audio/project-summary.wav
Use the standalone package directly:
npx ragmir-tts doctor --json
npx ragmir-tts render /tmp/RAGMIR-SUMMARY-project.txt \
--engine edge \
--out .ragmir/audio/project-summary.mp3
The default standalone engine is transformers and the default language is fr. Pass
--lang en|es|fr (or RAGMIR_TTS_LANG) to switch language: it selects the matching self-contained
offline model (Xenova/mms-tts-eng, Xenova/mms-tts-spa, or Xenova/mms-tts-fra) and, on the Edge
path, a native neural voice. Override the model directly with --model or RAGMIR_TTS_MODEL.
See docs/offline-tts-preload.md for the exact preload and
offline-check workflow.
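The language-to-model selection described above can be summarized in a short sketch. The model identifiers and the fr default come from this README; the `resolveTtsModel` helper and the precedence order (the --model flag before RAGMIR_TTS_MODEL, the --lang flag before RAGMIR_TTS_LANG) are assumptions for illustration.

```typescript
// Hypothetical sketch: helper name and flag-over-environment precedence are assumptions.
type TtsLang = "en" | "es" | "fr";

const OFFLINE_MODELS: Record<TtsLang, string> = {
  en: "Xenova/mms-tts-eng",
  es: "Xenova/mms-tts-spa",
  fr: "Xenova/mms-tts-fra",
};

interface TtsOptions {
  lang?: string; // value of --lang
  model?: string; // value of --model
}

function isTtsLang(value: string): value is TtsLang {
  return Object.keys(OFFLINE_MODELS).includes(value);
}

function resolveTtsModel(
  options: TtsOptions,
  env: Record<string, string | undefined>,
): string {
  // An explicit model override wins over language-based selection.
  const override = options.model ?? env.RAGMIR_TTS_MODEL;
  if (override) return override;
  const lang = options.lang ?? env.RAGMIR_TTS_LANG ?? "fr"; // documented default
  if (!isTtsLang(lang)) {
    throw new Error(`Unsupported language "${lang}": expected en, es, or fr`);
  }
  return OFFLINE_MODELS[lang];
}
```

For example, `resolveTtsModel({ lang: "en" }, {})` selects Xenova/mms-tts-eng, while an empty call falls back to the French model.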
The package code lives in node_modules or in this repository. Project data stays in the repository
where you run the CLI:
your-project/
.ragmir/config.json # local config
.ragmir/sources.txt # optional extra source paths
.ragmir/raw/ # raw documents to ingest
.ragmir/storage/ # generated LanceDB index
.ragmir/access.log # metadata-only access log
The package never ships project documents. ragmir setup adds a .ragmir/ gitignore entry, so
generated indexes, agent files, raw documents, reports, models, audio, and access logs stay local to
the target repository.
Legacy projects that already have .kb/config.json keep working. In that mode, Ragmir preserves the
old defaults (private/, .kb/storage, .kb/sources.txt, .kb/access.log) and accepts existing
KB_* environment variables. New setup and docs use .ragmir/ and RAGMIR_*.
Ragmir is designed for private repositories and sensitive local evidence.
- Zero telemetry: no analytics or document content is sent to JCode Labs.
- No LLM generation in core: Ragmir returns cited context for the agent/runtime you choose.
- Local-hash by default: no model runtime is required for the default retrieval path.
- Transformers.js remote model loading is disabled by default.
- Optional Transformers.js model downloads require an explicit preload command or --allow-remote-models; confidential runs should use already cached local model files.
- Redaction before indexing: common secrets and identifiers are redacted before chunks are embedded and stored.
- Metadata-only access logs: query hashes and action metadata are logged, not raw queries.
- Metadata-only usage reports: ragmir usage-report --days 7 summarizes recent local activity without exposing query text or local paths.
- MCP is read-focused and bounded by mcpMaxTopK.
- Generated local state is ignored by Git.
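As an illustration of redaction before indexing, the sketch below applies custom rules shaped like the redaction.patterns entries ({ name, pattern, flags?, replacement? }). The `redact` helper, the forced global flag, and the default `[REDACTED:name]` token are assumptions; Ragmir's built-in patterns and replacement text are not shown here.

```typescript
// Hypothetical sketch: default flags and the replacement token are assumptions.
interface RedactionRule {
  name: string;
  pattern: string; // regular-expression source
  flags?: string; // regular-expression flags
  replacement?: string; // text written in place of each match
}

function redact(text: string, rules: readonly RedactionRule[]): string {
  return rules.reduce((current, rule) => {
    // Always include "g" so every occurrence is replaced, not just the first.
    const base = rule.flags ?? "";
    const flags = base.includes("g") ? base : base + "g";
    const regex = new RegExp(rule.pattern, flags);
    return current.replace(regex, rule.replacement ?? `[REDACTED:${rule.name}]`);
  }, text);
}
```

Running the rules before chunking means neither the stored text nor the embedded vectors ever see the original value.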
Run:
npx ragmir security-audit --strict
Remove the generated vector index:
npx ragmir destroy-index --yes
destroy-index does not securely erase SSD or copy-on-write storage. For strong deletion
guarantees, use encrypted storage and destroy the encryption key.
For air-gapped operation, release verification, secure deletion limits, and threat model details,
read SECURITY-HARDENING.md.
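The metadata-only access log mentioned in this section can be pictured with the sketch below, which records a digest of the query instead of its text. The `AccessLogEntry` fields, the `toAccessLogEntry` helper, and SHA-256 as the digest are assumptions; the real log format lives in Ragmir itself.

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch: field names and the hash algorithm are assumptions.
interface AccessLogEntry {
  timestamp: string;
  action: string; // for example "search" or "ask"
  queryHash: string; // digest of the query, never the query text
  resultCount: number;
}

function toAccessLogEntry(
  action: string,
  query: string,
  resultCount: number,
  now: Date = new Date(),
): AccessLogEntry {
  return {
    timestamp: now.toISOString(),
    action,
    queryHash: createHash("sha256").update(query, "utf8").digest("hex"),
    resultCount,
  };
}
```

Identical queries produce identical digests, so usage can be counted without storing what was asked. A plain digest of a short query can still be matched against guesses, which is one reason to keep the log file itself local and ignored by Git.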
Ragmir supports common text, document, data, config, log, and source-code files out of the box:
- Markdown: .md, .mdx
- Text: .txt, .text
- JSON: .json
- YAML: .yaml, .yml
- CSV/TSV: .csv, .tsv
- HTML: .html, .htm
- EPUB: .epub
- PDF: .pdf
- Office/OpenDocument: .docx, .pptx, .xlsx, .odt, .ods, .odp
- Legacy Excel: convert .xls workbooks to .xlsx, CSV, PDF, HTML, or text before ingesting
- Legacy Word: .doc only when an explicit local legacyWordCommand is configured
- Rich text: .rtf
- Notebook: .ipynb
- Subtitles/calendars/mail: .vtt, .srt, .ics, .eml
- Line data and logs: .jsonl, .ndjson, .log
- XML feeds and documents: .xml, .rss, .atom, .svg
- Config and data files: .toml, .ini, .conf, .cfg, .properties, .sql, .example, .exemple
- Common project metadata: .gitignore, .dockerignore, .npmignore, .gitlab-ci.yml, .vscode/settings.json, Maven wrapper .properties
- Source code: .ts, .tsx, .mts, .cts, .js, .jsx, .mjs, .cjs, .py, .go, .rs, .java, .rb, .php, .cs, .c, .cpp, .h, .hpp, .css, .scss, .vue, .svelte, .astro, .sh, .bash, .bat, .cmd, .ps1
- Common extensionless text wrappers: mvnw, gradlew, Dockerfile, Makefile, Procfile, Gemfile, Rakefile
- Documentation/code review text: .rst, .adoc, .tex, .diff, .patch, .markdown, .mdown, .mmd
Custom UTF-8 text extensions can be enabled without changing code:
{
"includeExtensions": [".transcript", ".evidence"]
}
Or through:
RAGMIR_INCLUDE_EXTENSIONS=".transcript,.evidence" npx ragmir ingest
Audio/video files and formats that are not listed are not useful to Ragmir as-is. They can still be
valuable source evidence, but they should be transcribed, converted, or exported to text/PDF/HTML
first. ragmir audit --unsupported prints per-file recommendations for these skipped formats.
Scanned PDFs can use an explicit pdfOcrCommand wrapper when you accept running local OCR tooling.
Standalone image files such as .png, .jpg, .heic, and .tiff stay unsupported by default, but
can be indexed through an explicit local imageOcrCommand wrapper. Old .doc Word binaries stay
unsupported by default, but can be indexed through an explicit local legacyWordCommand wrapper
when your workstation has a trusted extractor. If a supported file parses to no text, ragmir ingest --json reports it under emptyTextFiles. Ragmir intentionally avoids pretending that every binary
format can be indexed safely without extraction logic.
Secret-like files such as .env, .npmrc, private keys, and certificates are skipped by default.
Convert safe examples to a normal text format before ingestion.
Dotfiles are discovered so useful project metadata is not silently missed. Sensitive
key/certificate-like files such as .pem, .key, .p12, .pfx, .jks, .gpg, and common secret
filenames such as .env, .npmrc, .netrc, and .pgpass are skipped by default even if they sit
under a source directory.
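A skip rule of this kind can be sketched as a simple name and extension check. The sketch covers only the names and extensions listed in this README; the `isSecretLike` helper is hypothetical, and Ragmir's actual rule set may be broader.

```typescript
// Hypothetical sketch: only the names and extensions listed in this README are covered.
const SECRET_EXTENSIONS = new Set([".pem", ".key", ".p12", ".pfx", ".jks", ".gpg"]);
const SECRET_FILENAMES = new Set([".env", ".npmrc", ".netrc", ".pgpass"]);

function isSecretLike(filePath: string): boolean {
  // Keep only the last path segment, accepting both / and \ separators.
  const name = (filePath.split(/[\\/]/).pop() ?? "").toLowerCase();
  if (SECRET_FILENAMES.has(name)) return true;
  const dot = name.lastIndexOf(".");
  return dot !== -1 && SECRET_EXTENSIONS.has(name.slice(dot));
}
```

Checking the file name instead of its content means a skipped file is never opened, at the cost of missing secrets stored under an innocuous name; the redaction step is the second line of defense for those.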
Most users should start with ragmir setup and let ragmir doctor explain what is missing. Edit
.ragmir/config.json only when you need to change source paths, retrieval mode, chunking, privacy
limits, or local extractors.
Default .ragmir/config.json for a fresh project:
{
"rawDir": ".ragmir/raw",
"storageDir": ".ragmir/storage",
"sourcesFile": ".ragmir/sources.txt",
"sources": [],
"accessLogPath": ".ragmir/access.log",
"embeddingModelPath": ".ragmir/models",
"tableName": "chunks",
"embeddingProvider": "local-hash",
"embeddingModel": "mixedbread-ai/mxbai-embed-xsmall-v1",
"transformersAllowRemoteModels": false,
"redaction": {
"enabled": true,
"builtIn": true,
"patterns": []
},
"accessLog": true,
"mcpMaxTopK": 10,
"topK": 8,
"chunkSize": 1200,
"chunkOverlap": 200,
"maxFileBytes": 50000000,
"ingestConcurrency": 4,
"embeddingBatchSize": 32,
"includeExtensions": [],
"pdfOcrCommand": [],
"pdfOcrTimeoutMs": 120000,
"imageOcrCommand": [],
"imageOcrTimeoutMs": 120000,
"legacyWordCommand": [],
"legacyWordTimeoutMs": 120000
}
Every field, its default, and what it controls:
| Field | Default | Purpose |
|---|---|---|
| rawDir | .ragmir/raw | Local corpus folder, indexed recursively. The primary place to drop documents. |
| sources | [] | Extra file, directory, and glob paths (plus ! exclusions) to index, resolved from the project root. See below. |
| sourcesFile | .ragmir/sources.txt | Legacy one-path-per-line file; still read and merged with sources when present. |
| storageDir | .ragmir/storage | LanceDB vector store location. |
| accessLogPath | .ragmir/access.log | Query access log (stores hashes/metadata only). |
| embeddingModelPath | .ragmir/models | Local cache for the Transformers.js embedding model. |
| tableName | chunks | LanceDB table name. |
| embeddingProvider | local-hash | local-hash (offline lexical, not semantic) or transformers (semantic). Switching requires ragmir ingest --rebuild. |
| embeddingModel | mixedbread-ai/mxbai-embed-xsmall-v1 | Model used when embeddingProvider is transformers. |
| transformersAllowRemoteModels | false | Allow downloading the embedding model at runtime. |
| redaction.enabled | true | Strip secrets/PII before anything is embedded. |
| redaction.builtIn | true | Apply the built-in secret/PII patterns. |
| redaction.patterns | [] | Extra { name, pattern, flags?, replacement? } redaction rules. |
| accessLog | true | Record query metadata to accessLogPath. |
| mcpMaxTopK | 10 | Hard cap on results any MCP tool may return. |
| topK | 8 | Default number of passages returned by search/ask. |
| chunkSize | 1200 | Characters per chunk. |
| chunkOverlap | 200 | Overlapping characters between chunks (must be < chunkSize). |
| maxFileBytes | 50000000 | Skip files larger than this. |
| ingestConcurrency | 4 | Files processed in parallel during ingest. |
| embeddingBatchSize | 32 | Chunks embedded per batch. |
| includeExtensions | [] | Extra file extensions to treat as indexable text. |
| pdfOcrCommand, imageOcrCommand, legacyWordCommand | [] | Opt-in external extractors (see below). |
| pdfOcrTimeoutMs, imageOcrTimeoutMs, legacyWordTimeoutMs | 120000 | Timeouts for the external extractors. |
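The interaction between chunkSize and chunkOverlap is easiest to see in a small sketch. The version below is a plain fixed-size character window and is only an assumption about the general technique; Ragmir's chunker may also respect paragraph or sentence boundaries.

```typescript
// Hypothetical sketch of fixed-size chunking with overlap; not Ragmir's actual chunker.
function chunkText(text: string, chunkSize = 1200, chunkOverlap = 200): string[] {
  if (chunkSize <= 0) throw new RangeError("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be >= 0 and < chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // the window reached the end
  }
  return chunks;
}
```

With the defaults, the window advances 1000 characters at a time, so a 2500-character document yields three chunks and each boundary region appears in two of them. The requirement that chunkOverlap stay below chunkSize is what keeps the step positive.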
Ragmir always indexes everything under rawDir (.ragmir/raw/). To pull in files that live elsewhere —
sibling packages in a monorepo, a shared docs folder, a downloaded directory — add them straight to the
sources array in .ragmir/config.json. No separate file is needed:
{
  "sources": [
    "../packages/*/README.md",
    "../docs",
    "./NOTES.md",
    "!../packages/**/node_modules/**"
  ]
}

Each entry is one of:

- a file or directory path: relative paths resolve from the project root; directories are indexed recursively;
- a glob pattern: any entry containing `*`, `?`, `[`, or `{`;
- an exclusion: starts with `!` and filters the glob matches.
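
The three entry kinds can be told apart with two string checks. The sketch below mirrors the rules as listed; classifySourceEntry is a hypothetical helper written for illustration, and checking for the leading `!` before the glob characters is an assumption, not a statement about Ragmir's own parser.

```typescript
type SourceEntryKind = "exclusion" | "glob" | "path"

// Hypothetical helper that applies the documented rules in order.
function classifySourceEntry(entry: string): SourceEntryKind {
  // An exclusion starts with "!" (assumed to take precedence over the glob check).
  if (entry.startsWith("!")) return "exclusion"
  // A glob is any entry containing *, ?, [, or {.
  if (/[*?\[{]/.test(entry)) return "glob"
  // Everything else is a plain file or directory path.
  return "path"
}

const sources = [
  "../packages/*/README.md",
  "../docs",
  "./NOTES.md",
  "!../packages/**/node_modules/**",
]

for (const entry of sources) {
  console.log(`${classifySourceEntry(entry)}\t${entry}`)
}
```

For the example configuration this classifies the first entry as a glob, the next two as paths, and the last as an exclusion.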
Legacy sources.txt. Paths listed one per line in .ragmir/sources.txt are still read when the file exists, and ragmir sources add / ragmir sources list continue to manage it. Entries from both the sources array and sources.txt are merged, so existing projects keep working unchanged. New projects should prefer the sources array; ragmir init no longer creates a sources.txt.
Environment overrides:

- RAGMIR_RAW_DIR
- RAGMIR_STORAGE_DIR
- RAGMIR_SOURCES_FILE
- RAGMIR_ACCESS_LOG_PATH
- RAGMIR_EMBEDDING_PROVIDER
- RAGMIR_EMBEDDING_MODEL
- RAGMIR_EMBEDDING_MODEL_PATH
- RAGMIR_TRANSFORMERS_ALLOW_REMOTE_MODELS
- RAGMIR_REDACTION_ENABLED
- RAGMIR_REDACTION_BUILT_IN
- RAGMIR_ACCESS_LOG
- RAGMIR_MCP_MAX_TOP_K
- RAGMIR_TOP_K
- RAGMIR_CHUNK_SIZE
- RAGMIR_CHUNK_OVERLAP
- RAGMIR_MAX_FILE_BYTES
- RAGMIR_INGEST_CONCURRENCY
- RAGMIR_EMBEDDING_BATCH_SIZE
- RAGMIR_INCLUDE_EXTENSIONS
- RAGMIR_PDF_OCR_COMMAND as a JSON array, for example ["ragmir-pdf-ocr","{input}"]
- RAGMIR_PDF_OCR_TIMEOUT_MS
- RAGMIR_IMAGE_OCR_COMMAND as a JSON array, for example ["ragmir-image-ocr","{input}"]
- RAGMIR_IMAGE_OCR_TIMEOUT_MS
- RAGMIR_LEGACY_WORD_COMMAND as a JSON array, for example ["ragmir-doc-text","{input}"]
- RAGMIR_LEGACY_WORD_TIMEOUT_MS
Legacy KB_* aliases remain accepted for existing automation.
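
Command overrides are passed as JSON arrays, so a consumer of these variables has to parse and validate them before use. The sketch below shows one way that could look. readOverride and parseCommandOverride are hypothetical helpers, and two details are assumptions made for illustration only: that each legacy alias uses the same suffix as its RAGMIR_* counterpart (for example KB_TOP_K), and that the RAGMIR_* name wins when both are set.

```typescript
type Env = Record<string, string | undefined>

// Reads RAGMIR_<name>, falling back to an assumed KB_<name> legacy alias.
function readOverride(env: Env, name: string): string | undefined {
  return env[`RAGMIR_${name}`] ?? env[`KB_${name}`]
}

// Parses a command override, which must be a JSON array of strings.
function parseCommandOverride(raw: string | undefined): string[] | undefined {
  if (raw === undefined || raw.trim() === "") return undefined
  const parsed: unknown = JSON.parse(raw)
  if (!Array.isArray(parsed) || !parsed.every((part) => typeof part === "string")) {
    throw new TypeError("command override must be a JSON array of strings")
  }
  return parsed as string[]
}

// Example environment: one current name and one assumed legacy alias.
const env: Env = {
  RAGMIR_PDF_OCR_COMMAND: '["ragmir-pdf-ocr","{input}"]',
  KB_TOP_K: "5",
}

console.log(parseCommandOverride(readOverride(env, "PDF_OCR_COMMAND")))
console.log(Number(readOverride(env, "TOP_K")))
```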
pdfOcrCommand is opt-in and only runs when normal PDF text extraction returns no text.
imageOcrCommand is also opt-in; image files are treated as supported only when it is configured.
legacyWordCommand is opt-in; .doc files are treated as supported only when it is configured.
External text commands are executed from the target project root without a shell. Each command
receives RAGMIR_PDF_PATH, RAGMIR_IMAGE_PATH, or RAGMIR_LEGACY_WORD_PATH, has any {input} placeholder
replaced with the source path, and must print UTF-8 text to stdout.
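
To make the contract concrete, here is a minimal sketch of how a command array could be expanded and launched without a shell using Node's child_process module. It is not Ragmir's implementation: expandCommand and runExtractor are hypothetical names, and passing the path variable through the process environment is an assumption about what "receive" means here.

```typescript
import { spawnSync } from "node:child_process"

type PathVariable = "RAGMIR_PDF_PATH" | "RAGMIR_IMAGE_PATH" | "RAGMIR_LEGACY_WORD_PATH"

// Replace every {input} placeholder in the argument vector with the source path.
function expandCommand(command: string[], inputPath: string): string[] {
  return command.map((part) => part.split("{input}").join(inputPath))
}

// Run an extractor without a shell and return its stdout as UTF-8 text.
function runExtractor(
  command: string[],
  inputPath: string,
  projectRoot: string,
  pathVariable: PathVariable,
  timeoutMs = 120000,
): string {
  const [file, ...args] = expandCommand(command, inputPath)
  if (file === undefined) throw new Error("extractor command is empty")
  const result = spawnSync(file, args, {
    cwd: projectRoot, // executed from the target project root
    shell: false, // arguments are passed as-is, never interpreted by a shell
    encoding: "utf8",
    timeout: timeoutMs,
    env: { ...process.env, [pathVariable]: inputPath }, // assumption: path passed via environment
  })
  if (result.error) throw result.error
  if (result.status !== 0) throw new Error(`extractor exited with status ${result.status}`)
  return result.stdout
}

// Placeholder expansion only; nothing is spawned here.
console.log(expandCommand(["ragmir-pdf-ocr", "{input}"], "docs/scanned report.pdf"))
```

Because the command is an argument array and no shell is involved, a path containing spaces or shell metacharacters reaches the extractor as a single argument. A call such as runExtractor(["ragmir-pdf-ocr", "{input}"], path, root, "RAGMIR_PDF_PATH") would then return whatever text the extractor printed.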
Ragmir ships two CLIs:
- ragmir: the main local RAG, MCP, skills, security, and audio command. kb remains a legacy alias for compatibility.
- ragmir-tts: the standalone text-to-speech renderer used by ragmir audio.
Most users start with ragmir setup, ragmir doctor, ragmir ingest, ragmir search, ragmir ask,
ragmir research, and ragmir security-audit.
Use ragmir setup --semantic during first setup, or ragmir models pull --enable later, when a
one-time Transformers.js model download is acceptable and you want higher-quality semantic retrieval.
Run ragmir ingest --rebuild after switching embedding provider or model.
Full command table: docs/cli-reference.md.
The TypeScript API mirrors the CLI for applications and sidecars:
import { ask, ingest, search } from "@jcode.labs/ragmir"
await ingest({ rebuild: true })
const results = await search("vendor invoice status")
const answer = await ask("What documents support the project timeline?")

Full API reference: docs/api-reference.md.
Use ragmir doctor first. It is the shortest path to the next useful action:
npx ragmir doctor

Use doctor --fix when you want Ragmir to repair safe setup issues automatically:

npx ragmir doctor --fix

Common fixes for empty indexes, weak search, strict security audit failures, and TTS setup live in
docs/troubleshooting.md.
For release or integration work in this repository, pnpm validate is the full local gate. It covers
Biome, dependency security audit, TypeScript, Vitest, build output, production CLI/MCP smoke tests,
npm package metadata, semantic-release wiring, and release artifacts.
Ragmir can run retrieval without a model runtime. Some runtime dependencies remain because they own core features:
| Dependency | Why it remains |
|---|---|
| @huggingface/transformers | Optional local semantic embeddings and offline TTS; remote model loading is disabled unless explicitly enabled for preload. |
| LanceDB | Local vector storage and nearest-neighbor retrieval. |
| MCP SDK | MCP server for compatible agents. |
| fast-glob | Safe source-file discovery. |
| unpdf, mammoth, read-excel-file, html-to-text, yaml, fflate | Document parsing for PDF, Office, HTML, YAML, OpenDocument, and EPUB files. |
| commander, zod, picocolors | CLI, config validation, readable terminal output. |
Direct runtime dependency scans do not show analytics SDKs or product telemetry calls. The Astro
landing package uses a wrapper that sets ASTRO_TELEMETRY_DISABLED=1 for dev, check, preview, and
build commands.
Removing more dependencies is possible only by dropping features or replacing them with smaller
internal implementations. The current low-friction path is dependency-light at runtime for users who
choose local-hash, while preserving richer parsing, MCP support, and optional semantic embeddings.
This repository ships two synthetic examples under
packages/ragmir-core/examples. Both use the default local-hash
retrieval mode, so they run without downloading an embedding or chat model, and neither uses private
documents.
Testing local changes: use the repository's own build, not npx. Inside this repo, npx ragmir resolves to the published npm package, not your working copy, so it would not exercise your local edits. The examples below run the local dist/ build instead.
sovereign-rag-demo drives the CLI to test
ingestion, retrieval, security-audit, and custom text extensions.
pnpm build
cd packages/ragmir-core/examples/sovereign-rag-demo
node ../../dist/cli.js security-audit
node ../../dist/cli.js ingest
node ../../dist/cli.js search "offline retrieval approval"
node ../../dist/cli.js evaluate --golden golden-queries.json
node ../../dist/cli.js evaluate --golden golden-queries.json --fail-under 1
node ../../dist/cli.js audit

library-api-demo exercises the library API the way an external consumer would import it, but Node
self-referencing resolves @jcode.labs/ragmir to the local build, never npm. It is the fast inner loop
when developing Ragmir Core itself:
pnpm example

That builds Ragmir Core, then runs ingest -> search -> ask -> audit through the public API against
the reused synthetic corpus.
Install and validate the monorepo:
pnpm install
pnpm validate

Useful filtered commands:
pnpm --filter @jcode.labs/ragmir test
pnpm --filter @jcode.labs/ragmir mcp:smoke
pnpm --filter @jcode.labs/ragmir-tts test
pnpm --filter @jcode.labs/ragmir-app build
pnpm --filter @jcode.labs/ragmir-landing build
pnpm --filter @jcode.labs/ragmir build
pnpm --filter @jcode.labs/ragmir-tts build

packages/ragmir-core/dist/ and packages/ragmir-tts/dist/ are committed. packages/ragmir-app/dist/
and packages/ragmir-landing/dist/ are ignored build artifacts. After changing TypeScript sources in
published packages, run:
pnpm build
pnpm validate

CI checks that generated dist/ files match the source.
The root package is private and only orchestrates workspace tasks. npm publishing is handled by the
protected Release npm GitHub Actions workflow on main. semantic-release derives the version from
Conventional Commits, prepares both package tarballs, publishes @jcode.labs/ragmir-tts first, then
publishes @jcode.labs/ragmir.
Build from source:
git clone git@github.com:jcode-works/jcode-ragmir.git
cd jcode-ragmir
pnpm install
pnpm build

Use a local checkout in another repository:

pnpm add -D file:../jcode-ragmir/packages/ragmir-core

Create a local npm tarball:

pnpm build
pnpm --dir packages/ragmir-core pack

MIT (c) Jean-Baptiste Thery.