Added
- vector.HTTPEmbedder: adapter for any OpenAI-compatible embeddings API (OpenAI, Ollama, LM Studio, Voyage AI, llama.cpp server, vLLM), built on stdlib net/http only. EmbedBatch for one-call corpus indexing, EmbedContext/EmbedBatchContext for cancellation, dims validation with optional inference (dims = 0), bearer/custom-header auth, optional L2 normalization. Strict response validation: index permutation, empty embeddings, and inconsistent dims all error instead of silently corrupting results.
- pkg/onnx: new package running BERT-family transformer models (e.g. sentence-transformers/all-MiniLM-L6-v2) fully in-process via ONNX Runtime (no server, no API key, deterministic). Includes a pure-Go BERT WordPiece tokenizer (lowercase, NFD accent stripping, Unicode format-char removal, punctuation/CJK splitting; no Python/Rust tokenizer). Auto-detects model layout (mean-pools last_hidden_state or uses pre-pooled sentence_embedding); batch calls pad+mask so results match per-text calls.
- cmd/onnx-demo + make demo-onnx: end-to-end semantic search demo; make model downloads MiniLM pinned to a Hugging Face revision with sha256 verification.
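To make the strict response validation concrete, here is a minimal sketch of the kind of checks described above. It is not the code in this PR: parseEmbeddings and embeddingsResponse are hypothetical names, and the sketch assumes the usual OpenAI-style reply shape (a data array whose entries carry index and embedding). It also assumes the strictest reasonable reading of "index permutation": entries are placed by their index field, and any index set that is not exactly 0..n-1 is rejected.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// embeddingsResponse keeps only the fields of an OpenAI-style embeddings
// reply that validation needs.
type embeddingsResponse struct {
	Data []struct {
		Index     int       `json:"index"`
		Embedding []float32 `json:"embedding"`
	} `json:"data"`
}

// parseEmbeddings is a hypothetical helper, not the package's API. It returns
// one vector per input, placed by the reply's index field. want is the number
// of inputs sent; dims == 0 means "infer from the first vector seen".
func parseEmbeddings(body []byte, want, dims int) ([][]float32, error) {
	var resp embeddingsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, fmt.Errorf("decode response: %w", err)
	}
	if len(resp.Data) != want {
		return nil, fmt.Errorf("got %d embeddings for %d inputs", len(resp.Data), want)
	}
	out := make([][]float32, want)
	for _, d := range resp.Data {
		switch {
		case d.Index < 0 || d.Index >= want:
			return nil, fmt.Errorf("index %d out of range [0,%d)", d.Index, want)
		case out[d.Index] != nil:
			return nil, fmt.Errorf("duplicate index %d", d.Index)
		case len(d.Embedding) == 0:
			return nil, fmt.Errorf("empty embedding at index %d", d.Index)
		}
		if dims == 0 {
			dims = len(d.Embedding) // infer once, then enforce on every vector
		}
		if len(d.Embedding) != dims {
			return nil, fmt.Errorf("index %d has %d dims, want %d", d.Index, len(d.Embedding), dims)
		}
		out[d.Index] = d.Embedding
	}
	return out, nil
}

func main() {
	// Entries arrive out of order; the index field puts them back.
	body := []byte(`{"data":[{"index":1,"embedding":[3,4]},{"index":0,"embedding":[1,2]}]}`)
	vecs, err := parseEmbeddings(body, 2, 0)
	fmt.Println(vecs, err)
}
```

Failing loudly on a duplicate or out-of-range index matters because a silently misplaced vector would attach one document's embedding to another document's text, which no later search step can detect.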
Changed
- Dependency policy is now scoped rather than absolute: pkg/vector still imports stdlib only and builds with CGO_ENABLED=0; the third-party deps (onnxruntime_go, golang.org/x/text) are quarantined in pkg/onnx, so consumers who don't import it pay no CGo or dependency cost.
- README/AGENTS.md/CLAUDE.md updated for the new embedders and policy.
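One way a scoped policy like this could be guarded in a test is to scan a package's source for imports that are not stdlib. The sketch below is an illustration only, not something this PR ships: nonStdlibImports is a hypothetical helper, and it relies on the common heuristic that a third-party import path has a dot in its first element (it also flags the cgo pseudo-package "C").

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// nonStdlibImports (hypothetical helper) parses Go source and returns the
// import paths that look third-party, plus the cgo pseudo-package "C".
// Heuristic: stdlib paths have no dot in their first path element.
func nonStdlibImports(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	// ImportsOnly stops parsing after the import declarations.
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, imp := range f.Imports {
		path, err := strconv.Unquote(imp.Path.Value)
		if err != nil {
			return nil, err
		}
		first, _, _ := strings.Cut(path, "/")
		if path == "C" || strings.Contains(first, ".") {
			out = append(out, path)
		}
	}
	return out, nil
}

func main() {
	src := `package vector

import (
	"net/http"

	"golang.org/x/text/unicode/norm"
)
`
	bad, err := nonStdlibImports("example.go", src)
	fmt.Println(bad, err)
}
```

The heuristic misses module-local imports whose path has no dot, so a real guard would likely pair a check like this with a CGO_ENABLED=0 build of pkg/vector in CI.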
Details
// Remote: any OpenAI-compatible endpoint
e := vector.NewHTTPEmbedder("http://localhost:11434/v1", "nomic-embed-text", 0)
// Local: ONNX transformer, fully in-process
e, _ := onnx.New("model.onnx", "vocab.txt")
defer e.Close()
vecs, _ := e.EmbedBatch(docs) // one call, padded + masked
store := vector.NewStore(vector.CosineDistance)
q, _ := e.Embed("animals that live with people")
store.Search(q, 5) // real semantic matches

ONNX setup: brew install onnxruntime (or set ONNXRUNTIME_SHARED_LIBRARY_PATH), then make model && make demo-onnx to see it run. MiniLM embeds at ~1ms/query after a ~200ms model load (384 dims).
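The "padded + masked" batching and the optional L2 normalization can be illustrated with a small sketch. This is not the pkg/onnx implementation: meanPool and l2Normalize are hypothetical names, and the sketch assumes the model output for one text is a slice of per-token vectors with an attention mask of the same length (1 for real tokens, 0 for padding).

```go
package main

import (
	"fmt"
	"math"
)

// meanPool averages the token vectors whose mask entry is 1, so padding
// rows (mask 0) do not dilute the sentence embedding.
// Assumes len(mask) == len(hidden) and equal-length token vectors.
func meanPool(hidden [][]float32, mask []int64) []float32 {
	if len(hidden) == 0 {
		return nil
	}
	out := make([]float32, len(hidden[0]))
	var n float32
	for i, tok := range hidden {
		if mask[i] == 0 {
			continue // padding token: skip
		}
		for j, x := range tok {
			out[j] += x
		}
		n++
	}
	if n == 0 {
		return out // fully masked input: return the zero vector
	}
	for j := range out {
		out[j] /= n
	}
	return out
}

// l2Normalize scales v in place to unit length; a zero vector is left as is.
func l2Normalize(v []float32) {
	var sum float64
	for _, x := range v {
		sum += float64(x) * float64(x)
	}
	if sum == 0 {
		return
	}
	inv := float32(1 / math.Sqrt(sum))
	for i := range v {
		v[i] *= inv
	}
}

func main() {
	alone := meanPool([][]float32{{1, 2}, {3, 4}}, []int64{1, 1})
	padded := meanPool([][]float32{{1, 2}, {3, 4}, {9, 9}}, []int64{1, 1, 0})
	fmt.Println(alone, padded) // both are the mean of the two real tokens
	l2Normalize(padded)
	fmt.Println(padded)
}
```

Because padding rows are excluded from both the sum and the count, a text pooled inside a padded batch gets the same vector as the same text pooled alone, which is the property the batch path is described as preserving.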
Verification: 40+ new tests (race-clean), tokenizer fuzz harness (3.2M execs, 0 failures), end-to-end suite against real MiniLM artifacts, plus an adversarial multi-agent review pass — findings and certificate on #3.
🤖 Generated with Claude Code