Skip to content

v1.3.0 — Real Embeddings: OpenAI-Compatible APIs & Local ONNX Models

Latest

Choose a tag to compare

@jkyberneees jkyberneees released this 10 Jun 12:37
9ed09d9

Added

  • vector.HTTPEmbedder — adapter for any OpenAI-compatible embeddings API (OpenAI, Ollama, LM Studio, Voyage AI, llama.cpp server, vLLM), built on stdlib net/http only. EmbedBatch for one-call corpus indexing, EmbedContext/EmbedBatchContext for cancellation, dims validation with optional inference (dims = 0), bearer/custom-header auth, optional L2 normalization. Strict response validation: index permutation, empty embeddings, and inconsistent dims all error instead of silently corrupting results.
  • pkg/onnx — new package running BERT-family transformer models (e.g. sentence-transformers/all-MiniLM-L6-v2) fully in-process via ONNX Runtime: no server, no API key, deterministic. Includes a pure-Go BERT WordPiece tokenizer (lowercase, NFD accent stripping, Unicode format-char removal, punctuation/CJK splitting — no Python/Rust tokenizer). Auto-detects model layout (mean-pools last_hidden_state or uses pre-pooled sentence_embedding); batch calls pad+mask so results match per-text calls.
  • cmd/onnx-demo + make demo-onnx — end-to-end semantic search demo; make model downloads MiniLM pinned to a Hugging Face revision with sha256 verification.

Changed

  • Dependency policy is now scoped rather than absolute: pkg/vector still imports stdlib only and builds with CGO_ENABLED=0; the third-party deps (onnxruntime_go, golang.org/x/text) are quarantined in pkg/onnx, so consumers who don't import it pay no CGo or dependency cost.
  • README/AGENTS.md/CLAUDE.md updated for the new embedders and policy.

Details

// Remote: any OpenAI-compatible endpoint
e := vector.NewHTTPEmbedder("http://localhost:11434/v1", "nomic-embed-text", 0)

// Local: ONNX transformer, fully in-process
e, _ := onnx.New("model.onnx", "vocab.txt")
defer e.Close()

vecs, _ := e.EmbedBatch(docs)              // one call, padded + masked
store := vector.NewStore(vector.CosineDistance)
q, _ := e.Embed("animals that live with people")
store.Search(q, 5)                          // real semantic matches

ONNX setup: brew install onnxruntime (or set ONNXRUNTIME_SHARED_LIBRARY_PATH), then make model && make demo-onnx to see it run. MiniLM embeds at ~1ms/query after a ~200ms model load (384 dims).

Verification: 40+ new tests (race-clean), tokenizer fuzz harness (3.2M execs, 0 failures), end-to-end suite against real MiniLM artifacts, plus an adversarial multi-agent review pass — findings and certificate on #3.

🤖 Generated with Claude Code