Embedders Overview
Understand how sovseal generates 384-dimensional vector embeddings on-device using quantized ONNX pipelines.
sovseal uses local-first, CPU-bound machine learning to generate vector embeddings from your text. This removes the dependency on third-party embedding APIs (like OpenAI or Cohere), so your input never leaves your machine and you pay no per-request API costs.
Technical Specifications
- Default Model: Quantized Xenova/all-MiniLM-L6-v2 (ONNX format, ~22 MB).
- Dimensions: 384.
- Normalization: Vectors are mean-pooled and L2-normalized. Because every vector has unit length, cosine similarity reduces to a plain dot product, so the local vector search does not need to compute norms at query time.
- Execution: Runs entirely on your CPU via ONNX Runtime integrated with Transformers.js. No GPU is required.
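The normalization bullet above can be illustrated with a minimal sketch. This is not sovseal's internal code: the function names (meanPool, l2Normalize, dot, cosine) are hypothetical, and the vectors are 3-dimensional toy values rather than real 384-dimensional model output. It only shows why a dot product of L2-normalized vectors equals their cosine similarity.

```typescript
// Mean-pool per-token vectors into one sentence vector.
// mask[i] is 1 for a real token and 0 for padding (padding rows are ignored).
function meanPool(tokenVectors: number[][], mask: number[]): number[] {
  const dim = tokenVectors[0].length;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenVectors.forEach((vec, i) => {
    if (mask[i] === 0) return; // skip padding tokens
    count += 1;
    for (let d = 0; d < dim; d++) sum[d] += vec[d];
  });
  return sum.map((x) => x / Math.max(count, 1));
}

// Scale a vector to unit length (a zero vector is returned unchanged).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  return a.reduce((acc, x, i) => acc + x * b[i], 0);
}

// Full cosine similarity, with norms computed explicitly, for comparison.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Toy example: two real tokens and one padding token.
const pooled = meanPool([[1, 2, 3], [3, 2, 1], [9, 9, 9]], [1, 1, 0]);
const query = [1, 0, 0];

const fast = dot(l2Normalize(pooled), l2Normalize(query)); // dot product only
const full = cosine(pooled, query);                        // norms recomputed
console.log(Math.abs(fast - full) < 1e-12);
```

Since stored vectors are normalized once at write time, each comparison at search time needs only the dot product.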
Model Lifecycle & Warmup
1. Eager Warmup on Startup
When your agent client connects or when the MCP server receives its initialization handshake, it executes the warmupEmbeddingPipeline() sequence. This:
- Spawns the ONNX runtime threads.
- Runs a tiny, single-character throwaway embedding whose result is discarded.
- Pre-warms the JIT compiler path so that your first user-facing recall_memory query does not pay the one-time initialization cost.
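The warmup steps above can be sketched as a load-once pattern. This is an illustration under stated assumptions, not the actual body of warmupEmbeddingPipeline(): the Embedder type, createWarmup, and the stand-in loader are all hypothetical, and a real loader would initialize the ONNX model instead of returning a dummy function.

```typescript
// Hypothetical embedder shape: maps text to a vector.
type Embedder = (text: string) => Promise<number[]>;

// Wrap an injected loader so the pipeline is loaded and warmed at most once.
function createWarmup(loadPipeline: () => Promise<Embedder>): () => Promise<Embedder> {
  let ready: Promise<Embedder> | null = null;
  return function warmup(): Promise<Embedder> {
    if (ready === null) {
      ready = loadPipeline().then(async (embed) => {
        await embed("a"); // single-character throwaway embedding; result is discarded
        return embed;
      });
    }
    return ready; // later callers share the same in-flight or completed promise
  };
}

// Stand-in loader for demonstration only (assumption: real loading is async).
let loadCount = 0;
const fakeLoader = async (): Promise<Embedder> => {
  loadCount += 1;
  return async (text: string) => new Array<number>(384).fill(text.length);
};

const warmup = createWarmup(fakeLoader);
const first = warmup();  // e.g. triggered by the initialization handshake
const second = warmup(); // e.g. triggered by the first query
```

Caching the promise rather than the resolved pipeline means a query that arrives while warmup is still running waits on the same load instead of starting a second one.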
2. Local Disk Caching
On the first execution, the model is downloaded from Hugging Face and cached locally:
- Default Directory: ~/.sovseal/models/
- You can override the download and load cache path by setting the SOVSEAL_MODEL_DIR environment variable.
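A minimal sketch of how the cache directory could be resolved from the two bullets above. The helper name resolveModelDir is hypothetical, and treating an empty SOVSEAL_MODEL_DIR value as unset is an assumption of this sketch, not documented sovseal behavior.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Return the override if SOVSEAL_MODEL_DIR is set, else ~/.sovseal/models.
// Assumption: a blank value falls back to the default directory.
function resolveModelDir(
  env: Record<string, string | undefined> = process.env,
): string {
  const override = env["SOVSEAL_MODEL_DIR"];
  if (override !== undefined && override.trim() !== "") {
    return override;
  }
  return path.join(os.homedir(), ".sovseal", "models");
}

console.log(resolveModelDir());
```

Passing the environment as a parameter keeps the function easy to test without mutating process.env.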
Model Integrity Verification
To protect against MITM attacks or malicious tampering of the local model files, the embedding manager runs a SHA-256 integrity check against the cached assets before loading. It verifies:
- config.json
- tokenizer.json
- tokenizer_config.json
- onnx/model_quantized.onnx
If any file hash does not match the pinned release values, the runtime aborts and logs an error to protect your system.
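The check described above can be approximated with Node's built-in crypto module. This is a sketch, not sovseal's embedding manager: sha256File and verifyModelFiles are hypothetical names, and the pinned digests must come from the release itself (none are reproduced here).

```typescript
import { createHash } from "node:crypto";
import * as fs from "node:fs";
import * as path from "node:path";

// Hex-encoded SHA-256 digest of a file's contents.
function sha256File(filePath: string): string {
  return createHash("sha256").update(fs.readFileSync(filePath)).digest("hex");
}

// pinned maps a path relative to modelDir to its expected hex digest.
// Throws on the first mismatch; a missing file also throws (from readFileSync),
// so loading never proceeds with an incomplete or altered model directory.
function verifyModelFiles(modelDir: string, pinned: Record<string, string>): void {
  for (const [relPath, expected] of Object.entries(pinned)) {
    const actual = sha256File(path.join(modelDir, relPath));
    if (actual !== expected.toLowerCase()) {
      throw new Error(`Integrity check failed for ${relPath}`);
    }
  }
}
```

A caller would pass the resolved model directory and a map covering the four files listed above, and load the model only if verifyModelFiles returns without throwing.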