Embedders Overview

sovseal generates vector embeddings from your text with a local, CPU-only machine learning model. Because nothing is sent to a third-party embedding API (such as OpenAI or Cohere), your text is not exposed to an external service and there are no per-request API costs.

Technical Specifications

Default Model: Quantized Xenova/all-MiniLM-L6-v2 (ONNX format, ~22 MB).
Dimensions: 384.
Normalization: Token embeddings are mean-pooled and L2-normalized. Because every vector has unit length, cosine similarity reduces to a plain dot product, which simplifies similarity search in the local vector database.
Execution: Runs entirely on your CPU via ONNX Runtime integrated with Transformers.js. No GPU is required.
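
The normalization property above can be checked with a few lines of arithmetic. The following is a minimal sketch, not sovseal's implementation: the helper names (meanPool, l2Normalize, dot, cosine) are illustrative, and the tiny 3-dimensional vectors stand in for real 384-dimensional token embeddings.

```typescript
// Average the per-token vectors into a single sentence vector.
function meanPool(tokenVectors: number[][]): number[] {
  const dim = tokenVectors[0].length;
  const pooled = new Array<number>(dim).fill(0);
  for (const vec of tokenVectors) {
    for (let i = 0; i < dim; i++) pooled[i] += vec[i];
  }
  return pooled.map((x) => x / tokenVectors.length);
}

// Scale a vector to unit length (a zero vector is returned unchanged).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Full cosine similarity, for comparison with the dot-product shortcut.
function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Toy "token embeddings" for two texts (made-up numbers).
const queryVec = l2Normalize(meanPool([[1, 2, 3], [3, 2, 1]]));
const docVec = l2Normalize(meanPool([[0, 1, 0], [2, 1, 4]]));

// For unit-length vectors the two values agree up to floating-point rounding.
console.log(dot(queryVec, docVec), cosine(queryVec, docVec));
```

Skipping the two square roots and the division per comparison is what makes the dot-product form attractive when scanning many stored vectors.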

Model Lifecycle & Warmup

1. Eager Warmup on Startup

When your agent client connects, or when the MCP server receives its initialization handshake, the server runs warmupEmbeddingPipeline(), which:

Starts the ONNX Runtime threads.
Runs a throwaway embedding of a single character.
Pre-warms the JIT-compiled code paths so that your first user-facing recall_memory query does not pay the one-time initialization cost.
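
The warmup steps above follow a common "embed once, discard the result" pattern. The sketch below illustrates that pattern under stated assumptions: the Embedder type, createWarmup, and the stand-in fakeEmbed are hypothetical names for illustration, and the body of sovseal's actual warmupEmbeddingPipeline() is not shown on this page.

```typescript
// Hypothetical embedder signature; a real one would wrap the ONNX model.
type Embedder = (text: string) => Promise<number[]>;

// Returns a warmup function that runs at most one throwaway embedding,
// no matter how many times or from how many triggers it is called.
function createWarmup(embed: Embedder): () => Promise<void> {
  let warm: Promise<void> | undefined;
  return () => {
    if (warm === undefined) {
      // Single-character input; the resulting vector is discarded.
      warm = embed("a").then(() => undefined);
    }
    return warm;
  };
}

// Stand-in embedder so the sketch runs without downloading a model.
let calls = 0;
const fakeEmbed: Embedder = async (_text) => {
  calls += 1;
  return new Array<number>(384).fill(0);
};

const warmup = createWarmup(fakeEmbed);
void warmup(); // e.g. triggered when a client connects
void warmup(); // e.g. triggered by the initialization handshake; reuses the same promise
console.log(`embed calls: ${calls}`);
```

Memoizing the promise means the two triggers described above (client connection and initialization handshake) cannot start the warmup twice.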

2. Local Disk Caching

On the first execution, the model is downloaded from Hugging Face and cached locally:

Default Directory: ~/.sovseal/models/
Override: Set the SOVSEAL_MODEL_DIR environment variable to change where the model is downloaded to and loaded from.
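
The lookup order described above (environment variable first, default directory otherwise) can be sketched as follows. This is an illustration rather than sovseal's code: resolveModelDir is a hypothetical helper, and treating an empty or whitespace-only SOVSEAL_MODEL_DIR as unset is an assumption.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: choose the model cache directory.
function resolveModelDir(
  env: Record<string, string | undefined> = process.env,
): string {
  const override = env.SOVSEAL_MODEL_DIR;
  // Assumption: a blank value is treated the same as an unset variable.
  if (override !== undefined && override.trim() !== "") {
    return path.resolve(override);
  }
  // Default documented above: ~/.sovseal/models/
  return path.join(os.homedir(), ".sovseal", "models");
}

console.log(resolveModelDir());
```

Passing the environment in as a parameter keeps the helper easy to exercise with different values without mutating process.env.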

Model Integrity Verification

To protect against MITM attacks or malicious tampering of the local model files, the embedding manager runs a SHA-256 integrity check against the cached assets before loading. It verifies:

config.json
tokenizer.json
tokenizer_config.json
onnx/model_quantized.onnx

If any file's hash does not match the value pinned for the release, the runtime logs an error and refuses to load the model.
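
A check of this kind can be sketched with Node's built-in crypto module. The code below is a minimal illustration under stated assumptions: sha256File, verifyModelFiles, and the PinnedHashes shape are hypothetical names, and the demo pins a hash computed on the spot from a temporary file instead of sovseal's real pinned release values, which are not listed on this page.

```typescript
import { createHash } from "node:crypto";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import * as path from "node:path";

// Relative file path -> expected SHA-256 digest as a hex string.
type PinnedHashes = Record<string, string>;

function sha256File(filePath: string): string {
  return createHash("sha256").update(readFileSync(filePath)).digest("hex");
}

// Throws on the first mismatch. A missing file also throws, because
// readFileSync fails, so loading is aborted in both cases.
function verifyModelFiles(modelDir: string, pinned: PinnedHashes): void {
  for (const [relPath, expected] of Object.entries(pinned)) {
    const actual = sha256File(path.join(modelDir, relPath));
    if (actual !== expected.toLowerCase()) {
      throw new Error(`Integrity check failed for ${relPath}`);
    }
  }
}

// Demo with a temporary directory and a stand-in config.json.
const demoDir = mkdtempSync(path.join(tmpdir(), "sovseal-demo-"));
writeFileSync(path.join(demoDir, "config.json"), "{}");
const demoPins: PinnedHashes = {
  "config.json": sha256File(path.join(demoDir, "config.json")),
};
verifyModelFiles(demoDir, demoPins); // passes: the file is unchanged
console.log("integrity check passed");
```

In a real deployment the pinned digests would ship with the release rather than be computed from the cached files, since hashing the local copy and comparing it to itself would detect nothing.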
