Verified by the sovseal team

Embedders Overview

Understand how sovseal generates 384-dimensional vector embeddings on-device using quantized ONNX pipelines.

sovseal uses local-first, CPU-bound machine learning to generate vector embeddings from your text. This removes the dependency on third-party embedding APIs (such as OpenAI or Cohere), so your input text never leaves your machine and you pay no per-request API fees.


Technical Specifications

  • Default Model: Quantized Xenova/all-MiniLM-L6-v2 (ONNX format, ~22 MB).
  • Dimensions: 384 per embedding vector.
  • Normalization: Vectors are mean-pooled and L2-normalized. Because every vector has unit length, cosine similarity reduces to a plain dot product, so the local vector search does not need to compute norms for each comparison.
  • Execution: Runs entirely on your CPU via ONNX Runtime integrated with Transformers.js. No GPU is required.
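
The normalization step above can be illustrated with a small, dependency-free sketch. This is not sovseal's code: the function names are hypothetical and the vectors are 2-dimensional toy values rather than real 384-dimensional model output. It only shows why the dot product of two L2-normalized vectors equals their cosine similarity.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for i, x in enumerate(vec):
                total[i] += x
    if count == 0:
        raise ValueError("attention mask selects no tokens")
    return [x / count for x in total]


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Plain dot product."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Full cosine similarity, for comparison with the dot-product shortcut."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))
```

For any two non-zero vectors `u` and `v`, `dot(l2_normalize(u), l2_normalize(v))` matches `cosine(u, v)` up to floating-point error, which is why normalizing once at embedding time lets every later comparison skip the norm computation.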

Model Lifecycle & Warmup

1. Eager Warmup on Startup

When your agent client connects or when the MCP server receives its initialization handshake, it executes the warmupEmbeddingPipeline() sequence. This:

  • Spawns the ONNX runtime threads.
  • Runs a throwaway embedding of a single character to exercise the full pipeline once.
  • Pre-warms the JIT-compiled code path so that your first user-facing recall_memory query avoids the cold-start latency.
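
The eager-warmup pattern can be sketched as follows. This is a generic illustration under stated assumptions, not the body of warmupEmbeddingPipeline(): the `EmbeddingPipeline` class and the `loader` callable are hypothetical stand-ins for whatever actually loads the ONNX model.

```python
import threading


class EmbeddingPipeline:
    """Hypothetical wrapper that loads a model once and reuses it."""

    def __init__(self, loader):
        # `loader` is an assumed callable that returns an embed function.
        self._loader = loader
        self._model = None
        self._lock = threading.Lock()

    def _ensure_loaded(self):
        # The lock guarantees the expensive load happens only once,
        # even if warmup and a real query race each other.
        with self._lock:
            if self._model is None:
                self._model = self._loader()
        return self._model

    def warmup(self):
        # Embed a single character and discard the result; the point is
        # the side effect of loading and exercising the pipeline.
        self.embed("a")

    def embed(self, text):
        return self._ensure_loaded()(text)
```

Calling `warmup()` during the initialization handshake moves the one-time load cost out of the first real query; later `embed()` calls reuse the already-loaded model.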

2. Local Disk Caching

On first run, the model is downloaded from Hugging Face and cached locally:

  • Default Directory: ~/.sovseal/models/
  • Override: Set the SOVSEAL_MODEL_DIR environment variable to change the directory the model is downloaded to and loaded from.
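
A minimal sketch of how such a lookup might work, assuming the override simply replaces the default directory. The function name `resolve_model_dir` is hypothetical, and the handling of an empty value and of a leading `~` are assumptions rather than documented sovseal behavior.

```python
import os
from pathlib import Path

# Default cache location stated in the documentation.
DEFAULT_MODEL_DIR = Path.home() / ".sovseal" / "models"


def resolve_model_dir(env=None):
    """Return the model cache directory, honoring SOVSEAL_MODEL_DIR if set."""
    env = os.environ if env is None else env
    override = env.get("SOVSEAL_MODEL_DIR", "").strip()
    if override:
        # Assumption: a leading "~" in the override is expanded.
        return Path(override).expanduser()
    # Assumption: an unset or empty variable falls back to the default.
    return DEFAULT_MODEL_DIR
```

Passing a dictionary as `env` makes the lookup easy to exercise without touching the real process environment.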

Model Integrity Verification

To protect against man-in-the-middle (MITM) attacks during download and against tampering with the local model files, the embedding manager runs a SHA-256 integrity check on the cached assets before loading them. It verifies:

  • config.json
  • tokenizer.json
  • tokenizer_config.json
  • onnx/model_quantized.onnx

If any file's hash does not match its pinned release value, the runtime logs an error and aborts instead of loading the model.
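
The check described above can be approximated with the standard library. This is an illustrative sketch, not sovseal's implementation: `verify_model_dir` and the `pinned` mapping are hypothetical, and the real pinned hashes ship with the release and are not reproduced here.

```python
import hashlib
from pathlib import Path

# The four assets the documentation lists as verified.
VERIFIED_FILES = (
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "onnx/model_quantized.onnx",
)


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_dir(model_dir, pinned):
    """Return the relative paths that are missing, unpinned, or have the wrong hash.

    `pinned` maps each relative path to its expected hex digest (hypothetical input).
    An empty result means every file passed.
    """
    failures = []
    for rel in VERIFIED_FILES:
        path = Path(model_dir) / rel
        expected = pinned.get(rel)
        if expected is None or not path.is_file():
            failures.append(rel)
        elif sha256_of(path) != expected.lower():
            failures.append(rel)
    return failures
```

A caller would refuse to load the model whenever the returned list is non-empty, which mirrors the abort-on-mismatch behavior described above.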
