Verified by the sovseal team

Transformers.js Local Embedding Subsystem

Details of the default CPU-bound ONNX model used to generate fact embeddings on-device.

Transformers.js serves as the default local execution runtime for generating semantic vector embeddings. It runs the quantized version of the all-MiniLM-L6-v2 model directly on the client CPU.
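This page does not show the exact Transformers.js call sovseal makes. The published Transformers.js example for this model is `pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')` invoked with `{ pooling: 'mean', normalize: true }`. That option pair averages the per-token outputs over the non-padding tokens and scales the result to unit length, producing one 384-dimensional sentence vector. The sketch below shows that post-processing step in plain TypeScript. `meanPoolNormalize` is a hypothetical helper, not part of sovseal or Transformers.js.

```typescript
// Hypothetical helper: mean-pool token embeddings (seqLen x dim, row-major)
// over positions whose attention-mask value is 1, then L2-normalize.
function meanPoolNormalize(
  tokenEmbeddings: Float32Array,
  attentionMask: number[],
  dim: number,
): Float32Array {
  const seqLen = attentionMask.length;
  if (tokenEmbeddings.length !== seqLen * dim) {
    throw new Error("tokenEmbeddings length must equal seqLen * dim");
  }
  const pooled = new Float32Array(dim);
  let count = 0;
  for (let t = 0; t < seqLen; t++) {
    if (attentionMask[t] === 0) continue; // skip padding tokens
    count++;
    for (let d = 0; d < dim; d++) {
      pooled[d] += tokenEmbeddings[t * dim + d];
    }
  }
  if (count === 0) return pooled; // no real tokens: return the zero vector
  // Divide by the token count to get the mean, accumulating the squared norm.
  let norm = 0;
  for (let d = 0; d < dim; d++) {
    pooled[d] /= count;
    norm += pooled[d] * pooled[d];
  }
  norm = Math.sqrt(norm);
  if (norm > 0) {
    for (let d = 0; d < dim; d++) pooled[d] /= norm; // unit length
  }
  return pooled;
}
```

Normalizing to unit length means a plain dot product between two such vectors equals their cosine similarity.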


Pinned Model Integrity Hashes

On startup, sovseal verifies each model file against a pinned SHA-256 hash to detect local tampering or file corruption:

File name                    Pinned SHA-256
config.json                  7135149f7cffa1a573466c6e4d8423ed73b62fd2332c575bf738a0d033f70df7
tokenizer.json               da0e79933b9ed51798a3ae27893d3c5fa4a201126cef75586296df9b4d2c62a0
tokenizer_config.json        9261e7d79b44c8195c1cada2b453e55b00aeb81e907a6664974b4d7776172ab3
onnx/model_quantized.onnx    afdb6f1a0e45b715d0bb9b11772f032c399babd23bfc31fed1c170afc848bdb1
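A minimal sketch of such a startup check, assuming Node.js and its built-in `node:crypto` module. The function names and the layout (all files resolved relative to one model directory) are hypothetical and not sovseal's actual code; only the four pinned hashes come from the table.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Pinned hashes from the table (relative path -> lowercase hex SHA-256).
const PINNED_SHA256: Record<string, string> = {
  "config.json": "7135149f7cffa1a573466c6e4d8423ed73b62fd2332c575bf738a0d033f70df7",
  "tokenizer.json": "da0e79933b9ed51798a3ae27893d3c5fa4a201126cef75586296df9b4d2c62a0",
  "tokenizer_config.json": "9261e7d79b44c8195c1cada2b453e55b00aeb81e907a6664974b4d7776172ab3",
  "onnx/model_quantized.onnx": "afdb6f1a0e45b715d0bb9b11772f032c399babd23bfc31fed1c170afc848bdb1",
};

// SHA-256 of a file's bytes as lowercase hex. readFileSync throws if the
// file is missing, so a deleted file can never pass the check.
function sha256File(path: string): string {
  return createHash("sha256").update(readFileSync(path)).digest("hex");
}

// Return the relative paths whose on-disk hash differs from the pinned value.
function findHashMismatches(modelDir: string, pins: Record<string, string>): string[] {
  const mismatches: string[] = [];
  for (const [relPath, expected] of Object.entries(pins)) {
    if (sha256File(join(modelDir, relPath)) !== expected.toLowerCase()) {
      mismatches.push(relPath);
    }
  }
  return mismatches;
}

// Hypothetical startup guard: refuse to continue if any file does not match.
function assertModelIntegrity(modelDir: string): void {
  const bad = findHashMismatches(modelDir, PINNED_SHA256);
  if (bad.length > 0) {
    throw new Error(`Model integrity check failed for: ${bad.join(", ")}`);
  }
}
```

Reading each file fully into memory keeps the sketch short; hashing through `createReadStream` instead would avoid buffering the whole ONNX file at once.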

Performance & LRU Caching

Each embedding requires a full ONNX forward pass on the CPU (roughly 5–8 ms on modern hardware), so sovseal keeps an LRU cache of query embeddings that skips model execution entirely for repeated queries.
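As a rough illustration of the saving, let h be the fraction of recall queries that hit the cache. This is workload-dependent; the value 0.5 below is an assumption, not a measurement. A hit skips the forward pass, so the expected model time per recall, ignoring the cost of the lookup itself, is approximately:

```latex
\bar{t}_{\text{recall}} \approx (1 - h)\, t_{\text{miss}},
\qquad t_{\text{miss}} \approx 5 \text{ to } 8\ \text{ms}
\quad\Longrightarrow\quad
\bar{t}_{\text{recall}}\big|_{h = 0.5} \approx 2.5 \text{ to } 4\ \text{ms}
```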

Cache Characteristics

  • Scope: Only query/search operations (recall_memory) use the cache. Storage operations (store_memory) always bypass it, so every fact written to LanceDB gets its own freshly computed vector.
  • Default Capacity: 256 queries.
  • Memory Overhead: Low. Each entry holds one 384-dimensional Float32 vector (384 × 4 bytes = 1,536 bytes), so a full 256-entry cache stores about 384 KiB of vector data, plus the query strings used as keys.
  • Configuration: You can adjust the cache capacity by setting the SOVSEAL_EMBEDDING_CACHE_SIZE environment variable (set to 0 to disable the cache entirely).
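The cache rules above can be sketched as follows. This is a minimal illustration, not sovseal's code: `EmbeddingLruCache`, `cacheSizeFromEnv`, and `makeEmbedder` are hypothetical names; falling back to 256 on an invalid value is an assumption; and the embedding function is synchronous here for brevity, whereas a real Transformers.js call returns a promise.

```typescript
const DEFAULT_CACHE_SIZE = 256;

// Read SOVSEAL_EMBEDDING_CACHE_SIZE; fall back to 256 if unset or invalid
// (the fallback-on-invalid behavior is an assumption).
function cacheSizeFromEnv(env: Record<string, string | undefined>): number {
  const raw = env.SOVSEAL_EMBEDDING_CACHE_SIZE;
  if (raw === undefined || raw.trim() === "") return DEFAULT_CACHE_SIZE;
  const n = Number(raw);
  return Number.isInteger(n) && n >= 0 ? n : DEFAULT_CACHE_SIZE;
}

class EmbeddingLruCache {
  // Map iterates in insertion order, so the first key is the least recently used.
  private readonly entries = new Map<string, Float32Array>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(query: string): Float32Array | undefined {
    const hit = this.entries.get(query);
    if (hit !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(query);
      this.entries.set(query, hit);
    }
    return hit;
  }

  set(query: string, vector: Float32Array): void {
    if (this.capacity === 0) return; // capacity 0 disables the cache
    this.entries.delete(query);
    this.entries.set(query, vector);
    if (this.entries.size > this.capacity) {
      const oldest = this.entries.keys().next();
      if (!oldest.done) this.entries.delete(oldest.value); // evict LRU entry
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

type EmbedFn = (text: string) => Float32Array;

// Recall consults the cache; store always runs the model and never touches it.
function makeEmbedder(embed: EmbedFn, cache: EmbeddingLruCache) {
  return {
    forRecall(query: string): Float32Array {
      const cached = cache.get(query);
      if (cached !== undefined) return cached;
      const vec = embed(query);
      cache.set(query, vec);
      return vec;
    },
    forStore(fact: string): Float32Array {
      return embed(fact);
    },
  };
}
```

A JavaScript `Map` is enough for an LRU of this size: deleting and re-inserting a key moves it to the end of the iteration order, so eviction is just removing the first key.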

CPU Resource Costs

Approximate costs on typical developer machines and agent host environments:

Operation               Payload size (chars)   CPU time                           Memory footprint
Warmup / JIT load       -                      ~1.2 s (first call, cold)          ~30 MB RAM
Recall (cache hit)      Any                    < 0.1 ms (no model forward pass)   -
Recall (cache miss)     100–500                ~4.2 ms                            -
Store (large payload)   5,000                  ~9.5 ms                            -
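The warmup row reads as a one-time, first-call cost. One common way to make sure the model is loaded only once, even when several calls arrive before loading finishes, is to memoize the loading promise. `memoizeAsync` is a hypothetical helper shown as a sketch, not sovseal's code.

```typescript
// Wrap an async loader so it runs at most once; concurrent callers share
// the same in-flight promise. A failed load is forgotten so it can be retried.
function memoizeAsync<T>(load: () => Promise<T>): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (pending === undefined) {
      pending = load().catch((err: unknown) => {
        pending = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return pending;
  };
}
```

With Transformers.js the loader could be `() => pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')`; calling the returned function once at process start would move the cold-start cost out of the first recall.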
