Transformers.js Local Embedding Subsystem

Transformers.js serves as the default local execution runtime for generating semantic vector embeddings. It runs the quantized version of the all-MiniLM-L6-v2 model directly on the client CPU.

Model Pinned Integrity Hashes

On startup, sovseal verifies each model file against a pinned SHA-256 hash to detect local tampering or file corruption:

File Name	Pinned SHA-256
`config.json`	`7135149f7cffa1a573466c6e4d8423ed73b62fd2332c575bf738a0d033f70df7`
`tokenizer.json`	`da0e79933b9ed51798a3ae27893d3c5fa4a201126cef75586296df9b4d2c62a0`
`tokenizer_config.json`	`9261e7d79b44c8195c1cada2b453e55b00aeb81e907a6664974b4d7776172ab3`
`onnx/model_quantized.onnx`	`afdb6f1a0e45b715d0bb9b11772f032c399babd23bfc31fed1c170afc848bdb1`
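
The check described above can be sketched in a few lines with Node's built-in `crypto` module. This is a minimal illustration, not sovseal's actual implementation: the helper names (`sha256Hex`, `digestOrNull`, `findMismatches`) and the `modelDir` parameter are hypothetical, and treating a missing file as a failed check is an assumption.

```typescript
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Pinned digests copied from the table above.
const PINNED_SHA256: Record<string, string> = {
  "config.json": "7135149f7cffa1a573466c6e4d8423ed73b62fd2332c575bf738a0d033f70df7",
  "tokenizer.json": "da0e79933b9ed51798a3ae27893d3c5fa4a201126cef75586296df9b4d2c62a0",
  "tokenizer_config.json": "9261e7d79b44c8195c1cada2b453e55b00aeb81e907a6664974b4d7776172ab3",
  "onnx/model_quantized.onnx": "afdb6f1a0e45b715d0bb9b11772f032c399babd23bfc31fed1c170afc848bdb1",
};

// Hex-encoded SHA-256 digest of a byte buffer.
function sha256Hex(data: Uint8Array): string {
  return createHash("sha256").update(data).digest("hex");
}

// Digest of a file on disk, or null if it cannot be read.
function digestOrNull(path: string): string | null {
  try {
    return sha256Hex(readFileSync(path));
  } catch {
    return null;
  }
}

// Hypothetical helper: returns the names of files that are missing or whose
// digest differs from the pinned value. An empty array means all files passed.
function findMismatches(
  modelDir: string,
  pinned: Record<string, string> = PINNED_SHA256,
): string[] {
  const failed: string[] = [];
  for (const [name, expected] of Object.entries(pinned)) {
    if (digestOrNull(join(modelDir, name)) !== expected) {
      failed.push(name);
    }
  }
  return failed;
}
```

A caller would typically refuse to load the model when `findMismatches(modelDir)` returns a non-empty array, and report the listed file names.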

Performance & LRU Caching

Generating an embedding requires an ONNX forward pass on the CPU (~5–8 ms on modern hardware), so sovseal keeps an LRU cache of embeddings and skips model execution entirely for repeated queries.

Cache Characteristics

Scope: Applied only to query/search operations (recall_memory). Storage operations (store_memory) always bypass the cache to ensure unique fact vectors are written to LanceDB.
Default Capacity: 256 queries.
Memory Overhead: Low. Each entry holds one 384-dimensional Float32 vector (384 × 4 bytes = 1,536 bytes), so the default 256 entries use roughly 384 KB for vectors, plus the cache keys.
Configuration: You can adjust the cache capacity by setting the SOVSEAL_EMBEDDING_CACHE_SIZE environment variable (set to 0 to disable the cache entirely).
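
The cache behaviour described above can be sketched with a `Map`, which iterates in insertion order and so makes the least recently used entry easy to find. This is an illustrative sketch only: the class and function names are hypothetical, the `embed` callback stands in for the model call, and falling back to 256 on an invalid `SOVSEAL_EMBEDDING_CACHE_SIZE` value is an assumption rather than documented behaviour.

```typescript
// Minimal LRU cache keyed by query text (hypothetical, not sovseal's class).
class EmbeddingLruCache {
  private readonly entries = new Map<string, Float32Array>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get size(): number {
    return this.entries.size;
  }

  get(query: string): Float32Array | undefined {
    const hit = this.entries.get(query);
    if (hit === undefined) return undefined;
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(query);
    this.entries.set(query, hit);
    return hit;
  }

  set(query: string, vector: Float32Array): void {
    if (this.capacity <= 0) return; // a capacity of 0 disables the cache
    this.entries.delete(query);
    this.entries.set(query, vector);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }
}

// Parse the capacity from the environment variable's raw value.
// Assumption: unset, non-numeric, or negative values fall back to the default.
function cacheCapacityFromEnv(raw: string | undefined, fallback = 256): number {
  const parsed = Number.parseInt(raw ?? "", 10);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Recall path: consult the cache first and run the model only on a miss.
async function embedQuery(
  query: string,
  cache: EmbeddingLruCache,
  embed: (text: string) => Promise<Float32Array>,
): Promise<Float32Array> {
  const cached = cache.get(query);
  if (cached !== undefined) return cached;
  const vector = await embed(query);
  cache.set(query, vector);
  return vector;
}
```

In this sketch a cache would be created with `new EmbeddingLruCache(cacheCapacityFromEnv(process.env.SOVSEAL_EMBEDDING_CACHE_SIZE))`. A store path would call `embed` directly and never touch the cache, matching the scope rule above.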

CPU Resource Costs

For typical developer machines and agent host environments:

Operation Type	Payload Size (chars)	CPU Execution Time	Memory Footprint
Warmup / JIT Load	-	1.2 s (first call, cold)	~30 MB RAM
Cache Hit Recall	Any	< 0.1 ms (0 CPU forwards)	-
Cache Miss Recall	100–500	~4.2 ms	-
Large Store Vector	5,000	~9.5 ms	-
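
To see how the hit rate affects average recall latency, the hit and miss figures from the table can be combined into a weighted average. This is a rough back-of-the-envelope sketch: the function name is hypothetical, the cache-hit cost uses the table's 0.1 ms upper bound, and real latencies will vary with hardware and payload size.

```typescript
// Figures taken from the table above; the hit cost is the stated upper bound.
const CACHE_HIT_MS = 0.1;
const CACHE_MISS_MS = 4.2;

// Expected per-recall latency in milliseconds for a given cache hit rate (0 to 1).
function expectedRecallMs(hitRate: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return hitRate * CACHE_HIT_MS + (1 - hitRate) * CACHE_MISS_MS;
}
```

Under these assumptions, a workload where half of the recalls repeat an earlier query averages about 2.15 ms per recall, compared with about 4.2 ms with the cache disabled.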
