recall_memory
How semantic recall works locally with sub-25ms p99 latency and zero network round-trips.
recall_memory performs a vector similarity search locally against the on-device LanceDB database. Because the vector database and the embedding model run within the agent's host process, semantic reads never require network requests.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | ✓ | The natural-language query to embed and search. |
| topK | number | | The maximum number of results to return. Default is 3. |
| minScore | number | | Minimum composite-score threshold. Results below it are filtered out (see Reinforcement-Aware Ranking below). |
| filters | object | | SQL-like conditions evaluated locally against decrypted record metadata (e.g., matching tags, specific lineages, or categories). |
Cold-Start Elimination
The local embedding model (Transformers.js executing all-MiniLM-L6-v2) must be loaded into the ONNX runtime once, which takes roughly 1.2s. To keep that load from blocking the first search call:
- Eager Warming: The MCP server and Node SDK trigger model loading instantly upon startup (await memory.ready()), rather than waiting for the first query.
- Background Loading: In conversational environments, the model loads in a non-blocking background thread while the client registers, guaranteeing that by the time the agent performs its first tool call, the embedding engine is warm.
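The eager-warming pattern above can be illustrated with a minimal sketch. It is not the SDK's implementation: `EmbeddingEngine`, `Embedder`, and the `loadModel` callback are hypothetical names, and the only idea shown is that the load starts at construction time and every caller awaits the same promise.

```typescript
// Hypothetical sketch: EmbeddingEngine and loadModel are illustrative names,
// not part of the sovseal SDK.
type Embedder = (text: string) => Promise<number[]>;

class EmbeddingEngine {
  // Holds the in-flight (or finished) model load.
  private readonly loading: Promise<Embedder>;

  constructor(loadModel: () => Promise<Embedder>) {
    // Start loading right away so it overlaps with other startup work.
    this.loading = loadModel();
    // No-op handler: a failed load is reported to whoever awaits ready()
    // or embed(), not as an unhandled rejection at startup.
    this.loading.catch(() => {});
  }

  // Resolves once the model is loaded; safe to call any number of times.
  async ready(): Promise<void> {
    await this.loading;
  }

  // All callers share one promise, so the model is loaded exactly once.
  async embed(text: string): Promise<number[]> {
    const embedder = await this.loading;
    return embedder(text);
  }
}
```

With this shape, a first query issued before the load finishes simply waits for the remaining load time instead of starting a fresh one.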
LRU Cache Tuning & Observability
To achieve sub-10ms query times, the SDK maintains a 256-entry LRU (Least Recently Used) cache that maps query strings to their computed 384-dimensional embedding vectors.
- Repeated or structurally similar queries bypass the ONNX execution path completely, reducing retrieval latency to <1ms.
- Observability: You can monitor cache performance by listening to the client's telemetry events:
  memory.on("cache", (event) => {
    // event.type is "hit" or "miss"
    console.log(`Cache ${event.type}: ${event.query}`);
  });
- Configuration: Cache size can be configured in the client initialization options:
  const memory = new sovseal({
    cacheSize: 512, // increase for high-frequency agents
  });
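For readers unfamiliar with LRU eviction, here is a minimal sketch of how a query-to-embedding cache of this kind could behave. It is an assumption-laden illustration, not the SDK's code: `LruCache` is a hypothetical class, and keys are matched by exact string equality.

```typescript
// Hypothetical sketch of an LRU cache; not the SDK's actual implementation.
class LruCache<V> {
  // A Map remembers insertion order, which doubles as recency order here.
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined; // cache miss
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: a 256-entry cache of query string -> embedding vector.
const embeddingCache = new LruCache<number[]>(256);
embeddingCache.set("prefers vitest over jest", [0.12, -0.03, 0.4]);
```

A hit skips the embedding step entirely, which is why cache size matters most for agents that repeat the same queries.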
Filter Syntax
Unlike typical vector databases that require server-side filtering, sovseal decrypts metadata client-side before applying filters. This prevents the server from learning anything about the structure of your queries.
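To make the client-side evaluation concrete, the sketch below shows one way a filter object could be matched against already-decrypted metadata. It is an assumption, not the SDK's evaluator: `matches` and `isContains` are hypothetical helpers, and only the `AND`, equality, and `contains` shapes used in this section's example are handled.

```typescript
// Hypothetical sketch of local filter evaluation; not the SDK's evaluator.
type Metadata = Record<string, unknown>;
type Filter = Record<string, unknown>;

// True when the value looks like { contains: ... }.
function isContains(value: unknown): value is { contains: unknown } {
  return typeof value === "object" && value !== null && "contains" in value;
}

// Returns true when the decrypted metadata satisfies every condition.
function matches(meta: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([key, expected]) => {
    if (key === "AND" && Array.isArray(expected)) {
      // Every sub-filter in an AND group must match.
      return expected.every((sub) => matches(meta, sub as Filter));
    }
    const actual = meta[key];
    if (isContains(expected)) {
      // { contains: x } matches when the field is an array that includes x.
      return Array.isArray(actual) && actual.includes(expected.contains);
    }
    // Otherwise the field must equal the expected value exactly.
    return actual === expected;
  });
}
```

Because this runs after decryption on the device, the filter object never has to leave the host process.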
// Search with a category filter and a tag constraint
const results = await memory.recall("user testing preferences", {
topK: 5,
filters: {
AND: [
{ category: "development" },
{ tags: { contains: "testing" } }
]
}
});
Reinforcement-Aware Ranking (0.3.5)
Raw vector distance is only the first pass. recall_memory over-fetches 8 × topK candidates by vector distance, then re-ranks them by a composite score before returning the top topK:
score = similarity × decay × reinforcement
| Factor | Definition |
|---|---|
| similarity | max(0, 1 − distance / 2) — cosine-equivalent of the L2 distance to the query vector. |
| decay | exp(−λ_type · days_since(last_reinforced)) — exponential temporal decay. Half-lives are per memory type: episodic 14d, semantic 90d, procedural 180d. Override with SOVSEAL_DECAY_EPISODIC / SOVSEAL_DECAY_SEMANTIC / SOVSEAL_DECAY_PROCEDURAL. |
| reinforcement | 1 + ln(1 + reinforce_count) — memories restated more often rank higher. |
The practical consequence: a frequently-reinforced older fact can out-rank a fresher, higher-raw-similarity one-off. This is what makes recall behave like memory rather than a plain nearest-neighbor index. Results are returned as { id, text, score } in descending composite-score order.
See Memory Model → Typing, reinforcement & provenance for how type and reinforce_count are set.
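The ranking formula can be worked through in a short sketch. The names here (`Candidate`, `compositeScore`, `rerank`) are hypothetical, the default `minScore` of 0 is an assumption, and the decay rate is derived from each half-life as λ = ln(2) / half-life, which is the standard conversion rather than something the SDK documents.

```typescript
// Hypothetical sketch of the composite score; not the SDK's implementation.
type MemoryType = "episodic" | "semantic" | "procedural";

// Default half-lives in days, as listed in the factor table.
const HALF_LIFE_DAYS: Record<MemoryType, number> = {
  episodic: 14,
  semantic: 90,
  procedural: 180,
};

interface Candidate {
  id: string;
  distance: number; // vector distance to the query, as reported by the index
  type: MemoryType;
  daysSinceReinforced: number;
  reinforceCount: number;
}

function compositeScore(c: Candidate): number {
  const similarity = Math.max(0, 1 - c.distance / 2);
  // A half-life h corresponds to a decay rate of ln(2) / h.
  const lambda = Math.LN2 / HALF_LIFE_DAYS[c.type];
  const decay = Math.exp(-lambda * c.daysSinceReinforced);
  const reinforcement = 1 + Math.log(1 + c.reinforceCount);
  return similarity * decay * reinforcement;
}

// Re-ranks over-fetched candidates and keeps the best topK above minScore.
function rerank(candidates: Candidate[], topK = 3, minScore = 0) {
  return candidates
    .map((c) => ({ id: c.id, score: compositeScore(c) }))
    .filter((r) => r.score >= minScore)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// An older, often-restated fact versus a fresh, closer one-off.
const ranked = rerank([
  { id: "fresh", distance: 0.2, type: "episodic", daysSinceReinforced: 0, reinforceCount: 0 },
  { id: "old", distance: 0.4, type: "semantic", daysSinceReinforced: 30, reinforceCount: 5 },
]);
console.log(ranked.map((r) => r.id)); // "old" is listed before "fresh"
```

Since the reinforcement factor is at least 1 and grows with reinforce_count, the composite score can exceed 1, so a minScore threshold is not equivalent to a pure similarity cutoff.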
Integration Examples
import { sovseal } from "@sovseal/sdk";
const memory = new sovseal({ apiKey: process.env.SOVSEAL_API_KEY });
await memory.ready();
const hits = await memory.recall("prefers vitest over jest", {
topK: 3,
minScore: 0.85,
});
console.log(hits);
// Output: [{ payload: { framework: "vitest" }, score: 0.92, path: "user.preferences.testing" }]
// Arguments passed to the tool
{
"name": "recall_memory",
"arguments": {
"query": "prefers vitest over jest",
"topK": 3
}
}
// Result
{
"success": true,
"data": [
{
"path": "user.preferences.testing",
"payload": {
"framework": "vitest"
},
"score": 0.92
}
],
"timestamp": 1716301928155
}
# In self-hosted deployments, you can query the REST endpoint for active snapshot status.
# Note: The payload is returned as ciphertext; the local SDK performs the decryption.
curl -X GET "https://your-endpoint.com/v2/agent-state?project_id=sov_proj_123&query_hash=5e883..." \
-H "Authorization: Bearer my-self-hosted-token"