Semantic Search

Find memories by meaning, not just keywords.

Every memory is embedded as a 384-dimensional vector on store. When you recall, the query is embedded too, and cosine similarity finds conceptually related memories — even if the words don't overlap.

How it works

  1. On store: Memory content is sent to @cf/baai/bge-small-en-v1.5 (Cloudflare Workers AI). The resulting 384-dim vector is upserted to Cloudflare Vectorize.
  2. On recall: The query is embedded with the same model. Vectorize returns the top-K most similar vectors filtered by workspace.
  3. Hybrid ranking: Vector results are merged with keyword results using a configurable alpha weight.
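
A condensed sketch of the store path (step 1), assuming the bindings are exposed as AI and VECTORIZE (types from @cloudflare/workers-types); embed and storeMemory are illustrative helper names, not the service's actual functions:

```ts
interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

// Embed content with bge-small-en-v1.5, which returns 384-dim vectors.
async function embed(env: Env, text: string): Promise<number[]> {
  const res = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: [text],
  })) as { data: number[][] };
  return res.data[0];
}

// Illustrative store-side helper: embed the memory, then upsert the vector
// using the {workspaceId}:{memoryId} ID format described below.
async function storeMemory(env: Env, workspaceId: string, memoryId: string, content: string) {
  const values = await embed(env, content);
  await env.VECTORIZE.upsert([
    {
      id: `${workspaceId}:${memoryId}`,
      values,
      metadata: { workspace_id: workspaceId, memory_id: memoryId },
    },
  ]);
}
```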

Embedding model

| Property          | Value                      |
|-------------------|----------------------------|
| Model             | @cf/baai/bge-small-en-v1.5 |
| Dimensions        | 384                        |
| Similarity metric | Cosine                     |
| Runtime           | Cloudflare Workers AI      |
| Storage           | Cloudflare Vectorize       |

Embedding is fire-and-forget on store — the API responds immediately, without waiting for the embedding to complete. The embedded_at column tracks which memories have been embedded.
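
A sketch of that pattern in a Workers fetch handler; embedAndUpsert and markEmbedded are hypothetical helpers standing in for the store sketch above and the embedded_at update:

```ts
interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

// Hypothetical helpers: embedAndUpsert mirrors the store sketch above;
// markEmbedded would stamp the embedded_at column.
declare function embedAndUpsert(env: Env, memoryId: string, content: string): Promise<void>;
declare function markEmbedded(memoryId: string): Promise<void>;

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const { id, content } = (await req.json()) as { id: string; content: string };
    // waitUntil keeps the Worker alive for the background embedding
    // without delaying the response; the client gets a 201 immediately.
    ctx.waitUntil(embedAndUpsert(env, id, content).then(() => markEmbedded(id)));
    return Response.json({ id }, { status: 201 });
  },
};
```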

Hybrid ranking formula

finalScore = alpha × normalizedKeywordScore + (1 - alpha) × normalizedVectorScore

Both score sets are normalized to [0, 1] before combining. The alpha parameter controls the balance:

| Alpha | Behavior                                       |
|-------|------------------------------------------------|
| 1.0   | Pure keyword (exact match only)                |
| 0.7   | Keyword-heavy (prefers exact terms)            |
| 0.5   | Balanced (default)                             |
| 0.3   | Semantic-heavy (prefers conceptual similarity) |
| 0.0   | Pure semantic (no keyword matching)            |
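
A minimal sketch of the blend. Min-max normalization is an assumption here; only the [0, 1] target range is stated above, and the Map shapes are illustrative:

```ts
// Min-max normalize a score map to [0, 1].
function normalize(scores: Map<string, number>): Map<string, number> {
  const values = [...scores.values()];
  if (values.length === 0) return new Map();
  const min = Math.min(...values);
  const range = Math.max(...values) - min || 1; // guard against all-equal scores
  return new Map([...scores].map(([id, s]) => [id, (s - min) / range]));
}

// Blend keyword and vector scores per the formula above, highest first.
function hybridRank(
  keyword: Map<string, number>, // memoryId -> keyword score
  vector: Map<string, number>,  // memoryId -> cosine similarity
  alpha = 0.5,
): Array<{ id: string; score: number }> {
  const k = normalize(keyword);
  const v = normalize(vector);
  const ids = new Set([...k.keys(), ...v.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: alpha * (k.get(id) ?? 0) + (1 - alpha) * (v.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A memory that appears in only one result set gets a 0 for the missing score, so it is penalized but not excluded.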

Tuning alpha per workspace

Alpha defaults to 0.5. To change it for a workspace, insert a row into the workspace_settings table:

-- Via SQL (Turso dashboard or migration)
INSERT OR REPLACE INTO workspace_settings (key, value)
VALUES ('recall_alpha', '0.3');
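
To confirm the override (no row means the 0.5 default applies):

-- Check the current per-workspace alpha
SELECT value FROM workspace_settings WHERE key = 'recall_alpha';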

Vector index

All workspaces share a single Vectorize index (memento-memories) with workspace-level filtering via the workspace_id metadata field.

| Property         | Value                           |
|------------------|---------------------------------|
| Index name       | memento-memories                |
| Vector ID format | {workspaceId}:{memoryId}        |
| Metadata         | { workspace_id, memory_id }     |
| Namespace filter | workspace_id = "your-workspace" |
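
For example, a workspace-scoped query could look like this sketch, using Vectorize's metadata filter; queryVector is a 384-dim embedding produced as in step 2 of "How it works":

```ts
interface Env {
  VECTORIZE: VectorizeIndex;
}

// Query the shared index, scoped to one workspace via the metadata filter.
async function queryWorkspace(env: Env, queryVector: number[], workspaceId: string) {
  return env.VECTORIZE.query(queryVector, {
    topK: 10,
    filter: { workspace_id: workspaceId }, // only this workspace's vectors
    returnMetadata: "all",                 // surface memory_id on each match
  });
}
```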

Backfilling existing memories

Memories stored before semantic search was enabled won't have embeddings. Use the admin endpoint to backfill:

POST /v1/admin/backfill-embeddings
Authorization: Bearer mp_live_...
X-Memento-Workspace: my-agent

This processes un-embedded memories in batches of 50. It's idempotent — safe to run multiple times. Rate limiting from Workers AI may require multiple runs for large workspaces.
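
A hypothetical driver for large workspaces: re-run the endpoint until a pass reports no work. The { processed } field in the response is an assumption, not a documented contract; adapt it to the real reply:

```ts
async function backfillUntilDone(baseUrl: string, apiKey: string, workspace: string) {
  for (;;) {
    const res = await fetch(`${baseUrl}/v1/admin/backfill-embeddings`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "X-Memento-Workspace": workspace,
      },
    });
    if (!res.ok) throw new Error(`backfill failed: HTTP ${res.status}`);
    // Assumed response shape: { processed: number }.
    const { processed } = (await res.json()) as { processed: number };
    if (processed === 0) break; // the endpoint is idempotent, so looping is safe
  }
}
```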

Graceful degradation

Every vector operation checks for the presence of AI and VECTORIZE bindings. Without them:

  • Embedding on store is silently skipped
  • Semantic search returns an empty array
  • The context endpoint falls back to pure keyword ranking
  • All other features work normally

This means the system works identically in local development (no Cloudflare bindings) and on free-tier deployments. The test suite runs without any AI dependencies.
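
A sketch of the guard itself, with both bindings optional in the Env type (binding names as assumed in the earlier sketches):

```ts
// Both bindings are optional; every vector path checks them first.
interface Env {
  AI?: Ai;
  VECTORIZE?: VectorizeIndex;
}

async function semanticSearch(env: Env, query: string, workspaceId: string) {
  // No bindings (local dev, free tier): degrade to an empty result
  // and let keyword ranking carry the recall.
  if (!env.AI || !env.VECTORIZE) return [];
  const res = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: [query],
  })) as { data: number[][] };
  const { matches } = await env.VECTORIZE.query(res.data[0], {
    topK: 10,
    filter: { workspace_id: workspaceId },
  });
  return matches;
}
```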


What to read next