Semantic Search
Find memories by meaning, not just keywords.
Every memory is embedded as a 384-dimensional vector on store. When you recall, the query is embedded too, and cosine similarity finds conceptually related memories — even if the words don't overlap.
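Cosine similarity between two embedding vectors can be sketched as below. This is illustrative only (Vectorize computes this internally); the 384-dimensional case is just longer arrays.

```typescript
// Cosine similarity: dot product of the vectors divided by the product of
// their magnitudes. Returns 1 for identical directions, 0 for orthogonal.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```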
How it works
- On store: Memory content is sent to @cf/baai/bge-small-en-v1.5 (Cloudflare Workers AI). The resulting 384-dimensional vector is upserted to Cloudflare Vectorize.
- On recall: The query is embedded with the same model. Vectorize returns the top-K most similar vectors filtered by workspace.
- Hybrid ranking: Vector results are merged with keyword results using a configurable alpha weight.
Embedding model
| Property | Value |
|---|---|
| Model | @cf/baai/bge-small-en-v1.5 |
| Dimensions | 384 |
| Similarity metric | Cosine |
| Runtime | Cloudflare Workers AI |
| Storage | Cloudflare Vectorize |
Embeddings happen fire-and-forget on store — the API responds immediately without waiting for the embedding to complete. The embedded_at column tracks which memories have been embedded.
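The fire-and-forget pattern can be sketched as follows. The handler shape and the embed callback are illustrative stand-ins, not the real implementation; on Workers, the scheduling hook would be ExecutionContext.waitUntil.

```typescript
type StoreResult = { id: string; embedded_at: string | null };

// Sketch: respond immediately, embed in the background. `embed` and
// `waitUntil` are injected here so the pattern is visible in isolation.
function storeMemory(
  id: string,
  content: string,
  embed: (text: string) => Promise<void>,
  waitUntil: (p: Promise<unknown>) => void,
): StoreResult {
  // Schedule the embedding without awaiting it; the response goes out first.
  waitUntil(embed(content));
  // embedded_at stays null until the background job completes and updates it.
  return { id, embedded_at: null };
}
```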
Hybrid ranking formula
```
finalScore = alpha × normalizedKeywordScore + (1 - alpha) × normalizedVectorScore
```
Both score sets are normalized to [0, 1] before combining. The alpha parameter controls the balance:
| Alpha | Behavior |
|---|---|
| 1.0 | Pure keyword (exact match only) |
| 0.7 | Keyword-heavy (prefers exact terms) |
| 0.5 | Balanced (default) |
| 0.3 | Semantic-heavy (prefers conceptual similarity) |
| 0.0 | Pure semantic (no keyword matching) |
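The formula above can be expressed as a small sketch. The min-max normalization helper is an assumption about how scores reach [0, 1]; the combine step follows the documented formula directly.

```typescript
// Min-max normalize a score list into [0, 1] (assumed normalization scheme).
function normalize(scores: number[]): number[] {
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min || 1; // avoid divide-by-zero when all scores are equal
  return scores.map((s) => (s - min) / range);
}

// finalScore = alpha * keyword + (1 - alpha) * vector, per the formula above.
function hybridScore(keyword: number, vector: number, alpha = 0.5): number {
  return alpha * keyword + (1 - alpha) * vector;
}
```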
Tuning alpha per workspace
Alpha defaults to 0.5. To change it for a workspace, insert a row into the workspace_settings table:
```sql
-- Via SQL (Turso dashboard or migration)
INSERT OR REPLACE INTO workspace_settings (key, value)
VALUES ('recall_alpha', '0.3');
```
Vector index
All workspaces share a single Vectorize index (memento-memories) with workspace-level filtering via the workspace_id metadata field.
| Property | Value |
|---|---|
| Index name | memento-memories |
| Vector ID format | {workspaceId}:{memoryId} |
| Metadata | { workspace_id, memory_id } |
| Namespace filter | workspace_id = "your-workspace" |
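The {workspaceId}:{memoryId} ID scheme from the table can be sketched with a pair of helpers. These names are illustrative; splitting on the first colon assumes workspace IDs never contain ":".

```typescript
// Build a Vectorize vector ID from its two parts.
function toVectorId(workspaceId: string, memoryId: string): string {
  return `${workspaceId}:${memoryId}`;
}

// Recover the parts, splitting on the first colon only so memory IDs
// containing ":" survive the round trip.
function fromVectorId(vectorId: string): { workspaceId: string; memoryId: string } {
  const i = vectorId.indexOf(":");
  return { workspaceId: vectorId.slice(0, i), memoryId: vectorId.slice(i + 1) };
}
```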
Backfilling existing memories
Memories stored before semantic search was enabled won't have embeddings. Use the admin endpoint to backfill:
```
POST /v1/admin/backfill-embeddings
Authorization: Bearer mp_live_...
X-Memento-Workspace: my-agent
```
This processes un-embedded memories in batches of 50. It's idempotent — safe to run multiple times. Rate limiting from Workers AI may require multiple runs for large workspaces.
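The batch selection behind the endpoint can be sketched as below. The row shape and helper name are assumptions; the point is that selecting only rows with a null embedded_at is what makes re-runs safe.

```typescript
type MemoryRow = { id: string; embedded_at: string | null };

// Pick the next batch of un-embedded memories (default batch size 50).
// Idempotent by construction: already-embedded rows are filtered out, so a
// second run over fully embedded data returns an empty batch.
function nextBackfillBatch(rows: MemoryRow[], batchSize = 50): MemoryRow[] {
  return rows.filter((r) => r.embedded_at === null).slice(0, batchSize);
}
```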
Graceful degradation
Every vector operation checks for the presence of AI and VECTORIZE bindings. Without them:
- Embedding on store silently skips
- Semantic search returns an empty array
- The context endpoint falls back to pure keyword ranking
- All other features work normally
This means the system works identically in local development (no Cloudflare bindings) and on free-tier deployments. The test suite runs without any AI dependencies.
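The binding checks described above can be sketched like this. The Env shape and function names are illustrative, not the actual code; the real Vectorize lookup is elided.

```typescript
// Bindings may be absent in local dev or on deployments without Workers AI.
type Env = { AI?: unknown; VECTORIZE?: unknown };

// Semantic search degrades to an empty result when either binding is missing.
function semanticSearch(env: Env, query: string): string[] {
  if (env.AI === undefined || env.VECTORIZE === undefined) return [];
  // Real path: embed `query` with Workers AI, query Vectorize (omitted here).
  return [`results-for:${query}`]; // placeholder for the actual hit list
}

// Without vector results, hybrid ranking collapses to pure keyword (alpha 1.0)
// regardless of the configured alpha.
function effectiveAlpha(env: Env, configuredAlpha: number): number {
  if (env.AI === undefined || env.VECTORIZE === undefined) return 1.0;
  return configuredAlpha;
}
```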
What to read next
- Context Endpoint — where hybrid ranking is used
- API Reference — all endpoints including graph traversal
- Core Concepts — the full memory model