Semantic Search

Find memories by meaning, not just keywords.

Every memory is embedded as a 384-dimensional vector on store. When you recall, the query is embedded too, and cosine similarity finds conceptually related memories — even if the words don't overlap.

How it works

  1. On store: Memory content is sent to @cf/baai/bge-small-en-v1.5 (Cloudflare Workers AI). The resulting 384-dim vector is upserted to Cloudflare Vectorize.
  2. On recall: The query is embedded with the same model. Vectorize returns the top-K most similar vectors filtered by workspace.
  3. Hybrid ranking: Vector results are merged with keyword results using a configurable alpha weight.
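
A condensed sketch of the store path (step 1), assuming the bindings are exposed as AI and VECTORIZE (types from @cloudflare/workers-types); embed and storeMemory are illustrative helper names, not the service's actual functions:

```ts
interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

// Embed content with bge-small-en-v1.5, which returns 384-dim vectors.
async function embed(env: Env, text: string): Promise<number[]> {
  const res = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: [text],
  })) as { data: number[][] };
  return res.data[0];
}

// Illustrative store-side helper: embed the memory, then upsert the vector
// using the {workspaceId}:{memoryId} ID format described below.
async function storeMemory(env: Env, workspaceId: string, memoryId: string, content: string) {
  const values = await embed(env, content);
  await env.VECTORIZE.upsert([
    {
      id: `${workspaceId}:${memoryId}`,
      values,
      metadata: { workspace_id: workspaceId, memory_id: memoryId },
    },
  ]);
}
```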

Embedding model

| Property          | Value                      |
|-------------------|----------------------------|
| Model             | @cf/baai/bge-small-en-v1.5 |
| Dimensions        | 384                        |
| Similarity metric | Cosine                     |
| Runtime           | Cloudflare Workers AI      |
| Storage           | Cloudflare Vectorize       |

Embedding is fire-and-forget on store — the API responds immediately, without waiting for the embedding to complete. The embedded_at column tracks which memories have been embedded.
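
A sketch of that pattern in a Workers fetch handler; embedAndUpsert and markEmbedded are hypothetical helpers standing in for the store sketch above and the embedded_at update:

```ts
interface Env {
  AI: Ai;
  VECTORIZE: VectorizeIndex;
}

// Hypothetical helpers: embedAndUpsert mirrors the store sketch above;
// markEmbedded would stamp the embedded_at column.
declare function embedAndUpsert(env: Env, memoryId: string, content: string): Promise<void>;
declare function markEmbedded(memoryId: string): Promise<void>;

export default {
  async fetch(req: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    const { id, content } = (await req.json()) as { id: string; content: string };
    // waitUntil keeps the Worker alive for the background embedding
    // without delaying the response; the client gets a 201 immediately.
    ctx.waitUntil(embedAndUpsert(env, id, content).then(() => markEmbedded(id)));
    return Response.json({ id }, { status: 201 });
  },
};
```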

Hybrid ranking formula

finalScore = alpha × normalizedKeywordScore + (1 - alpha) × normalizedVectorScore

Both score sets are normalized to [0, 1] before combining. The alpha parameter controls the balance:

| Alpha | Behavior                                       |
|-------|------------------------------------------------|
| 1.0   | Pure keyword (exact match only)                |
| 0.7   | Keyword-heavy (prefers exact terms)            |
| 0.5   | Balanced (default)                             |
| 0.3   | Semantic-heavy (prefers conceptual similarity) |
| 0.0   | Pure semantic (no keyword matching)            |
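
A minimal sketch of the blend. Min-max normalization is an assumption here; only the [0, 1] target range is stated above, and the Map shapes are illustrative:

```ts
// Min-max normalize a score map to [0, 1].
function normalize(scores: Map<string, number>): Map<string, number> {
  const values = [...scores.values()];
  if (values.length === 0) return new Map();
  const min = Math.min(...values);
  const range = Math.max(...values) - min || 1; // guard against all-equal scores
  return new Map([...scores].map(([id, s]) => [id, (s - min) / range]));
}

// Blend keyword and vector scores per the formula above, highest first.
function hybridRank(
  keyword: Map<string, number>, // memoryId -> keyword score
  vector: Map<string, number>,  // memoryId -> cosine similarity
  alpha = 0.5,
): Array<{ id: string; score: number }> {
  const k = normalize(keyword);
  const v = normalize(vector);
  const ids = new Set([...k.keys(), ...v.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score: alpha * (k.get(id) ?? 0) + (1 - alpha) * (v.get(id) ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}
```

A memory that appears in only one result set gets a 0 for the missing score, so it is penalized but not excluded.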

Tuning alpha per workspace

Alpha defaults to 0.5. To change it for a workspace, insert a row into the workspace_settings table:

-- Via SQL (Turso dashboard or migration)
INSERT OR REPLACE INTO workspace_settings (key, value)
VALUES ('recall_alpha', '0.3');
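
To confirm the override (no row means the 0.5 default applies):

-- Check the current per-workspace alpha
SELECT value FROM workspace_settings WHERE key = 'recall_alpha';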

Vector index

All workspaces share a single Vectorize index (memento-memories) with workspace-level filtering via the workspace_id metadata field.

| Property         | Value                           |
|------------------|---------------------------------|
| Index name       | memento-memories                |
| Vector ID format | {workspaceId}:{memoryId}        |
| Metadata         | { workspace_id, memory_id }     |
| Namespace filter | workspace_id = "your-workspace" |
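
For example, a workspace-scoped query could look like this sketch, using Vectorize's metadata filter; queryVector is a 384-dim embedding produced as in step 2 of "How it works":

```ts
interface Env {
  VECTORIZE: VectorizeIndex;
}

// Query the shared index, scoped to one workspace via the metadata filter.
async function queryWorkspace(env: Env, queryVector: number[], workspaceId: string) {
  return env.VECTORIZE.query(queryVector, {
    topK: 10,
    filter: { workspace_id: workspaceId }, // only this workspace's vectors
    returnMetadata: "all",                 // surface memory_id on each match
  });
}
```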

Backfilling existing memories

Memories stored before semantic search was enabled won't have embeddings. Use the admin endpoint to backfill:

POST /v1/admin/backfill-embeddings
Authorization: Bearer mp_live_...
X-Memento-Workspace: my-agent

This processes un-embedded memories in batches of 50. It's idempotent — safe to run multiple times. Rate limiting from Workers AI may require multiple runs for large workspaces.
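
A hypothetical driver for large workspaces: re-run the endpoint until a pass reports no work. The { processed } field in the response is an assumption, not a documented contract; adapt it to the real reply:

```ts
async function backfillUntilDone(baseUrl: string, apiKey: string, workspace: string) {
  for (;;) {
    const res = await fetch(`${baseUrl}/v1/admin/backfill-embeddings`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "X-Memento-Workspace": workspace,
      },
    });
    if (!res.ok) throw new Error(`backfill failed: HTTP ${res.status}`);
    // Assumed response shape: { processed: number }.
    const { processed } = (await res.json()) as { processed: number };
    if (processed === 0) break; // the endpoint is idempotent, so looping is safe
  }
}
```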

Graceful degradation

Every vector operation checks for the presence of AI and VECTORIZE bindings. Without them:

  • Embedding on store is silently skipped
  • Semantic search returns an empty array
  • The context endpoint falls back to pure keyword ranking
  • All other features work normally

This means the system works identically in local development (no Cloudflare bindings) and on free-tier deployments. The test suite runs without any AI dependencies.
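
A sketch of the guard itself, with both bindings optional in the Env type (binding names as assumed in the earlier sketches):

```ts
// Both bindings are optional; every vector path checks them first.
interface Env {
  AI?: Ai;
  VECTORIZE?: VectorizeIndex;
}

async function semanticSearch(env: Env, query: string, workspaceId: string) {
  // No bindings (local dev, free tier): degrade to an empty result
  // and let keyword ranking carry the recall.
  if (!env.AI || !env.VECTORIZE) return [];
  const res = (await env.AI.run("@cf/baai/bge-small-en-v1.5", {
    text: [query],
  })) as { data: number[][] };
  const { matches } = await env.VECTORIZE.query(res.data[0], {
    topK: 10,
    filter: { workspace_id: workspaceId },
  });
  return matches;
}
```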


What to read next