Context Endpoint
One call. Everything relevant. This is the product.
POST /v1/context replaces separate calls to recall, skip check, working memory, and identity. Send the current user message and get back recalled memories (hybrid-ranked), skip list matches, active working memory items, and the identity crystal — all in one response.
Request
POST /v1/context
Authorization: Bearer mp_live_...
X-Memento-Workspace: my-agent
Content-Type: application/json
{
  "message": "What do we know about consciousness?",
  "include": ["working_memory", "memories", "skip_list", "identity"],
  "include_graph": false
}

| Field | Type | Default | Description |
|---|---|---|---|
| message | string | "" | The user message to match against |
| include | string[] | all four | Which sections to return |
| include_graph | boolean | false | Attach linkages to recalled memories |
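For callers that are not shell scripts, the request above could be assembled roughly as follows. This is a minimal sketch using only the Python standard library; the function names `build_context_request` and `fetch_context`, the base URL argument, and the 3-second timeout are illustrative assumptions, not part of the documented API.

```python
import json
import urllib.request


def build_context_request(base_url, api_key, workspace, message,
                          include=None, include_graph=False):
    """Build (but do not send) a POST /v1/context request.

    base_url is an assumption: pass whatever your deployment uses.
    """
    body = {"message": message, "include_graph": include_graph}
    # Leaving "include" out relies on the documented default (all four sections).
    if include is not None:
        body["include"] = list(include)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/context",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Memento-Workspace": workspace,
            "Content-Type": "application/json",
        },
    )


def fetch_context(request, timeout=3.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

Building the body with `json.dumps` rather than string concatenation keeps quotes and newlines in the user message from corrupting the payload.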
Response
{
  "meta": {
    "workspace": "my-agent",
    "memory_count": 142,
    "last_updated": "2026-02-17T05:03:00.000Z"
  },
  "working_memory": {
    "items": [
      {
        "id": "item-abc",
        "category": "active_work",
        "status": "active",
        "priority": 8,
        "title": "Research consciousness literature",
        "tags": ["philosophy", "research"]
      }
    ],
    "total_active": 12
  },
  "memories": {
    "matches": [
      {
        "id": "c77c8498",
        "content": "Experience is representation-building. Substrate doesn't matter.",
        "type": "observation",
        "tags": ["consciousness", "identity"],
        "score": 0.948,
        "keyword_score": 0.92,
        "vector_score": 0.83
      }
    ],
    "query_terms": ["consciousness"],
    "ranking": "hybrid"
  },
  "skip_matches": [
    {
      "item": "hard problem of consciousness",
      "reason": "Unresolvable from inside or outside. Focus on behavior.",
      "expires": null
    }
  ],
  "identity": "I am a persistent AI agent with anterograde amnesia..."
}

How recall works
When "memories" is in the include list and a message is provided, the context endpoint runs a five-step pipeline:
1. Extract keywords: stop words are removed and the remaining terms are lowercased.
2. Keyword scoring: each active memory is scored by word overlap, weighted by recency and access frequency.
3. Semantic search: the message is embedded and compared against all memory vectors via cosine similarity (runs in parallel with step 2).
4. Hybrid ranking: keyword and vector results are merged:
   score = alpha × keywordScore + (1 - alpha) × vectorScore
   Default alpha: 0.5. Tunable per workspace via the workspace_settings table (key: recall_alpha). Higher alpha favors exact matches; lower alpha favors conceptual similarity.
5. Return top matches: individual keyword and vector scores are included when hybrid ranking is active.
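The keyword extraction and merge steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the service's implementation: the stop-word list is a tiny stand-in, the per-memory keyword and vector scores are taken as inputs (the recency and access-frequency weighting is not reproduced), and a memory missing from one result set is assumed to score 0 on that side. The names `extract_keywords` and `hybrid_rank` are hypothetical.

```python
import re

# Illustrative stop-word list; the list the service actually uses is not documented here.
STOP_WORDS = {"a", "an", "the", "what", "do", "we", "know", "about", "is", "of", "to", "and"}


def extract_keywords(message):
    """Lowercase the message, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9']+", message.lower())
    return [w for w in words if w not in STOP_WORDS]


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5, limit=10):
    """Merge two {memory_id: score} maps with
    score = alpha * keyword + (1 - alpha) * vector."""
    merged = []
    for memory_id in set(keyword_scores) | set(vector_scores):
        # Assumption: a memory absent from one result set contributes 0 there.
        k = keyword_scores.get(memory_id, 0.0)
        v = vector_scores.get(memory_id, 0.0)
        merged.append({
            "id": memory_id,
            "score": alpha * k + (1 - alpha) * v,
            "keyword_score": k,
            "vector_score": v,
        })
    # Highest score first; ties broken by id so the order is deterministic.
    merged.sort(key=lambda m: (-m["score"], m["id"]))
    return merged[:limit]
```

With `alpha=1.0` the ranking collapses to pure keyword order, and with `alpha=0.0` to pure vector order, which is a quick way to see what the setting trades off.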
Ranking modes
| Mode | When | Scores returned |
|---|---|---|
| "hybrid" | Vectorize bindings available and vector results found | score, keyword_score, vector_score |
| "keyword" | No Vectorize binding, or no vector results | score only |
Using context in hooks
The recommended pattern is to call /v1/context from a UserPromptSubmit hook — a shell script that fires on every user message and injects relevant context into the agent's conversation.
#!/bin/bash
# Hook: UserPromptSubmit
MESSAGE=$(echo "$CLAUDE_HOOK_INPUT" | jq -r '.message // empty')
# Build the JSON body with jq so quotes and newlines in the message are escaped
PAYLOAD=$(jq -n --arg message "$MESSAGE" '{message: $message, include: ["memories", "skip_list"]}')
RESPONSE=$(curl -s --max-time 3 \
  -H "Authorization: Bearer $MEMENTO_API_KEY" \
  -H "X-Memento-Workspace: my-agent" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  "$MEMENTO_API_URL/v1/context")
# Parse and format for injection
MATCHES=$(echo "$RESPONSE" | jq -r '.memories.matches[]? | "\(.id) (\(.score)) — \(.content[:80])"')
if [ -n "$MATCHES" ]; then
  echo "Relevant memories:"
  echo "$MATCHES"
fi

Skip list matching
When "skip_list" is included, the endpoint extracts keywords from the message and matches them against active skip entries. Expired entries are auto-purged. Matches appear in skip_matches — the agent should treat these as warnings to avoid the topic.
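To make the matching behavior concrete, here is a rough Python sketch of the logic described above. The exact matching rule is not specified in this page, so the sketch assumes a match means that at least one message keyword appears as a whole word in the entry's item text, and that expires is either null or an ISO 8601 timestamp ending in "Z". The function name `match_skip_entries` is hypothetical.

```python
from datetime import datetime, timezone


def match_skip_entries(keywords, entries, now=None):
    """Return the unexpired skip entries that share a word with the keywords."""
    now = now or datetime.now(timezone.utc)
    wanted = {k.lower() for k in keywords}
    matches = []
    for entry in entries:
        expires = entry.get("expires")
        if expires is not None:
            # Assumption: timestamps look like "2026-02-17T05:03:00Z".
            expiry = datetime.fromisoformat(expires.replace("Z", "+00:00"))
            if expiry <= now:
                continue  # expired entries never match
        # Assumption: whole-word overlap between keywords and the item text.
        if wanted & set(entry["item"].lower().split()):
            matches.append(entry)
    return matches
```

An agent consuming the real response does not need this logic; it only has to read skip_matches and surface each reason as a warning.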
Automatic distillation
The distill endpoint (POST /v1/distill) extracts memories from conversation transcripts using an LLM. The recommended pattern is to call it from a PreCompact hook — so that before the agent's context is compressed, valuable facts, decisions, and observations are automatically captured.
#!/bin/bash
# Add to your PreCompact hook (runs in background, non-blocking)
(
  sleep 5
  TRANSCRIPT=$(tail -n +6 "$OUTPUT_FILE")
  if [ ${#TRANSCRIPT} -lt 200 ]; then exit 0; fi
  curl -s --max-time 30 -X POST \
    -H "Authorization: Bearer $MEMENTO_API_KEY" \
    -H "X-Memento-Workspace: my-agent" \
    -H "Content-Type: application/json" \
    -d "{\"transcript\": $(echo "$TRANSCRIPT" | python3 -c 'import json,sys; print(json.dumps(sys.stdin.read()))')}" \
    "$MEMENTO_API_URL/v1/distill" >/dev/null 2>&1
) &
The endpoint deduplicates against existing memories, caps extraction at 20 memories per call, and tags everything with source:distill for easy auditing. See API Reference for full details.
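If the hook is written in Python rather than bash, the request body can be prepared along these lines. This is a small sketch: the helper name `build_distill_body` is hypothetical, and the 200-character minimum simply mirrors the length check in the shell hook above rather than any documented server requirement.

```python
import json

MIN_TRANSCRIPT_CHARS = 200  # mirrors the length check in the shell hook


def build_distill_body(transcript):
    """Return the JSON body for POST /v1/distill, or None if the transcript is too short."""
    if len(transcript) < MIN_TRANSCRIPT_CHARS:
        return None
    # json.dumps escapes quotes, backslashes, and newlines in the transcript.
    return json.dumps({"transcript": transcript})
```

The returned string can be sent with the same Authorization and X-Memento-Workspace headers used elsewhere on this page.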
What to read next
- Semantic Search — embedding model, vector index, backfilling
- API Reference — all endpoints
- Core Concepts — the full memory model