Context Endpoint

One call. Everything relevant. This is the product.

POST /v1/context replaces separate calls to recall, skip check, working memory, and identity. Send the current user message and get back recalled memories (hybrid-ranked), skip list matches, active working memory items, and the identity crystal — all in one response.

Request

POST /v1/context
Authorization: Bearer mp_live_...
X-Memento-Workspace: my-agent
Content-Type: application/json

{
  "message": "What do we know about consciousness?",
  "include": ["working_memory", "memories", "skip_list", "identity"],
  "include_graph": false
}
Field           Type       Default     Description
message         string     ""          The user message to match against
include         string[]   all four    Which sections to return
include_graph   boolean    false       Attach linkages to recalled memories
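
For illustration, the request above can be assembled with only the Python standard library. The helper name and the example URL are mine, not part of the API; the endpoint, headers, and field defaults come from the table above:

```python
import json
import urllib.request

def build_context_request(api_url, api_key, workspace, message,
                          include=("working_memory", "memories", "skip_list", "identity"),
                          include_graph=False):
    # Defaults mirror the table: all four sections, no graph linkages.
    body = json.dumps({"message": message,
                       "include": list(include),
                       "include_graph": include_graph}).encode()
    return urllib.request.Request(
        f"{api_url}/v1/context",
        data=body,
        headers={"Authorization": f"Bearer {api_key}",
                 "X-Memento-Workspace": workspace,
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_context_request("https://api.example.com", "mp_live_xxx", "my-agent",
                            "What do we know about consciousness?")
# urllib.request.urlopen(req) would perform the call; omitted here.
```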

Response

{
  "meta": {
    "workspace": "my-agent",
    "memory_count": 142,
    "last_updated": "2026-02-17T05:03:00.000Z"
  },

  "working_memory": {
    "items": [
      {
        "id": "item-abc",
        "category": "active_work",
        "status": "active",
        "priority": 8,
        "title": "Research consciousness literature",
        "tags": ["philosophy", "research"]
      }
    ],
    "total_active": 12
  },

  "memories": {
    "matches": [
      {
        "id": "c77c8498",
        "content": "Experience is representation-building. Substrate doesn't matter.",
        "type": "observation",
        "tags": ["consciousness", "identity"],
        "score": 0.948,
        "keyword_score": 0.92,
        "vector_score": 0.83
      }
    ],
    "query_terms": ["consciousness"],
    "ranking": "hybrid"
  },

  "skip_matches": [
    {
      "item": "hard problem of consciousness",
      "reason": "Unresolvable from inside or outside. Focus on behavior.",
      "expires": null
    }
  ],

  "identity": "I am a persistent AI agent with anterograde amnesia..."
}

How recall works

When "memories" is in the include list and a message is provided, the context endpoint runs a five-step pipeline:

  1. Extract keywords — stop words removed, remaining terms lowercased
  2. Keyword scoring — each active memory scored by word overlap, weighted by recency and access frequency
  3. Semantic search — the message is embedded and compared against all memory vectors via cosine similarity (runs in parallel with step 2)
  4. Hybrid ranking — keyword and vector results merged:
    score = alpha × keywordScore + (1 - alpha) × vectorScore

    Default alpha: 0.5. Tunable per workspace via the workspace_settings table (key: recall_alpha). Higher alpha favors exact matches; lower alpha favors conceptual similarity.

  5. Return top matches — with individual keyword and vector scores when hybrid ranking is active
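
Steps 1, 2, and 4 can be sketched as below. The stop-word set, scoring dictionaries, and function names are illustrative, not the service's actual implementation; only the blend formula `alpha × keywordScore + (1 − alpha) × vectorScore` is taken from the pipeline above:

```python
STOP_WORDS = {"what", "do", "we", "know", "about", "the", "a", "an", "of", "is"}  # tiny illustrative subset

def extract_keywords(message):
    # Step 1: strip punctuation, lowercase, drop stop words
    terms = (w.strip("?.,!:;").lower() for w in message.split())
    return [t for t in terms if t and t not in STOP_WORDS]

def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    # Step 4: score = alpha * keywordScore + (1 - alpha) * vectorScore,
    # over the union of memories seen by either retriever.
    ids = set(keyword_scores) | set(vector_scores)
    merged = {
        mid: alpha * keyword_scores.get(mid, 0.0) + (1 - alpha) * vector_scores.get(mid, 0.0)
        for mid in ids
    }
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

print(extract_keywords("What do we know about consciousness?"))  # → ['consciousness']
print(hybrid_rank({"c77c8498": 0.92}, {"c77c8498": 0.83, "other": 0.40}))
```

With the default alpha of 0.5 the example memory scores (0.92 + 0.83) / 2 = 0.875; raising alpha per workspace shifts that blend toward the keyword side.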

Ranking modes

Mode        When                                                    Scores returned
"hybrid"    Vectorize binding available and vector results found    score, keyword_score, vector_score
"keyword"   No Vectorize binding, or no vector results              score only
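
A client can branch on which score fields are present rather than on the `ranking` string. This formatter is a sketch; the field names come from the response example above, the helper name is hypothetical:

```python
def format_match(match):
    # Hybrid responses carry keyword_score and vector_score; keyword-only responses just score.
    base = f"{match['id']} ({match['score']:.3f})"
    if "vector_score" in match:
        return f"{base} kw={match['keyword_score']:.2f} vec={match['vector_score']:.2f}"
    return base

print(format_match({"id": "c77c8498", "score": 0.948,
                    "keyword_score": 0.92, "vector_score": 0.83}))
# → c77c8498 (0.948) kw=0.92 vec=0.83
```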

Using context in hooks

The recommended pattern is to call /v1/context from a UserPromptSubmit hook — a shell script that fires on every user message and injects relevant context into the agent's conversation.

#!/bin/bash
# Hook: UserPromptSubmit
MESSAGE=$(echo "$CLAUDE_HOOK_INPUT" | jq -r '.message // empty')
[ -z "$MESSAGE" ] && exit 0

# Build the payload with jq so quotes and newlines in the message are escaped safely
PAYLOAD=$(jq -n --arg msg "$MESSAGE" '{message: $msg, include: ["memories", "skip_list"]}')

RESPONSE=$(curl -s --max-time 3 \
  -H "Authorization: Bearer $MEMENTO_API_KEY" \
  -H "X-Memento-Workspace: my-agent" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" \
  "$MEMENTO_API_URL/v1/context")

# Parse and format for injection
MATCHES=$(echo "$RESPONSE" | jq -r '.memories.matches[]? | "\(.id) (\(.score)) — \(.content[:80])"')

if [ -n "$MATCHES" ]; then
  echo "Relevant memories:"
  echo "$MATCHES"
fi

Skip list matching

When "skip_list" is included, the endpoint extracts keywords from the message and matches them against active skip entries. Expired entries are auto-purged. Matches appear in skip_matches — the agent should treat these as warnings to avoid the topic.
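
The matching can be pictured as below. Substring comparison and ISO-8601 expiry timestamps are assumptions about behavior the doc doesn't specify, and `match_skip_list` is my name, not the service's:

```python
from datetime import datetime, timezone

def match_skip_list(keywords, entries, now=None):
    """Return active skip entries whose item text contains any message keyword."""
    now = now or datetime.now(timezone.utc)
    # Auto-purge: drop entries whose expiry is in the past (expires=None never expires)
    active = [e for e in entries
              if e["expires"] is None or datetime.fromisoformat(e["expires"]) > now]
    return [e for e in active
            if any(kw in e["item"].lower() for kw in keywords)]

entries = [
    {"item": "hard problem of consciousness", "reason": "Unresolvable. Focus on behavior.", "expires": None},
    {"item": "old topic", "reason": "stale", "expires": "2020-01-01T00:00:00+00:00"},
]
print(match_skip_list(["consciousness"], entries))  # matches the first entry only
```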

Automatic distillation

The distill endpoint (POST /v1/distill) extracts memories from conversation transcripts using an LLM. The recommended pattern is to call it from a PreCompact hook — so that before the agent's context is compressed, valuable facts, decisions, and observations are automatically captured.

#!/bin/bash
# Add to your PreCompact hook (runs in background, non-blocking)
(
    sleep 5
    TRANSCRIPT=$(tail -n +6 "$OUTPUT_FILE")
    if [ ${#TRANSCRIPT} -lt 200 ]; then exit 0; fi

    # jq escapes the transcript into a valid JSON string (quotes, newlines,
    # control chars) — consistent with the other hooks and avoids echo's
    # backslash handling quirks
    PAYLOAD=$(jq -n --arg t "$TRANSCRIPT" '{transcript: $t}')

    curl -s --max-time 30 -X POST \
        -H "Authorization: Bearer $MEMENTO_API_KEY" \
        -H "X-Memento-Workspace: my-agent" \
        -H "Content-Type: application/json" \
        -d "$PAYLOAD" \
        "$MEMENTO_API_URL/v1/distill" >/dev/null 2>&1
) &

The endpoint deduplicates against existing memories, caps extraction at 20 memories per call, and tags everything with source:distill for easy auditing. See API Reference for full details.


What to read next