
# Chat (RAG + LLM)

`POST /agent/memory/chat`

Ask a question and get a natural language answer generated by the LLM, grounded in your stored memories. Uses the full RAG pipeline: semantic retrieval → reranking → LLM generation → citation verification.

This is the recommended endpoint for most use cases. Use `/agent/memory/query` only when you need raw memories without LLM processing.

## Request

### Headers

| Header | Value |
| --- | --- |
| `Authorization` | `Bearer sm_agent_...` |
| `Content-Type` | `application/json` |

### Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | Yes | The question to answer (1–10,000 characters). |
| `volume_id` | string (UUID) | Yes | Volume to search within. |
| `limit` | integer | No | Max memories to consider (1–50, default: 10). |
| `stream` | boolean | No | Enable Server-Sent Events streaming (default: `false`). |
| `date_from` | string | No | Include only memories with `event_date` >= this ISO date. |
| `date_to` | string | No | Include only memories with `event_date` <= this ISO date. |
| `user_id` | string | No | Scope results to a specific user. |
| `session_id` | string | No | Scope results to a specific session. |
| `agent_id` | string | No | Filter results to a specific agent. |

## Response

**200** — LLM-generated answer with sources and citations:

```json
{
  "answer": "Based on the stored memories, John Smith is the CTO of Acme Corp. He primarily uses React and TypeScript, and his team recently migrated to Next.js.",
  "sources": [
    {
      "memory_id": "a1b2c3d4-...",
      "content": "John Smith is the CTO of Acme Corp",
      "score": 0.95,
      "memory_type": "factual",
      "category": "identity"
    },
    {
      "memory_id": "e5f6g7h8-...",
      "content": "John uses React and TypeScript daily",
      "score": 0.89,
      "memory_type": "factual",
      "category": "technology"
    }
  ],
  "citations": [
    {
      "source_index": 0,
      "text": "John Smith is the CTO of Acme Corp",
      "status": "verified"
    }
  ]
}
```

### Response Fields

| Field | Description |
| --- | --- |
| `answer` | LLM-generated natural-language answer grounded in memories |
| `sources` | Memories used to generate the answer, ranked by relevance score |
| `citations` | Citation references linking answer claims to specific source memories |
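Each citation's `source_index` points into the `sources` array, so cited claims can be joined back to the memories that support them. A sketch in Python; the `resolve_citations` helper is illustrative, not part of the SDK:

```python
# Truncated version of the example response above.
response = {
    "sources": [
        {"memory_id": "a1b2c3d4-...", "score": 0.95,
         "content": "John Smith is the CTO of Acme Corp"},
        {"memory_id": "e5f6g7h8-...", "score": 0.89,
         "content": "John uses React and TypeScript daily"},
    ],
    "citations": [
        {"source_index": 0, "status": "verified",
         "text": "John Smith is the CTO of Acme Corp"},
    ],
}

def resolve_citations(resp: dict) -> list[dict]:
    """Attach the cited source memory to each verified citation."""
    resolved = []
    for c in resp["citations"]:
        if c["status"] != "verified":
            continue  # skip anything that failed verification
        src = resp["sources"][c["source_index"]]
        resolved.append({"claim": c["text"],
                         "memory_id": src["memory_id"],
                         "score": src["score"]})
    return resolved

resolved = resolve_citations(response)
```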

## Streaming

Set `stream: true` to receive Server-Sent Events:

```shell
curl -X POST https://api.sharedmemory.ai/agent/memory/chat \
  -H "Authorization: Bearer sm_agent_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "What does John do?", "volume_id": "...", "stream": true}'
```

Events:

| Event | Data |
| --- | --- |
| `sources` | `{ "sources": [...] }` — retrieved memories |
| `token` | `{ "token": "..." }` — streaming LLM tokens |
| `citations` | `{ "citations": [...] }` — verified citations |
| `done` | `{}` — stream complete |
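A raw SSE stream is a sequence of `event:`/`data:` blocks separated by blank lines, so the events above can be consumed with a small parser. A sketch (the `parse_sse` helper and the sample fragment are hypothetical; the event names match the table):

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Parse a raw SSE payload into (event, data) pairs.
    Blocks are separated by blank lines; each block carries
    an `event:` field and a JSON `data:` field."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, ""
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data += line[len("data:"):].strip()
        if event is not None:
            events.append((event, json.loads(data) if data else {}))
    return events

# Hypothetical fragment of the stream described in the table above.
raw = (
    'event: sources\ndata: {"sources": []}\n\n'
    'event: token\ndata: {"token": "John"}\n\n'
    "event: done\ndata: {}\n\n"
)
events = parse_sse(raw)
# Concatenate `token` events to reassemble the streamed answer.
answer = "".join(d["token"] for e, d in events if e == "token")
```

In practice you would feed the parser incrementally from the HTTP response body rather than from a complete string.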

## SDK Examples

```typescript
// TypeScript
const result = await memory.chat('What does John do?');
console.log(result.answer);
```

```python
# Python
result = memory.chat("What does John do?")
print(result["answer"])
```

```shell
# CLI
sm ask "What does John do?"
```

## Comparison with Query

| | `/agent/memory/chat` | `/agent/memory/query` |
| --- | --- | --- |
| Returns | LLM answer + sources + citations | Raw memories + graph facts |
| Use when | You want a ready-to-use answer | You need raw data for a custom LLM |
| LLM cost | Yes (Gemini generation) | No |
| Latency | ~2–5 s (includes LLM) | ~200–500 ms |