Chat (RAG + LLM)
POST /agent/memory/chat
Ask a question and get a natural language answer generated by the LLM, grounded in your stored memories. Uses the full RAG pipeline: semantic retrieval → reranking → LLM generation → citation verification.
This is the recommended endpoint for most use cases. Use /agent/memory/query only when you need raw memories without LLM processing.
Request

Headers

| Header | Value |
|---|---|
| Authorization | Bearer sm_agent_... |
| Content-Type | application/json |
Body Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | yes | The question to answer (1–10,000 characters). |
| volume_id | string (UUID) | yes | Volume to search within. |
| limit | integer | no | Max memories to consider (1–50, default: 10). |
| stream | boolean | no | Enable Server-Sent Events streaming (default: false). |
| date_from | string | no | Filter memories with event_date >= this ISO date. |
| date_to | string | no | Filter memories with event_date <= this ISO date. |
| user_id | string | no | Scope to a specific user. |
| session_id | string | no | Scope to a specific session. |
| agent_id | string | no | Filter results from a specific agent. |
Response

200: LLM-generated answer with sources and citations

```json
{
  "answer": "Based on the stored memories, John Smith is the CTO of Acme Corp. He primarily uses React and TypeScript, and his team recently migrated to Next.js.",
  "sources": [
    {
      "memory_id": "a1b2c3d4-...",
      "content": "John Smith is the CTO of Acme Corp",
      "score": 0.95,
      "memory_type": "factual",
      "category": "identity"
    },
    {
      "memory_id": "e5f6g7h8-...",
      "content": "John uses React and TypeScript daily",
      "score": 0.89,
      "memory_type": "factual",
      "category": "technology"
    }
  ],
  "citations": [
    {
      "source_index": 0,
      "text": "John Smith is the CTO of Acme Corp",
      "status": "verified"
    }
  ]
}
```
Response Fields

| Field | Description |
|---|---|
| answer | LLM-generated natural language answer grounded in memories |
| sources | Memories used to generate the answer, ranked by relevance score |
| citations | Citation references linking answer claims to specific source memories |
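Each citation's source_index points into the sources array, so verified claims can be traced back to the memories that support them. A minimal sketch, using the example response above (fields trimmed for brevity):

```python
# Map each verified citation back to its supporting source memory.
# Field names follow the example response documented above.
response = {
    "answer": "Based on the stored memories, John Smith is the CTO of Acme Corp. ...",
    "sources": [
        {"memory_id": "a1b2c3d4-...", "content": "John Smith is the CTO of Acme Corp", "score": 0.95},
        {"memory_id": "e5f6g7h8-...", "content": "John uses React and TypeScript daily", "score": 0.89},
    ],
    "citations": [
        {"source_index": 0, "text": "John Smith is the CTO of Acme Corp", "status": "verified"},
    ],
}

verified = [
    (c["text"], response["sources"][c["source_index"]]["memory_id"])
    for c in response["citations"]
    if c["status"] == "verified"
]
# verified -> [("John Smith is the CTO of Acme Corp", "a1b2c3d4-...")]
```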
Streaming

Set stream: true to receive Server-Sent Events:

```bash
curl -X POST https://api.sharedmemory.ai/agent/memory/chat \
  -H "Authorization: Bearer sm_agent_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "What does John do?", "volume_id": "...", "stream": true}'
```
Events:

| Event | Data |
|---|---|
| sources | { sources: [...] } — retrieved memories |
| token | { token: "..." } — streaming LLM tokens |
| citations | { citations: [...] } — verified citations |
| done | {} — stream complete |
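A client accumulates token events until done arrives. The sketch below parses the standard event:/data: SSE wire format with the Python standard library; the canned stream stands in for a real HTTP response body, whose exact framing is an assumption here:

```python
# Minimal SSE parsing loop for the event types listed above.
# Assumes standard "event:" / "data:" lines separated by blank lines.
import json

def iter_sse_events(lines):
    """Yield (event_name, parsed_json_data) pairs from an iterable of SSE text lines."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event:
            yield event, json.loads("\n".join(data) or "{}")
            event, data = None, []

# Canned stream for illustration; a real client iterates the response body.
stream = [
    "event: token", 'data: {"token": "John"}', "",
    "event: token", 'data: {"token": " codes"}', "",
    "event: done", "data: {}", "",
]
answer = "".join(d["token"] for e, d in iter_sse_events(stream) if e == "token")
# answer -> "John codes"
```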
SDK Examples

```typescript
const result = await memory.chat('What does John do?');
console.log(result.answer);
```

```python
result = memory.chat("What does John do?")
print(result["answer"])
```

```bash
sm ask "What does John do?"
```
Comparison with Query

| | /agent/memory/chat | /agent/memory/query |
|---|---|---|
| Returns | LLM answer + sources + citations | Raw memories + graph facts |
| Use when | You want a ready-to-use answer | You need raw data for a custom LLM |
| LLM cost | Yes (Gemini generation) | No |
| Latency | ~2–5 s (includes LLM) | ~200–500 ms |