
# Chat (RAG + LLM)

`POST /agent/memory/chat`

Ask a question and get a natural language answer generated by the LLM, grounded in your stored memories. Uses the full RAG pipeline: semantic retrieval → reranking → LLM generation → citation verification.

This is the recommended endpoint for most use cases. Use `/agent/memory/query` only when you need raw memories without LLM processing.

## Request

### Headers

| Header | Value |
| --- | --- |
| `Authorization` | `Bearer sm_agent_...` |
| `Content-Type` | `application/json` |

### Body Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | Yes | The question to answer (1–10,000 characters). |
| `volume_id` | string (UUID) | Yes | Volume to search within. |
| `limit` | integer | No | Max memories to consider (1–50, default: 10). |
| `stream` | boolean | No | Enable Server-Sent Events streaming (default: `false`). |
| `date_from` | string | No | Include only memories with `event_date` >= this ISO date. |
| `date_to` | string | No | Include only memories with `event_date` <= this ISO date. |
| `user_id` | string | No | Scope results to a specific user. |
| `session_id` | string | No | Scope results to a specific session. |
| `agent_id` | string | No | Filter results to a specific agent. |

## Response

**200** — LLM-generated answer with sources and citations:

```json
{
  "answer": "Based on the stored memories, John Smith is the CTO of Acme Corp. He primarily uses React and TypeScript, and his team recently migrated to Next.js.",
  "sources": [
    {
      "memory_id": "a1b2c3d4-...",
      "content": "John Smith is the CTO of Acme Corp",
      "score": 0.95,
      "memory_type": "factual",
      "category": "identity"
    },
    {
      "memory_id": "e5f6g7h8-...",
      "content": "John uses React and TypeScript daily",
      "score": 0.89,
      "memory_type": "factual",
      "category": "technology"
    }
  ],
  "citations": [
    {
      "source_index": 0,
      "text": "John Smith is the CTO of Acme Corp",
      "status": "verified"
    }
  ]
}
```

### Response Fields

| Field | Description |
| --- | --- |
| `answer` | LLM-generated natural-language answer grounded in memories |
| `sources` | Memories used to generate the answer, ranked by relevance score |
| `citations` | Citation references linking answer claims to specific source memories |
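Each citation's `source_index` points into the `sources` array, so cited claims can be joined back to the memories that support them. A sketch in Python; the `resolve_citations` helper is illustrative, not part of the SDK:

```python
# Truncated version of the example response above.
response = {
    "sources": [
        {"memory_id": "a1b2c3d4-...", "score": 0.95,
         "content": "John Smith is the CTO of Acme Corp"},
        {"memory_id": "e5f6g7h8-...", "score": 0.89,
         "content": "John uses React and TypeScript daily"},
    ],
    "citations": [
        {"source_index": 0, "status": "verified",
         "text": "John Smith is the CTO of Acme Corp"},
    ],
}

def resolve_citations(resp: dict) -> list[dict]:
    """Attach the cited source memory to each verified citation."""
    resolved = []
    for c in resp["citations"]:
        if c["status"] != "verified":
            continue  # skip anything that failed verification
        src = resp["sources"][c["source_index"]]
        resolved.append({"claim": c["text"],
                         "memory_id": src["memory_id"],
                         "score": src["score"]})
    return resolved

resolved = resolve_citations(response)
```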

## Streaming

Set `stream: true` to receive Server-Sent Events:

```shell
curl -X POST https://api.sharedmemory.ai/agent/memory/chat \
  -H "Authorization: Bearer sm_agent_..." \
  -H "Content-Type: application/json" \
  -d '{"query": "What does John do?", "volume_id": "...", "stream": true}'
```

Events:

| Event | Data |
| --- | --- |
| `sources` | `{ "sources": [...] }` — retrieved memories |
| `token` | `{ "token": "..." }` — streaming LLM tokens |
| `citations` | `{ "citations": [...] }` — verified citations |
| `done` | `{}` — stream complete |
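A raw SSE stream is a sequence of `event:`/`data:` blocks separated by blank lines, so the events above can be consumed with a small parser. A sketch (the `parse_sse` helper and the sample fragment are hypothetical; the event names match the table):

```python
import json

def parse_sse(raw: str) -> list[tuple[str, dict]]:
    """Parse a raw SSE payload into (event, data) pairs.
    Blocks are separated by blank lines; each block carries
    an `event:` field and a JSON `data:` field."""
    events = []
    for block in raw.strip().split("\n\n"):
        event, data = None, ""
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data += line[len("data:"):].strip()
        if event is not None:
            events.append((event, json.loads(data) if data else {}))
    return events

# Hypothetical fragment of the stream described in the table above.
raw = (
    'event: sources\ndata: {"sources": []}\n\n'
    'event: token\ndata: {"token": "John"}\n\n'
    "event: done\ndata: {}\n\n"
)
events = parse_sse(raw)
# Concatenate `token` events to reassemble the streamed answer.
answer = "".join(d["token"] for e, d in events if e == "token")
```

In practice you would feed the parser incrementally from the HTTP response body rather than from a complete string.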

## SDK Examples

```typescript
// TypeScript
const result = await memory.chat('What does John do?');
console.log(result.answer);
```

```python
# Python
result = memory.chat("What does John do?")
print(result["answer"])
```

```shell
# CLI
sm ask "What does John do?"
```

## Comparison with Query

| | `/agent/memory/chat` | `/agent/memory/query` |
| --- | --- | --- |
| Returns | LLM answer + sources + citations | Raw memories + graph facts |
| Use when | You want a ready-to-use answer | You need raw data for a custom LLM |
| LLM cost | Yes (Gemini generation) | No |
| Latency | ~2–5 s (includes LLM) | ~200–500 ms |