RAG Stack

Chat grounded in your own documents: retrieval-augmented generation that cites its sources, so answers can be checked rather than taken on faith.

What it is

RAG (retrieval-augmented generation) means finding the relevant chunks of your data first, then asking the model to answer using only those. Because each answer comes with a source you can check, it is easier to trust.

How it fits together

Question
   │
   ▼
Embed ──► Qdrant  (find the top-k relevant chunks)
   │
   ▼
Claude API  (answer using ONLY those chunks) ──► answer + citation

Documents are chunked, embedded, and stored in Qdrant ahead of time, so only the question has to be embedded at query time.
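
As a rough illustration of the ingestion and lookup steps, the sketch below splits text into overlapping word windows, embeds each chunk, and returns the top-k matches for a question. It uses only the Python standard library. `InMemoryIndex` is a hypothetical stand-in for Qdrant, and `embed` is a toy word-count vector rather than a real embedding model. The function names, the chunk sizes, and the sample documents are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; neighbours share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase word counts, not a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryIndex:
    """Hypothetical stand-in for the vector store (Qdrant in this stack)."""

    def __init__(self) -> None:
        self.items: list[tuple[Counter, dict]] = []

    def add(self, chunk: str, source: str) -> None:
        # Keep the source next to the text so answers can cite it later.
        self.items.append((embed(chunk), {"text": chunk, "source": source}))

    def search(self, query: str, k: int = 3) -> list[dict]:
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [payload for _, payload in ranked[:k]]


# Ingest ahead of time (sample documents are made up for this sketch).
docs = {
    "refunds.txt": "Refunds are issued within 14 days of purchase.",
    "shipping.txt": "Shipping takes three to five business days.",
}
index = InMemoryIndex()
for source, text in docs.items():
    for chunk in chunk_text(text):
        index.add(chunk, source)

# Query time: embed the question and fetch the best-matching chunk.
top = index.search("How long do refunds take?", k=1)
print(top[0]["source"])  # refunds.txt
```

In a real deployment the chunk text and its source would live alongside the vector in the store, and the same embedding model would be used for both documents and questions.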

The pieces

  • Qdrant — vector search over your chunks.
  • Claude API — the answer, constrained to retrieved context.
  • Supabase — store documents, users, and chat history.
  • LangGraph (optional) — when retrieval becomes multi-step or agentic.
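
One way to constrain the answer to retrieved context is to number the chunks in the prompt and instruct the model to cite them. The sketch below only builds the prompt strings; `build_grounded_prompt`, the instruction wording, and the chunk dictionary shape (`text`, `source`) are assumptions, not a prescribed format. With a chat-style API such as Anthropic's Messages API, the two strings would typically map to the system prompt and a single user message; the call itself is left out here.

```python
def build_grounded_prompt(question: str, chunks: list[dict]) -> tuple[str, str]:
    """Return (system, user) strings that restrict the model to the given chunks."""
    system = (
        "Answer using ONLY the numbered sources provided. "
        "Cite the sources you used as [n]. "
        "If the sources do not contain the answer, say you do not know."
    )
    # Number each chunk so the model can refer to it as [1], [2], ...
    sources = "\n\n".join(
        f"[{i}] ({chunk['source']})\n{chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    user = f"Sources:\n{sources}\n\nQuestion: {question}"
    return system, user


system, user = build_grounded_prompt(
    "How long do refunds take?",
    [{"source": "refunds.txt", "text": "Refunds are issued within 14 days of purchase."}],
)
print(user)
```

Because the numbers in the prompt match the order of the retrieved chunks, a `[1]` in the answer can be mapped back to its document when the citation is shown to the user.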

When to use it

Any time the AI must answer from your content, not its training data.

Build it

Start from the chatbot workflow, then add the retrieval step before the model call.
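
A minimal sketch of that change, assuming the chatbot workflow already has a function that sends a prompt to the model: retrieval runs first, and its results are folded into the prompt. `answer_with_retrieval`, `Retriever`, and `Model` are hypothetical names, and the two fake functions stand in for the Qdrant lookup and the Claude API call so the example runs on its own.

```python
from typing import Callable

Retriever = Callable[[str, int], list[dict]]  # (question, k) -> chunks with "text" and "source"
Model = Callable[[str], str]                  # prompt -> answer text


def answer_with_retrieval(
    question: str, retrieve: Retriever, call_model: Model, k: int = 3
) -> dict:
    """Run retrieval before the model call and return the answer with its sources."""
    chunks = retrieve(question, k)
    if not chunks:
        # Nothing relevant found: refuse rather than let the model guess.
        return {"answer": "I could not find that in the documents.", "sources": []}
    context = "\n\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks, start=1))
    prompt = (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
    return {"answer": call_model(prompt), "sources": [c["source"] for c in chunks]}


# Fakes for illustration only; swap in the real vector search and model call.
def fake_retrieve(question: str, k: int) -> list[dict]:
    return [{"text": "Refunds are issued within 14 days.", "source": "refunds.txt"}][:k]


def fake_model(prompt: str) -> str:
    return "Refunds are issued within 14 days [1]."


result = answer_with_retrieval("How long do refunds take?", fake_retrieve, fake_model)
print(result["answer"], result["sources"])
```

Passing the retriever and model in as functions keeps the existing chatbot code unchanged apart from the one extra step, and makes the flow easy to test without network access.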