Quick Answer
RAG lets LLMs answer questions using your documents. Embed chunks, store in pgvector or Qdrant, retrieve top-k with reranking, then pass to the LLM as context. Always cite sources in the response.
- Chunk size of 500-1000 tokens works for most cases
- Reranking (Cohere, BGE) improves quality by 20-40%
- Always display citations — hallucinations kill trust
What You'll Need
- Document corpus (PDFs, markdown, web pages)
- Embedding model (text-embedding-3-small, bge-m3, or assisters-embed)
- Vector DB: pgvector, Qdrant, Weaviate, or Chroma
- LLM via OpenAI-compatible API
Steps
- Ingest and chunk. Use Unstructured or LangChain to parse PDFs. Chunk at ~800 tokens with 100-token overlap.
- Embed. Batch embed chunks:

  ```js
  const { data } = await ai.embeddings.create({
    model: 'assisters-embed-v1',
    input: chunks,
  });
  ```
- Store in pgvector. `INSERT INTO documents (content, embedding) VALUES (...)`
- Create index. `CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);`
- Query pipeline. Embed user question, vector search top 20, rerank to top 5.
- Rerank. Use Cohere Rerank or a BGE reranker:

  ```js
  const { results } = await ai.rerank.create({
    query,
    documents: candidates,
    top_n: 5,
  });
  ```
- Prompt the LLM. System prompt: "Answer using only the provided context. Cite sources with [n]."
- Return with citations. Link back to original documents.
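The chunking in the first step can be sketched with a simple sliding window. This is a word-based approximation (real pipelines should count tokens with the embedding model's tokenizer); `chunkText` and its defaults are illustrative, not from any library:

```typescript
// Sliding-window chunker: word count stands in for token count.
// size and overlap mirror the 800/100 figures from the steps above.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks: string[] = [];
  const step = size - overlap; // advance leaves `overlap` words shared
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // last window reached the end
  }
  return chunks;
}
```

For semantic chunking (splitting on headings or sentence boundaries instead of fixed windows), libraries like LangChain provide ready-made splitters.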
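The query, retrieval, and prompt steps above can be sketched in memory. Cosine similarity here stands in for the pgvector query (the `vector_cosine_ops` index performs the same ordering server-side); reranking via `ai.rerank.create` would slot between `topK` and `buildPrompt`. All names below are illustrative:

```typescript
type Chunk = { docId: string; title: string; content: string; embedding: number[] };

// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// First-pass retrieval: top-k chunks by similarity to the query embedding.
function topK(queryEmbedding: number[], chunks: Chunk[], k: number): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(queryEmbedding, y.embedding) - cosine(queryEmbedding, x.embedding))
    .slice(0, k);
}

// Assemble the grounded prompt with [n] citation markers per source.
function buildPrompt(question: string, context: Chunk[]): string {
  const sources = context.map((c, i) => `[${i + 1}] (${c.title}) ${c.content}`).join("\n");
  return `Answer using only the provided context. Cite sources with [n].\n\n${sources}\n\nQuestion: ${question}`;
}
```

Because each source keeps its `docId` and `title`, the `[n]` markers in the model's answer can be linked back to the original documents.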
Common Mistakes
- Bad chunking. Splitting mid-sentence destroys meaning. Use semantic chunking.
- No reranking. First-pass vector search is noisy.
- Losing metadata. Always keep doc_id, title, url.
- Ignoring recency. Add time decay for news/social corpora.
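One common way to add the time decay mentioned above (an assumption, not prescribed by this article) is to multiply the similarity score by an exponential decay on document age:

```typescript
// Exponential time decay: a chunk's score halves every `halfLifeDays`.
// The 30-day half-life is a tunable assumption, not a recommendation.
function decayedScore(similarity: number, ageDays: number, halfLifeDays = 30): number {
  return similarity * Math.pow(0.5, ageDays / halfLifeDays);
}
```

For evergreen corpora (documentation, manuals), skip decay entirely; it only helps where freshness correlates with relevance.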
Top Tools
| Tool | Purpose |
| --- | --- |
| pgvector | SQL + vectors in one DB |
| Qdrant | Dedicated vector DB |
| LangChain / LlamaIndex | Orchestration |
| Cohere Rerank | Reranking API |
| Unstructured | Document parsing |
FAQs
Should I use pgvector or Qdrant? Use pgvector if you already run Postgres and have under ~10M documents; move to Qdrant beyond that scale.
Which embedding model is best? text-embedding-3-large for general use, or bge-m3 for multilingual corpora.
How do I evaluate RAG quality? Use Ragas framework: faithfulness, answer relevancy, context precision.
Does RAG eliminate hallucinations? Reduces but doesn't eliminate. Citations + confidence scoring help.
Can I RAG over images? Yes — use CLIP embeddings for images, combine with text RAG.
How do I update the index? Incremental upserts. Delete old versions by doc_id.
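The incremental-upsert answer above amounts to delete-then-insert keyed by `doc_id`. In pgvector that is a `DELETE ... WHERE doc_id = $1` followed by fresh inserts; a minimal in-memory model of the same operation (names illustrative):

```typescript
type StoredChunk = { docId: string; content: string; embedding: number[] };

// Incremental upsert: drop every chunk belonging to docId, then append
// the re-embedded replacements. Stale versions never linger in the index.
function upsertDoc(index: StoredChunk[], docId: string, fresh: StoredChunk[]): StoredChunk[] {
  return index.filter((c) => c.docId !== docId).concat(fresh);
}
```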
Conclusion
RAG is the dominant pattern for domain-specific AI in 2026. Start with pgvector + Assisters, add reranking, and always cite sources. Misar Dev builds full RAG stacks in minutes.