Quick Answer
Combine a web crawler (or SerpAPI), an embedding model, a vector DB (pgvector), and a streaming LLM into a RAG-based search pipeline. Stack: Next.js 15 for the frontend, self-hosted Supabase for pgvector, and an assisters.dev-compatible API for inference.
- Time to MVP: 1-2 weeks
- Cost: $30-100/mo (API + VPS)
- Outcome: Cited, streaming answers from web or your docs
What You'll Need
- Next.js 15, TypeScript
- Supabase with pgvector extension
- SerpAPI or self-hosted SearXNG for web results
- Embedding API (OpenAI-compatible)
- Streaming LLM (assisters.dev-compatible)
Steps
- Design the pipeline. Query → web search → fetch top N pages → chunk text → embed → retrieve top chunks → LLM answer with citations → stream to UI.
- Set up pgvector. In Supabase, run `CREATE EXTENSION vector;`, then `CREATE TABLE docs (id uuid PRIMARY KEY DEFAULT gen_random_uuid(), query_id uuid, url text, chunk text, embedding vector(1536));`. The query_id column groups chunks per search (used in step 5). Once you have data, add an index such as `CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops);` to keep retrieval fast.
- Build the search step. Use SerpAPI ($50/mo) or self-host SearXNG on your VPS (free). Fetch the top 10 results for the query (search sketch after this list).
- Scrape & chunk. For each URL, fetch HTML, extract main content (Readability.js or Trafilatura), chunk to ~500 tokens with 50-token overlap.
- Embed & store. Call embedding endpoint for each chunk. Upsert to pgvector table. Use a query_id to group chunks.
- Retrieve. Embed the user query, then `SELECT ... ORDER BY embedding <=> query_embedding LIMIT 8` to get the top chunks; `<=>` is pgvector's cosine-distance operator (retrieval sketch after this list).
- Stream the LLM answer. Prompt: "Answer using ONLY these sources. Cite as [1], [2]. Refuse if the sources don't cover it." Use streaming to reduce perceived latency (streaming sketch after this list).
- Render with citations. The frontend streams token by token, rendering each [1] marker as a hoverable source link.
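First, the search step. A minimal sketch assuming a SerpAPI account: `SERPAPI_KEY` is a placeholder env var, and `organic_results` is SerpAPI's documented field for standard results.

```typescript
// Fetch the top 10 organic results for a query via SerpAPI.
type SearchResult = { title: string; link: string; snippet?: string };

async function webSearch(query: string): Promise<SearchResult[]> {
  const url = new URL("https://serpapi.com/search.json");
  url.searchParams.set("q", query);
  url.searchParams.set("num", "10");
  url.searchParams.set("api_key", process.env.SERPAPI_KEY!);

  const res = await fetch(url);
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  const data = await res.json();
  return (data.organic_results ?? []).map((r: any) => ({
    title: r.title,
    link: r.link,
    snippet: r.snippet,
  }));
}
```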
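Next, chunking. An MVP doesn't need a real tokenizer; the sketch below approximates 1 token as 4 characters, so ~500 tokens becomes 2,000 characters with a 200-character (~50-token) overlap.

```typescript
// Split extracted text into ~500-token chunks with ~50-token overlap.
// Approximation: 1 token ~ 4 chars, so 2000 chars per chunk, 200 overlap.
function chunkText(text: string, chunkChars = 2000, overlapChars = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + chunkChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlapChars; // step back to create the overlap
  }
  return chunks;
}
```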
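Embed and store, assuming an OpenAI-compatible `/v1/embeddings` endpoint and the supabase-js client. `EMBED_API_URL` and `EMBED_API_KEY` are placeholder env vars; the `docs` table is the one from the pgvector step, and supabase-js accepts a plain number array for a vector column.

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Embed a batch of chunks and insert them into the docs table,
// grouped under one query_id.
async function embedAndStore(queryId: string, url: string, chunks: string[]) {
  const res = await fetch(`${process.env.EMBED_API_URL}/v1/embeddings`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.EMBED_API_KEY}`,
    },
    body: JSON.stringify({ model: "text-embedding-3-small", input: chunks }),
  });
  const { data } = await res.json(); // data[i].embedding is a number[]

  const rows = chunks.map((chunk, i) => ({
    query_id: queryId,
    url,
    chunk,
    embedding: data[i].embedding,
  }));
  const { error } = await supabase.from("docs").insert(rows);
  if (error) throw error;
}
```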
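Retrieval, sketched with the `pg` driver. The query embedding is serialized as a pgvector literal and cast with `::vector`; `<=>` is cosine distance, so smaller means closer.

```typescript
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Return the 8 chunks closest to the query embedding for this search.
async function retrieve(queryId: string, queryEmbedding: number[]) {
  const vec = `[${queryEmbedding.join(",")}]`; // pgvector literal, e.g. '[0.1,0.2,...]'
  const { rows } = await pool.query(
    `SELECT url, chunk
       FROM docs
      WHERE query_id = $1
      ORDER BY embedding <=> $2::vector
      LIMIT 8`,
    [queryId, vec]
  );
  return rows as { url: string; chunk: string }[];
}
```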
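Finally, streaming. A Next.js App Router route handler that forwards the upstream SSE body straight to the browser, assuming an OpenAI-compatible `/v1/chat/completions` endpoint; `LLM_API_URL`, `LLM_API_KEY`, and the model name are placeholders.

```typescript
// app/api/answer/route.ts
export async function POST(req: Request) {
  const { question, sources } = await req.json();

  const upstream = await fetch(`${process.env.LLM_API_URL}/v1/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.LLM_API_KEY}`,
    },
    body: JSON.stringify({
      model: "your-model",
      stream: true,
      messages: [
        {
          role: "system",
          content:
            "Answer using ONLY these sources. Cite as [1], [2]. " +
            "Refuse if the sources don't cover the question.",
        },
        { role: "user", content: `Sources:\n${sources}\n\nQuestion: ${question}` },
      ],
    }),
  });
  if (!upstream.ok || !upstream.body) {
    return new Response("Upstream error", { status: 502 });
  }

  // Forward the SSE stream unmodified; the client parses the deltas.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream" },
  });
}
```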
Common Mistakes
- Hallucinated citations: Enforce the "refuse if uncovered" instruction in the prompt and show the raw sources alongside the answer.
- Slow crawl step: Parallel fetches, 5s timeout per URL, skip PDFs on first pass.
- Huge chunks: 500 tokens max. Bigger chunks dilute relevance.
- Stale cache: Add a TTL (7 days) plus a "recent results" flag for time-sensitive queries (cache sketch below).
- No abuse protection: Rate limit per IP; each search costs real money (limiter sketch below).
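A tiny in-process TTL cache for the stale-cache fix above: keys are normalized queries and entries expire after 7 days. A single-process sketch, not shared state.

```typescript
// Cache answers per normalized query with a 7-day expiry.
const TTL_MS = 7 * 24 * 60 * 60 * 1000;
const cache = new Map<string, { value: unknown; expires: number }>();

export function cacheGet(query: string): unknown | undefined {
  const key = query.trim().toLowerCase();
  const hit = cache.get(key);
  if (!hit) return undefined;
  if (Date.now() > hit.expires) {
    cache.delete(key); // expired, evict
    return undefined;
  }
  return hit.value;
}

export function cacheSet(query: string, value: unknown): void {
  cache.set(query.trim().toLowerCase(), { value, expires: Date.now() + TTL_MS });
}
```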
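And a naive fixed-window limiter for the abuse point. The 10-per-minute budget is an arbitrary placeholder, and a real deployment would back this with Redis so limits survive restarts and multiple instances.

```typescript
// Allow at most 10 searches per IP per 60-second window, in memory.
const WINDOW_MS = 60_000;
const MAX_REQUESTS = 10;
const hits = new Map<string, { count: number; windowStart: number }>();

export function allowRequest(ip: string): boolean {
  const now = Date.now();
  const entry = hits.get(ip);
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    hits.set(ip, { count: 1, windowStart: now }); // new window
    return true;
  }
  entry.count += 1;
  return entry.count <= MAX_REQUESTS;
}
```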
Top Tools
| Tool | Best For | Price |
| --- | --- | --- |
| Supabase + pgvector | Vector DB | Free tier |
| SerpAPI | Google results | $50+/mo |
| SearXNG | Self-hosted search | Free |
| Trafilatura | Content extraction | Free |
| Next.js | Streaming UI | Free |
FAQs
Q: Do I need a separate vector DB like Pinecone?
No. pgvector in self-hosted Supabase handles millions of vectors, as long as you add an IVFFlat or HNSW index.
Q: Which embedding model?
OpenAI-compatible text-embedding-3-small via assisters.dev; its 1536 dimensions match the vector(1536) column in step 2.
Q: How do I handle follow-up questions?
Keep session context and re-embed the conversation history together with the new question as the query (sketch below).
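A sketch of that query rewrite, assuming you keep the last few turns in memory:

```typescript
// Fold recent conversation turns into the embedding query so follow-ups
// like "what about pricing?" retrieve against the right topic.
type Turn = { role: "user" | "assistant"; content: string };

function buildRetrievalQuery(history: Turn[], question: string): string {
  const recent = history.slice(-4).map((t) => `${t.role}: ${t.content}`);
  return [...recent, `user: ${question}`].join("\n");
}
```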
Q: Can I search private docs instead of web?
Yes: replace the web crawl with a doc upload + embed pipeline. That's RAG over your own docs.
Q: How fast should results be?
First token in <2s. Full answer in <8s. Cache common queries.
Q: Is this better than Google?
For synthesis, yes. For navigational queries, no. Position it as "research assistant."
Conclusion
AI search is the defining product category of the decade. Build a vertical search engine (legal docs, research papers, your company wiki) and you have a moat. Learn semantic search patterns before scaling.