How to Build Semantic Search with AI in 3 Hours (2026 Guide)

Replace keyword search with semantic search using embeddings, pgvector, and hybrid BM25 + vector scoring — better results in an afternoon.

Misar Team · May 7, 2025 · 3 min read

Quick Answer

Embed your content with a modern embedding model, store the vectors in pgvector, and query by cosine similarity. For best results, combine the vector score with a BM25-style keyword score in a hybrid ranking. This typically beats traditional keyword search on roughly 80% of query types.

  • Time to implement: 3-5 hours for a basic version
  • Cost: ~$0.02 per 1M input tokens for embeddings
  • Expected recall improvement: 20-60% over keyword-only
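
To make "query with cosine similarity" concrete, here is a minimal TypeScript sketch of the measure itself. The function names and the toy 3-dimensional vectors are illustrative assumptions, not part of any library. In production pgvector does this calculation inside Postgres, where the `<=>` operator returns cosine distance (1 minus the similarity).

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance, the quantity pgvector's <=> operator returns.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Toy vectors (hypothetical; real embeddings have hundreds of dimensions).
const queryVector = [1, 0, 0];
console.log(cosineSimilarity(queryVector, [2, 0, 0]));  // 1  (same direction)
console.log(cosineSimilarity(queryVector, [0, 3, 0]));  // 0  (orthogonal)
console.log(cosineSimilarity(queryVector, [-1, 0, 0])); // -1 (opposite)
```

Because the measure ignores vector length, a short and a long document pointing in the same semantic direction score the same, which is why it is a common default for text embeddings.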

What You'll Need

  • Supabase (self-hosted) with pgvector extension
  • Embedding API (OpenAI-compatible)
  • Node.js / Next.js app
  • Content to search (articles, products, docs)

Steps

  1. Install pgvector. In Supabase SQL editor: create extension if not exists vector;.
  2. Add embedding column. alter table articles add column embedding vector(1536);. The dimension must match your embedding model's output size (1536 is the default for text-embedding-3-small).
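  3. Create index. For <100K rows: create index on articles using ivfflat (embedding vector_cosine_ops) with (lists = 100);. Build an IVFFlat index only once the table has data (that is, after the backfill in step 4), because its clusters are computed from the rows present at build time. For larger tables, use HNSW: create index on articles using hnsw (embedding vector_cosine_ops);.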
  4. Backfill embeddings. For each row, concatenate title + excerpt + body, call the embedding API, and store the returned vector. Send 100 rows per API call to cut round trips.
  5. Re-embed on insert/update. Use a Supabase Edge Function or an app-level hook to re-embed a row whenever its content changes.
  6. Query vector search. User types query → embed → SELECT *, 1 - (embedding <=> $1) as similarity FROM articles ORDER BY embedding <=> $1 LIMIT 20.
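  7. Add hybrid BM25. Postgres provides tsvector full-text search, ranked with ts_rank. Note that ts_rank is not true BM25, but it fills the same keyword-scoring role here. Normalize both scores to the same range, then combine: final_score = 0.5 * vector_score + 0.5 * bm25_score. Ask AI: "Generate a Postgres function that returns top-K hybrid-ranked results."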
  8. Optional: re-rank the top 20 with a cross-encoder. Cohere Rerank or a self-hosted bge-reranker can push irrelevant results out of the top 3.
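
Steps 4 and 7 can be sketched in application code as follows. This is a minimal illustration, not the article's implementation: the `chunk`, `minMax`, and `hybridRank` helpers and the `Candidate` shape are hypothetical names, and min-max normalization is one assumed way to make the vector and keyword scores comparable before applying the 0.5 / 0.5 weights.

```typescript
// Step 4: split rows into batches (the guide suggests 100 rows per API call).
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical shape of one search candidate carrying both raw scores.
interface Candidate {
  id: string;
  vectorScore: number;  // e.g. 1 - cosine distance from pgvector
  keywordScore: number; // e.g. a full-text rank; 0 when nothing matched
}

interface RankedCandidate extends Candidate {
  finalScore: number;
}

// Rescale values to [0, 1]; if all values are equal there is no signal, so return 0s.
function minMax(values: number[]): number[] {
  if (values.length === 0) return [];
  const min = Math.min(...values);
  const max = Math.max(...values);
  if (max === min) return values.map(() => 0);
  return values.map((v) => (v - min) / (max - min));
}

// Step 7: weighted sum of normalized scores, highest first, truncated to topK.
function hybridRank(candidates: Candidate[], topK: number, vectorWeight = 0.5): RankedCandidate[] {
  const v = minMax(candidates.map((c) => c.vectorScore));
  const k = minMax(candidates.map((c) => c.keywordScore));
  return candidates
    .map((c, i) => ({
      ...c,
      finalScore: vectorWeight * (v[i] ?? 0) + (1 - vectorWeight) * (k[i] ?? 0),
    }))
    .sort((a, b) => b.finalScore - a.finalScore)
    .slice(0, topK);
}

// Made-up scores: "b" is second on vectors but has the best keyword match.
const sampleCandidates: Candidate[] = [
  { id: "a", vectorScore: 0.9, keywordScore: 0.0 },
  { id: "b", vectorScore: 0.7, keywordScore: 0.8 },
  { id: "c", vectorScore: 0.5, keywordScore: 0.2 },
];
console.log(hybridRank(sampleCandidates, 2).map((c) => c.id)); // [ 'b', 'a' ]
```

Normalizing before mixing matters because raw keyword ranks and cosine similarities live on different scales; without it, one signal can silently dominate the weighted sum.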

Common Mistakes

  • Only indexing title: Embed full content for meaningful similarity.
  • Wrong dimension: If the model's output dimension does not match the column's declared dimension, the insert fails with an error.
  • No BM25 fallback: Pure vector search misses exact-match queries such as SKUs and names.
  • No re-ranking: Top-20 vector results often have 3-5 off-topic hits. Re-rank fixes this.
  • Not filtering by metadata: Always pre-filter by user/category/language, then vector search.
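
Two of the mistakes above, wrong dimension and missing metadata filters, can be guarded against in application code. The sketch below is an assumption-laden illustration: `toVectorLiteral` and `buildSearchQuery` are hypothetical helpers, and the `id`, `title`, and `language` columns are assumed to exist on the `articles` table. It produces a parameterized query object that can be passed to a Postgres client.

```typescript
// Dimension of the vector(1536) column created in step 2.
const EMBEDDING_DIM = 1536;

// Validate an embedding and format it as pgvector's text input, e.g. "[0.1,-2,3]".
function toVectorLiteral(embedding: number[], expectedDim: number = EMBEDDING_DIM): string {
  if (embedding.length !== expectedDim) {
    throw new Error(`Expected ${expectedDim} dimensions, got ${embedding.length}`);
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error("Embedding contains a non-finite value");
  }
  return `[${embedding.join(",")}]`;
}

interface SearchQuery {
  text: string;
  values: (string | number)[];
}

// Build a parameterized query that filters by metadata and orders by cosine distance.
// Column names (id, title, language) are assumptions for illustration.
function buildSearchQuery(
  queryEmbedding: number[],
  language: string,
  limit: number = 20,
  expectedDim: number = EMBEDDING_DIM,
): SearchQuery {
  const text =
    "SELECT id, title, 1 - (embedding <=> $1::vector) AS similarity " +
    "FROM articles WHERE language = $2 " +
    "ORDER BY embedding <=> $1::vector LIMIT $3";
  return { text, values: [toVectorLiteral(queryEmbedding, expectedDim), language, limit] };
}

// Toy 3-dimensional example so the output is readable.
const exampleQuery = buildSearchQuery([0.1, -2, 3], "en", 5, 3);
console.log(exampleQuery.values); // [ '[0.1,-2,3]', 'en', 5 ]
```

Checking the dimension before the query is sent turns a database error into a clear application error, which is easier to trace back to a changed embedding model.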

Top Tools

Tool | Best For | Price
pgvector | Postgres vector store | Free
text-embedding-3-small | Cheap & good | $0.02/M tokens
bge-m3 | Self-hosted embed | Free
Cohere Rerank-compat | Re-ranking | $1/1K
tsvector (pg) | Keyword ranking (BM25 stand-in) | Free

Conclusion

Semantic search is a one-afternoon upgrade that dramatically improves product experience. Add pgvector, embed your content, layer hybrid scoring, and watch bounce rates drop. No new infrastructure needed.
