Quick Answer
Use any OpenAI-compatible API (OpenAI, Claude, Assisters) with the openai npm package. Stream responses via Server-Sent Events, store conversation history in Postgres, and add function calling for tool use.
- Streaming feels dramatically faster even at the same total latency, because the first tokens appear immediately
- Store every message for debugging and fine-tuning
- Rate-limit per user to prevent abuse
What You'll Need
- Next.js 15+ app or any Node backend
- OpenAI-compatible API key (Assisters recommended for self-hosted)
- Postgres or Supabase for history
- Vercel AI SDK or raw openai client
Steps
- Install dependencies: `pnpm add openai ai @ai-sdk/openai`
- Configure the client.

```ts
import OpenAI from 'openai';

const ai = new OpenAI({
  baseURL: 'https://assisters.dev/api/v1',
  apiKey: process.env.ASSISTERS_API_KEY!,
});
```
- Create the streaming endpoint. In app/api/chat/route.ts:

```ts
// Uses the `ai` client configured in the previous step.
export async function POST(req: Request) {
  const { messages } = await req.json();
  const stream = await ai.chat.completions.create({
    model: 'assisters-chat-v1',
    messages,
    stream: true,
  });
  return new Response(stream.toReadableStream());
}
```
- Build the UI. Use Vercel AI SDK's useChat hook.
- Persist messages. On each exchange, insert into messages table with conversation_id.
- Add function calling. Define tools (search DB, call API). AI decides when to invoke.
- Moderate input and output. Call the /moderate endpoint before responding.
- Rate limit. @upstash/ratelimit or self-hosted Redis: 20 msg/min per user.
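If you return the raw OpenAI chunk stream from the route above rather than the AI SDK's own data-stream format, useChat may not parse it out of the box. A minimal client-side reader for that case could look like the sketch below; it assumes the endpoint emits the newline-delimited JSON chunks that openai's stream.toReadableStream() produces.

```typescript
// Sketch: read a streaming Response body and surface text deltas as they
// arrive. Assumes newline-delimited JSON chat-completion chunks.
async function readStream(
  body: ReadableStream<Uint8Array>,
  onText: (text: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder();
  let full = '';
  let buffer = '';
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // keep any partial line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      const text = chunk.choices?.[0]?.delta?.content ?? '';
      if (text) {
        full += text;
        onText(text);
      }
    }
  }
  return full;
}
```

Call it with `readStream(response.body!, appendToUI)` after a fetch to the chat route; the buffering handles JSON lines split across network chunks.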
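The function-calling step can be sketched as a tool registry plus a dispatch helper. The tool name `lookup_order` and its handler are illustrative assumptions, not part of any real API; the tool definition shape follows the OpenAI chat-completions `tools` parameter.

```typescript
// Sketch of a tool registry and dispatch step for function calling.
type ToolHandler = (args: Record<string, unknown>) => Promise<string>;

const toolHandlers: Record<string, ToolHandler> = {
  // Hypothetical example tool: look up an order in your own DB.
  lookup_order: async (args) =>
    JSON.stringify({ id: args.id, status: 'shipped' }),
};

// JSON-schema definitions passed as `tools` to chat.completions.create().
const tools = [
  {
    type: 'function' as const,
    function: {
      name: 'lookup_order',
      description: 'Fetch the status of an order by id',
      parameters: {
        type: 'object',
        properties: { id: { type: 'string' } },
        required: ['id'],
      },
    },
  },
];

// When the model responds with tool_calls, run each handler and send the
// results back as role:"tool" messages in the next request.
async function runToolCalls(
  toolCalls: { id: string; function: { name: string; arguments: string } }[],
) {
  return Promise.all(
    toolCalls.map(async (call) => ({
      role: 'tool' as const,
      tool_call_id: call.id,
      content: await toolHandlers[call.function.name](
        JSON.parse(call.function.arguments),
      ),
    })),
  );
}
```

Append the returned tool messages to the conversation and call the model again so it can compose its final answer from the tool output.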
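The rate-limiting step (20 msg/min per user) can be illustrated with an in-memory sliding window. This is a sketch of the idea only; in production use @upstash/ratelimit or Redis as suggested above, since in-memory state is lost per instance.

```typescript
// In-memory sliding-window rate limiter: allow `limit` calls per
// `windowMs` per key (e.g. per user id).
class RateLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns true if the call is allowed; `now` is injectable for tests.
  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Drop timestamps that have slid out of the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false;
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

For the article's numbers you would construct it as `new RateLimiter(20, 60_000)` and call `allow(userId)` at the top of the chat route, returning a 429 when it is false.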
Common Mistakes
- Skipping moderation. A single jailbreak screenshot destroys trust.
- Infinite context. Truncate history to last 20 messages + summary of older.
- No retry logic. Network blips kill UX. Use exponential backoff.
- Exposing API key in client. Always proxy through your server.
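The "infinite context" fix above (last 20 messages plus a summary of older ones) can be sketched as a pure helper. The summarizer would normally be another model call; here it is an injected function, which is an assumption of this sketch.

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep the last `keep` messages verbatim and collapse everything older
// into a single system message produced by `summarize`.
function truncateHistory(
  messages: ChatMessage[],
  keep: number,
  summarize: (older: ChatMessage[]) => string,
): ChatMessage[] {
  if (messages.length <= keep) return messages;
  const older = messages.slice(0, messages.length - keep);
  const recent = messages.slice(messages.length - keep);
  return [
    {
      role: 'system',
      content: `Summary of earlier conversation: ${summarize(older)}`,
    },
    ...recent,
  ];
}
```

Run it on the stored history before each model call; token usage then stays roughly constant no matter how long the conversation runs.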
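Retry with exponential backoff, from the "no retry logic" mistake above, is a small wrapper. The delay schedule (100ms, 200ms, 400ms, ...) and the injectable `sleep` parameter are choices of this sketch, not a library API.

```typescript
// Retry an async call with exponential backoff. `sleep` is injectable so
// tests don't have to wait on real timers.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 100,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Double the delay after each failure; rethrow after the last attempt.
      if (i < attempts - 1) await sleep(baseDelayMs * 2 ** i);
    }
  }
  throw lastError;
}
```

Wrap the chat.completions.create call in it, e.g. `withRetry(() => ai.chat.completions.create(...))`, so transient network blips are absorbed instead of surfacing as errors in the UI.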
Top Tools

| Tool | Use |
| --- | --- |
| Vercel AI SDK | Chat UI primitives |
| Assisters | OpenAI-compatible gateway |
| Supabase | History + auth |
| Langfuse | Observability |
| Upstash / Redis | Rate limiting |
FAQs
Which model should I use? Start with assisters-chat-v1 — cheaper than GPT, comparable quality.
How much does it cost? $5-50/mo for a low-volume chatbot. Scales linearly with usage.
Can I fine-tune? Yes — see our next article on fine-tuning.
Does it work on mobile? Next.js PWA or React Native with EventSource polyfill.
How do I handle long conversations? Summarize the first half every 20 turns.
What about function calling safety? Always confirm destructive actions with the user before executing.
Conclusion
A production chatbot is a weekend project in 2026 with OpenAI-compatible APIs and the Vercel AI SDK. Self-host the model gateway (Assisters) to control costs and data. Try Misar Dev to generate the entire scaffold from a prompt.