I built the tech stack behind ChatRAG to handle the growing number of clients who, starting about a year ago, came to me needing Retrieval-Augmented Generation (RAG) powered chatbots.
After a lot of trial and error, I settled on this tech stack for ChatRAG:
Frontend
- Next.js 16 (App Router): Latest version of the React framework, with server components and streaming
- React 19 + React Compiler: Automatic memoization, no more useMemo/useCallback hell
- Zustand: Lightweight state management (3kb vs Redux bloat, quick store sketch below)
- Tailwind CSS + Framer Motion: Styling + buttery animations
- Embeddable chat widget: drop a widget version of your RAG chatbot onto any web page, in addition to the ChatGPT/Claude-style web UI
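The Zustand bullet above in practice: a minimal sketch of a chat store, assuming a made-up state shape (messages, isStreaming) rather than ChatRAG's actual one.

```ts
import { create } from 'zustand';

// Hypothetical message shape for illustration only.
interface ChatMessage {
  id: string;
  role: 'user' | 'assistant';
  content: string;
}

interface ChatState {
  messages: ChatMessage[];
  isStreaming: boolean;
  addMessage: (msg: ChatMessage) => void;
  setStreaming: (streaming: boolean) => void;
  reset: () => void;
}

// No providers, reducers, or action-type boilerplate: just a hook.
export const useChatStore = create<ChatState>((set) => ({
  messages: [],
  isStreaming: false,
  addMessage: (msg) => set((s) => ({ messages: [...s.messages, msg] })),
  setStreaming: (isStreaming) => set({ isStreaming }),
  reset: () => set({ messages: [], isStreaming: false }),
}));
```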
AI / LLM Layer
- Vercel AI SDK 5 – Unified streaming interface for all providers (wiring sketch after this list)
- OpenRouter – Single API for Claude, GPT-4, DeepSeek, Gemini, etc.
- MCP (Model Context Protocol) – Tool use and function calling across models
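Roughly how that layer wires together: the sketch below points the AI SDK's OpenAI provider at OpenRouter's OpenAI-compatible endpoint and streams the result from a Next.js route handler. The route path, model ID, and message format are placeholders (a dedicated OpenRouter provider package exists too), so treat this as one possible wiring, not ChatRAG's actual route.

```ts
// app/api/chat/route.ts -- a rough sketch, not the real ChatRAG route.
import { streamText } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';

// OpenRouter speaks the OpenAI wire format, so the OpenAI provider can be
// pointed at it; .chat() pins the Chat Completions endpoint.
const openrouter = createOpenAI({
  baseURL: 'https://openrouter.ai/api/v1',
  apiKey: process.env.OPENROUTER_API_KEY,
});

export async function POST(req: Request) {
  // Assumes the client sends model-format messages ({ role, content }).
  const { messages } = await req.json();

  const result = streamText({
    model: openrouter.chat('anthropic/claude-3.5-sonnet'), // example model ID
    messages,
  });

  // Stream tokens back to the client as plain text.
  return result.toTextStreamResponse();
}
```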
RAG Pipeline
- Text chunking – documents split into overlapping chunks for retrieval (ingestion sketch after this list)
- OpenAI embeddings (1536 dim vectors) – Semantic search representation
- pgvector with HNSW indexes – Fast approximate nearest neighbor search directly in Postgres
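Condensed, the ingestion side looks something like this. The chunker is deliberately naive, and the `chunks` table and column names are assumptions for illustration, not ChatRAG's real schema.

```ts
import OpenAI from 'openai';
import { createClient } from '@supabase/supabase-js';

const openai = new OpenAI();
const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Naive fixed-size chunker with overlap; real pipelines usually split on
// semantic boundaries (headings, paragraphs) instead.
function chunk(text: string, size = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}

export async function ingest(documentId: string, text: string) {
  const pieces = chunk(text);

  // text-embedding-3-small returns 1536-dimensional vectors.
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: pieces,
  });

  // Hypothetical "chunks" table with a vector(1536) "embedding" column.
  const rows = pieces.map((content, i) => ({
    document_id: documentId,
    content,
    embedding: data[i].embedding,
  }));

  await supabase.from('chunks').insert(rows);
}
```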
Database & Auth
- Supabase (PostgreSQL) – Database, auth, realtime, storage in one
- GitHub & Google OAuth via Supabase – Third party sign in providers managed by Supabase
- Row Level Security – Multi-tenant data isolation at the DB level
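With Supabase handling the OAuth flow, sign-in is essentially one call; the callback URL below is a placeholder.

```ts
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// Works the same with provider: 'google'. Supabase handles the OAuth
// redirect dance; the callback URL here is just an example.
export async function signInWithGitHub() {
  const { error } = await supabase.auth.signInWithOAuth({
    provider: 'github',
    options: { redirectTo: 'https://your-app.example.com/auth/callback' },
  });
  if (error) throw error;
}
```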
Multi-Modal Generation
- Use your Fal.ai or Replicate.ai API keys to generate image, video, and 3D assets inside your RAG chatbot (sketch below)
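Generation then becomes a single API call from a tool handler. This sketch uses Replicate's Node client with an example model slug; the helper name and model choice are assumptions, not necessarily what ChatRAG ships.

```ts
import Replicate from 'replicate';

const replicate = new Replicate({ auth: process.env.REPLICATE_API_TOKEN });

// Model slug is an example -- swap in whatever image/video/3D model you use.
export async function generateImage(prompt: string) {
  const output = await replicate.run('black-forest-labs/flux-schnell', {
    input: { prompt },
  });
  return output; // typically a URL (or list of URLs) to the generated asset
}
```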
Integrations
- WhatsApp via Baileys – Chat with your RAG from WhatsApp (worker sketch after this list)
- Stripe / Polar – Payments and subscriptions
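The WhatsApp worker is essentially a Baileys socket that forwards inbound messages to the RAG backend and writes the answer back. A rough sketch, with answerWithRag left as a stub for whatever endpoint you actually call.

```ts
import makeWASocket, { useMultiFileAuthState } from '@whiskeysockets/baileys';

// Stand-in for a call to your RAG endpoint.
declare function answerWithRag(question: string): Promise<string>;

export async function startWhatsAppWorker() {
  const { state, saveCreds } = await useMultiFileAuthState('auth_state');
  const sock = makeWASocket({ auth: state });
  sock.ev.on('creds.update', saveCreds);

  sock.ev.on('messages.upsert', async ({ messages }) => {
    for (const msg of messages) {
      const jid = msg.key.remoteJid;
      const text = msg.message?.conversation;
      if (!jid || !text || msg.key.fromMe) continue;

      // Answer with the RAG pipeline and reply in the same chat.
      const answer = await answerWithRag(text);
      await sock.sendMessage(jid, { text: answer });
    }
  });
}
```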
Infra
- Fly.io / Koyeb – Edge deployment for WhatsApp workers
- Vercel – Frontend hosting with edge functions
My special sauce: pgvector HNSW indexes (m=64, ef_construction=200) give you sub-100ms semantic search without leaving Postgres. No Pinecone/Weaviate vendor lock-in.
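The index itself is one migration, and retrieval stays in SQL. The DDL below matches those parameters; match_chunks is a hypothetical SQL function (ordering by pgvector's cosine distance operator `<=>`) called through a Supabase RPC, since the real function isn't shown here.

```ts
// Index DDL (run as a Supabase migration), matching the params above:
//
//   CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops)
//     WITH (m = 64, ef_construction = 200);
//
// Retrieval then stays inside Postgres via an RPC. "match_chunks" is a
// hypothetical SQL function that orders rows by embedding <=> query.
import { createClient } from '@supabase/supabase-js';

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

export async function retrieve(queryEmbedding: number[], k = 5) {
  const { data, error } = await supabase.rpc('match_chunks', {
    query_embedding: queryEmbedding,
    match_count: k,
  });
  if (error) throw error;
  return data; // the k nearest chunks by cosine distance
}
```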
Single-tenant vs Multi-tenant RAG setups: Why not both?
ChatRAG supports both deployment modes depending on your use case:
Single-tenant
- One knowledge base → many users
- Ideal for celebrity/expert AI clones or brand-specific agents
- e.g., "Tony Robbins AI chatbot" or "Deepak Chopra AI"
- All users interact with the same dataset and the same personality layer
Multi-tenant
- Users get workspace/project isolation, each with its own knowledge base, project-level system prompt, and settings (RLS sketch after this list)
- Perfect for SaaS products and platform builders who want to offer AI chatbots to their customers
- Every customer gets private data and their own RAG
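Concretely, multi-tenant isolation can live entirely in the database. The policy sketched below (table, column, and policy names are made up for illustration) lets RLS scope every query to the caller's workspaces, so application code never has to remember the WHERE clause.

```ts
import { SupabaseClient } from '@supabase/supabase-js';

// Illustrative RLS setup (run as SQL migrations) -- names are assumptions:
//
//   ALTER TABLE chunks ENABLE ROW LEVEL SECURITY;
//
//   CREATE POLICY "members read their workspace chunks" ON chunks
//     FOR SELECT USING (
//       workspace_id IN (
//         SELECT workspace_id FROM workspace_members
//         WHERE user_id = auth.uid()
//       )
//     );

// The client must be created with the end user's access token so that
// auth.uid() resolves to that user inside the policy.
export async function listWorkspaceChunks(supabase: SupabaseClient) {
  const { data, error } = await supabase
    .from('chunks')
    .select('content')
    .limit(20);
  if (error) throw error;
  return data; // already filtered to the caller's workspaces by RLS
}
```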
My long-term vision is to keep evolving ChatRAG so I can eventually release a fully open-source version for everyone to build with.