AI Chatbot on Your Website (RAG)



RAG (Retrieval-Augmented Generation) AI chatbots use your own content to answer questions accurately. This guide will teach you how to build and deploy RAG-based chatbots for your website.
What is RAG?
RAG (Retrieval-Augmented Generation) combines:
- Retrieval: Finding relevant information from your content
- Augmentation: Adding context to AI prompts
- Generation: Creating accurate responses
Benefits:
- Accurate Answers: Based on your content
- Fewer Hallucinations: Responses are grounded in retrieved content
- Up-to-Date: Re-index your content and answers reflect it immediately
- Customizable: Answers draw on your own data and voice
- Cost-Effective: No retraining needed
How RAG Works
Process Flow:
User Question
↓
Vector Search (Find relevant content)
↓
Retrieve Context
↓
Augment AI Prompt with Context
↓
Generate Answer
↓
Return Response to User
Setting Up RAG Chatbot
Step 1: Prepare Your Content
Content Sources:
- Website pages
- Blog posts
- Documentation
- FAQs
- Product information
- Support articles
Content Preparation:
- Clean and format
- Remove HTML markup (see the sketch after this list)
- Structure properly
- Organize by topic
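For example, cleaning a page before indexing might look like the sketch below. It assumes the requests and beautifulsoup4 packages and a hypothetical URL; swap in your own crawler or CMS export as needed.

import requests
from bs4 import BeautifulSoup

def extract_page_text(url):
    # Fetch the page and parse the HTML (beautifulsoup4 is an assumption here)
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    # Drop navigation, scripts, and styling that would pollute the embeddings
    for tag in soup(["script", "style", "nav", "header", "footer"]):
        tag.decompose()

    # Collapse whitespace into clean, plain text
    return " ".join(soup.get_text(separator=" ").split())

# Hypothetical example URL
page_text = extract_page_text("https://example.com/docs/getting-started")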
Step 2: Choose RAG Platform
Options:
1. Custom Implementation:
- Full control
- Use OpenAI/Anthropic APIs
- Vector database (Pinecone, Weaviate)
- Custom development
2. Pre-built Solutions:
- Chatbase
- CustomGPT
- ChatBot.com
- Intercom AI
3. Open Source:
- LangChain
- LlamaIndex
- Haystack
Step 3: Set Up Vector Database
Popular Options:
1. Pinecone:
- Managed service
- Easy to use
- Good performance
- Pricing: Free tier available
2. Weaviate:
- Open source
- Self-hosted or cloud
- Good features
- Free option
3. Chroma:
- Open source
- Easy setup
- Good for small projects
- Free
Setup Example (Pinecone):

from pinecone import Pinecone, ServerlessSpec

# Initialize the client (Pinecone Python SDK v3+)
pc = Pinecone(api_key="your-api-key")

# Create an index; 1536 matches the text-embedding-3-small dimension
pc.create_index(
    name="website-content",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Connect to the index
index = pc.Index("website-content")
Step 4: Create Embeddings
Embedding Process:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

def create_embeddings(text_chunks):
    # Embed each chunk with OpenAI's text-embedding-3-small model
    embeddings = []
    for chunk in text_chunks:
        response = client.embeddings.create(
            model="text-embedding-3-small",
            input=chunk
        )
        embeddings.append(response.data[0].embedding)
    return embeddings
Chunking Content:

def chunk_text(text, chunk_size=500, overlap=50):
    # Split text into overlapping word-based chunks
    chunks = []
    words = text.split()
    for i in range(0, len(words), chunk_size - overlap):
        chunk = ' '.join(words[i:i + chunk_size])
        chunks.append(chunk)
    return chunks
Step 5: Store in Vector Database
Indexing Content:

def index_content(content_chunks, metadata):
    # Embed every chunk and upsert it with its metadata
    embeddings = create_embeddings(content_chunks)
    vectors = []
    for i, (chunk, embedding, meta) in enumerate(zip(content_chunks, embeddings, metadata)):
        vectors.append({
            'id': f'chunk-{i}',
            'values': embedding,
            'metadata': {
                'text': chunk,
                'source': meta['source'],
                'url': meta['url']
            }
        })
    index.upsert(vectors=vectors)
Step 6: Implement Retrieval
Search Function:

def retrieve_relevant_content(query, top_k=3):
    # Create query embedding
    query_embedding = create_embeddings([query])[0]

    # Search vector database
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )

    # Extract relevant content
    context = [match.metadata['text'] for match in results.matches]
    return '\n\n'.join(context)
Step 7: Generate Responses
RAG Implementation:

def generate_response(user_question):
    # Retrieve relevant context
    context = retrieve_relevant_content(user_question)

    # Create prompt with context
    prompt = f"""You are a helpful assistant for our website.
Use the following context to answer the question.
If the answer is not in the context, say you don't know.

Context:
{context}

Question: {user_question}

Answer:"""

    # Generate response
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )
    return response.choices[0].message.content
Frontend Implementation
React Component:

'use client';

import { useState } from 'react';

export default function RAGChatbot() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);

  const sendMessage = async () => {
    if (!input.trim()) return;

    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);

    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: userMessage.content })
      });
      const data = await response.json();
      setMessages(prev => [...prev, { role: 'assistant', content: data.response }]);
    } catch (error) {
      setMessages(prev => [...prev, { role: 'assistant', content: 'Sorry, I encountered an error.' }]);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="chatbot-container">
      <div className="chat-messages">
        {messages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            {msg.content}
          </div>
        ))}
        {loading && <div className="loading">Thinking...</div>}
      </div>
      <div className="chat-input">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Ask a question..."
        />
        <button onClick={sendMessage}>Send</button>
      </div>
    </div>
  );
}
API Implementation
Next.js API Route:

// app/api/chat/route.js
import { OpenAI } from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index('website-content');

export async function POST(request) {
  const { message } = await request.json();

  // Create query embedding
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: message
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // Search vector database
  const searchResults = await index.query({
    vector: queryEmbedding,
    topK: 3,
    includeMetadata: true
  });

  // Build context
  const context = searchResults.matches
    .map(match => match.metadata.text)
    .join('\n\n');

  // Generate response
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant. Answer questions based on the following context. If the answer is not in the context, say you don't know.

Context: ${context}`
      },
      {
        role: 'user',
        content: message
      }
    ],
    temperature: 0.7,
    max_tokens: 500
  });

  return Response.json({
    response: completion.choices[0].message.content
  });
}
Best Practices
1. Content Quality
Guidelines:
- Accurate information
- Well-structured
- Up-to-date
- Comprehensive coverage
- Clear and concise
2. Chunking Strategy
Best Practices:
- Optimal chunk size (300-500 tokens)
- Overlap between chunks
- Split at semantic boundaries, such as sentences (see the sketch after this list)
- Preserve context
- Test different sizes
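A sentence-aware variant of the earlier chunk_text helper is sketched below. It approximates chunk size in words rather than tokens and uses a simple regex sentence splitter; both are simplifying assumptions, and a tokenizer or NLP library would be more robust.

import re

def chunk_by_sentences(text, max_words=400, overlap_sentences=2):
    # Naive sentence splitter; an NLP library would handle edge cases better
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current, new_in_current = [], [], 0
    for sentence in sentences:
        current.append(sentence)
        new_in_current += 1
        if sum(len(s.split()) for s in current) >= max_words:
            chunks.append(' '.join(current))
            # Carry the last few sentences forward to preserve context across chunks
            current = current[-overlap_sentences:]
            new_in_current = 0
    if new_in_current:
        chunks.append(' '.join(current))
    return chunks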
3. Retrieval Optimization
Improvements:
- Use multiple search strategies
- Re-rank results
- Filter by relevance score (see the sketch after this list)
- Combine sources
- Test retrieval quality
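As a starting point, the sketch below over-fetches matches and keeps only those above a similarity score threshold before building the context. The threshold and fetch sizes are illustrative assumptions to tune against your own data, and a cross-encoder re-ranker could slot in where noted. It reuses the create_embeddings helper and index handle from earlier.

def retrieve_filtered(query, top_k=10, min_score=0.75, final_k=3):
    # Over-fetch, then keep only matches above a similarity threshold
    query_embedding = create_embeddings([query])[0]
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    matches = [m for m in results.matches if m.score >= min_score]

    # A re-ranking step (e.g. a cross-encoder) could reorder matches here;
    # this sketch simply keeps the highest-scoring ones
    matches = matches[:final_k]
    return '\n\n'.join(m.metadata['text'] for m in matches)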
4. Response Quality
Enhancements:
- Clear prompts
- Context limits
- Source citations
- Fallback responses (see the sketch after this list)
- Error handling
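One way to wire fallbacks and error handling around the earlier functions is sketched below; the wrapper name and fallback wording are assumptions.

FALLBACK_ANSWER = (
    "I couldn't find that in our content. "
    "Please reach out to our support team for help with this question."
)

def answer_safely(user_question):
    try:
        context = retrieve_relevant_content(user_question)
        if not context.strip():
            # Nothing relevant retrieved: don't let the model guess
            return FALLBACK_ANSWER
        # generate_response retrieves again internally; in production,
        # pass the context through instead of fetching it twice
        return generate_response(user_question)
    except Exception:
        # API or network failure: degrade gracefully instead of surfacing an error
        return FALLBACK_ANSWER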
Advanced Features
1. Source Citations
Implementation:

def generate_response_with_sources(user_question):
    # Assumes retrieve_relevant_content is extended to also return the source
    # URLs of the matched chunks, and generate_response to accept pre-fetched context
    context, sources = retrieve_relevant_content(user_question)
    answer = generate_response(user_question, context)

    # Return the answer together with its sources
    return {
        'answer': answer,
        'sources': sources
    }
2. Conversation History
Implementation:

def generate_with_history(user_question, conversation_history):
    # Retrieve context for the latest question and include prior turns
    context = retrieve_relevant_content(user_question)

    messages = [
        {"role": "system", "content": f"Context: {context}"},
        *conversation_history,
        {"role": "user", "content": user_question}
    ]

    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    return response.choices[0].message.content
3. Multi-Modal Support
Features:
- Image understanding
- Document processing
- PDF support (see the sketch after this list)
- Video transcripts
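For instance, PDF content can be pushed through the existing chunking and indexing pipeline. The sketch below assumes the pypdf package and the chunk_text and index_content helpers defined earlier.

from pypdf import PdfReader

def index_pdf(path, source_name, url):
    # Extract text page by page, then reuse the existing pipeline
    reader = PdfReader(path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)

    chunks = chunk_text(text)
    metadata = [{"source": source_name, "url": url} for _ in chunks]
    index_content(chunks, metadata)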
Deployment
Hosting Options:
1. Vercel (Next.js):
- Easy deployment
- Serverless functions
- Good performance
2. AWS:
- Lambda functions
- Scalable
- Enterprise-grade
3. Custom Server:
- Full control
- Custom setup
- More maintenance
Monitoring and Optimization
Key Metrics:
Performance:
- Response time
- Accuracy rate
- User satisfaction
- Error rate
Usage:
- Number of queries
- Popular questions
- Unanswered questions (see the logging sketch after this list)
- User feedback
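A lightweight way to start collecting these metrics is to log every interaction. The JSONL file and field names below are assumptions (an analytics database would serve the same purpose); matches is the list returned by the vector search.

import json
import time

def log_interaction(question, matches, answer, path="chat_log.jsonl"):
    # Append one JSON record per query so metrics can be computed offline
    record = {
        "timestamp": time.time(),
        "question": question,
        # A low top score often flags questions your content doesn't cover
        "top_score": matches[0].score if matches else None,
        "answer": answer,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")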
Optimization:
Improvements:
- Update content regularly
- Refine retrieval
- Improve prompts
- Add more context
- Test and iterate
Common Challenges
Challenge 1: Irrelevant Retrievals
Solutions:
- Improve embeddings
- Better chunking
- Re-ranking results
- Filter by relevance
Challenge 2: Incomplete Answers
Solutions:
- Increase context
- Better retrieval
- Improve prompts
- Add more sources
Challenge 3: Hallucinations
Solutions:
- Better context
- Source citations
- Prompt engineering
- Validation checks
Implementation Checklist
- [ ] Content prepared
- [ ] Platform chosen
- [ ] Vector database set up
- [ ] Embeddings created
- [ ] Content indexed
- [ ] Retrieval implemented
- [ ] Response generation working
- [ ] Frontend built
- [ ] API configured
- [ ] Testing completed
- [ ] Deployed
- [ ] Monitoring set up
Next Steps
- Prepare Content: Gather and clean content
- Choose Platform: Select RAG solution
- Set Up Infrastructure: Vector database and APIs
- Build Chatbot: Implement RAG system
- Test Thoroughly: Validate functionality
- Deploy: Launch on website
- Monitor and Optimize: Improve continuously
Thanks for reading. If you need more help, contact us at https://sdx.vision