AI Chatbot on Your Website (RAG)

SDX VISION

RAG (Retrieval-Augmented Generation) AI chatbots use your own content to answer questions accurately. This guide will teach you how to build and deploy RAG-based chatbots for your website.

What is RAG?

RAG (Retrieval-Augmented Generation) combines:

  • Retrieval: Finding relevant information from your content
  • Augmentation: Adding context to AI prompts
  • Generation: Creating accurate responses

Benefits:

  • Accurate Answers: Grounded in your own content
  • Fewer Hallucinations: Retrieved context keeps the model from guessing
  • Up-to-Date: Re-index your content and answers reflect it immediately
  • Customizable: Scoped to your data, products, and tone
  • Cost-Effective: No model fine-tuning or retraining needed

How RAG Works

Process Flow:

User Question
    ↓
Vector Search (Find relevant content)
    ↓
Retrieve Context
    ↓
Augment AI Prompt with Context
    ↓
Generate Answer
    ↓
Return Response to User

Setting Up a RAG Chatbot

Step 1: Prepare Your Content

Content Sources:

  • Website pages
  • Blog posts
  • Documentation
  • FAQs
  • Product information
  • Support articles

Content Preparation:

  • Clean and format
  • Remove HTML (see the sketch after this list)
  • Structure properly
  • Organize by topic
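
For example, "Remove HTML" could look like this minimal Python sketch, assuming the beautifulsoup4 package and that you already have each page's HTML as a string:

from bs4 import BeautifulSoup

def html_to_text(html):
    # Parse the page and drop tags that carry no readable content
    soup = BeautifulSoup(html, 'html.parser')
    for tag in soup(['script', 'style', 'nav', 'footer']):
        tag.decompose()
    # Collapse runs of whitespace into single spaces
    return ' '.join(soup.get_text(separator=' ').split())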

Step 2: Choose RAG Platform

Options:

1. Custom Implementation:

  • Full control
  • Use OpenAI/Anthropic APIs
  • Vector database (Pinecone, Weaviate)
  • Custom development

2. Pre-built Solutions:

  • Chatbase
  • CustomGPT
  • ChatBot.com
  • Intercom AI

3. Open Source:

  • LangChain
  • LlamaIndex
  • Haystack

Step 3: Set Up Vector Database

Popular Options:

1. Pinecone:

  • Managed service
  • Easy to use
  • Low-latency queries at scale
  • Pricing: Free tier available

2. Weaviate:

  • Open source
  • Self-hosted or cloud
  • Hybrid keyword and vector search
  • Free option

3. Chroma:

  • Open source
  • Easy setup (see the example after the Pinecone snippet)
  • Good for small projects
  • Free

Setup Example (Pinecone):

from pinecone import Pinecone, ServerlessSpec

# Initialize the client (the old pinecone.init() API is deprecated)
pc = Pinecone(api_key="your-api-key")

# Create a serverless index sized for OpenAI's 1536-dimension embeddings
pc.create_index(
    name="website-content",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

# Get a handle to the index
index = pc.Index("website-content")
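
If you'd rather start locally at no cost, the equivalent setup with Chroma is just as short (a sketch, assuming the chromadb package):

import chromadb

# Persistent local client; vectors are stored on disk
chroma_client = chromadb.PersistentClient(path="./chroma-data")

# Create the collection, or reopen it if it already exists
collection = chroma_client.get_or_create_collection("website-content")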

Step 4: Create Embeddings

Embedding Process:

from openai import OpenAI

client = OpenAI(api_key="your-api-key")

def create_embeddings(text_chunks):
    # The embeddings endpoint accepts a batch, so one call covers all chunks
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text_chunks
    )
    # Embeddings come back in the same order as the inputs
    return [item.embedding for item in response.data]

Chunking Content:

def chunk_text(text, chunk_size=500, overlap=50):
    # Slide a window of chunk_size words, overlapping by `overlap` words
    # so sentences at chunk borders keep their surrounding context
    words = text.split()
    chunks = []

    for i in range(0, len(words), chunk_size - overlap):
        chunks.append(' '.join(words[i:i + chunk_size]))

    return chunks
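
Putting the two together (page_text is a hypothetical variable holding the cleaned text of one page):

# page_text: cleaned text extracted from one page (hypothetical)
chunks = chunk_text(page_text)
embeddings = create_embeddings(chunks)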

Step 5: Store in Vector Database

Indexing Content:

def index_content(content_chunks, metadata):
    embeddings = create_embeddings(content_chunks)
    
    vectors = []
    for i, (chunk, embedding, meta) in enumerate(zip(content_chunks, embeddings, metadata)):
        vectors.append({
            'id': f'chunk-{i}',
            'values': embedding,
            'metadata': {
                'text': chunk,
                'source': meta['source'],
                'url': meta['url']
            }
        })
    
    # Upsert in batches (e.g. 100 vectors at a time) for large content sets
    index.upsert(vectors=vectors)

Step 6: Implement Retrieval

Search Function:

def retrieve_relevant_content(query, top_k=3):
    # Create query embedding
    query_embedding = create_embeddings([query])[0]
    
    # Search vector database
    results = index.query(
        vector=query_embedding,
        top_k=top_k,
        include_metadata=True
    )
    
    # Extract relevant content
    context = []
    for match in results.matches:
        context.append(match.metadata['text'])
    
    return '\n\n'.join(context)

Step 7: Generate Responses

RAG Implementation:

def generate_response(user_question, context=None):
    # Retrieve relevant context unless the caller already supplies it
    if context is None:
        context = retrieve_relevant_content(user_question)
    
    # Create prompt with context
    prompt = f"""You are a helpful assistant for our website. 
    Use the following context to answer the question. 
    If the answer is not in the context, say you don't know.
    
    Context:
    {context}
    
    Question: {user_question}
    
    Answer:"""
    
    # Generate response
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7,
        max_tokens=500
    )
    
    return response.choices[0].message.content

Frontend Implementation

React Component:

'use client';

import { useState } from 'react';

export default function RAGChatbot() {
  const [messages, setMessages] = useState([]);
  const [input, setInput] = useState('');
  const [loading, setLoading] = useState(false);
  
  const sendMessage = async () => {
    if (!input.trim() || loading) return;
    
    // Capture the text before clearing the input field
    const question = input;
    setMessages(prev => [...prev, { role: 'user', content: question }]);
    setInput('');
    setLoading(true);
    
    try {
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message: question })
      });
      
      const data = await response.json();
      setMessages(prev => [...prev, { role: 'assistant', content: data.response }]);
    } catch (error) {
      setMessages(prev => [...prev, { role: 'assistant', content: 'Sorry, I encountered an error.' }]);
    } finally {
      setLoading(false);
    }
  };
  
  return (
    <div className="chatbot-container">
      <div className="chat-messages">
        {messages.map((msg, idx) => (
          <div key={idx} className={`message ${msg.role}`}>
            {msg.content}
          </div>
        ))}
        {loading && <div className="loading">Thinking...</div>}
      </div>
      <div className="chat-input">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Ask a question..."
        />
        <button onClick={sendMessage}>Send</button>
      </div>
    </div>
  );
}

API Implementation

Next.js API Route:

// app/api/chat/route.js
import { OpenAI } from 'openai';
import { Pinecone } from '@pinecone-database/pinecone';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const pinecone = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
const index = pinecone.index('website-content');

export async function POST(request) {
  const { message } = await request.json();
  
  // Create query embedding
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: message
  });
  
  const queryEmbedding = embeddingResponse.data[0].embedding;
  
  // Search vector database
  const searchResults = await index.query({
    vector: queryEmbedding,
    topK: 3,
    includeMetadata: true
  });
  
  // Build context
  const context = searchResults.matches
    .map(match => match.metadata.text)
    .join('\n\n');
  
  // Generate response
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      {
        role: 'system',
        content: `You are a helpful assistant. Answer questions based on the following context. If the answer is not in the context, say you don't know.
        
        Context: ${context}`
      },
      {
        role: 'user',
        content: message
      }
    ],
    temperature: 0.7,
    max_tokens: 500
  });
  
  return Response.json({ 
    response: completion.choices[0].message.content 
  });
}

Best Practices

1. Content Quality

Guidelines:

  • Accurate information
  • Well-structured
  • Up-to-date
  • Comprehensive coverage
  • Clear and concise

2. Chunking Strategy

Best Practices:

  • Optimal chunk size (300-500 tokens)
  • Overlap between chunks
  • Semantic boundaries (see the sketch below)
  • Preserve context
  • Test different sizes
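
For example, a chunker that respects paragraph boundaries might look like this (a sketch; max_words is a tuning knob, not a standard value):

def chunk_by_paragraph(text, max_words=400):
    # Group whole paragraphs until the word budget is reached, so chunks
    # break at semantic boundaries instead of mid-sentence
    chunks, current, count = [], [], 0
    for para in text.split('\n\n'):
        words = len(para.split())
        if count + words > max_words and current:
            chunks.append('\n\n'.join(current))
            current, count = [], 0
        current.append(para)
        count += words
    if current:
        chunks.append('\n\n'.join(current))
    return chunks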

3. Retrieval Optimization

Improvements:

  • Use multiple search strategies
  • Re-rank results
  • Filter by relevance (sketch below)
  • Combine sources
  • Test retrieval quality
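
One simple relevance filter: over-fetch candidates, then keep only matches above a similarity threshold (a sketch; the 0.75 cutoff is an assumption to tune against your own data):

def retrieve_filtered(query, top_k=10, min_score=0.75):
    # Fetch more candidates than needed, then drop weak matches
    query_embedding = create_embeddings([query])[0]
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    relevant = [m for m in results.matches if m.score >= min_score]
    # Keep the best three of whatever survives the cutoff
    return '\n\n'.join(m.metadata['text'] for m in relevant[:3])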

4. Response Quality

Enhancements:

  • Clear prompts
  • Context limits
  • Source citations
  • Fallback responses (see the sketch below)
  • Error handling
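
A fallback can be as simple as a wrapper that catches failures (a sketch reusing generate_response from Step 7):

def safe_generate_response(user_question):
    # Never surface raw errors to users; return a graceful fallback instead
    try:
        return generate_response(user_question)
    except Exception:
        return "Sorry, I couldn't answer that right now. Please try again in a moment."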

Advanced Features

1. Source Citations

Implementation (generate_response from Step 7 accepts an optional pre-fetched context):

def retrieve_with_sources(query, top_k=3):
    # Same vector search as Step 6, but also keep each chunk's source URL
    query_embedding = create_embeddings([query])[0]
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)
    context = '\n\n'.join(match.metadata['text'] for match in results.matches)
    sources = [match.metadata['url'] for match in results.matches]
    return context, sources

def generate_response_with_sources(user_question):
    context, sources = retrieve_with_sources(user_question)
    
    response = generate_response(user_question, context)
    
    # Return the answer together with the URLs it was grounded in
    return {
        'answer': response,
        'sources': sources
    }

2. Conversation History

Implementation:

def generate_with_history(user_question, conversation_history):
    context = retrieve_relevant_content(user_question)
    
    messages = [
        {"role": "system", "content": f"Context: {context}"},
        *conversation_history,
        {"role": "user", "content": user_question}
    ]
    
    response = client.chat.completions.create(
        model="gpt-4",
        messages=messages
    )
    
    return response.choices[0].message.content

3. Multi-Modal Support

Features:

  • Image understanding
  • Document processing
  • PDF support (sketch below)
  • Video transcripts
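
For PDF support, one approach is to extract the text and feed it through the same chunking and indexing pipeline (a sketch, assuming the pypdf package):

from pypdf import PdfReader

def pdf_to_text(path):
    # Extract plain text page by page; scanned pages with no text layer yield ''
    reader = PdfReader(path)
    return '\n\n'.join(page.extract_text() or '' for page in reader.pages)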

Deployment

Hosting Options:

1. Vercel (Next.js):

  • Easy deployment
  • Serverless functions
  • Good performance

2. AWS:

  • Lambda functions
  • Scalable
  • Enterprise-grade

3. Custom Server:

  • Full control
  • Custom setup
  • More maintenance

Monitoring and Optimization

Key Metrics:

Performance:

  • Response time
  • Accuracy rate
  • User satisfaction
  • Error rate

Usage:

  • Number of queries
  • Popular questions
  • Unanswered questions (see the logging sketch below)
  • User feedback
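
A lightweight way to capture these numbers is to log one JSON record per query (a minimal sketch; the "don't know" check assumes the fallback wording from the Step 7 prompt):

import json
import time

def log_interaction(question, answer, duration_ms, path='chat_log.jsonl'):
    # Append one record per query for later analysis of volume,
    # popular questions, and answers the bot couldn't ground
    record = {
        'timestamp': time.time(),
        'question': question,
        'answer': answer,
        'duration_ms': duration_ms,
        'answered': "don't know" not in answer.lower()
    }
    with open(path, 'a') as f:
        f.write(json.dumps(record) + '\n')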

Optimization:

Improvements:

  • Update content regularly
  • Refine retrieval
  • Improve prompts
  • Add more context
  • Test and iterate

Common Challenges

Challenge 1: Irrelevant Retrievals

Solutions:

  • Improve embeddings
  • Better chunking
  • Re-ranking results
  • Filter by relevance

Challenge 2: Incomplete Answers

Solutions:

  • Increase context
  • Better retrieval
  • Improve prompts
  • Add more sources

Challenge 3: Hallucinations

Solutions:

  • Better context
  • Source citations
  • Prompt engineering
  • Validation checks

Implementation Checklist

  • [ ] Content prepared
  • [ ] Platform chosen
  • [ ] Vector database set up
  • [ ] Embeddings created
  • [ ] Content indexed
  • [ ] Retrieval implemented
  • [ ] Response generation working
  • [ ] Frontend built
  • [ ] API configured
  • [ ] Testing completed
  • [ ] Deployed
  • [ ] Monitoring set up

Next Steps

  1. Prepare Content: Gather and clean content
  2. Choose Platform: Select RAG solution
  3. Set Up Infrastructure: Vector database and APIs
  4. Build Chatbot: Implement RAG system
  5. Test Thoroughly: Validate functionality
  6. Deploy: Launch on website
  7. Monitor and Optimize: Improve continuously

Thanks for reading. If you need more help, contact us at https://sdx.vision.