Creating a Chatbot using Precision and NVIDIA AI Workbench

Revolutionize business information retrieval with a custom RAG chatbot powered by NVIDIA AI Workbench. Experience innovative AI-driven solutions for seamless data access.

Build Your Own Chatbot: Easy AI Information Retrieval

We’ve all experienced the frustration of searching for important information across multiple company systems. Whether in sales, HR, or support, finding the right data can be time-consuming and difficult. Now, imagine a world where you can simply ask a question and get the answer in seconds. That’s what a Retrieval-Augmented Generation (RAG) chatbot can do—instantly pulling the most relevant information from your company’s documents.

And the best part? With tools like NVIDIA AI Workbench, you can build a RAG chatbot on your personal PC—no massive infrastructure needed. In this article, we’ll walk through the process of setting up your own RAG chatbot, using an AI Workbench example project to show how AI can simplify information retrieval, and how you can scale it for business use.

Why Build a RAG Chatbot?

A RAG chatbot combines natural language generation with the ability to search through your internal data. Unlike traditional chatbots that rely solely on pre-trained models, a RAG chatbot retrieves relevant passages from your own documents before generating its response, so the answers are grounded in your data and more likely to be accurate and contextually relevant.
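To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It is not how the AI Workbench project implements retrieval: it swaps in a simple bag-of-words cosine similarity instead of a real embedding model, it stops at building the prompt rather than calling a language model, and the sample documents are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, documents, top_k=1):
    """Return up to top_k documents that share words with the question, best match first."""
    q_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(q_vec, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]


def build_prompt(question, context_docs):
    """Assemble the prompt a language model would receive (generation is not shown)."""
    context = "\n".join(f"- {d}" for d in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical company snippets, invented for illustration only.
DOCUMENTS = [
    "Employees accrue 1.5 vacation days per month of service.",
    "The support desk is open from 8 am to 6 pm on weekdays.",
    "Expense reports must be submitted within 30 days of purchase.",
]

question = "How many vacation days do employees accrue?"
context_docs = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, context_docs)
print(prompt)
```

The key design point is the order of operations: the question first selects supporting text, and only then is the model asked to answer from that text.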

This technology is well-suited for a variety of business applications, such as:

  • HR departments answering policy questions quickly.
  • Customer service teams instantly retrieving product details or FAQs.
  • Sales teams accessing real-time data to improve response times during negotiations.

By integrating company-specific data with the chatbot, your business can provide personalized, context-aware answers, saving time, reducing manual searches and improving the efficiency of internal communications. Learn more about building a hybrid RAG chatbot while maintaining data privacy with NVIDIA AI Workbench.

Getting Started: What You’ll Need

  1. NVIDIA AI Workbench: This platform helps you run AI models on any NVIDIA RTX GPU, locally or remotely. Download it here.
  2. AI Workbench Hybrid RAG Project: Customize this example project and build your own chatbot. Access it here.
  3. Company Data: You’ll need to upload the internal documents, knowledge bases, or data sources the chatbot will use to retrieve information.
  4. A capable workstation PC with an NVIDIA RTX GPU: Ideally, a Precision workstation powered by an NVIDIA RTX Ada Generation GPU for faster processing.

Step-by-Step Guide to Building Your RAG Chatbot

To kick off your own RAG chatbot locally, you can follow these steps:

    1. Set up your NVIDIA NGC account and get your NVCF API key.
    2. Install NVIDIA AI Workbench and add the API Key Secret.
    3. Run the RAG Client.
    4. Pick a model, pick an inference mode, and add your data!

1. Set up your NVIDIA NGC account and get your NVCF API key

Save the generated key somewhere secure for later steps.
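One common way to keep a key out of source files and logs is to read it from an environment variable and mask it whenever it is displayed. The sketch below illustrates that pattern only. The variable name `RAG_CHATBOT_API_KEY` and both helper functions are hypothetical, and AI Workbench manages its own secrets through its interface, as described in the next step.

```python
import os

# Hypothetical variable name, chosen for this sketch only.
API_KEY_ENV_VAR = "RAG_CHATBOT_API_KEY"


def load_api_key(env=os.environ, var_name=API_KEY_ENV_VAR):
    """Return the API key from the given mapping, or raise if it is missing or empty."""
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


def mask(key, visible=4):
    """Hide all but the last few characters so the key is safer to show in logs."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


# Demonstration with a fake key passed in a plain dict instead of the real environment.
demo_env = {API_KEY_ENV_VAR: "example-key-1234"}
print(mask(load_api_key(demo_env)))
```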

2. Install NVIDIA AI Workbench and add the API Key Secret

  • Clone the AI Workbench Hybrid RAG project from GitHub:

  • After the project completes building, this modal should pop up. You can input the key we generated earlier here:

  • If the modal does not pop up, you can input the API key by going to Environment→Secrets:

3. Run the RAG Client

  • Now, when you press “Open Chat,” this window to a chat interface should pop up:

4. Pick a model, pick an inference mode, and add your data

  • Select “Local System” as the inference mode. This helps ensure that your data, queries, and computations remain completely private and self-contained on your local system.
  • Then, select a model family.
  • In our case, we used an ungated model, Microsoft/Phi-3-mini-128k-instruct, with 4-bit quantization:

Now that you’ve set up your chatbot, you can add data and start making queries. Test the chatbot by asking real questions about the data you provided, choosing questions whose exact answers you already know.

This step can be expanded as your company’s data needs grow. Regularly updating the chatbot with new information ensures it remains relevant and useful.
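The testing advice above can be turned into a small, repeatable check that you rerun whenever the data changes. This is a sketch under stated assumptions: `fake_chatbot` is a stand-in with canned answers, since the way to call your actual chatbot depends on your setup, and the questions and expected phrases are invented.

```python
def run_checks(ask, cases):
    """Ask each question and record whether the expected phrase appears in the answer."""
    results = []
    for question, expected in cases:
        answer = ask(question)
        passed = expected.lower() in answer.lower()
        results.append((question, passed))
    return results


def fake_chatbot(question):
    """Stand-in for the real chatbot; returns canned answers for this demonstration."""
    canned = {
        "What is the return window?": "Customers can return items within 30 days.",
        "Who approves travel?": "I could not find that in the documents.",
    }
    return canned.get(question, "")


# Each case pairs a question with a phrase that a correct answer must contain.
CASES = [
    ("What is the return window?", "30 days"),
    ("Who approves travel?", "department manager"),
]

results = run_checks(fake_chatbot, CASES)
for question, passed in results:
    print("PASS" if passed else "FAIL", "-", question)
```

A substring match is a deliberately crude pass criterion. It works for facts such as dates and amounts, but free-form answers usually need a human review as well.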

How Do We Scale This? Dell’s Validated Designs

Scaling an AI solution like a RAG chatbot can feel daunting, especially as your business grows and your chatbot needs to handle more queries, data, and complex tasks. Dell Validated Designs are intended to simplify this process by providing a roadmap for scalability, performance optimization and security. Dell developed this free design guide so that you are set up to succeed in creating a secure, performant and scalable AI solution.

Here are some of the basic principles you will learn by reading the guide:

Modular and Scalable Architecture

When you’re starting out with your RAG chatbot, it may handle only a few queries at a time. But as usage increases, so will the demands on your infrastructure. Dell’s validated architecture lays out a modular approach that allows your system to grow without needing major reconfigurations.

    • Start small, scale as needed: Initially, deploy your chatbot on a personal PC or small server. As the number of users and queries grows, you can expand the system incrementally by adding resources.
    • Kubernetes for dynamic scaling: Use Kubernetes so your chatbot infrastructure can automatically scale to accommodate increased demand. Resources are then allocated efficiently as your system grows.
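To give a feel for what automatic scaling means in practice, the sketch below reproduces the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to the target metric, rounded up. The function name, the minimum and maximum bounds, and the sample numbers are assumptions for illustration, and the real autoscaler applies additional rules (such as a tolerance band and stabilization windows) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count by the metric ratio, round up, and clamp to the bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Two chatbot replicas averaging 90% utilization against a 60% target: scale out.
print(desired_replicas(2, 90, 60))

# Four replicas averaging 30% utilization against a 60% target: scale in.
print(desired_replicas(4, 30, 60))
```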

On-Premises Data Security

As your RAG chatbot grows, so does the importance of securing your private data. Dell’s architecture emphasizes on-premises deployment for businesses that need to keep sensitive data in-house, away from cloud-based systems.

    • Run your chatbot on local hardware: Dell’s architecture supports on-premises deployment, meaning you can grow your system on Dell PowerEdge servers or other local infrastructure, keeping your data protected.
    • Faster response times: Keeping your data and processing local avoids round trips to external services, which can help keep responses fast as the system grows.

Performance Optimization with NVIDIA RTX Professional GPUs

Dell recommends leveraging NVIDIA RTX GPUs so that your chatbot scales efficiently while maintaining high performance.

    • Incorporate NVIDIA RTX GPUs: Scaling with NVIDIA RTX GPUs helps your chatbot handle more data-intensive queries without slowdowns or latency issues. Depending on the model you select, if you want to run it locally you will want an NVIDIA RTX GPU with 12 GB or more of memory. You don’t have to run it locally, though; you can also run Workbench with NIMs or NeMo.
    • Optimize for heavier workloads: When scaling up your workload, you also want to consider the hardware supporting it. Dell has options that can be scaled up to your workload:
      • Tower Servers: Good for small to medium businesses needing a cost-effective, easy-to-manage solution. Perfect for starting small without the need for a full data center.
      • Rack Servers: Better for larger-scale operations with existing IT infrastructure (e.g. a server room).
      • AI Servers: Heavier duty, designed for intensive AI workloads like deep learning and large-scale data processing.
      • Edge Servers: Great for environments where data needs to be processed in real-time at remote locations. Useful for low-latency, distributed systems like IoT.
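The 12 GB guidance for local inference becomes easier to reason about with a rough estimate of how much memory a model's weights occupy: the number of parameters multiplied by the bits per weight, divided by eight to get bytes. The sketch below applies that arithmetic to a 3.8-billion-parameter model, the published size of Phi-3-mini. It counts weights only. Activations, the context cache, and framework overhead add to the total, so treat the result as a lower bound rather than a sizing tool.

```python
def weight_memory_gb(num_parameters, bits_per_weight):
    """Approximate memory for the model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_weight / 8 / 1e9


PHI3_MINI_PARAMETERS = 3.8e9  # published parameter count for Phi-3-mini

for bits in (16, 8, 4):
    gb = weight_memory_gb(PHI3_MINI_PARAMETERS, bits)
    print(f"{bits}-bit weights: about {gb:.1f} GB")
```

This is why 4-bit quantization matters on a workstation: it cuts the weight footprint to a quarter of the 16-bit figure, leaving room on the GPU for everything else the chatbot needs.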

Why RAG and Scaling Matter for Your Business

A RAG chatbot simplifies how your business accesses critical information. Whether in HR, sales, or customer service, a RAG chatbot ensures that the right data is always at your fingertips, instantly pulling relevant information from your internal systems. This reduces time spent searching for answers, improves decision-making, and enhances overall productivity.

However, building the chatbot is just the start. As your business grows, your chatbot needs to scale alongside it. That’s where Dell’s validated AI design principles come in. Dell offers a proven framework for expanding your chatbot efficiently and securely, with modular architecture that allows you to grow seamlessly, on-premises deployment to protect sensitive data, and NVIDIA RTX GPUs to maintain high performance even under heavier workloads.

By implementing these scalable strategies, your RAG chatbot will evolve from a simple information retrieval tool into a powerful AI system that grows with your company—delivering fast, accurate insights every step of the way.

About the Author: Logan Lawler

Logan has worked in various roles at Dell for 16 years, including sales, marketing, merchandising, services, and e-commerce. Before joining Dell, Logan grew up in Missouri and graduated from the University of Missouri (MIZ!). Logan lives in Round Rock with his wife Ally, daughter Calloway, and labradoodle Truman.