Hybrid RAG Advanced Walkthrough

User Guide (Latest)

Use this documentation if you are already familiar with NVIDIA AI Workbench to get an introduction to example projects available from NVIDIA. For the full list of quickstarts, see Quickstart Guides.

In this quickstart, you work with an NVIDIA-created example project for hybrid Retrieval-Augmented Generation (RAG). You learn how AI Workbench and NVIDIA’s catalog of example projects can help you get started developing RAG applications on your own choice of hardware.


This project is named “hybrid” because you can run inference locally on a Hugging Face TGI server, in the cloud using NVIDIA inference endpoints, or by using microservices like NVIDIA Inference Microservices (NIMs).

In this quickstart, you perform the following tasks:

  1. Clone the example project

  2. Enter your NVCF run key

  3. Start the Gradio chat app

  4. Upload documents

Before you can complete the steps in this quickstart, you need the following:

  • NVIDIA AI Workbench is installed on your local computer. For details, see Install AI Workbench.

  • You have an NVIDIA NGC account.

  • You need an NVCF run key to access NVIDIA endpoints. - Create a run key here. - Click Generate API Key and login with your NGC credentials if prompted.

  1. Navigate to this Github Repository managed by NVIDIA and fork the project to your own Github account.

  2. Open the AI Workbench desktop application and select the location where you want to work.

  3. Click Clone Project near the top right. The Clone Project window appears.

  4. In the Clone Project window, for Repository URL enter the URL of your forked repo. For Path, accept the default. Then click Clone.

  5. The repo clones and AI Workbench builds the container, which can take several minutes. While your project builds, you can do the following:

    1. You can track the build progress in the status bar of the AI Workbench window.

    2. You can see the logs for the build by clicking Building or Build Ready in the status bar.

    Wait until you see Build Ready in the status bar, and then proceed to the next section.

  1. Select Environment > Secrets > NVCF_RUN_KEY > Configure, and then enter your credentials. This allows you to access NVIDIA’s cloud endpoints for this quickstart.

  1. Click Open Chat. The Gradio chat app opens in a browser.

  2. In the Gradio chat app, select Set Up RAG Backend. This triggers a one-time build. After the build finishes, you are redirected to the settings panel.

  3. Select the Cloud option on the right-hand settings panel.

  4. Select a Model Family and Model.

  5. Submit a query.

You are now able to generate inference responses with the out-of-the-box cloud endpoints.

  1. To perform RAG, select Upload Documents Here tab from the right-hand panel of the chat UI.


    You may see a warning that the vector database is not ready yet. If so, wait a moment for it to finish warming up and try again.

  2. After the database starts, click the file field to select files to upload or drag and drop your documents.

  3. After the files are uploaded, the Toggle to Use Vector Database next to the text input box will turn on by default.

  4. You can now query your documents. Toggling the Use Vector Database toggle back off reverts the model back to basic, out-of-the-box inference.

  5. To change the endpoint, navigate back to the Inference Settings tab. Select a different model from the dropdown and continue querying.

  6. To clear out the database (irreversible!), select the Upload Documents Here tab on the right-hand panel and then Clear Database.

  7. To make edits to the gradio app or the backend logic, switch to the AI Workbench project window, select the dropdown from the top right and select Jupyterlab. You are now able to edit the source code.

    You can commit and push changes to your forked project repository on Github from the AI Workbench window.

Previous Customize Your Environment Quickstart (CLI)
Next NVIDIA AI Workbench Example Projects
© Copyright © 2024, NVIDIA Corporation. Last updated on Jun 10, 2024.