Retrieval Augmented Generation (RAG) Pipeline

(Latest Version)

All environments require additional Conda packages which can be installed with either the conda/environments/all_cuda-121_arch-x86_64.yaml or conda/environments/examples_cuda-121_arch-x86_64.yaml environment files. This example also requires the VDB upload pipeline to have been run previously.

Environment

Supported

Notes

Conda

Morpheus Docker Container Requires launching Milvus on the host
Morpheus Release Container Requires launching Milvus on the host
Dev Container

Purpose

The purpose of this example is to illustrate how a user could build a Retrieval Augmented Generation pipeline integrating informational feeds and an LLM service into a Morpheus pipeline. This example builds on the previous example completionpipeline, by adding the ability to augment LLM queries with context information from a knowledge base. Appending this context helps improve the responses from the LLM by providing additional background contextual and factual information which the LLM can pull from for its response.

Source Documents

  • In order for this pipeline to function correctly, a Vector Database must already have been populated with information that can be retrieved.

  • An example of populating a database is illustrated in VDB upload

  • This example assumes that pipeline has already been run to completion.

Vector Database Service

  • Any vector database can be used to store the resulting embedding and corresponding metadata.

  • It would be trivial to update the example to use Chroma or FAISS if needed.

  • For this example, we will be using Milvus since it is the default VDB used in the VDB upload pipeline.

Implementation and Design Decisions

Implementation Details

Original GitHub issue

In order to cater to the unique requirements of the Retrieval Augmented Generation (RAG) mechanism, the following steps were incorporated:

  • Embedding Retrieval: Before the LLM can make a completion, relevant context is retrieved from the Vector Database. This context is in the form of embeddings that represent pieces of information closely related to the query.

  • Context Augmentation: The retrieved context is then appended to the user’s query, enriching it with the necessary background to assist the LLM in generating a more informed completion.

  • LLM Query Execution: The augmented query is then sent to the LLM, which generates a response based on the combination of the original query and the appended context.

Rationale Behind Design Decisions

  • Using Milvus as VDB: Milvus offers scalable and efficient vector search capabilities, making it a natural choice for embedding retrieval in real-time.

  • Flexible LLM integration: The LLM is integrated into the pipeline as a standalone component, which allows for easy swapping of models and ensures that the pipeline can be easily extended to support multiple LLMs.

Standalone Morpheus Pipeline

The standalone Morpheus pipeline is built using the following components:

  • An InMemorySourceStage to hold the LLM queries in a DataFrame.

    • We supply a fixed set of questions in a source_df which are then processed by the LLMEngineStage

  • A DeserializationStage to convert the MessageMeta objects into ControlMessages needed by the LLMEngine.

    • New functionality was added to the DeserializeStage to support ControlMessages and add a default task to each message.

  • A LLMEngineStage then wraps the core LLMEngine functionality.

    • An ExtracterNode pulls the questions out of the DataFrame.

    • A RAGNode performs the retrieval and adds the context to the query using the supplied template and executes the LLM.

    • Finally, the responses are put back into the ControlMessage using a SimpleTaskHandler.

  • The pipeline concludes with an InMemorySink stage to store the results.

Note: For this to function correctly, the VDB upload pipeline must have been run previously.

Before running the pipeline, we need obtain service API keys for the following services:

Ensure that LFS files are downloaded

To retrieve models from LFS run the following:

Copy
Copied!
            

./scripts/fetch_data.py fetch models

Obtain an OpenAI API or NGC API Key

NGC

  • Follow the instructions here

  • We’ll refer to your NGC API key as ${NGC_API_KEY} for the rest of this document.

OpenAI

  • Follow the instructions here to obtain an OpenAI API key.

  • We’ll refer to your OpenAI API key as ${OPENAI_API_KEY} for the rest of this document.

Before running the pipeline, we need to ensure that the following services are running:

Milvus Service

  • Follow the instructions here to install and run a Milvus service.

Running the Morpheus Pipeline

The top level entrypoint to each of the LLM example pipelines is examples/llm/main.py. This script accepts a set of Options and a Pipeline to run. Baseline options are below, and for the purposes of this document we’ll assume a pipeline option of rag:

Run example (Standalone Pipeline):

Using NGC Nemo LLMs

Copy
Copied!
            

export NGC_API_KEY=[YOUR_KEY_HERE] python examples/llm/main.py rag pipeline

Using OpenAI LLM models

Copy
Copied!
            

export OPENAI_API_KEY=[YOUR_KEY_HERE] python examples/llm/main.py rag pipeline --llm_service=OpenAI --model_name=gpt-3.5-turbo

Previous Vector DataBase Upload (VDB Upload) Pipeline
Next Morpheus LLM Agents Pipeline
© Copyright 2024, NVIDIA. Last updated on Jul 8, 2024.