RAG Pipeline Overview

Retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from Large Language Models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly improve the quality of generative AI systems.

A basic text RAG pipeline includes the following steps:

  • Indexing: Building an index from the text corpus with an embedder for retrieval.

  • Generating: Given a query, embed it with the same embedder, then search the index for related context embeddings. The retrieved contexts are concatenated with the query and fed into the LLM to generate an answer (both steps are sketched below).

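The two steps can be illustrated with a minimal, self-contained sketch. The hashed bag-of-words embedder, the two-sentence corpus, and the query below are illustrative stand-ins only; in the NeMo pipeline the embedder is a trained model such as BERT and the assembled prompt is passed to a NeMo LLM such as GPT or Llama.

import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy hashed bag-of-words embedder, for illustration only;
    # a real pipeline uses a trained embedder such as BERT.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Indexing: embed every chunk of the corpus and store the vectors.
corpus = ["NeMo supports a BERT embedder.", "RAG retrieves context before generation."]
index = np.stack([embed(chunk) for chunk in corpus])

# Generating: embed the query, retrieve the most similar chunks, and
# concatenate them with the query to form the prompt for the LLM.
query = "Which embedder does NeMo support?"
scores = index @ embed(query)  # cosine similarity; vectors are unit-normalized
top_chunks = [corpus[i] for i in np.argsort(scores)[::-1][:2]]
prompt = "Context:\n" + "\n".join(top_chunks) + f"\n\nQuestion: {query}\nAnswer:"
# prompt is then fed to the LLM (for example, a NeMo GPT or Llama model).
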
This pipeline can be improved with intermediate steps that increase the efficiency and quality of the answers, using techniques such as reranking, Adaptive RAG, and Self-RAG. Currently, the NeMo Framework supports the components and procedure of a basic RAG pipeline, with plans to support more models and RAG features.

Figure: RAG pipeline
RAG Pipeline Feature Support:
  • Embedding/indexing corpus

  • Retrieving/generating text

  • Future support: Reranking after Retrieval

Embedder Support:
  • BERT embedder

  • Future support for text-based RAG: GPT embedder

  • Future support for multimodal-based RAG: CLIP embedder

Language Model Support:
  • GPT

  • Llama

  • Future support for text-based RAG: Gemma, Mistral, Mamba

  • Future support for multimodal-based RAG: NeVa (Visual Language Model)

To orchestrate the RAG steps, the NeMo RAG pipeline uses LlamaIndex, which can be installed with:

!pip install llama-index
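
Once installed, the indexing and generation steps map directly onto LlamaIndex's high-level API. The sketch below is a generic LlamaIndex example rather than the NeMo pipeline itself: the data/ directory and the query string are placeholders, and unless an embedding model and LLM are explicitly configured, LlamaIndex falls back to its defaults (the NeMo pipeline wires in its own embedder and LLM).

# Requires llama-index 0.10+, where the top-level classes live in llama_index.core.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Indexing: load documents from a local "data/" directory (placeholder path)
# and build a vector index with the configured embedding model.
documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

# Generating: the query engine embeds the query, retrieves related chunks,
# and asks the configured LLM to answer with those chunks as context.
query_engine = index.as_query_engine()
response = query_engine.query("What does the corpus say about RAG?")
print(response)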