RAG Pipeline Overview
Retrieval-augmented generation (RAG) is a technique that combines information retrieval with a set of carefully designed system prompts to provide more accurate, up-to-date, and contextually relevant responses from Large Language Models (LLMs). By incorporating data from various sources such as relational databases, unstructured document repositories, internet data streams, and media news feeds, RAG can significantly improve the quality of generative AI systems.
A basic text RAG pipeline includes the following steps:
Indexing: Build an index from a text corpus with an embedder for retrieval.
Generating: Given a query, embed it with the same embedder, then search the index for related context embeddings. The retrieved contexts are concatenated with the query and fed into the LLM to generate an answer.
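For illustration only, here is a minimal sketch of these two steps using LlamaIndex, the orchestration library the NeMo RAG pipeline uses (installed at the end of this section). The corpus path, query text, and top-k value are placeholders; LlamaIndex falls back to its default embedding model and LLM unless others are configured:

```python
# Minimal two-step RAG sketch with LlamaIndex (assumes llama-index >= 0.10).
# "./corpus" and the query string are placeholders, not NeMo defaults.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Indexing: load the text corpus and build a vector index with the embedder.
documents = SimpleDirectoryReader("./corpus").load_data()
index = VectorStoreIndex.from_documents(documents)

# Generating: embed the query, retrieve the top-k related contexts, and
# concatenate them with the query as input to the LLM.
query_engine = index.as_query_engine(similarity_top_k=2)
print(query_engine.query("What is retrieval-augmented generation?"))
```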
This basic pipeline can be extended with intermediate steps that improve efficiency and answer quality, using techniques such as reranking, Adaptive RAG, and Self-RAG. Currently, the NeMo Framework supports the components and procedure of a basic RAG pipeline, with plans to support more models and RAG features.
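Reranking is listed as future support below, but as a sketch of how such an intermediate step typically slots into a LlamaIndex pipeline, a cross-encoder reranker can rescore a wide set of retrieved contexts before generation. This reuses the `index` from the sketch above; the reranker model name is an example, not a NeMo default, and it requires the sentence-transformers package:

```python
# Sketch only: rescore retrieved contexts with a cross-encoder reranker
# before generation. Requires sentence-transformers to be installed.
from llama_index.core.postprocessor import SentenceTransformerRerank

reranker = SentenceTransformerRerank(
    model="cross-encoder/ms-marco-MiniLM-L-6-v2",  # example cross-encoder
    top_n=3,  # keep the 3 best contexts after rescoring
)
# Retrieve a wide candidate set (top 10), then rerank it down to 3.
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[reranker],
)
```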
- RAG Pipeline Feature Support:
  - Embedding/indexing a corpus
  - Retrieving contexts and generating text
  - Future support: reranking after retrieval
- Embedder Support:
  - BERT embedder
  - Future support for text-based RAG: GPT embedder
  - Future support for multimodal RAG: CLIP embedder
- Language Model Support:
  - GPT
  - Llama
  - Future support for text-based RAG: Gemma, Mistral, Mamba
  - Future support for multimodal RAG: NeVa (Visual Language Model)
To orchestrate the RAG steps, the NeMo RAG pipeline uses LlamaIndex, which can be installed with:
!pip install llama-index
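Once installed, LlamaIndex can also be pointed at a specific embedding model. As a sketch, a BERT-style Hugging Face embedder can be set globally; this assumes the separate llama-index-embeddings-huggingface package, and the model name below is an example rather than a NeMo default:

```python
# Sketch: swap in a BERT-style Hugging Face embedder globally.
# Assumes `pip install llama-index-embeddings-huggingface`.
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Example model name; any Hugging Face embedding model can be used.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Indexes built after this point use the configured embedder.
```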