Generate Text with RAG#
After indexing the corpus data for the RAG pipeline, you can retrieve the relevant contexts to augment text generation. Given a query or prompt, you first extract embeddings from the query using the same embedder that was used during index creation. Next, you retrieve the k nearest-neighbor contexts related to the query from the index. Finally, you concatenate these contexts with the query and feed the resulting prompt to the NeMo LLM.
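Conceptually, the flow reduces to a few calls. The sketch below uses hypothetical helper names (embed_query, knn_search, generate) to stand in for the embedder, index, and LLM interfaces; it is not the NeMo or LlamaIndex API.

    # Conceptual sketch of retrieve-then-generate; all method names are
    # hypothetical stand-ins for the embedder, index, and LLM interfaces.
    def rag_generate(query, embedder, index, llm, k=5):
        # 1. Embed the query with the same embedder used to build the index.
        query_embedding = embedder.embed_query(query)
        # 2. Retrieve the k nearest-neighbor contexts from the index.
        contexts = index.knn_search(query_embedding, k=k)
        # 3. Concatenate the retrieved contexts with the query.
        prompt = "\n\n".join(contexts) + "\n\n" + query
        # 4. Feed the augmented prompt to the LLM.
        return llm.generate(prompt)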
The supplied script runs the entire process using the LlamaIndex library, a trained NeMo embedding model, and a NeMo LLM.

In this procedure, you use the same NeMo embedding model that was used when indexing the corpus data. Additionally, you work with a NeMo LLM, such as GPT, Llama, or Gemma. For instructions on training an LLM in NeMo, see NVIDIA GPT.
Run Text Generation on a Base Model#
This section provides basic instructions for running text generation on a base model.
To initiate text generation:
1. Assign the stages variable in conf/config.yaml to "rag_generating".

2. Define the configuration for text generation by setting the rag_generating variable to <llm_model_type>/<model_size>, which selects a specific LLM config file. For example, setting the rag_generating variable to gpt3/7b selects the configuration file conf/rag_generating/gpt3/7b.yaml, which corresponds to a GPT-type LLM with 7 billion parameters. This mapping is illustrated in the sketch after this list.
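The mapping from the selection to a file path is purely mechanical; a minimal sketch (the variable names here are illustrative, not part of the launcher code):

    # Illustrative only: the rag_generating value is a relative path that
    # resolves to a YAML file under conf/rag_generating/.
    llm_model_type, model_size = "gpt3", "7b"
    config_file = f"conf/rag_generating/{llm_model_type}/{model_size}.yaml"
    print(config_file)  # conf/rag_generating/gpt3/7b.yaml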
Run Text Generation on a Slurm Cluster#
To run text generation on a Slurm cluster:
1. Set the run configuration in conf/rag_generating/gpt3/7b.yaml to define the job-specific configuration:

    run:
      name: ${.eval_name}_${.model_train_name}
      time_limit: "4:00:00"
      dependency: "singleton"
      nodes: 1
      ntasks_per_node: 1
      eval_name: rag_generating
      model_train_name: rag_pipeline
      results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}
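The ${.key} entries above are OmegaConf relative interpolations, which the launcher configs use through Hydra. A minimal standalone sketch of how run.name resolves (assuming OmegaConf resolution semantics, not the launcher itself):

    # Sketch: OmegaConf resolves ${.eval_name} and ${.model_train_name}
    # relative to the enclosing "run" node when the value is accessed.
    from omegaconf import OmegaConf

    cfg = OmegaConf.create({
        "run": {
            "name": "${.eval_name}_${.model_train_name}",
            "eval_name": "rag_generating",
            "model_train_name": "rag_pipeline",
        }
    })
    print(cfg.run.name)  # rag_generating_rag_pipeline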
2. Set the path for the embedder checkpoint and saved index. Ensure that the values correspond to the same embedder model used in the indexing step.

    indexing:
      embedder:
        model_path: /path/to/embedder_checkpoint_dir
      index_path: /path/to/saved_index
3. Set the values for text generation, including the LLM checkpoint path, query, temperature, and number of tokens to generate:

    generating:
      llm:
        model_path: /path/to/llm_checkpoint_dir
      inference:
        tokens_to_generate: 50
        greedy: False
        temperature: 1.0
      query: 'Which art schools did I apply to?'

Based on the query, relevant contexts are retrieved from the corpus to augment text generation.
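The greedy and temperature settings above control decoding. A small pure-Python illustration of the general idea (not the NeMo inference code):

    # Sketch of greedy vs. temperature sampling over next-token logits.
    import math
    import random

    def pick_next_token(logits, greedy=False, temperature=1.0):
        if greedy:
            # Greedy decoding: always take the most probable token.
            return max(range(len(logits)), key=logits.__getitem__)
        # Sampling: rescale logits by temperature, then draw from the softmax.
        scaled = [l / temperature for l in logits]
        peak = max(scaled)  # subtract the max for numerical stability
        weights = [math.exp(s - peak) for s in scaled]
        return random.choices(range(len(logits)), weights=weights)[0]

Lower temperatures sharpen the distribution toward the greedy choice; temperature 1.0 samples from the model's unmodified distribution.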
4. Set the configuration for the Slurm cluster in conf/cluster/bcm.yaml:

    partition: null
    account: null
    exclusive: True
    gpus_per_task: null
    gpus_per_node: 8
    mem: 0
    job_name_prefix: 'nemo-megatron-'
    srun_args:
      - "--no-container-mount-home"
5. Set the stages section of conf/config.yaml:

    stages:
      - rag_generating
6. Run the Python script:

    python3 main.py
All the configurations are read from conf/config.yaml and conf/rag_generating/gpt3/7b.yaml.