Generate Text with RAG


After indexing the corpus data for the RAG pipeline, you can retrieve relevant contexts to augment text generation. Given a query or prompt, you first compute an embedding of the query using the same embedder that was used during index creation. Next, you retrieve the k-nearest-neighbor contexts for the query from the index. Finally, you concatenate these contexts with the query and feed the resulting prompt to the NeMo LLM.
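Conceptually, this flow looks like the following minimal sketch. It is illustrative only: plain Python with NumPy, where embed_fn and generate_fn are hypothetical stand-ins for the NeMo embedder and LLM calls, not the supplied script's actual API.

    import numpy as np

    def retrieve_contexts(query_emb, index_embs, corpus, k=3):
        # Rank corpus passages by cosine similarity to the query
        # embedding and return the k nearest neighbors.
        sims = index_embs @ query_emb / (
            np.linalg.norm(index_embs, axis=1) * np.linalg.norm(query_emb)
        )
        top_k = np.argsort(sims)[::-1][:k]
        return [corpus[i] for i in top_k]

    def build_prompt(contexts, query):
        # Concatenate the retrieved contexts with the query to form
        # the augmented prompt fed to the LLM.
        return "Context:\n" + "\n\n".join(contexts) + f"\n\nQuestion: {query}\nAnswer:"

    # Hypothetical usage; embed_fn and generate_fn stand in for the
    # NeMo embedder and LLM invoked by the supplied script:
    # query_emb = embed_fn(query)
    # contexts = retrieve_contexts(query_emb, index_embs, corpus)
    # answer = generate_fn(build_prompt(contexts, query))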

The supplied script runs the entire process with the LlamaIndex library, using a trained NeMo embedding model and a NeMo LLM.

In this procedure, you use the same NeMo embedding model that was used when indexing the corpus data. Additionally, you’ll work with a NeMo LLM, such as GPT, Llama, or Gemma. For instructions on training an LLM in NeMo, see NVIDIA GPT.

This section provides basic instructions for running text generation on a Slurm cluster.

To initiate text generation:

  1. Assign the stages variable in conf/config.yaml to “rag_generating”.

  2. Define the configuration for text generation by setting the rag_generating variable to <llm_model_type>/<model_size>, which maps to a specific LLM config file path.

For example, setting the rag_generating variable to gpt3/7b selects the configuration file conf/rag_generating/gpt3/7b.yaml, which corresponds to a GPT-type LLM with 7 billion parameters.
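For instance, the relevant entries in conf/config.yaml might look like the following. This is a sketch of the Hydra-style defaults list the launcher uses; the surrounding keys in your copy of the file may differ.

    defaults:
      - rag_generating: gpt3/7b   # selects conf/rag_generating/gpt3/7b.yaml

    stages:
      - rag_generating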

Run Text Generation on a Slurm Cluster

To run text generation on a Slurm cluster:

  1. Set the run configuration in conf/rag_generating/gpt3/7b.yaml to define the job-specific configuration:

    run:
      name: ${.eval_name}_${.model_train_name}
      time_limit: "4:00:00"
      dependency: "singleton"
      nodes: 1
      ntasks_per_node: 1
      eval_name: rag_generating
      model_train_name: rag_pipeline
      results_dir: ${base_results_dir}/${.model_train_name}/${.eval_name}


  2. Set the paths for the embedder checkpoint and the saved index. Ensure that these values correspond to the same embedder model used in the indexing step.

    indexing:
      embedder:
        model_path: /path/to/embedder_checkpoint_dir
      index_path: /path/to/saved_index


  3. Set the values for text generation, including the LLM checkpoint path, query, temperature, and number of tokens to generate:

    generating:
      llm:
        model_path: /path/to/llm_checkpoint_dir
      inference:
        tokens_to_generate: 50
        greedy: False
        temperature: 1.0
      query: 'Which art schools did I apply to?'


    Based on the query, relevant contexts will be retrieved from the corpus to augment text generation.
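    For illustration, the augmented prompt assembled from the retrieved contexts and the query above typically takes a shape like this (the script's exact template may differ):

    Context:
    <retrieved passage 1>

    <retrieved passage 2>

    Question: Which art schools did I apply to?
    Answer: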

  4. Set the configuration for the Slurm cluster in conf/cluster/bcm.yaml:

    partition: null
    account: null
    exclusive: True
    gpus_per_task: null
    gpus_per_node: 8
    mem: 0
    job_name_prefix: 'nemo-megatron-'
    srun_args:
      - "--no-container-mount-home"
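
    For example, on a typical cluster you would fill in the partition and account fields with your site-specific values (the names below are placeholders):

    partition: batch          # example; use your cluster's partition
    account: your_account     # example; use your Slurm account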


  5. Set the stages section of conf/config.yaml:

    stages:
      - rag_generating


  6. Run the Python script:

    python3 main.py


    All of the configuration values are read from conf/config.yaml and conf/rag_generating/gpt3/7b.yaml.
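
    Because the launcher configuration is Hydra-based, you can typically also override individual values on the command line instead of editing the files. The override paths below are assumptions based on the configuration layout shown above:

    python3 main.py stages=[rag_generating] rag_generating=gpt3/7b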
