Llama Reranker#
The Llama Reranker is a specialized model designed to improve the quality of retrieved documents in retrieval-augmented generation (RAG) systems. It scores and reranks candidate documents by their relevance to a given query, so that the most relevant information is prioritized in the final results. This reranking step is crucial for the accuracy and relevance of the retrieved information used in RAG applications.
Based on the Llama architecture, this reranker leverages the model’s strong language understanding capabilities and transformer-based architecture to effectively process and evaluate document-query pairs. The model utilizes self-attention mechanisms to capture complex relationships between queries and documents, enabling it to make nuanced relevance judgments.
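To make the scoring-and-reranking step concrete, the sketch below sorts a list of candidate documents by a relevance score and keeps the top results. The score function is a hypothetical stand-in for whatever scoring interface your deployed reranker exposes; it is not part of the NeMo API.
from typing import Callable, List, Tuple

def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    # Score every (query, document) pair, then sort by descending relevance.
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]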
NeMo 2.0 Fine-Tuning Recipes#
Note
The fine-tuning recipes use the SpecterReRankerDataModule for the data argument. You can replace the SpecterReRankerDataModule with your custom dataset. Alternatively, you can use the CustomReRankerDataModule for loading your own dataset. Details are provided in the later sections.
To import the Hugging Face model and convert it to the NeMo 2.0 format, run the following command (this only needs to be done once):
from nemo.collections import llm

llm.import_ckpt(
    model=llm.ReRankerModel(llm.Llama32Reranker1BConfig()),
    source='hf://meta-llama/Llama-3.2-1B',
)
The above command imports the original Llama 3.2 1B CausalLM model, converts it to the NeMo 2.0 format, and adds a newly initialized reranker head.
Similarly, to convert an existing Hugging Face Llama Reranker model, use the same command but point the source at that reranker model instead.
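For example, the call below keeps the same model configuration but points the source at an existing reranker checkpoint; the repository name shown is a hypothetical placeholder, not a published model:
from nemo.collections import llm

# 'your-org/your-llama-reranker' is a hypothetical Hugging Face repository id;
# replace it with the actual reranker model you want to convert.
llm.import_ckpt(
    model=llm.ReRankerModel(llm.Llama32Reranker1BConfig()),
    source='hf://your-org/your-llama-reranker',
)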
We provide an example below of how to invoke the default recipe and override the data argument:
from nemo.collections import llm
recipe = llm.recipes.llama_reranker_1b.finetune_recipe(
    name="llama_reranker_finetuning",
    resume_path="path/to/original/reranker/ckpt",
    num_nodes=1,
    num_gpus_per_node=8,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader
Note
The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.

resume_path is the path to the NeMo 2.0 Reranker model checkpoint after conversion.
We provide a customized dataset class specifically for RAG training. To load your customized dataset, provide a .json file with question, pos_doc, and neg_doc fields and configure the data module as shown below (a sketch of the expected .json entries follows the data module configuration):
import nemo_run as run
from nemo.collections import llm

dataloader = run.Config(
    llm.CustomReRankerDataModule,
    data_root='path/to/json/data',
    seq_length=sequence_length,
    micro_batch_size=micro_batch_size,
    global_batch_size=global_batch_size,
    tokenizer=tokenizer,
    num_workers=num_dataloader_workers,
    dataset_kwargs=model.get_dataset_kwargs,
)

# Override the data argument of the recipe
recipe.data = dataloader
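For reference, the short sketch below writes a tiny example file with the three fields named above. The exact schema is an assumption here (for instance, whether pos_doc and neg_doc hold a single string or a list of strings, and the file name under data_root), so verify it against the CustomReRankerDataModule documentation before training.
import json

# Hypothetical example entries; the field names follow the documentation above,
# but the value types and file layout are assumptions.
examples = [
    {
        "question": "What does the reranker score?",
        "pos_doc": "The reranker scores query-document pairs by relevance.",
        "neg_doc": "Llama models use rotary positional embeddings.",
    },
]

# 'train.json' under the data_root directory is a hypothetical file name.
with open("path/to/json/data/train.json", "w") as f:
    json.dump(examples, f, indent=2)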
Once your final configuration is ready, you can execute it on any of the NeMo-Run supported executors. The simplest option is the local executor, which runs the fine-tuning locally in a separate process. You can use it as follows:
import nemo_run as run
run.run(recipe, executor=run.LocalExecutor())
Alternatively, you can run it directly in the same Python process as follows:
run.run(recipe, direct=True)
Once training is finished, you can optionally convert the model to a Hugging Face model using the script below:
from pathlib import Path

from nemo.collections import llm

llm.export_ckpt(
    path=Path('path/to/finetuned/ckpt'),
    target='hf',
    output_path=Path('path/to/converted/hf/ckpt'),
)
| Recipe | Status |
|---|---|
| Llama3.2 1B Reranker Model | Yes |
| Llama3.2 500M Reranker Model | Yes |