Llama Reranker#
The Llama Reranker is a specialized model designed to improve the quality of retrieved documents in retrieval-augmented generation (RAG) systems. It scores and reranks candidate documents by their relevance to a given query, so that the most relevant information is prioritized in the final results. This reranking step is crucial for the accuracy and relevance of the retrieved information used in RAG applications.
Based on the Llama architecture, this reranker leverages the model’s strong language understanding capabilities and transformer-based architecture to effectively process and evaluate document-query pairs. The model utilizes self-attention mechanisms to capture complex relationships between queries and documents, enabling it to make nuanced relevance judgments.
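To make the scoring-and-reranking step concrete, the sketch below sorts a list of candidate documents by a relevance score and keeps the top results. The score function is a hypothetical stand-in for whatever scoring interface your deployed reranker exposes; it is not part of the NeMo API.
from typing import Callable, List, Tuple

def rerank(
    query: str,
    candidates: List[str],
    score: Callable[[str, str], float],
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    # Score every (query, document) pair, then sort by descending relevance.
    scored = [(doc, score(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]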
NeMo 2.0 Fine-Tuning Recipes#
Note
The fine-tuning recipes use the SpecterReRankerDataModule for the data argument. You can replace the SpecterReRankerDataModule with your custom dataset. Alternatively, you can use the CustomReRankerDataModule for loading your own dataset. Details are provided in the later sections.
To import the Hugging Face model and convert it to the NeMo 2.0 format, run the following command (this only needs to be done once):
from nemo.collections import llm

llm.import_ckpt(
    model=llm.ReRankerModel(llm.Llama32Reranker1BConfig()),
    source='hf://meta-llama/Llama-3.2-1B',
)
The above command imports the original Llama 3.2 1B CausalLM model, converts it to the NeMo 2.0 format, and adds a newly initialized reranker head.
Similarly, to convert an existing Hugging Face Llama Reranker model, use the same command but point the source at that reranker model instead.
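For example, the call below keeps the same model configuration but points the source at an existing reranker checkpoint; the repository name shown is a hypothetical placeholder, not a published model:
from nemo.collections import llm

# 'your-org/your-llama-reranker' is a hypothetical Hugging Face repository id;
# replace it with the actual reranker model you want to convert.
llm.import_ckpt(
    model=llm.ReRankerModel(llm.Llama32Reranker1BConfig()),
    source='hf://your-org/your-llama-reranker',
)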
We provide an example below of how to invoke the default recipe and override the data argument:
from nemo.collections import llm
recipe = llm.recipes.llama_reranker_1b.finetune_recipe(
    name="llama_reranker_finetuning",
    resume_path="path/to/original/reranker/ckpt",
    num_nodes=1,
    num_gpus_per_node=8,
)

# # To override the data argument
# dataloader = a_function_that_configures_your_custom_dataset(
#     gbs=gbs,
#     mbs=mbs,
#     seq_length=recipe.model.config.seq_length,
# )
# recipe.data = dataloader
Note
The configuration in the recipes is done using the NeMo-Run run.Config and run.Partial configuration objects. Please review the NeMo-Run documentation to learn more about its configuration and execution system.

resume_path is the path to the NeMo 2.0 Reranker model checkpoint after conversion.
We provide a customized dataset class specifically for RAG training. To load your customized dataset, provide a .json file with question, pos_doc, and neg_doc fields and configure the data module as shown below (a sketch of the expected .json entries follows the data module configuration):
import nemo_run as run
from nemo.collections import llm

dataloader = run.Config(
    llm.CustomReRankerDataModule,
    data_root='path/to/json/data',
    seq_length=sequence_length,
    micro_batch_size=micro_batch_size,
    global_batch_size=global_batch_size,
    tokenizer=tokenizer,
    num_workers=num_dataloader_workers,
    dataset_kwargs=model.get_dataset_kwargs,
)

# Override the data argument of the recipe
recipe.data = dataloader
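For reference, the short sketch below writes a tiny example file with the three fields named above. The exact schema is an assumption here (for instance, whether pos_doc and neg_doc hold a single string or a list of strings, and the file name under data_root), so verify it against the CustomReRankerDataModule documentation before training.
import json

# Hypothetical example entries; the field names follow the documentation above,
# but the value types and file layout are assumptions.
examples = [
    {
        "question": "What does the reranker score?",
        "pos_doc": "The reranker scores query-document pairs by relevance.",
        "neg_doc": "Llama models use rotary positional embeddings.",
    },
]

# 'train.json' under the data_root directory is a hypothetical file name.
with open("path/to/json/data/train.json", "w") as f:
    json.dump(examples, f, indent=2)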
Once your final configuration is ready, you can execute it on any of the NeMo-Run supported executors. The simplest option is the local executor, which runs the fine-tuning locally in a separate process. You can use it as follows:
import nemo_run as run
run.run(recipe, executor=run.LocalExecutor())
Alternatively, you can run it directly in the same Python process as follows:
run.run(recipe, direct=True)
Once training is finished, you can optionally convert the model to a Hugging Face model using the script below:
from pathlib import Path

from nemo.collections import llm

llm.export_ckpt(
    path=Path('path/to/finetuned/ckpt'),
    target='hf',
    output_path=Path('path/to/converted/hf/ckpt'),
)
| Recipe | Status |
|---|---|
| Llama3.2 1B Reranker Model | Yes |
| Llama3.2 500M Reranker Model | Yes |