Use Reward Models with NVIDIA NIM for LLMs
NIM for LLMs supports deploying large language reward models, in addition to chat and completion models. Reward models are commonly used to score the outputs of another large language model, either to fine-tune that model or to filter synthetically generated datasets.
When deploying NIMs with reward models, specify the environment variables NIM_REWARD_MODEL, NIM_REWARD_LOGITS_RANGE, and NIM_REWARD_MODEL_STRING as described in Environment Variables.
To send text to a reward model, use the chat/completions endpoint. Include the prompt that was used to generate the text as the first user message content, and include the generated response as the assistant message content. The reward model scores the provided response, taking into account the query that produced it. For example:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

original_query = "I am going to Paris, what should I see?"
original_response = "Ah, Paris, the City of Light! There are so many amazing things to see and do in this beautiful city ..."

# The user message carries the original prompt; the assistant message
# carries the generated text to be scored.
messages = [
    {"role": "user", "content": original_query},
    {"role": "assistant", "content": original_response},
]

response = client.chat.completions.create(
    model="nvidia/nemotron-4-340b-reward",
    messages=messages,
    stream=False,
)
The response from NIM includes attribute and score pairs in the message content, where a regular chat completion model would return its generated text. The attributes that a reward model scores responses on are specific to each reward model. Reward models trained on the HelpSteer dataset (such as nemotron-4-340b-reward) score responses according to the following metrics:
Helpfulness
Correctness
Coherence
Complexity
Verbosity
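As a quick check, you can print the raw message content before parsing it. The example string in the comment below is illustrative, reconstructed from the parsed scores shown at the end of this section; the actual attribute names and values depend on the reward model and your inputs.

# Inspect the raw reward output. It is a comma-separated list of
# attribute:score pairs, for example (illustrative):
# "helpfulness:1.2578125,correctness:0.43359375,coherence:3.34375,..."
print(response.choices[0].message.content)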
You can use this response in your downstream applications. For example, you can parse the scores into a Python dictionary:
# Parse the comma-separated "attribute:score" pairs into a dictionary of floats.
response_content = response.choices[0].message.content
reward_pairs = [pair.split(":") for pair in response_content.split(",")]
reward_dict = {attribute: float(score) for attribute, score in reward_pairs}
print(reward_dict)
# Prints:
# {'helpfulness': 1.2578125, 'correctness': 0.43359375, 'coherence': 3.34375, 'complexity': 0.045166015625, 'verbosity': 0.6953125}
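As a downstream usage sketch, the same request and parsing logic can be wrapped in a helper to filter a synthetic dataset by reward score. The helper name, the synthetic_pairs placeholder, and the 0.5 helpfulness threshold below are hypothetical examples, not values from the NIM documentation.

def score_response(client, prompt: str, completion: str) -> dict:
    """Score a prompt/completion pair with the reward model and return its attribute scores."""
    result = client.chat.completions.create(
        model="nvidia/nemotron-4-340b-reward",
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ],
        stream=False,
    )
    content = result.choices[0].message.content
    return {attr: float(score) for attr, score in (pair.split(":") for pair in content.split(","))}

# Keep only the synthetic examples whose helpfulness score clears a chosen
# threshold (0.5 is an arbitrary illustrative value, not a recommendation).
synthetic_pairs = [(original_query, original_response)]  # replace with your generated data
filtered = [
    (prompt, completion)
    for prompt, completion in synthetic_pairs
    if score_response(client, prompt, completion)["helpfulness"] >= 0.5
]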