Use Reward Models with NVIDIA NIM for LLMs
NIM for LLMs supports deploying large language reward models, in addition to chat and completion models. Reward models are commonly used to score the outputs of another large language model, either to fine-tune that model or to filter synthetically generated datasets.
When deploying NIMs with reward models, specify the environment variables NIM_REWARD_MODEL, NIM_REWARD_LOGITS_RANGE, and NIM_REWARD_MODEL_STRING as described in Environment Variables.
To send text to a reward model, use the chat/completions endpoint. Include the prompt that was used to generate the text as the first user message content, and include the generated response as the assistant message content. The reward model scores the provided response, taking into account the query that produced it. For example:
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

original_query = "I am going to Paris, what should I see?"
original_response = "Ah, Paris, the City of Light! There are so many amazing things to see and do in this beautiful city ..."

# The user message carries the original prompt; the assistant message
# carries the generated text to be scored.
messages = [
    {"role": "user", "content": original_query},
    {"role": "assistant", "content": original_response},
]

response = client.chat.completions.create(
    model="nvidia/nemotron-4-340b-reward",
    messages=messages,
    stream=False,
)
The response from NIM includes attribute and score pairs in the message content, where a regular chat completion model would return its generated text. The attributes that a reward model scores responses on are specific to each reward model. Reward models trained on the HelpSteer dataset (such as nemotron-4-340b-reward) score responses according to the following metrics:
Helpfulness
Correctness
Coherence
Complexity
Verbosity
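As a quick check, you can print the raw message content before parsing it. The example string in the comment below is illustrative, reconstructed from the parsed scores shown at the end of this section; the actual attribute names and values depend on the reward model and your inputs.

# Inspect the raw reward output. It is a comma-separated list of
# attribute:score pairs, for example (illustrative):
# "helpfulness:1.2578125,correctness:0.43359375,coherence:3.34375,..."
print(response.choices[0].message.content)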
You can use this response in your downstream applications. For example, you can parse the scores into a Python dictionary:
# Parse the comma-separated "attribute:score" pairs into a dictionary of floats.
response_content = response.choices[0].message.content
reward_pairs = [pair.split(":") for pair in response_content.split(",")]
reward_dict = {attribute: float(score) for attribute, score in reward_pairs}
print(reward_dict)
# Prints:
# {'helpfulness': 1.2578125, 'correctness': 0.43359375, 'coherence': 3.34375, 'complexity': 0.045166015625, 'verbosity': 0.6953125}
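As a downstream usage sketch, the same request and parsing logic can be wrapped in a helper to filter a synthetic dataset by reward score. The helper name, the synthetic_pairs placeholder, and the 0.5 helpfulness threshold below are hypothetical examples, not values from the NIM documentation.

def score_response(client, prompt: str, completion: str) -> dict:
    """Score a prompt/completion pair with the reward model and return its attribute scores."""
    result = client.chat.completions.create(
        model="nvidia/nemotron-4-340b-reward",
        messages=[
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ],
        stream=False,
    )
    content = result.choices[0].message.content
    return {attr: float(score) for attr, score in (pair.split(":") for pair in content.split(","))}

# Keep only the synthetic examples whose helpfulness score clears a chosen
# threshold (0.5 is an arbitrary illustrative value, not a recommendation).
synthetic_pairs = [(original_query, original_response)]  # replace with your generated data
filtered = [
    (prompt, completion)
    for prompt, completion in synthetic_pairs
    if score_response(client, prompt, completion)["helpfulness"] >= 0.5
]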