Passing System Prompts for Advanced Reasoning
To use advanced reasoning features such as "detailed thinking" with Llama Nemotron models, you must pass a custom system prompt to the model. You can do this by including a `system_instruction` field in the `params.extra` section of your evaluation configuration.
Ensure that you properly escape single quotes around the system instruction, as shown in the example. Also, when enabling Llama Nemotron reasoning, use the recommended `temperature` (0.6) and `top_p` (0.95). Finally, increase `max_tokens` to account for the extra reasoning tokens in the output.
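The shell quoting in the curl example can be confusing: the entire `-d` payload is wrapped in single quotes, so each literal single quote inside the JSON must be written as `'\''` (close the quoted string, emit an escaped quote, reopen the string). A minimal sketch of the pattern:

```shell
# Inside a single-quoted shell string, a literal single quote is written
# as '\'' — close quote, backslash-escaped quote, reopen quote.
printf '%s\n' '{"system_instruction": "'\''detailed thinking on'\''"}'
# prints: {"system_instruction": "'detailed thinking on'"}
```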
```shell
curl -X "POST" "${EVALUATOR_SERVICE_URL}/v1/evaluation/configs" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '
{
  "type": "gsm8k",
  "name": "my-gsm8k-config-nemotron",
  "namespace": "my-organization",
  "tasks": {
    "gsm8k_cot_llama": {
      "type": "gsm8k_cot_llama"
    }
  },
  "params": {
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 2048,
    "stop": ["<|eot|>"],
    "extra": {
      "system_instruction": "'\''detailed thinking on'\''",
      "use_greedy": false,
      "top_k": 1,
      "num_fewshot": 8,
      "batch_size": 16,
      "bootstrap_iters": 100000,
      "dataset_seed": 42,
      "apply_chat_template": true,
      "fewshot_as_multiturn": true,
      "tokenizer_backend": "hf",
      "hf_token": "<my-token>",
      "tokenizer": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
    }
  }
}'
```
```python
import requests

data = {
    "type": "gsm8k",
    "tasks": {
        "gsm8k_cot_zeroshot": {
            "type": "gsm8k_cot_zeroshot"
        }
    },
    "name": "my-gsm8k-config-nemotron",
    "namespace": "my-organization",
    "params": {
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 2048,
        "stop": ["<|eot|>"],
        "extra": {
            "system_instruction": "'detailed thinking on'",
            "use_greedy": False,
            "top_k": 1,
            "num_fewshot": 8,
            "batch_size": 16,
            "bootstrap_iters": 100000,
            "dataset_seed": 42,
            "apply_chat_template": True,
            "fewshot_as_multiturn": True,
            "tokenizer_backend": "hf",
            "hf_token": "<my-token>",
            "tokenizer": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
        }
    }
}

endpoint = f"{EVALUATOR_SERVICE_URL}/v1/evaluation/configs"
response = requests.post(endpoint, json=data).json()
```
Note

The value of `system_instruction` can be set to `"detailed thinking on"` or `"detailed thinking off"`, depending on your evaluation needs. This field is passed directly to the model as the system prompt, enabling or disabling detailed reasoning as supported by Llama Nemotron models.
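If you toggle reasoning frequently, the recommended settings above can be wrapped in a small helper so the payload never has to be hand-edited. This is only a sketch with a hypothetical function name, not part of the Evaluator API; the literal single quotes inside the instruction string match the examples above:

```python
def nemotron_reasoning_params(detailed_thinking: bool, max_tokens: int = 2048) -> dict:
    """Build the `params` block with detailed thinking toggled on or off.

    Hypothetical helper: it only assembles the recommended Llama Nemotron
    settings shown in this section.
    """
    mode = "on" if detailed_thinking else "off"
    return {
        "temperature": 0.6,        # recommended for Llama Nemotron reasoning
        "top_p": 0.95,             # recommended for Llama Nemotron reasoning
        "max_tokens": max_tokens,  # headroom for the extra reasoning tokens
        "extra": {
            # Note the literal single quotes around the instruction value.
            "system_instruction": f"'detailed thinking {mode}'",
        },
    }

params = nemotron_reasoning_params(detailed_thinking=True)
```

The returned dict can then be merged into the `"params"` section of the configuration body before POSTing it.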