Passing System Prompts for Advanced Reasoning

To use advanced reasoning features such as “detailed thinking” with Llama Nemotron models, you must pass a custom system prompt to the model. Do this by including a system_instruction field in the params.extra section of your evaluation configuration.
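Only the placement of the field matters; stripped to the relevant keys, the configuration nests like this:

{
    "params": {
        "extra": {
            "system_instruction": "detailed thinking on"
        }
    }
}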

Ensure that you properly escape the single quotes around the system instruction, as shown in the example. When enabling Llama Nemotron reasoning, also use the recommended sampling values of temperature (0.6) and top_p (0.95), and increase max_tokens to account for the extra reasoning tokens in the output.
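Because the curl payload below is itself wrapped in single quotes, each literal single quote inside it must be written as '\'' (close the string, emit an escaped quote, reopen the string). A quick sanity check in the shell:

# Each literal ' inside a single-quoted shell string becomes '\''
echo '"system_instruction": "'\''detailed thinking on'\''"'
# prints: "system_instruction": "'detailed thinking on'"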

curl -X "POST" "${EVALUATOR_SERVICE_URL}/v1/evaluation/configs" \
    -H 'accept: application/json' \
    -H 'Content-Type: application/json' \
    -d '
    {
        "type": "gsm8k",
        "name": "my-gsm8k-config-nemotron",
        "namespace": "my-organization",
        "tasks": {
            "gsm8k_cot_llama": {
                "type": "gsm8k_cot_llama"
            }
        },
        "params": {
            "temperature": 0.6,
            "top_p": 0.95,
            "max_tokens": 2048,
            "stop": ["<|eot|>"],
            "extra": {
                "system_instruction": "'detailed thinking on'",
                "use_greedy": false,
                "top_k": 1,
                "num_fewshot": 8,
                "batch_size": 16,
                "bootstrap_iters": 100000,
                "dataset_seed": 42,        
                "apply_chat_template": true,
                "fewshot_as_multiturn": true,
                "tokenizer_backend": "hf",
                "hf_token": "<my-token>",
                "tokenizer": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
            }
        }
    }'
A similar configuration can be created from Python with the requests library:

import os
import requests

# Assumes the evaluator service URL is exported in the environment,
# mirroring the ${EVALUATOR_SERVICE_URL} variable used in the curl example.
EVALUATOR_SERVICE_URL = os.environ["EVALUATOR_SERVICE_URL"]

data = {
    "type": "gsm8k",
    "tasks": {
         "gsm8k_cot_zeroshot": {
             "type": "gsm8k_cot_zeroshot"
         }
    },
    "name": "my-gsm8k-config-nemotron",
    "namespace": "my-organization",
    "params": {
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 2048,
        "stop": ["<|eot|>"],
        "extra": {
            "system_instruction": "'detailed thinking on'",
            "use_greedy": False,
            "top_k": 1,
            "num_fewshot": 8,
            "batch_size": 16,
            "bootstrap_iters": 100000,
            "dataset_seed": 42,        
            "apply_chat_template": True,
            "fewshot_as_multiturn": True,
            "tokenizer_backend": "hf",
            "hf_token": "<my-token>",
            "tokenizer": "nvidia/Llama-3.1-Nemotron-Nano-8B-v1"
        }
    }
}

endpoint = f"{EVALUATOR_SERVICE_URL}/v1/evaluation/configs"
response = requests.post(endpoint, json=data).json()
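If the request succeeds, the service returns the stored configuration as JSON. A quick way to inspect what was created (a minimal sketch; the exact fields in the response depend on the evaluator service):

import json

print(json.dumps(response, indent=2))  # pretty-print the created config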

Note

The value of system_instruction can be set to "detailed thinking on" or "detailed thinking off", depending on your evaluation needs. This field is passed directly to the model as the system prompt, enabling or disabling detailed reasoning on Llama Nemotron models that support it.
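In practice, the instruction becomes the first message of the chat request sent to the model. A rough sketch of the resulting payload (assuming an OpenAI-compatible chat format; the evaluator service handles this wiring for you):

# Sketch of the chat payload the model ultimately receives (assumed format).
messages = [
    {"role": "system", "content": "detailed thinking on"},  # from system_instruction
    {"role": "user", "content": "<GSM8K question>"},        # filled in by the benchmark
]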