Prompt Templates

Llama 3.1 Nemotron Safety Guard Multilingual 8B V1 NIM performs content safety checks for user input and LLM response output. The checks can ensure that the dialog input and output are consistent with rules specified as part of the system prompt.

The prompt template for content safety consists of the following key sections:

Task

The task provides the instruction that directs the model to perform content moderation. In the sample prompts provided by NVIDIA, the task includes a list of unsafe content categories and instructs the LLM to determine whether the content violates the safety policy and, if so, which unsafe content categories it belongs to.
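The category taxonomy can be kept as data and joined into the task text when the prompt is assembled. The following Python sketch shows one way to do this; the function and variable names are illustrative, and only the first few category labels from the sample template below are listed.

```python
# Illustrative sketch: assemble the task section of the safety prompt from a
# list of category labels (only a subset is shown; see the full taxonomy in
# the sample template below).
UNSAFE_CATEGORIES = [
    "Violence",
    "Sexual",
    "Criminal Planning/Confessions",
    "Guns and Illegal Weapons",
    # ...continue with the remaining labels from the sample template below
]

TASK_INSTRUCTION = (
    "Task: Check if there is unsafe content in the user or agent messages in "
    "the below conversation in accordance with our safety policy with the "
    "below categories. Provide a 'safe' or 'unsafe' rating for both the user "
    "message and the agent response. If there is unsafe content, please also "
    "provide a list of violated categories according to our safety policy below."
)


def build_task_section(categories: list[str]) -> str:
    """Return the task instruction followed by the numbered category block."""
    numbered = [f"S{i}: {name}." for i, name in enumerate(categories, start=1)]
    return (
        TASK_INSTRUCTION
        + "\n\n<BEGIN UNSAFE CONTENT CATEGORIES>\n"
        + "\n".join(numbered)
        + "\n<END UNSAFE CONTENT CATEGORIES>"
    )
```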

Conversation

The conversation includes the user input, and optionally the agent (LLM) response, that the model is instructed to moderate.
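The conversation block can be built from the user message and, when the LLM output is also being checked, the agent response. The sketch below is illustrative only; the sample template on this page shows only a user turn, so the label used here for the agent turn is an assumption. Consult the model card for the exact multi-turn format.

```python
def build_conversation_section(user_message: str, agent_response: str | None = None) -> str:
    """Format the conversation block; the agent turn is optional.

    Note: the label used for the agent turn below is an assumption; the
    sample template on this page shows only the user turn.
    """
    turns = [f"user: {user_message}"]
    if agent_response is not None:
        turns.append(f"response: agent: {agent_response}")
    return "<BEGIN CONVERSATION>\n\n" + "\n\n".join(turns) + "\n\n<END CONVERSATION>"
```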

Response Instruction

The response instruction tells the LLM how to report its assessment of the conversation. In the sample prompts provided by NVIDIA, the instruction directs the model to classify the safety of the user message and the agent response and to list any violated unsafe content categories. The sample prompts also instruct the LLM to return the result as JSON only.
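Because the model is instructed to return JSON only, the verdict can be parsed directly from the completion text. The following sketch assumes the completion text is already available as a string; the key names follow the output format shown in the sample template below.

```python
import json


def parse_verdict(completion_text: str) -> dict:
    """Parse the guard model's JSON-only output into a Python dictionary.

    Expected keys, per the sample template: "User Safety", "Response Safety"
    (omitted when there is no agent response), and "Safety Categories"
    (omitted when everything is safe).
    """
    verdict = json.loads(completion_text.strip())
    categories = verdict.get("Safety Categories", "")
    return {
        "user_safety": verdict.get("User Safety"),
        "response_safety": verdict.get("Response Safety"),
        "violated_categories": [c.strip() for c in categories.split(",") if c.strip()],
    }
```

For example, passing the string '{"User Safety": "unsafe", "Safety Categories": "Violence"}' to this helper yields a violated_categories value of ["Violence"].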

Refer to the following sample prompt template that is used throughout the product documentation:

Task: Check if there is unsafe content in the user or agent messages in the below conversation in accordance with our safety policy with the below categories. Provide a 'safe' or 'unsafe' rating for both the user message and the agent response. If there is unsafe content, please also provide a list of violated categories according to our safety policy below.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violence.
S2: Sexual.
S3: Criminal Planning/Confessions.
S4: Guns and Illegal Weapons.
S5: Controlled/Regulated Substances.
S6: Suicide and Self Harm.
S7: Sexual (minor).
S8: Hate/Identity Hate.
S9: PII/Privacy.
S10: Harassment.
S11: Threat.
S12: Profanity.
S13: Needs Caution.
S14: Other.
S15: Manipulation.
S16: Fraud/Deception.
S17: Malware.
S18: High Risk Gov Decision Making.
S19: Political/Misinformation/Conspiracy.
S20: Copyright/Trademark/Plagiarism.
S21: Unauthorized Advice.
S22: Illegal Activity.
S23: Immoral/Unethical.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

user: {{ user_input }}

<END CONVERSATION>

Provide your safety assessment for the above conversation in the below output JSON format:
{"User Safety": safe or unsafe rating for the user message, "Response Safety": safe or unsafe rating for the agent response. Omit if no agent response present. "Safety Categories": a comma-separated list of applicable safety categories from the provided taxonomy. Omit if all safe.}

Do not include anything other than the output JSON in your response.
Output JSON:
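At inference time, the {{ user_input }} placeholder is replaced with the message to moderate and the filled template is sent to the NIM. The following sketch assumes an OpenAI-compatible chat completions endpoint on localhost port 8000 and uses an illustrative model name; check your deployment for the actual endpoint path, payload format, and model identifier. Checking an LLM response works the same way, with the agent turn added to the conversation section.

```python
# A minimal sketch, not a definitive integration: the endpoint URL and model
# name are placeholders and must be adjusted to match your deployment.
import requests

GUARD_URL = "http://localhost:8000/v1/chat/completions"  # assumed endpoint
GUARD_MODEL = "llama-3.1-nemotron-safety-guard-multilingual-8b-v1"  # illustrative name

# Paste the full sample prompt template shown above into this string.
SAFETY_PROMPT_TEMPLATE = "..."


def check_user_input(user_input: str) -> str:
    """Fill the template and return the guard model's raw JSON verdict text."""
    prompt = SAFETY_PROMPT_TEMPLATE.replace("{{ user_input }}", user_input)
    payload = {
        "model": GUARD_MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.0,
    }
    response = requests.post(GUARD_URL, json=payload, timeout=60)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```

The returned text can then be parsed with a helper such as the parse_verdict sketch shown earlier on this page.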