Release Notes#

v1.10.1#

Features#

  • The context length for the model is increased from 8K tokens to 128K tokens.

  • This release adds support for NVIDIA B200 and H200 GPUs.

  • This release adds generic model profiles that use the vLLM engine to run the model. These profiles avoid the high host-memory and GPU-memory requirements of building a locally built engine on the first run. For information about the profiles, refer to NVIDIA Llama 3.1 NemoGuard 8B ContentSafety Model Profiles.
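Profile selection happens at container launch. The following is a minimal sketch only, assuming the standard NIM conventions of a `list-model-profiles` utility inside the container and a `NIM_MODEL_PROFILE` environment variable for pinning a profile; the image path, tag, and the `<vllm-profile-id>` placeholder are assumptions, not values taken from these release notes:

```shell
# List the model profiles this NIM can run on the current host
# (command name follows general NIM conventions; image path is assumed).
docker run --rm --gpus all \
  nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-content-safety:1.10.1 \
  list-model-profiles

# Pin a specific vLLM profile via NIM_MODEL_PROFILE
# (<vllm-profile-id> is a placeholder for an ID from the listing above).
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=<vllm-profile-id> \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.1-nemoguard-8b-content-safety:1.10.1
```

Consult the model-profiles documentation referenced above for the actual profile IDs available on your GPUs.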

v1.0.0#

Features#

This is the first release of Llama 3.1 NemoGuard 8B ContentSafety NIM. The microservice serves a GPU-accelerated LLM that performs content moderation for building trustworthy LLM applications. The model detects harmful content in user messages and bot responses.
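As a rough sketch of how a client might send a user message and a bot response to the microservice for moderation: NIM microservices generally expose an OpenAI-compatible chat-completions endpoint, but the endpoint path, port, model ID, and response shape below are assumptions for illustration, not documented values from these release notes:

```python
import json
import urllib.request

def build_moderation_request(user_msg, bot_msg=None):
    """Build an OpenAI-style chat payload carrying the content to check.

    The model ID below is an assumed placeholder, not a documented value.
    """
    messages = [{"role": "user", "content": user_msg}]
    if bot_msg is not None:
        # Include the bot response so it is moderated alongside the user turn.
        messages.append({"role": "assistant", "content": bot_msg})
    return {
        "model": "llama-3.1-nemoguard-8b-content-safety",  # assumed model ID
        "messages": messages,
    }

def moderate(payload, base_url="http://localhost:8000"):
    """POST the payload to the (assumed) OpenAI-compatible endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Build (but do not send) a sample request for one user/bot exchange.
payload = build_moderation_request("How do I pick a lock?", "Here is how...")
print(json.dumps(payload, indent=2))
```

The safety verdict would be parsed from the model's response text; see the NIM's API documentation for the exact request and response schema.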

Known Issues#

  • The tensor parallelism 4 (four-GPU) model profiles are not runnable.