Release Notes

v1.10.1

Features
The context length for the model is increased from 8K tokens to 128K tokens.
This release adds support for NVIDIA B200 and H200 GPUs.
This release adds generic model profiles that run the model with the vLLM engine. These profiles avoid the high host memory and GPU requirements of building a locally buildable engine on the first run. For information about the profiles, refer to NVIDIA Llama 3.1 NemoGuard 8B TopicControl Model Profiles.
v1.0.0

Features
This is the first release of Llama 3.1 NemoGuard 8B TopicControl NIM. The microservice serves a GPU-accelerated LLM for conversational dialog moderation that keeps conversations on topic and helps you build trustworthy LLM applications.
Known Issues
The model profiles that use tensor parallelism across four GPUs are not runnable.