Target Workloads
This RA provides an optimal configuration for fine-tuning and inference of large language models (LLMs), as well as traditional deep learning inference models. To run example workloads with the NIM Operator, refer to the Caching Models and NIM Services sections of the NVIDIA NIM Operator documentation. Sample RAG Application documentation is also provided to extend the capabilities of NIM. Alternatively, KServe may be used to orchestrate and expose the APIs included in NIM; an example KServe implementation is provided on the CNS GitHub.
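As a concrete illustration of the NIM Operator workflow referenced above, the sketch below caches a model with a `NIMCache` resource and serves it with a `NIMService` resource. This is a minimal example, not a definitive configuration: the model name, secret names, storage class, and sizes are placeholder assumptions, and field names should be verified against the NVIDIA NIM Operator documentation for the operator version in use.

```yaml
# Hypothetical example: cache a NIM model, then serve it.
# Secret names (ngc-secret, ngc-api-secret), model, and sizes are placeholders.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMCache
metadata:
  name: meta-llama3-8b-instruct
spec:
  source:
    ngc:
      modelPuller: nvcr.io/nim/meta/llama3-8b-instruct:1.0.3
      pullSecret: ngc-secret
      authSecret: ngc-api-secret
  storage:
    pvc:
      create: true
      size: 50Gi
---
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: "1.0.3"
    pullSecrets:
      - ngc-secret
  authSecret: ngc-api-secret
  storage:
    nimCache:
      name: meta-llama3-8b-instruct
  replicas: 1
  resources:
    limits:
      nvidia.com/gpu: 1
  expose:
    service:
      type: ClusterIP
      port: 8000
```

Once applied with `kubectl apply -f`, the resulting service exposes the NIM OpenAI-compatible API inside the cluster; the same endpoint could alternatively be fronted by KServe as noted above.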
NVIDIA provides reference solutions for various AI use cases via NVIDIA Blueprints, some of which can leverage the Kubernetes stack. NVIDIA also provides software for AI development with NVIDIA NeMo.
NVIDIA NeMo is a set of microservices that help enterprise AI developers easily curate data at scale, customize LLMs with popular fine-tuning techniques, evaluate models on standard and custom benchmarks, and guardrail them for appropriate and grounded outputs.
NeMo Curator: A powerful microservice for enterprise developers to efficiently curate high-quality datasets for training LLMs, thereby enhancing model performance and accelerating the deployment of AI solutions.
NeMo Customizer: A high-performance, scalable microservice that simplifies the fine-tuning and alignment of LLMs with popular techniques, including parameter-efficient fine-tuning with LoRA and alignment with DPO.
NeMo Evaluator: An enterprise-grade microservice that provides industry-standard benchmarking of generative AI models, synthetic data generation, and end-to-end RAG pipelines.
NeMo Guardrails: A microservice that enables developers to implement robust safety and security measures in LLM-based applications, ensuring that these applications remain reliable and aligned with organizational policies and guidelines.