Playbooks
The NVIDIA NIM for Large Language Models (LLMs) playbooks demonstrate how to use NVIDIA NIM for LLMs to self-host a RAG pipeline, deploy on Hugging Face, and fine-tune with LoRA.
The Build a RAG using a locally hosted NIM playbook demonstrates how to build a RAG pipeline on a locally hosted Llama3-8b-instruct NIM and query it through NVIDIA AI Endpoints for LangChain.
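A locally hosted NIM exposes an OpenAI-compatible HTTP API, so the RAG chain can hand retrieved context to the model through a standard chat-completions request. A minimal sketch of that request shape, assuming the NIM is reachable at `http://localhost:8000/v1` and serves the model name `meta/llama3-8b-instruct` (both are deployment-dependent):

```python
import json

# Assumption: NIM's OpenAI-compatible API is served on port 8000 of the
# host running the container; adjust for your deployment.
NIM_BASE_URL = "http://localhost:8000/v1"

def build_rag_request(question: str, context: str,
                      model: str = "meta/llama3-8b-instruct"):
    """Build an OpenAI-style chat-completions request for a RAG answer.

    `context` carries the retrieved passages. The returned payload can be
    POSTed to the returned URL with any HTTP client (or wrapped by the
    LangChain NVIDIA AI Endpoints connector instead).
    """
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Answer using only the provided context.\n\n"
                        f"Context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "temperature": 0.2,
        "max_tokens": 256,
    }
    return f"{NIM_BASE_URL}/chat/completions", payload

url, payload = build_rag_request(
    "What does NIM provide?",
    "NVIDIA NIM provides optimized inference microservices.",
)
print(url)
print(json.dumps(payload, indent=2))
```

The playbook itself wires this up through LangChain; the sketch only shows the wire format the locally hosted NIM accepts.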
The Llama 3 LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM playbook demonstrates how to perform LoRA PEFT on Llama 3 8B Instruct using a biomedical question-answering dataset and deploy multiple LoRA adapters with NVIDIA NIM for LLMs.
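When NIM serves multiple LoRA adapters alongside a shared base model, a request selects an adapter by putting its name in the `model` field of the same OpenAI-compatible API. A sketch of that selection, where `llama3-8b-pubmedqa` is a hypothetical adapter name chosen at deploy time:

```python
import json

def build_lora_request(prompt: str, adapter: str):
    """Chat-completions payload that routes to a named LoRA adapter.

    With multi-LoRA serving, the `model` field carries the adapter name
    rather than the base model; the base weights are shared across all
    deployed adapters.
    """
    return {
        "model": adapter,  # hypothetical adapter name, set at deploy time
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }

payload = build_lora_request(
    "Does metformin affect vitamin B12 levels?",
    "llama3-8b-pubmedqa",  # assumption: adapter registered under this name
)
print(json.dumps(payload, indent=2))
```

Swapping the `model` value is all that distinguishes one fine-tuned domain from another at request time.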
The Llama 3.1 Law-Domain LoRA Fine-Tuning and Deployment with NeMo Framework and NVIDIA NIM playbook demonstrates how to perform LoRA PEFT on Llama 3.1 8B Instruct with NeMo Framework, using a synthetically augmented version of the Law StackExchange dataset, and then deploy the result with NVIDIA NIM for LLMs. As a prerequisite, follow the tutorial for data curation using NeMo Curator.