Use Llama Stack API#

This tutorial shows how you can use the Llama Stack APIs to achieve the same end-to-end workflow introduced in the previous tutorials.

Prerequisites#

Before you begin, make sure that you have deployed the NeMo microservices platform (Data Store, Customizer, Evaluator, Guardrails, and NIM), completed the dataset preparation steps from the previous tutorials, and installed the llama-stack Python packages so that you can use the library client shown below.


Set Up Environment Variables#

Set up environment variables for using the NeMo microservices through the Llama Stack APIs.

Python Example: Environment Setup
import os

# (Required) NeMo microservices URLs
NDS_URL = "http://data-store.test"  # Data Store
NEMO_URL = "http://nemo.test"       # Customizer, Evaluator, Guardrails
NIM_URL = "http://nim.test"         # NIM

# (Required) Base model alias
BASE_MODEL = "meta-llama/Llama-3.2-1B-Instruct"

# (Required) Hugging Face Token
HF_TOKEN = ""

# (Optional) NeMo Entity Store namespace and project
NAMESPACE = "llamastack-e2e-notebook"
PROJECT_ID = ""
CUSTOMIZED_MODEL_DIR = "llamastack-e2e-notebook/customized-model@v1"

# Point the Llama Stack NVIDIA distribution at the deployed microservices
os.environ["NVIDIA_DATASET_NAMESPACE"] = NAMESPACE
os.environ["NVIDIA_PROJECT_ID"] = PROJECT_ID
os.environ["NVIDIA_BASE_URL"] = NIM_URL
os.environ["NVIDIA_DATASETS_URL"] = NEMO_URL
os.environ["NVIDIA_CUSTOMIZER_URL"] = NEMO_URL
os.environ["NVIDIA_OUTPUT_MODEL_DIR"] = CUSTOMIZED_MODEL_DIR
os.environ["NVIDIA_EVALUATOR_URL"] = NEMO_URL
os.environ["GUARDRAILS_SERVICE_URL"] = NEMO_URL

Initialize the Llama Stack Client#

After setting environment variables, initialize the Llama Stack client for use in subsequent steps.

Python Example: Initialize the Llama Stack Client
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
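
As an optional sanity check, you can list the models that the NVIDIA distribution registers to confirm that the client initialized correctly:

Python Example: List Available Models
# Print the identifiers of all models visible to the client
models = client.models.list()
print([model.identifier for model in models])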

Register Datasets to NeMo Entity Store#
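
The previous tutorials upload the sample SQuAD dataset to NeMo Data Store with the Hugging Face CLI. If you prefer to do the upload from Python, the following sketch uses the huggingface_hub client against the Data Store's Hugging Face-compatible endpoint. The dataset name, local folder path, and the /v1/hf endpoint path are assumptions based on those tutorials, so adjust them to match your setup.

Python Example: Upload Dataset to NeMo Data Store
from huggingface_hub import HfApi

# Assumed names; align these with the dataset prepared in the previous tutorials
sample_squad_dataset_name = "sample-squad-data"
repo_id = f"{NAMESPACE}/{sample_squad_dataset_name}"

# NeMo Data Store exposes a Hugging Face-compatible API (the /v1/hf endpoint path is an assumption)
hf_api = HfApi(endpoint=f"{NDS_URL}/v1/hf", token=HF_TOKEN)
hf_api.create_repo(repo_id=repo_id, repo_type="dataset", exist_ok=True)
hf_api.upload_folder(
    folder_path="./tmp/sample_squad_data",  # local dataset folder from the data preparation tutorial
    repo_id=repo_id,
    repo_type="dataset",
)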

After uploading a dataset to NeMo Data Store using the Hugging Face CLI, register it with NeMo Entity Store using the Llama Stack client:

Python Example: Register Dataset
response = client.datasets.register(
    purpose="post-training/messages",
    dataset_id=sample_squad_dataset_name,
    source={"type": "uri", "uri": f"hf://datasets/{repo_id}"},
    metadata={
        "format": "json",
        "description": "Test dataset for Llama Stack",
        "provider_id": "nvidia",
    },
)
print(response)

Customize (Fine-Tune) the Model#

Run customization jobs using the Llama Stack client as follows.

Python Example: Start Customization Job
response = client.post_training.supervised_fine_tune(
    job_uuid="",
    model="meta/llama-3.2-1b-instruct@v1.0.0+A100",
    training_config={
        "n_epochs": 2,
        "data_config": {
            "batch_size": 16,
            "dataset_id": sample_squad_dataset_name,
        },
        "optimizer_config": {
            "lr": 0.0001,
        },
    },
    algorithm_config={
        "type": "LoRA",
        "adapter_dim": 16,
        "adapter_dropout": 0.1,
        "alpha": 16,
        "rank": 8,
        "lora_attn_modules": [],
        "apply_lora_to_mlp": True,
        "apply_lora_to_output": False,
    },
    hyperparam_search_config={},
    logger_config={},
    checkpoint_dir="",
)
job_id = response.job_uuid
print(f"Created job with ID: {job_id}")

Evaluate Models#

Run evaluation jobs using the Llama Stack client as follows.

Python Example: Run Evaluation
benchmark_id = "test-eval-config"
simple_eval_config = {
    "benchmark_id": benchmark_id,
    "dataset_id": repo_id,
    "scoring_functions": [],
    "metadata": {
        "type": "custom",
        "params": {"parallelism": 8},
        "tasks": {
            "qa": {
                "type": "completion",
                "params": {
                    "template": {
                        "prompt": "{{prompt}}",
                        "max_tokens": 20,
                        "temperature": 0.7,
                        "top_p": 0.9,
                    },
                },
                "dataset": {"files_url": f"hf://datasets/{repo_id}/testing/testing.jsonl"},
                "metrics": {
                    "bleu": {
                        "type": "bleu",
                        "params": {"references": ["{{ideal_response}}"]},
                    },
                    "string-check": {
                        "type": "string-check",
                        "params": {"check": ["{{ideal_response | trim}}", "equals", "{{output_text | trim}}"]},
                    },
                },
            }
        },
    },
}

response = client.benchmarks.register(
    benchmark_id=benchmark_id,
    dataset_id=repo_id,
    scoring_functions=simple_eval_config["scoring_functions"],
    metadata=simple_eval_config["metadata"],
)

response = client.eval.run_eval(
    benchmark_id=benchmark_id,
    benchmark_config={"eval_candidate": {"type": "model", "model": BASE_MODEL, "sampling_params": {}}},
)
job_id = response.model_dump()["job_id"]
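
Evaluation also runs as a job. The sketch below waits for it to finish and then fetches the scores; it assumes the client exposes client.eval.jobs.status and client.eval.jobs.retrieve with benchmark_id and job_id parameters, so check the exact method names and signatures against your llama-stack-client version.

Python Example: Poll the Evaluation Job and Retrieve Results
import time

while True:
    job_status = client.eval.jobs.status(job_id=job_id, benchmark_id=benchmark_id)
    # Normalize the status to a lowercase string in case it is returned as an enum or a bare value
    current = str(getattr(job_status, "status", job_status)).lower()
    print(f"Evaluation job status: {current}")
    # Assumed terminal states; adjust to the values your deployment reports
    if any(state in current for state in ("completed", "failed", "cancelled")):
        break
    time.sleep(30)

# Retrieve the aggregated scores once the job completes
results = client.eval.jobs.retrieve(job_id=job_id, benchmark_id=benchmark_id)
print(results.scores)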

Add Safety Checks with Guardrails#

Register a shield for the model deployed on NIM, then run a safety check through NeMo Guardrails.

Python Example: Register Shield and Run Safety Check
shield_id = "meta/llama-3.2-1b-instruct"
client.shields.register(shield_id=shield_id, provider_id="nvidia")
message = {"role": "user", "content": "You are stupid."}
response = client.safety.run_shield(messages=[message], shield_id=shield_id, params={})
print(f"Safety response: {response}")
assert response.violation.user_message == "Sorry I cannot do this." 
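
For comparison, a benign prompt should pass the shield without a violation; the example below reuses the same run_shield call with a harmless message.

Python Example: Safety Check with a Benign Message
safe_message = {"role": "user", "content": "What is the capital of France?"}
response = client.safety.run_shield(messages=[safe_message], shield_id=shield_id, params={})
print(f"Safety response: {response}")
# A benign message should return no violation
assert response.violation is None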

Run Inference on Deployed NIMs#

Run inference on deployed NIMs using the Llama Stack client as follows.

Python Example: Inference
import json

# Load a sample prompt from the test split prepared in the previous tutorials
with open("./tmp/sample_squad_data/testing/testing.jsonl", "r") as f:
    examples = [json.loads(line) for line in f]
sample_prompt = examples[-1]["prompt"]

response = client.inference.chat_completion(
    messages=[{"role": "user", "content": sample_prompt}],
    model_id=BASE_MODEL,
    sampling_params={"max_tokens": 20, "strategy": {"type": "top_p", "temperature": 0.7, "top_p": 0.9}},
)
print(f"Inference response: {response.completion_message.content}")