Getting Started#
Prerequisites#
Refer to the Support Matrix to make sure that you have the supported hardware and software stack.
An NGC personal API key. The NIM microservice uses the API key to download models from NVIDIA NGC. Refer to Generating a Personal API Key in the NVIDIA NGC User Guide for more information.
When you create an NGC personal API key, select at least NGC Catalog from the Services Included menu. You can specify more services to use the key for additional purposes.
Model Specific Credentials#
To access the FLUX.1-dev model, read and accept the FLUX.1-dev, FLUX.1-Canny-dev, FLUX.1-Depth-dev, and FLUX.1-dev-onnx License Agreements and Acceptable Use Policy.
Create a new Hugging Face token with the "Read access to contents of all public gated repos you can access" permission.
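Optionally, you can sanity-check the token against the public Hugging Face Hub whoami endpoint before starting the container. This check is independent of the NIM and assumes the token is exported as HF_TOKEN, as described later in Starting the NIM Container:

```
# Optional sanity check: confirm the Hugging Face token is valid.
# Queries the public Hugging Face Hub API; not part of the NIM itself.
curl -s -H "Authorization: Bearer $HF_TOKEN" https://huggingface.co/api/whoami-v2
```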
Running on Windows#
You can run NVIDIA NIM for Visual Generative AI on an RTX Windows system with Windows Subsystem for Linux (WSL).
Note
Support for Visual Generative AI NIMs on WSL is in Public Beta.
Refer to the NVIDIA NIM on WSL documentation for setup instructions.
Refer to the Supported Models to make sure hardware and software requirements are met.
By default, WSL has access to half of the system RAM. To change the memory available to WSL, create a .wslconfig file in the home directory (C:\Users\<UserName>) with the following content:

```
# Settings apply across all Linux distros running on WSL
[wsl2]

# Limits RAM memory to use no more than 38GB, this can be set as whole numbers using GB or MB
memory=38GB
```
Restart WSL instances to apply the configuration:
wsl --shutdown
For further customization of your WSL setup refer to WSL configuration.
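To confirm that the new limit is in effect, you can check the total memory visible inside the WSL distribution, for example:

```
# Run inside the WSL distribution; the reported total should reflect the
# memory limit set in .wslconfig (38GB in the example above).
free -h
```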
Use the podman command examples in the following section.
Starting the NIM Container#
Export your personal credentials as environment variables:
```
export NGC_API_KEY="..."
export HF_TOKEN="..."
```
A more secure alternative is to use a password manager, such as pass.
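For example, with pass you could keep the keys in your password store and load them into the environment on demand. The entry names below are only illustrative; substitute the paths used in your own store:

```
# Illustrative pass entry names; substitute the paths used in your own store.
export NGC_API_KEY="$(pass show nvidia/ngc-api-key)"
export HF_TOKEN="$(pass show huggingface/token)"
```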
Log in to NVIDIA NGC so that you can pull the NIM container:
echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin
echo "$NGC_API_KEY" | podman login nvcr.io --username '$oauthtoken' --password-stdin
Use $oauthtoken as the user name and $NGC_API_KEY as the password. The $oauthtoken user name indicates that you authenticate with an API key and not a user name and password.

Start the NIM container with one of the Visual Generative AI models:
```
# Docker
# Create the cache directory on the host machine.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 $LOCAL_NIM_CACHE

docker run -it --rm --name=nim-server \
  --runtime=nvidia \
  --gpus='"device=0"' \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
  nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
```
```
# Podman
# Create the cache directory on the host machine.
export LOCAL_NIM_CACHE=~/.cache/nim
mkdir -p "$LOCAL_NIM_CACHE"
chmod 777 $LOCAL_NIM_CACHE

podman run -it --rm --name=nim-server \
  --device nvidia.com/gpu=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
  nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
```
You can specify the desired variant of FLUX by adding -e NIM_MODEL_VARIANT=<your variant>. Available variants are base, canny, depth, and their combinations, such as base+depth.

When you run the preceding command, the container downloads the model, initializes a NIM inference pipeline, and performs a pipeline warm up. A pipeline warm up typically requires up to three minutes. The warm up is complete when the container logs show Pipeline warmup: start/done.

Optional: Confirm the service is ready to respond to inference requests:
$ curl -X GET http://localhost:8000/v1/health/ready
Example Output
{"status":"ready"}
Send an inference request:
Select an example according to the deployed model variant.
```
invoke_url="http://localhost:8000/v1/infer"
output_image_path="result.jpg"

response=$(curl -X POST $invoke_url \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A simple coffee shop interior",
    "mode": "base",
    "seed": 0,
    "steps": 50
  }')

response_body=$(echo "$response" | awk '/{/,EOF-1')
echo $response_body | jq .artifacts[0].base64 | tr -d '"' | base64 --decode > $output_image_path
```
```
invoke_url="http://localhost:8000/v1/infer"
input_image_path="input.jpg"

# download an example image
curl https://assets.ngc.nvidia.com/products/api-catalog/flux/input/1.jpg > $input_image_path
image_b64=$(base64 -w 0 $input_image_path)

echo '{
  "prompt": "A simple coffee shop interior",
  "mode": "canny",
  "image": "data:image/png;base64,'${image_b64}'",
  "preprocess_image": true,
  "seed": 0,
  "steps": 50
}' > payload.json

output_image_path="result.jpg"

response=$(curl -X POST $invoke_url \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d @payload.json )

response_body=$(echo "$response" | awk '/{/,EOF-1')
echo $response_body | jq .artifacts[0].base64 | tr -d '"' | base64 --decode > $output_image_path
```
```
invoke_url="http://localhost:8000/v1/infer"
input_image_path="input.jpg"

# download an example image
curl https://assets.ngc.nvidia.com/products/api-catalog/flux/input/1.jpg > $input_image_path
image_b64=$(base64 -w 0 $input_image_path)

echo '{
  "prompt": "A simple coffee shop interior",
  "mode": "depth",
  "image": "data:image/png;base64,'${image_b64}'",
  "preprocess_image": true,
  "seed": 0,
  "steps": 50
}' > payload.json

output_image_path="result.jpg"

response=$(curl -X POST $invoke_url \
  -H "Accept: application/json" \
  -H "Content-Type: application/json" \
  -d @payload.json )

response_body=$(echo "$response" | awk '/{/,EOF-1')
echo $response_body | jq .artifacts[0].base64 | tr -d '"' | base64 --decode > $output_image_path
```
The prompt parameter is the description of the image to generate. The image parameter takes an input image in base64 format, and preprocess_image indicates whether the image should be preprocessed into canny edges or a depth map according to the mode. The seed parameter governs the generation process; use 0 to generate a new image on each call.
Refer to the API Reference for parameter descriptions.
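For example, the following sketch (base variant, reusing the request shown above) fixes three different non-zero seed values, since 0 requests a new image on each call. It assumes jq is installed and the base variant is deployed:

```
# Sketch: request the base variant several times with different fixed seeds.
invoke_url="http://localhost:8000/v1/infer"

for seed in 1 2 3; do
  curl -s -X POST "$invoke_url" \
    -H "Accept: application/json" \
    -H "Content-Type: application/json" \
    -d '{"prompt": "A simple coffee shop interior", "mode": "base", "seed": '"$seed"', "steps": 50}' \
    | jq -r '.artifacts[0].base64' | base64 --decode > "result_seed_${seed}.jpg"
done
```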
Runtime Parameters for the Container#
| Flags | Description |
|---|---|
| -it | Run the container with an interactive terminal (refer to Docker documentation). |
| --rm | Delete the container after it stops (refer to Docker documentation). |
| --name=nim-server | Give a name to the NIM container. Use any preferred value. |
| --runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
| --gpus '"device=0"' | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs. |
| -e NGC_API_KEY | Provide the container with the token necessary to download models and resources from NGC. |
| -e NIM_MODEL_PROFILE | Specify the profile to load. Refer to Models for information about the available profiles. |
| -e NIM_MODEL_VARIANT | Specify the preferred model variant to select. By default, the container selects the first variant available for the host GPU model. |
| -e NIM_OFFLOADING_POLICY | Specify the preferred offloading policy: disk, system_ram, none, or default. Refer to NIM Offloading Policies. |
| -p 8000:8000 | Forward the port where the NIM HTTP server is published inside the container so it can be accessed from the host system. The left-hand side of the mapping is the host port; the right-hand side is the port the server uses inside the container (8000 in the examples above). |
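As an illustration of combining these flags (values chosen arbitrarily for this sketch), the following starts the depth variant and publishes the server on host port 9000 instead of 8000:

```
# Sketch: run the depth variant and map host port 9000 to container port 8000.
docker run -it --rm --name=nim-server \
  --runtime=nvidia \
  --gpus='"device=0"' \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -e NIM_MODEL_VARIANT=depth \
  -p 9000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
  nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
```

With this mapping, the health and inference endpoints are reached at http://localhost:9000 on the host.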
NIM Offloading Policies#
Visual GenAI NIMs support multiple model offloading policies, allowing optimization of model deployment based on specific use cases and host system resources.
The following offloading policies are currently supported:
| Policy | Description | Performance Impact | SRAM Usage | VRAM Usage |
|---|---|---|---|---|
| disk | Offloads all models to the disk, reducing the memory footprint of the NIM. | High | Low | Low |
| system_ram | Offloads all models to the system RAM (SRAM), providing faster access to the models compared to disk storage. | Medium | High | Low |
| none | Disables offloading, storing all models in VRAM. | - | Low | High |
| default | Automatically selects the best offloading policy based on the host system's resources. | Varies | Varies | Varies |
Select the offloading policy with the NIM_OFFLOADING_POLICY environment variable. When you set this variable to one of the supported policies, the NIM uses the specified policy to manage model offloading.

For detailed information on VRAM and SRAM usage for each policy, refer to the Support Matrix.
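For example, the following sketch (Podman shown; the Docker form is analogous) pins the policy to system_ram when starting the container:

```
# Sketch: start the container with the system_ram offloading policy.
podman run -it --rm --name=nim-server \
  --device nvidia.com/gpu=all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -e HF_TOKEN=$HF_TOKEN \
  -e NIM_OFFLOADING_POLICY=system_ram \
  -p 8000:8000 \
  -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
  nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
```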
Stopping the Container#
The following commands stop and remove the running container.
```
# Docker
docker stop nim-server
docker rm nim-server

# Podman
podman stop nim-server
podman rm nim-server
```
Next Steps#
Refer to Configuration for environment variables and command-line arguments.
Refer to Customization to build a custom engine for your GPU model and host.