Customization#

Prerequisites#

  • Refer to the Support Matrix to make sure that you have the supported hardware and software stack.

  • An NGC personal API key. The NIM microservice uses the API key to download models from NVIDIA NGC. Refer to Generating a Personal API Key in the NVIDIA NGC User Guide for more information.

    When you create an NGC personal API key, select at least NGC Catalog from the Services Included menu. You can select additional services to use the key for other purposes.
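
    The commands later in this guide pass the key to the container through the NGC_API_KEY environment variable. A minimal sketch, where the value is a placeholder for your own key:

      # Export the NGC personal API key so that docker or podman can forward it into the container.
      export NGC_API_KEY=<PASTE_API_KEY_HERE>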

Model-specific credentials#

  • To access the FLUX.1-dev model, read and accept the FLUX.1-dev, FLUX.1-Canny-dev, FLUX.1-Depth-dev, and FLUX.1-dev-onnx License Agreements and Acceptable Use Policy.

  • To access the FLUX.1-schnell model, read and accept the FLUX.1-schnell and FLUX.1-schnell-onnx License Agreements and Acceptable Use Policy.

  • To access the FLUX.1-Kontext-dev model, read and accept the FLUX.1-Kontext-dev and FLUX.1-Kontext-dev-onnx License Agreements and Acceptable Use Policy.

  • To access the Stable Diffusion 3.5 Large model, read and accept the Stable Diffusion 3.5 Large, Stable Diffusion 3.5 Large TensorRT, and Stable Diffusion 3.5 Large ControlNet TensorRT License Agreements and Acceptable Use Policy.

After accepting the agreements for the model you plan to use, create a new Hugging Face token with the "Read access to contents of all public gated repos you can access" permission.
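
The engine-build and server commands in this guide read the token from the HF_TOKEN environment variable. A minimal sketch, where the value is a placeholder for your own token:

    # Export the Hugging Face token that has access to the gated model repositories.
    export HF_TOKEN=<PASTE_HF_TOKEN_HERE>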

System requirements#

Customization has higher minimum system requirements than inference:

| Model | GPU Memory | RAM | OS | CPU |
|---|---|---|---|---|
| black-forest-labs/flux.1-dev | 16 GB | 50 GB | Linux/WSL2 | x86_64 |
| black-forest-labs/flux.1-schnell | 16 GB | 50 GB | Linux/WSL2 | x86_64 |
| black-forest-labs/flux.1-kontext-dev | 16 GB | 50 GB | Linux/WSL2 | x86_64 |
| stabilityai/stable-diffusion-3.5-large | 32 GB | 50 GB | Linux | x86_64 |

About Customizing Models#

NVIDIA NIM for Visual Generative AI offers a range of customization options, including specific model precisions for inference pipeline components and specific output image resolutions, to get the best performance.

Building an Optimized TensorRT Engine#

You can build an optimized TensorRT engine that provides GPU-model-specific optimizations for your host.

  1. Create the cache directory on the host machine.

       export LOCAL_NIM_CACHE=~/.cache/nim
       mkdir -p "$LOCAL_NIM_CACHE"
       chmod 1777 "$LOCAL_NIM_CACHE"
    
  2. Create a directory to store the optimized engine and update the permissions:

      export OUTPUT_DIR=exported_model_dir
      mkdir -p "$OUTPUT_DIR"
      chmod 1777 "$OUTPUT_DIR"
    
  3. Build the optimized engine for your GPU model and host, using either the docker or podman command and the image that corresponds to your model:

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-dev:1.1.0 \
     optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/black-forest-labs/flux.1-dev:1.1.0 \
      optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    

    Refer to the Support Matrix for the precisions that you can specify with the --fp4, --fp8, and --build-t5-fp8 flags.
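
    For example, a minimal sketch of the same FLUX.1-dev engine build with FP8 precision for the transformer and the T5 encoder, assuming your GPU supports these flags according to the Support Matrix:

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-dev:1.1.0 \
     optimize.py --gpu ${your_gpu_name} --fp8 --build-t5-fp8 --export-path /output_dir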

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0 \
     optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0 \
      optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    

    Refer to the Support Matrix for the precisions that you can specify with the --fp4, --fp8, and --build-t5-fp8 flags.

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-kontext-dev:1.0.0 \
     optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/black-forest-labs/flux.1-kontext-dev:1.0.0 \
      optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    

    Refer to the Support Matrix for the precisions that you can specify with the --fp4, --fp8, and --build-t5-fp8 flags.

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/stabilityai/stable-diffusion-3.5-large:1.0.0 \
     optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/stabilityai/stable-diffusion-3.5-large:1.0.0 \
      optimize.py --gpu ${your_gpu_name} --export-path /output_dir
    

    The optimize.py script creates the following directories and files for the engine:

    $OUTPUT_DIR
    ├── metadata.json        <- file with metadata needed to run the NIM
    ├── trt_engines_dir      <- directory with the optimized TRT engines
    ├── framework_model_dir  <- directory with configuration files for the model (e.g., diffusion scheduler config)
    ├── manifest.yaml        <- manifest file with the generated optimized profile, which can be used to override the default manifest
    └── memory_profile.yaml  <- memory profile with the VRAM, SRAM, and buffer usage for each pipeline stage, used for offloading policy selection
    
  4. Start the container with the optimized engine directory and manifest:

    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-dev:1.1.0
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-dev:1.1.0
    
    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0
    
    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-kontext-dev:1.0.0
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-kontext-dev:1.0.0
    
    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/stabilityai/stable-diffusion-3.5-large:1.0.0
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/stabilityai/stable-diffusion-3.5-large:1.0.0
    
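
Once the server reports that it is ready, you can optionally verify it before sending generation requests. A minimal sketch, assuming the standard NIM health endpoint on the published port:

    # Check readiness of the NIM server started in the previous step (standard NIM endpoint, assumed here).
    curl http://localhost:8000/v1/health/ready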

Parameters for the Container#

| Flags | Description |
|---|---|
| -it | --interactive + --tty (see the Docker docs) |
| --rm | Delete the container after it stops (see the Docker docs) |
| --name=<container-name> | Give a name to the NIM container. Use any preferred value. |
| --runtime=nvidia | Ensure that the NVIDIA drivers are accessible in the container. |
| --gpus '"device=0"' | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs. |
| -e NGC_API_KEY=$NGC_API_KEY | Provide the container with the token it needs to download the required models and resources from NGC. |
| -v $(pwd)/$OUTPUT_DIR:/output_dir | Mount the local $(pwd)/$OUTPUT_DIR directory to /output_dir inside the container. |
| --entrypoint "python3" | Change the default entrypoint, which starts the NIM server, to python3 so that the container runs the optimization script instead. |
| optimize.py --gpu ${your_gpu_name} --export-path /output_dir | Invocation of the optimization script with its two required parameters. |

Parameters for the Optimization Script#

| Parameter | Default Value | Description |
|---|---|---|
| --export-path EXPORT_PATH | Required | The path to the optimization output directory where the TRT engines are saved. |
| --gpu GPU | Required | The GPU model that the system uses. |
| --height HEIGHT | 1024 | The optimal height for generated images. Supported values: 512, 576, 640, 704, 768, 832, 896, 960, 1024, 1088, 1152, 1216, 1280, and 1344. For FLUX.1-Kontext-dev, supported values: 672, 688, 720, 752, 800, 832, 880, 944, 1024, 1104, 1184, 1248, 1328, 1392, 1456, 1504, and 1568. |
| --width WIDTH | 1024 | The optimal width for generated images. Supported values are the same as for --height. |
| --min-height MIN_HEIGHT | HEIGHT | The minimum height for generated images. If not specified, the system uses the --height value. Supported values are the same as for --height. |
| --max-height MAX_HEIGHT | HEIGHT | The maximum height for generated images. If not specified, the system uses the --height value. Supported values are the same as for --height. |
| --min-width MIN_WIDTH | WIDTH | The minimum width for generated images. If not specified, the system uses the --width value. Supported values are the same as for --width. |
| --max-width MAX_WIDTH | WIDTH | The maximum width for generated images. If not specified, the system uses the --width value. Supported values are the same as for --width. |
| --variant VARIANT_1 VARIANT_2 ... | base | A set of supported model variants (see the Support Matrix). To specify multiple variants, use a space-separated list. |
| --fp4 | | Use the FP4 checkpoint. Available only for GPUs with compute capability 10.0 or higher (Blackwell). |
| --fp8 | | Use the FP8 checkpoint. Available only for GPUs with compute capability 8.9 or higher (Ada). |
| --build-t5-fp8 | | Use the FP8 T5 model checkpoint instead of the BF16 checkpoint. Runs independently of the --fp4 and --fp8 flags. Available only for GPUs with compute capability 8.9 or higher (Ada). |
| --t5-ws-percentage | None | The percentage of T5 weights to stream from host to device to reduce device memory usage. Accepts values between 0 and 100. Supported only for black-forest-labs/flux.1-schnell and black-forest-labs/flux.1-kontext-dev. |
| --transformer-ws-percentage | None | The percentage of Transformer (diffusion denoiser) weights to stream from host to device to reduce device memory usage. Accepts values between 0 and 100. Supported only for black-forest-labs/flux.1-kontext-dev. |
| --profile-repository <repository-name> | local:///opt/nim/local | The repository with the NIM manifest entries required to start the NIM server. The local:// prefix instructs NIM to use only local files. Use /opt/nim/local as the container path where NIM mounts the profile data. |
| --silent-mode | | Disable TRT optimization logs. |
| --low-vram | | DEPRECATED: The system automatically selects the offloading policy based on the memory profile. |
| --no-perf-measurements | | Disable the end-to-end pipeline run after the engines are built. |
| --force-rebuild | | Force building new engines by removing existing ones. |
| --no-memory-profile | | Disable memory profile generation. |
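
As an illustration only, the following sketch shows the final optimize.py line of the step 3 commands with several of the resolution parameters above combined for a FLUX.1-dev build: the optimal resolution stays at 1024x1024, while the engines also accept any supported resolution between 768 and 1344 in each dimension. The GPU name is a placeholder that you replace with your own hardware.

    # Hypothetical resolution-range configuration; values are taken from the supported-values list above.
    optimize.py --gpu ${your_gpu_name} \
      --height 1024 --width 1024 \
      --min-height 768 --max-height 1344 \
      --min-width 768 --max-width 1344 \
      --export-path /output_dir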