Customization#

Prerequisites#

  • Refer to the Support Matrix to make sure that you have the supported hardware and software stack.

  • An NGC personal API key. The NIM microservice uses the API key to download models from NVIDIA NGC. Refer to Generating a Personal API Key in the NVIDIA NGC User Guide for more information.

    When you create an NGC personal API key, select at least NGC Catalog from the Services Included menu. You can specify more services to use the key for additional purposes.
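
    One way to make the key available to the commands in this guide, sketched below, is to export it as the NGC_API_KEY environment variable and log in to the NGC container registry before pulling the NIM image (the $oauthtoken user name is the literal string used for NGC registry logins):

      # Export the personal API key for the docker/podman run commands in this guide
      export NGC_API_KEY=<your-personal-api-key>

      # Log in to nvcr.io so the NIM container image can be pulled
      echo "$NGC_API_KEY" | docker login nvcr.io --username '$oauthtoken' --password-stdin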

Model-specific credentials#

To access the FLUX.1-dev model, read and accept the FLUX.1-dev, FLUX.1-Canny-dev, FLUX.1-Depth-dev, and FLUX.1-dev-onnx License Agreements and Acceptable Use Policy.

To access the FLUX.1-schnell model, read and accept the FLUX.1-schnell and FLUX.1-schnell-onnx License Agreements and Acceptable Use Policy.

For either model, create a new Hugging Face token with the "Read access to contents of all public gated repos you can access" permission.
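
The commands in the following sections read the token from the HF_TOKEN environment variable, so export it once per shell session, for example:

    # Hugging Face token with read access to the gated FLUX.1 repositories
    export HF_TOKEN=<your-hugging-face-token>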

System requirements#

Customization has higher minimum system requirements than inference:

| Model                            | GPU Memory | RAM   | OS         | CPU    |
|----------------------------------|------------|-------|------------|--------|
| black-forest-labs/flux.1-dev     | 16 GB      | 50 GB | Linux/WSL2 | x86_64 |
| black-forest-labs/flux.1-schnell | 16 GB      | 50 GB | Linux/WSL2 | x86_64 |

About Customizing Models#

NVIDIA NIM for Visual Generative AI offers a range of customization options for the best performance, including specific model precisions for the inference pipeline components and specific output image resolutions.

Building an Optimized TensorRT Engine#

You can build an optimized TensorRT engine that applies GPU-model-specific optimizations for your host.

  1. Create the cache directory on the host machine.

       export LOCAL_NIM_CACHE=~/.cache/nim
       mkdir -p "$LOCAL_NIM_CACHE"
       chmod 1777 $LOCAL_NIM_CACHE
    
  2. Create a directory to store the optimized engine and update the permissions:

      export OUTPUT_DIR=exported_model_dir
      mkdir -p "$OUTPUT_DIR"
      chmod 1777 $OUTPUT_DIR
    
  3. Build the optimized engine for your GPU model and host:

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1 \
     optimize.py --gpu ${your_gpu_name} --low-vram --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1 \
      optimize.py --gpu ${your_gpu_name} --low-vram --export-path /output_dir
    

    Refer to the Support Matrix for the precision options that you can select with the --fp4, --fp8, and --build-t5-fp8 flags.

    docker run -it --rm \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/output_dir \
     --entrypoint "python3" \
     nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0 \
     optimize.py --gpu ${your_gpu_name} --low-vram --export-path /output_dir
    
    podman run -it --rm \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/output_dir \
      --entrypoint "python3" \
      nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0 \
      optimize.py --gpu ${your_gpu_name} --low-vram --export-path /output_dir
    

    Refer to the Support Matrix for the precision options that you can select with the --fp4, --fp8, and --build-t5-fp8 flags.

    The optimize.py script creates the following directories and files for the engine:

    $OUTPUT_DIR
    ├── metadata.json        <- file with metadata needed to run the NIM
    ├── trt_engines_dir      <- directory with optimized trt engines
    ├── framework_model_dir  <- directory with configuration files for the model (e.g., diffusion scheduler config)
    ├── manifest.yaml        <- manifest file with the generated optimized profile that can be used to override the default manifest
    └── memory_profile.yaml  <- memory profile with the VRAM, SRAM, and buffer usage for each pipeline stage, used in offloading policy selection
    
  4. Start the container with the optimized engine directory and manifest (a readiness check is sketched after this procedure):

    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
     -p 8000:8000 \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
     nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-dev:1.0.1
    
    docker run -it --rm --name=nim-server \
     --runtime=nvidia \
     --gpus '"device=0"' \
     -e NGC_API_KEY \
     -e HF_TOKEN=$HF_TOKEN \
     -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
     -p 8000:8000 \
     -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
     -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
     nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0
    
    podman run -it --rm --name=nim-server \
      --device nvidia.com/gpu=all \
      -e NGC_API_KEY=$NGC_API_KEY \
      -e HF_TOKEN=$HF_TOKEN \
      -e NIM_MANIFEST_PATH=/opt/nim/local/manifest.yaml \
      -p 8000:8000 \
      -v "$LOCAL_NIM_CACHE:/opt/nim/.cache/" \
      -v $(pwd)/$OUTPUT_DIR:/opt/nim/local \
      nvcr.io/nim/black-forest-labs/flux.1-schnell:1.0.0
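
After the container starts, you can confirm that the server is ready before sending requests. The following is a minimal sketch that assumes the standard NIM readiness endpoint (/v1/health/ready) and the port mapping used in the commands above (host port 8000); adjust the host and port if you mapped them differently:

    # Poll the readiness endpoint until the NIM server reports it is ready
    until curl -sf http://localhost:8000/v1/health/ready > /dev/null; do
      echo "Waiting for the NIM server to become ready..."
      sleep 5
    done
    echo "NIM server is ready"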
    

Parameters for the Container#

| Flags | Description |
|-------|-------------|
| -it | --interactive + --tty (see Docker docs) |
| --rm | Delete the container after it stops (see Docker docs) |
| --name=<container-name> | Give a name to the NIM container. Use any preferred value. |
| --runtime=nvidia | Ensure NVIDIA drivers are accessible in the container. |
| --gpus '"device=0"' | Expose NVIDIA GPU 0 inside the container. If you are running on a host with multiple GPUs, you need to specify which GPU to use. See GPU Enumeration for further information on mounting specific GPUs, and the example after this table. |
| -e NGC_API_KEY=$NGC_API_KEY | Provide the container with the token necessary to download the required models and resources from NGC. |
| -v $(pwd)/$OUTPUT_DIR:/output_dir | Mount the local $(pwd)/$OUTPUT_DIR directory to /output_dir inside the container. |
| --entrypoint "python3" | Change the default entrypoint, which starts the NIM server, to python3 so that the optimization script runs instead. |
| optimize.py --gpu ${your_gpu_name} --export-path /output_dir | Invoke the optimization script with its two required parameters. |
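
For example, on a host with multiple GPUs you can first list the available devices and then expose a specific one to the container. The sketch below assumes the NVIDIA Container Toolkit is installed and uses a throwaway ubuntu container only to confirm which GPU is visible; the device index 1 is an illustration. With Podman and CDI, you would instead adjust the --device nvidia.com/gpu=<index> argument:

    # List the GPUs on the host together with their indices and UUIDs
    nvidia-smi -L

    # Expose only GPU 1 to a short-lived container and confirm what it sees
    docker run --rm --runtime=nvidia --gpus '"device=1"' \
      ubuntu nvidia-smi -L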

Parameters for the Optimization Script#

| Parameter | Default Value | Description |
|-----------|---------------|-------------|
| --export-path EXPORT_PATH | Required | Path to the optimization output directory where the TRT engines are saved. |
| --gpu GPU | Required | The GPU model to build the engines for. |
| --height HEIGHT | 1024 | The optimal height of the generated images. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --width WIDTH | 1024 | The optimal width of the generated images. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --min-height MIN_HEIGHT | HEIGHT | The minimum height of generated images; if not specified, the value of --height is used. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --max-height MAX_HEIGHT | HEIGHT | The maximum height of generated images; if not specified, the value of --height is used. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --min-width MIN_WIDTH | WIDTH | The minimum width of generated images; if not specified, the value of --width is used. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --max-width MAX_WIDTH | WIDTH | The maximum width of generated images; if not specified, the value of --width is used. Supported values: {512,576,640,704,768,832,896,960,1024,1088,1152,1216,1280,1344}. |
| --fp4 | | Use the FP4 checkpoint. Available only for GPUs with Compute Capability >= 10.0 (Blackwell). |
| --fp8 | | Use the FP8 checkpoint. Available only for GPUs with Compute Capability >= 8.9 (Ada). |
| --build-t5-fp8 | | Use the FP8 T5 model checkpoint instead of the BF16 one. Independent of the --fp4 and --fp8 flags. Available only for GPUs with Compute Capability >= 8.9 (Ada). |
| --t5-ws-percentage | None | The percentage of the T5 weights to stream from host to device. Useful for reducing device memory usage. The value must be between 0 and 100. Only supported for black-forest-labs/flux.1-schnell. |
| --profile-repository <repository-name> | local:///opt/nim/local | The repository information used in the NIM manifest to start the NIM server. local:// means that NIM uses local files only; /opt/nim/local is the path inside the container where the profile data is mounted. |
| --silent-mode | | Disables TRT optimization logs. |
| --low-vram | | DEPRECATED: the offloading policy is automatically selected based on the memory profile. |
| --no-perf-measurements | | Disables the end-to-end pipeline run after the engines are built. |
| --force-rebuild | | Forces building new engines by removing the old ones. |
| --no-memory-profile | | Disables memory profile generation. |
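
As an illustration, the hypothetical invocation below combines several of these parameters. It replaces the final optimize.py line of the docker run or podman run commands from step 3, builds engines tuned for 768x768 output while allowing any supported resolution between 512 and 1024 pixels per side, and requests the FP8 checkpoint, which assumes a GPU with Compute Capability 8.9 or later. The values shown are only an example; keep ${your_gpu_name} set to your GPU model as before:

    optimize.py --gpu ${your_gpu_name} --export-path /output_dir \
      --height 768 --width 768 \
      --min-height 512 --max-height 1024 \
      --min-width 512 --max-width 1024 \
      --fp8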