Models

About Model Profiles

The models for NVIDIA NIM microservices use model engines that are tuned for a specific NVIDIA GPU model, number of GPUs, precision, and so on. NVIDIA produces model engines for several popular combinations; these are referred to as model profiles. Each model profile is identified by a unique 64-character string of hexadecimal digits, referred to as a profile ID.
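As a quick sanity check before passing a profile ID around, its format can be validated against the description above (64 hexadecimal digits). The following is a minimal sketch, not part of any NIM tooling: the function name `is_profile_id` is hypothetical, and it assumes lowercase digits because every ID listed on this page is lowercase.

```python
import re

# A profile ID is described as 64 hexadecimal digits. Lowercase is an
# assumption here, based on the IDs listed on this page.
PROFILE_ID_RE = re.compile(r"[0-9a-f]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if value looks like a well-formed profile ID."""
    return PROFILE_ID_RE.fullmatch(value.strip()) is not None


# A full ID from the tables on this page, then a truncated one.
print(is_profile_id("1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff"))
print(is_profile_id("1b2d236d"))
```

The check only confirms the shape of the string. It does not confirm that the ID exists in the model manifest of a given container.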

The available model profiles are listed in the model manifest file, which is stored in the NIM container file system. The default path in the container is /opt/nim/etc/default/model_manifest.yaml.
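To see which IDs a particular container ships with, the manifest can be scanned for 64-digit hexadecimal strings. This page does not describe the manifest's YAML schema, so the sketch below deliberately avoids parsing it and uses a plain text search instead. The helper name `find_profile_ids` is hypothetical, and the search may also pick up other 64-digit hexadecimal values, such as file checksums, if the manifest contains any.

```python
import re
from pathlib import Path

# Default manifest location inside the container, as stated above.
MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# A standalone run of exactly 64 lowercase hex digits.
_ID_RE = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def find_profile_ids(manifest_text: str) -> list[str]:
    """Return the unique 64-hex-digit strings in the text, in first-seen order."""
    seen: dict[str, None] = {}
    for match in _ID_RE.finditer(manifest_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


if __name__ == "__main__":
    if MANIFEST_PATH.exists():
        for profile_id in find_profile_ids(MANIFEST_PATH.read_text()):
            print(profile_id)
    else:
        print(f"{MANIFEST_PATH} not found; run this inside the NIM container")
```

Comparing the printed IDs against the tables on this page is one way to confirm that a container includes a profile for your GPU.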

FLUX.1-dev Model Profiles

FLUX.1-dev is a collection of generative AI image models that create high-quality, realistic images. FLUX.1-dev generates images from simple text prompts, while FLUX.1-Depth-dev and FLUX.1-Canny-dev give greater control by combining the text prompt with an input image that guides the structure of the output image.

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 8b42564dd5dc5dc021b47027fc25e8de3c3f20541b06643b80143facd338480b |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 66188a8ebcad93374ef35c7fb89df3db16ea9176aee3515ad1a4d333d9fc8676 |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b44b6dbfc4d414f5b2d11c401606380d616939bf4f9470de78b9e25de6f143e3 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 1907fccdb6a42689ee3d448d6a93ca911f8674c2aa1ebc81b7d1f7db436eecc1 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | e9d0786a812eda295914d5c7e4e1a9c989324912af3f73eeaa9631eda616d78f |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 9bd6fd188f53bce2eb42f11f81fadd1d11c3823c506b7a1c96b705f6c5e41b3a |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 36c44753a9a188e8a36e717c4cd2d08c7c8cc4281f59c750cfda49bd9e72a0bf |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 387b0d749f1f6c39f7dd9b57e1e6872f809c6bf0422c71cda164be32c0fb7d79 |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b454d497c90956b1bf546720c1df00c1888865050d72290191f36ada319ecc6c |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | c8cfa63ee8cba592b3f52edefa18a5fda9e8f512ee3da8bc938a90336a0e75ea |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 3bdeb471bb31950b3a7a759b5dea3aeb80083fd328a2cee445463fcf79141373 |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b912befc35951aa88e450d4b0ff7ec9576688c44f434cec624d814f954b16c10 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | d65f03f4d849fd152ce78961bec9868652db607f7e7f8d02eeea68de9e964cfc |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 48e32cc14e07205437fa4484893e017fe6ce7149de6ef3b935e61482cc43d3e7 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 094a53dd3d6b4a67e8ba8b215f996acb0f0114afc8b1a2503068ebd7e2dc4b67 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 6a4f28dc7ce68a6f63cf4361cbe84341932d2c61acd6725e08fe222725be53b3 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | e8bf15bd38e3766339517218899a9a0ec63f4ca9d6d7086f99115b617dcf71f2 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 5ec1a6c7284f4e55127ffdceae12684c0a50242cfdeff940f53e359dc636b267 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 75af532c3d833d82ad27fab8bb190f60fbb3a91b0cf70bea33d294a7c8ce5baf |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | bdf9998149e94cdaf5221aa9baebc30f27449925da8eac6d1508cd945cdb643a |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 61070e912036a7a9140d5e64126bd623522293ea3076c2f167d963a94c863b13 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 842dee095df8d6ad5a2b8678605e677fea46882ff1eba1ecde76a186e8b0d1c5 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | b490222762872588294023feaecc384bbba054ae06256abd7b166d5e007cb764 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 2a0ff19006f215b4dbb2266240c12d91ac6a005c402124c3ca3916141096fd0a |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | cc19715f2bd209a45773ec4131c346b4c88b44d3e8f67145e719d63f6bf512d4 |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | a02d1b01eb43980224ebc91a471d415be2886849bce69374e9c2a63289d8debe |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | cf766e0c4e718cccf1e771e27d7bb8181120ea21219533f8d9d166f1df1bbedd |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | a1f563c2ce47feeff632d0306083ad45e05d268cffb080a34caf5f2ed14ebbcc |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 9b45f1c8bb44d13e6d6067799e90f472001845bd76bbe4da9669214deda62eda |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | d4ffdd037cbdb279689bc6f5cd969de4cdf2e63b47edc055413b759cc25bdcff |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 2cf27ae9a70fb4d765e646530d14d26f380fb4cefe3c93555faaf2d84061e475 |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 24f330eafd299ac785cc72f70cfb8d64ec1c15e16766e55ab570e6e97ef57d8b |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | 8964aba253650b90dc4bf8cd24e4c139ebd54518a9b546cb05cc2e2f23155a39 |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | f3036de58626350a45af7c1d24b77bed31feb35848685870bb0690d18310c178 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | canny | FP8 | ea523d996ab2f281ca305f7de7f36f348f8203a8fe72e0bb7620931a50d82fb6 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | depth | FP8 | 2a971111162d9d9a60648fd97c3d5338501b538e017c302589b7c920fc81bde1 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 1f9080e10c8ffc4ae59d15277171b0ee3fef9b987f9b45410920ad41f7c15cde |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | base | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | canny | FP8 | 52035cc50f1e63c3cba7319f8e365f23e29442d11f768b6b87e11eea3de5cd38 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | depth | FP8 | 1c55cd56fd15786b7729a3880defc5ef4284904f99f7bc6912c46e9620c43021 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | a12aa1e722ccc7f7685ea7663009cea0d02c49d38fce981f8177eaa6ad8e1341 |

If your GPU model is not listed, you can use the generic model profiles below, which use the PyTorch backend, or create a model profile for your GPU using these instructions. PyTorch checkpoints are not quantized, so they consume more GPU memory.

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| Generic | PyTorch | 768-1344x768-1344 | base | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |
| Generic | PyTorch | 768-1344x768-1344 | canny | BF16 | 351a04dd6ca4e445f1ae4fe0da0190133c79ed4eedd2965e5da41cbb2b48826c |
| Generic | PyTorch | 768-1344x768-1344 | depth | BF16 | 7280cf728c45505c1a8def558d9c18534096c0fe9a976b138818e31b33e859b7 |
| Generic | PyTorch | 768-1344x768-1344 | base+canny+depth | BF16 | f02c296542632aef64d11cbb13026c2502da2c290cc5b05f507a4922eedd1dda |
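The fallback rule described above (use the profile tuned for your GPU if one is listed, otherwise use the generic PyTorch profile) can be expressed as a small lookup. This is an illustrative sketch only: `DEV_PROFILES` and `select_profile` are hypothetical names, and the dictionary copies just a few rows from the FLUX.1-dev tables above rather than the full set.

```python
# (GPU name, variant) -> profile ID, copied from the FLUX.1-dev tables above.
# Only a few rows are included to keep the sketch short.
DEV_PROFILES = {
    ("NVIDIA H100 SXM", "base"): "0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441",
    ("NVIDIA L40", "base"): "fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337",
    ("Generic", "base"): "f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305",
    ("Generic", "canny"): "351a04dd6ca4e445f1ae4fe0da0190133c79ed4eedd2965e5da41cbb2b48826c",
}


def select_profile(gpu: str, variant: str = "base") -> str:
    """Prefer the GPU-specific profile; otherwise fall back to the generic one."""
    try:
        return DEV_PROFILES[(gpu, variant)]
    except KeyError:
        # Raises KeyError if this sketch has no generic entry for the variant.
        return DEV_PROFILES[("Generic", variant)]


print(select_profile("NVIDIA L40"))            # L40 is listed: TensorRT FP8 profile
print(select_profile("Some Other GPU"))        # not listed: generic PyTorch BF16 profile
```

Keeping the generic rows in the same mapping makes the trade-off explicit: the fallback always resolves for a known variant, at the cost of the higher GPU memory use noted above for unquantized PyTorch checkpoints.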

FLUX.1-schnell Model Profiles

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | FP8 | ea9115a32e460d58aa89e79baee8fa1668305d5a74558d81ebfddb41a2fb3c28 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |

If your GPU model is not listed, you can use the generic model profile below, which uses the PyTorch backend, or create a model profile for your GPU using these instructions. PyTorch checkpoints are not quantized, so they consume more GPU memory.

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| Generic | PyTorch | 768-1344x768-1344 | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |