Models

About Model Profiles

NVIDIA NIM microservices use model engines that are tuned for a specific combination of NVIDIA GPU model, number of GPUs, precision, and similar parameters. NVIDIA produces model engines for several popular combinations; these are called model profiles. Each model profile is identified by a unique 64-character hexadecimal string called a profile ID.
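
As a minimal sketch of the profile ID format described above, the following check accepts only strings of exactly 64 hexadecimal digits. The function name is hypothetical, and the restriction to lowercase digits is an assumption based on how the IDs are printed on this page.

```python
import re

# A profile ID is described above as 64 hexadecimal digits.
# Assumption: IDs are written in lowercase, as in the tables on this page.
PROFILE_ID_PATTERN = re.compile(r"[0-9a-f]{64}")


def is_profile_id(value: str) -> bool:
    """Return True if value has the shape of a model profile ID."""
    return PROFILE_ID_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    # An ID copied from the FLUX.1-dev table, then a truncated string.
    print(is_profile_id("1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff"))
    print(is_profile_id("1b2d236d"))
```

A check like this only confirms the shape of the string; it does not confirm that the ID exists in the manifest of a given container.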

The available model profiles are listed in the model manifest file, which is stored in the NIM container file system. The default path of the manifest in the container is /opt/nim/etc/default/model_manifest.yaml.
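
One way to see which profile IDs a container ships with is to scan the manifest text for ID-shaped strings. The sketch below deliberately does not assume anything about the YAML schema, which is not described here; it only assumes that profile IDs appear as standalone 64-digit lowercase hexadecimal strings. Any other 64-digit hexadecimal values in the file, such as checksums, would also match. The function names are hypothetical.

```python
import re
from pathlib import Path

# Default manifest location inside the NIM container, as stated above.
DEFAULT_MANIFEST_PATH = Path("/opt/nim/etc/default/model_manifest.yaml")

# Assumption: profile IDs appear in the manifest as standalone 64-digit
# lowercase hexadecimal strings. The YAML schema itself is not assumed.
_PROFILE_ID = re.compile(r"(?<![0-9a-f])[0-9a-f]{64}(?![0-9a-f])")


def find_profile_ids(manifest_text: str) -> list[str]:
    """Return distinct ID-shaped strings in order of first appearance."""
    seen: dict[str, None] = {}
    for match in _PROFILE_ID.finditer(manifest_text):
        seen.setdefault(match.group(0), None)
    return list(seen)


def read_profile_ids(path: Path = DEFAULT_MANIFEST_PATH) -> list[str]:
    """Read the manifest file and list the ID-shaped strings found in it."""
    return find_profile_ids(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    if DEFAULT_MANIFEST_PATH.exists():
        for profile_id in read_profile_ids():
            print(profile_id)
    else:
        print(f"{DEFAULT_MANIFEST_PATH} not found; run this inside the NIM container.")
```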

FLUX.1-dev Model Profiles

FLUX.1-dev is a collection of generative AI image models that create high-quality, realistic images. FLUX.1-dev generates images from simple text prompts, while FLUX.1-Depth-dev and FLUX.1-Canny-dev give greater control by combining the text prompt with an input image that guides the structure of the output image.

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 8b42564dd5dc5dc021b47027fc25e8de3c3f20541b06643b80143facd338480b |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 66188a8ebcad93374ef35c7fb89df3db16ea9176aee3515ad1a4d333d9fc8676 |
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b44b6dbfc4d414f5b2d11c401606380d616939bf4f9470de78b9e25de6f143e3 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 1907fccdb6a42689ee3d448d6a93ca911f8674c2aa1ebc81b7d1f7db436eecc1 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | e9d0786a812eda295914d5c7e4e1a9c989324912af3f73eeaa9631eda616d78f |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 9bd6fd188f53bce2eb42f11f81fadd1d11c3823c506b7a1c96b705f6c5e41b3a |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 36c44753a9a188e8a36e717c4cd2d08c7c8cc4281f59c750cfda49bd9e72a0bf |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 387b0d749f1f6c39f7dd9b57e1e6872f809c6bf0422c71cda164be32c0fb7d79 |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b454d497c90956b1bf546720c1df00c1888865050d72290191f36ada319ecc6c |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | c8cfa63ee8cba592b3f52edefa18a5fda9e8f512ee3da8bc938a90336a0e75ea |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 3bdeb471bb31950b3a7a759b5dea3aeb80083fd328a2cee445463fcf79141373 |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b912befc35951aa88e450d4b0ff7ec9576688c44f434cec624d814f954b16c10 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | d65f03f4d849fd152ce78961bec9868652db607f7e7f8d02eeea68de9e964cfc |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 48e32cc14e07205437fa4484893e017fe6ce7149de6ef3b935e61482cc43d3e7 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 094a53dd3d6b4a67e8ba8b215f996acb0f0114afc8b1a2503068ebd7e2dc4b67 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 6a4f28dc7ce68a6f63cf4361cbe84341932d2c61acd6725e08fe222725be53b3 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | e8bf15bd38e3766339517218899a9a0ec63f4ca9d6d7086f99115b617dcf71f2 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 5ec1a6c7284f4e55127ffdceae12684c0a50242cfdeff940f53e359dc636b267 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 75af532c3d833d82ad27fab8bb190f60fbb3a91b0cf70bea33d294a7c8ce5baf |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | bdf9998149e94cdaf5221aa9baebc30f27449925da8eac6d1508cd945cdb643a |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 61070e912036a7a9140d5e64126bd623522293ea3076c2f167d963a94c863b13 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 842dee095df8d6ad5a2b8678605e677fea46882ff1eba1ecde76a186e8b0d1c5 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | b490222762872588294023feaecc384bbba054ae06256abd7b166d5e007cb764 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 2a0ff19006f215b4dbb2266240c12d91ac6a005c402124c3ca3916141096fd0a |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | cc19715f2bd209a45773ec4131c346b4c88b44d3e8f67145e719d63f6bf512d4 |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | a02d1b01eb43980224ebc91a471d415be2886849bce69374e9c2a63289d8debe |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | cf766e0c4e718cccf1e771e27d7bb8181120ea21219533f8d9d166f1df1bbedd |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | a1f563c2ce47feeff632d0306083ad45e05d268cffb080a34caf5f2ed14ebbcc |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 9b45f1c8bb44d13e6d6067799e90f472001845bd76bbe4da9669214deda62eda |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | d4ffdd037cbdb279689bc6f5cd969de4cdf2e63b47edc055413b759cc25bdcff |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 2cf27ae9a70fb4d765e646530d14d26f380fb4cefe3c93555faaf2d84061e475 |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 24f330eafd299ac785cc72f70cfb8d64ec1c15e16766e55ab570e6e97ef57d8b |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | 8964aba253650b90dc4bf8cd24e4c139ebd54518a9b546cb05cc2e2f23155a39 |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | f3036de58626350a45af7c1d24b77bed31feb35848685870bb0690d18310c178 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | canny | FP8 | ea523d996ab2f281ca305f7de7f36f348f8203a8fe72e0bb7620931a50d82fb6 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | depth | FP8 | 2a971111162d9d9a60648fd97c3d5338501b538e017c302589b7c920fc81bde1 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 1f9080e10c8ffc4ae59d15277171b0ee3fef9b987f9b45410920ad41f7c15cde |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | base | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | canny | FP8 | 52035cc50f1e63c3cba7319f8e365f23e29442d11f768b6b87e11eea3de5cd38 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | depth | FP8 | 1c55cd56fd15786b7729a3880defc5ef4284904f99f7bc6912c46e9620c43021 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | a12aa1e722ccc7f7685ea7663009cea0d02c49d38fce981f8177eaa6ad8e1341 |

If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| Generic | PyTorch | 768-1344x768-1344 | base | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |
| Generic | PyTorch | 768-1344x768-1344 | canny | BF16 | 351a04dd6ca4e445f1ae4fe0da0190133c79ed4eedd2965e5da41cbb2b48826c |
| Generic | PyTorch | 768-1344x768-1344 | depth | BF16 | 7280cf728c45505c1a8def558d9c18534096c0fe9a976b138818e31b33e859b7 |
| Generic | PyTorch | 768-1344x768-1344 | base+canny+depth | BF16 | f02c296542632aef64d11cbb13026c2502da2c290cc5b05f507a4922eedd1dda |
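
The selection rule described above (use a GPU-specific profile when one is listed, otherwise fall back to a generic PyTorch profile) can be sketched as a simple lookup. The dictionary layout and the function name are hypothetical and not part of NIM; the IDs are a small subset copied from the FLUX.1-dev tables. How the chosen ID is then passed to the container is not covered in this section.

```python
from typing import Optional

# A few rows copied from the FLUX.1-dev tables, keyed by (GPU, variant).
# The dictionary layout is illustrative only; it is not a NIM data structure.
FLUX_DEV_PROFILES = {
    ("NVIDIA H100 SXM", "base"): "0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441",
    ("NVIDIA H100 SXM", "canny"): "ea523d996ab2f281ca305f7de7f36f348f8203a8fe72e0bb7620931a50d82fb6",
    ("NVIDIA L40", "base"): "fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337",
    ("Generic", "base"): "f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305",
    ("Generic", "canny"): "351a04dd6ca4e445f1ae4fe0da0190133c79ed4eedd2965e5da41cbb2b48826c",
}


def pick_profile(gpu: str, variant: str) -> Optional[str]:
    """Prefer the GPU-specific profile; otherwise use the generic one.

    Returns None when no profile for the variant is in the table subset.
    """
    specific = FLUX_DEV_PROFILES.get((gpu, variant))
    if specific is not None:
        return specific
    return FLUX_DEV_PROFILES.get(("Generic", variant))


if __name__ == "__main__":
    # A listed GPU resolves to its TensorRT profile.
    print(pick_profile("NVIDIA H100 SXM", "canny"))
    # A GPU that is not in the tables falls back to the generic profile.
    print(pick_profile("Some Unlisted GPU", "canny"))
```

Keep in mind the trade-off stated above: the generic PyTorch profiles are not quantized, so a fallback uses more GPU memory than a GPU-specific profile.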

FLUX.1-schnell Model Profiles

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
| GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
| GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
| GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
| GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
| GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
| GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | FP8 | ea9115a32e460d58aa89e79baee8fa1668305d5a74558d81ebfddb41a2fb3c28 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
| NVIDIA L40 | TensorRT | 768-1344x768-1344 | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |

If your GPU model is not listed, use the generic model profile below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| Generic | PyTorch | 768-1344x768-1344 | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |

FLUX.1-Kontext-dev Model Profiles

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| GeForce RTX 5090 (Beta) | TensorRT | 672-1568x672-1568 | FP4 | a623d76701f895b250ad61eef210ad558deeca589d44d83e30b37075676f79f3 |
| GeForce RTX 5090 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 3d9f14cd01d73e42ad3cb60ce8df5473de8263eb45e007528969712b0848f8c5 |
| GeForce RTX 5080 (Beta) | TensorRT | 672-1568x672-1568 | FP4 | bae3820c3c78f368301c5bbc9101cf96846caaa10a6a5205ba7ee742bf7c3564 |
| GeForce RTX 5080 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 17eff85032b1c40910c756eeaf465067027387c25243123a8089108c6054e375 |
| GeForce RTX 5070 TI (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 53a3284a85134f8c8876ccf20679f821ef4a08f006c5a64eb4d6009260b78804 |
| GeForce RTX 4090 (Beta) | TensorRT | 672-1568x672-1568 | FP8 | aae60c464edafd9174c2daea072232985164fe50f389cfa16e61cf922a05f117 |
| GeForce RTX 4090 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 84e34939ba4f4b1fe0156dd5bfe65ddbe2085ac05bc6e359adecda9069064ae4 |
| GeForce RTX 4080 (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 70e514f75dd04bfd0055ba6b41264729ca6ff1e1770023e343638a07cbdce475 |
| NVIDIA RTX PRO 6000 Blackwell Workstation Edition (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 53644af888d735870504a9cf7c8fee102174ee8a7d5ead4d9817f782c3209384 |
| NVIDIA RTX PRO 6000 Blackwell Server Edition (Beta) | TensorRT | 672-1568x672-1568 | FP4 | cd8e01f3b8e6fe80279a058a91bdaae21863325610e3e4df37bc15acfa599ac5 |
| NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 284335f2acc80ef87c88c5a48c29085e7cac04e6dc15ad6f864dc6e59a22b52a |
| NVIDIA H100 SXM | TensorRT | 672-1568x672-1568 | FP8 | 66de937b2053d47cd7a508757fc3286c6e700815d746f8248b9c3541ed13fde5 |
| NVIDIA L40 | TensorRT | 672-1568x672-1568 | FP8 | cf4230921dcf21f6ed9d1013e920eb4246cf693940295f248041745e68ec7a80 |

If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.

| GPU | Backend | Resolution | Precision | Model Profile ID |
|---|---|---|---|---|
| Generic | PyTorch | 672-1568x672-1568 | BF16 | 6ca915ecc7893f828bf55d1882f7b3e85469edffac70bee357ea23269a870a40 |
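
The Resolution column is not explained on this page. Assuming that a value such as 672-1568x672-1568 means a minimum-maximum range for width followed by a minimum-maximum range for height, in pixels, a requested image size could be checked against a profile as in the sketch below. That reading is an assumption, the function names are hypothetical, and the service may apply further constraints on valid sizes that are not modeled here.

```python
from typing import NamedTuple


class ResolutionRange(NamedTuple):
    min_width: int
    max_width: int
    min_height: int
    max_height: int


def parse_resolution(spec: str) -> ResolutionRange:
    """Parse a string such as '672-1568x672-1568'.

    Assumption: the format is '<min>-<max>x<min>-<max>', with the first
    range applying to width and the second to height.
    """
    width_part, height_part = spec.split("x")
    min_w, max_w = (int(n) for n in width_part.split("-"))
    min_h, max_h = (int(n) for n in height_part.split("-"))
    return ResolutionRange(min_w, max_w, min_h, max_h)


def fits(spec: str, width: int, height: int) -> bool:
    """Return True if width and height fall inside the parsed ranges."""
    r = parse_resolution(spec)
    return (r.min_width <= width <= r.max_width
            and r.min_height <= height <= r.max_height)


if __name__ == "__main__":
    # 672 is below the lower bound of the 768-1344 range, so this is False.
    print(fits("768-1344x768-1344", 672, 1024))
    # The FLUX.1-Kontext-dev profiles list a wider range, so this is True.
    print(fits("672-1568x672-1568", 672, 1024))
```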

Stable Diffusion 3.5 Large Model Profiles

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base | BF16 | 693c545b76b1d00523fc565442e113d767d8128e15674ffd970f24b13e1bfdb2 |
| NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+canny | BF16 | 45b4c6d2fe2be3e1fbf4d70ed6d378a4379e0c36cd9bdda53b9766cd163a16e6 |
| NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+depth | BF16 | 52abc4ce7424ac2edcaf1c6b4498bbd2657b444c8cc46514b2e4044a2c33657a |
| NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | a5bd2e1d205c571b83f8e2ecf7ac35e29527b4c6f239f3fb8ea60787ae8c7515 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base | BF16 | f6c6df4fbaa14cb58201c9acbb344607bd0e6e5ff94ca8414d9cb0fa9885df05 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny | BF16 | c1f289eaa4b12bf6e3e97a9f83d1a828a3329c6d92f91ad28f903f91b2b69665 |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+depth | BF16 | a8c2c467570f53442215ad7cf4c83bd281f4d319e186bfd3470ee4556cff676b |
| NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | 8ce9a981ce8c4310c4a04d647668af7075e1528f86195227dab31de4618afea4 |
| NVIDIA L40S | TensorRT | 768-1344x768-1344 | base | BF16 | 7cb2b5e947e27f6f052a71c1e0aab137c2e81f18b3955cd896c3cca812b6feb0 |
| NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+canny | BF16 | 840a6ac59f9413b45c3e2bf448dbe5776121ca3044e7204f201b4ee7040efbfd |
| NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+depth | BF16 | da4df560774cf40f867ecf1bfa851262c10df0f9516d8c9c38bfb5aeacbc4e22 |
| NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | 2a9061d7b3aad091acbddb74320268312d7b2fa203937c971387491d7bad107b |

If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. On average, TensorRT provides a 1.75x speedup across all variants.

| GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
|---|---|---|---|---|---|
| Generic | PyTorch | 768-1344x768-1344 | base | BF16 | 8f23b2ce12d64905748147c73adbfe79fffabb0c2d9fa8dc95f4942dbb03d522 |
| Generic | PyTorch | 768-1344x768-1344 | base+canny | BF16 | 90b353eb2436047c674431dce2075e1b1f934b0e96d5bcc45db891e70f50c2d7 |
| Generic | PyTorch | 768-1344x768-1344 | base+depth | BF16 | 83803d2005e31831fd4bf4be11e49c5336ca103d046c62fcd7f15cbf991f9e80 |
| Generic | PyTorch | 768-1344x768-1344 | base+canny+depth | BF16 | e6fd83b7f23171dade4ab3605d49a206b55394cd0ad8d8f474b2189b32c51521 |