Models
About Model Profiles
The models for NVIDIA NIM microservices use model engines that are tuned for a specific combination of NVIDIA GPU model, number of GPUs, precision, and so on. NVIDIA produces model engines for several popular combinations, and these are referred to as model profiles. Each model profile is identified by a profile ID, which is a unique 64-character string of hexadecimal digits.
The available model profiles are stored in a file in the NIM container file system. This file is referred to as the model manifest file, and its default path in the container is /opt/nim/etc/default/model_manifest.yaml.
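Because a profile ID is always 64 hexadecimal digits, its shape can be checked before it is used. The following is a minimal sketch, not part of NIM: the function names `is_profile_id`, `find_profile_ids`, and `load_manifest_ids` are hypothetical, and the code deliberately makes no assumption about the YAML layout of the manifest file. It only scans text for 64-digit hexadecimal strings, so any other value with the same shape in the file (for example, a checksum) would also be returned.

```python
import re
from pathlib import Path

# A profile ID is described as a 64-character string of hexadecimal digits.
PROFILE_ID_RE = re.compile(r"\b[0-9a-fA-F]{64}\b")

# Default manifest path inside the container, as stated in the text.
DEFAULT_MANIFEST = "/opt/nim/etc/default/model_manifest.yaml"


def is_profile_id(value: str) -> bool:
    """Return True if value is exactly 64 hexadecimal digits."""
    return PROFILE_ID_RE.fullmatch(value) is not None


def find_profile_ids(manifest_text: str) -> list[str]:
    """Return unique 64-hex-digit strings, lowercased, in order of first appearance.

    Assumption: this is a plain text scan. It cannot tell a profile ID
    apart from any other 64-digit hexadecimal value in the same text.
    """
    seen: dict[str, None] = {}
    for match in PROFILE_ID_RE.finditer(manifest_text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)


def load_manifest_ids(path: str = DEFAULT_MANIFEST) -> list[str]:
    """Read the manifest file (run this inside the container) and scan it."""
    return find_profile_ids(Path(path).read_text(encoding="utf-8"))
```

Running `load_manifest_ids()` inside the container returns candidate IDs that can be compared against the tables on this page.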
FLUX.1-dev Model Profiles
FLUX.1-dev is a collection of generative AI image models that create high-quality, realistic images. FLUX.1-dev generates images from simple text prompts, while FLUX.1-Depth-dev and FLUX.1-Canny-dev offer greater control by combining the text prompt with an input image that guides the structure of the output image.
GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
---|---|---|---|---|---|
GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 8b42564dd5dc5dc021b47027fc25e8de3c3f20541b06643b80143facd338480b |
GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 66188a8ebcad93374ef35c7fb89df3db16ea9176aee3515ad1a4d333d9fc8676 |
GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b44b6dbfc4d414f5b2d11c401606380d616939bf4f9470de78b9e25de6f143e3 |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 1907fccdb6a42689ee3d448d6a93ca911f8674c2aa1ebc81b7d1f7db436eecc1 |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | e9d0786a812eda295914d5c7e4e1a9c989324912af3f73eeaa9631eda616d78f |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 9bd6fd188f53bce2eb42f11f81fadd1d11c3823c506b7a1c96b705f6c5e41b3a |
GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | 36c44753a9a188e8a36e717c4cd2d08c7c8cc4281f59c750cfda49bd9e72a0bf |
GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 387b0d749f1f6c39f7dd9b57e1e6872f809c6bf0422c71cda164be32c0fb7d79 |
GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b454d497c90956b1bf546720c1df00c1888865050d72290191f36ada319ecc6c |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | c8cfa63ee8cba592b3f52edefa18a5fda9e8f512ee3da8bc938a90336a0e75ea |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 3bdeb471bb31950b3a7a759b5dea3aeb80083fd328a2cee445463fcf79141373 |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | b912befc35951aa88e450d4b0ff7ec9576688c44f434cec624d814f954b16c10 |
GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | d65f03f4d849fd152ce78961bec9868652db607f7e7f8d02eeea68de9e964cfc |
GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 48e32cc14e07205437fa4484893e017fe6ce7149de6ef3b935e61482cc43d3e7 |
GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | 094a53dd3d6b4a67e8ba8b215f996acb0f0114afc8b1a2503068ebd7e2dc4b67 |
GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 6a4f28dc7ce68a6f63cf4361cbe84341932d2c61acd6725e08fe222725be53b3 |
GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | e8bf15bd38e3766339517218899a9a0ec63f4ca9d6d7086f99115b617dcf71f2 |
GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 5ec1a6c7284f4e55127ffdceae12684c0a50242cfdeff940f53e359dc636b267 |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 75af532c3d833d82ad27fab8bb190f60fbb3a91b0cf70bea33d294a7c8ce5baf |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | bdf9998149e94cdaf5221aa9baebc30f27449925da8eac6d1508cd945cdb643a |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 61070e912036a7a9140d5e64126bd623522293ea3076c2f167d963a94c863b13 |
GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 842dee095df8d6ad5a2b8678605e677fea46882ff1eba1ecde76a186e8b0d1c5 |
GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | b490222762872588294023feaecc384bbba054ae06256abd7b166d5e007cb764 |
GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 2a0ff19006f215b4dbb2266240c12d91ac6a005c402124c3ca3916141096fd0a |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | cc19715f2bd209a45773ec4131c346b4c88b44d3e8f67145e719d63f6bf512d4 |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | a02d1b01eb43980224ebc91a471d415be2886849bce69374e9c2a63289d8debe |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | cf766e0c4e718cccf1e771e27d7bb8181120ea21219533f8d9d166f1df1bbedd |
GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP4 | a1f563c2ce47feeff632d0306083ad45e05d268cffb080a34caf5f2ed14ebbcc |
GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP4 | 9b45f1c8bb44d13e6d6067799e90f472001845bd76bbe4da9669214deda62eda |
GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP4 | d4ffdd037cbdb279689bc6f5cd969de4cdf2e63b47edc055413b759cc25bdcff |
GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base | FP8 | 2cf27ae9a70fb4d765e646530d14d26f380fb4cefe3c93555faaf2d84061e475 |
GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | canny | FP8 | 24f330eafd299ac785cc72f70cfb8d64ec1c15e16766e55ab570e6e97ef57d8b |
GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | depth | FP8 | 8964aba253650b90dc4bf8cd24e4c139ebd54518a9b546cb05cc2e2f23155a39 |
GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | f3036de58626350a45af7c1d24b77bed31feb35848685870bb0690d18310c178 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | canny | FP8 | ea523d996ab2f281ca305f7de7f36f348f8203a8fe72e0bb7620931a50d82fb6 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | depth | FP8 | 2a971111162d9d9a60648fd97c3d5338501b538e017c302589b7c920fc81bde1 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | 1f9080e10c8ffc4ae59d15277171b0ee3fef9b987f9b45410920ad41f7c15cde |
NVIDIA L40 | TensorRT | 768-1344x768-1344 | base | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |
NVIDIA L40 | TensorRT | 768-1344x768-1344 | canny | FP8 | 52035cc50f1e63c3cba7319f8e365f23e29442d11f768b6b87e11eea3de5cd38 |
NVIDIA L40 | TensorRT | 768-1344x768-1344 | depth | FP8 | 1c55cd56fd15786b7729a3880defc5ef4284904f99f7bc6912c46e9620c43021 |
NVIDIA L40 | TensorRT | 768-1344x768-1344 | base+canny+depth | FP8 | a12aa1e722ccc7f7685ea7663009cea0d02c49d38fce981f8177eaa6ad8e1341 |
If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.
GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
---|---|---|---|---|---|
Generic | PyTorch | 768-1344x768-1344 | base | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |
Generic | PyTorch | 768-1344x768-1344 | canny | BF16 | 351a04dd6ca4e445f1ae4fe0da0190133c79ed4eedd2965e5da41cbb2b48826c |
Generic | PyTorch | 768-1344x768-1344 | depth | BF16 | 7280cf728c45505c1a8def558d9c18534096c0fe9a976b138818e31b33e859b7 |
Generic | PyTorch | 768-1344x768-1344 | base+canny+depth | BF16 | f02c296542632aef64d11cbb13026c2502da2c290cc5b05f507a4922eedd1dda |
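The fallback rule described above (use a GPU-specific TensorRT profile when one is listed, otherwise use the generic PyTorch profile) can be sketched as a simple lookup. This is an illustration only: the dictionary holds just a few FLUX.1-dev `base` rows copied from the tables above, the keys omit the "(Beta)" label, and `pick_profile` is a hypothetical helper. How the GPU name is detected and how the chosen ID is passed to the container are outside the scope of this sketch.

```python
# Excerpt of the FLUX.1-dev "base" variant rows from the tables above.
# Keys are GPU names from the GPU column without the "(Beta)" label.
FLUX_DEV_BASE_PROFILES = {
    "GeForce RTX 5090": "1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff",
    "GeForce RTX 4090": "9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7",
    "NVIDIA H100 SXM": "0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441",
    "NVIDIA L40": "fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337",
}

# Generic PyTorch BF16 profile for the "base" variant.
GENERIC_BASE_PROFILE = "f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305"


def pick_profile(gpu_name: str) -> tuple[str, str]:
    """Return (backend, profile_id) for a GPU name.

    Falls back to the generic PyTorch profile when the GPU is not in
    the lookup table, mirroring the rule stated in the text.
    """
    profile_id = FLUX_DEV_BASE_PROFILES.get(gpu_name)
    if profile_id is not None:
        return ("TensorRT", profile_id)
    return ("PyTorch", GENERIC_BASE_PROFILE)
```

For example, `pick_profile("NVIDIA L40")` selects the TensorRT FP8 profile for that GPU, while a GPU name that is not in the dictionary resolves to the generic BF16 profile, which consumes more GPU memory because it is not quantized.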
FLUX.1-schnell Model Profiles
GPU | Backend | Resolution | Precision | Model Profile ID |
---|---|---|---|---|
GeForce RTX 5090 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 1b2d236d5fa4e0425e80ff17c9480ed73f2a66a5190a102299b3c9b8936670ff |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | ac727b88271b5dc493e23ade2568954e0deaa1d76a2227a6670d6ed821fb9953 |
GeForce RTX 5080 (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 365d6883d978bb2f2c00f5af2678115e0d92c2d09f1fe4f8bcdd813b8d731a5f |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 9ad7ffac9b8260d15ab286637d444363a6899159e903e9cce3594a58be1489f9 |
GeForce RTX 5070TI (Beta) | TensorRT | 768-1344x768-1344 | FP4 | 34f18766736a8842a8248cffc18881bf850d04698ab1e25fa7e9fc65fae82688 |
GeForce RTX 4090 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9b8c05dd711ea235c7390c838a54730dd762466484996275c6b362ed3c87d4f7 |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 93bd95c143ef54b2ad47c47ac3d5742f9fff5ff80baa8228ce19a8577a75ebc8 |
GeForce RTX 4080 (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 96295541130ed46e3de0b25d1a95e7409784bb19da7b4b97cb586eac4e4ab778 |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 768-1344x768-1344 | FP8 | 9ed3f545f2316939af1984fb703115e9b706f8c7e9b4eb452f37f86a06df5bbc |
GeForce RTX 5090D (Beta) | TensorRT | 768-1344x768-1344 | FP4 | c6d1fad563e06a49946adfa773b9117b0485ec7cd0640386f0a5884bb350a51a |
GeForce RTX 4090D (Beta) | TensorRT | 768-1344x768-1344 | FP8 | ea9115a32e460d58aa89e79baee8fa1668305d5a74558d81ebfddb41a2fb3c28 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | FP8 | 0376eb85528b177c914b3a435c6d34456f1ce16bd9287c7e9f22392d87de0441 |
NVIDIA L40 | TensorRT | 768-1344x768-1344 | FP8 | fde1571bb1c3127b047f5e7ab37b48c893b055988473bab4fc5399874b964337 |
If your GPU model is not listed, use the generic model profile below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.
GPU | Backend | Resolution | Precision | Model Profile ID |
---|---|---|---|---|
Generic | PyTorch | 768-1344x768-1344 | BF16 | f0d0d4ac2ea5b121defa3e82a1fe82f289856cf5db49aa99e670e8851d8f0305 |
FLUX.1-Kontext-dev Model Profiles
GPU | Backend | Resolution | Precision | Model Profile ID |
---|---|---|---|---|
GeForce RTX 5090 (Beta) | TensorRT | 672-1568x672-1568 | FP4 | a623d76701f895b250ad61eef210ad558deeca589d44d83e30b37075676f79f3 |
GeForce RTX 5090 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 3d9f14cd01d73e42ad3cb60ce8df5473de8263eb45e007528969712b0848f8c5 |
GeForce RTX 5080 (Beta) | TensorRT | 672-1568x672-1568 | FP4 | bae3820c3c78f368301c5bbc9101cf96846caaa10a6a5205ba7ee742bf7c3564 |
GeForce RTX 5080 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 17eff85032b1c40910c756eeaf465067027387c25243123a8089108c6054e375 |
GeForce RTX 5070 TI (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 53a3284a85134f8c8876ccf20679f821ef4a08f006c5a64eb4d6009260b78804 |
GeForce RTX 4090 (Beta) | TensorRT | 672-1568x672-1568 | FP8 | aae60c464edafd9174c2daea072232985164fe50f389cfa16e61cf922a05f117 |
GeForce RTX 4090 Laptop (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 84e34939ba4f4b1fe0156dd5bfe65ddbe2085ac05bc6e359adecda9069064ae4 |
GeForce RTX 4080 (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 70e514f75dd04bfd0055ba6b41264729ca6ff1e1770023e343638a07cbdce475 |
NVIDIA RTX PRO 6000 Blackwell Workstation Edition (Beta) | TensorRT | 672-1568x672-1568 | FP4 | 53644af888d735870504a9cf7c8fee102174ee8a7d5ead4d9817f782c3209384 |
NVIDIA RTX PRO 6000 Blackwell Server Edition (Beta) | TensorRT | 672-1568x672-1568 | FP4 | cd8e01f3b8e6fe80279a058a91bdaae21863325610e3e4df37bc15acfa599ac5 |
NVIDIA RTX 6000 Ada Generation (Beta) | TensorRT | 672-1568x672-1568 | FP8 | 284335f2acc80ef87c88c5a48c29085e7cac04e6dc15ad6f864dc6e59a22b52a |
NVIDIA H100 SXM | TensorRT | 672-1568x672-1568 | FP8 | 66de937b2053d47cd7a508757fc3286c6e700815d746f8248b9c3541ed13fde5 |
NVIDIA L40 | TensorRT | 672-1568x672-1568 | FP8 | cf4230921dcf21f6ed9d1013e920eb4246cf693940295f248041745e68ec7a80 |
If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. Because PyTorch checkpoints are not quantized, they consume more GPU memory.
GPU | Backend | Resolution | Precision | Model Profile ID |
---|---|---|---|---|
Generic | PyTorch | 672-1568x672-1568 | BF16 | 6ca915ecc7893f828bf55d1882f7b3e85469edffac70bee357ea23269a870a40 |
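The Resolution column in the tables on this page uses strings such as 768-1344x768-1344 and 672-1568x672-1568. The sketch below parses such a string and checks a requested image size against it. It rests on assumptions that this page does not confirm: that each side of the `x` is an inclusive minimum-maximum range in pixels for one image dimension, and that every value inside the range is accepted. The page also does not state which side is width and which is height, so the code refers to them only as the first and second dimension. The names `ResolutionRange`, `parse_resolution`, and `is_supported` are hypothetical.

```python
from typing import NamedTuple


class ResolutionRange(NamedTuple):
    """Inclusive bounds for the two dimensions of a Resolution cell."""
    first_min: int
    first_max: int
    second_min: int
    second_max: int


def parse_resolution(spec: str) -> ResolutionRange:
    """Parse a string such as '768-1344x768-1344' into four integers."""
    first, second = spec.split("x")
    first_min, first_max = (int(n) for n in first.split("-"))
    second_min, second_max = (int(n) for n in second.split("-"))
    return ResolutionRange(first_min, first_max, second_min, second_max)


def is_supported(spec: str, first: int, second: int) -> bool:
    """Check a requested size against the assumed inclusive ranges."""
    r = parse_resolution(spec)
    return (r.first_min <= first <= r.first_max
            and r.second_min <= second <= r.second_max)
```

Under these assumptions, a 1024 by 1024 request falls inside both ranges listed on this page, while a 1568 by 672 request falls inside only the 672-1568x672-1568 range used by the FLUX.1-Kontext-dev profiles.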
Stable Diffusion 3.5 Large Model Profiles
GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
---|---|---|---|---|---|
NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base | BF16 | 693c545b76b1d00523fc565442e113d767d8128e15674ffd970f24b13e1bfdb2 |
NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+canny | BF16 | 45b4c6d2fe2be3e1fbf4d70ed6d378a4379e0c36cd9bdda53b9766cd163a16e6 |
NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+depth | BF16 | 52abc4ce7424ac2edcaf1c6b4498bbd2657b444c8cc46514b2e4044a2c33657a |
NVIDIA A100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | a5bd2e1d205c571b83f8e2ecf7ac35e29527b4c6f239f3fb8ea60787ae8c7515 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base | BF16 | f6c6df4fbaa14cb58201c9acbb344607bd0e6e5ff94ca8414d9cb0fa9885df05 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny | BF16 | c1f289eaa4b12bf6e3e97a9f83d1a828a3329c6d92f91ad28f903f91b2b69665 |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+depth | BF16 | a8c2c467570f53442215ad7cf4c83bd281f4d319e186bfd3470ee4556cff676b |
NVIDIA H100 SXM | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | 8ce9a981ce8c4310c4a04d647668af7075e1528f86195227dab31de4618afea4 |
NVIDIA L40S | TensorRT | 768-1344x768-1344 | base | BF16 | 7cb2b5e947e27f6f052a71c1e0aab137c2e81f18b3955cd896c3cca812b6feb0 |
NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+canny | BF16 | 840a6ac59f9413b45c3e2bf448dbe5776121ca3044e7204f201b4ee7040efbfd |
NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+depth | BF16 | da4df560774cf40f867ecf1bfa851262c10df0f9516d8c9c38bfb5aeacbc4e22 |
NVIDIA L40S | TensorRT | 768-1344x768-1344 | base+canny+depth | BF16 | 2a9061d7b3aad091acbddb74320268312d7b2fa203937c971387491d7bad107b |
If your GPU model is not listed, use one of the generic model profiles below with the PyTorch backend, or create a custom model profile for your GPU by following these instructions. On average, TensorRT provides a 1.75x speedup across all variants.
GPU | Backend | Resolution | Variant | Precision | Model Profile ID |
---|---|---|---|---|---|
Generic | PyTorch | 768-1344x768-1344 | base | BF16 | 8f23b2ce12d64905748147c73adbfe79fffabb0c2d9fa8dc95f4942dbb03d522 |
Generic | PyTorch | 768-1344x768-1344 | base+canny | BF16 | 90b353eb2436047c674431dce2075e1b1f934b0e96d5bcc45db891e70f50c2d7 |
Generic | PyTorch | 768-1344x768-1344 | base+depth | BF16 | 83803d2005e31831fd4bf4be11e49c5336ca103d046c62fcd7f15cbf991f9e80 |
Generic | PyTorch | 768-1344x768-1344 | base+canny+depth | BF16 | e6fd83b7f23171dade4ab3605d49a206b55394cd0ad8d8f474b2189b32c51521 |