Usage Credits
Important
Credits are at a very early stage. Based on product needs, all provisions and definitions related to credits are subject to change.
To provide fair access for all users, TensorRT-Cloud uses credits to manage GPU resources. Credits are currently free and are refreshed daily at 00:00 UTC.
Currently, each credit represents one minute of utilization on a GPU-enabled node in the TensorRT-Cloud cluster.
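The daily refresh is easy to reason about in code. The sketch below computes the next 00:00 UTC refresh time and how long remains until it. It relies only on the refresh schedule stated above. The helper name `next_credit_refresh` is hypothetical and is not part of the trt-cloud CLI.

```python
from datetime import datetime, timedelta, timezone


def next_credit_refresh(now: datetime) -> datetime:
    """Return the first 00:00 UTC strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC before rounding.
    Hypothetical helper, not part of the trt-cloud CLI.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    # Midnight at the start of the current UTC day, then advance one day.
    midnight_today = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight_today + timedelta(days=1)


if __name__ == "__main__":
    now = datetime(2025, 3, 14, 18, 30, tzinfo=timezone.utc)
    refresh = next_credit_refresh(now)
    print(refresh.isoformat())  # 2025-03-15T00:00:00+00:00
    print(refresh - now)        # 5:30:00
```

Converting to UTC first matters for users in other time zones. For example, 23:00 at UTC-5 is already 04:00 UTC on the following day, so the next refresh comes a full day later than local midnight might suggest.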
What consumes credits:
Any time spent utilizing a GPU node.
What doesn’t consume credits:
Queue waiting time.
Model upload and download time on the user’s machine.
Other system overhead.
For example, a build job that takes 10 minutes to complete end-to-end might consume only 7 GPU minutes if the actual GPU processing time was 7 minutes.
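The accounting rule above can be sketched as follows. Only GPU utilization time is counted, at one credit per GPU minute. The phase names and the split of the non-GPU time are illustrative assumptions. They are not identifiers or measurements reported by TensorRT-Cloud.

```python
from dataclasses import dataclass

# Illustrative phase labels (hypothetical, not trt-cloud identifiers).
# Per the lists above, only GPU utilization consumes credits.
BILLABLE_PHASES = {"gpu_utilization"}


@dataclass
class Phase:
    name: str
    minutes: float


def credits_consumed(phases: list[Phase]) -> float:
    """One credit per GPU minute; queueing, transfers, and overhead are free."""
    return sum(p.minutes for p in phases if p.name in BILLABLE_PHASES)


if __name__ == "__main__":
    # A 10-minute end-to-end build, of which 7 minutes ran on the GPU.
    job = [
        Phase("queue_wait", 1.5),
        Phase("model_upload", 1.0),
        Phase("gpu_utilization", 7.0),
        Phase("system_overhead", 0.5),
    ]
    print(sum(p.minutes for p in job))  # 10.0
    print(credits_consumed(job))        # 7.0
```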
GPU-Specific Credit Allocation
TensorRT-Cloud manages GPU resources through two types of credit pools.
General “any” GPU credits:
Most GPU types use the general credit pool.
This shared pool appears as “any” GPU type in credit reports.
Specialized GPU credits:
Some high-demand GPU types will have dedicated credit limits.
These specialized allocations ensure fair access when resources are limited.
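One way to picture the two pool types is as a mapping from pool name to balance. The sketch below assumes a simple rule: a GPU type with its own dedicated entry is charged against that entry, and every other type is charged against the shared "any" pool. This rule, the GPU names, and the balances are all assumptions for illustration. This page does not list which GPU types are specialized or how pools interact beyond what is described above.

```python
# Hypothetical balances in GPU minutes. "any" is the shared general pool;
# "GPU_X" stands in for a high-demand type with a dedicated limit.
BALANCES = {"any": 2999.3, "GPU_X": 120.0}


def pool_for(gpu_type: str, balances: dict[str, float]) -> str:
    """Assumed rule: use a dedicated pool if one exists, else the "any" pool."""
    return gpu_type if gpu_type in balances else "any"


if __name__ == "__main__":
    print(pool_for("GPU_X", BALANCES))  # GPU_X
    print(pool_for("GPU_Y", BALANCES))  # any
```

To see the balances that actually apply to your account, check the output of `trt-cloud credits` rather than relying on a rule like this one.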
To view your available credit balances for each GPU type, run:
$ trt-cloud credits
┌──────────────────────────────────────────────────────────────────────────────
│ GPU: Any (these credits can be used with any GPU) │
│ Available Credits: 2999.3 GPU minutes │
│ Credits per Day: 3000.0 GPU minutes │
└──────────────────────────────────────────────────────────────────────────────
[I] The table above shows your available usage credit balance for builds and sweeps.
Consuming Credits
A build or sweep always ends up consuming credits equal to the GPU minutes it used. However, the duration of a TensorRT build can vary significantly depending on the size of the model, the configuration parameters used, and other factors, so the exact credit usage of a build or sweep is not known upfront. As a reference point, one engine build of Llama-3.1-8B consumes around 2 GPU minutes. For a sweep, credit consumption scales linearly with the number of trials benchmarked.
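Because sweep cost scales linearly with the number of trials, a rough estimate is simply trials times per-trial cost. In the sketch below, `minutes_per_trial` is an assumed input. The approximate 2-minute figure above is for a single Llama-3.1-8B engine build, and a sweep trial also includes benchmarking, so a real trial may cost more. The function name is hypothetical.

```python
def estimate_sweep_gpu_minutes(trials: int, minutes_per_trial: float) -> float:
    """Rough linear estimate of sweep cost in GPU minutes (= credits).

    Assumes every trial costs about the same; actual usage is only known
    after the sweep finishes.
    """
    if trials < 0 or minutes_per_trial < 0:
        raise ValueError("trials and minutes_per_trial must be non-negative")
    return trials * minutes_per_trial


if __name__ == "__main__":
    # Placeholder per-trial cost based on the ~2 GPU minute build figure.
    print(estimate_sweep_gpu_minutes(trials=24, minutes_per_trial=2.0))  # 48.0
```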
Because credit consumption is not known upfront, a build or sweep reserves a certain number of credits when it starts. This amount is displayed in the confirmation prompt:
$ trt-cloud sweep ...
[I] You currently have 2999.3 available GPU minutes which can be used with any GPU.
192 GPU minutes will be reserved up front for this engine build.
...
The reserved amount is deducted from your available credit balance immediately. As the build or sweep progresses, actual GPU usage is tracked, and any usage beyond the reserved amount is also deducted from your balance. If the build or sweep uses fewer GPU minutes than were reserved, the unused minutes are refunded to your balance. As a result, after completion the build or sweep will have consumed exactly the number of GPU minutes it actually used.
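The reserve, track, and settle flow can be modeled with a small ledger for a single job. This is an illustrative sketch of the behavior described above, not TensorRT-Cloud's implementation. The class and method names are hypothetical. Whichever way a job turns out, the net deduction equals the GPU minutes actually used.

```python
class CreditLedger:
    """Illustrative single-job model of reserve -> track -> settle.

    Hypothetical names; not part of the trt-cloud CLI or service.
    """

    def __init__(self, balance: float) -> None:
        self.balance = balance
        self.reserved = 0.0
        self.charged_overage = 0.0

    def start(self, reserved: float) -> None:
        # The reservation is deducted from the balance as soon as the job starts.
        self.reserved = reserved
        self.charged_overage = 0.0
        self.balance -= reserved

    def track(self, used_so_far: float) -> None:
        # Usage beyond the reservation is deducted as it accrues; only the
        # increase since the last update is charged.
        overage = max(0.0, used_so_far - self.reserved)
        self.balance -= overage - self.charged_overage
        self.charged_overage = overage

    def finish(self, used_total: float) -> None:
        # Charge any remaining overage, then refund unused reserved minutes.
        self.track(used_total)
        self.balance += max(0.0, self.reserved - used_total)


if __name__ == "__main__":
    ledger = CreditLedger(balance=3000.0)
    ledger.start(reserved=192.0)      # balance drops to 2808.0
    ledger.track(used_so_far=100.0)   # still within the reservation
    ledger.finish(used_total=150.0)   # 42 unused minutes refunded
    print(ledger.balance)             # 2850.0
```

In both cases, the total deducted is the reservation plus any overage minus any refund, which works out to exactly the minutes used.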