NVIDIA NVLink is a high-speed interconnect technology that allows for memory-sharing between GPUs. Sharing is allowed between all GPUs in an NVLink Partition. An NVLink Partition must consist of GPUs within the same NVLink Domain, which can be a single NVL72 rack or two NVL36 racks cabled together.
NVIDIA Infra Controller (NICo) allows you to do the following with NVLink:
NICo extends the concept of an NVLink Partition with the NVLink Logical Partition, which allows users to manage NVLink Partitions without having to learn the datacenter topology.
Note: NVLink Partitioning is only supported for GB200 compute nodes.
NVLink work is split between operator site setup against NMX-M / NMX-C and
tenant partition management. Several operator steps (NMX-C endpoint
registration and the GPU-mapping populate step) are not exposed via the REST
API and are therefore driven through nico-admin-cli over gRPC. See
Network Isolation → Who configures what, and how
for the role and interface model.
The operator NMX setup (the first four rows) is detailed under
Enabling NMX-C-based NVLink Partitioning
and Enabling NMX-M-based NVLink Partitioning.
The tenant rows are the REST / nicocli flow described next.
NICo users can create NVLink Logical Partitions and plan GPU assignments using NVLink Interfaces for Instances (as described in steps 1-2). NICo can also automatically generate NVLink Interfaces and assign them to Instances (as described in step 3).
In general, the steps are:
The user creates an NVLink Logical Partition using the POST /v2/org/{org}/nico/nvlink-logical-partition REST API endpoint. NICo creates an entry in the database and returns an NVLink Logical Partition ID. At this point, there is no underlying NVLink Partition associated with the NVLink Logical Partition.
When creating an Instance, the user specifies NVLink Interface configuration for each GPU by referencing their preferred NVLink Logical Partition ID in the POST /v2/org/{org}/nico/instance REST API endpoint request.
a. If this is the first Instance to be added to the specified NVLink Logical Partitions, NICo Core will create and assign NVLink Partitions for them and add the Instance GPUs to the NVLink Partitions.
Note: To ensure that machines in the same Rack are assigned to the same NVLink Partition, an Instance Type can be created for the Rack and all Machines in the Rack assigned to the same Instance Type. Alternatively, users can use the Batch Instance creation REST API endpoint and set topologyOptimized to true.
If the user does not want to specify NVLink Interfaces for each GPU when creating an Instance, they can:
a. Create a new VPC by specifying a value for nvLinkLogicalPartitionId or update an existing VPC with no Instances to set the nvLinkLogicalPartitionId attribute. We will refer to this as the default NVLink Logical Partition for the VPC.
b. When creating an Instance in this VPC, the user does not need to specify NVLink Interfaces; NICo will automatically generate NVLink Interfaces for the Instance and assign them to the VPC’s NVLink Logical Partition.
c. All Instances created within this VPC will have their GPUs assigned to the same NVLink Partition as long as the Instances end up in the same Rack.
d. If there is no space in the Rack where the NVLink Partition for an NVLink Logical Partition is located, NICo will create a new NVLink Partition in a different Rack for the same NVLink Logical Partition and continue to assign the Instance GPUs to it.
Important: If Instances are in different Racks, they will not be able to share memory with each other despite having the same NVLink Logical Partition.
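The two ways of attaching GPUs to an NVLink Logical Partition described in the steps above can be sketched as request-body fragments. This is a minimal sketch that only builds JSON and sends nothing. The field names nvLinkInterfaces, deviceInstance, and nvLinkLogicalPartitionId are taken from this page; the helper names, the use of a GPU index as the deviceInstance value, and the placeholder ID are assumptions for illustration, and any other fields the Create Instance or Create VPC endpoints require are omitted.

```python
import json


def explicit_nvlink_interfaces(gpu_to_partition):
    """Build an nvLinkInterfaces list from {GPU index: Logical Partition ID}.

    Assumption: deviceInstance is the GPU's index within the Instance.
    """
    return [
        {"deviceInstance": gpu, "nvLinkLogicalPartitionId": partition_id}
        for gpu, partition_id in sorted(gpu_to_partition.items())
    ]


def instance_body_fragment(gpu_to_partition=None):
    """Return only the NVLink part of a Create Instance body (step 2).

    With no mapping, nvLinkInterfaces is left out so that a VPC default
    NVLink Logical Partition (step 3) can apply instead.
    """
    if not gpu_to_partition:
        return {}
    return {"nvLinkInterfaces": explicit_nvlink_interfaces(gpu_to_partition)}


def vpc_body_fragment(default_partition_id):
    """Return only the NVLink part of a Create VPC or Update VPC body."""
    return {"nvLinkLogicalPartitionId": default_partition_id}


if __name__ == "__main__":
    # Placeholder ID; a real one is returned by the create call in step 1.
    lp = "<nvlink-logical-partition-id>"
    print(json.dumps(instance_body_fragment({0: lp, 1: lp}), indent=2))
    print(json.dumps(vpc_body_fragment(lp), indent=2))
```

Keeping the two fragments separate mirrors the choice the user makes: either plan per-GPU assignments on the Instance, or set a default on the VPC and omit nvLinkInterfaces.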
If a NICo user wants to update an Instance to change the NVLink Logical Partition assignment for its GPUs, they can do so by calling the PATCH /v2/org/{org}/nico/instance/{instance-id} REST API endpoint.
The user can specify the NVLink Logical Partition ID for each GPU in the Instance by passing the nvLinkInterfaces list.
If the Instance’s VPC has a default NVLink Logical Partition, no changes to the NVLink Logical Partition assignment are allowed.
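A client can apply the rule above before calling the update endpoint. The sketch below is a hypothetical client-side guard, not NICo's own validation: the function name and the choice to raise ValueError are assumptions, and the page does not say whether the PATCH body must list every GPU or only the ones being changed, so the example simply sends whatever mapping it is given.

```python
def build_nvlink_patch(desired, vpc_default_partition_id=None):
    """Build the NVLink part of an Update Instance body.

    desired: {GPU index: NVLink Logical Partition ID}
    Raises ValueError when the VPC has a default NVLink Logical Partition,
    because reassignment is not allowed in that case.
    """
    if vpc_default_partition_id is not None:
        raise ValueError(
            "VPC has a default NVLink Logical Partition; "
            "per-GPU reassignment is not allowed"
        )
    if not desired:
        raise ValueError("no GPU assignments given")
    return {
        "nvLinkInterfaces": [
            {"deviceInstance": gpu, "nvLinkLogicalPartitionId": partition_id}
            for gpu, partition_id in sorted(desired.items())
        ]
    }
```

Failing locally gives the caller a clearer message than waiting for the API to reject the request, but the server remains the authority on what is allowed.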
If a user de-provisions an Instance, NICo will remove the Instance GPUs from the NVLink Partition.
A NICo user can call DELETE /v2/org/{org}/nico/nvlink-logical-partition/{nvLinkLogicalPartitionId} to delete an NVLink Logical Partition. This call will only succeed if there are no active Instances associated with the NVLink Logical Partition.
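Before issuing the delete, it can help to list which Instances still reference the NVLink Logical Partition. The sketch below works on already-fetched Instance records and makes no API calls. The "id" key and the function name are assumptions for illustration, and what NICo counts as an "active" Instance is decided by the service, so treat the result as a hint rather than a guarantee that the delete will succeed.

```python
def instances_blocking_delete(partition_id, instances):
    """Return IDs of Instances whose GPUs still reference partition_id.

    instances: iterable of dicts shaped like a trimmed Get Instance
    response; the "id" key is an assumption for illustration.
    """
    blocking = []
    for inst in instances:
        interfaces = inst.get("nvLinkInterfaces") or []
        if any(
            iface.get("nvLinkLogicalPartitionId") == partition_id
            for iface in interfaces
        ):
            blocking.append(inst["id"])
    return blocking
```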
A NICo user can call GET /v2/org/{org}/nico/instance/{instance-id} to retrieve information about an Instance. As part of the 200 response body, NICo will return an nvLinkInterfaces list that includes both the nvLinkLogicalPartitionId and nvLinkDomainId for each GPU in the Instance.
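The nvLinkInterfaces list in the response can be used to spot NVLink Logical Partitions whose GPUs ended up in more than one NVLink Domain. The sketch below is a minimal example operating on parsed response bodies. The two field names come from this page; the function names are hypothetical, and skipping entries with no nvLinkDomainId is an assumption about how a not-yet-placed GPU might appear.

```python
from collections import defaultdict


def domains_by_partition(instance):
    """Map each nvLinkLogicalPartitionId to the set of nvLinkDomainId values."""
    result = defaultdict(set)
    for iface in instance.get("nvLinkInterfaces") or []:
        domain = iface.get("nvLinkDomainId")
        if domain is not None:  # assumption: unset means not yet placed
            result[iface["nvLinkLogicalPartitionId"]].add(domain)
    return dict(result)


def split_partitions(instances):
    """Return Logical Partitions whose GPUs span more than one NVLink Domain."""
    merged = defaultdict(set)
    for inst in instances:
        for partition_id, domains in domains_by_partition(inst).items():
            merged[partition_id] |= domains
    return {p: d for p, d in merged.items() if len(d) > 1}
```

A non-empty result from split_partitions means some GPUs that share a Logical Partition ID cannot share memory with each other.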
It’s an optional default, not a constraint. VPCs can be created with or without a default NVLink Logical Partition.
It is optional on both POST .../vpc (Create VPC) and PATCH .../vpc/{vpcId} (Update VPC).
What setting it on a VPC actually does
It’s a convenience default for instance creation. When nvLinkLogicalPartitionId is set on the VPC, you don’t have to specify nvLinkInterfaces on POST .../instance (Create Instance) or POST .../instance/batch (Batch Create Instances) — the API will auto-populate the per-GPU NVLink Interfaces to reference that VPC’s NVLink Logical Partition.
That’s the entire effect. It does not reserve or lock the NVLink Logical Partition to the VPC.
No exclusivity between VPCs
We intentionally don’t restrict an NVLink Logical Partition to a single VPC. The same nvLinkLogicalPartitionId may be set on multiple VPCs. This is deliberate, to preserve flexibility in how you plan networking and NVLink partitioning.
You can also manage NVLink at the Instance level
If you want finer control, leave nvLinkLogicalPartitionId unset on the VPC and specify nvLinkInterfaces directly on Create Instance — each entry binds a specific deviceInstance (GPU) to an explicit nvLinkLogicalPartitionId, so different GPUs in the same instance (or across Instances in the same VPC) can operate in different NVLink Logical Partitions.
Summary
NICo runs a periodic reconciler against NMX-M and NMX-C to keep the actual
NVLink partition topology aligned with the desired state implied by tenant
instance configurations. The behaviour matters whenever an operator is
diagnosing latency between an API call and an instance becoming Ready.
Each reconciliation pass does the following:
The configs_synced.nvlink field is derived from these observations and is what gates the instance’s Ready state.
Cadence is set by nvlink_config.monitor_run_interval (default 60s).
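Because the reconciler runs on a fixed interval, a client waiting for an instance to become Ready should budget for at least one full interval. The sketch below polls a caller-supplied function until configs_synced.nvlink is true. Everything except the field name and the 60s default is an assumption: the nested-dict status shape, the three-interval budget, and the injected sleep and clock parameters (included so the loop can be tested without real waiting) are illustrative choices.

```python
import time


def wait_for_nvlink_sync(fetch_status, run_interval_s=60.0, max_intervals=3,
                         poll_every_s=5.0, sleep=time.sleep,
                         clock=time.monotonic):
    """Poll until configs_synced.nvlink is true or the time budget runs out.

    fetch_status: caller-supplied callable returning the instance status as
    a dict such as {"configs_synced": {"nvlink": True}} (assumed shape).
    Returns True if synced within max_intervals * run_interval_s seconds.
    """
    deadline = clock() + max_intervals * run_interval_s
    while True:
        status = fetch_status()
        if status.get("configs_synced", {}).get("nvlink") is True:
            return True
        if clock() >= deadline:
            return False
        sleep(poll_every_s)
```

A False result after several intervals is a cue to look at the reconciler metrics rather than to retry the API call.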
The reconciler exposes metrics under the
nico_nvlink_partition_monitor_* namespace. Useful ones:
When an instance is released (via ReleaseInstance):
When a Logical Partition is deleted, every underlying NVLink Physical Partition on each NMX-M / NMX-C endpoint backing it is also deleted. The deletion is rejected if any instance still references the Logical Partition.
When a host is force-deleted, the instance running on it is implicitly released and the above cleanup path runs. Operators do not need to detach NVLink configuration manually before force-deleting.
NMX-C is the gRPC control path for NVLink partition management and is the current default for new deployments. NMX-M remains supported and is covered in the next section; the two are not mutually exclusive and a single site may use both.
The TOML toggles live alongside the NMX-M ones under [nvlink_config]:
Unlike NMX-M, where a single endpoint URL is set in TOML, NMX-C endpoints
are per-chassis and stored in the NICo database. Register them with
nico-admin-cli, keyed by the chassis serial:
update and delete subcommands follow the same pattern. The reconciler
picks up new endpoints on the next iteration; no restart is required.
The TLS material in TOML applies uniformly to every NMX-C endpoint NICo talks to. Per-endpoint credential overrides are not currently supported; deploy a uniform trust posture across the site’s NMX-C control plane.
This section describes how to enable NVLink support via the NMX-M platform.
Enable NVLink Partitioning in nico-core config. Add or update the configmap nico-api-site-config-files consumed by nico-core:
Restart nico-core
Configure the NMX-M credentials. Store the NMX-M username and password in vault through nico-admin-cli:
Populate the NVLink GPU mapping. After enabling NVLink in the site config, for already discovered machines, populate the machine-to-NMX-M GPU mapping. Partitioning will not work until this step is complete.
Validate the NVLink configuration for NMX-M:
nico_nvlink_partition_monitor_nmxm_connect_error_count is 0.
After an instance has been created and the reconciler has had at least one opportunity to run, an operator can confirm correct placement with the following checks. There is no single all-in-one health command; the steps below should be repeatable as a checklist.
Reconciler is running.
nico_nvlink_partition_monitor_iteration_latency is being recorded
and both nico_nvlink_partition_monitor_nmxc_connect_error_count
and nico_nvlink_partition_monitor_nmxm_connect_error_count are
flat.
Logical-partition count matches expectation.
nico_nvlink_partition_monitor_num_logical_partitions reflects
the partitions a site planner expects to exist. A sudden change is
worth correlating with recent tenant API activity.
Per-instance configuration has converged. The instance’s
InstanceStatus reports configs_synced.nvlink = true and the
nvLinkInterfaces list on the instance shows the expected
nvLinkLogicalPartitionId and nvLinkDomainId for each GPU.
Per-machine GPU placement.
The machine nvlink-info output reports the machine’s NVLink GPU status observations, including the Domain each GPU is currently assigned to. Use this to confirm that two instances expected to share an NVLink Logical Partition have actually landed in the same NVLink Domain; instances in different Domains cannot share GPU memory, even if they have the same Logical Partition ID.
Cleanup after release. After releasing an instance, the same
machine nvlink-info output should show an empty Domain assignment
on the affected GPUs within one or two reconcile intervals. Failure
to clear indicates the reconciler could not remove the GPU from its
NMX-M / NMX-C partition; investigate the corresponding connect-error
and op-latency metrics.
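The "connect-error counters are flat" check from the list above can be automated by comparing two metric scrapes. The sketch below assumes the metrics are available in the Prometheus text exposition format, which this page does not state. The two metric names are taken from the checklist; the function names are hypothetical. Labels are dropped and samples sharing a name are summed, which is adequate for counters but not meaningful for histogram series such as the latency metrics.

```python
PREFIX = "nico_nvlink_partition_monitor_"

CONNECT_ERROR_METRICS = (
    PREFIX + "nmxc_connect_error_count",
    PREFIX + "nmxm_connect_error_count",
)


def parse_metrics(text, prefix=PREFIX):
    """Parse Prometheus-style text into {metric name: summed sample value}."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line and "}" in line:
            # name{labels} value [timestamp]
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            # name value [timestamp]
            parts = line.split(None, 1)
            name = parts[0]
            rest = parts[1] if len(parts) > 1 else ""
        if not name.startswith(prefix):
            continue
        fields = rest.split()
        if not fields:
            continue
        values[name] = values.get(name, 0.0) + float(fields[0])
    return values


def rising_connect_errors(before, after):
    """Return connect-error metrics whose value increased between scrapes."""
    return [
        name for name in CONNECT_ERROR_METRICS
        if after.get(name, 0.0) > before.get(name, 0.0)
    ]
```

Taking the two scrapes at least one reconcile interval apart gives the reconciler a chance to attempt a connection between them.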