Security Restrictions and Cluster Limitations#

Security Restrictions for Kubernetes#

DGX Cloud Create is designed to ensure that customer workloads adhere to security best practices, which benefits both customers and the platform itself. A set of policies implemented in DGX Cloud Create blocks customer workloads that attempt to use the following Kubernetes features:

  • Access to container runtime Unix socket

  • Access to host networking

  • Access to sysctl

  • ClusterRoleBinding modification

  • Custom Security-Enhanced Linux (SELinux) security options

  • HostPath mounts

  • Privileged containers

  • Privilege escalation

These restrictions also assume that any interaction with Kubernetes resources or APIs is performed within the scope of the permissions granted to each user, as described in Kubernetes Usage for Researchers and Advanced Kubernetes Usage for Admins. Installation of additional components in system namespaces is not permitted.
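
As an illustration of how the pod-level restrictions above map to Kubernetes manifest fields, the following is a minimal sketch of a local pre-check that a researcher could run against a pod spec before submitting it. It is not the policy implementation used by DGX Cloud Create, and the function name `find_blocked_features` is hypothetical. It assumes the pod spec has already been loaded into a Python dictionary and only inspects standard Kubernetes fields (`hostNetwork`, `securityContext`, `volumes[].hostPath`). ClusterRoleBinding modification is an API action rather than a pod field, so it is not covered here.

```python
from typing import Any


def find_blocked_features(pod_spec: dict[str, Any]) -> list[str]:
    """Return reasons a pod spec would likely be rejected (illustrative only)."""
    problems: list[str] = []

    # Host networking is requested at the pod level.
    if pod_spec.get("hostNetwork"):
        problems.append("hostNetwork is enabled")

    # sysctls and SELinux options can be set in the pod-level securityContext.
    pod_sc = pod_spec.get("securityContext") or {}
    if pod_sc.get("sysctls"):
        problems.append("pod sets sysctls")
    if pod_sc.get("seLinuxOptions"):
        problems.append("pod sets custom SELinux options")

    # HostPath mounts are also the usual way a container reaches the
    # container runtime Unix socket, so flagging them covers both items.
    for volume in pod_spec.get("volumes") or []:
        if "hostPath" in volume:
            problems.append(f"volume {volume.get('name')!r} uses hostPath")

    containers = (pod_spec.get("initContainers") or []) + (
        pod_spec.get("containers") or []
    )
    for container in containers:
        name = container.get("name")
        sc = container.get("securityContext") or {}
        if sc.get("privileged"):
            problems.append(f"container {name!r} is privileged")
        # Assumption: only an explicit `true` is flagged here. The platform
        # policy may be stricter and require an explicit `false`.
        if sc.get("allowPrivilegeEscalation"):
            problems.append(f"container {name!r} allows privilege escalation")
        if sc.get("seLinuxOptions"):
            problems.append(f"container {name!r} sets custom SELinux options")

    return problems


if __name__ == "__main__":
    example_spec = {
        "hostNetwork": True,
        "containers": [
            {"name": "trainer", "securityContext": {"privileged": True}},
        ],
    }
    for problem in find_blocked_features(example_spec):
        print(problem)
```

An empty list means none of the checked fields were found. It does not guarantee that the workload will be admitted, because the platform policies are authoritative.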

Cluster Limitations#

NVIDIA DGX Cloud Create is a managed service, so the NVIDIA Run:ai offering on DGX Cloud Create is restricted and some NVIDIA Run:ai functionality is disabled. If you have questions about your NVIDIA DGX Cloud Create cluster, reach out to your Technical Account Manager (TAM).

The following restrictions are in place on your DGX Cloud Create cluster:

  • Security controls and policies restricting access to Kubernetes cluster resources and capabilities as described in Security Restrictions for Kubernetes.

    Note

    As a result of these security restrictions, certain components, Helm charts, or blueprints included as a part of AI Enterprise may not be supported or compatible with DGX Cloud Create at this time. For more information and guidance, contact your TAM.

  • NVIDIA Nsight Systems (nsys) profiling capabilities are limited due to the security restrictions described in Security Restrictions for Kubernetes. The CAP_SYS_ADMIN capability is not allowed, and perf_event_paranoid is limited to a value of 4 as described here. As a result, DGX Cloud Create cannot currently meet the requirements described in the Nsight Systems Installation Guide.

  • Limited capabilities in NVIDIA Run:ai user roles as described in Customer Admin Roles and NVIDIA Admin Roles.

    Note

    As a result of these limited user roles, customer admins and users are not permitted to create or edit clusters, nodes, and node pools, to edit NVIDIA Run:ai control plane configurations or settings, or to exercise any additional permissions granted by NVIDIA Run:ai’s System Administrator role. For more information and guidance, contact your TAM.

  • NVIDIA Blueprints are not supported at this time.

  • Multi-node inference is not supported at this time.

  • GPU splitting or allocation of partial GPUs is not available.

  • The deprecated ‘jobs’ workload type is not supported and has not been tested.

  • The use of multiple GPU types in a single cluster is not supported.

  • The DGX Cloud Create cluster is neither SOC 2 Type 2 nor HIPAA compliant.

  • For GCP-based DGX Cloud Create clusters, each PVC that uses the Zonal storage class described in PVC is limited to a maximum of 100 TB.

  • The Block volume mode for PVCs is unsupported.

  • PVC Data Sources created at the cluster or department level do not currently replicate data across projects or namespaces. Each project or namespace is provisioned with its own PVC, backed by a different underlying PV, so the data in one PVC is not replicated to the others.
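
The PVC-related limitations above can be checked against a manifest before it is submitted. The following is a minimal sketch, assuming the PVC manifest has been loaded into a Python dictionary. The function names and the `zonal_storage_class` parameter are hypothetical; the actual name of the Zonal storage class must be taken from your cluster. The sketch also assumes that "100 TB" means decimal terabytes, and that the size limit only matters on GCP-based clusters. The quantity parser handles plain integers and decimals with the common Kubernetes suffixes only.

```python
import re
from decimal import Decimal
from typing import Any

# Decimal and binary suffixes used by Kubernetes resource quantities.
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
}

# Assumption: "100 TB" is interpreted as 100 * 10^12 bytes.
MAX_ZONAL_BYTES = 100 * 10**12


def quantity_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '500Gi' or '2T' to bytes (subset of the syntax)."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity.strip())
    if not match or match.group(2) not in _SUFFIXES:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(Decimal(match.group(1)) * _SUFFIXES[match.group(2)])


def check_pvc(pvc: dict[str, Any], zonal_storage_class: str) -> list[str]:
    """Return the documented limitations that a PVC manifest appears to hit."""
    spec = pvc.get("spec") or {}
    problems: list[str] = []

    # The Block volume mode is unsupported; Filesystem is the Kubernetes default.
    if spec.get("volumeMode") == "Block":
        problems.append("volumeMode Block is unsupported")

    # The size limit applies only to the Zonal storage class.
    if spec.get("storageClassName") == zonal_storage_class:
        requests = (spec.get("resources") or {}).get("requests") or {}
        requested = requests.get("storage")
        if requested is not None and quantity_to_bytes(requested) > MAX_ZONAL_BYTES:
            problems.append(f"requested {requested} exceeds the 100 TB limit")

    return problems
```

For example, calling `check_pvc(manifest, "zonal-example")` with a manifest that requests `100Ti` on that placeholder class reports the size limit, because 100 TiB is larger than 100 TB under the decimal interpretation assumed here.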