Application Scaling Guide

Scaling the Deployment

In this guide, we will walk you through scaling the reference Metropolis workflows. We will cover horizontal scaling, which involves adding more streams to the deployment, and vertical scaling, which involves increasing computational resources by adding more powerful GPU nodes to the deployment.

RTLS Horizontal Scaling

  • To add more streams for the RTLS analytics streaming pipeline within a Kubernetes (k8s) cluster, ensure that the system has spare GPUs available to scale horizontally for handling the increased workload (a quick way to check GPU availability is shown after this list).

  • Once GPU availability is confirmed, adjust the Helm configuration files to accommodate the additional streams for the DeepStream pipeline. The necessary configuration file tuning is explained below.

  • Another approach to scaling the analytics pipeline is to add more GPU-equipped nodes to the Kubernetes cluster, which allows the system to handle more streams efficiently.
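
One quick way to confirm spare GPU availability is shown below. These are generic Kubernetes commands that assume the NVIDIA device plugin is installed; they are not Metropolis-specific tooling.

# Per-node GPU capacity/allocatable counts reported by the NVIDIA device plugin.
kubectl describe nodes | grep -i "nvidia.com/gpu"
# Per-node view of how many GPUs are already requested by running pods.
kubectl describe nodes | grep -A 8 "Allocated resources"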

- WDM_MAX_REPLICAS - Set this parameter to the total number of spare GPUs available in the cluster minus 1 (one GPU is required for the media service, VST).
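
As a hedged illustration only, the override for a cluster with 5 spare GPUs could look like the following. The key path wdmAgent.env is hypothetical; the actual layout comes from wdm-deepstream-mtmc-values.yaml.

# Hypothetical Helm override excerpt; confirm the real key path in wdm-deepstream-mtmc-values.yaml.
# 5 spare GPUs in the cluster, 1 reserved for the VST media service -> WDM_MAX_REPLICAS=4.
wdmAgent:
  env:
    WDM_MAX_REPLICAS: "4"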

Note

  • Config files can be downloaded by following the steps in the application-helm-configs section. The DeepStream Helm config override values need to be updated; see the example file wdm-deepstream-mtmc-values.yaml.

  • After updating the config file, re-install or upgrade the deployment with the new Helm config file (an example upgrade command is shown after this note). The helm install command can be found here.

  • Live deployments can be updated using the kubectl set env command to adjust the WDM_MAX_REPLICAS parameter, for example: kubectl set env deployment/metropolis-wdm-agent-deployment -n default WDM_MAX_REPLICAS=<new-value>. The default value is 3.
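
As referenced in the note above, upgrading the release with the updated override file generally looks like the following; the release and chart names are placeholders, so substitute the ones used in your original helm install command.

helm upgrade <release-name> <chart-reference> -n default -f wdm-deepstream-mtmc-values.yaml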

CSP Horizontal Scaling

  • Horizontal scaling in CSP (Cloud Service Provider) environments within a Kubernetes (k8s) cluster involves adding more GPUs to existing nodes or deploying additional GPU-equipped nodes across the cluster. This increases the total GPU resources available for handling workloads, enabling more efficient parallel processing and workload distribution.

  • To scale horizontally within the k8s cluster, deploy additional GPU instances or configure existing nodes to utilize multiple GPUs. Ensure that Kubernetes is configured to schedule workloads appropriately across the expanded GPU resources, using node labels, taints, and resource requests.
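
A generic sketch of the scheduling controls mentioned above (node labels, tolerations, and GPU resource requests) follows; the label, taint, and image names are placeholders rather than values taken from the Metropolis charts.

apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload-example
spec:
  nodeSelector:
    nvidia.com/gpu.present: "true"   # placeholder label; use the label applied to your GPU nodes
  tolerations:
    - key: nvidia.com/gpu            # tolerate a GPU-only taint, if one is applied to the nodes
      operator: Exists
      effect: NoSchedule
  containers:
    - name: app
      image: <your-image>
      resources:
        limits:
          nvidia.com/gpu: 1          # request one GPU from the NVIDIA device plugin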

Configuration Changes for Additional Nodes
  • CSP environments are set up using the deploy script. Detailed instructions for deploying the CSP environment can be found here.

  • The deploy script utilizes a configuration template file to set up the infrastructure based on predefined settings. A sample configuration template can be viewed here.

  • To add multiple nodes, include additional worker keys under the nodes section in the configuration file. Each key represents a new node that will be added to the Kubernetes cluster based on the specified instance type.

nodes:
  worker:
    type: 'g5.48xlarge' ### Example of adding an additional A10-GPU-based EC2 instance.
    labels: {}
    taints: []

Note

  • Add multiple keys as required for provisioning a multi-node K8s cluster, for example worker-1, worker-2, … worker-n. Each key needs a base config, with the instance type being mandatory, to avoid provisioning issues with the Nv-One-Click deploy scripts.
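
For example, a two-worker configuration could look like the following; the instance types are illustrative.

nodes:
  worker-1:
    type: 'g5.12xlarge' ### Illustrative instance type; set per worker as required.
    labels: {}
    taints: []
  worker-2:
    type: 'g5.12xlarge'
    labels: {}
    taints: []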

Configuration Changes for Additional GPU(s) on an Existing Node
  • Streams can be scaled onto additional spare GPUs using the WDM config parameter below. The available WDM config parameters can be found here.

- WDM_MAX_REPLICAS - Set this parameter to the total number of spare GPUs available in the cluster minus 1 (one GPU is required for the media service, VST).
  • The number of DeepStream streams per GPU can be adjusted for the GPU type using the config below.

- WDM_WL_THRESHOLD - This parameter needs to be adjusted based on the GPU type available in the cluster.
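
Both parameters can be adjusted on a live deployment in the same way as in the RTLS section above; the values below are illustrative, so consult the GPU matrix referenced next for the stream count your GPU type supports.

kubectl set env deployment/metropolis-wdm-agent-deployment -n default \
  WDM_MAX_REPLICAS=4 \
  WDM_WL_THRESHOLD=<streams-per-GPU>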

A matrix comparing different GPUs can be found here. This matrix is useful for determining the optimal stream count to achieve better accuracy while maximizing GPU utilization during stream scaling.

On-prem Horizontal Scaling

  • In on-premises environments, horizontal scaling within a Kubernetes (k8s) cluster involves adding more GPUs to existing nodes or deploying additional GPU-equipped nodes within the cluster. This increases the total GPU resources available for handling workloads within the cluster.

  • Ensure that the new GPUs or nodes are properly integrated into the Kubernetes cluster. Update the k8s configurations, such as node labels and taints, to ensure that workloads are effectively scheduled across the added GPU resources.
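
Typical commands for integrating a newly added GPU node are shown below; the node name, label, and taint are examples, not values required by the Metropolis charts.

# Label the new node so GPU workloads can target it (example label).
kubectl label node <new-node-name> nvidia.com/gpu.present=true
# Optionally taint the node to reserve it for GPU workloads (example taint).
kubectl taint node <new-node-name> nvidia.com/gpu=present:NoSchedule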

Configuration Changes for Adding Additional Nodes
  • Details about creating the on-prem Kubernetes cluster can be found in this section.

  • To add additional nodes, users need to update the hosts file with the new nodes to be included in the Metropolis application Kubernetes cluster. For an example, refer to this guide.

  • Refer to the example below for updating the hosts file.

nano hosts

[master]
<master-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'
[nodes]
<worker-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'

Note

  • Multiple masters/nodes can be added to the cluster. We recommend adding the additional entries under the [nodes] list when creating a cluster with multiple nodes.
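
For example, a hosts file with two worker nodes follows the same pattern as above.

[master]
<master-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'
[nodes]
<worker-1-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'
<worker-2-IP> ansible_ssh_user=nvidia ansible_ssh_pass=nvidiapass ansible_sudo_pass=nvidiapass ansible_ssh_common_args='-o StrictHostKeyChecking=no'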

RTLS Vertical Scaling

  • Vertical scaling within a Kubernetes (k8s) cluster involves upgrading to larger GPUs to enhance processing power for the RTLS analytics pipeline.

  • To scale vertically, replace existing GPUs on the nodes with larger GPU models. This process may require adjustments to node configurations and Kubernetes settings to ensure compatibility with the upgraded GPUs. The related configuration changes are explained in detail below.

  • Validate that the Kubernetes cluster and nodes can support the larger GPUs, and update any related Helm charts or deployment settings to fully utilize the enhanced GPU capabilities.
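
Once the larger GPUs are in place, the per-GPU stream count is typically raised through the same WDM settings described in the CSP horizontal-scaling section above; for example, on a live deployment (the value is illustrative, so consult the GPU comparison matrix):

kubectl set env deployment/metropolis-wdm-agent-deployment -n default WDM_WL_THRESHOLD=<higher-streams-per-GPU>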

CSP Vertical Scaling

  • Vertical scaling in CSP environments within a Kubernetes (k8s) cluster involves upgrading to larger, more powerful GPUs within the same instance or node (e.g., using instance types with A100/H100 GPUs from your preferred CSP). This approach enhances the processing capability of individual instances by utilizing GPUs with greater performance specifications.

  • To scale vertically within the k8s cluster, replace existing GPUs with larger models or adjust node configurations to utilize these more powerful GPUs, ensuring that the instance or node can handle the increased performance requirements.

  • Users can follow the same instructions for scaling streams after upgrading to more powerful GPUs, allowing the system to accommodate additional streams effectively.

Configuration Changes

<TBD if really required> same instructions

On-prem Vertical Scaling

  • In on-premises environments within a Kubernetes (k8s) cluster, vertical scaling involves upgrading existing GPUs to larger, more powerful models (e.g., H100/A100 GPUs). This process enhances the performance of individual nodes by equipping them with GPUs that offer higher processing power and capability.

  • When upgrading GPUs, ensure that the new hardware is compatible with the existing system and that any necessary adjustments are made to support the increased performance. Verify that the infrastructure can accommodate the new GPUs’ power and cooling requirements.

  • Users can follow the same instructions for scaling streams after the GPU upgrade, allowing the system to accommodate additional streams by leveraging the increased GPU compute capacity.

Configuration Changes

<TBD if really required> same instructions

SDG Scaling

  • To enhance IsaacSim simulation performance, we recommend utilizing more compute resources, such as additional GPUs or multi-GPU nodes. This configuration will generate SDG output faster than using a single GPU.

  • Below is a matrix detailing SDG performance results based on different GPU types, such as the L40S.

- GPU Type: L40S
- Performance Metrics: [Include specific metrics or performance data here]
- Configuration Details: [Specify configurations or setups for each GPU type]
- <TBD - Krishna to provide links if exists>