Amazon Web Services Setup Guide
Overview
In this guide, we will go through the steps needed to:
Understand the architecture of the infrastructure we will set up to host Metropolis applications on Amazon Web Services (AWS).
Procure the prerequisite access and information needed to use the automated deployment scripts.
Create one or more deployments of Metropolis applications using the automated deployment scripts.
Verify the deployment.
Tear down the created infrastructure when it is no longer required.
Infrastructure Layout
Setting up a Metropolis application on AWS requires creating several AWS resources, such as EC2 instances, Security Groups, and an Application Load Balancer. While there are several patterns that can be followed to bring up the infrastructure needed for Metropolis, the layout below shows the approach we will take.

In addition to bringing up AWS resources, we also have to download the Metropolis application and its dependency artifacts, then configure and install them.
The automation scripts simplify this by abstracting away the complexity, allowing you to work primarily with two files: deploy-template.yml and secrets.sh.
deploy-template.yml is an abstraction of the infrastructure specification needed to bring up Metropolis applications. At a high level, we define the base infrastructure specification (e.g. VPC), then add the COTURN infrastructure (e.g. EC2 instance) and the application infrastructure specification (e.g. GPU instance). The COTURN and application infrastructure are established on the base infrastructure specified in this deploy-template.yml.
To manage multiple environments with a single deploy-template.yml, you can add multiple base infrastructure elements, along with the respective COTURN and application infrastructure elements for each base. However, each COTURN and application infrastructure element must be mapped to its base using the correct reference, as discussed later in this documentation and sketched below.
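For instance, a two-environment layout might look like the following sketch. The environment names here are hypothetical and the spec bodies are elided; the full schema appears later on this page.

```yaml
# Illustrative sketch only -- 'dev'/'prod' names are hypothetical, and the
# spec bodies (shown as {}) are elided; see the full template below.
base_infra:
  dev:
    spec: {}                   # base environment #1
  prod:
    spec: {}                   # base environment #2
coturn_infra:
  dev-coturn:
    base_ref: 'dev'            # must match a key under base_infra
    spec: {}
app_infra:
  dev-app:
    base_ref: 'dev'            # maps this app env to the 'dev' base
    coturn_ref: 'dev-coturn'   # must reference a COTURN env with the same base_ref
    spec: {}
  prod-app:
    base_ref: 'prod'           # this app env runs without a COTURN reference
    spec: {}
```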
secrets.sh is the mechanism for the user to provide secrets, such as the AWS secret access key that allows the automation to interact with your AWS account.
Note
We will skip some optional features, such as Auto Scaling, in this reference setup.
Important
Many of the resources in this setup may not fall under the AWS Free Tier. Check the AWS billing reference pages to understand the cost implications.
Prerequisites
This setup guide assumes you have the following conditions met:
AWS access keys for an IAM user
On your AWS account, procure an access key ID and secret access key for programmatic access to your AWS resources.
Prefer a non-root IAM user with administrator access.
Refer to the AWS documentation here. A quick way to verify the keys is shown below.
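Once the keys are procured, you can sanity-check them from any machine with the AWS CLI installed; the placeholders below are yours to fill in:

```bash
# Confirm the access keys work and show which IAM identity they map to
export AWS_ACCESS_KEY_ID='<your-access-key-id>'
export AWS_SECRET_ACCESS_KEY='<your-secret-access-key>'
aws sts get-caller-identity
```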
S3 bucket for backend state
These scripts use an S3 bucket to store references to the resources they spin up.
Create an S3 bucket to store the deployment state.
Ensure the bucket is not publicly accessible, but only accessible to your account (for example, via the keys procured in the previous step).
Refer to the AWS documentation here. A CLI alternative is sketched below.
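If you prefer the AWS CLI over the console, the bucket can be created and locked down along these lines. The bucket name and region are placeholders; pick your own:

```bash
# Create the state bucket (LocationConstraint is required outside us-east-1)
aws s3api create-bucket \
    --bucket <your-state-bucket-name> \
    --region us-west-2 \
    --create-bucket-configuration LocationConstraint=us-west-2

# Block all public access so only your account (via the keys above) can reach it
aws s3api put-public-access-block \
    --bucket <your-state-bucket-name> \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
```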
DynamoDB table for backend state
These scripts use a DynamoDB table to prevent concurrent access to the same deployment while it is being spun up.
Create a DynamoDB table to manage access to the deployment state.
Define the partition key as LockID, with type String.
A sort key does not need to be defined.
Refer to the AWS documentation here. A CLI alternative is sketched below.
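As a CLI alternative to the console, the lock table can be created as follows (the table name and region are placeholders):

```bash
# Partition key LockID of type String (S); no sort key is defined
aws dynamodb create-table \
    --table-name <your-lock-table-name> \
    --attribute-definitions AttributeName=LockID,AttributeType=S \
    --key-schema AttributeName=LockID,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST \
    --region us-west-2
```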
Access to an Ubuntu 20.04 based machine, with a user that has sudo privileges, to run the automated deployment scripts.
Set up an NVIDIA GPU Cloud (NGC) API key by following the instructions here.
An SSH key pair to access the instances we are going to set up.
You may use an existing SSH key pair for this access or create a new pair, as shown below.
Reference documentation for creating a public/private SSH key pair is available here.
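For example, a fresh key pair can be generated with ssh-keygen. The file name is an example; the automation needs the contents of the .pub file, while the private key is used later to SSH into the instances:

```bash
# Generate a new 4096-bit RSA key pair with no passphrase
ssh-keygen -t rsa -b 4096 -f ~/.ssh/metropolis-aws -N ''

# The public key content goes into _ssh_public_key in secrets.sh
cat ~/.ssh/metropolis-aws.pub
```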
Note
The prerequisites provisioned here can be reused across multiple projects and can be considered a one-time setup for most scenarios, unless their parameters are not acceptable for a particular deployment.
Prepare deployment config
Download & extract deployment artifact
Setting up NGC CLI
Set up the NGC CLI tool on the Ubuntu 20.04 machine by following the instructions on this page.
Select the 'AMD64 Linux Install' tab for Ubuntu installation.
During the ngc config set command, select 'nfgnkvuikvjm' as the org and 'mdx-v2-0' as the team.
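The interaction looks roughly like the following; the exact prompt wording varies across NGC CLI versions, so treat this as an illustration:

```text
$ ngc config set
Enter API key: <paste your NGC API key>
Enter CLI output format type [ascii]: ascii
Enter org: nfgnkvuikvjm
Enter team: mdx-v2-0
Enter ace [no-ace]: no-ace
```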
Download Oneclick Setup Scripts
Using the commands below, download and extract the contents of the deployment artifact, then navigate to the deployment directory:

```bash
# download the artifact
$ ngc registry resource download-version "nfgnkvuikvjm/mdx-v2-0/metropolis-aws-one-click-script:0.0.5"
$ cd metropolis-aws-one-click-script_v0.0.5/

# verify necessary files required for installing infra on the AWS CSP
$ ls
README.md  deploy-spec  examples  modules  working-deploy-template.yml
deploy-template.yml  mtmc-app-deploy  secrets.sh
$
```
Download & extract application helm values
Refer to this section to download and extract the contents of the reference application Helm values & sample data.
Prepare secrets
The file secrets.sh can be set up as below so that sensitive data does not have to be committed and pushed as part of deploy-template.yml.
secrets.sh
```bash
#!/bin/bash

# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.
#

# _aws_access_key_id -> AWS access key id to create resources
export _aws_access_key_id='<replace_content_between_quotes_with_your_value>'

# _aws_secret_access_key -> AWS secret access key to create resources
export _aws_secret_access_key='<replace_content_between_quotes_with_your_value>'

# _ssh_public_key -> Your public ssh key's content
export _ssh_public_key='<replace_content_between_quotes_with_your_value>'

# _ngc_api_key -> Your ngc api key value
export _ngc_api_key='<replace_content_between_quotes_with_your_value>'

# _turnserver_password -> Password for turn server
export _turnserver_password='<replace_content_between_quotes_with_your_value>'

# _mlops_aws_access_key_id -> AWS access key id to access mlops bucket
```
Important
Be careful about whether or not to commit this file to your version control system, as it contains secrets. One precaution is shown below.
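If the deployment directory is under git, one simple hedge is to ignore the file and tighten its permissions:

```bash
# Keep secrets.sh out of version control and readable only by you
echo 'secrets.sh' >> .gitignore
chmod 600 secrets.sh
```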
Prepare deploy template
- Deploy Template Schema & Configuration
Deploy template
deploy-template.yml is used to compile the infrastructure needed to set up your project/environment(s). It has separate sections to capture details for different needs, such as the provider config, coturn_infra, etc. As shown in the layout diagram below, you can choose to create one or more environments and infrastructures under a single project name.
Override the content of the deploy-template.yml file with your environment/application-specific values. This drives the configuration of the infrastructure and of the application being installed.
deploy-template.yml
```yaml
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: LicenseRef-NvidiaProprietary
#
# NVIDIA CORPORATION, its affiliates and licensors retain all intellectual
# property and proprietary rights in and to this material, related
# documentation and any modifications thereto. Any use, reproduction,
# disclosure or distribution of this material and related documentation
# without an express license agreement from NVIDIA CORPORATION or
# its affiliates is strictly prohibited.

# NOTE: Refer to examples for various configuration options

project_name: '<replace-with-unique-name-to-identify-your-project>'
description: '<add-a-brief-description-about-this-project>'
template_version: '0.1.1'
csp: 'aws'

backend:
  encrypt: true
  dynamodb_table: '<replace-with-pre-created-deployment-state-dynamo-db-table-name>'
  bucket: '<replace-with-pre-created-deployment-state-bucket-name>'
  region: '<replace-with-aws-region-where-pre-created-deployment-state-bucket-exists>'
  access_key: '${_aws_access_key_id}'
  secret_key: '${_aws_secret_access_key}'

provider:
  region: '<replace-with-aws-region-to-create-resources-in>'
  access_key: '${_aws_access_key_id}'
  secret_key: '${_aws_secret_access_key}'

base_infra:
  # NOTE: Repeat below section for as many base setups as necessary
  <your-base-env-name>:
    spec:
      vpc_cidr_block: '<replace-with-an-available-cidr-range>'
      ssh_public_key: '${_ssh_public_key}'
      dev_access_ipv4_cidr_blocks:
        - '<replace-with-list-of-dev-ip-cidrs>'
      user_access_ipv4_cidr_blocks:
        - '<replace-with-list-of-user-ip-cidrs>'
      base_domain: '<replace-with-the-dns-hosted-zone-under-which-apps-will-be-registered>'

coturn_infra: # set as {} in case no COTURN environment is needed
  # NOTE: Repeat below section for as many COTURN environments as necessary
  <your-coturn-env-name>:
    base_ref: '<replace-with-your-base-env-name>' # NOTE: should match the name of a base env defined in the above section
    spec:
      turnserver_realm: '<replace-with-a-realm-to-use-for-the-turnserver>'
      turnserver_username: '<replace-with-a-username-to-use-for-the-turnserver>'
      turnserver_password: '${_turnserver_password}' # NOTE: value of _turnserver_password assumed to be provided in secrets.sh

app_infra:
  # NOTE: Repeat below section for as many app environments as necessary
  <your-application-env-name>:
    base_ref: '<replace-with-your-base-env-name>' # NOTE: should match the name of a base env defined in the above section
    # NOTE: Uncomment below line in case app environment should use one of the setup COTURN environments
    #coturn_ref: '<replace-with-your-coturn-env-name>' # NOTE: should match the name of a COTURN env with same base ref defined in the above section
    spec:
      ngc_api_key: '${_ngc_api_key}' # NOTE: value of _ngc_api_key assumed to be provided in secrets.sh
      # NOTE: Uncomment below section in case use_twilio is true
      # NOTE: Uncomment any of the below lines based on the need to override
      # --- OPTIONAL CONFIGURATION START ---
      app_instance_type: 'g4dn.12xlarge'
      app_instance_data_disk_size_gb: 1024
      foundational_chart:
        org: "nfgnkvuikvjm"
        team: "mdx-v2-0"
        name: "mdx-foundation-sys-svcs"
        version: "v1.3"
      app_chart:
        org: "nfgnkvuikvjm"
        team: "mdx-v2-0"
        name: "mdx-mtmc-app"
        version: "1.0.36"
      app_override_values_file: <abs_path>/mtmc_app_values.yaml
      nvstreamer_app_chart:
        org: "rxczgrvsg8nx"
        team: "vst-1-0"
        name: "nvstreamer"
        version: "0.2.23"
      nvstreamer_app_override_values_file: <abs_path>/nvstreamer_app_values.yaml
      vst_app_chart:
        org: "rxczgrvsg8nx"
        team: "vst-1-0"
        name: "vst"
        version: "1.0.24"
      vst_app_override_values_file: <abs_path>/vst_app_values.yaml
      ds_app_chart:
        org: "nfgnkvuikvjm"
        team: "mdx-v2-0"
        name: "mdx-wdm-ds-app"
        version: "0.0.32"
      wdm_ds_app_override_values_file: <abs_path>/wdm_ds_app_values.yaml
      ### NOTE: This version of the AWS one-click scripts does not support spinning up additional LBs for node-ports.
      # ---- OPTIONAL CONFIGURATION END ----
```
Each entry of this YAML file is explained in the table below:
| Deploy Template Parameter name | Type | Optional | Description |
|---|---|---|---|
| project_name | string | | A unique name to identify the project. This is important for tearing down resources later. |
| description | string | | A brief description of the project. |
| template_version | string | | Version of the deploy template; 0.1.1 for this release. |
| backend | map | | Backend configuration. |
| backend > encrypt | bool | | Whether to encrypt the state while stored in the S3 bucket. |
| backend > dynamodb_table | string | | Name of the AWS DynamoDB table used to manage concurrent access to the state. |
| backend > bucket | string | | Name of the AWS S3 bucket in which the state of the provisioned resources is stored. |
| backend > region | string | | AWS region where the state S3 bucket and DynamoDB table were created. |
| backend > access_key | string | | AWS access key ID used to access the backend bucket and table. Prefer to provide via a variable in secrets.sh. |
| backend > secret_key | string | | AWS secret access key used to access the backend bucket and table. Prefer to provide via a variable in secrets.sh. |
| provider | map | | Provider configuration. |
| provider > region | string | | AWS region where the resources of the application will be deployed. |
| provider > access_key | string | | AWS access key ID used to provision resources. Prefer to provide via a variable in secrets.sh. |
| provider > secret_key | string | | AWS secret access key used to provision resources. Prefer to provide via a variable in secrets.sh. |
| base_infra | map | | Base configuration for the app. |
| base_infra > KEY | map | | An instance of the base configuration. There can be 1 or more instances. |
| base_infra > KEY > spec | map | | Configuration specification of this base instance. |
| base_infra > KEY > spec > vpc_cidr_block | string | | Private CIDR range in which the base, COTURN, and app resources will be created. |
| base_infra > KEY > spec > ssh_public_key | string | | Content of the public key of the SSH key pair used for instance access. Prefer to provide via a variable in secrets.sh. |
| base_infra > KEY > spec > dev_access_ipv4_cidr_blocks | array | | CIDR ranges from which SSH access should be allowed. |
| base_infra > KEY > spec > user_access_ipv4_cidr_blocks | array | | CIDR ranges from which the application UI and API will be allowed access. |
| coturn_infra | map | | COTURN instances used in the app configuration. |
| coturn_infra > KEY | map | | An instance of the COTURN configuration. There can be 0 or more instances. |
| coturn_infra > KEY > base_ref | string | | The key name of the base instance that should be used to set up this COTURN instance. |
| coturn_infra > KEY > spec > turnserver_realm | string | | Realm name used during COTURN setup. |
| coturn_infra > KEY > spec > turnserver_username | string | | Username used to connect to the COTURN server. |
| coturn_infra > KEY > spec > turnserver_password | string | | Password used to connect to the COTURN server. Prefer to provide via a variable in secrets.sh. |
| app_infra | map | | Application configuration. |
| app_infra > KEY | map | | An instance of the application configuration. There can be 1 or more instances. |
| app_infra > KEY > base_ref | string | | The key name of the base instance that should be used to set up this app instance. |
| app_infra > KEY > coturn_ref | string | yes | The key name of the COTURN instance that should be used to set up this app instance. |
| app_infra > KEY > spec > ngc_api_key | string | | NGC API key with access to the deployment artifacts. Prefer to provide via a variable in secrets.sh. |
| app_infra > KEY > spec > gpu_driver_version | string | yes | Deploys the GPU Operator with the desired GPU driver version using CNS. Defaults to 535.104.12. |
| app_infra > KEY > spec > api_vm_size | string | yes | The AWS VM size on which the API will run. Defaults to g5.48xlarge. |
| app_infra > KEY > spec > api_vm_data_disk_size_gb | number | yes | The data disk size, in GB, of the VM on which the API will run. Defaults to 1024. |
| app_infra > KEY > spec > sdg_enable | boolean | yes | Defaults to false. Set to true if simulated data needs to be pre-populated for the RTLS app UI workflow. |
| app_infra > KEY > spec > app_sdg_data_ngc_resource_url | string | yes | NGC artifact URL for the RTLS app with simulated video data loaded. Defaults to the latest release version. |
| app_infra > KEY > spec > app_ngc_k8s_values_res_url | string | yes | NGC artifact URL for the RTLS app with images and calibration file. Defaults to the latest release version. |
| app_infra > KEY > spec > foundational_chart | map | yes | Configuration to change the default foundational chart used. |
| app_infra > KEY > spec > foundational_chart > org | string | | NGC org of the foundational chart to be used. |
| app_infra > KEY > spec > foundational_chart > team | string | | NGC team of the foundational chart to be used. |
| app_infra > KEY > spec > foundational_chart > name | string | | NGC resource name of the foundational chart to be used. |
| app_infra > KEY > spec > foundational_chart > version | string | | NGC resource version of the foundational chart to be used. |
| app_infra > KEY > spec > foundational_override_values_file | string | | Absolute path to the foundational chart override values file. |
| app_infra > KEY > spec > app_chart | map | yes | Configuration to change the default app chart used. |
| app_infra > KEY > spec > app_chart > org | string | | NGC org of the app chart to be used. |
| app_infra > KEY > spec > app_chart > team | string | | NGC team of the app chart to be used. |
| app_infra > KEY > spec > app_chart > name | string | | NGC resource name of the app chart to be used. |
| app_infra > KEY > spec > app_chart > version | string | | NGC resource version of the app chart to be used. |
| app_infra > KEY > spec > app_override_values_file | string | | Absolute path to the app chart override values file. |
| app_infra > KEY > spec > nvstreamer_app_chart | map | yes | Configuration to change the default NvStreamer app chart used. |
| app_infra > KEY > spec > nvstreamer_app_chart > org | string | | NGC org of the NvStreamer app chart to be used. |
| app_infra > KEY > spec > nvstreamer_app_chart > team | string | | NGC team of the NvStreamer app chart to be used. |
| app_infra > KEY > spec > nvstreamer_app_chart > name | string | | NGC resource name of the NvStreamer app chart to be used. |
| app_infra > KEY > spec > nvstreamer_app_chart > version | string | | NGC resource version of the NvStreamer app chart to be used. |
| app_infra > KEY > spec > nvstreamer_app_override_values_file | string | | Absolute path to the NvStreamer app chart override values file. |
| app_infra > KEY > spec > vst_app_chart | map | yes | Configuration to change the default VST app chart used. |
| app_infra > KEY > spec > vst_app_chart > org | string | | NGC org of the VST app chart to be used. |
| app_infra > KEY > spec > vst_app_chart > team | string | | NGC team of the VST app chart to be used. |
| app_infra > KEY > spec > vst_app_chart > name | string | | NGC resource name of the VST app chart to be used. |
| app_infra > KEY > spec > vst_app_chart > version | string | | NGC resource version of the VST app chart to be used. |
| app_infra > KEY > spec > vst_app_override_values_file | string | | Absolute path to the VST app chart override values file. |
| app_infra > KEY > spec > ds_app_chart | map | yes | Configuration to change the default WDM-DS app chart used. |
| app_infra > KEY > spec > ds_app_chart > org | string | | NGC org of the WDM-DS app chart to be used. |
| app_infra > KEY > spec > ds_app_chart > team | string | | NGC team of the WDM-DS app chart to be used. |
| app_infra > KEY > spec > ds_app_chart > name | string | | NGC resource name of the WDM-DS app chart to be used. |
| app_infra > KEY > spec > ds_app_chart > version | string | | NGC resource version of the WDM-DS app chart to be used. |
| app_infra > KEY > spec > wdm_ds_app_override_values_file | string | | Absolute path to the WDM-DS app chart override values file. |
| app_infra > KEY > spec > target_group_configs | map | yes | AWS Target Group configuration to deploy additional endpoints based on application needs. |
| app_infra > KEY > spec > target_group_configs > kibana-tg | map | yes | Configuration needed to make the Kibana dashboard endpoint available. |
| app_infra > KEY > spec > target_group_configs > kibana-tg > port | string | yes | Kibana dashboard node port used to enable the endpoint. MDX apps use 31560 as the default port for the Kibana dashboard UI. |
| app_infra > KEY > spec > target_group_configs > kibana-tg > health_check | map | yes | Health check configuration used to verify that the Kibana dashboard is healthy and a valid endpoint. |
| app_infra > KEY > spec > target_group_configs > kibana-tg > health_check > port | string | yes | Port used by the AWS LB health check probe to validate the endpoint. |
| app_infra > KEY > spec > target_group_configs > kibana-tg > health_check > path | string | yes | Kibana path used by the AWS LB health check probe to validate the endpoint. |
| app_infra > KEY > spec > target_group_configs > grafana-tg | map | yes | Configuration needed to make the Grafana dashboard endpoint available. |
| app_infra > KEY > spec > target_group_configs > grafana-tg > port | string | yes | Grafana dashboard node port used to enable the endpoint. MDX apps use 32300 as the default port for the Grafana dashboard UI. |
| app_infra > KEY > spec > target_group_configs > grafana-tg > health_check | map | yes | Health check configuration used to verify that the Grafana dashboard is healthy and a valid endpoint. |
| app_infra > KEY > spec > target_group_configs > grafana-tg > health_check > port | string | yes | Port used by the AWS LB health check probe to validate the endpoint. |
| app_infra > KEY > spec > target_group_configs > grafana-tg > health_check > path | string | yes | Grafana path used by the AWS LB health check probe to validate the endpoint. |
Setup logs backup
Audit logs for any changes made via the script are captured in a directory named logs, at the same level as deploy-template.yml.
Take the necessary measures to ensure these are backed up in case they are needed for debugging; one option is shown after the note below.
Note
Any values defined in secrets.sh will be masked in the logs.
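For example, the logs directory could be synced after each run to a bucket of your own. The bucket name below is a placeholder, and any backup mechanism you already use works just as well:

```bash
# Copy the audit logs to a backup location after each run
aws s3 sync logs/ s3://<your-backup-bucket>/oneclick-logs/
```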
Deploy infrastructure and application
Use the commands below to install / update / uninstall Metropolis applications, along with their infrastructure, as per the specs provided in the deploy template.

```bash
# To view available options
bash mtmc-app-deploy

# To preview changes based on deploy-template.yml without actually applying them
bash mtmc-app-deploy preview

# To install the changes shown by the preview option, based on deploy-template.yml
bash mtmc-app-deploy install

# To show results/information about the installed project
bash mtmc-app-deploy show-results

# To uninstall the deployed infra and application
bash mtmc-app-deploy uninstall
```
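A typical first run strings these together. Whether you need to source secrets.sh yourself depends on how the script consumes it, so the first line is precautionary:

```bash
source secrets.sh                  # export secrets into the environment (precautionary)
bash mtmc-app-deploy preview       # review planned changes before applying
bash mtmc-app-deploy install       # create the infra and deploy the application
bash mtmc-app-deploy show-results  # print endpoints and IPs for later reference
```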
Important
Both the install and uninstall options need to be run with care. We recommend using the preview option to review changes before running install.
If you are looking for an option to print the details of your past installation, use the show-results option.
Warning
Any attempt to suspend the running deployment (Ctrl + Z) will result in an inability to make further changes to the project via the scripts, as well as the need to manually clean up the created resources via the web console. If the process absolutely must be exited, prefer terminating it with Ctrl + C.
Verify Deployment
On successful deployment of the infrastructure, you will see output in the format shown below.

```text
Apply complete! Resources: <nn> added, <nn> changed, <nn> destroyed.

Outputs:

S3_Bucket_details = {
  "<bastion_infra key>" = "<S3_Bucket_Name>"
}
app_infra = {
  "<app_infra key>" = {
    "private_ips" = [
      "<private_ip_of_app_instance>",
    ]
  }
}
app_infra = {
  "<app_infra key>" = {
    alb_dns_name = <dns_name_for_aws_lb>
  }
}
bastion_infra = {
  "<bastion_infra key>" = {
    "private_ip" = "<bastion_instance_private_ip>"
    "public_ip" = "<bastion_instance_public_ip>"
  }
}
coturn_infra = {
  "<coturn_infra key>" = {
    "port" = 3478
    "private_ip" = "<coturn_instance_private_ip>"
    "public_ip" = "<coturn_instance_public_ip>"
  }
}
```
Use an ssh command in the format below to log into the application instance. Replace the content between '<' and '>' with the appropriate values.

```bash
# The pem file referred to here must be the private key associated with the
# public key used in the initial steps of the setup.
ssh -i <path-to-pem-file> -o StrictHostKeyChecking=no \
    -o ProxyCommand="ssh -i <path-to-pem-file> -W %h:%p -o StrictHostKeyChecking=no ubuntu@<bastion-vm-public-ip>" \
    ubuntu@<app-vm-private-ip>

# To connect to Isaac Sim via SSH:
ssh -i state/<deployment-name>/key.pem -o StrictHostKeyChecking=no ubuntu@<AWS-VM-Public-IP>
```
Once logged into the terminal, run the command below to see the statuses of the Kubernetes pods. All the pods should eventually reach the Running state.

```bash
$ kubectl get pods

# If, for some reason, pods are failing to start or run healthily, check the
# logs to identify failure or pod crash issues. The -f flag can be used to
# follow the logs.
$ kubectl logs <pod-name>
$ kubectl logs -f <pod-name>
```
Note
Depending on several conditions, pods may take 30-40 minutes to reach the Running state.
To check the logs of pods that are not in the Running state, use kubectl logs <pod-name>. Two ways to watch for readiness are shown below.
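To avoid polling by hand, you can watch the pods or block until they are ready; the timeout below is sized to the 30-40 minute worst case:

```bash
# Stream pod status changes as they happen
kubectl get pods -w

# Or block until all pods in the current namespace report Ready (or 45 min elapse)
kubectl wait --for=condition=Ready pods --all --timeout=2700s
```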
Once all the pods are in the Running state, access the UI using the alb_dns_name attribute from the printed output. The app UI can be found at http://<alb_dns_name>/ui/<app-name>/ (supported app names are mtmc, rtls, and people-analytics).
When you open the URL in a browser, you should see the Metropolis application come up.
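Before opening a browser, you can confirm the endpoint responds from the command line (the app name rtls is used here as an example):

```bash
# Expect an HTTP 200 (or a redirect) once the app is up
curl -I "http://<alb_dns_name>/ui/rtls/"
```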
Teardown infrastructure and application
To tear down all the infrastructure, along with the application, that was created through the above scripts, run the bash mtmc-app-deploy uninstall command.
Important
The uninstall option is destructive and needs to be run with care; we recommend running the preview option first to review what will be removed.
If you need to print the details of your past installation before tearing it down, use the show-results option.