Azure#
Introduction#
Purpose#
This document serves as a comprehensive resource for deploying Tokkio Pipeline on the AZURE CSP using OneClick scripts. This guide aims to streamline the deployment process by providing detailed instructions from preparing necessary configuration templates to invoking OneClick scripts.
Scope#
While there are several possible ways to set up Tokkio Pipeline on the Azure CSP, this document covers setting up Tokkio Pipeline on Azure along with the necessary infrastructure.
Prerequisites#
Azure#
This setup guide assumes you have the following conditions met:
Access to the Azure portal via a user with Admin access to at least one Subscription.
An Azure service principal to enable the automated deployment scripts to authenticate themselves with.
An Azure storage account and container to host the state of the automated deployment scripts, so that the created infrastructure may be modified or torn down at a later date or time.
Registration of a domain on which the Tokkio Application can be hosted.
An Azure app certificate to enable SSL support for the deployed application.
Note
The prerequisites provisioned here can be reused across multiple projects and can be treated as a one-time setup for most scenarios, unless their parameters are unsuitable for a particular deployment.
Login to the Azure portal#
Log into the Azure portal as a user with admin access.
Click on More Services to get to the page listing all services.
For all subsequent steps, navigate back to this page to find and create a new resource.
Service Principal Setup#
App Registration Service Principal#
From the All Services page:
Select the Identity category (on the left).
Select the Azure Active Directory service.
Select App registrations in the service configuration panel (on the left).
Click on the +New registration button to create a new registration.
In the wizard:
Name: Provide an appropriate name. (eg: <my-org>-tokkio-automation)
Supported account types: Select Accounts in this organizational directory only.
Ignore all other fields.
Click Register.
You will be automatically taken to the created App registration. If not:
From All Services, navigate to Azure Active Directory under Identity category
Select App registrations from the service configuration panel
Select the created App Registration.
Select Certificates & secrets from the resource configuration panel (on the left).
Click on the +New client secret button to create a new client secret.
In the wizard:
Description: Provide an appropriate description.
Expires: Provide the period for which this secret will be valid.
Click Add.
Copy the Value of the created client secret. Note: Once you exit this screen, the value will no longer be visible.
Subscription Access grant to App Registration#
From the All Services page:
Select the General category (on the left).
Select the Subscriptions service.
Select the name of the subscription under which the Tokkio Application will be deployed.
Select Access control (IAM) from the resource configuration panel (on the left).
Select the Role Assignments tab.
Click on the +Add and then the Add role assignment option to add a new role assignment.
In the wizard:
Role: Select Owner.
Members:
Assign access to: Select User, group, or service principal.
Members: Click +Select members and select the name of the App Registration.
Click on the +Add and then the Add role assignment option to add a new role assignment.
In the wizard:
Role: Select Contributor.
Members:
Assign access to: Select User, group, or service principal.
Members: Click +Select members and select the name of the App Registration.
Resource Group#
From the All Services page:
Select the General category (on the left).
Select the Resource groups service.
Click on the +Create button to create a new resource group.
In the wizard:
Subscription: Choose the subscription under which resources will be created.
Resource group: Provide an appropriate name (eg: <my-org>-tokkio-automation-pre-requisites)
Region: Choose a region (preferably closest to the users of the application) in which to create the Resource Group.
Navigate Next to optionally configure tags (we will be skipping this through the remainder of this setup).
Click Review + create > Create.
Deployment State Storage#
From the All Services page:
Select the Storage category (on the left).
Select the Storage accounts service.
Click on the +Create button to create a new storage account.
In the wizard:
In the Basics section:
Subscription: Subscription of the Resource Group created earlier.
Resource Group: The Resource Group created earlier.
Storage account name: Provide an appropriate name (eg: <myorg>tokkiodeploymentstate)
Region: Region chosen for the Resource Group created earlier.
Performance: Choose Standard.
Redundancy: Choose Locally-redundant storage (LRS).
Leave the remaining fields as is.
Leave all other sections as is.
Optionally add tags under the Tags section.
Click Review + create > Create.
Navigate to the created storage account by clicking on Go to resource or:
From All Services, navigate to Storage accounts under Storage category
Select the created Storage account.
Select Containers from the resource configuration panel (on the left).
Click on the +Container to create a new container.
In the wizard:
Name: Provide an appropriate name (eg: deployment-state)
Public access level: Select Private.
Click Create.
Base Domain#
From the All Services page:
Select the Web category (on the left).
Select the App Service Domains service.
Click on the +Create button to create a new domain.
In the wizard:
In the Basics section:
Subscription: Subscription of the Resource Group created earlier.
Resource Group: The Resource Group created earlier.
Domain: Base domain which will be used for subsequent app deployments.
In the Contact information section, provide the relevant contact information. Note that the email needs to be a valid email ID.
Leave the Advanced section as is.
Optionally add tags under the Tags section.
Click Review + create > Create.
Certificates#
Key Vault#
From the All Services page:
Select the Security category (on the left).
Select the Key Vaults service.
Click on the +Create button to create a new key vault.
In the wizard:
In the Basics section:
Subscription: Subscription of the Resource Group created earlier.
Resource Group: The Resource Group created earlier.
Key vault name: Provide an appropriate name (eg: <my-org>-certificates-vault).
Region: Region chosen for the Resource Group created earlier.
Leave remaining fields as is.
In the Access policy section:
Add an additional Access Policy.
Key Permissions: Select Select All.
Secret Permissions: Select Select All.
Certificate Permissions: Select Select All.
Rotation Policy Operations: Select Select All.
Add the App Registration Service Principal created earlier as Principal.
Let all other configurations remain as is.
Click Add to save the access policy.
Leave the Networking section as is.
Optionally add tags under the Tags section.
Click Review + create > Create.
Wildcard Certificate#
From the All Services page:
Select the Web category (on the left).
Select the App Service Certificates service.
Click on the +Create button to create a new certificate.
In the wizard:
In the Basics section:
Subscription: Subscription of the Resource Group created earlier.
Resource Group: The Resource Group created earlier.
SKU: Wildcard.
Naked domain hostname: Enter *.<base-domain> where <base-domain> is the name of the Base Domain created earlier.
Certificate name: Provide an appropriate name (eg: <my-base-domain>-wildcard-certificate).
Enable auto renewal: Optionally select Disable.
Optionally add tags under the Tags section.
Click Review + create > Create.
Navigate to the created certificate by clicking on Go to resource or:
From All Services, navigate to App Service Certificates under Web category
Select the created Certificate.
Select Certificate Configuration from the resource configuration panel (on the left).
Select Store and then the Select from Key Vault link.
In the wizard:
Subscription: Subscription of the Resource Group created earlier.
Key vault: Name of the Key Vault created earlier.
Navigate back to the created certificate by:
From All Services, navigate to App Service Certificates under Web category
Select the created Certificate.
Select Certificate Configuration from the resource configuration panel (on the left).
Select Verify and then click on Verify.
Wait for Domain Verification to complete.
Increase quota for GPU VM type#
From the All Services page:
Select the Other category (on the left).
Select the Quotas service.
Click on Compute.
Validate sufficient quota is present to create the application virtual machine by doing the following:
Update the Region filter to match the region where the virtual machines will be created.
In the search box, enter NCASv3_T4.
Check that the available quota is at least 64 times the number of instances you wish to run under this setup; if not, edit the quota entry to request an increase.
Wait for confirmation that the quota has increased before proceeding.
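The quota math from the step above can be sketched as a quick sanity check; the instance count below is an example, not a recommendation:

```shell
# Each app instance in this setup consumes 64 vCPUs of NCASv3_T4 quota,
# per the sizing guidance above. Compute the total quota to request:
INSTANCES=2                     # example: number of app instances you plan to run
REQUIRED=$((INSTANCES * 64))
echo "Request at least ${REQUIRED} NCASv3_T4 vCPUs in your region"
```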
Hardware#
Controller instance#
The Controller instance is where you will launch OneClick scripts from. Here are the necessary steps and requirements:
Operating System: Ensure that the instance is running Ubuntu 22.04.
SSH key Pair: Generate an SSH key pair to be used in a later phase of the OneClick scripts. You can follow these steps:
Open a terminal on your Controller instance
Run ssh-keygen to generate a new SSH key pair. You can specify the bit size for added security. e.g., ssh-keygen -b 4096 for a 4096-bit key.
Save the keys to the default location (~/.ssh).
Passwordless sudo access: Ensure the user on the Controller instance has passwordless sudo access.
You can test this by running sudo ls /root; the command shouldn’t prompt you for a password.
If this is not set up, reach out to your system administrator.
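The SSH key-generation step above can also be run non-interactively. A minimal sketch, in which the key file name is illustrative (the default ~/.ssh/id_rsa also works):

```shell
# Create ~/.ssh if it does not exist, then generate a 4096-bit RSA key pair.
# -N "" sets an empty passphrase; -f sets the output path (name is illustrative).
mkdir -p "$HOME/.ssh"
ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/tokkio_one_click_key"
```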
Access#
Access to Tokkio artifacts#
Ensure that you have access to all the artifacts used during bring-up of the Tokkio Pipeline Application, e.g., the Tokkio Application Helm chart on NGC.
Essential Skills and Background#
Familiarity with Command-Line-Interface (CLI)#
Basic Commands: Users should be comfortable with basic command-line operations, such as navigating directories, executing scripts, and managing files.
Environment Configuration: Understanding how environment variables and PATH setup work on Linux will greatly help when operating the OneClick scripts.
Scripting Basics: Basic scripting knowledge (e.g., shell scripting) is beneficial for understanding how the OneClick script operates and for troubleshooting any issues that may arise.
Familiarity with YAML#
YAML Syntax and Structure: YAML is often used for configuration files in cloud-native applications due to its readability and flexibility. The configuration templates used in the OneClick scripts use YAML format. Users should be familiar with YAML syntax and structure.
Familiarity with the Kubernetes ecosystem#
The Tokkio pipeline is a cloud-native application and uses concepts like containerization, Kubernetes, Helm, etc. Users need to be familiar with these to get the best results from the OneClick scripts and the app.
Kubernetes Basics: Users should have a basic understanding of Kubernetes core concepts such as pods, services, and deployments.
kubectl: Familiarity with kubectl, the command-line tool used to interact with Kubernetes clusters, including querying the status or logs of running application pods.
Helm: Understanding Helm, the package manager for Kubernetes that simplifies application deployment by managing charts (collections of pre-configured Kubernetes resource definitions), and knowing how to use Helm with override values will help in configuring the templates appropriately.
General troubleshooting techniques#
Log analysis & troubleshooting: Users should be able to analyze logs generated by the OneClick scripts to identify any errors or warnings and remediate the issue.
Overall Security considerations#
The security of Tokkio in production environments is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats; define the trust boundaries, secure the communication channels, integrate AuthN & AuthZ with appropriate access controls, keep the deployment including the containers up to date, ensure the containers are secure and free of vulnerabilities.
Infrastructure Layout#
Tokkio Application setup on Azure requires several Azure resources to be created, such as Virtual Machines, Network Security Groups, Application Gateways, Azure Front Door CDN for hosting UI content, etc. While there are several patterns that can be followed to bring up the infrastructure needed for Tokkio, this document follows one such approach.
In addition to bringing up the Azure resources, we will also have to download the Tokkio application and its dependency artifacts, configure them, and install them.
These OneClick scripts simplify that by abstracting away the complexity, allowing you to work primarily with a single file: config-template.yml. This file is an abstraction of the infrastructure specification needed to bring up the Tokkio Application. At a high level, we define the infrastructure specifications (e.g., instance types, CIDRs to allow access, etc.), then add the application specifications (e.g., Helm charts, secrets, etc.).
Important
Many of the resources in this setup may not fall under the free tier; check the Azure billing reference pages to understand the cost implications.
Tokkio pipeline installation#
Download OneClick scripts#
Once you have cloned the ACE GitHub repo from NVIDIA/ACE.git, navigate to the azure directory.
$ cd workflows/tokkio/scripts/one-click/azure
You should see the envbuild.sh file at the root of this directory. We will be using this command to interact with the OneClick scripts. The general options of this command can be seen by running ./envbuild.sh:
$ ./envbuild.sh
Usage: ./envbuild.sh (-v|--version)
   or: ./envbuild.sh (-h|--help)
   or: ./envbuild.sh (install/uninstall) (-c|--component <component>) [options]
   or: ./envbuild.sh (info) [options]

install/uninstall components:
-c, --component       one or more of all/infra/platform/app, pass arg multiple times for more than one

install/uninstall options:
-f, --config-file     path to file containing config overrides, defaults to config.yml
-i, --skip-infra      skip install/uninstall of infra component
-p, --skip-platform   skip install/uninstall of platform component
-a, --skip-app        skip install/uninstall of app component
-d, --dry-run         don't make any changes, instead, try to predict some of the changes that may occur
-h, --help            provide usage information

info options:
-f, --config-file     path to file containing config overrides, defaults to config.yml
-h, --help            provide usage information
Note
envbuild.sh with the --component all option installs the infra, platform, and app components.
infra component is responsible for
Provisioning of Infra on Azure CSP.
Installation & configuration of the Kubernetes cluster on the App VM, and of the TURN server flavor (Coturn/RP/Twilio) on the TURN VM.
platform component is responsible for
Installing local-path-provisioner chart.
Installing metrics and logging related charts.
Installing ingress-controller chart.
app component is responsible for
Installing required kubernetes namespace for tokkio chart.
Installing required kubernetes secrets for tokkio chart.
Installing tokkio chart.
Installing tokkio UI as static website using Azure CDN and Azure storage account.
With the help of envbuild.sh, you can uninstall and reinstall the app component using the commands below.
#Uninstall app component using below command
./envbuild.sh uninstall --component app --config-file ./<my-l40-config.yml>
#Install app component using below command
./envbuild.sh install --component app --config-file ./<my-l40-config.yml>
Prepare config-template file#
Make a copy of config-template.yml with a name of your choice, e.g. cp config-template.yml my-l40-config.yml. You can then populate the config file based on the definition of each attribute.
All the attributes of config-template.yml are explained in the table below.
# Parameter name
Type
Optional
Description
schema_version
string
Config-Template schema version
name
string
A unique name to identify the infrastructure resources being created.
spec
map
Infrastructure and Application configuration.
spec > infra
map
Infrastructure configuration.
spec > infra > csp
string
cloud service provider name, in this case azure
spec > infra > backend
map
terraform backend configuration to store state of infrastructure.
spec > infra > backend > tenant_id
string
Azure tenant id of the state storage account.
spec > infra > backend > subscription_id
string
Azure subscription id of the state storage account.
spec > infra > backend > client_id
string
Azure client id of the app registration with access to subscription.
spec > infra > backend > client_secret
string
Azure client secret for the above client id.
spec > infra > backend > resource_group_name
string
Azure resource group name of the state storage account.
spec > infra > backend > storage_account_name
string
Azure storage account name of the state storage account.
spec > infra > backend > container_name
string
Azure storage account container name of the container for state storage in the state storage account.
spec > infra > provider > tenant_id
string
Azure tenant id where application will be deployed.
spec > infra > provider > subscription_id
string
Azure subscription id where application will be deployed.
spec > infra > provider > client_id
string
Azure client id of the app registration with access to subscription.
spec > infra > provider > client_secret
string
Azure client secret for the above client id.
spec > infra > configs
map
Additional infrastructure configuration.
spec > infra > configs > cns
map
yes
Nvidia Cloud Native Stack configuration. More details on Cloud Native Stack can be found here: NVIDIA/cloud-native-stack.
spec > infra > configs > cns > version
string
yes
The version of Nvidia Cloud Native Stack to install on the clusters. Defaults to 12.2.
spec > infra > configs > cns > git_ref
string
yes
The git commit hash of Nvidia Cloud Native Stack; by default, the master branch’s latest commit hash is used.
spec > infra > configs > cns > override_values
map
yes
Nvidia Cloud Native Stack values to override while setting up a cluster.
spec > infra > configs > cns > override_values > cns_nvidia_driver
bool
yes
Set to yes to install the NVIDIA driver using the runfile method, otherwise no.
spec > infra > configs > cns > override_values > gpu_driver_version
string
yes
Config to override gpu_driver_version while installing Nvidia Cloud Native Stack.
spec > infra > configs > user_access_cidrs
list
CIDR ranges from where SSH access should be allowed.
spec > infra > configs > dev_access_cidrs
list
CIDR ranges from where application UI and API will be allowed access.
spec > infra > configs > region
string
AZURE region where resources of the application will be deployed.
spec > infra > configs > ssh_private_key_path
string
Absolute path of the private key to be used to SSH the hosts.
spec > infra > configs > ssh_public_key
string
Absolute path of the public key counterpart of private key used to SSH the hosts.
spec > infra > configs > additional_ssh_public_keys
list
yes
List of contents of public counterparts to the additional keys that will be used to SSH the hosts.
spec > infra > configs > dns_and_certs_configs
map
DNS and certificate configuration.
spec > infra > configs > dns_and_certs_configs > resource_group
string
Resource group of the DNS zone and key vault containing the certificate.
spec > infra > configs > dns_and_certs_configs > dns_zone
string
DNS zone name to be used as the base domain for the API and optionally the UI.
spec > infra > configs > dns_and_certs_configs > wildcard_cert
string
Name of the wildcard certificate that can be used against the various deployments.
spec > infra > configs > api_sub_domain
string
yes
Sub-domain of the app API endpoint.
spec > infra > configs > ui_sub_domain
string
yes
Sub-domain of the app UI endpoint.
spec > infra > configs > elastic_sub_domain
string
yes
Sub-domain of the Elastic endpoint.
spec > infra > configs > kibana_sub_domain
string
yes
Sub-domain of the Kibana endpoint.
spec > infra > configs > grafana_sub_domain
string
yes
Sub-domain of the Grafana endpoint.
spec > infra > configs > include_ui_custom_domain
bool
true if the UI needs a custom base domain; false if the Azure-managed base domain is acceptable.
spec > infra > configs > turn_server_provider
string
yes
Either rp or coturn or twilio. Defaults to coturn.
spec > infra > configs > clusters
map
Definition of clusters to be created.
spec > infra > configs > clusters > app
map
Definition of App cluster to be created.
spec > infra > configs > clusters > app > private_instance
bool
Always true as app instance to be created is private.
spec > infra > configs > clusters > app > master
map
Definitions of the master node of the app cluster.
spec > infra > configs > clusters > app > master > size
string
AZURE GPU vm size for the app master node.
spec > infra > configs > clusters > app > features
map
Definitions of features flag of the app cluster.
spec > infra > configs > clusters > app > features > cns
bool
cns feature flag is always true as used to install Nvidia Cloud Native Stack.
spec > infra > configs > clusters > app > features > app
bool
app feature flag is always true as used to install tokkio app & other components.
spec > infra > configs > clusters > turn
map
Definition of master node of turn cluster.
spec > infra > configs > clusters > turn > private_instance
bool
Always false as turn instance to be created is public.
spec > infra > configs > clusters > turn > master
map
Definitions of the master node of the turn cluster.
spec > infra > configs > clusters > turn > master > type
string
AZURE vm size for the turn master node.
spec > infra > configs > clusters > turn > features
map
Definitions of features flag of the turn cluster.
spec > infra > configs > clusters > turn > features > cns
bool
true when turn_server_provider = rp otherwise false.
spec > infra > configs > clusters > turn > features > rp
bool
true when turn_server_provider = rp otherwise false.
spec > infra > configs > clusters > turn > features > coturn
bool
true when turn_server_provider = coturn otherwise false.
spec > platform
map
Configuration to change the default foundational config to be used.
spec > platform > configs
map
Foundational configuration.
spec > platform > configs > k8s_namespace
string
yes
Kubernetes namespace for the foundational charts to be deployed; defaults to platform.
spec > platform > configs > k8s_secrets
list
List of kubernetes secrets needed for foundational chart.
spec > platform > secrets > ngc_cli_api_key
string
NGC CLI API key used to download the helm chart.
spec > app > configs > app_settings
map
Configuration to change the default App setting to be used.
spec > app > configs > app_settings > k8s_namespace
string
yes
Kubernetes namespace for app chart to be deployed, defaults to app.
spec > app > configs > app_settings > helm_chart
map
yes
Helm chart config for app chart to be deployed.
spec > app > configs > app_settings > helm_chart > repo
map
yes
Configuration of remote repo used for app helm chart to be deployed.
spec > app > configs > app_settings > helm_chart > repo > enable
bool
yes
Flag to use app helm chart from remote repo, defaults to true.
spec > app > configs > app_settings > helm_chart > repo > repo_url
string
yes
Repo URL for the app helm chart to be deployed; defaults to https://helm.ngc.nvidia.com/nvidia/ace.
spec > app > configs > app_settings > helm_chart > repo > chart_name
string
yes
App helm chart name to be fetched from remote repo, defaults to ucs-tokkio-app-base-3-stream-llm-rag-3d-ov.
spec > app > configs > app_settings > helm_chart > repo > chart_version
string
yes
App helm chart version to be fetched from remote repo; defaults to 4.1.4.
spec > app > configs > app_settings > helm_chart > repo > release_name
string
yes
Release name for app to be deployed using helm chart, defaults to tokkio-app.
spec > app > configs > app_settings > helm_chart > repo > user_value_override_files
list
yes
Absolute path of user override values.yml to be used for app chart deployment
spec > app > configs > app_settings > helm_chart > local
map
yes
Configuration to change app helm chart deployment using locally present chart.
spec > app > configs > app_settings > helm_chart > local > enable
bool
yes
true if you want to use a locally present app helm chart.
spec > app > configs > app_settings > helm_chart > local > path
string
yes
Absolute path of helm chart present locally
spec > app > configs > app_settings > helm_chart > local > release_name
string
yes
Release name for app to be deployed using helm chart, defaults to tokkio-app.
spec > app > configs > app_settings > helm_chart > local > user_value_override_files
list
yes
Absolute path of user override values.yml to be used for app chart deployment
spec > app > configs > app_settings > k8s_secrets
list
List of kubernetes secrets to be deployed.
spec > app > configs > turn_server_settings
map
Configuration for setting up the TURN server to be used for the app.
spec > app > configs > turn_server_settings > rp
map
yes
Configuration of rp as turn server to be used for app.
spec > app > configs > turn_server_settings > rp > k8s_namespace
string
yes
Kubernetes namespace to be used for the rp chart deployment; defaults to rp.
spec > app > configs > turn_server_settings > rp > helm_chart
map
yes
Helm chart config for rp chart to be deployed.
spec > app > configs > turn_server_settings > rp > helm_chart > repo
map
yes
Configuration of remote repo used for rp helm chart to be deployed.
spec > app > configs > turn_server_settings > rp > helm_chart > repo_url
string
yes
Repo URL for the rp helm chart to be deployed; defaults to https://helm.ngc.nvidia.com/nvidia/ace.
spec > app > configs > turn_server_settings > rp > helm_chart > chart_name
string
yes
RP helm chart name to be fetched from remote repo, defaults to rproxy.
spec > app > configs > turn_server_settings > rp > helm_chart > chart_version
string
yes
RP helm chart version to be fetched from remote repo; defaults to 0.0.8.
spec > app > configs > turn_server_settings > rp > helm_chart > release_name
string
yes
Release name for rp to be deployed using helm chart, defaults to rp.
spec > app > configs > turn_server_settings > rp > k8s_secrets
list
List of kubernetes secrets to be deployed.
spec > app > configs > turn_server_settings > coturn
map
yes
Configuration of coturn as turn server to be used for app.
spec > app > configs > turn_server_settings > coturn > username
string
yes
Coturn server username used while setting up coturn; defaults to foo.
spec > app > configs > turn_server_settings > coturn > password
string
yes
Coturn server password used while setting up coturn; defaults to bar.
spec > app > configs > turn_server_settings > coturn > realm
string
yes
Realm name for the coturn server; defaults to mydummyt.org.
spec > app > configs > turn_server_settings > twilio
map
yes
Configuration details of twilio as turn server to be used for app.
spec > app > configs > turn_server_settings > twilio > account_sid
string
yes
account_sid from the Twilio account; defaults to an empty string.
spec > app > configs > turn_server_settings > twilio > auth_token
string
yes
auth_token from the Twilio account; defaults to an empty string.
spec > app > configs > ui_settings
map
yes
Configuration to override the default UI.
spec > app > configs > ui_settings > resource
map
yes
Configuration for UI resource to be used
spec > app > configs > ui_settings > resource > ngc
map
yes
Configuration of NGC to download UI resource from
spec > app > configs > ui_settings > resource > ngc > org
string
yes
NGC Organization of the UI resource to be used.
spec > app > configs > ui_settings > resource > ngc > team
string
yes
NGC Team of the UI resource to be used.
spec > app > configs > ui_settings > resource > ngc > name
string
yes
NGC Resource Name of the UI resource to be used.
spec > app > configs > ui_settings > resource > ngc > version
string
yes
NGC Resource Version of the UI resource to be used.
spec > app > configs > ui_settings > resource > ngc > file
string
yes
NGC Resource File Name of the UI resource to be used.
spec > app > configs > ui_settings > user_env_vars
map
yes
Configuration to override default UI settings
spec > app > secrets > ngc_cli_api_key
string
NGC CLI API key used to download the UI resource and helm chart.
Note
It is recommended to set spec > infra > configs > cns > override_values > cns_nvidia_driver to yes to support installation of the NVIDIA driver on the latest kernel versions.
You can refer to sample examples of config-template.yml in the dist/config-template-examples folder.
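As a rough illustration only, a minimal config might begin like the sketch below. Every value is a placeholder, the attribute names come from the table above, only a subset of sections is shown, and the examples under dist/config-template-examples remain the authoritative reference:

```shell
# Hypothetical minimal skeleton of a config file (placeholders only, not a
# complete or validated schema); written via a quoted heredoc so nothing expands.
cat > my-l40-config.yml <<'EOF'
schema_version: "<schema-version>"
name: "<unique-deployment-name>"
spec:
  infra:
    csp: azure
    backend:
      tenant_id: "{{ lookup('env', 'ARM_TENANT_ID') }}"
      subscription_id: "{{ lookup('env', 'ARM_SUBSCRIPTION_ID') }}"
      client_id: "{{ lookup('env', 'ARM_CLIENT_ID') }}"
      client_secret: "{{ lookup('env', 'ARM_CLIENT_SECRET') }}"
      resource_group_name: "<state-resource-group>"
      storage_account_name: "<state-storage-account>"
      container_name: "<state-container>"
    configs:
      region: "<azure-region>"
      user_access_cidrs: ["<your-ip>/32"]
      dev_access_cidrs: ["<your-ip>/32"]
  app:
    secrets:
      ngc_cli_api_key: "{{ lookup('env', 'NGC_CLI_API_KEY') }}"
EOF
```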
Prepare environment variables#
The config template yml file contains several inputs about the infrastructure and application’s needs. For ease of use, some of these are wired to look up environment variables. For example, {{ lookup('env', 'NGC_CLI_API_KEY') }} looks up the NGC_CLI_API_KEY environment variable using the lookup function. This means we can set an environment variable NGC_CLI_API_KEY with its value, and the OneClick script can access it automatically.
Prepare a file to hold these environment variables and their values (e.g., vi my-env-file.env) and populate it with actual values. An example is shown below.
$ cat my-env-file.env
export OPENAI_API_KEY="<replace-with-actual-value>"
export NGC_CLI_API_KEY="<replace-with-actual-value>"
export NVIDIA_API_KEY="<replace-with-actual-value>"
export ARM_TENANT_ID="<replace-with-actual-value>"
export ARM_SUBSCRIPTION_ID="<replace-with-actual-value>"
export ARM_CLIENT_ID="<replace-with-actual-value>"
export ARM_CLIENT_SECRET="<replace-with-actual-value>"
Use the source command to load these variables into your current shell session. The source command reads and executes commands from the specified file in the current shell environment, making the variables defined in the file available in the shell.
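A minimal sketch putting the two steps together; the file contents mirror the example above, and the value is a placeholder:

```shell
# Create the env file (placeholder value), then load it into the current shell.
cat > my-env-file.env <<'EOF'
export NGC_CLI_API_KEY="<replace-with-actual-value>"
EOF
source my-env-file.env
# The variable is now visible to commands run from this shell, e.g. envbuild.sh
echo "${NGC_CLI_API_KEY:+set}"   # prints "set" once the variable is loaded
```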
Caution
If you modify your <my-env-file.env> file or start a new shell, you will have to run source <my-env-file.env> again before running the ./envbuild.sh command.
Installing#
Run the OneClick script:
./envbuild.sh install --component all --config-file ./<my-l40-config.yml>
Once the script installation completes, capture the access_urls section for future reference. This specifies the various URLs that were configured as part of the installation. Example output is shown below.
access_urls:
  api_endpoint: "https://<api_sub_domain>.<base_domain>"
  elasticsearch_endpoint: "https://elastic-<name>.<base_domain>"
  grafana_endpoint: "https://grafana-<name>.<base_domain>"
  kibana_endpoint: "https://kibana-<name>.<base_domain>"
  ui_endpoint: "https://<ui_sub_domain>.<base_domain>"
ssh_command:
  app:
    bastion: ssh -i /home/my-user/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null <username>@<bastion-instance-ip-address>
    master: ssh -i /home/my-user/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o ProxyCommand="ssh -i /home/my-user/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -W %h:%p <username>@<bastion-instance-ip-address>" <username>@<app-instance-ip-address>
  turn:
    master: ssh -i /home/my-user/.ssh/id_rsa -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null <username>@<turn-instance-ip-address>
Verifying installation: Once the installation steps complete, it may take several minutes (~60 mins) before the application is ready, depending on model initialization, other application-specific initialization activities, and network speed. Use the steps below to check whether the application is up before accessing the UI. On the application instance, run kubectl get pods -n <application-namespace>. Example output of this command is shown below.
$ kubectl get po -n app
NAME                                                        READY   STATUS    RESTARTS   AGE
a2f-a2f-deployment-98c7fb777-lnp6w                          1/1     Running   0          164m
ace-agent-chat-controller-deployment-0                      1/1     Running   0          164m
ace-agent-chat-engine-deployment-5f8db599f6-6mkh9           1/1     Running   0          164m
ace-agent-plugin-server-deployment-645cf865b5-vvcvw         1/1     Running   0          164m
anim-graph-sdr-envoy-sdr-deployment-5b7cc55b6b-cbbz6        3/3     Running   0          164m
catalog-rag-deployment-67547fd54-lrcf2                      1/1     Running   0          164m
chat-controller-sdr-envoy-sdr-deployment-78b54b7f86-fnvql   3/3     Running   0          164m
ds-sdr-envoy-sdr-deployment-85bbfdb4c4-lbksh                3/3     Running   0          164m
ds-visionai-ds-visionai-deployment-0                        1/1     Running   0          164m
ia-animation-graph-microservice-deployment-0                1/1     Running   0          164m
ia-omniverse-renderer-microservice-deployment-0             1/1     Running   0          164m
ia-omniverse-renderer-microservice-deployment-1             1/1     Running   0          164m
ia-omniverse-renderer-microservice-deployment-2             1/1     Running   0          164m
mongodb-mongodb-666765487c-tzxxw                            1/1     Running   0          164m
occupancy-alerts-api-app-84576db5c9-qb7dr                   1/1     Running   0          164m
occupancy-alerts-app-5cfcc9f75-b84xb                        1/1     Running   0          164m
redis-redis-79c99cdd97-kl77x                                1/1     Running   0          164m
redis-timeseries-redis-timeseries-69bb884965-fnn44          1/1     Running   0          164m
renderer-sdr-envoy-sdr-deployment-55df9ccc6f-v66rr          3/3     Running   0          163m
riva-speech-547fb9b8c5-7w8h4                                1/1     Running   0          164m
tokkio-cart-manager-deployment-65d8547cbc-vplnq             1/1     Running   0          164m
tokkio-ingress-mgr-deployment-65dfdc79f6-4pvcg              3/3     Running   0          164m
tokkio-ui-server-deployment-79c9749686-ztxrq                1/1     Running   0          164m
tokkio-umim-action-server-deployment-674cccc898-qgz26       1/1     Running   0          164m
triton0-bbd77d78f-rz62w                                     1/1     Running   0          164m
vms-vms-67876bcb9b-q655s                                    1/1     Running   0          164m
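Rather than scanning the list by eye, you can filter the output for pods that are not fully ready. The snippet below is a sketch that parses the default column layout (NAME, READY, STATUS); in practice you would pipe `kubectl get pods -n <application-namespace> --no-headers` into the awk filter, but here two sample lines stand in for that output.

```shell
# Count pods whose READY column (e.g. "1/1") is not complete or whose
# STATUS is not "Running"; a result of 0 means the app is up.
not_ready=$(printf '%s\n' \
  "a2f-a2f-deployment-98c7fb777-lnp6w 1/1 Running 0 164m" \
  "triton0-bbd77d78f-rz62w 0/1 Pending 0 1m" \
  | awk '{split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") c++} END {print c+0}')
echo "$not_ready"
```

With the sample lines above, one pod (the Pending one) is counted as not ready.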
Validating#
- Access the app: Once all the pods are in Ready state, try accessing the UI using the URL printed in the ui_endpoint output attribute. Example ui_endpoint: https://AZUREdemoui.csptokkiodemo.com. When you open the URL in a browser, the Tokkio application should come up. Granting permissions to the browser: on first access, the browser should prompt for permissions such as microphone, speaker, and camera, which the UI needs to operate. Upon accepting the permissions, the UI should load.
Un-installing#
If you choose to uninstall the Application and UI that the OneClick script installed, run the uninstall command with the appropriate options. The example below uninstalls all components.
./envbuild.sh uninstall --component all --config-file ./my-l40-config.yml
Known-Issues#
Sometimes re-installation of the UI with config changes, or installation of a new UI over an existing one, is not reflected when browsing the UI endpoint.
This happens because Azure CDN caches UI content, so stale content may still be served. To forcefully clear the Azure CDN cache, invalidate it using the commands below.
source my-env-file.env
az cdn endpoint purge --resource-group '<replace-with-actual-rg-name>' --profile-name '<replace-with-cdn-profile-name>' --name '<replace-with-actual-cdn-endpoint-hostname>' --content-paths '/*'
Once the above command runs successfully, the UI endpoint should serve the updated UI.
If you set spec.app.configs.app_settings.k8s_namespace = default, uninstalling the app component with envbuild.sh fails with the error below.
fatal: [app-master]: FAILED! => {"changed": false, "error": 403, "msg": "Namespace default: Failed to delete object: b'{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"namespaces \\\\\"default\\\\\" is forbidden: this namespace may not be deleted\",\"reason\":\"Forbidden\",\"details\":{\"name\":\"default\",\"kind\":\"namespaces\"},\"code\":403}\\n'", "reason": "Forbidden", "status": 403}
To prevent this, avoid using the default Kubernetes namespace for spec.app.configs.app_settings.k8s_namespace.
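For example, the relevant fragment of the config file could look like the following (the namespace value `app` is illustrative; any dedicated, non-default namespace works):

```yaml
spec:
  app:
    configs:
      app_settings:
        k8s_namespace: app   # a non-default namespace avoids the 403 on uninstall
```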