Remote Client Overview and Examples#
TAO Remote Client is a command-line tool that lets you create and manage experiments from the terminal instead of calling the TAO API directly.
Note
The datasets used in these examples are subject to the Dataset License.
Installation#
Setting up Your Python Environment#
We recommend setting up a Python environment using miniconda. The following instructions show how to set up a Python conda environment.
Follow the instructions in this link to set up a Conda environment using Miniconda.
After you have installed miniconda, create a new environment and set the Python version to 3.12:
conda create -n tao python=3.12
Activate the conda environment that you have just created:
conda activate tao
Verify that the command prompt shows the name of your Conda environment.
(tao) desktop:
When you are done with your session, you can deactivate your conda environment using the deactivate command:
conda deactivate
You can reactivate this conda environment later using the following command:
conda activate tao
Install the TAO Client#
After you set up and activate the Python environment, install the TAO Client using the following command:
python -m pip install nvidia-tao-client
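As a quick sanity check, you can confirm that the package is present in the active environment and that the CLI entry point resolves. This is a minimal sketch that only uses standard pip and the --help flag shown later on this page:
# Confirm the package is installed in the active conda environment
python -m pip show nvidia-tao-client
# Confirm the CLI entry point is on PATH and prints its usage
tao-client --help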
CLI Specs#
User authentication is based on the NGC Personal Key and is done with the following commands:
BASE_URL=https://IP_of_machine_deployed/api/v1
NGC_API_KEY=nvapi-zZYtczM5amdtdDcwNjk0cnA2bGU2bXQ3bnQ6NmQ4NjNhMDItMTdmZS00Y2QxLWI2ZjktNmE5M2YxZTc0OGyS
$ tao-client login --ngc-key $NGC_API_KEY --ngc-org-name ea-tlt
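If you prefer not to paste the key inline, one option is to read it from a local file so it does not end up in your shell history. This is a sketch that assumes the key was previously saved to ~/.ngc_key:
# Read the NGC Personal Key from a local file instead of pasting it inline
NGC_API_KEY=$(cat ~/.ngc_key)
tao-client login --ngc-key "$NGC_API_KEY" --ngc-org-name ea-tlt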
After authentication, the command line syntax is:
$ tao-client <network> <action> <args>
For example:
$ tao-client dino experiment-run-action --action train --id 042559ec-ab3e-438d-9c94-2cab38f76efc --specs '<json_loadable_specs_string_from_get_spec_action>'
Note
You can always use the --help argument to retrieve the command usage information.
To list supported networks:
$ tao-client --help
To list supported Dino actions:
$ tao-client dino --help
Each network has the following actions:
Command                      Description
workspace-create             Create a workspace and return its ID
dataset-create               Create a dataset and return its ID
dataset-delete               Delete a dataset
dataset-run-action           Run a dataset action
experiment-create            Create an experiment and return its ID
experiment-delete            Delete an experiment
experiment-run-action        Run an experiment action
get-action-status            Get the status of an action
get-job-logs                 Get the logs of a job
get-metadata                 Get the metadata of the specified artifact
get-spec                     Return the default spec of an action
job-cancel                   Cancel a running job
job-pause                    Pause a running job
job-resume                   Resume a paused job
list-base-experiments        Return the list of base experiments
list-datasets                Return the list of datasets
list-experiments             Return the list of experiments
list-job-files               List the files, specs, and logs of a job
download-entire-job          Download all files belonging to a job
download-selective-files     Download job files based on the arguments passed
model-automl-defaults        Return the default AutoML parameters
patch-artifact-metadata      Patch the metadata of the specified artifact
publish-model                Publish a model
remove-published-model       Remove a published model
workspace-backup             Back up the MongoDB database to cloud storage
workspace-restore            Restore a workspace from cloud storage
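These commands are typically combined into a poll-then-download workflow. The following sketch waits for a job to reach a terminal state and then pulls its artifacts; EXPERIMENT_ID and JOB_ID are placeholders, and the status strings and the download-entire-job flags are assumptions modeled on the examples later in this page:
# Poll the action status until the job reaches a terminal state
while true; do
  STATUS=$(tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $JOB_ID | jq -r '.status')
  echo "Job status: $STATUS"
  case "$STATUS" in
    Done|Error|Canceled) break ;;   # terminal status names are assumptions
  esac
  sleep 30
done
# Download everything the job produced (flag names assumed to mirror get-action-status)
tao-client dino download-entire-job --job_type experiment --id $EXPERIMENT_ID --job $JOB_ID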
Workspaces#
In TAO 6.0, cloud workspaces are used to pull datasets and store experiment results in popular cloud storage providers.
AWS - cloud_type: aws; cloud_specific_details needed: access_key, secret_key, aws_region, s3_bucket_name
Azure - cloud_type: azure; cloud_specific_details needed: account_name, access_key, azure_region, azure_blob_name
HuggingFace datasets - cloud_type: huggingface; cloud_specific_details needed: token (not applicable for experiment results storage)
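For example, a workspace backed by Azure blob storage could be created as follows. This is a minimal sketch with placeholder values; the key names follow the list above, but note that the AWS example later on this page uses slightly different key names (cloud_region, cloud_bucket_name), so check the exact schema for your deployment:
# Sketch: create an Azure-backed workspace (placeholder values; key names assumed from the list above)
AZURE_WORKSPACE_ID=$(tao-client dino workspace-create \
  --name 'Azure workspace' \
  --cloud_type 'azure' \
  --cloud_details '{
    "account_name": "my_account",
    "access_key": "my_access_key",
    "azure_region": "westus",
    "azure_blob_name": "my_container"
  }')
echo $AZURE_WORKSPACE_ID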
Datasets#
You can either use a dataset stored in your cloud workspace (referenced with cloud_file_path) or a public dataset referenced with an HTTPS URL.
Object Detection Use Case Example with CLI#
Creating a Cloud Workspace
This returns a UUID representing the workspace ID, which is used when creating datasets and experiments. The CLI arguments are:
name - an identifiable name for the workspace
cloud_type - one of the cloud storage types supported by TAO (aws, azure, huggingface)
cloud_details - a dictionary of required cloud storage values, such as the bucket name and access credentials with write permissions
WORKSPACE_ID=$(tao-client dino workspace-create \
  --name 'AWS Public' \
  --cloud_type 'aws' \
  --cloud_details '{
    "cloud_region": "us-west-1",
    "cloud_bucket_name": "bucket_name",
    "access_key": "access_key",
    "secret_key": "secret_key"
  }')
echo $WORKSPACE_ID
Creating the training dataset
This returns a UUID representing the training dataset ID, which is used in later steps such as dataset_convert and train. The examples below cover both a private cloud dataset (using workspace) and a public dataset (using url). The CLI arguments are:
dataset_type - one of TAO's supported dataset types
dataset_format - one of the formats supported for the chosen dataset type
use_for - a list of use cases for the dataset; choose from ["training", "evaluation", "testing"]
For a private cloud dataset:
workspace - the workspace ID from Step 1
cloud_file_path - the path to the dataset folder in the cloud
For a public dataset:
url - the URL of the dataset
# Public dataset
TRAIN_DATASET_ID=$(tao-client dino dataset-create \
  --dataset_type object_detection \
  --dataset_format coco \
  --url https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/data/object_detection_train/ \
  --use_for '["training"]')
# Private cloud dataset
TRAIN_DATASET_ID=$(tao-client dino dataset-create \
  --dataset_type object_detection \
  --dataset_format coco \
  --workspace $WORKSPACE_ID \
  --cloud_file_path <path to dataset folder in cloud> \
  --use_for '["training"]')
echo $TRAIN_DATASET_ID
To monitor the status of the training dataset download, poll the get-metadata command:
TRAIN_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $TRAIN_DATASET_ID --job_type dataset)
echo $TRAIN_DATASET_PULL_STATUS
Creating the validation dataset
This returns a UUID representing the evaluation dataset ID, which is used in later steps such as dataset_convert and evaluate. The examples below cover both a private cloud dataset (using workspace) and a public dataset (using url). The CLI arguments are the same as for the training dataset.
# Public dataset
EVAL_DATASET_ID=$(tao-client dino dataset-create \
  --dataset_type object_detection \
  --dataset_format coco \
  --url https://tao-detection-synthetic-dataset-dev.s3.us-west-2.amazonaws.com/data/object_detection_val/ \
  --use_for '["evaluation"]')
# Private cloud dataset
EVAL_DATASET_ID=$(tao-client dino dataset-create \
  --dataset_type object_detection \
  --dataset_format coco \
  --workspace $WORKSPACE_ID \
  --cloud_file_path /data/tao_od_synthetic_subset_val_no_convert \
  --use_for '["evaluation"]')
echo $EVAL_DATASET_ID
To monitor the status of the validation dataset download, poll the get-metadata command:
EVAL_DATASET_PULL_STATUS=$(tao-client dino get-metadata --id $EVAL_DATASET_ID --job_type dataset)
echo $EVAL_DATASET_PULL_STATUS
Finding a base experiment
The following command lists the base experiments available for use. Pick one that corresponds to Dino and use it when assigning datasets and the base experiment to the experiment below.
BASE_EXP_RESPONSE=$(tao-client dino list-base-experiments --filter_params '{"network_arch": "dino"}')
# Post-processing to convert the bash output to a JSON string
BASE_EXP_RESPONSE="${BASE_EXP_RESPONSE:1:-1}"
BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed "s/'/\"/g")
BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed -e "s/None/null/g" -e "s/True/true/g" -e "s/False/false/g")
BASE_EXP_RESPONSE=$(echo "$BASE_EXP_RESPONSE" | sed 's/}, {/},\n{/g')
BASE_EXP_RESPONSE="[$BASE_EXP_RESPONSE]"
# Pick the Dino base experiment pretrained on NV-ImageNet with a ResNet-50 backbone
BASE_EXPERIMENT_ID=$(echo "$BASE_EXP_RESPONSE" | jq . | jq -r '[.[] | select(.network_arch == "dino") | select(.ngc_path | endswith("pretrained_dino_nvimagenet:resnet50"))][0] | .id')
echo $BASE_EXPERIMENT_ID
Creating an experiment
This returns a UUID representing the experiment ID, which is used in all later steps such as train, evaluate, and inference. The CLI arguments for creating an experiment are:
network_arch - one of TAO's supported network architectures
workspace - the ID of the workspace
encryption_key - the encryption key for loading the base experiment
EXPERIMENT_ID=$(tao-client dino experiment-create \
  --network_arch dino \
  --encryption_key nvidia_tlt \
  --workspace $WORKSPACE_ID)
echo $EXPERIMENT_ID
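You can inspect the newly created experiment's metadata with get-metadata, mirroring the dataset examples above. This is a sketch and assumes that experiment is an accepted --job_type value for get-metadata, as it is for the other experiment commands on this page:
# Inspect the experiment metadata (job_type value assumed by analogy with other commands)
tao-client dino get-metadata --id $EXPERIMENT_ID --job_type experiment | jq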
Assign datasets and a base experiment to the experiment
UPDATE_INFO=$(cat <<EOF
{
  "base_experiment": ["$BASE_EXPERIMENT_ID"],
  "train_datasets": ["$TRAIN_DATASET_ID"],
  "eval_dataset": "$EVAL_DATASET_ID",
  "inference_dataset": "$EVAL_DATASET_ID",
  "calibration_dataset": "$TRAIN_DATASET_ID"
}
EOF
)
EXPERIMENT_METADATA=$(tao-client dino patch-artifact-metadata \
  --id $EXPERIMENT_ID \
  --job_type experiment \
  --update_info "$UPDATE_INFO")
echo $EXPERIMENT_METADATA | jq
Key-value pairs:
base_experiment: the base experiment ID from the Finding a base experiment step
train_datasets: the training dataset IDs
eval_dataset: the evaluation dataset ID
inference_dataset: the test dataset ID
calibration_dataset: the training dataset ID
docker_env_vars: key-value pairs of MLOps settings: wandbApiKey, clearMlWebHost, clearMlApiHost, clearMlFilesHost, clearMlApiAccessKey, clearMlApiSecretKey (see the sketch after this list)
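The docker_env_vars settings can be patched in the same way as the datasets above. The following is a minimal sketch for attaching a Weights & Biases key; the placeholder value and the exact nesting of docker_env_vars are assumptions, so check the schema returned by get-metadata for your deployment:
# Sketch: patch MLOps settings onto the experiment (placeholder key value; nesting assumed)
UPDATE_INFO='{"docker_env_vars": {"wandbApiKey": "<your_wandb_api_key>"}}'
tao-client dino patch-artifact-metadata --id $EXPERIMENT_ID --job_type experiment --update_info "$UPDATE_INFO"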
Training an experiment
Get specs:
Returns a JSON-loadable string of specs to be used in the train step.
TRAIN_SPECS=$(tao-client dino get-spec --action train --job_type experiment --id $EXPERIMENT_ID)
echo $TRAIN_SPECS | jq
Modify specs:
Modify the specs from the previous step, if necessary.
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_epochs=10')
TRAIN_SPECS=$(echo $TRAIN_SPECS | jq -r '.train.num_gpus=2')
echo $TRAIN_SPECS | jq
Run the train action:
The CLI arguments for running the train action are:
action - the action to be executed
specs - the spec dictionary of the action to be executed
TRAIN_ID=$(tao-client dino experiment-run-action --action train --id $EXPERIMENT_ID --specs "$TRAIN_SPECS")
echo $TRAIN_ID
Check the status of the training job:
To monitor the status of the training job, poll get-action-status. The CLI arguments for getting action metadata are:
id - the ID of the experiment
job - the job for which action metadata is retrieved
job_type - experiment
tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID | jq
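While training runs, you can also pull the job logs with get-job-logs from the command table above. This is a sketch, and the flag names are assumptions that mirror get-action-status:
# Fetch the logs of the training job (flag names assumed to mirror get-action-status)
tao-client dino get-job-logs --job_type experiment --id $EXPERIMENT_ID --job $TRAIN_ID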
Evaluating an experiment
Get specs:
Returns a JSON-loadable string of specs to be used in the evaluate step.
EVALUATE_SPECS=$(tao-client dino get-spec --action evaluate --job_type experiment --id $EXPERIMENT_ID)
echo $EVALUATE_SPECS | jq
Modify specs:
Modify the specs from the previous step, if necessary.
Run the evaluate action:
The CLI arguments for running the evaluate action are:
parent_job_id - the ID of the parent job, if any
action - the action to be executed
specs - the spec dictionary of the action to be executed
EVALUATE_ID=$(tao-client dino experiment-run-action --action evaluate --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$EVALUATE_SPECS")
echo $EVALUATE_ID
Check the status of the evaluate job:
To monitor the status of the evaluation job, poll get-action-status. The CLI arguments for getting action metadata are:
id - the ID of the experiment
job - the job for which action metadata is retrieved
job_type - experiment
tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $EVALUATE_ID | jq
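If an evaluation job was started with the wrong specs, it can be stopped with job-cancel from the command table above. This sketch assumes the same flag names as get-action-status:
# Cancel a running evaluation job (flag names assumed to mirror get-action-status)
tao-client dino job-cancel --job_type experiment --id $EXPERIMENT_ID --job $EVALUATE_ID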
Inference for an experiment
Get specs:
Returns a JSON-loadable string of specs to be used in the inference step.
INFERENCE_SPECS=$(tao-client dino get-spec --action inference --job_type experiment --id $EXPERIMENT_ID)
echo $INFERENCE_SPECS | jq
Modify specs:
Modify the specs from the previous step, if necessary.
Run the inference action:
The CLI arguments for running the inference action are:
parent_job_id - the ID of the parent job, if any
action - the action to be executed
specs - the spec dictionary of the action to be executed
INFERENCE_ID=$(tao-client dino experiment-run-action --action inference --id $EXPERIMENT_ID --parent_job_id $TRAIN_ID --specs "$INFERENCE_SPECS")
echo $INFERENCE_ID
Check the status of the inference job:
To monitor the status of the inference job, poll get-action-status. The CLI arguments for getting action metadata are:
id - the ID of the experiment
job - the job for which action metadata is retrieved
job_type - experiment
tao-client dino get-action-status --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID | jq
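After the inference job finishes, you can list and download the files it produced using the download commands from the table above. This is a sketch, and the flag names are assumptions that mirror the other job commands on this page:
# List the files, specs, and logs produced by the inference job
tao-client dino list-job-files --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID
# Download everything the job produced
tao-client dino download-entire-job --job_type experiment --id $EXPERIMENT_ID --job $INFERENCE_ID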