NVIDIA Clara Train 3.1

Federated learning administrator commands

For an example of operating the federated learning environment with these administrator commands and more information, see the Federated learning user guide.

Admin commands and functions

Typing “help” or “?” displays a list of the commands with a brief description of each. Typing “?” before a command, as in “? check_status” or “?ls”, shows additional usage details for that command. The commands are listed below, each with an example invocation and a description.

| Command | Example | Description |
|---|---|---|
| bye | bye | Exit from the client |
| help | help | Get command help information |
| lpwd | lpwd | Print local workspace root directory of the admin client |
| info | info | Show folder setup info (upload and download sources and destinations) |
| check_status | check_status server | The FL run number, FL server status, and the registered clients with their names and tokens are displayed. If training is running, the round information is also displayed. |
| | check_status client | The name, token, and status of each connected client are displayed. |
| | check_status client clientname | The name, token, and status of the specified client with clientname are displayed. |
| upload_folder | upload_folder mmarfolderpathandname | Uploads the MMAR folder provided to the FL server. Note that mmarfolderpathandname is relative to the “transfer” directory which is at the same level as the “startup” directory containing the script running the admin client. |
| set_run_number | set_run_number 1 | Creates a folder “run_1” on the server at the same level as the “startup” directory to contain all of the MMARs for deployment. |
| deploy | deploy mmarname server | Deploys the MMAR specified by mmarname to the server. Note that mmarname is expected to be an MMAR which has been uploaded to the server already and resides in the transfer directory on the server (which is at the same level as the startup directory by default). mmarname can be a relative path if the MMAR is contained in any parent directories, for example mmars/segmentation_ct_spleen. |
| | deploy mmarname client | Deploys the MMAR specified by mmarname to each client. This can also be done per client by specifying a specific client name for this command. Please note that the deployed MMARs are also in their own workspace named after the run number set by set_run_number above. |
| start | start server | Starts the server training. |
| | start client | Starts all of the clients. Individual clients can be started by specifying the client instance name after the start client command. |
| start_mgpu | start_mgpu client 2 clientname | Starts training with multiple GPUs. The number of GPUs to be used must be specified in the command. |
| abort | abort client clientname | Aborts the client specified by clientname. Please note that this may not be instant but may take time for the command to take effect. |
| | abort server | Aborts the server training |
| shutdown | shutdown client clientname | Shuts down the client specified by clientname. Please note that this may not be instant but may take time for the command to take effect. |
| | shutdown server | Shuts down the server. Clients must be shut down first before the server is shut down. |
| cat | cat server startup/fed_server.json -ns | Show content of a file (-n: number all output lines; -s: suppress repeated empty output lines) |
| | cat clientname startup/docker.sh -bT | Show content of a file (-b: number nonempty output lines; -T: display TAB characters as ^I) |
| env | env server | Show environment variables |
| | env clientname | Show environment variables |
| grep | grep server "info" -i log.txt | Search for a pattern in a file (-n: print line number; -i: ignore case) |
| head | head clientname log.txt | Print the first 10 lines of a file |
| | head server log.txt -n 15 | Print the first 15 lines of a file (-n: print the first N lines instead of the first 10) |
| tail | tail clientname log.txt | Print the last 10 lines of a file |
| | tail server log.txt -n 15 | Print the last 15 lines of a file (-n: output the last N lines instead of the last 10) |
| ls | ls server -alt | List files in workspace root directory (-a: all; -l: use a long listing format; -t: sort by modification time) |
| | ls clientname -SR | List files in workspace root directory (-S: sort by file size; -R: list subdirectories recursively) |
| pwd | pwd server | Print the name of workspace root directory |
| | pwd clientname | Print the name of workspace root directory |
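
The command descriptions imply an ordering: an MMAR must be uploaded before it is deployed, the run number is set before deployment, and clients are shut down before the server. The Python sketch below makes that ordering explicit. It only assembles command strings to type or paste into the admin client and does not communicate with the server. The helper names build_run_commands and build_shutdown_commands are hypothetical, and two choices in it are assumptions rather than documented behavior: the uploaded folder name is reused as the MMAR name, and the server is started before the clients.

```python
def build_run_commands(mmar_name, run_number, client_gpus=None):
    """Return the admin commands for one FL run as a list of strings.

    Hypothetical helper: it only formats commands listed in the table.
    client_gpus optionally maps a client name to a GPU count for start_mgpu.
    """
    commands = [
        # Assumption: the uploaded folder name is also the MMAR name.
        f"upload_folder {mmar_name}",
        # Creates run_<n> on the server to hold the deployed MMARs.
        f"set_run_number {run_number}",
        f"deploy {mmar_name} server",
        f"deploy {mmar_name} client",
        # Assumption: the server is started before the clients.
        "start server",
    ]
    if client_gpus:
        # One start_mgpu command per client, with its GPU count.
        for name, gpus in client_gpus.items():
            commands.append(f"start_mgpu client {gpus} {name}")
    else:
        commands.append("start client")
    return commands


def build_shutdown_commands(client_names):
    """Shut down every named client first, then the server."""
    return [f"shutdown client {name}" for name in client_names] + ["shutdown server"]


if __name__ == "__main__":
    for line in build_run_commands("segmentation_ct_spleen", 1):
        print(line)
```

Keeping the sequence in one place makes it harder to deploy before the run number is set or to shut down the server while clients are still connected.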

Tip

The output of any command can be redirected into a file with the greater-than symbol “>”, but there must be no whitespace between the symbol and the filename. For example, you may run sys_info server >serverinfo.txt. To save the output to the file without printing it, use two greater-than symbols “>>” instead: sys_info server >>serverinfo.txt.
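
When command strings are generated by a script, the spacing rule in this tip is easy to get wrong. The following Python sketch formats a redirect as the tip describes. The helper name with_redirect and its echo parameter are hypothetical and are not part of the admin client.

```python
def with_redirect(command, filename, echo=True):
    """Append an output redirect to an admin command string.

    Hypothetical helper. echo=True uses ">" (print and save);
    echo=False uses ">>" (save without printing).
    """
    if not filename or filename != filename.strip():
        raise ValueError("filename must be non-empty with no surrounding whitespace")
    operator = ">" if echo else ">>"
    # No whitespace is placed between the operator and the filename.
    return f"{command} {operator}{filename}"


if __name__ == "__main__":
    print(with_redirect("sys_info server", "serverinfo.txt"))
    print(with_redirect("sys_info server", "serverinfo.txt", echo=False))
```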

Note

To continue training from the previous model after the server is interrupted, set "MMAR_CKPT": "FL_global_model.ckpt" in environment.json of the server’s MMAR. Without this parameter, training will start from scratch, even if you reuse a run number that already has a model.
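
The setting in this note can also be applied with a script instead of by hand. The Python sketch below is a minimal example that assumes environment.json contains a single JSON object. The function name enable_checkpoint_resume is hypothetical, and the path to environment.json is supplied by the caller because its location depends on your MMAR layout.

```python
import json
from pathlib import Path


def enable_checkpoint_resume(env_path, ckpt_name="FL_global_model.ckpt"):
    """Set MMAR_CKPT in an MMAR's environment.json and return the updated dict.

    Assumes the file holds one JSON object; all other keys are left unchanged.
    """
    path = Path(env_path)
    env = json.loads(path.read_text())
    env["MMAR_CKPT"] = ckpt_name
    # Write the whole object back with indentation.
    path.write_text(json.dumps(env, indent=4) + "\n")
    return env
```

Note that rewriting the file this way normalizes its indentation; back up the original first if its formatting matters.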

validate source_client dest_client

Gets the performance metrics for validation of a model.

validate –all

Gets the performance metrics for validation of all models (all source clients with all destination clients).

© Copyright 2020, NVIDIA. Last updated on Feb 2, 2023.