
Federated learning provisioning tool

This page describes the FL provisioning tool, which creates startup packages for the server, clients, and administrators. For more information on how these packages are used, see the Federated learning user guide.

Contents of FL startup provisioning tool

File	Description
requirements.txt	Required dependencies to run this provisioning tool
Readme.md	Brief description of this provisioning tool
project.yml	The project setup file that describes each participant; see the next section for details
provision.py	Main code
cert_utils.py	Helper code to provision.py
clara_hci-3.1.0-py3-none-any.whl	Wheel package for Federated Learning Administration Client
fed_client.template, fed_server.template	Template files for configuration, used by the FL server and FL clients
readme.txt	Instructions for recipients of the startup packages on how to install and run the three types of packages (server, client, and admin)

Project yaml file

Edit the project.yml configuration file to meet your project requirements:

“name” is used to identify this project.
The “server” section describes the FL server.
- “server”: “org” is for the name of the owner of this server.
- “server”: “cn” is the “Fully Qualified Domain Name” and it is very important that this is correct. If this information is not completely correct, the security handshake between the server and clients will fail. Please note that this cannot just be an IP address.
- “server”: “fed_learn_port” is the port number for communication between the FL server and FL clients
- “server”: “admin_port” is the port number for communication between the FL server and FL administration client
- “server”: “admin_storage” is the directory name, relative to the WORKSPACE, where the admin process stores files on the server
- “server”: “email” is the contact email
- “server”: “min_num_clients” is the minimum number of clients for federated learning to begin
- “server”: “max_num_clients” is the maximum number of clients allowed in this instance of federated learning
The “fl_clients” section describes the FL clients, with one “org”, “client_name”, and “email” for each client. Please note that each “client_name” must be unique. It will show in the admin console.
The “admin_clients” section describes the FL admin clients. The “email” for each must be unique.
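
The constraints listed above can be checked before running the tool. The following is a minimal sketch, not part of the provisioning tool itself: it assumes project.yml has already been parsed into a Python dict (for example with a YAML library), and the function name validate_project is hypothetical.

```python
import ipaddress


def _duplicates(values):
    """Return the sorted list of values that appear more than once."""
    seen, dupes = set(), set()
    for value in values:
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def validate_project(project):
    """Return a list of problems found in a parsed project.yml mapping.

    Hypothetical helper: the key names follow the project.yml description,
    but this check is an illustration, not the tool's own validation.
    """
    problems = []
    server = project.get("server", {})

    # cn must be a fully qualified domain name, not a bare IP address.
    cn = str(server.get("cn", ""))
    try:
        ipaddress.ip_address(cn)
        problems.append("server.cn is an IP address, not an FQDN")
    except ValueError:
        if "." not in cn:
            problems.append("server.cn does not look like an FQDN")

    # The default project.yml states admin_port must differ from fed_learn_port.
    if server.get("fed_learn_port") == server.get("admin_port"):
        problems.append("fed_learn_port and admin_port must differ")

    # Each client_name must be unique.
    names = [c.get("client_name") for c in project.get("fl_clients", [])]
    for name in _duplicates(names):
        problems.append(f"duplicate client_name: {name}")

    # Each admin email must be unique (it is the admin user name).
    emails = [a.get("email") for a in project.get("admin_clients", [])]
    for email in _duplicates(emails):
        problems.append(f"duplicate admin email: {email}")

    return problems
```

An empty list means none of these particular checks failed; it does not guarantee that the security handshake will succeed, since the “cn” value must also match the server's real domain name.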

Attention

Please make sure that the FL server port number is reachable by all participants.
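
One rough way to check reachability from a participant's machine is a plain TCP connection attempt. The sketch below uses only the Python standard library; the helper name port_is_reachable is hypothetical, and a successful TCP connect only shows that the port is open, not that the FL handshake will succeed.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and connects, or raises OSError.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, a client site could call port_is_reachable with the server's “cn” value and the configured “fed_learn_port” while the server is running.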

Default project.yml file

The following is an example of the default project.yml file:

# org is to describe each participant's organization and is optional

# the name of this project
name: project_name

server:
  org: server_org

  # set cn to the server's fully qualified domain name
  # never set it to example.com
  cn: example.com

  # replace the number with a port that all clients can reach and that the server can open to listen on
  fed_learn_port: 8002

  # again, replace the number with a port that all clients can reach and that the server can open to listen on
  # the value must be different from fed_learn_port
  admin_port: 8003

  # admin_storage is the mmar upload folder name on the server
  admin_storage: transfer

  min_num_clients: 1
  max_num_clients: 100


# The following values under fl_clients and admin_clients are for demo purpose only.
# Please change them according to the information of actual project.
fl_clients:
  # client_name must be unique
  # email is optional
  - org: fl_client_org1
    client_name: flclient1
    email: optional.email@flclient.org
  - org: fl_client_org1
    client_name: flclient2

admin_clients:
  # email is the user name for admin authentication.  Hence, it must be unique within the project
  - org: adm_client_org1
    email: email@hello.world.com
  - org: adm_client_org2
    email: email@foo.bar.com

Overriding configurations in MMARs

The MMARs that are deployed to the server can also have an FL server configuration, by default config_fed_server.json under the startup directory. If the following settings are configured in this file within the MMAR, they will override the provisioned configurations:

wait_after_min_clients
heart_beat_timeout
min_num_clients
max_num_clients
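
The override rule can be pictured as a selective merge. This is a sketch under stated assumptions, not the server's actual implementation: both configurations are modeled as flat Python dicts (the real config_fed_server.json may nest these settings), and the function name effective_server_settings is hypothetical.

```python
# The four settings that an MMAR's config_fed_server.json may override.
OVERRIDABLE_KEYS = (
    "wait_after_min_clients",
    "heart_beat_timeout",
    "min_num_clients",
    "max_num_clients",
)


def effective_server_settings(provisioned, mmar_config):
    """Return the provisioned settings with MMAR overrides applied.

    Only keys in OVERRIDABLE_KEYS are taken from mmar_config; every other
    provisioned value is left unchanged. Flat dicts are an assumption.
    """
    merged = dict(provisioned)
    for key in OVERRIDABLE_KEYS:
        if key in mmar_config:
            merged[key] = mmar_config[key]
    return merged
```

In this model, an MMAR that sets min_num_clients replaces the provisioned value, while a setting outside the list, such as a port number, keeps its provisioned value.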

Adding clients and regenerating packages

Running python3 provision.py again without changing project.yml will output the same set of zip files with the previously generated passwords.

To add more clients, add them to the “fl_clients” section in project.yml and run the tool again. Zip files will be generated for the new clients while the other zip files remain the same, so existing clients do not need to change anything.

To regenerate all zip files from scratch, delete audit.pkl. Note that this invalidates all existing packages and the certificates inside them, so you must send new packages with new passwords to all participants.
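
The reset step could be scripted as in the sketch below. It assumes audit.pkl sits in the directory passed in (the text does not state where the file is stored), and the helper name reset_provisioning_state is hypothetical.

```python
from pathlib import Path


def reset_provisioning_state(tool_dir):
    """Delete audit.pkl from tool_dir; return True if a file was removed.

    Assumption: audit.pkl is stored in tool_dir. After this, the next run
    of the provisioning tool regenerates every package from scratch.
    """
    audit_file = Path(tool_dir) / "audit.pkl"
    if audit_file.exists():
        audit_file.unlink()
        return True
    return False
```

Because the deletion invalidates every package already distributed, it may be worth copying audit.pkl elsewhere first if there is any chance the existing packages still need to work.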