NVIDIA Clara Train 4.0
Federated learning provisioning tool

This page describes the FL provisioning tool, which creates startup packages for the server, clients, and administrators. For more information on how these packages are used, see the Federated learning user guide.

| File | Description |
| --- | --- |
| requirements.txt | Required dependencies to run this provisioning tool |
| Readme.md | Brief description of this provisioning tool |
| project.yml | The project setup file that describes each participant; see the "Project yaml file" section for details |
| provision.py | Main code |
| cert_utils.py | Helper code for provision.py |
| clara_hci-3.1.0-py3-none-any.whl | Wheel package for the Federated Learning Administration Client |
| fed_client.template, fed_server.template | Template files for configuration, used by the FL server and FL clients |
| readme.txt | Instructions for users who receive the startup packages on how to install and run the three types of packages (server, clients, and admins) |

Project yaml file

Edit the project.yml configuration file to meet your project requirements:

  • “name” is used to identify this project.

  • The “he” section is new. It generates homomorphic encryption keys so that all clients have them by default. The parameters can be adjusted if needed, and homomorphic encryption is only used if it is also configured in the MMAR.

  • The “config_folder” is now required because other configurations may be allowed. For Clara Train, this should usually remain “config”.

  • The “server” section describes the FL server.
    • “server”: “org” is for the name of the owner of this server.

    • “server”: “cn” is the “Fully Qualified Domain Name” and it is very important that this is correct. If this information is not completely correct, the security handshake between the server and clients will fail. Please note that this cannot just be an IP address.

    • “server”: “fed_learn_port” is the port number for communication between the FL server and FL clients

    • “server”: “admin_port” is the port number for communication between the FL server and FL administration client

    • “server”: “admin_storage” is the name of the directory, related to the WORKSPACE, where the admin process stores files on the server

    • “server”: “email” is the contact email

    • “server”: “min_num_clients” is the minimum number of clients for federated learning to begin

    • “server”: “max_num_clients” is the maximum number of clients allowed in this instance of federated learning

    • “server”: “auth”: false can be set to disable the auth functions

  • The “fl_clients” section describes the FL clients, with one “org”, “client_name”, and “email” for each client. Please note that each “client_name” must be unique. It will show in the admin console.

  • The “admin_clients” section describes the FL admin clients. The “email” for each must be unique.
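The constraints listed above can be checked before running the provisioning tool. The following is a minimal sketch, not part of provision.py: `validate_project` is a hypothetical helper that takes the contents of project.yml already loaded into a Python dict (for example with a YAML parser) and reports violations of the rules stated on this page: “cn” must not be an IP address, the two ports must differ, each “client_name” must be unique, and each admin “email” must be unique.

```python
import ipaddress


def validate_project(project: dict) -> list:
    """Return a list of problems found in a loaded project.yml mapping.

    Hypothetical helper for illustration; the provisioning tool itself
    may perform different or additional checks.
    """
    problems = []
    server = project.get("server", {})

    # "cn" must be a fully qualified domain name, not just an IP address.
    cn = str(server.get("cn", ""))
    try:
        ipaddress.ip_address(cn)
        problems.append(f"server.cn '{cn}' is an IP address, not an FQDN")
    except ValueError:
        # Not an IP address; do a rough check that it has at least one dot.
        if "." not in cn:
            problems.append(f"server.cn '{cn}' does not look fully qualified")

    # admin_port must be different from fed_learn_port.
    if server.get("fed_learn_port") == server.get("admin_port"):
        problems.append("fed_learn_port and admin_port must be different")

    def duplicates(values):
        """Return the sorted values that occur more than once."""
        seen, dupes = set(), set()
        for value in values:
            if value in seen:
                dupes.add(value)
            seen.add(value)
        return sorted(dupes)

    # Each FL client needs a unique client_name.
    names = [c["client_name"] for c in project.get("fl_clients", [])
             if "client_name" in c]
    for name in duplicates(names):
        problems.append(f"duplicate client_name: {name}")

    # Each admin client needs a unique email (it is the admin user name).
    emails = [a["email"] for a in project.get("admin_clients", [])
              if "email" in a]
    for email in duplicates(emails):
        problems.append(f"duplicate admin email: {email}")

    return problems
```

An empty list means none of these particular rules were violated; it does not guarantee that the FQDN resolves or that the ports are open.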

Attention

Please make sure that the FL server port number is accessible by all participating sides.
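One way to confirm accessibility is to attempt a TCP connection from each participating machine once the server is listening. The sketch below uses only the Python standard library; `port_is_reachable` is a hypothetical helper name, and the host and port in the usage note are placeholders rather than values from a real deployment.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `port_is_reachable("fl-server.example.org", 8002)` could be run on a client machine. A successful connection only shows that the port is open from that machine; it does not verify the security handshake.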

Default project.yml file

The following is an example of the default project.yml file:

    # org is to describe each participant's organization and is optional

    # the name of this project
    name: project_name

    # homomorphic encryption
    he:
      lib: tenseal
      config:
        poly_modulus_degree: 8192
        coeff_mod_bit_sizes: [60, 40, 40]
        scale_bits: 40
        scheme: CKKS

    config_folder: config

    server:
      org: server_org

      # set cn to the server's fully qualified domain name
      # never set it to example.com
      cn: example.com

      # replace the number with that all clients can reach out to, and that the server can open to listen to
      fed_learn_port: 8002

      # again, replace the number with that all clients can reach out to, and that the server can open to listen to
      # the value must be different from fed_learn_port
      admin_port: 8003

      # admin_storage is the mmar upload folder name on the server
      admin_storage: transfer

      min_num_clients: 1
      max_num_clients: 100

    # The following values under fl_clients and admin_clients are for demo purpose only.
    # Please change them according to the information of actual project.
    fl_clients:
      # client_name must be unique
      # email is optional
      - org: fl_client_org1
        client_name: flclient1
        email: optional.email@flclient.org
      - org: fl_client_org1
        client_name: flclient2

    admin_clients:
      # email is the user name for admin authentication. Hence, it must be unique within the project
      - org: adm_client_org1
        email: email@hello.world.com
      - org: adm_client_org2
        email: email@foo.bar.com

Overriding configurations in MMARs

The MMARs that are deployed to the server can also have an FL server configuration, by default config_fed_server.json under the startup directory. If the following settings are configured in this file within the MMAR, they will override the provisioned configurations:

  • wait_after_min_clients

  • heart_beat_timeout

  • min_num_clients

  • max_num_clients
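The override rule can be pictured as a selective dictionary merge. This is only an illustrative sketch: the real layout of config_fed_server.json is not described on this page, so both configurations are treated here as flat dicts, and `effective_server_settings` is a hypothetical name rather than a function of the FL server.

```python
# The four settings that an MMAR's config_fed_server.json may override.
OVERRIDABLE_KEYS = (
    "wait_after_min_clients",
    "heart_beat_timeout",
    "min_num_clients",
    "max_num_clients",
)


def effective_server_settings(provisioned: dict, mmar_config: dict) -> dict:
    """Return provisioned settings with MMAR values applied for overridable keys.

    Assumes flat dicts for illustration; keys outside OVERRIDABLE_KEYS
    keep their provisioned values even if the MMAR sets them.
    """
    effective = dict(provisioned)  # copy so the input is not modified
    for key in OVERRIDABLE_KEYS:
        if key in mmar_config:
            effective[key] = mmar_config[key]
    return effective
```

With this model, an MMAR that sets min_num_clients to 2 would raise the provisioned value of 1, while an MMAR entry for a setting outside the list above would be ignored by the merge.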

Adding clients and regenerating packages

Running python3 provision.py again without changing project.yml will output the same set of zip files with the previously generated passwords.

To add more clients, just add the client in the “fl_clients” section in project.yml. Additional zip files will be generated while other zip files remain the same. This way, existing clients do not need to worry about changing anything.
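As a rough sketch of that edit, the hypothetical helper below appends a client entry to a project.yml mapping that has already been loaded into a dict, and rejects a “client_name” that is already in use. It is not part of the provisioning tool; writing the result back to project.yml and rerunning provision.py are left out.

```python
def add_fl_client(project: dict, org: str, client_name: str, email=None) -> dict:
    """Append a new entry to project["fl_clients"] and return it.

    Hypothetical helper for illustration. Raises ValueError if the
    client_name is already present, since each name must be unique.
    """
    clients = project.setdefault("fl_clients", [])
    if any(c.get("client_name") == client_name for c in clients):
        raise ValueError(f"client_name '{client_name}' already exists")

    entry = {"org": org, "client_name": client_name}
    if email is not None:
        # email is optional for FL clients
        entry["email"] = email
    clients.append(entry)
    return entry
```

Existing entries are left untouched, which matches the behavior described above: only the new client needs a new package.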

To regenerate all zip files from scratch, delete audit.pkl. Note this will make all existing packages and the certificates inside them invalid. This means that you have to send new packages to all participants with new passwords.

© Copyright 2020, NVIDIA. Last updated on Feb 2, 2023.