Changes and New Features
Added support for NVIDIA Quantum-2 switches with NDR speed
Adapter Cards: Added support for NVIDIA ConnectX-7 adapter card with 400 Gb/s speed
The sharpd daemon process has been removed. sharpd-related activity is now performed from within the user application process.
After a restart, AM no longer needs to wait for all concurrent jobs to finish before it can accept new jobs.
AM: Added a mechanism that periodically checks for errors in Aggregation Trees and attempts to fix them
General: Added support for new data types BFLOAT16, INT8 and UINT8 for performing reduction operations
New parameter: A timeout, in seconds, for tree recovery retries. A value of 0 means AM does not try to recover trees.
New parameter: A boolean flag. If enabled, AM tries to recover its state from the last AM run and continue operating the current jobs.
New parameter: Sets the SHARP trees file used in seamless restart. Specify only the file name; the full path is constructed using ‘dump_dir’.
New parameter: Sets the number of consecutive retries of seamless restart. If seamless restart fails more than this many times in a row, it is disabled in the next run.
Update: Changed default to 252
Update: Changed default to 5, to support the MAD value that represents 4K MTU.
Update: Changed default to 3 instead of 20, enabling more SAT jobs to take place in parallel on each switch.
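
The seamless restart retry limit and the trees file path described above can be pictured with a short Python sketch. This is only an illustration of the rules as worded in these notes, not AM's actual implementation: the class, function, and field names (`SeamlessRestartPolicy`, `max_consecutive_retries`, `trees_file_path`) are hypothetical, and the real parameter names are not given in this section.

```python
import os
from dataclasses import dataclass


@dataclass
class SeamlessRestartPolicy:
    """Hypothetical model of the retry rule; not AM's real code."""

    # Hypothetical name standing in for the new retry-count parameter.
    max_consecutive_retries: int
    consecutive_failures: int = 0

    def record_run(self, restart_succeeded: bool) -> None:
        # A successful seamless restart resets the failure streak;
        # a failed one extends it.
        if restart_succeeded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1

    def enabled_next_run(self) -> bool:
        # Seamless restart stays enabled until it has failed more times
        # in a row than the configured number of retries.
        return self.consecutive_failures <= self.max_consecutive_retries


def trees_file_path(dump_dir: str, file_name: str) -> str:
    # Only the file name is configured; the directory comes from dump_dir.
    return os.path.join(dump_dir, file_name)
```

With a limit of 2, two failed restarts in a row leave seamless restart enabled in this model, and a third consecutive failure disables it for the next run.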