NVIDIA UFM 4.0 Cyber-AI Appliance Hardware User Manual

Overview

Data centers host many users and applications and have become a competitive advantage for research organizations and manufacturing companies. Keeping the data center intact and healthy is critical, because a shutdown can mean the loss of millions of dollars. Moreover, malicious users often exploit data center access to misuse compute resources, for example by running prohibited applications, which results in higher operating costs.

The NVIDIA® UFM® Cyber-AI Appliance (Gen 4.0) solution enhances the benefits of UFM Telemetry and UFM Enterprise, scaling out preventive maintenance to lower supercomputing OPEX. The appliance comes with NVIDIA GPU-accelerated deep learning frameworks that significantly speed up deep learning training, cutting runs that could otherwise take days or weeks down to hours or days.

Part Number | MUA9652H-2SF
Form factor | 2U rackmount - 19″
GPU | NVIDIA® A30 24GB - accelerated deep learning frameworks
PCIe cards | 2x NVIDIA® ConnectX®-6 VPI dual-port network interface cards
Port speed | InfiniBand: SDR/QDR/HDR100/HDR; Ethernet: 25/50/100/200 Gb/s
Bandwidth | Up to 100Gb/s bi-directional per port
Power supplies | 2x AC power supply units (PSUs)

Component | Description | Qty
SKY-6200 | SKY-6200MLX-02UFM4; up to 24″ depth / 2U height and 19″ (typical rack) width | 1
GPU | NVIDIA® A30 24GB | 1
CPU | Silver 4214R Processor (16.5M Cache, 2.40 GHz) | 2
TPM | TPM 2.0 module by LPC | 1
Secure boot | Secure boot based on Intel Boot Guard technology with RSA-2K secured key | -
RAM | 8GB 2666MHz DDR4 ECC, 8 per CPU | 16
Disk HDD | 2.5" 2.0TB SATA 7200RPM Enterprise | 6
Disk SSD | 2.5" 3.84TB, SATA 6Gb/s, 3D2, TLC | 2
RAID | The server must support, via BIOS, three RAID configurations simultaneously: RAID 1 for 2 Disk HDD; RAID 10 for 4 Disk HDD; RAID 1 for 2 Disk SSD | -
OOB Networking | 2x1GbE management ports IPv4/6 & 2x10GbE3 | 4
Serial Port | DB9 RS232 port male | 1
PCIe | PCI Express 3.0 x16 | 2
BMC | Baseboard management controller for device health monitoring | 1
PSU | Hot-swappable power supply units for reliability (1+1 redundancy): 1000 W @ 100 ~ 127 V; 2000 W @ 200 ~ 240 V | 2
Fans | 1x fan per power supply | 2
Fans | 6x internal cooling fans for CPU, GPU, and expansion card | 6
USB ports | On front panel: 2 x USB 2.0; on back panel: 4 x USB 3.0 | 6
Lights-out management | For remote shutdown and serial access | -
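The drive and memory figures listed above can be turned into usable totals with simple arithmetic. The sketch below assumes the standard capacity rules for RAID 1 (capacity of one disk) and RAID 10 (half the disks) and ignores filesystem and formatting overhead; the function name `usable_capacity_tb` is illustrative and not part of any NVIDIA tool.

```python
# Drive counts and sizes come from the component list above.
# The RAID capacity rules are the generic ones, not vendor-published figures.

def usable_capacity_tb(level: str, disks: int, disk_tb: float) -> float:
    """Return the raw usable capacity in TB of a RAID 1 or RAID 10 group."""
    if level == "RAID 1":
        # Every disk holds the same data, so capacity equals one disk.
        return disk_tb
    if level == "RAID 10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        # Disks are mirrored in pairs, then data is striped across the pairs.
        return (disks // 2) * disk_tb
    raise ValueError(f"unsupported RAID level: {level}")

groups = [
    ("RAID 1", 2, 2.0),    # 2 x 2.0TB HDD
    ("RAID 10", 4, 2.0),   # 4 x 2.0TB HDD
    ("RAID 1", 2, 3.84),   # 2 x 3.84TB SSD
]
capacities = [usable_capacity_tb(*group) for group in groups]
total_ram_gb = 16 * 8  # 16 DIMMs of 8GB each (8 per CPU, 2 CPUs)

for (level, disks, size), capacity in zip(groups, capacities):
    print(f"{level} over {disks} x {size}TB -> {capacity:.2f} TB usable")
print(f"Total RAM: {total_ram_gb} GB")
```

Under these assumptions the three groups provide 2.0 TB, 4.0 TB, and 3.84 TB respectively, and the memory totals 128 GB.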

The rear panel of the UFM Cyber-AI Appliance is populated with one GPU, two ConnectX-6 InfiniBand/VPI adapter cards, fans, and two PSUs.

Network Interface Cards

UFM Cyber-AI Appliance is populated with two ConnectX-6 dual-port network interface cards (NICs), which enable hardware-based forwarding of IP packets from InfiniBand to Ethernet and vice versa.

Power Supply Units

UFM Cyber-AI Appliance is equipped with two redundant, load-sharing PSUs at the rear side of the system. The PSUs are housed in a 2U container. Each PSU has an extraction handle, status LED, and a power socket.

For power supply unit LED operation, please refer to "System Monitoring".

The system supports hot swapping, which allows components to be exchanged while the system is online without affecting operational integrity.
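Because the two PSUs are redundant (1+1), a hot swap is only safe if the remaining unit can carry the full load on its own. The sketch below uses the per-PSU ratings from the component list (1000 W at 100 ~ 127 V, 2000 W at 200 ~ 240 V); the function names and the example load are hypothetical and are not measured appliance figures.

```python
def psu_rating_w(input_voltage: float) -> int:
    """Rated output of one PSU for a given AC input voltage."""
    if 100 <= input_voltage <= 127:
        return 1000
    if 200 <= input_voltage <= 240:
        return 2000
    raise ValueError(f"{input_voltage} V is outside the listed input ranges")

def survives_single_psu_loss(load_w: float, input_voltage: float) -> bool:
    """With 1+1 redundancy, one PSU alone must be able to carry the whole load."""
    return load_w <= psu_rating_w(input_voltage)

example_load_w = 1200  # hypothetical load, chosen only to show both outcomes
print(survives_single_psu_loss(example_load_w, 120))  # False
print(survives_single_psu_loss(example_load_w, 230))  # True
```

For actual power consumption figures, consult the Technical Specifications section rather than this example.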

Warning

Remove a PSU from the system only if it is being replaced.

Warning

If one of the two PSUs is extracted from the UFM Cyber-AI Appliance, the Sensor Reading screen of the GUI still shows OK under the Healthy column and "Not presence" under the Status column. This behavior is normal.
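A monitoring script that reads the same two columns could treat the combination described in the warning above as expected rather than as a fault. This is a minimal sketch: the row layout, the PSU names, and the "Presence detected" status string are assumptions made for illustration; only the values OK and "Not presence" come from this manual.

```python
def psu_extracted_as_expected(healthy: str, status: str) -> bool:
    """True when a row matches the documented reading for an extracted PSU."""
    return healthy.strip() == "OK" and status.strip() == "Not presence"

# Hypothetical rows, as they might be copied from the GUI.
rows = [
    {"name": "PSU1", "healthy": "OK", "status": "Presence detected"},
    {"name": "PSU2", "healthy": "OK", "status": "Not presence"},
]

extracted = [r["name"] for r in rows
             if psu_extracted_as_expected(r["healthy"], r["status"])]
print(extracted)  # ['PSU2']
```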


Fans

Power Supply Fans

UFM Cyber-AI Appliance is equipped with one fan per PSU on the rear panel of the appliance.

Internal Fans

UFM Cyber-AI Appliance is equipped with six internal cooling fans for the CPU, GPU, and expansion cards. When the system is operating normally, the fans run at a constant speed. If a system module fails, or one of the temperature thresholds is exceeded, the fans automatically raise their rotation speed to draw in more air.
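The fan behavior described above amounts to a two-state policy. The sketch below models it under stated assumptions: the speed percentages, sensor names, and threshold values are hypothetical placeholders, since this manual does not publish them, and the real control logic lives in the appliance firmware.

```python
from dataclasses import dataclass

NORMAL_SPEED_PCT = 40   # hypothetical constant speed in normal operation
BOOST_SPEED_PCT = 100   # hypothetical raised speed

@dataclass
class Reading:
    name: str
    temperature_c: float
    threshold_c: float

def target_fan_speed(readings: list[Reading], module_failed: bool) -> int:
    """Raise fan speed if a module failed or any temperature exceeds its threshold."""
    over_threshold = any(r.temperature_c > r.threshold_c for r in readings)
    if module_failed or over_threshold:
        return BOOST_SPEED_PCT
    return NORMAL_SPEED_PCT

readings = [Reading("CPU", 55.0, 85.0), Reading("GPU", 91.0, 90.0)]
print(target_fan_speed(readings, module_failed=False))  # 100, GPU is over threshold
```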

Hardware Requirements

Unless otherwise specified, NVIDIA Networking products are designed to work in an environmentally controlled data center with low levels of gaseous and dust (particulate) contamination.

The operating environment should meet severity level G1 as per ISA 71.04 for gaseous contamination and ISO 14644-1 class 8 for cleanliness level.
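To make the ISO 14644-1 class 8 requirement concrete, the particle limits can be approximated with the class formula from that standard, C = 10^N x (0.1 / D)^2.08, where N is the class number and D the particle size in micrometres. The sketch below is illustrative only: the function names and the sample counts are hypothetical, and the standard's published table rounds the results to three significant figures, so the standard itself remains the authority.

```python
def iso_14644_limit(iso_class: float, particle_um: float) -> float:
    """Approximate max particles per cubic metre at or above particle_um."""
    return 10 ** iso_class * (0.1 / particle_um) ** 2.08

def within_class(iso_class: float, counts: dict[float, float]) -> bool:
    """Check measured counts (particle size in um -> particles/m^3) against the limits."""
    return all(count <= iso_14644_limit(iso_class, size)
               for size, count in counts.items())

for size in (0.5, 1.0, 5.0):
    print(f">= {size} um: {iso_14644_limit(8, size):,.0f} particles/m^3")

# Hypothetical measurement, not data from a real site survey.
sample_counts = {0.5: 1_200_000, 1.0: 300_000, 5.0: 10_000}
print(within_class(8, sample_counts))  # True
```

The gaseous contamination level (G1 per ISA 71.04) is assessed separately and is not covered by this calculation.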

Airflow Requirements

The NVIDIA UFM Cyber-AI Appliance is offered with one airflow pattern: from the front panel to the rear panel. Please refer to the Technical Specifications section for airflow numbers.

Software Requirements

The UFM Cyber-AI software offers enhanced and real-time network telemetry, combined with AI-powered intelligence and advanced analytics. It enables IT managers to discover operational anomalies and even predict network failures. This improves both security and data center uptime while decreasing overall operating expenses.

  • UFM Telemetry and UFM Enterprise inside

  • Detects performance degradations

  • Detects usage profile changes over time

  • Detects abnormal cluster behavior

  • Uses artificial intelligence to correlate seemingly unrelated phenomena

  • Alerts when preventive maintenance is needed

  • Collects system data continuously to improve predictability
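As a rough illustration of what detecting abnormal behavior in telemetry can look like, the sketch below flags samples that deviate strongly from a short rolling window using a z-score. This is a generic textbook technique and is not UFM Cyber-AI's actual model, which this manual does not describe; the function name, window size, threshold, and counter values are all hypothetical.

```python
from statistics import mean, stdev

def flag_anomalies(samples, window=5, z_threshold=3.0):
    """Return indices of samples that deviate strongly from the preceding window."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            # Flat history: any change at all is treated as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged

# Hypothetical per-interval error counts for one port.
port_errors = [2, 3, 2, 4, 3, 3, 2, 40, 3, 2]
print(flag_anomalies(port_errors))  # [7]
```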

Country | Certification
EU/Morocco | CE
USA | FCC
Canada | ICES
Japan | VCCI
Australia/New Zealand | RCM
Brazil | ANATEL
Taiwan | BSMI
China | CCC
Korea | KCC
Worldwide | CB
USA/Canada | cTUVus
Argentina | S-mark
Russia/Belarus/Kazakhstan | CU
Taiwan | BSMI

Type | Details
Acoustic Noise | ISO-7779:1999; ETS 300 753
Shock & Vibration | According to industrial spec
WEEE | RoHS 2011/65/EU
RoHS 6 | RoHS 2011/65/EU
MTBF/MTBCF | According to Telcordia SR-332

© Copyright 2023, NVIDIA. Last updated on Sep 7, 2023.