Data centers host many users and applications and have become a competitive advantage for research organizations and manufacturing companies. Keeping the data center intact and healthy is critical, as a data center shutdown can mean the loss of millions of dollars. Moreover, malicious users often exploit data center access to misuse compute resources (for example, by running prohibited applications), resulting in higher operating costs.
The NVIDIA® UFM® Cyber-AI Appliance (Gen 4.0) solution enhances the benefits of UFM Telemetry and UFM Enterprise, providing scale-out preventive maintenance that lowers supercomputing OPEX. UFM Cyber-AI Appliance comes with NVIDIA GPU-accelerated deep learning frameworks that significantly speed up deep learning training, reducing runs that could otherwise take days or weeks to just hours or days.
UFM Cyber-AI Appliance Highlights
|Form factor||2U rackmount - 19″|
|GPU||NVIDIA® A30 24GB - accelerated deep learning frameworks|
|PCIe cards||2x NVIDIA® ConnectX®-6 VPI dual-port network interface cards|
|Port speed||InfiniBand: SDR/QDR/HDR100/HDR; Ethernet: 25/50/100/200 Gb/s|
|Bandwidth||Up to 100Gb/s bi-directional per port|
|Power supplies||2x AC power supply units (PSUs)|
List of Hardware Features
|GPU||NVIDIA® A30 24GB||1|
|CPU||Silver 4214R Processor (16.5M Cache, 2.40 GHz)||2|
|TPM||TPM 2.0 module by LPC||1|
|Secure boot||Secure boot based on Intel boot guard technology with RSA-2K secured key|
|Memory||8GB 2666MHz DDR4 ECC|
|Disk HDD||2.5" 2.0TB SATA 7200RPM Enterprise||6|
|Disk SSD||2.5" 3.84TB, SATA 6Gb/s, 3D2, TLC|
The server must support, via BIOS, three RAID configurations simultaneously
|OOB Networking||2x1GbE management ports IPv4/6 & 2x10GbE3||4|
|Serial Port||DB9 RS232 port male||1|
|PCIe||PCI Express 3.0 x16||2|
|BMC||Baseboard management controller for device health monitoring||1|
|Power supplies||Hot-swappable power supply units for reliability (1+1 redundancy)|
|Fans||1x fan per power supply||2|
|6x internal cooling fans for CPU, GPU, and expansion card||6|
|USB ports||On front panel: 2 X USB 2.0||6|
|On back panel: 4 X USB 3.0|
|Lights-out management||For remote shutdown and serial access|
Main System Components
UFM Cyber-AI Appliance is populated with one GPU, two ConnectX-6 InfiniBand/VPI adapter cards, fans, and two PSUs, all located at the system's rear panel.
Network Interface Cards
UFM Cyber-AI Appliance is populated with two ConnectX-6 dual-port network interface cards (NICs) which enable the hardware-based forwarding of IP packets from InfiniBand to Ethernet, and vice versa.
Power Supply Units
UFM Cyber-AI Appliance is equipped with two redundant, load-sharing PSUs at the rear side of the system. The PSUs are housed in a 2U container. Each PSU has an extraction handle, status LED, and a power socket.
For power supply unit LED operation, please refer to "System Monitoring".
The system supports hot swapping, which allows components to be exchanged while the system is online without affecting operational integrity.
Only remove a PSU from the system if it is being replaced.
If one of the two PSUs is extracted from the UFM Cyber-AI Appliance, the Sensor Reading screen of the GUI will still show OK under the Healthy column and "Not presence" under the Status column. This behavior is normal.
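The note above implies that the Healthy column alone does not tell you whether both PSUs are installed; the Status column does. A minimal sketch of how a monitoring script might classify 1+1 redundancy from such readings is shown below. The `PsuReading` type, the `redundancy_state` function, the PSU names, and the "Present" status value are hypothetical and used only for illustration; only the "Not presence" string and the OK value come from the text above.

```python
from dataclasses import dataclass

# Status string quoted in the note above for an extracted PSU.
NOT_PRESENT = "Not presence"


@dataclass
class PsuReading:
    name: str     # hypothetical label, e.g. "PSU1"
    healthy: str  # value of the Healthy column, e.g. "OK"
    status: str   # value of the Status column


def redundancy_state(readings):
    """Classify PSU redundancy from the Status column only.

    A PSU counts as installed when its status is anything other than
    "Not presence". The Healthy column is deliberately ignored because an
    extracted PSU still reports OK there.
    """
    installed = [r for r in readings if r.status != NOT_PRESENT]
    if len(installed) >= 2:
        return "redundant"
    if len(installed) == 1:
        return "degraded"
    return "no-psu-installed"


if __name__ == "__main__":
    readings = [
        PsuReading("PSU1", "OK", "Present"),       # "Present" is an assumed value
        PsuReading("PSU2", "OK", "Not presence"),  # extracted PSU
    ]
    print(redundancy_state(readings))
```

With one PSU extracted, this sketch reports a degraded (non-redundant) state even though both rows show OK under Healthy.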
Power Supply Fans
UFM Cyber-AI Appliance is equipped with one fan per PSU on the rear panel of the appliance.
UFM Cyber-AI Appliance is equipped with six internal cooling fans for the CPU, GPU, and expansion cards. When the system is operating normally, the fans run at a constant speed. If a system module fails, or one of the temperature thresholds is exceeded, the fans automatically increase their rotation speed to draw in more air.
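The fan behavior described above can be pictured as a simple two-state policy. The following is only an illustrative sketch, not the appliance's firmware logic: the function name, sensor names, temperature limits, and speed percentages are all assumptions chosen for the example.

```python
def fan_speed_percent(temps_c, limits_c, module_failed,
                      normal_pct=50, raised_pct=100):
    """Return a fan duty cycle under a two-state policy.

    temps_c:       mapping of sensor name -> current temperature (Celsius)
    limits_c:      mapping of sensor name -> threshold (Celsius)
    module_failed: True if a system module has failed
    normal_pct / raised_pct: hypothetical duty cycles, not vendor values
    """
    # A threshold is exceeded when any monitored sensor reads above its limit.
    over_limit = any(
        temps_c[sensor] > limit
        for sensor, limit in limits_c.items()
        if sensor in temps_c
    )
    # Constant speed in normal operation; raised speed on failure or overheat.
    return raised_pct if (module_failed or over_limit) else normal_pct


if __name__ == "__main__":
    limits = {"cpu": 85, "gpu": 88}  # assumed thresholds
    print(fan_speed_percent({"cpu": 60, "gpu": 70}, limits, module_failed=False))
    print(fan_speed_percent({"cpu": 90, "gpu": 70}, limits, module_failed=False))
```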
UFM Cyber-AI Appliance Requirements
Unless otherwise specified, NVIDIA Networking products are designed to work in an environmentally controlled data center with low levels of gaseous and dust (particulate) contamination.
The operating environment should meet severity level G1 as per ISA 71.04 for gaseous contamination and ISO 14644-1 class 8 for cleanliness level.
NVIDIA UFM Cyber-AI Appliance is offered with one airflow pattern: from the front panel to the rear panel. Please refer to the Technical Specifications section for airflow numbers.
The UFM Cyber-AI software offers enhanced and real-time network telemetry, combined with AI-powered intelligence and advanced analytics. It enables IT managers to discover operational anomalies and even predict network failures. This improves both security and data center uptime while decreasing overall operating expenses.
- UFM Telemetry and UFM Enterprise inside
- Detects performance degradations
- Detects usage profile changes over time
- Detects abnormal cluster behavior
- Correlates between seemingly unrelated phenomena powered by artificial intelligence
- Alerts when preventive maintenance is needed
- Continuous system data collection to optimize predictability
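To give a sense of what detecting abnormal behavior in telemetry can look like, the sketch below flags samples that deviate strongly from a short rolling baseline. This is a generic rolling z-score check, not the method used by UFM Cyber-AI; the function name, window size, threshold, and sample values are hypothetical.

```python
from statistics import mean, pstdev


def flag_anomalies(samples, window=5, z_threshold=3.0):
    """Return indices of samples that deviate from the preceding window.

    Each sample is compared with the mean and population standard deviation
    of the `window` samples immediately before it.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Flat baseline: any change at all counts as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > z_threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical per-interval error counts for one port.
    counts = [10, 10, 11, 10, 11, 10, 50, 10, 11]
    print(flag_anomalies(counts))
```

A fixed-window baseline like this reacts to sudden spikes only; slow drifts in usage profile would need a longer history or a learned model.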
Shock & Vibration
According to industrial spec
According to Telcordia SR-332