Introduction to the NVIDIA DGX-1 Deep Learning System

The NVIDIA® DGX-1™ Deep Learning System is the world’s first purpose-built system for deep learning with fully integrated hardware and software that can be deployed quickly and easily.



Using the DGX-1: Overview

The NVIDIA DGX-1 is designed to operate in one of two modes: Base OS mode and Cloud Managed mode. Cloud Managed mode is not currently available; it will be offered at a future date, and availability will vary by region.

Base OS mode provides the base operating system on the DGX-1 for customers who want to use their own on-site scheduling and management software and who will build and run their own applications.

Hardware Specifications

Components

Component Qty Description
Base Server 1 Dual Intel® Xeon® CPU motherboard with x2 9.6 GT/s QPI, 8 Channel with 2 DPC DDR4, Intel® X99 Chipset, AST2400 BMC
1 GPU Baseboard supporting 8 SXM2 modules (Cube Mesh) and 4 PCIe x16 slots for InfiniBand NICs
1 Chassis with 3+1 1600W Power supply and support for up to five 2.5 inch drives
1 10/100BASE-T IPMI Port
1 RS232 Serial Port
2 USB 3.0 Ports
Power Supply 4 1600 W each.
CPU 2 Intel® Xeon® E5-2698 v4, 20-core, 2.2GHz, 135W
GPU 8 (Option 1) Tesla P100, featuring
  • 170 teraflops, FP16 (system total)
  • 16 GB memory per GPU
  • 28,672 NVIDIA CUDA® Cores (system total; 3,584 per GPU)
(Option 2) Tesla V100, featuring
  • 960 teraflops, FP16 (system total)
  • 16 GB memory per GPU
  • 40,960 NVIDIA CUDA® Cores (system total; 5,120 per GPU)
  • 5,120 NVIDIA Tensor Cores (system total; 640 per GPU)
System Memory 16 32 GB DDR4 LRDIMM (512 GB total)
SAS RAID Controller 1 8 port LSI SAS 3108 RAID Mezzanine
Storage (RAID 0) (Data) 4 1.92 TB, 6 Gb/s, SATA 3.0 SSD
Storage (OS) 1 480 GB, 6 Gb/s, SATA 3.0 SSD
10 GbE NIC 1 Dual port, 10GBASE-T, network adapter Mezzanine
InfiniBand EDR NIC 4 Single port, x16 PCIe, Mellanox ConnectX-4 VPI MCX455A-ECAT
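
The totals in the table above can be cross-checked with simple arithmetic. The following is a minimal sketch, not an NVIDIA tool: the constant and function names are hypothetical, and it assumes that the CUDA and Tensor Core figures are totals across all eight GPUs and that the RAID 0 data volume exposes the plain sum of its member drives.

```python
# Quantities and unit sizes copied from the Components table.
DIMM_COUNT, DIMM_GB = 16, 32
DATA_SSD_COUNT, DATA_SSD_TB = 4, 1.92
GPU_COUNT = 8
TOTAL_CUDA_CORES = {"Tesla P100": 28_672, "Tesla V100": 40_960}
TOTAL_TENSOR_CORES_V100 = 5_120


def system_memory_gb() -> int:
    """Total system memory: number of DIMMs times the size of each DIMM."""
    return DIMM_COUNT * DIMM_GB


def raid0_raw_capacity_tb() -> float:
    """Raw RAID 0 capacity: striping has no parity, so the sizes simply add up."""
    return round(DATA_SSD_COUNT * DATA_SSD_TB, 2)


def per_gpu(total: int) -> int:
    """Split a system-wide total evenly across the GPUs (assumption: figures are totals)."""
    return total // GPU_COUNT


if __name__ == "__main__":
    print(system_memory_gb())                      # 512
    print(raid0_raw_capacity_tb())                 # 7.68
    print(per_gpu(TOTAL_CUDA_CORES["Tesla P100"]))  # 3584
    print(per_gpu(TOTAL_CUDA_CORES["Tesla V100"]))  # 5120
    print(per_gpu(TOTAL_TENSOR_CORES_V100))         # 640
```

The 512 GB result matches the total stated in the System Memory row. The usable capacity of the data volume will be somewhat lower than the raw 7.68 TB once a file system is in place.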

Mechanical

Feature Description
Form Factor 3U Rackmount
Height 5.16" (13.1 cm)
Width 17.5" (44.4 cm)
Depth 34.1" (86.6 cm)
Gross Weight 134 lbs (61 kg)

Power

Input Specification for Each Power Supply Comments
200-240 V (ac) 3200 W max.; 1600 W @ 200-240 V, 8 A, 50-60 Hz The DGX-1 contains four load-balancing power supplies, with 3+1 redundancy.
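
The power figures above are consistent with each other, as the following minimal sketch shows. It is illustrative only: the names are hypothetical, it treats "3200 W max." as the maximum draw of the whole system (an assumption), and it ignores power supply efficiency when converting watts to amperes.

```python
# Figures copied from the Power table.
PSU_COUNT = 4
PSU_WATTS = 1600
REDUNDANT_PSUS = 1       # the "+1" in 3+1 redundancy
SYSTEM_MAX_WATTS = 3200  # assumption: "3200 W max." is the system maximum
MIN_INPUT_VOLTS = 200    # low end of the 200-240 V input range


def capacity_without_redundant_psus() -> int:
    """Capacity of the supplies that remain when the redundant one is out of service."""
    return (PSU_COUNT - REDUNDANT_PSUS) * PSU_WATTS


def input_current_amps(watts: float, volts: float) -> float:
    """Current drawn at a given power and voltage (I = P / V), ignoring losses."""
    return watts / volts


if __name__ == "__main__":
    remaining = capacity_without_redundant_psus()
    print(remaining)                                       # 4800
    print(remaining >= SYSTEM_MAX_WATTS)                   # True
    print(input_current_amps(PSU_WATTS, MIN_INPUT_VOLTS))  # 8.0
```

Under these assumptions, three supplies still cover the maximum load, and a fully loaded 1600 W supply at 200 V draws the 8 A listed in the specification.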

Connections and Controls

ID Type Qty Description
1 Power button 1 Press to turn the DGX-1 on or off. LED states:
  • Blue: System power on
  • Off: System power off
  • Amber (blinking): DC Off and fault
  • Amber and blue (blinking): DC On and fault

2 ID button 1 Press to cause an LED on the back of the unit to flash as an identifier during servicing.
3 InfiniBand 4 QSFP28 port; Mellanox ConnectX-4 VPI MCX455A-ECAT, EDR IB (100Gb), x16 PCIe
4 USB 2 USB 3.0 ports are available to connect a keyboard.
5 VGA 1 The VGA port connects to a VGA capable monitor for local viewing of the DGX-1 setup console or base OS.
6 DB9 1 RS232 serial port for internal debugging
7 AC input 4 Power supply inputs
8 Ethernet (RJ45) 2 10GBASE-T dual port network adapter Mezzanine
9 IPMI (RJ45) 1 10/100BASE-T Intelligent Platform Management Interface (IPMI) port

Rear Panel Power Controls

ID Type Qty Description
1 Power button 1
  • Press and immediately release the power button for a graceful shutdown of the host OS.
  • Press and hold the power button for at least four seconds to shut down the system immediately. The BMC remains live.
2 Power LED 1
  • Off: Power off
  • Blue (steady): Power on
  • Blue (blinking): BMC reports system health fault.
3 Main Board Status LED 1
  • Off: Normal
  • Amber (blinking): BMC reports system health fault.

LAN LEDs

LEDs next to each Ethernet port indicate the connection status as described in the table below:

LED Status Description
1 (Port 1 Link/Activity)
  • Amber (steady): LAN link
  • Amber (blinking): LAN access (off when there is traffic)
  • Off: Disconnected
2 (Port 1 Speed)
  • Green: 10 Gb/s
  • Amber: 1 Gb/s
  • Off: 100 Mb/s
3 (Port 0 Link/Activity)
  • Amber (steady): LAN link
  • Amber (blinking): LAN access (off when there is traffic)
  • Off: Disconnected
4 (Port 0 Speed)
  • Green: 10 Gb/s
  • Amber: 1 Gb/s
  • Off: 100 Mb/s
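
Each Ethernet port is described by a pair of LEDs, so both must be read together. The lookup below is a minimal sketch of that reading: the function and dictionary names are hypothetical, and it assumes that the speed LED carries no meaning while the link/activity LED shows the port as disconnected.

```python
# LED states copied from the LAN LEDs table (keys are lowercase).
LINK_ACTIVITY = {
    "amber (steady)": "LAN link",
    "amber (blinking)": "LAN access",
    "off": "Disconnected",
}
SPEED = {
    "green": "10 Gb/s",
    "amber": "1 Gb/s",
    "off": "100 Mb/s",
}


def describe_port(link_led: str, speed_led: str) -> str:
    """Combine the link/activity LED and the speed LED into one status string."""
    link = LINK_ACTIVITY[link_led.strip().lower()]
    if link == "Disconnected":
        # Assumption: with no link, an unlit speed LED does not mean 100 Mb/s.
        return link
    return f"{link} at {SPEED[speed_led.strip().lower()]}"


if __name__ == "__main__":
    print(describe_port("Amber (steady)", "Green"))    # LAN link at 10 Gb/s
    print(describe_port("Amber (blinking)", "Amber"))  # LAN access at 1 Gb/s
    print(describe_port("Off", "Off"))                 # Disconnected
```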

IPMI Port LEDs

LEDs on the IPMI port indicate the connection status as described in the table below:

Link Activity Description
Off Off Unplugged
Green (steady) Green (blinking) 100M active link
Off Green (blinking) 10M active link

Hard Disk Indicators

ID Feature Description
1 Button and release lever for removing the HDD
2 HDD present LED
  • Blue (steady): Drive present
  • Blue (blinking twice/sec): Identification (such as when initializing or locating through the SBIOS)
  • Blue (blinking once/sec): Rebuilding (such as when creating a RAID array)
  • Amber (steady): Warning/failure
  • Off: Slot empty

3 HDD activity LED Blue: Access
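
The HDD present LED distinguishes its blue states by blink rate alone, which is easy to misread during servicing. The following minimal sketch encodes the states as a lookup keyed by color and blinks per second. The names are hypothetical, and a rate of 0 is used here to stand for a steady or unlit LED.

```python
# States of the HDD present LED, keyed by (color, blinks per second).
# A rate of 0 means the LED is steady (or unlit when the color is "off").
PRESENT_LED_STATES = {
    ("blue", 0): "Drive present",
    ("blue", 2): "Identification",
    ("blue", 1): "Rebuilding",
    ("amber", 0): "Warning/failure",
    ("off", 0): "Slot empty",
}


def drive_status(color: str, blinks_per_sec: int = 0) -> str:
    """Return the drive status for an observed LED color and blink rate."""
    key = (color.strip().lower(), blinks_per_sec)
    if key not in PRESENT_LED_STATES:
        raise ValueError(f"LED state not listed in the table: {key}")
    return PRESENT_LED_STATES[key]


if __name__ == "__main__":
    print(drive_status("Blue"))      # Drive present
    print(drive_status("Blue", 1))   # Rebuilding
    print(drive_status("Blue", 2))   # Identification
    print(drive_status("Amber"))     # Warning/failure
```

Combinations that the table does not list, such as a blinking amber LED, raise an error rather than being guessed at.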

Power Supply Unit (PSU) LED

The PSU LED indicates the operation status of the PSU as described in the table below:

Activity Description
Green Normal operation
Amber (blinking) Power off; Fault
Green (blinking) Power on; Standby mode