Overview

NeMo Megatron is a new capability in the NeMo framework that enables developers to efficiently train and scale language models to billions of parameters. With NeMo Megatron, you can train different variants of GPT-3 models and scale them to multiple nodes on NVIDIA DGX SuperPOD™ configurations. This deep learning (DL) software stack is optimized for DGX SuperPOD using NVIDIA InfiniBand technology, providing efficient on-premises compute for training and inference on complex workloads.

Early access to NeMo Megatron is limited to enterprises that want to train and deploy GPT-3 style models on NVIDIA DGX SuperPOD to perform zero-shot tasks such as answering deep domain questions, translating languages, and comprehending and summarizing complex documents.

NeMo Megatron applies state-of-the-art parallelism techniques, namely data parallelism, tensor parallelism, and pipeline parallelism, to enable efficient data preprocessing, training of large models, and inference deployment.
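To see how the three dimensions interact, note that a fixed pool of GPUs is partitioned among them: the data-parallel size is what remains after the tensor- and pipeline-parallel sizes are chosen. The following is a minimal illustrative sketch of that relationship; the function and parameter names mirror NeMo-style configuration options (such as tensor_model_parallel_size and pipeline_model_parallel_size) but are not NeMo's actual API.

```python
# Illustrative sketch (not NeMo's API): how tensor, pipeline, and data
# parallelism divide a fixed pool of GPUs among themselves.

def data_parallel_size(world_size: int,
                       tensor_model_parallel_size: int,
                       pipeline_model_parallel_size: int) -> int:
    """Return the number of data-parallel model replicas for a given layout."""
    model_parallel_size = (tensor_model_parallel_size
                           * pipeline_model_parallel_size)
    # Every model replica occupies tensor_size * pipeline_size GPUs,
    # so the world size must divide evenly into replicas.
    assert world_size % model_parallel_size == 0, (
        "world size must be divisible by tensor size * pipeline size")
    return world_size // model_parallel_size

# Example: 4 nodes x 8 GPUs = 32 GPUs. Tensor parallelism of 8 (within a
# node) and pipeline parallelism of 2 across nodes leaves 2 data-parallel
# replicas of the model.
print(data_parallel_size(world_size=32,
                         tensor_model_parallel_size=8,
                         pipeline_model_parallel_size=2))  # -> 2
```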

These parallelism techniques make it possible to train large models that do not fit in the memory of a single GPU. During training, tensor (intra-layer) model parallelism is adopted within a multi-GPU server: individual transformer layers of the model are partitioned across multiple devices. For more details, refer to the Megatron-LM paper.
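The sketch below simulates the core idea of tensor model parallelism on CPU for clarity. A linear layer's weight matrix is split column-wise between two "devices"; each computes its slice of the output independently, and the slices are then reassembled. In a real NeMo Megatron run the shards live on separate GPUs and the reassembly is a collective operation (an all-gather) over the tensor-parallel group; this is an illustration of the technique, not NeMo's implementation.

```python
# Minimal CPU simulation of Megatron-style tensor (intra-layer) parallelism:
# a column-parallel split of one linear layer, Y = X @ A.
import torch

torch.manual_seed(0)
batch, d_in, d_out = 4, 8, 6

X = torch.randn(batch, d_in)   # activations, replicated on every device
A = torch.randn(d_in, d_out)   # full weight matrix of one linear layer

# Column-parallel split: "device" 0 holds A[:, :3], "device" 1 holds A[:, 3:].
A_shards = torch.chunk(A, chunks=2, dim=1)

# Each device computes its slice of the output with no communication.
partial_outputs = [X @ shard for shard in A_shards]

# An all-gather across the tensor-parallel group would reassemble the output;
# here we simply concatenate the slices.
Y_parallel = torch.cat(partial_outputs, dim=1)

assert torch.allclose(Y_parallel, X @ A)  # matches the single-device result
```

In practice, a column-parallel layer is typically paired with a subsequent row-parallel layer so that only one all-reduce is needed per transformer block in each direction, keeping communication overhead low.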
