Important

NeMo 2.0 is an experimental feature and is currently released only in the dev container: nvcr.io/nvidia/nemo:dev. Refer to the NeMo 2.0 overview for information on getting started.

Falcon

Released in April 2023, TII’s Falcon is an Apache 2.0 licensed model built on the transformer decoder architecture, with key adjustments such as multi-group attention, rotary position embeddings (RoPE), parallel attention and MLP blocks, and the removal of bias terms from linear layers. More information is available in the companion paper, “The Falcon Series of Open Language Models”. Falcon is offered in 7B, 40B, and 180B sizes.
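To make one of the adjustments listed above concrete, here is a minimal sketch of rotary position embeddings in plain Python. It follows the common RoFormer-style formulation, in which consecutive pairs of dimensions are rotated by a position-dependent angle. It is not taken from the Falcon or NeMo source code; the function names `rope` and `dot`, the pairing layout, and the `base=10000.0` default are assumptions made for illustration.

```python
import math


def rope(vec, position, base=10000.0):
    """Apply a rotary position embedding to one query or key vector.

    Illustrative only: each pair (vec[i], vec[i + 1]) with even i is rotated
    by the angle position * base ** (-i / d), where d = len(vec).
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        # Standard 2D rotation of the pair (x, y) by theta.
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product, standing in for an unscaled attention score."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    q = [0.5, -1.0, 2.0, 0.25]
    k = [1.5, 0.3, -0.7, 1.1]
    # Both pairs of positions are two tokens apart, so the two scores
    # should agree up to floating-point error.
    print(dot(rope(q, 5), rope(k, 3)))
    print(dot(rope(q, 12), rope(k, 10)))
```

Because each pair is only rotated, the vector norm is unchanged, and the score between a rotated query and a rotated key depends on the difference between their positions rather than on the absolute positions. This relative-position property is the usual motivation for using RoPE in attention layers.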

Feature

Status

Data parallelism

Tensor parallelism

Pipeline parallelism

Interleaved Pipeline Parallelism Schedule

N/A

Sequence parallelism

Selective activation checkpointing

Gradient checkpointing

Partial gradient checkpointing

FP32/TF32

AMP/FP16

BF16

TransformerEngine/FP8

Multi-GPU

Multi-Node

Inference

N/A

Slurm

Base Command Manager

Base Command Platform

Distributed data preprocessing

NVfuser

P-Tuning and Prompt Tuning

IA3 and Adapter learning

Distributed Optimizer

Distributed Checkpoint

Fully Sharded Data Parallel

N/A