Neural Machine Translation (NMT)

Transformer-based Seq2Seq

The Transformer-based encoder-decoder Neural Machine Translation models in Riva are based on the architecture described in the original Transformer paper. The main modification is the use of the pre-layernorm (pre-LN) Transformer variant, in which layer normalization is applied at the input of each sub-layer rather than after the residual connection. For more information, refer to the NeMo Machine Translation Documentation.
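To illustrate the pre-layernorm ordering, the following is a minimal sketch of a pre-LN encoder layer in PyTorch. It is not the Riva or NeMo implementation; the class name and hyperparameters are illustrative assumptions and only the sub-layer ordering reflects the pre-LN variant described above.

```python
import torch
import torch.nn as nn


class PreLNEncoderLayer(nn.Module):
    """Illustrative pre-LN Transformer encoder layer (sketch, not the Riva implementation)."""

    def __init__(self, d_model: int, n_heads: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pre-LN: normalize *before* each sub-layer, then add the residual.
        # (The original post-LN Transformer normalizes after the residual add.)
        h = self.norm1(x)
        attn_out, _ = self.self_attn(h, h, h, need_weights=False)
        x = x + self.dropout(attn_out)
        h = self.norm2(x)
        x = x + self.dropout(self.ffn(h))
        return x
```

The pre-LN ordering is commonly used because it tends to stabilize training of deep Transformer stacks without careful learning-rate warmup.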

The 24x6 models provided with Riva have approximately 500M parameters, with 24 encoder layers and 6 decoder layers.
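As a rough sketch of what a 24x6 configuration looks like, the snippet below instantiates a pre-LN encoder-decoder with PyTorch's built-in `nn.Transformer`. The 24/6 layer split comes from the documentation above; `d_model`, `nhead`, and `dim_feedforward` are assumed values for illustration, not the published Riva hyperparameters, so the parameter count will not match 500M exactly (it also excludes vocabulary embeddings and the output projection).

```python
import torch.nn as nn

# 24 encoder / 6 decoder layers as in the Riva 24x6 models; other hyperparameters are assumptions.
model = nn.Transformer(
    d_model=1024,
    nhead=16,
    num_encoder_layers=24,
    num_decoder_layers=6,
    dim_feedforward=4096,
    norm_first=True,   # pre-layernorm variant
    batch_first=True,
)

# Rough parameter count for the encoder/decoder stack only.
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")
```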