Advanced Features

Advanced feature guides for key Megatron Core capabilities.

- Mixture of Experts
- Multi-Token Prediction (MTP)
- Multi-Latent Attention
- Megatron Core MoE
  - What’s New
  - Overview of MCore MoE
  - Supported Features and Architectures
  - Quick Start Guide
  - Best Practices to achieve high performance on MoE training
  - Feature Documentation
  - Training Optimizations
  - MoE Arguments Reference
  - Examples
  - Contributing
  - Support
  - Citation
- context_parallel package
  - Context parallelism overview
  - Context parallelism benefits
  - Enabling context parallelism
- Megatron FSDP
  - How to use?
  - Key Features
  - Configuration Recommendations
  - Design of Custom FSDP
  - References
- Distributed Optimizer
  - Data flow
  - Sharding scheme
  - Key steps
- Optimizer CPU Offload
  - How to use?
  - Configuration Recommendations
- Custom Pipeline Model Parallel Layout
- Tokenizers
  - Overview
  - Key Features
  - Basic Usage
  - Advanced Usage
  - Integration with Megatron-LM
  - Supported Tokenizer Libraries
  - Common Tokenizer Types
  - Best Practices
  - Next Steps
- Megatron Energon
  - Overview
  - Installation
  - Key Features
  - Basic Usage
  - Multimodal Example
  - Dataset Blending
  - Configuration
  - Integration with Megatron-LM
  - Resources
  - Next Steps
- Megatron RL
  - Overview
  - Key Features
  - Architecture
  - Use Cases
  - Resources