Advanced Features

Advanced feature guides for key Megatron Core capabilities.

- Mixture of Experts
- Multi-Token Prediction (MTP)
- Multi-Latent Attention
- Megatron Core MoE
  - What’s New
  - Parallelism
  - Router and Load Balancing
  - Performance Optimizations
  - Token Dispatch Mechanism
  - Ease of use
  - User Guide
  - Usage
  - MoE training example
  - Performance Best Practice
  - Tuning Guide of Parallel Mappings
  - MoE Parallel Folding
  - End-to-End Training Practice
  - Reference Best Parallel Mapping
- context_parallel package
  - Context parallelism overview
  - Context parallelism benefits
  - Enabling context parallelism
- Megatron FSDP
  - How to use?
  - Key Features
  - Configuration Recommendations
  - Design of Custom FSDP
  - References
- Distributed Optimizer
  - Data flow
  - Sharding scheme
  - Key steps
- Optimizer CPU Offload
  - How to use?
  - Configuration Recommendations
- Custom Pipeline Model Parallel Layout
- Tokenizers
  - Overview
  - Key Features
  - Basic Usage
  - Advanced Usage
  - Integration with Megatron-LM
  - Supported Tokenizer Libraries
  - Common Tokenizer Types
  - Best Practices
  - Next Steps
- Megatron Energon
  - Overview
  - Installation
  - Key Features
  - Basic Usage
  - Multimodal Example
  - Dataset Blending
  - Configuration
  - Integration with Megatron-LM
  - Resources
  - Next Steps
- Megatron RL
  - Overview
  - Key Features
  - Architecture
  - Use Cases
  - Resources