Megatron-Core is a self-contained, lightweight PyTorch library that packages everything essential for training large-scale transformers. It offers a rich collection of GPU techniques for optimizing memory, compute, and communication, inherited from Megatron-LM and Transformer Engine, along with cutting-edge innovations in system-level efficiency. By abstracting these GPU-optimized techniques into composable, modular APIs, Megatron-Core gives developers and model researchers full flexibility to train custom transformers at scale and to build their own LLM frameworks on NVIDIA accelerated computing infrastructure.

Developer documentation for Megatron-Core covers API references, a quickstart guide, and deep dives into the advanced GPU techniques needed to optimize LLM performance at scale.