Megatron Energon
Type: Library
Tags: NVIDIA, Megatron Energon, data loading, multimodal data, WebDataset, JSONL, distributed training, PyTorch, Megatron Core, Megatron-LM
Related: Megatron-Core, Megatron-LM, NVIDIA-NeMo, NeMo-Megatron-Bridge, Nemotron-Training-Recipes, PyTorch, NVIDIA-DALI, NeMo-Curator, NVIDIA-Optimized-Frameworks, NVIDIA-Resiliency-Extension
Sources: https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/features/megatron_energon.html, https://nvidia.github.io/Megatron-Energon/, https://nvidia.github.io/Megatron-Energon/basic/quickstart.html, https://nvidia.github.io/Megatron-Energon/basic/data_prep.html, https://github.com/NVIDIA/Megatron-Energon
Last Updated: 2026-04-29
Summary
Megatron Energon is NVIDIA’s multimodal data-loading library for Megatron-scale training. Current NVIDIA docs describe it as an advanced dataloader for efficient loading of text, image, video, and audio data at scale, with distributed loading, dataset blending, WebDataset/JSONL-oriented formats, resumability, packing, grouping, joining, object-storage streaming, and command-line data-preparation tools.
Detail
Purpose
Large multimodal training jobs are often bottlenecked by data movement, decoding, shuffling, blending, and resumability. Megatron Energon addresses that data-input layer for Megatron-Core and Megatron-LM workflows, while remaining usable outside Megatron when a project needs large-scale multimodal dataset loading.
This page is the canonical wiki target for Megatron Energon. Do not split the quickstart, data-preparation tutorials, WebDataset layout, remote dataset guide, packing/grouping/joining features, CLI commands, or API module pages into separate wiki pages unless NVIDIA publishes a distinct durable product/topic around them.
Current capabilities
- Multimodal sample loading for text, images, video, and audio.
- Distributed loading across workers, processes, and multi-node clusters.
- Dataset blending with configurable weights and metadataset support.
- WebDataset-oriented storage with Energon metadata, plus JSONL support for simpler cases.
- Save/restore of data-loading state so training can resume reproducibly.
- Packing, grouping, joining, subsets, epochized blending, custom sample loaders, and reproducible scaling features.
- Remote dataset access, including the S3-compatible object storage patterns covered in current docs.
- CLI utilities such as energon prepare, energon info, energon lint, energon mount, and energon preview.
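The WebDataset-oriented storage named in the list above builds on the general WebDataset convention: a shard is a tar archive, and files that share a base name (the part before the first dot) form one sample. The sketch below illustrates only that convention, using the Python standard library. It does not call any Energon API and does not model the extra metadata Energon adds; the function names, sample keys, captions, and byte payloads are invented for illustration.

```python
import io
import json
import tarfile
from collections import defaultdict


def write_shard(samples):
    """Write samples into an in-memory tar shard, one tar member per sample part."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for key, parts in samples:
            for extension, payload in parts.items():
                info = tarfile.TarInfo(name=f"{key}.{extension}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_shard(shard_bytes):
    """Group tar members back into samples by the name part before the first dot."""
    grouped = defaultdict(dict)
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, extension = member.name.partition(".")
            grouped[key][extension] = tar.extractfile(member).read()
    return dict(grouped)


# Hypothetical image-caption samples; the payloads are placeholders, not real images.
samples = [
    ("000001", {"jpg": b"<fake image bytes>",
                "json": json.dumps({"caption": "a red bicycle"}).encode()}),
    ("000002", {"jpg": b"<more fake bytes>",
                "json": json.dumps({"caption": "two cats"}).encode()}),
]

shard = write_shard(samples)
restored = read_shard(shard)
print(sorted(restored))
print(json.loads(restored["000001"]["json"])["caption"])
```

Because each sample's parts sit next to each other in the archive, a loader can stream a shard sequentially, which is the property that makes tar-based layouts attractive for object storage and large clusters.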
NVIDIA stack context
Megatron Energon complements, rather than replaces, other NVIDIA data tools:
- NeMo-Curator prepares and filters large training datasets before training.
- NVIDIA-DALI accelerates decode/augmentation pipelines, especially image/video/audio preprocessing.
- Megatron Energon focuses on dataset format, multimodal sample loading, distributed sharding, blending, and resumable iteration for large training jobs.
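To make the blending and resumable-iteration ideas in the last bullet concrete, here is a minimal conceptual sketch in plain Python. It is not Energon's implementation or API: the class BlendedStream, its save_state and restore_state methods, the dataset names, and the weights are all hypothetical. It only shows why capturing the random-number state together with per-dataset positions is enough to make a weighted blend resume deterministically.

```python
import random


class BlendedStream:
    """Draw samples from several in-memory datasets in proportion to their weights."""

    def __init__(self, datasets, weights, seed=0):
        self.names = list(datasets)
        self.datasets = datasets
        self.weights = [weights[name] for name in self.names]
        self.rng = random.Random(seed)
        self.positions = {name: 0 for name in self.names}

    def __iter__(self):
        return self

    def __next__(self):
        # Pick the source dataset according to the blend weights.
        name = self.rng.choices(self.names, weights=self.weights, k=1)[0]
        data = self.datasets[name]
        # Wrap around so that a small dataset simply repeats.
        sample = data[self.positions[name] % len(data)]
        self.positions[name] += 1
        return name, sample

    def save_state(self):
        """Capture everything needed to continue the exact same sequence."""
        return {"rng": self.rng.getstate(), "positions": dict(self.positions)}

    def restore_state(self, state):
        self.rng.setstate(state["rng"])
        self.positions = dict(state["positions"])


# Hypothetical datasets and weights, for illustration only.
datasets = {
    "captions": [f"caption-{i}" for i in range(8)],
    "ocr": [f"ocr-{i}" for i in range(3)],
}
weights = {"captions": 0.75, "ocr": 0.25}

stream = BlendedStream(datasets, weights, seed=1234)
warmup = [next(stream) for _ in range(5)]
checkpoint = stream.save_state()          # taken alongside a model checkpoint
expected = [next(stream) for _ in range(10)]

# A fresh process rebuilds the stream and restores the saved state.
resumed = BlendedStream(datasets, weights, seed=999)
resumed.restore_state(checkpoint)
replayed = [next(resumed) for _ in range(10)]
print(replayed == expected)
```

A real distributed loader also has to track state per rank and per worker and account for shuffle buffers, which this single-process sketch leaves out.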
Connections
- Megatron-Core - current Megatron Core docs surface Megatron Energon as a feature for large-scale multimodal training data loading.
- Megatron-LM - reference implementation that can use Megatron Energon in multimodal training flows.
- NVIDIA-NeMo and NeMo-Megatron-Bridge - adjacent training stacks in which Megatron data-loading and checkpointing workflows matter.
- Nemotron-Training-Recipes - long-running recipe jobs need reproducible data iteration and resume behavior.
- PyTorch - Megatron Energon is a Python package used in PyTorch-style training loops.
- NVIDIA-DALI and NeMo-Curator - complementary data loading/preprocessing and data curation layers.
- NVIDIA-Optimized-Frameworks - container context for running Megatron/Energon training environments.
- NVIDIA-Resiliency-Extension - related resiliency package for restart/checkpoint behavior around long-running jobs.
Source Excerpts
- NVIDIA docs describe Megatron Energon as an advanced multimodal dataloader for text, image, video, and audio at scale.
- Current docs list WebDataset with extra metadata and JSONL support as dataset formats.
- The GitHub README describes Energon as the multi-modal data loader of Megatron that can also be used independently.