NeMo Megatron Bridge
Type: Library
Tags: NVIDIA, NeMo, Megatron, Megatron Core, Hugging Face, LLM, VLM, checkpoint conversion, distributed training
Related: NVIDIA-NeMo, Megatron-Core, Megatron-LM, NVIDIA-Resiliency-Extension, Transformer-Engine, Nemotron-Training-Recipes, NeMo-AutoModel, NeMo-RL, NeMo-Run, NeMo-Export-Deploy, NeMo-Customizer, NeMo-Evaluator, TensorRT-LLM, vLLM, Nemotron, Nemotron-3-Nano, Nemotron-3-Super, Nemotron-3-Nano-Omni, NVIDIA-AI-Enterprise
Sources: https://docs.nvidia.com/nemo/megatron-bridge/latest/index.html, https://docs.nvidia.com/nemo/megatron-bridge/latest/training/resiliency.html, https://docs.nvidia.com/megatron-core/developer-guide/latest/get-started/overview.html, https://docs.nvidia.com/nemo/megatron-bridge/latest/models/llm/nemotron3.html, https://docs.nvidia.com/nemo/megatron-bridge/latest/models/llm/nemotron3-super.html, https://docs.nvidia.com/nemo/rl/latest/about/model-support.html, https://docs.nvidia.com/nemo/rl/latest/about/backends.html, https://docs.nvidia.com/nemotron/latest/nemotron/nvidia-stack.html, https://docs.nvidia.com/nemotron/latest/nemotron/nano3/pretrain.html, https://docs.nvidia.com/nemotron/latest/nemotron/super3/pretrain.html, https://docs.nvidia.com/deeplearning/transformer-engine/index.html
Last Updated: 2026-04-29
Summary
NeMo Megatron Bridge is NVIDIA’s PyTorch-native library within NeMo Framework for training, fine-tuning, and converting LLM/VLM checkpoints between Hugging Face and Megatron Core formats. Current NVIDIA docs frame it as both a bidirectional bridge/conversion layer and a performant training loop supporting pretraining, SFT, LoRA/PEFT, mixed precision, and large-scale parallel training.
Detail
Purpose
Large-model teams often need the usability and ecosystem reach of Hugging Face checkpoints plus the training throughput and parallelism of Megatron-Core. NeMo Megatron Bridge provides conversion, verification, recipes, and training infrastructure so models can move between formats without losing correctness or deployment options.
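A minimal round-trip sketch of that workflow, assuming the AutoBridge entry point shown in current Megatron Bridge quickstart docs; the module path, method names, and provider fields here are assumptions to verify against the installed release:

```python
# Sketch of a Hugging Face -> Megatron round trip via Megatron Bridge.
# AutoBridge, to_megatron_provider, provide_distributed_model, and
# save_hf_pretrained follow current quickstart docs; treat them as
# assumptions and confirm against your installed version.
from megatron.bridge import AutoBridge

# Load a Hugging Face causal-LM checkpoint and build a bridge around it.
bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")

# Get a Megatron Core model provider; parallelism is configured here.
provider = bridge.to_megatron_provider()
provider.tensor_model_parallel_size = 2  # example setting, not a default
model = provider.provide_distributed_model(wrap_with_ddp=False)

# ... pretrain, SFT, or LoRA fine-tune with Megatron Core ...

# Export the (possibly fine-tuned) Megatron model back to Hugging Face format.
bridge.save_hf_pretrained(model, "./llama-3.2-1b-hf-export")
```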
Current scope
- Bidirectional Hugging Face to Megatron conversion with verification and checkpoint integrity checks (a round-trip check is sketched after this list).
- Training and customization paths for pretraining, SFT, LoRA, DoRA, PEFT, packed sequences, distillation, and mixed precision.
- Parallelism-aware conversion and training across tensor, pipeline, virtual pipeline, context, expert, and data parallelism.
- Supported-model documentation covering Llama, Qwen, DeepSeek, Mistral, Gemma, GPT-OSS, Nemotron-3-Nano, Nemotron-3-Super, Nemotron H/Nano variants, and multiple VLM families.
- Dedicated Nemotron 3 Nano and Nemotron 3 Super pages cover Hugging Face/Megatron conversion, pretraining, fine-tuning, LoRA, and quantization workflows for the current Nemotron 3 text reasoning models.
- Current Nemotron-Training-Recipes docs use Megatron Bridge as the pretraining and SFT implementation layer for Nano3 and Super3 recipes.
- Current resiliency docs incorporate NVIDIA-Resiliency-Extension features for fault tolerance, straggler detection, preemption, async checkpoint save, local checkpointing, and restart workflows.
- Integration with NeMo-RL as a high-scale training backend and with NeMo-Export-Deploy for downstream inference export.
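The integrity checks in the first bullet above amount to confirming that a converted model is numerically faithful to its source. A hedged sketch of such a round-trip check, using plain transformers/torch rather than Bridge's own verification API; the export path is the hypothetical one from the earlier sketch:

```python
# Round-trip integrity check: compare logits from the original Hugging Face
# checkpoint against a checkpoint exported back from Megatron. Illustrates
# the idea behind Bridge's conversion verification, not its internal API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

original = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
round_trip = AutoModelForCausalLM.from_pretrained("./llama-3.2-1b-hf-export")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")

inputs = tokenizer("Checkpoint conversion sanity check.", return_tensors="pt")
with torch.no_grad():
    logits_original = original(**inputs).logits
    logits_round_trip = round_trip(**inputs).logits

# An untrained round trip should match up to dtype-level tolerance.
assert torch.allclose(logits_original, logits_round_trip, atol=1e-5)
```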
NVIDIA context
Megatron Bridge is the current NVIDIA-authored bridge between open model checkpoints and the large-scale Megatron training stack. It should be treated as distinct from Megatron-LM: it has its own current docs surface, conversion workflow, model-support matrix, and role within NeMo Framework.
Connections
- NVIDIA-NeMo - parent framework family for training, customization, evaluation, and deployment.
- Megatron-Core - the composable library that Bridge targets for Megatron model definitions, parallelism, and training APIs.
- Megatron-LM - reference Megatron implementation, adjacent to Bridge in the Megatron ecosystem.
- Transformer-Engine - low-precision transformer acceleration library adjacent to Bridge’s mixed-precision and high-scale training paths.
- NVIDIA-Resiliency-Extension - resilient training layer whose features current Bridge resiliency docs incorporate.
- Nemotron-Training-Recipes - public recipe surface that uses Bridge for Nemotron pretraining and supervised fine-tuning.
- NeMo-AutoModel - Hugging Face-compatible PyTorch training path; Megatron Bridge covers higher-scale Megatron Core conversion and recipes.
- NeMo-RL - current NeMo-RL docs use the Megatron Bridge training backend for optimal acceleration of leading model families.
- NeMo-Run - cluster execution layer for large-model jobs.
- NeMo-Export-Deploy - downstream export/deployment path for Megatron Bridge checkpoints.
- NeMo-Customizer and NeMo-Evaluator - managed customization and evaluation services adjacent to code-level training.
- TensorRT-LLM, vLLM, and Triton-Inference-Server - downstream serving/export targets.
- Nemotron - current docs list Nemotron/Nemotron Nano variants among supported NVIDIA model families.
- Nemotron-3-Nano - current Megatron Bridge page for the 30B-A3B text reasoning model.
- Nemotron-3-Super - current Megatron Bridge page for the 120B-A12B text reasoning model.
- Nemotron-3-Nano-Omni - adjacent Nemotron 3 model, distinct from the text-only Nano/Super Bridge pages.
- NVIDIA-AI-Enterprise - enterprise software context for NeMo training and deployment.
Source Excerpts
- NVIDIA docs describe Megatron Bridge as a NeMo Framework library for pretraining, SFT, LoRA, and Hugging Face/Megatron conversion.
- Current docs list Nemotron 3 Nano, Nemotron 3 Super, and Llama-Nemotron model families in model-support and RL-acceleration contexts.