NVIDIA ASR NIM

Type: Microservice
Tags: NVIDIA, NIM, ASR, automatic speech recognition, speech-to-text, Parakeet, Canary, Whisper, Nemotron, Riva
Related: NVIDIA-Speech-NIM-Microservices, NVIDIA-NIM, NVIDIA-Riva, Nemotron-ASR-Streaming, Nemotron-3-VoiceChat, Parakeet-ASR, NVIDIA-Canary, Nemotron, NVIDIA-NeMo, NVIDIA-TTS-NIM, NVIDIA-NMT-NIM, NVIDIA-Background-Noise-Removal-NIM, TensorRT, Triton-Inference-Server
Sources: https://docs.nvidia.com/nim/speech/latest/asr/index.html, https://docs.nvidia.com/nim/speech/latest/about/how-it-works.html
Last Updated: 2026-04-29

Summary

NVIDIA ASR NIM is the Speech NIM microservice for automatic speech recognition. NVIDIA's current docs describe it as a service that converts spoken audio into text. Each self-contained container packages pre-trained NeMo models with TensorRT and Triton and handles model download, optimization, and serving.

Detail

Purpose

ASR NIM gives developers a deployable speech-to-text endpoint for voice assistants, live captions, contact center transcription, media indexing, and multilingual speech pipelines, without requiring them to set up and serve each ASR model themselves.

Current scope

Streaming mode returns partial transcripts as audio arrives.
Offline mode processes complete audio and returns a full transcript.
Current docs list Parakeet CTC, Parakeet TDT, Parakeet RNNT Multilingual, Nemotron-ASR-Streaming, Conformer CTC, Whisper Large v3, and Canary 1B options.
Model selection guidance covers language coverage, inference mode, timestamps, streaming latency, and translation behavior.
Customization docs cover word boosting, custom vocabularies, fine-tuned NeMo checkpoints, and pipeline configuration.
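The practical difference between the two inference modes shows up mostly on the client side. The sketch below is illustrative only and does not use any NVIDIA client library. It assumes 16 kHz, 16-bit mono PCM audio, splits that audio into 100 ms chunks the way a streaming client would send them, and folds a sequence of interim and final results into a single transcript. The `Result` type, its `is_final` flag, and the helper names are hypothetical stand-ins for whatever response shape the actual streaming API returns. An offline request would instead send the whole buffer at once and get back one full transcript.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

SAMPLE_RATE_HZ = 16_000  # assumption: 16 kHz mono audio
BYTES_PER_SAMPLE = 2     # assumption: 16-bit linear PCM


def chunk_pcm(audio: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM into fixed-duration chunks, as a streaming client would send them."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(audio), chunk_bytes):
        # The last chunk may be shorter than chunk_bytes.
        yield audio[start:start + chunk_bytes]


@dataclass
class Result:
    """Hypothetical shape of one streaming response (not an NVIDIA API type)."""
    text: str
    is_final: bool


def assemble_transcript(results: Iterable[Result]) -> str:
    """Keep finalized segments; an interim result only replaces the current draft."""
    finals: List[str] = []
    draft = ""
    for r in results:
        if r.is_final:
            finals.append(r.text)
            draft = ""  # the draft has been superseded by a final segment
        else:
            draft = r.text  # interim hypothesis; later results may revise it
    return " ".join(finals + ([draft] if draft else []))
```

For one second of silence, `list(chunk_pcm(bytes(32000)))` yields 10 chunks of 3,200 bytes each. Interim results are treated as replaceable drafts rather than appended, because later hypotheses in a streaming session typically revise earlier ones.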

NVIDIA context

ASR NIM is the deployment path that serves Parakeet-ASR, NVIDIA-Canary, and Nemotron ASR models as NVIDIA-NIM containers. It is also the first stage in common speech pipelines that chain ASR, NVIDIA-NMT-NIM, and NVIDIA-TTS-NIM services.
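A cascaded ASR-to-NMT-to-TTS pipeline is essentially function composition over three services. The sketch below assumes hypothetical callables standing in for the ASR, NMT, and TTS clients. It does not call any real NIM endpoint, and both the stage signatures and the `make_speech_translation_pipeline` name are illustrative assumptions.

```python
from typing import Callable

# Hypothetical stage signatures; real NIM clients (gRPC or HTTP) would sit behind these.
Transcribe = Callable[[bytes], str]    # audio -> source-language text (ASR NIM)
Translate = Callable[[str, str], str]  # (text, target_lang) -> translated text (NMT NIM)
Synthesize = Callable[[str], bytes]    # text -> synthesized audio (TTS NIM)


def make_speech_translation_pipeline(
    asr: Transcribe, nmt: Translate, tts: Synthesize, target_lang: str
) -> Callable[[bytes], bytes]:
    """Chain ASR -> NMT -> TTS so that audio in produces translated audio out."""

    def run(audio: bytes) -> bytes:
        text = asr(audio)
        if not text.strip():
            return b""  # nothing recognized: skip translation and synthesis
        translated = nmt(text, target_lang)
        return tts(translated)

    return run


if __name__ == "__main__":
    # Stub stages for illustration only.
    pipeline = make_speech_translation_pipeline(
        asr=lambda audio: "hello",
        nmt=lambda text, lang: {"de": "hallo"}.get(lang, text),
        tts=lambda text: text.encode("utf-8"),
        target_lang="de",
    )
    print(pipeline(b"\x00\x01"))  # prints b'hallo'
```

Keeping each stage behind a plain callable makes it easy to swap in a different ASR model, or to drop the cascade entirely for a unified speech-to-speech model, without touching the orchestration code.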

Connections

NVIDIA-Speech-NIM-Microservices - parent docs surface for Speech NIMs.
NVIDIA-Riva - broader NVIDIA speech AI platform and historical ASR runtime context.
Nemotron-ASR-Streaming - model-specific page for NVIDIA’s 600M-parameter English streaming ASR model.
Parakeet-ASR - current ASR NIM docs list multiple Parakeet model families.
NVIDIA-Canary - current ASR NIM docs list Canary 1B for transcription and bidirectional translation.
Nemotron - broader model-family context for Nemotron speech models.
NVIDIA-Background-Noise-Removal-NIM - audio cleanup can improve speech intelligibility and ASR accuracy in noisy environments.
NVIDIA-TTS-NIM and NVIDIA-NMT-NIM - downstream services for voice response and translation pipelines.
Nemotron-3-VoiceChat - unified full-duplex speech-to-speech model that avoids a separate ASR-to-LLM-to-TTS cascade for some voice-agent workflows.
TensorRT and Triton-Inference-Server - acceleration and serving layers packaged inside the NIM.

Source Excerpts

NVIDIA docs say ASR NIM converts spoken audio into text and packages pre-trained NeMo models with TensorRT and Triton.
The docs list both streaming and offline inference modes and multiple ASR model families.
