NVIDIA ASR NIM

Type: Microservice
Tags: NVIDIA, NIM, ASR, automatic speech recognition, speech-to-text, Parakeet, Canary, Whisper, Nemotron, Riva
Related: NVIDIA-Speech-NIM-Microservices, NVIDIA-NIM, NVIDIA-Riva, Nemotron-ASR-Streaming, Nemotron-3-VoiceChat, Parakeet-ASR, NVIDIA-Canary, Nemotron, NVIDIA-NeMo, NVIDIA-TTS-NIM, NVIDIA-NMT-NIM, NVIDIA-Background-Noise-Removal-NIM, TensorRT, Triton-Inference-Server
Sources: https://docs.nvidia.com/nim/speech/latest/asr/index.html, https://docs.nvidia.com/nim/speech/latest/about/how-it-works.html
Last Updated: 2026-04-29

Summary

NVIDIA ASR NIM is the Speech NIM microservice for automatic speech recognition. According to the current NVIDIA docs, it converts spoken audio into text and packages pre-trained NeMo models with TensorRT and Triton in self-contained containers that handle model download, optimization, and serving.

Detail

Purpose

ASR NIM gives developers a deployable speech-to-text endpoint for voice assistants, live captions, contact center transcription, media indexing, and multilingual speech pipelines, so they do not have to build the serving stack for each ASR model themselves.

Current scope

  • Streaming mode returns partial transcripts as audio arrives.
  • Offline mode processes complete audio and returns a full transcript.
  • Current docs list Parakeet CTC, Parakeet TDT, Parakeet RNNT Multilingual, Nemotron-ASR-Streaming, Conformer CTC, Whisper Large v3, and Canary 1B options.
  • Model selection guidance covers language coverage, inference mode, timestamps, streaming latency, and translation behavior.
  • Customization docs cover word boosting, custom vocabularies, fine-tuned NeMo checkpoints, and pipeline configuration.
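
The difference between the two inference modes can be illustrated from the client side. The sketch below is a minimal illustration, not the ASR NIM API: `chunk_pcm`, `placeholder_streaming_recognizer`, and `placeholder_offline_recognizer` are hypothetical names, and the 16 kHz, 16-bit mono format and 100 ms chunk size are assumptions chosen for the example. It only shows how a streaming client feeds audio in small pieces and receives one partial result per piece, while an offline client submits the whole buffer and receives a single result.

```python
from typing import Iterable, Iterator


def chunk_pcm(audio: bytes, sample_rate: int = 16000,
              sample_width: int = 2, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw mono PCM bytes into fixed-duration chunks.

    The format defaults (16 kHz, 16-bit) are assumptions for this sketch.
    """
    bytes_per_chunk = sample_rate * sample_width * chunk_ms // 1000
    for start in range(0, len(audio), bytes_per_chunk):
        yield audio[start:start + bytes_per_chunk]


def placeholder_streaming_recognizer(chunks: Iterable[bytes]) -> Iterator[str]:
    """Hypothetical stand-in for a streaming endpoint.

    Yields one partial result per received chunk; a real service would
    return partial transcripts instead of byte counts.
    """
    received = 0
    for chunk in chunks:
        received += len(chunk)
        yield f"partial result after {received} bytes"


def placeholder_offline_recognizer(audio: bytes) -> str:
    """Hypothetical stand-in for an offline endpoint: one result per file."""
    return f"final result for {len(audio)} bytes"


if __name__ == "__main__":
    one_second_of_silence = b"\x00" * (16000 * 2)

    # Streaming: results arrive incrementally as chunks are sent.
    for partial in placeholder_streaming_recognizer(chunk_pcm(one_second_of_silence)):
        print(partial)

    # Offline: the complete audio is sent and a single result comes back.
    print(placeholder_offline_recognizer(one_second_of_silence))
```

In a real client, the placeholder recognizers would be replaced by calls to the deployed service; the chunk size is a tuning choice that trades latency against per-request overhead.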

NVIDIA context

ASR NIM is the NVIDIA-NIM deployment path for Parakeet-ASR, NVIDIA-Canary, and Nemotron ASR models. It is also the first stage in common speech pipelines that chain ASR, NVIDIA-NMT-NIM, and NVIDIA-TTS-NIM services.
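
The chaining of ASR, translation, and speech synthesis can be sketched as plain function composition. This is an assumption-laden illustration rather than any NVIDIA client API: `run_speech_pipeline` and the three `placeholder_*` stages are hypothetical names, and each stage would in practice be a call to the corresponding deployed service.

```python
from typing import Callable


def run_speech_pipeline(audio: bytes,
                        asr: Callable[[bytes], str],
                        nmt: Callable[[str], str],
                        tts: Callable[[str], bytes]) -> bytes:
    """Chain three stages: speech to text, text to translated text, text to speech."""
    transcript = asr(audio)          # stage 1: ASR produces text
    translated = nmt(transcript)     # stage 2: NMT translates the text
    return tts(translated)           # stage 3: TTS synthesizes audio


# Hypothetical placeholder stages so the sketch runs without any service.
def placeholder_asr(audio: bytes) -> str:
    return f"<transcript of {len(audio)} bytes>"


def placeholder_nmt(text: str) -> str:
    return f"<translation of {text}>"


def placeholder_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    output = run_speech_pipeline(b"abc", placeholder_asr,
                                 placeholder_nmt, placeholder_tts)
    print(output)
```

Passing the stages as callables keeps the pipeline independent of how each service is reached, so a stage can be swapped or stubbed in tests without changing the chaining logic.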

Connections

Source Excerpts

  • NVIDIA docs say ASR NIM converts spoken audio into text and packages pre-trained NeMo models with TensorRT and Triton.
  • The docs list both streaming and offline inference modes and multiple ASR model families.

Resources