Parakeet ASR

Type: Model Tags: NVIDIA, ASR, Speech Recognition, NeMo, CTC, RNN-T, Audio AI Related: NVIDIA-NeMo, NVIDIA-Riva, NVIDIA-Speech-NIM-Microservices, NVIDIA-ASR-NIM, NVIDIA-Canary, TensorRT, NVIDIA-NIM Sources: NVIDIA official documentation, https://docs.nvidia.com/nim/speech/latest/asr/index.html Last Updated: 2026-04-29

Summary

Parakeet is NVIDIA’s family of state-of-the-art automatic speech recognition (ASR) models developed in collaboration with Apple, built on the NeMo framework. The Parakeet models achieve top-tier word error rates (WER) on standard English ASR benchmarks and are optimized for real-time and batch transcription workloads. They use CTC and RNN-T decoding architectures built on the FastConformer encoder, and are available as open-weight models on Hugging Face.

Detail

Purpose

High-accuracy, low-latency automatic speech recognition is a critical building block for voice assistants, meeting transcription, call center analytics, medical dictation, and accessibility tools. Parakeet addresses the need for a production-quality open-weight English ASR model that outperforms prior state-of-the-art systems while remaining efficient enough for real-time deployment.

Key Features

FastConformer-based encoder: efficient hybrid CNN-Transformer architecture for long-context audio
Multiple decoding modes: CTC (Parakeet-CTC), RNN-T (Parakeet-RNNT), and TDT (Token-and-Duration Transducer)
Model sizes: 0.6B (Parakeet-CTC-0.6B), 1.1B (Parakeet-TDT-1.1B), and Parakeet-RNNT-1.1B
Word-level timestamps: accurate start/end times for each recognized word
Best-in-class WER on LibriSpeech, MLS, VoxPopuli, Earnings21, and CHIME-6 benchmarks
Trained on 64,000+ hours of English speech data
Support for long-form audio transcription (hours-long audio with chunking)
Open-weight release on Hugging Face (nvidia/parakeet-*)

Use Cases

Real-time meeting transcription and captioning
Call center and contact center analytics
Voice assistant backend for English-language commands
Medical dictation and clinical documentation
Subtitle and caption generation for video content
Accessibility tools for the hearing-impaired
Podcast and audio search indexing

Hardware Requirements / Compatibility

Parakeet-CTC-0.6B: single GPU (T4, A10, A100) or CPU for small batches
Parakeet-1.1B: single A10G / A100 / H100 recommended for real-time workloads
TensorRT-optimized via NVIDIA Riva for production deployment
Runs on NVIDIA Jetson AGX Orin for edge deployment

Language Bindings / APIs

Python (NVIDIA NeMo: nemo.collections.asr)
NVIDIA Riva SDK (gRPC streaming API)
Hugging Face Transformers (AutoModelForCTC, pipeline)
NVIDIA-ASR-NIM microservices for containerized deployment through the current Speech NIM docs

Connections

NVIDIA-NeMo — Parakeet is trained, fine-tuned, and served via the NeMo ASR collection
NVIDIA-Riva — Riva provides production streaming ASR deployment for Parakeet models
NVIDIA-Speech-NIM-Microservices and NVIDIA-ASR-NIM - current docs surface listing Parakeet CTC, TDT, and RNNT ASR NIM options.
NVIDIA-Canary — Canary is NVIDIA’s multilingual complement to Parakeet’s English-focused ASR
TensorRT — Parakeet ASR NIM deployment uses TensorRT/Triton inference acceleration.
NVIDIA-NIM — Available as NIM containers for one-click ASR deployment

AIPS BOOM

Explorer

Parakeet-ASR

Parakeet ASR

Summary

Detail

Purpose

Key Features

Use Cases

Hardware Requirements / Compatibility

Language Bindings / APIs

Connections

Resources

Graph View

Table of Contents

Backlinks