
Parakeet RNNT 0.6B

nvidia/parakeet-rnnt-0.6b

published Dec 2023 · updated Jun 2026

Parakeet RNNT 0.6B is an ASR model that transcribes English speech into lower-case text.

est. price: ~$0.0034 (estimated, set at launch)
API providers: 0
downloads / mo: 36.6K
license: cc-by-4.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: FastConformer-Transducer (RNNT)
Parameters: 0.6B
License: CC-BY-4.0

about this model

Parakeet-RNNT-0.6B is an automatic speech recognition (ASR) model that transcribes English speech into lower-case text. Developed jointly by NVIDIA NeMo and Suno.ai, it is a FastConformer Transducer model with approximately 600 million parameters. The FastConformer architecture is an optimized version of the Conformer model that uses 8x depthwise-separable convolutional downsampling, achieving 2.8x faster inference than the original Conformer while supporting scaling to billion-parameter models.
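To give a rough sense of what 8x downsampling means for sequence length, here is a small arithmetic sketch. It assumes a 10 ms feature hop and a 4x factor for the original Conformer; neither number is stated on this page, so treat both as illustrative assumptions. The function name `encoder_frames` is hypothetical.

```python
import math

def encoder_frames(duration_s: float, hop_ms: int = 10, downsample: int = 8) -> int:
    """Approximate number of frames the encoder processes for one clip.

    hop_ms=10 is an ASSUMED feature hop (not stated in the model description).
    downsample=8 is the 8x convolutional downsampling factor from the text.
    """
    feature_frames = math.ceil(duration_s * 1000 / hop_ms)
    return math.ceil(feature_frames / downsample)

one_minute = 60.0
print(encoder_frames(one_minute, downsample=8))  # 750
print(encoder_frames(one_minute, downsample=4))  # 1500 (assumed 4x baseline)
```

Under these assumptions the encoder sees half as many frames as a 4x baseline. Self-attention cost grows with the square of the sequence length, so a shorter sequence is one plausible contributor to the reported speedup.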

Key Capabilities

The model accepts 16 kHz mono-channel audio as input and outputs transcribed text as a string. It uses a SentencePiece Unigram tokenizer with a vocabulary size of 1024. The model was trained on 64,000 hours of English speech, comprising a private 40,000-hour subset and 24,000 hours from public datasets including LibriSpeech, Fisher Corpus, Switchboard-1, WSJ, VCTK, VoxPopuli, Europarl-ASR, MLS English, Mozilla Common Voice, and People's Speech.
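Since the model expects 16 kHz mono-channel input, it can help to check a file before sending it for transcription. The following is a minimal sketch using Python's standard `wave` module; the helper name `check_wav` is hypothetical and the check covers only uncompressed WAV files.

```python
import wave

EXPECTED_RATE_HZ = 16_000   # 16 kHz, as stated in the model description
EXPECTED_CHANNELS = 1       # mono

def check_wav(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels found, expected {EXPECTED_CHANNELS} (mono)")
    return problems
```

Calling `check_wav("speech.wav")` on a conforming file returns `[]`. A file that fails the check would need to be resampled or downmixed with an audio tool first.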

Benchmark Performance

Word Error Rate (WER) with greedy decoding on standard benchmarks:

Dataset          WER (%)
LS test-clean    1.63
SPGI Speech      3.06
TEDLIUM-v3       3.47
Vox Populi       3.86
Common Voice     8.07
Giga Speech      10.07
Earnings-22      14.78
AMI              17.55

These results use greedy decoding without an external language model. The model can also transcribe long-form audio of up to 11 hours by using limited-context attention with a global token, which is applied after training through fine-tuning.
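For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below computes it with a standard edit-distance recurrence. It is illustrative only: the function name is hypothetical, and the text normalization used for the published figures is not described here, so plain lower-casing and whitespace splitting are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # ASSUMED normalization: lower-case and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)

# One deleted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 1.63% therefore corresponds to fewer than two word errors per hundred reference words.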

Licensing

This model is released under the CC-BY-4.0 license.


FAQ

What is the primary use of Parakeet RNNT 0.6B?

It transcribes English speech into lower-case text for general ASR tasks.

What audio format does the model accept?

It accepts 16 kHz (16,000 Hz) mono-channel WAV files.

How can I use this model via gigarouter?

The model is not yet live on gigarouter. Once it launches, send audio to the gigarouter OpenAI-compatible endpoint with an API key and receive the transcription in the response.
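As a rough illustration of what such a request could look like, the sketch below builds a multipart upload in the shape of the OpenAI transcription API (`POST /audio/transcriptions` with `model` and `file` fields) without sending it. The base URL is a placeholder, and using `nvidia/parakeet-rnnt-0.6b` as the API model id is an assumption; this page publishes neither, and the model is not yet live.

```python
import uuid
import urllib.request

# HYPOTHETICAL values: the real base URL and model id are not published here.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "nvidia/parakeet-rnnt-0.6b"

def build_transcription_request(
    audio_bytes: bytes, filename: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # "model" form field
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n".encode(),
        # "file" form field header, followed by the raw audio bytes
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: audio/wav\r\n\r\n".encode(),
        audio_bytes,
        # closing boundary
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Once a real endpoint exists, passing the returned request to `urllib.request.urlopen` would send it. The response format would need to be confirmed against the provider's documentation.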

What is the license for this model?

It is licensed under CC-BY-4.0.

What is the model's Word Error Rate on standard benchmarks?

It achieves a WER of 1.63% on LibriSpeech test-clean and 14.78% on Earnings-22 with greedy decoding.

not yet live

We're benchmarking and onboarding Parakeet RNNT 0.6B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
