
Parakeet CTC 1.1B

nvidia/parakeet-ctc-1.1b

published Dec 2023 · updated Sep 2025

Parakeet CTC 1.1B is an automatic speech recognition model that transcribes English speech into lower-case text using a FastConformer-CTC architecture with 1.1 billion parameters.

est. price: ~$0.0034 (estimated, set at launch)
API providers: 0
downloads / mo: 781.7K
license: CC-BY-4.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: FastConformer-CTC
Parameters: 1.1B
License: CC-BY-4.0

about this model

Parakeet CTC 1.1B is an automatic speech recognition (ASR) model that transcribes English speech into lower-case text. It is an XXL version of the FastConformer CTC architecture with approximately 1.1 billion parameters, jointly developed by NVIDIA NeMo and Suno.ai.

The model is built on the Fast Conformer architecture, which is 2.8x faster than the original Conformer and scales to billion-parameter models. It is trained with CTC loss and uses a SentencePiece Unigram tokenizer with a vocabulary size of 1024. Long-form speech of up to 11 hours can be transcribed by switching to limited context attention with a global token after training. The Fast Conformer paper was accepted at ASRU 2023.
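To make the CTC part concrete, here is a minimal sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, then drop the blank symbol. The blank index of 1024 (one past the tokenizer vocabulary) and the toy frame ids are assumptions for illustration, not values taken from the model's code or output.

```python
from itertools import groupby

# Assumption: the blank symbol sits right after the 1024 tokenizer ids.
BLANK_ID = 1024


def ctc_greedy_decode(frame_ids, blank_id=BLANK_ID):
    """Collapse consecutive repeated ids, then remove blank ids."""
    collapsed = [token for token, _ in groupby(frame_ids)]
    return [token for token in collapsed if token != blank_id]


# Toy per-frame argmax ids (hypothetical, not real model output).
frames = [1024, 7, 7, 1024, 7, 42, 42, 1024, 1024, 3]
print(ctc_greedy_decode(frames))  # [7, 7, 42, 3]
```

The blank between the two runs of 7 is what lets the same token appear twice in a row in the output. The resulting ids would then be passed to the tokenizer to produce text.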

Training Data

The model was trained on 64,000 hours of English speech: 40,000 hours of private data and 24,000 hours from public datasets, including LibriSpeech, Fisher Corpus, Switchboard-1, WSJ, VCTK, VoxPopuli, Europarl-ASR, Multilingual LibriSpeech (MLS EN), Mozilla Common Voice (v7.0), and People's Speech.

Performance

Word Error Rate (WER%) with greedy decoding (no external language model) on standard benchmarks:

Benchmark                 WER (%)
AMI                       15.62
Earnings-22               13.69
GigaSpeech                10.27
LibriSpeech test-clean    1.83
SPGI Speech               3.54
TEDLIUM-v3                4.20
VoxPopuli                 3.54
Common Voice              6.53

Additional benchmark results are available on the Hugging Face ASR Leaderboard.
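For readers unfamiliar with the metric, the sketch below shows one common way to compute WER: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is an illustration only; the benchmark figures above may have been produced with different text normalization, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One dropped word out of six reference words.
print(f"{word_error_rate('the cat sat on the mat', 'the cat sat on mat'):.2f}")  # 16.67
```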

Key Capabilities

  • Accepts 16 kHz mono-channel audio input
  • Supports transcription of long-form audio up to 11 hours
  • Fast Conformer architecture delivers 2.8x speed improvement over original Conformer
  • Licensed under CC-BY-4.0
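A rough, back-of-the-envelope sketch of why limited context attention matters for 11-hour audio. The 10 ms feature hop, the 8x encoder downsampling, and the 128-frame left and right attention window are assumptions chosen for illustration; none of them is stated on this page.

```python
HOP_MS = 10           # assumption: one feature frame every 10 ms
SUBSAMPLING = 8       # assumption: encoder downsamples features by 8x
CONTEXT = (128, 128)  # hypothetical left/right local attention window


def encoder_frames(duration_s: int) -> int:
    """Number of encoder frames for a clip of the given length in seconds."""
    feature_frames = (duration_s * 1000) // HOP_MS
    return feature_frames // SUBSAMPLING


def attention_pairs(frames: int, context=None) -> int:
    """Query-key pairs scored per layer (edge effects ignored)."""
    if context is None:
        return frames * frames  # full attention: every frame sees every frame
    left, right = context
    return frames * (left + right + 1)  # local attention: fixed window per frame


frames = encoder_frames(11 * 3600)
full = attention_pairs(frames)
local = attention_pairs(frames, CONTEXT)
print(frames)        # 495000
print(full // local)  # 1926
```

Under these assumptions, full attention scores roughly 1,900 times more frame pairs per layer than windowed attention, which is why quadratic attention becomes impractical at this length.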

FAQ

What input format does the model require?

It accepts 16 kHz mono-channel WAV audio as input.
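A minimal sketch for checking a file against this format before sending it anywhere, using only Python's standard `wave` module. The helper names `check_wav` and `silent_wav` are hypothetical and exist only for this example; `silent_wav` just builds test audio in memory.

```python
import io
import wave

EXPECTED_RATE_HZ = 16_000
EXPECTED_CHANNELS = 1


def check_wav(source):
    """Return a list of mismatches against 16 kHz mono; empty means OK.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def silent_wav(rate_hz, channels, seconds=1):
    """Build an in-memory 16-bit PCM WAV of silence (demo data only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * channels * seconds)
    buffer.seek(0)
    return buffer


print(check_wav(silent_wav(16_000, 1)))  # []
print(check_wav(silent_wav(44_100, 2)))
```

Audio that fails the check would need to be resampled and downmixed with an audio tool of your choice before transcription.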

What output does the model produce?

It outputs transcribed speech as a lowercase English string.

How does this model compare in speed to the original Conformer?

The FastConformer architecture is 2.8x faster than the original Conformer.

What is the license for this model?

It is licensed under CC-BY-4.0.

How can I call this model via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key to send audio and receive transcriptions.
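Since the model is not hosted yet, the following is only a sketch of what such a request might look like. It assumes the endpoint follows the OpenAI convention of a multipart POST to `/audio/transcriptions` with `file` and `model` fields. The base URL is a made-up placeholder, and using `nvidia/parakeet-ctc-1.1b` as the model id is an assumption. The code builds the request but does not send it.

```python
import urllib.request
import uuid

BASE_URL = "https://gigarouter.example/v1"  # hypothetical placeholder URL
MODEL_ID = "nvidia/parakeet-ctc-1.1b"       # assumption: id used by the API


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a multipart transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model id.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field carrying the audio file.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_transcription_request(b"RIFF", "clip.wav", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

When the endpoint exists, the request could be sent with `urllib.request.urlopen(request)`. Check the gigarouter documentation at launch for the actual URL, model id, and response format.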

not yet live

We're benchmarking and onboarding Parakeet CTC 1.1B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
