Question 1

What is this model best for?

Accepted Answer

It excels at transcribing long audio segments (up to 24 minutes) with word-level timestamps, automatic punctuation, and high accuracy on numbers and song lyrics.

Question 2

How does Parakeet TDT 0.6B V2 compare in size and speed?

Accepted Answer

With 600 million parameters, it is compact yet fast: it achieves an inverse real-time factor (RTFx) of 3380 at a batch size of 128 on the HF-Open-ASR leaderboard, meaning it processes audio roughly 3380 times faster than real time in that setup.
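
As a rough illustration of what that figure implies, RTFx is conventionally the ratio of audio duration to processing time, so higher is faster. The sketch below applies that definition. It assumes the leaderboard number carries over to your own hardware and batch size, which it may not, and the function name is purely illustrative.

```python
def processing_seconds(audio_seconds: float, rtfx: float) -> float:
    """Estimate compute time from RTFx = audio duration / processing time."""
    if rtfx <= 0:
        raise ValueError("rtfx must be positive")
    return audio_seconds / rtfx


# One hour of audio at the reported RTFx of 3380 (measured at batch size 128).
estimate = processing_seconds(3600, 3380)
print(f"{estimate:.2f} s")  # 1.07 s
```

Because the figure was measured with batched inference, it is better read as aggregate throughput than as the latency of a single request.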

Question 3

What are the license terms?

Accepted Answer

The model is released under the CC-BY-4.0 license, which permits use, modification, and redistribution, including for commercial purposes, provided attribution is given.

Question 4

What input and output formats does the model support?

Accepted Answer

It accepts raw audio (e.g., WAV files) and outputs text with word-level timestamps, automatic punctuation, and capitalization.

Question 5

How can I call this model via the gigarouter API?

Accepted Answer

Send a request to the gigarouter OpenAI-compatible endpoint, authenticating with your API key. Include the audio file in the request; the response contains the transcribed text.
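
A minimal sketch of such a call, using only the Python standard library, might look like the following. It assumes the endpoint mirrors OpenAI's audio transcription route (a multipart POST to `/audio/transcriptions` with `file` and `model` fields and a JSON response containing `text`). The base URL and model identifier are placeholders, not documented gigarouter values, so check the provider's documentation for the real ones.

```python
import json
import os
import urllib.request
import uuid

# Placeholders (assumptions): replace with the values from the gigarouter docs.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "parakeet-tdt-0.6b-v2"


def build_transcription_request(audio: bytes, filename: str, api_key: str,
                                base_url: str = BASE_URL,
                                model: str = MODEL_ID) -> urllib.request.Request:
    """Build a multipart POST in the style of OpenAI's transcription route."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str) -> str:
    """Upload the audio file and return the "text" field of the JSON response."""
    with open(path, "rb") as fh:
        request = build_transcription_request(
            fh.read(), os.path.basename(path), api_key
        )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["text"]
```

With a valid key, `transcribe("meeting.wav", api_key)` would return the transcript as a string. Whether word-level timestamps can be requested through this route depends on the provider and is not covered here.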

Task	Automatic Speech Recognition (ASR)
Architecture	FastConformer with TDT decoder
Parameters	600M
License	CC-BY-4.0

Dataset	WER (%)
LibriSpeech clean	1.69
LibriSpeech other	3.19
SPGISpeech	2.17
TED-LIUM v3	3.38
VoxPopuli	5.95
GigaSpeech test	9.74
Earnings-22	11.15
AMI Meetings	11.16
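
For readers unfamiliar with the metric, word error rate is conventionally the word-level edit distance (substitutions, deletions, and insertions) divided by the number of words in the reference transcript. The sketch below shows that calculation. It applies no text normalization, which leaderboards typically do before scoring, so it would not reproduce the figures above exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
wer = word_error_rate("the cat sat down", "the cat sat town")
print(f"{100 * wer:.2f}")  # 25.00
```

Lower is better: a WER of 1.69% on LibriSpeech clean corresponds to fewer than two word errors per hundred reference words.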

Parakeet TDT 0.6B V2
