gigarouter
speech-to-text · coming soon

Whisper Tiny

openai/whisper-tiny

published Sep 2022 · updated Feb 2024

Whisper Tiny is an automatic speech recognition (ASR) model that transcribes and translates speech across multiple languages using a Transformer encoder-decoder architecture trained on 680k hours of weakly supervised data.

est. price: ~$0.0034 (estimated, set at launch)
API providers: 0
downloads / mo: 1.6M
license: apache-2.0

specs

Task: Automatic Speech Recognition (ASR) & Speech Translation
Architecture: Transformer encoder-decoder (sequence-to-sequence)
Parameters: 39 M
License: MIT

about this model

openai/whisper-tiny is an automatic speech recognition (ASR) model that transcribes audio to text and can also perform speech translation, trained on 680,000 hours of weakly supervised multilingual data.

Architecture and training

Whisper uses a Transformer encoder-decoder (sequence-to-sequence) architecture. The model was trained on 680k hours of labelled speech: 65% English audio with English transcripts (438k hours), 18% non-English audio with English transcripts (126k hours), and 17% non-English audio with transcripts in the same language (117k hours). The non-English data covers 98 languages. Whisper was proposed in the paper Robust Speech Recognition via Large-Scale Weak Supervision (Radford et al., OpenAI).

Key strengths and benchmarks

Whisper Tiny generalizes to many domains without fine-tuning; on LibriSpeech test-clean it reaches a word error rate (WER) of 7.55%. With 39 million parameters, the tiny variant needs approximately 1 GB of VRAM and runs about 10x faster than the large model. It supports both transcription (output in the language spoken in the audio) and translation (output in English). Audio longer than 30 seconds can be transcribed by splitting it into 30-second segments, optionally with timestamp prediction.
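Whisper consumes a fixed 30-second window of 16 kHz audio, i.e. 16,000 × 30 = 480,000 samples per window. The sketch below shows the simplest form of chunking, assuming non-overlapping windows with the last one zero-padded to full length. It is an illustration only: the openai-whisper package's own long-form `transcribe()` advances its window using predicted timestamps rather than fixed cuts, and other pipelines overlap chunks so words are not split at boundaries. The names `chunk_waveform`, `SAMPLE_RATE`, and `CHUNK_SAMPLES` are hypothetical.

```python
from typing import List

SAMPLE_RATE = 16_000                         # Whisper expects 16 kHz mono audio
CHUNK_SECONDS = 30                           # length of Whisper's input window
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per window


def chunk_waveform(samples: List[float], chunk_samples: int = CHUNK_SAMPLES) -> List[List[float]]:
    """Split a mono waveform into fixed, non-overlapping windows.

    The final window is zero-padded (silence) so every window has the same
    length. This is a simplified scheme, not Whisper's own long-form logic.
    """
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = samples[start:start + chunk_samples]
        # Pad only the last, shorter chunk; full chunks get an empty padding list.
        chunk = chunk + [0.0] * (chunk_samples - len(chunk))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    seventy_seconds = [0.0] * (70 * SAMPLE_RATE)
    windows = chunk_waveform(seventy_seconds)
    print(len(windows), [len(w) for w in windows])
```

For 70 seconds of audio this yields three windows: two full ones and a third holding 10 seconds of speech followed by 20 seconds of padding.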

Model sizes

Size       Parameters   English-only   Multilingual
tiny       39 M         yes            yes
base       74 M         yes            yes
small      244 M        yes            yes
medium     769 M        yes            yes
large      1550 M       no             yes
large-v2   1550 M       no             yes
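The parameter counts in the table allow a rough check of the ~1 GB VRAM figure quoted for the tiny model. The sketch below computes only the memory taken by the weights, assuming 4 bytes per parameter in fp32 and 2 bytes in fp16, and using the rounded counts from the table. For tiny this gives about 156 MB (fp32) or 78 MB (fp16), so the remainder of the ~1 GB figure presumably covers activations, decoding state, and framework overhead. The helper `weight_megabytes` is hypothetical.

```python
# Rounded parameter counts from the table, in millions.
PARAMS_M = {
    "tiny": 39,
    "base": 74,
    "small": 244,
    "medium": 769,
    "large": 1550,
    "large-v2": 1550,
}

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_megabytes(size: str, dtype: str = "fp16") -> int:
    """Weights-only memory in decimal megabytes (1 MB = 10**6 bytes).

    Millions of parameters times bytes per parameter is directly megabytes.
    Activations and runtime overhead are NOT included.
    """
    return PARAMS_M[size] * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    for name in PARAMS_M:
        print(f"{name:9s} fp32 {weight_megabytes(name, 'fp32'):5d} MB   fp16 {weight_megabytes(name):5d} MB")
```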

Known limitations

Due to weak supervision on noisy data, the model may produce hallucinations (text not present in the audio). Accuracy varies by language, with lower performance on low-resource languages that have less training data.


FAQ

What is the input format for the Whisper Tiny API?

The API accepts audio as a file upload or a base64-encoded PCM 16-bit mono 16 kHz waveform. The model internally converts audio to log-Mel spectrograms.
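Before the log-Mel step, a raw PCM payload has to be turned into a floating-point waveform. The sketch below decodes a base64 string of 16-bit mono samples into floats in [-1.0, 1.0), dividing by 32768 in the same way openai-whisper's `load_audio` scales its int16 output. The byte order is not stated above; little-endian is an assumption here. Both helper names are hypothetical.

```python
import base64
import struct
from typing import List


def decode_pcm16_base64(payload: str) -> List[float]:
    """Decode base64 16-bit PCM (assumed little-endian, mono) into floats."""
    raw = base64.b64decode(payload)
    if len(raw) % 2:
        raise ValueError("PCM16 payload must contain an even number of bytes")
    count = len(raw) // 2
    ints = struct.unpack(f"<{count}h", raw)  # '<h' = little-endian signed 16-bit
    # Dividing by 32768 maps the most negative sample to exactly -1.0.
    return [s / 32768.0 for s in ints]


def encode_pcm16_base64(samples: List[float]) -> str:
    """Inverse helper: scale floats to int16, clip to range, base64-encode."""
    ints = [max(-32768, min(32767, round(s * 32768.0))) for s in samples]
    return base64.b64encode(struct.pack(f"<{len(ints)}h", *ints)).decode("ascii")
```

A round trip such as `decode_pcm16_base64(encode_pcm16_base64([0.0, 0.5, -1.0]))` returns the original values, since each is exactly representable in 16 bits.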

How does Whisper Tiny compare in speed to larger Whisper variants?

Whisper Tiny is the smallest and fastest Whisper model: it runs roughly 10x faster than large and requires about 1 GB of VRAM.

What languages does Whisper Tiny support?

It supports 98 languages for speech recognition and can translate from many of those languages into English. Performance varies by language, especially for low-resource ones.
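Because Whisper transcribes in the spoken language and translates only into English, the desired output language determines which endpoint to call. The sketch below encodes that rule, assuming ISO 639-1 language codes and the two paths named in the next answer. The function `pick_endpoint` is hypothetical, not part of any gigarouter SDK.

```python
def pick_endpoint(source_lang: str, target_lang: str) -> str:
    """Choose the OpenAI-style audio path for a desired output language.

    source_lang: ISO 639-1 code of the speech (e.g. "de").
    target_lang: ISO 639-1 code of the text you want back.
    """
    if source_lang == target_lang:
        # Transcription keeps the language spoken in the audio.
        return "/v1/audio/transcriptions"
    if target_lang == "en":
        # Translation always produces English text.
        return "/v1/audio/translations"
    raise ValueError(
        f"cannot produce {target_lang!r} from {source_lang!r} audio: "
        "Whisper only translates into English"
    )
```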

How can I call the Whisper Tiny model on gigarouter?

Use the OpenAI-compatible endpoint with your gigarouter API key, sending a POST request to the /v1/audio/transcriptions or /v1/audio/translations path with the audio file.
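A minimal sketch of such a request using only the Python standard library: it builds (but does not send) a multipart/form-data POST with a `model` field and a `file` field, following the OpenAI audio API convention. The host `https://api.example.com` is a placeholder, and the model identifier `openai/whisper-tiny` is an assumption; check gigarouter's documentation for the real base URL and model name once the model is live.

```python
import urllib.request
import uuid


def build_transcription_request(
    base_url: str,
    api_key: str,
    audio_bytes: bytes,
    filename: str = "audio.wav",
    model: str = "openai/whisper-tiny",  # assumed model identifier
    translate: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a multipart POST for an OpenAI-style audio endpoint."""
    path = "/v1/audio/translations" if translate else "/v1/audio/transcriptions"
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field carrying the model identifier.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Once the endpoint exists, the request would be sent with `urllib.request.urlopen(req)`; an OpenAI-compatible server typically returns JSON with a `text` field.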

Is the model fine-tunable or available for local deployment?

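Yes. The MIT license allows free use, modification, and distribution, and the weights can be fine-tuned, for example with Hugging Face Transformers. For local deployment, the openai-whisper Python package runs the model on a GPU or, more slowly, on a CPU; the tiny variant is small enough that a GPU is optional. gigarouter is also onboarding it as a hosted API.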
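A minimal local sketch, assuming the openai-whisper package (`pip install openai-whisper`, which pulls in PyTorch) and the `ffmpeg` command-line tool are installed. It uses the package's `load_model` and `transcribe` calls; the wrapper name `transcribe_locally` and the file name in the usage line are hypothetical. The imports sit inside the function so the module loads even where the package is absent.

```python
def transcribe_locally(path: str, translate: bool = False) -> str:
    """Run Whisper Tiny locally with the openai-whisper package.

    Needs ffmpeg on PATH for audio decoding. Uses a GPU if one is visible
    to PyTorch, otherwise falls back to the CPU.
    """
    import torch
    import whisper  # provided by the openai-whisper package

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = whisper.load_model("tiny", device=device)
    result = model.transcribe(
        path,
        task="translate" if translate else "transcribe",
        fp16=(device == "cuda"),  # half precision is not used on CPU
    )
    return result["text"]
```

Usage: `print(transcribe_locally("meeting.mp3"))` downloads the tiny checkpoint on first run and prints the transcript; pass `translate=True` to get English output instead.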

not yet live

We're benchmarking and onboarding Whisper Tiny as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
