Question 1

What is the primary use case for Whisper Small English?

Accepted Answer

It is designed for English-only automatic speech recognition: it transcribes spoken English audio to text and is robust to accents, background noise, and technical language.

Question 2

How does Whisper Small English compare in size and speed to larger Whisper models?

Accepted Answer

It has 244M parameters, requires about 2 GB of VRAM, and runs approximately 4x faster than the Whisper large model when measured on an A100 GPU.

Question 3

What input format does the model expect?

Accepted Answer

It expects audio sampled at 16 kHz and converted to log-Mel spectrograms. The model processes up to 30 seconds of audio at a time; longer recordings can be handled by splitting them into chunks.
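
As a rough illustration of the chunking described above, here is a minimal sketch that splits a buffer of 16 kHz samples into consecutive windows of at most 30 seconds. The function name `split_into_chunks` is hypothetical, and the windows do not overlap, which is an assumption made for simplicity; real pipelines often overlap neighbouring chunks so that words are not cut at the boundary.

```python
SAMPLE_RATE = 16_000   # Hz, the sampling rate the model expects
CHUNK_SECONDS = 30     # longest stretch of audio the model handles at once


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split audio samples into consecutive, non-overlapping windows.

    Every window holds at most `chunk_seconds` of audio; the final
    window may be shorter. An empty input yields an empty list.
    """
    chunk_len = sample_rate * chunk_seconds
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


# 70 seconds of silence stands in for real audio here.
audio = [0.0] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)

# Duration of each chunk in seconds.
print([len(chunk) / SAMPLE_RATE for chunk in chunks])
```

Each chunk would then be converted to a log-Mel spectrogram and transcribed separately, with the resulting texts joined in order.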

Question 4

How can I use this model via the gigarouter API?

Accepted Answer

Send requests to the gigarouter OpenAI-compatible endpoint with an API key and the audio data in the request body. The response will contain the transcribed text.
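
A minimal sketch of such a request, using only the Python standard library, might look like the following. Several details are assumptions rather than documented gigarouter values: the base URL `https://gigarouter.example/v1` and the model ID `whisper-small-en` are placeholders, and the `/audio/transcriptions` path, the multipart `file` and `model` fields, and the `text` response field follow the OpenAI transcription convention that an OpenAI-compatible endpoint would normally mirror. Check the gigarouter documentation for the real values.

```python
import json
import urllib.request
import uuid

BASE_URL = "https://gigarouter.example/v1"  # placeholder, not the real endpoint
MODEL_ID = "whisper-small-en"               # hypothetical model identifier


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a multipart POST carrying the audio file."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode()
    file_part = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + audio_bytes + b"\r\n"
    closing = f"--{boundary}--\r\n".encode()

    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=model_part + file_part + closing,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(request):
    """Send the request and return the transcribed text (assumes a JSON `text` field)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]


# Placeholder bytes stand in for the contents of a real audio file.
request = build_transcription_request(b"\x00\x01", "sample.wav", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Building the request separately from sending it makes it possible to inspect the URL and headers before any network call is made; `transcribe(request)` would perform the actual call once the placeholders are replaced.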

Question 5

Does the model support timestamps in the transcription?

Accepted Answer

Yes. Running the pipeline with `return_timestamps=True` returns segment-level timestamps for each transcribed phrase.
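
The sketch below assumes "the pipeline" refers to the Hugging Face Transformers automatic-speech-recognition pipeline and that the checkpoint is `openai/whisper-small.en`; neither is stated in the answer itself. With `return_timestamps=True`, that pipeline returns a dictionary whose `chunks` list pairs each text segment with a `(start, end)` tuple in seconds. The helper `format_segments` is a hypothetical name, and the example dictionary is hand-written to show the shape, not real model output.

```python
def transcribe_with_timestamps(audio_path):
    """Run the ASR pipeline on a file and request segment-level timestamps."""
    # Imported lazily so the formatting helper below works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model="openai/whisper-small.en")
    return asr(audio_path, return_timestamps=True)


def format_segments(result):
    """Turn the pipeline's `chunks` list into one readable line per segment."""
    lines = []
    for chunk in result["chunks"]:
        start, end = chunk["timestamp"]
        # The end time can be missing when audio is cut off mid-segment.
        end_text = "?" if end is None else f"{end:.2f}s"
        lines.append(f"[{start:.2f}s - {end_text}] {chunk['text'].strip()}")
    return lines


# Hand-written data in the pipeline's output shape, for illustration only.
example = {
    "text": " Hello there. How are you?",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " Hello there."},
        {"timestamp": (1.5, 3.0), "text": " How are you?"},
    ],
}

for line in format_segments(example):
    print(line)
```

In practice, `format_segments(transcribe_with_timestamps("audio.wav"))` would produce the same kind of listing for a real recording, where `audio.wav` is a placeholder path.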

Task	Automatic Speech Recognition (ASR)
Architecture	Transformer encoder-decoder (sequence-to-sequence)
Parameters	244 million
Language	English-only
