Faster Whisper Base

Systran/faster-whisper-base

published Nov 2023 · updated Nov 2023

Faster Whisper Base is an ASR model that transcribes speech to text using OpenAI's Whisper base architecture optimized with CTranslate2 for faster inference.

status

coming soon


downloads / mo

1.4M

license

mit

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Whisper Base (Transformer)
License	MIT (original openai/whisper-base: Apache 2.0)
Quantization	FP16 (weights saved in float16)
Supported Languages	99 languages (same as original Whisper base)

about this model

Systran/faster-whisper-base is an automatic speech recognition (ASR) model that converts spoken language into text. It is the openai/whisper-base model optimized for efficient inference using the CTranslate2 runtime, which powers the faster-whisper library. The model weights are converted to the CTranslate2 format with FP16 quantization, and the runtime applies performance optimizations such as layer fusion, padding removal, and batch reordering to accelerate inference and reduce memory usage on both CPU and GPU.
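Storing weights in FP16 halves their size compared with FP32, at the cost of some precision. The sketch below illustrates this with Python's standard `struct` module and made-up weight values. It is not how CTranslate2 performs the conversion internally. It only shows what float16 storage means for individual weights.

```python
import struct

# Hypothetical weight values, chosen only to illustrate the conversion.
weights = [0.1234567, -1.9876543, 3.0e-5, 0.5]
n = len(weights)

# "<f" packs IEEE float32 (4 bytes each); "<e" packs IEEE float16 (2 bytes each).
fp32_bytes = struct.pack(f"<{n}f", *weights)
fp16_bytes = struct.pack(f"<{n}e", *weights)

# Unpack the float16 bytes to see which values survive the rounding.
fp16_values = struct.unpack(f"<{n}e", fp16_bytes)

print(len(fp32_bytes), len(fp16_bytes))  # 16 8
for original, stored in zip(weights, fp16_values):
    print(f"{original:+.7f} -> {stored:+.7f}  (abs error {abs(original - stored):.1e})")
```

Values such as 0.5 are stored exactly. Others are rounded to the nearest representable float16, which keeps roughly three significant decimal digits.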

Benchmark performance

On the LibriSpeech test sets, the original Whisper base model achieves the following word error rates (WER):

test-clean: 5.01%
test-other: 12.85%

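On the Common Voice 11.0 Hindi test set, the original model card reports a WER of 131%. WER can exceed 100% because inserted words also count as errors, so this result means the base model performs very poorly on Hindi. The original openai/whisper-base model is released under the Apache 2.0 license.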
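WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. A minimal sketch of that formula, using hypothetical example sentences, shows how insertions alone can push WER past 100%. It skips the text normalization that published Whisper evaluations apply before scoring, so its numbers are not directly comparable to the figures above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dist[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution out of six reference words: 1/6, about 0.167.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
# Four inserted words against a two-word reference: 4/2 = 2.0, i.e. 200% WER.
print(word_error_rate("hello world", "uh hello there big world today"))
```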

Model size with quantization

CTranslate2 supports several precision levels (compute types) that substantially reduce a model's storage footprint with little loss in accuracy. The table below, taken from the CTranslate2 quantization docs, lists the disk size of a converted Transformer base model for each compute type; it illustrates the relative savings rather than the exact size of Whisper base. When a compute type is not natively supported on the target hardware, CTranslate2 automatically falls back to a compatible alternative.

Compute type	Size
float32	364 MB
int16	187 MB
float16	182 MB
bfloat16	182 MB
int8_float32	100 MB
int8_float16	95 MB
int8_bfloat16	95 MB
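As a rough cross-check, weight storage scales with bytes per parameter: 4 for float32, 2 for the 16-bit types, and about 1 for the int8 variants. The sketch below applies that rule to Whisper base, assuming the roughly 74M parameter count reported in the Whisper paper. It ignores int8 scale factors, tensors kept at higher precision, and file metadata, so real files will be somewhat larger than these estimates.

```python
# Approximate bytes per weight for each CTranslate2 compute type.
# For the int8 variants this counts only the 8-bit weights themselves;
# scales and tensors kept in float32/float16 are ignored.
BYTES_PER_WEIGHT = {
    "float32": 4,
    "int16": 2,
    "float16": 2,
    "bfloat16": 2,
    "int8_float32": 1,
    "int8_float16": 1,
    "int8_bfloat16": 1,
}


def estimate_size_mb(num_params: int, compute_type: str) -> float:
    """Rough lower bound on weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[compute_type] / 1e6


# Assumption: Whisper base has about 74M parameters (Whisper paper, Table 1).
WHISPER_BASE_PARAMS = 74_000_000

for compute_type in BYTES_PER_WEIGHT:
    size = estimate_size_mb(WHISPER_BASE_PARAMS, compute_type)
    print(f"{compute_type:>14}: ~{size:.0f} MB")
```

The documented int8 sizes (95 to 100 MB) sit slightly above a quarter of the float32 size (364 MB), most likely for the reason noted in the comments: not every tensor is stored as 8-bit integers.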

Once it is live as a hosted API on gigarouter, this model will be available for direct, OpenAI-compatible integration without local installation or model conversion.

best for

·Real-time speech transcription for meetings or calls
·Multilingual audio transcription (e.g., podcasts, videos)
·Voice-controlled applications requiring low-latency ASR

FAQ

How does this model differ from the original OpenAI Whisper base?

This model is converted to CTranslate2 format with FP16 quantization, enabling faster inference and lower memory usage.

What languages does it support?

It supports the same 99 languages as the original Whisper base model, including English, Chinese, Spanish, and more.

How can I use this model via the gigarouter API?

Once the model is live, send audio to the OpenAI-compatible /v1/audio/transcriptions route, authenticating with your API key.
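Because the endpoint is not live yet, the following is only a sketch of what an OpenAI-style transcription call could look like using the Python standard library. The base URL `https://api.gigarouter.example/v1` and the model ID `Systran/faster-whisper-base` are placeholder assumptions, as is the `text` field in the JSON response (the default shape of OpenAI's transcription response).

```python
import json
import urllib.request
import uuid

# Hypothetical values: the real gigarouter base URL and model ID may differ.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "Systran/faster-whisper-base"


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-style /audio/transcriptions route."""
    boundary = uuid.uuid4().hex
    parts = [
        # Text field: which model to run.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{MODEL_ID}\r\n").encode(),
        # File field: the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode() + audio + b"\r\n",
        # Closing boundary marks the end of the form.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


def transcribe(audio: bytes, filename: str, api_key: str) -> str:
    """Send the request and return the assumed "text" field of the JSON response."""
    request = build_transcription_request(audio, filename, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

For example, `transcribe(open("meeting.wav", "rb").read(), "meeting.wav", "YOUR_API_KEY")` would return the transcript text once the service is available. Building the multipart body by hand keeps the sketch dependency-free; an OpenAI-compatible client SDK configured with a custom base URL would do the same job.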

What is the license for this model?

The converted model is listed under the MIT license; the original openai/whisper-base weights are released under Apache 2.0.

What is the expected word error rate (WER) on LibriSpeech?

On LibriSpeech test-clean, the original model achieves 5.01% WER; on test-other, 12.85% WER.

not yet live

We're benchmarking and onboarding Faster Whisper Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models


speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo