Faster Whisper Base
Systran/faster-whisper-base
published Nov 2023 · updated Nov 2023
Faster Whisper Base is an ASR model that transcribes speech to text using OpenAI's Whisper base architecture optimized with CTranslate2 for faster inference.
specs
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Whisper Base (Transformer) |
| License | Apache 2.0 |
| Quantization | FP16 (weights saved in float16) |
| Supported Languages | 99 languages (same as original Whisper base) |
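The spec table says the weights are stored in float16. The Python standard library's `struct` module can pack a value as IEEE 754 half precision (format `'e'`), which shows what that means for one number: half the bytes of float32, plus a small rounding error. This is a generic illustration of float16 storage, not CTranslate2's conversion code.

```python
import struct

def roundtrip_float16(x: float) -> float:
    """Pack x as IEEE 754 half precision, then unpack it back to a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

fp32_bytes = struct.calcsize("<f")  # bytes per float32 value
fp16_bytes = struct.calcsize("<e")  # bytes per float16 value

weight = 0.1
stored = roundtrip_float16(weight)

print(f"float32: {fp32_bytes} bytes, float16: {fp16_bytes} bytes")
print(f"{weight} stored as float16 -> {stored!r} (abs error {abs(stored - weight):.2e})")
```

Here 0.1 comes back as 0.0999755859375: the storage halves while each weight loses some precision.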
about this model
Systran/faster-whisper-base is an automatic speech recognition (ASR) model that converts spoken language into text. It is the openai/whisper-base model optimized for efficient inference using the CTranslate2 runtime, which powers the faster-whisper library. The model weights are converted to the CTranslate2 format with FP16 quantization, and the runtime applies performance optimizations such as layer fusion, padding removal, and batch reordering to accelerate inference and reduce memory usage on both CPU and GPU.
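Batch reordering, one of the optimizations named above, is easiest to see as a padding problem: in a batch of variable-length inputs, every item is padded to the longest one. The minimal sketch below assumes reordering means sorting inputs by length before batching (a common approach; CTranslate2's internal implementation may differ). It counts padded tokens with and without reordering and restores the original order afterwards. The lengths and batch size are made-up example values.

```python
from typing import List, TypeVar

T = TypeVar("T")

def make_batches(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Split sequence lengths into consecutive batches of at most batch_size."""
    return [lengths[i:i + batch_size] for i in range(0, len(lengths), batch_size)]

def padded_tokens(batches: List[List[int]]) -> int:
    """Count padding tokens when each batch is padded to its longest sequence."""
    return sum(max(b) * len(b) - sum(b) for b in batches if b)

def reorder_by_length(lengths: List[int]) -> List[int]:
    """Indices that sort inputs by length, so similar lengths share a batch."""
    return sorted(range(len(lengths)), key=lambda i: lengths[i])

def restore_order(outputs: List[T], order: List[int]) -> List[T]:
    """Put results computed in sorted order back into the caller's original order."""
    restored = list(outputs)
    for pos, original_index in enumerate(order):
        restored[original_index] = outputs[pos]
    return restored

lengths = [12, 3, 11, 4, 10, 2]  # hypothetical token counts
batch_size = 2

naive = make_batches(lengths, batch_size)
order = reorder_by_length(lengths)
sorted_lengths = [lengths[i] for i in order]
reordered = make_batches(sorted_lengths, batch_size)

print("padding without reordering:", padded_tokens(naive))
print("padding with reordering:   ", padded_tokens(reordered))
print("restored:", restore_order(sorted_lengths, order))
```

With these values, padding drops from 24 to 8 tokens; `restore_order` is needed because callers expect results in the order they submitted them.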
Benchmark performance
On the LibriSpeech test sets, the original Whisper base model (openai/whisper-base) achieves the following word error rates (WER):
- test-clean: 5.01%
- test-other: 12.85%
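On the Common Voice 11.0 Hindi test set, the WER is 131% (source: original model card). WER can exceed 100% because inserted words also count as errors, so this figure indicates very poor accuracy on Hindi. The model is released under the Apache 2.0 license.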
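WER is the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain dynamic-programming implementation. It applies none of the text normalization used in published Whisper evaluations, so its numbers are not directly comparable to the figures above.

```python
from typing import List

def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev holds edit distances from the previous reference prefix to every hypothesis prefix
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat", "the cat sat"))
print(word_error_rate("the cat sat", "the bat sat"))
print(word_error_rate("hello world", "uh hello there big world again"))
```

The last call has four insertions against a two-word reference, giving a WER of 2.0 (200%). Errors like these push a score above 100%.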
Model size with quantization
CTranslate2 supports several precision levels (compute types) that substantially reduce a model's storage footprint, usually with only a small loss in accuracy. The table below, taken from the CTranslate2 quantization docs, shows the disk size of an example base Transformer model for each compute type. It illustrates relative savings and is not a measurement of this Whisper checkpoint. When a compute type is not natively supported on the target hardware, CTranslate2 automatically falls back to a compatible alternative.
| Compute type | Size |
|---|---|
| float32 | 364 MB |
| int16 | 187 MB |
| float16 | 182 MB |
| bfloat16 | 182 MB |
| int8_float32 | 100 MB |
| int8_float16 | 95 MB |
| int8_bfloat16 | 95 MB |
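To read the table as savings rather than absolute sizes, each entry can be expressed relative to float32. The snippet only reuses the numbers listed above.

```python
# Disk sizes from the compute-type table, in MB.
SIZES_MB = {
    "float32": 364,
    "int16": 187,
    "float16": 182,
    "bfloat16": 182,
    "int8_float32": 100,
    "int8_float16": 95,
    "int8_bfloat16": 95,
}

def relative_size(compute_type: str, baseline: str = "float32") -> float:
    """Fraction of the baseline's disk size used by compute_type."""
    return SIZES_MB[compute_type] / SIZES_MB[baseline]

for name, size in SIZES_MB.items():
    print(f"{name:>14}: {size:>4} MB ({relative_size(name):.0%} of float32)")
```

The 16-bit types come in at about half of float32 and the int8 variants at roughly a quarter. In the faster-whisper library, the compute type is chosen when the model is loaded, through the `compute_type` argument of `WhisperModel`.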
Once onboarded as a hosted API on gigarouter, this model will be available for direct, OpenAI-compatible integration without local installation or model conversion.
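For OpenAI-compatible integration, the FAQ points to the `/v1/audio/transcriptions` route. Below is a minimal standard-library sketch of such a request. It assumes the OpenAI convention of a multipart form with `model` and `file` fields and a bearer token. `BASE_URL` is a hypothetical placeholder. `MODEL_ID` is assumed to match the Hugging Face repository name. This page confirms neither value.

```python
import urllib.request
import uuid

# Hypothetical placeholders: substitute the real base URL and model ID once published.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "Systran/faster-whisper-base"

def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build a multipart/form-data POST for an OpenAI-style /audio/transcriptions route."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field naming the model to use.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
         f"{MODEL_ID}\r\n").encode(),
        # Form field carrying the raw audio file bytes.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode() + audio + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=b"".join(parts),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# Usage (not executed here; needs a real endpoint and key):
# import json
# with open("meeting.wav", "rb") as f:
#     req = build_transcription_request(f.read(), "meeting.wav", "YOUR_API_KEY")
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["text"])  # "text" follows OpenAI's default JSON response
```

An OpenAI client library pointed at the same base URL would build an equivalent request.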
best for
- Real-time speech transcription for meetings or calls
- Multilingual audio transcription (e.g., podcasts, videos)
- Voice-controlled applications requiring low-latency ASR
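Real-time and low-latency use means cutting audio into pieces before transcription. The naive sketch below assumes 16 kHz mono input (the rate Whisper models expect) and fixed, non-overlapping windows with no voice-activity detection. faster-whisper's own `transcribe` already handles long audio, so this only illustrates the windowing arithmetic. The 75-second silent clip is made-up input.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz mono audio
WINDOW_SECONDS = 30   # Whisper's encoder works on 30-second windows

def chunk_audio(samples: Sequence[float], window_s: float = WINDOW_SECONDS,
                rate: int = SAMPLE_RATE) -> List[Sequence[float]]:
    """Split a mono sample stream into consecutive fixed-length windows (the last may be shorter)."""
    size = int(window_s * rate)
    return [samples[i:i + size] for i in range(0, len(samples), size)]

audio = [0.0] * (75 * SAMPLE_RATE)  # 75 seconds of silence as stand-in audio
chunks = chunk_audio(audio)
print([len(c) / SAMPLE_RATE for c in chunks])  # seconds per chunk

short_chunks = chunk_audio(audio, window_s=5)  # smaller windows for lower latency
print(len(short_chunks))
```

Smaller windows reduce how long the application waits for audio to accumulate. The original Whisper models pad each input to 30 seconds, though, so shorter windows do not make each chunk cheaper to process.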
FAQ
This model is converted to CTranslate2 format with FP16 quantization, enabling faster inference and lower memory usage.
It supports the same 99 languages as the original Whisper base model, including English, Chinese, Spanish, and more.
Send audio to the OpenAI-compatible endpoint with your API key, using the /v1/audio/transcriptions route.
The model is licensed under Apache 2.0.
On LibriSpeech test-clean, the original model achieves 5.01% WER; on test-other, 12.85% WER.
We're benchmarking and onboarding Faster Whisper Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.