Wav2Vec2 Large XLSR-53 Arabic
jonatasgrosman/wav2vec2-large-xlsr-53-arabic
published Mar 2022 · updated Dec 2022
Wav2Vec2 Large XLSR-53 Arabic is an automatic speech recognition model fine-tuned for transcribing Arabic speech into text.
specs
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2-Large-XLSR-53 |
| Parameters | ~300M |
| License | Apache 2.0 |
about this model
jonatasgrosman/wav2vec2-large-xlsr-53-arabic is an automatic speech recognition (ASR) model for Arabic fine-tuned from facebook/wav2vec2-large-xlsr-53. It was trained on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus. The model expects audio sampled at 16 kHz.
Key Strengths
On the Common Voice Arabic test set, the model achieves a Word Error Rate (WER) of 39.59% and a Character Error Rate (CER) of 18.18%. These are the lowest WER and CER among the seven publicly available Arabic ASR models in the evaluation table below (reported on 2021-05-14).
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | 39.59% | 18.18% |
| bakrianoo/sinai-voice-ar-stt | 45.30% | 21.84% |
| othrif/wav2vec2-large-xlsr-arabic | 45.93% | 20.51% |
| kmfoda/wav2vec2-large-xlsr-arabic | 54.14% | 26.07% |
| mohammed/wav2vec2-large-xlsr-arabic | 56.11% | 26.79% |
| anas/wav2vec2-large-xlsr-arabic | 62.02% | 27.09% |
| elgeish/wav2vec2-large-xlsr-53-arabic | 100.00% | 100.56% |
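For readers unfamiliar with the two metrics in the table, here is a minimal sketch of how WER and CER are commonly computed: the edit distance between the reference and the model output, divided by the reference length, over words or characters respectively. This is not the evaluation script behind the reported figures. Text normalization and the treatment of spaces in CER are assumptions here (this sketch counts spaces as characters). Because inserted units count as errors, a rate can exceed 100%, as in the last row of the table.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference unit
                curr[j - 1] + 1,     # insert a hypothesis unit
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def error_rate(ref_units, hyp_units):
    """Edit distance normalized by the reference length."""
    if not ref_units:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_units, hyp_units) / len(ref_units)


def wer(reference, hypothesis):
    """Word error rate: units are whitespace-separated words."""
    return error_rate(reference.split(), hypothesis.split())


def cer(reference, hypothesis):
    """Character error rate: units are characters, spaces included."""
    return error_rate(list(reference), list(hypothesis))


# One wrong word out of two gives a WER of 50%.
print(wer("مرحبا بالعالم", "مرحبا العالم"))  # 0.5
# Three inserted words against a two-word reference give 150%.
print(wer("hello world", "hello there big wide world"))  # 1.5
```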
This model is derived from the Apache-2.0 licensed base model and is hosted on gigarouter as a managed, OpenAI-compatible API. No local installation or environment setup is required; developers can integrate it with a single API call.
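As a rough illustration of what such a call might look like, the sketch below builds a multipart request in the style of the OpenAI audio transcription endpoint, using only the standard library. Everything specific to gigarouter is an assumption: the base URL `https://api.gigarouter.example/v1` is a placeholder, the `/audio/transcriptions` path and the `text` response field are borrowed from the OpenAI convention, and the function names are illustrative. Check the provider's documentation for the real values.

```python
import json
import urllib.request
import uuid

# Placeholder base URL: the real gigarouter endpoint is not given in this page.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a multipart POST carrying the model id and audio."""
    boundary = uuid.uuid4().hex
    parts = [
        (f'--{boundary}\r\n'
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f'{model}\r\n').encode("utf-8"),
        (f'--{boundary}\r\n'
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         'Content-Type: audio/wav\r\n\r\n').encode("utf-8"),
        audio_bytes,
        f'\r\n--{boundary}--\r\n'.encode("utf-8"),
    ]
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",  # path assumed from the OpenAI convention
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(audio_path, api_key):
    """Send a WAV file and return the transcript (assumes a JSON 'text' field)."""
    with open(audio_path, "rb") as f:
        request = build_transcription_request(f.read(), "audio.wav", api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

Keeping request construction separate from sending makes it possible to inspect or test the request without network access or a valid API key.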
best for
- Transcribing Arabic audio recordings for subtitles or captions
- Building voice-controlled applications in Arabic
- Converting Arabic speech to text for analytics or search
FAQ
What audio input does the model expect?
The model expects audio sampled at 16 kHz, provided as a waveform array or file path.
How accurate is the model?
It achieves a Word Error Rate of 39.59% and a Character Error Rate of 18.18% on the Common Voice Arabic test set, outperforming several other fine-tuned XLSR-53 Arabic models.
What is the model's license?
It is derived from facebook/wav2vec2-large-xlsr-53, which is licensed under Apache 2.0.
How do I use the model through gigarouter?
Send audio data to the gigarouter OpenAI-compatible endpoint with your API key; the model will return the transcribed text.
What data was the model trained on?
It was fine-tuned on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus.
We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Arabic as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.