
Wav2Vec2 Large XLSR-53 Arabic

jonatasgrosman/wav2vec2-large-xlsr-53-arabic

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Arabic is an automatic speech recognition model fine-tuned for transcribing Arabic speech into text.

status: coming soon
API providers: 0
downloads / mo: 3.5M
license: apache-2.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Wav2Vec2-Large-XLSR-53
Parameters: ~300M
License: Apache 2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-arabic is an automatic speech recognition (ASR) model for Arabic, fine-tuned from facebook/wav2vec2-large-xlsr-53. It was trained on the train and validation splits of Common Voice 6.1 and on the Arabic Speech Corpus. The model expects input audio sampled at 16 kHz.

Key Strengths

On the Common Voice Arabic test set, the model achieves a Word Error Rate (WER) of 39.59% and a Character Error Rate (CER) of 18.18%. Both figures are the lowest among the seven publicly available Arabic ASR models in the evaluation table below (reported on 2021-05-14).

Model                                         | WER     | CER
jonatasgrosman/wav2vec2-large-xlsr-53-arabic  | 39.59%  | 18.18%
bakrianoo/sinai-voice-ar-stt                  | 45.30%  | 21.84%
othrif/wav2vec2-large-xlsr-arabic             | 45.93%  | 20.51%
kmfoda/wav2vec2-large-xlsr-arabic             | 54.14%  | 26.07%
mohammed/wav2vec2-large-xlsr-arabic           | 56.11%  | 26.79%
anas/wav2vec2-large-xlsr-arabic               | 62.02%  | 27.09%
elgeish/wav2vec2-large-xlsr-53-arabic         | 100.00% | 100.56%
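For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly computed: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, over words for WER and over characters for CER. This is not the evaluation script behind the table; any text normalization applied before scoring in the original evaluation is not described on this page, and the function names are illustrative. It also shows why a rate can exceed 100%, as in the last table row: insertions count as errors, so a hypothesis much longer than the reference can accumulate more errors than the reference has units.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(wer("مرحبا بكم في القاهرة", "مرحبا بك في القاهرة"))  # 0.25
    # Three inserted characters against a two-character reference.
    print(cer("ab", "abcde"))  # 1.5, i.e. 150%
```

Multiply the returned fractions by 100 to get percentages comparable to those in the table.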

This model is derived from the Apache-2.0 licensed base model and is being onboarded to gigarouter as a managed, OpenAI-compatible API. Once it is live, no local installation or environment setup will be required; developers will be able to integrate it with a single API call.


FAQ

What input format does the model require?

The model expects audio sampled at 16 kHz, provided as a waveform array or file path.
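Audio recorded at another rate (44.1 kHz or 48 kHz is common) needs resampling first. The sketch below is a dependency-free illustration using linear interpolation; `resample_linear` is a hypothetical helper, not part of any library. It applies no anti-aliasing filter, so for real use a proper resampler such as `librosa.resample` or `torchaudio.functional.resample` is the safer choice.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono waveform (sequence of floats) by linear interpolation.

    Illustrative only: no low-pass filtering is applied before downsampling,
    so high-frequency content may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    if src_rate == dst_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step                 # fractional position in the source
        lo = min(int(pos), last)       # left neighbour
        hi = min(lo + 1, last)         # right neighbour, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


if __name__ == "__main__":
    one_second_48k = [0.0] * 48_000
    print(len(resample_linear(one_second_48k, 48_000)))  # 16000
```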

How does this model compare to other Arabic ASR models?

It achieves a Word Error Rate of 39.59% and a Character Error Rate of 18.18% on the Common Voice Arabic test set, outperforming the other six Arabic ASR models listed in the evaluation table.

What is the license for this model?

The model is listed under the Apache 2.0 license, the same license as its base model, facebook/wav2vec2-large-xlsr-53.

How can I use this model via the gigarouter API?

The model is not yet live. Once it is, send audio data to the gigarouter OpenAI-compatible endpoint with your API key; the model will return the transcribed text.
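As a rough sketch of what such a call might look like, the snippet below builds a multipart request in the shape of the OpenAI audio transcription API (a `file` part plus a `model` field, bearer-token auth, JSON response with a `text` key). Everything specific to gigarouter here is an assumption: the base URL is a placeholder, the `/audio/transcriptions` path is borrowed from the OpenAI convention, and the model is still listed as coming soon, so check the gigarouter documentation for the real values once the endpoint exists.

```python
import json
import urllib.request
import uuid

# Placeholder base URL: NOT a real gigarouter endpoint (assumption).
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"


def build_multipart(fields, filename, file_bytes, content_type="audio/wav"):
    """Encode text fields and one file part as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    parts.append(
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'.encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(audio_bytes, api_key, filename="clip.wav"):
    """Build (but do not send) the transcription request."""
    body, content_type = build_multipart({"model": MODEL_ID}, filename, audio_bytes)
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",  # path assumed from the OpenAI convention
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


def transcribe(audio_bytes, api_key):
    """Send the request and return the transcript (assumes a JSON body with "text")."""
    with urllib.request.urlopen(build_request(audio_bytes, api_key)) as resp:
        return json.load(resp)["text"]
```

If the endpoint follows the OpenAI shape, an OpenAI client library pointed at the gigarouter base URL should work equally well; the standard-library version is used here only to keep the sketch free of dependencies.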

What datasets was the model fine-tuned on?

It was fine-tuned on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Arabic as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
