Wav2Vec2 Large XLSR-53 Arabic

jonatasgrosman/wav2vec2-large-xlsr-53-arabic

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Arabic is an automatic speech recognition model fine-tuned for transcribing Arabic speech into text.

status	coming soon
downloads / mo	3.5M
license	apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2-Large-XLSR-53
Parameters	~300M
License	Apache 2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-arabic is an automatic speech recognition (ASR) model for Arabic fine-tuned from facebook/wav2vec2-large-xlsr-53. It was trained on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus. The model expects audio sampled at 16 kHz.

Key Strengths

On the Common Voice Arabic test set, the model achieves a Word Error Rate (WER) of 39.59% and a Character Error Rate (CER) of 18.18% (lower is better for both). These are the lowest WER and CER among the seven publicly available Arabic ASR models in the evaluation table below (results reported on 2021-05-14).

Model	WER	CER
jonatasgrosman/wav2vec2-large-xlsr-53-arabic	39.59%	18.18%
bakrianoo/sinai-voice-ar-stt	45.30%	21.84%
othrif/wav2vec2-large-xlsr-arabic	45.93%	20.51%
kmfoda/wav2vec2-large-xlsr-arabic	54.14%	26.07%
mohammed/wav2vec2-large-xlsr-arabic	56.11%	26.79%
anas/wav2vec2-large-xlsr-arabic	62.02%	27.09%
elgeish/wav2vec2-large-xlsr-53-arabic	100.00%	100.56%
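
For readers unfamiliar with the two metrics, here is a minimal sketch of how WER and CER are commonly defined: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted over words for WER and over characters for CER. This is an illustration under that standard definition, not the evaluation script that produced the table. Details such as text normalization and whether spaces count as characters vary between evaluations and are assumptions here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h from the hypothesis
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


print(wer("a b c d", "a x c d"))  # 0.25 (one substitution over four words)
print(cer("ab", "abcde"))         # 1.5  (three insertions over two characters)
```

The second call shows that, under this definition, an error rate can exceed 100% when the hypothesis contains more insertions than the reference has units, which is one way values above 100% can appear in a table like the one above.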

This model is derived from the Apache-2.0 licensed base model and is being onboarded to gigarouter as a managed, OpenAI-compatible API. Once it is live, no local installation or environment setup will be required: developers will be able to integrate it via a single API call.

best for

· Transcribing Arabic audio recordings for subtitles or captions
· Building voice-controlled applications in Arabic
· Converting Arabic speech to text for analytics or search

FAQ

What input format does the model require?

The model expects audio sampled at 16 kHz, provided as a waveform array or file path.

How does this model compare to other Arabic ASR models?

It achieves a Word Error Rate of 39.59% and a Character Error Rate of 18.18% on the Common Voice Arabic test set, the lowest of the seven Arabic models in the evaluation table above. The next-best results in that table are a WER of 45.30% and a CER of 20.51%.

What is the license for this model?

Apache 2.0. The model is derived from facebook/wav2vec2-large-xlsr-53, which is also licensed under Apache 2.0.

How can I use this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, send audio data to the gigarouter OpenAI-compatible endpoint with your API key; the model will return the transcribed text.
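
As a rough sketch of what such a call might look like, the snippet below builds (but does not send) a multipart request in the shape of the OpenAI audio transcription route, `POST /audio/transcriptions` with `file` and `model` form fields and a Bearer token. Everything specific to gigarouter is an assumption: the base URL is a deliberate placeholder, the route and the model identifier may differ, and `build_transcription_request` is a hypothetical helper, not a gigarouter function.

```python
import urllib.request
import uuid

# Placeholder only, NOT a documented gigarouter URL: substitute the real base URL.
BASE_URL = "https://api.example.invalid/v1"
# Assumed model identifier; the hosted API may use a different name.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-arabic"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build, without sending, a multipart POST for an OpenAI-style
    /audio/transcriptions route."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode(), b"\r\n",
        # Form field: file (the raw audio bytes)
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes, b"\r\n",
        # Closing boundary
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the endpoint exists, the request could be sent with `urllib.request.urlopen(request)`. The response format is not documented here, so inspect it before relying on any particular field.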

What datasets was the model fine-tuned on?

It was fine-tuned on the train and validation splits of Common Voice 6.1 and the Arabic Speech Corpus.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Arabic as a hosted, OpenAI-compatible API.

related speech-to-text models

speaker-diarization-3.1
wav2vec2-large-xlsr-53-japanese	6.1M dl/mo
wav2vec2-large-xlsr-53-polish	4.7M dl/mo
wav2vec2-large-xlsr-53-dutch	4.1M dl/mo