wav2vec2 large xls r 300m Urdu

kingabzpro/wav2vec2-large-xls-r-300m-Urdu

published Mar 2022 · updated Jun 2026

A popular open speech-to-text model, with 2.3M downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$0.0034

· estimated, set at launch

API providers

downloads / mo

2.3M

license

apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 CTC (XLS-R 300M backbone)
Parameters	300M
License	Apache-2.0

about this model

kingabzpro/wav2vec2-large-xls-r-300m-Urdu is an automatic speech recognition (ASR) model that transcribes Urdu speech from 16 kHz mono audio using a Connectionist Temporal Classification (CTC) decoder, with an optional 5-gram KenLM language model for improved accuracy.

It is a fine-tuned version of Facebook’s XLS-R 300M, which was pretrained on 436k hours of unlabeled speech across 128 languages (VoxPopuli, MLS, CommonVoice, BABEL, VoxLingua107). The XLS-R model family reported relative word error rate reductions of 14–34% on average over prior work across multiple speech recognition benchmarks, and improved CoVoST-2 speech translation by an average of 7.4 BLEU. Those figures describe the pretrained family, not this Urdu fine-tune.

Key strengths

Best reported result: 39.89% WER / 16.70% CER on the Urdu Common Voice 8.0 test set when decoded with the included 5-gram KenLM language model. This is a 29% relative WER improvement over greedy CTC (56.07% WER).
Efficient decoding: Greedy CTC gives the fastest, most lightweight inference; the KenLM decoder improves accuracy at the cost of some additional decoding work.
Reproducible evaluation: A Kaggle notebook provides a five-sample smoke test and full evaluation script.
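Greedy CTC decoding, mentioned in the list above, can be sketched in a few lines: take the highest-scoring token at each audio frame, collapse consecutive repeats, then drop the blank token. This is a minimal illustration, not the model's own code. The three-token vocabulary, the blank index of 0, and the frame scores are all made-up assumptions; the real model uses an Urdu character vocabulary that this page does not list.

```python
def greedy_ctc_decode(frame_scores, vocab, blank_id=0):
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # Pick the index of the highest score in every frame.
    best_ids = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = None
    for token_id in best_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


# Toy vocabulary and scores (assumptions for illustration only).
TOY_VOCAB = ["<blank>", "a", "b"]
TOY_SCORES = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank, separates the two "a" tokens
    [0.1, 0.8, 0.1],    # a
    [0.1, 0.2, 0.7],    # b
    [0.1, 0.1, 0.8],    # b (repeat, collapsed)
]

print(greedy_ctc_decode(TOY_SCORES, TOY_VOCAB))  # prints "aab"
```

The blank frame is what lets CTC output a doubled letter: without it, the two runs of "a" would collapse into one. A language-model decoder such as the 5-gram KenLM option replaces the single best path with a search over several candidate paths.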

Benchmark results

Decoder	Test WER	Test CER
Greedy CTC	56.07%	23.70%
5-gram KenLM	39.89%	16.70%
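The relative improvement quoted for KenLM decoding can be checked directly from the two rows of this table. The short calculation below uses only the figures shown here; the helper name `relative_reduction` is introduced for illustration.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent: (baseline - improved) / baseline."""
    return (baseline - improved) / baseline * 100.0


# Figures from the benchmark table (greedy CTC vs. 5-gram KenLM).
wer_rel = relative_reduction(56.07, 39.89)
cer_rel = relative_reduction(23.70, 16.70)

print(f"{wer_rel:.1f}% {cer_rel:.1f}%")  # prints "28.9% 29.5%"
```

Rounded to whole percentages, that is roughly a 29% relative reduction in WER and a 30% relative reduction in CER.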

The model is released under the Apache-2.0 license. gigarouter is onboarding it as a managed, OpenAI-compatible API.

FAQ

What audio format does the model expect?

16 kHz mono waveform audio.
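A quick way to confirm that a WAV file matches this format before sending it is to read its header. The sketch below uses only Python's standard `wave` module; the helper names `check_wav_format` and `make_silent_wav` are hypothetical, and the silent clip exists only to make the example self-contained. It checks the format and does not resample; converting other audio to 16 kHz mono needs a separate tool.

```python
import io
import wave

EXPECTED_RATE = 16_000  # Hz, as stated on this page
EXPECTED_CHANNELS = 1   # mono


def check_wav_format(source):
    """Return (ok, details) for a WAV file path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    ok = rate == EXPECTED_RATE and channels == EXPECTED_CHANNELS
    return ok, {"sample_rate": rate, "channels": channels}


def make_silent_wav(rate, channels, seconds=0.1):
    """Build a short silent 16-bit PCM WAV in memory (illustration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf


print(check_wav_format(make_silent_wav(16_000, 1)))
print(check_wav_format(make_silent_wav(44_100, 2)))
```

The first call reports a matching file; the second reports a 44.1 kHz stereo file that would need to be resampled and downmixed first.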

Does the model include a language model for decoding?

Yes, an optional 5-gram KenLM language model is provided to improve accuracy.

What is the reported word error rate on Urdu Common Voice 8.0?

39.89% WER with KenLM decoding; 56.07% with greedy CTC.
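For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the model output (substitutions, deletions, and insertions), divided by the number of reference words. The function below is a minimal sketch of that definition, not the evaluation script behind the reported numbers, which may also normalize text in ways this page does not describe. The example strings are placeholders.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over four words.
print(word_error_rate("a b c d", "a x c"))  # prints 0.5
```

Character error rate (CER) follows the same definition with characters in place of words.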

What license is the model released under?

Apache-2.0.

How can I call this model via the gigarouter API?

Send your audio to the OpenAI-compatible endpoint, authenticating with your gigarouter API key, as described in the gigarouter documentation.
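As a rough sketch of what such a call might look like, the code below builds (but does not send) a request in the shape of OpenAI's audio transcription endpoint, which takes a multipart form with `model` and `file` fields at `/audio/transcriptions`. Several things here are assumptions, not documented facts: the base URL is a placeholder, the model identifier is assumed to be the repository name shown on this page, and `build_transcription_request` is a hypothetical helper. Check the gigarouter documentation for the real values.

```python
import urllib.request
import uuid

# Placeholder only: the real gigarouter base URL is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model id matches the repository name.
MODEL_ID = "kingabzpro/wav2vec2-large-xls-r-300m-Urdu"


def build_transcription_request(wav_bytes, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build a multipart POST in the shape of OpenAI's /audio/transcriptions."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=head + wav_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Dummy bytes and key, for illustration; nothing is sent over the network.
request = build_transcription_request(b"RIFF", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it once a real base URL and key are in place. An OpenAI-compatible client library pointed at the same base URL would be an alternative to building the form by hand.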

not yet live

We're benchmarking and onboarding wav2vec2 large xls r 300m Urdu as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo