
wav2vec2 large xls r 300m Urdu

kingabzpro/wav2vec2-large-xls-r-300m-Urdu

published Mar 2022 · updated Jun 2026

A popular open speech-to-text model, with 2.3M downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

est. price: ~$0.0034 (estimated, set at launch)
API providers: 0
downloads / mo: 2.3M
license: apache-2.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Wav2Vec2 CTC (XLS-R 300M backbone)
Parameters: 300M
License: Apache-2.0

about this model

kingabzpro/wav2vec2-large-xls-r-300m-Urdu is an automatic speech recognition (ASR) model that transcribes Urdu speech from 16 kHz mono audio using a Connectionist Temporal Classification (CTC) decoder, with an optional 5-gram KenLM language model for improved accuracy.
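As a rough illustration of what greedy CTC decoding does, the sketch below takes the most likely label per audio frame, collapses consecutive repeats, and drops the blank symbol. The toy vocabulary, the blank id of 0, and the function name are assumptions for illustration; the real model's tokenizer defines its own symbols and ids.

```python
from itertools import groupby

def greedy_ctc_decode(frame_ids, id_to_token, blank_id=0):
    """Collapse consecutive repeated labels, then drop CTC blanks."""
    # groupby merges runs of identical adjacent labels into one entry.
    collapsed = [label for label, _ in groupby(frame_ids)]
    return "".join(id_to_token[i] for i in collapsed if i != blank_id)

# Hypothetical toy vocabulary; id 0 plays the role of the CTC blank.
vocab = {0: "<blank>", 1: "a", 2: "b", 3: " "}

# One label per frame, e.g. the argmax of each frame's logits.
frames = [1, 1, 0, 1, 2, 2, 0, 0, 3, 1]
print(greedy_ctc_decode(frames, vocab))  # prints "aab a"
```

The blank between the two runs of `1` is what lets CTC emit a doubled character; without it the repeats would collapse into a single "a".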

It is a fine-tuned version of Facebook’s XLS-R 300M, which was pretrained on 436k hours of unlabeled speech across 128 languages (VoxPopuli, MLS, CommonVoice, BABEL, VoxLingua107). The XLS-R paper reports that the model family lowered word error rates by 14–34% relative on average across multiple speech recognition benchmarks and improved CoVoST-2 speech translation by an average of 7.4 BLEU. Those figures describe the pretrained XLS-R family, not this Urdu fine-tune.

Key strengths

  • Best reported result: 39.89% WER / 16.70% CER on the Urdu Common Voice 8.0 test set when decoded with the included 5-gram KenLM language model. This is a 29% relative WER improvement over greedy CTC (56.07% WER).
  • Two decoding options: Greedy CTC gives fast, lightweight inference; the 5-gram KenLM decoder improves accuracy at some extra decoding cost.
  • Reproducible evaluation: A Kaggle notebook provides a five-sample smoke test and full evaluation script.
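The relative improvement quoted above can be checked from the reported error rates. This is a small arithmetic sketch; the function name is ours, and the inputs are the greedy and KenLM figures from the benchmark results on this page.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline

# Reported test-set error rates, in percent.
wer_gain = relative_reduction(56.07, 39.89)
cer_gain = relative_reduction(23.70, 16.70)

print(f"relative WER reduction: {wer_gain:.1%}")  # 28.9%
print(f"relative CER reduction: {cer_gain:.1%}")  # 29.5%
```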

Benchmark results

Decoder         Test WER    Test CER
Greedy CTC      56.07%      23.70%
5-gram KenLM    39.89%      16.70%
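For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: edit distance between reference and hypothesis, over words or characters, divided by the reference length. It is a minimal illustration, not the evaluation script behind the numbers above, which may normalise text differently; the sample strings are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character error rate: char-level edits / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)

# Made-up example: one substitution and one deletion over six words.
print(round(wer("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
print(cer("abcd", "abxd"))  # 0.25
```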

The model is released under the Apache-2.0 license. gigarouter plans to host it as a managed, OpenAI-compatible API.

FAQ

What audio format does the model expect?

16 kHz mono waveform audio.
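A quick way to confirm a WAV file matches this before sending it is to read its header with Python's standard `wave` module. The helper names and the file path in the usage comment are hypothetical; files in other containers or at other sample rates would need converting with a separate tool first.

```python
import wave

def wav_format(path):
    """Return (sample_rate_hz, channel_count) read from a WAV header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()

def is_16k_mono(path):
    """True if the WAV file is 16 kHz, single-channel audio."""
    return wav_format(path) == (16000, 1)

# Usage (hypothetical path):
#   if not is_16k_mono("clip.wav"):
#       print("resample to 16 kHz mono before transcribing")
```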

Does the model include a language model for decoding?

Yes, an optional 5-gram KenLM language model is provided to improve accuracy.

What is the reported word error rate on Urdu Common Voice 8.0?

39.89% WER with KenLM decoding; 56.07% with greedy CTC.

What license is the model released under?

Apache-2.0.

How can I call this model via the gigarouter API?

Once the model is live, send your audio to the OpenAI-compatible endpoint and authenticate with your gigarouter API key. See the documentation for request details.
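As a sketch of what such a call might look like, the example below builds a request in the shape of the OpenAI audio transcription API: a multipart POST to `/audio/transcriptions` with `model` and `file` fields. The base URL is a placeholder, and the model identifier, helper name, and API key are assumptions; the model is not yet live, so check the gigarouter documentation for the real values.

```python
import urllib.request
import uuid

# Placeholder values: NOT documented gigarouter settings.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "kingabzpro/wav2vec2-large-xls-r-300m-Urdu"

def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field: model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field: the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )

# Usage (hypothetical key and file; sends nothing until urlopen is called):
#   with open("clip.wav", "rb") as f:
#       req = build_transcription_request(f.read(), "clip.wav", "YOUR_API_KEY")
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

Building the request separately from sending it makes it easy to inspect the headers and body before any credit is spent.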

not yet live

We're benchmarking and onboarding wav2vec2 large xls r 300m Urdu as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
