Wav2Vec2 Large XLSR Hindi

theainerd/Wav2Vec2-large-xlsr-hindi

published Mar 2022 · updated Apr 2025

Wav2Vec2 Large XLSR Hindi is an automatic speech recognition (ASR) model fine-tuned from Facebook's Wav2Vec2-Large-XLSR-53 for Hindi language transcription using the MUCS ASR challenge dataset.

est. price	~$0.0034 (estimated, set at launch)

downloads / mo	2.1M

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 Large XLSR-53
License	Apache-2.0 (base model)

about this model

theainerd/Wav2Vec2-large-xlsr-hindi is an automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 for Hindi transcription. It is trained on the Multilingual and Code-switching ASR Challenges for low-resource Indian languages (MUCS) dataset and evaluated on the Hindi test split of Common Voice. The model requires input speech sampled at 16 kHz.
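
Audio recorded at another rate has to be resampled to 16 kHz before it reaches the model. The following is a minimal sketch of that step using plain linear interpolation. The function name `resample_linear` is hypothetical, and the method is deliberately naive: it applies no anti-aliasing filter, so a dedicated resampler from an audio library is the better choice for real recordings.

```python
def resample_linear(samples, orig_rate, target_rate=16_000):
    """Resample a mono signal by linear interpolation (no anti-aliasing filter)."""
    if orig_rate <= 0 or target_rate <= 0:
        raise ValueError("sample rates must be positive")
    if orig_rate == target_rate or len(samples) < 2:
        return list(samples)

    n_out = int(round(len(samples) * target_rate / orig_rate))
    step = orig_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1

    out = []
    for i in range(n_out):
        pos = i * step                 # fractional position in the input signal
        lo = min(int(pos), last)       # sample at or before that position
        hi = min(lo + 1, last)         # next sample, clamped at the end
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


# One second of a 48 kHz ramp becomes 16,000 samples.
one_second = [float(i) for i in range(48_000)]
resampled = resample_linear(one_second, orig_rate=48_000)
print(len(resampled))  # 16000
```

Downsampling this way can alias high-frequency content into the speech band, which is why the sketch is only meant to show where the 16 kHz requirement enters the pipeline.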

Key Strengths

Built on the pretrained XLSR-53 model, which learns cross-lingual speech representations from unlabeled audio in 53 languages.
Fine-tuned for Hindi using the MUCS challenge corpus (95.05 hours of speech from the stories domain).
Directly usable without an external language model; outputs character-level transcriptions.
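
To illustrate how character-level output is produced without a language model, here is a sketch of greedy CTC decoding, the usual decoding strategy for Wav2Vec2 CTC models: take the most likely token per frame, collapse consecutive repeats, drop the blank token, and turn the word delimiter into a space. The toy vocabulary, its token IDs, and the function name `ctc_greedy_decode` are assumptions made for illustration and do not reflect this model's actual vocabulary.

```python
def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token IDs into text using the CTC rules."""
    tokens = []
    previous = None
    for token_id in frame_ids:
        # Keep a token only when it differs from the previous frame
        # and is not the blank symbol.
        if token_id != previous and token_id != blank_id:
            tokens.append(id_to_token[token_id])
        previous = token_id
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary (not the model's real one).
TOY_VOCAB = {
    0: "<pad>",   # CTC blank
    1: "|",       # word delimiter
    2: "\u0928",  # Devanagari letter NA
    3: "\u092e",  # Devanagari letter MA
    4: "\u0938",  # Devanagari letter SA
    5: "\u094d",  # Devanagari sign VIRAMA
    6: "\u0924",  # Devanagari letter TA
    7: "\u0947",  # Devanagari vowel sign E
}

# Per-frame argmax IDs with repeats and blanks, as a CTC head might emit them.
frames = [2, 2, 0, 3, 4, 4, 5, 0, 6, 7, 7, 1, 0]
print(ctc_greedy_decode(frames, TOY_VOCAB))  # the word "namaste" in Devanagari
```

A blank between two identical IDs keeps both characters, which is how CTC represents genuinely doubled letters.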

Benchmark Performance

On the Common Voice Hindi test set, the model achieves a Word Error Rate (WER) of 72.62%. For context, the MUCS 2021 challenge baseline on the MUCS Hindi test set was 37.2% WER, while the top entries reached 12–14% WER. Differences in evaluation datasets and preprocessing mean these figures are not directly comparable.
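
For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below computes it with a standard dynamic-programming edit distance. The function name `word_error_rate` is hypothetical, and the text normalization applied in the reported evaluation is not reproduced here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.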

Licensing

The base model (facebook/wav2vec2-large-xlsr-53) is released under the Apache-2.0 license. The fine-tuned Hindi model follows the same license.

Training Details

The fine-tuning script is publicly available as a Colab notebook. No external language model is used during inference.

best for

· Transcribing Hindi speech audio to text
· Building Hindi voice interfaces and applications
· Hindi language ASR research and fine-tuning

FAQ

What is this model best for?

Hindi automatic speech recognition (ASR): transcribing Hindi audio to text.

What input format does the model require?

Speech audio sampled at 16 kHz, processed using the Wav2Vec2Processor.
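
As a rough sketch of local inference, the function below passes a 16 kHz waveform through `Wav2Vec2Processor` and `Wav2Vec2ForCTC` and decodes the result greedily. It assumes the `torch` and `transformers` packages are installed and that the checkpoint can be downloaded from the Hugging Face Hub under the model ID shown on this page. The function name `transcribe` and its argument layout are hypothetical choices for illustration.

```python
MODEL_ID = "theainerd/Wav2Vec2-large-xlsr-hindi"
TARGET_RATE = 16_000


def transcribe(waveform, sampling_rate=TARGET_RATE, model_id=MODEL_ID):
    """Transcribe one mono waveform (1-D sequence of floats) sampled at 16 kHz."""
    if sampling_rate != TARGET_RATE:
        raise ValueError(f"expected {TARGET_RATE} Hz audio, got {sampling_rate} Hz")

    # Imported here so the module loads even without the heavy dependencies.
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(model_id)
    model = Wav2Vec2ForCTC.from_pretrained(model_id)
    model.eval()

    inputs = processor(
        waveform, sampling_rate=sampling_rate, return_tensors="pt", padding=True
    )
    with torch.no_grad():
        logits = model(
            inputs.input_values,
            # Not every processor config returns a mask; None is accepted.
            attention_mask=inputs.get("attention_mask"),
        ).logits

    # Greedy decoding: most likely token per frame, then CTC collapsing.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Loading the processor and model on every call keeps the sketch short; an application transcribing many clips would load them once and reuse them.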

What output does the model produce?

Text transcription in Hindi (Devanagari script).

What is the model size or number of parameters?

The model card does not specify a parameter count. The base Wav2Vec2-Large-XLSR-53 has approximately 300 million parameters, and fine-tuning for CTC adds only a small output layer on top.

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key and send audio data as described in the API documentation.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR Hindi as a hosted, OpenAI-compatible API.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese	6.1M dl/mo
wav2vec2-large-xlsr-53-polish	4.7M dl/mo
wav2vec2-large-xlsr-53-dutch	4.1M dl/mo