Wav2Vec2 Large XLSR Hindi
theainerd/Wav2Vec2-large-xlsr-hindi
published Mar 2022 · updated Apr 2025
Wav2Vec2 Large XLSR Hindi is an automatic speech recognition (ASR) model fine-tuned from Facebook's Wav2Vec2-Large-XLSR-53 for Hindi language transcription using the MUCS ASR challenge dataset.
specs
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2 Large XLSR-53 |
| License | Apache-2.0 (base model) |
about this model
theainerd/Wav2Vec2-large-xlsr-hindi is an automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 for Hindi transcription. It was fine-tuned on the dataset from the Multilingual and Code-switching ASR Challenges for low-resource Indian languages (MUCS) and evaluated on the Hindi test split of Common Voice. Input speech must be sampled at 16 kHz.
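A minimal local inference sketch, assuming the `torch`, `torchaudio`, and `transformers` packages are installed and the checkpoint can be downloaded from the Hugging Face Hub. The function name `transcribe`, the constant names, and the mono down-mix step are illustrative choices rather than anything taken from the model card. The resampling step is there because of the 16 kHz input requirement.

```python
MODEL_ID = "theainerd/Wav2Vec2-large-xlsr-hindi"
TARGET_SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with greedy CTC decoding (no language model)."""
    # Heavy dependencies are imported lazily so this module can be
    # imported even when they are not installed.
    import torch
    import torchaudio
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # torchaudio.load returns a (channels, samples) tensor and the file's rate.
    waveform, sample_rate = torchaudio.load(audio_path)
    waveform = waveform.mean(dim=0)  # assumption: down-mix to mono by averaging

    # Resample only when the file is not already at 16 kHz.
    if sample_rate != TARGET_SAMPLE_RATE:
        waveform = torchaudio.functional.resample(
            waveform, sample_rate, TARGET_SAMPLE_RATE
        )

    inputs = processor(
        waveform.numpy(),
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )

    with torch.no_grad():
        logits = model(
            inputs.input_values,
            attention_mask=inputs.get("attention_mask"),
        ).logits

    # Greedy decoding: most likely token per frame, then CTC collapse.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]


# Hypothetical usage (the file name is a placeholder):
# text = transcribe("sample.wav")
```

Loading the processor and model inside the function keeps the sketch short. A long-running service would load them once and reuse them across calls.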
Key Strengths
- Built on XLSR-53, a Wav2Vec2 Large model pretrained on unlabeled speech in 53 languages, so fine-tuning starts from cross-lingual speech representations.
- Fine-tuned for Hindi using the MUCS challenge corpus (95.05 hours of speech from the stories domain).
- Directly usable without an external language model; outputs character-level transcriptions.
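To illustrate what character-level output without a language model means in practice, here is a toy sketch of greedy CTC decoding: take the most likely token per frame, collapse consecutive repeats, drop the blank token, and turn the word delimiter into a space. The vocabulary, the blank id of 0, and the function name are hypothetical; the real model ships its own vocabulary, and `|` as the word delimiter follows the usual Wav2Vec2 tokenizer convention.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0, word_delimiter="|"):
    """Collapse per-frame token ids into text using greedy CTC rules."""
    # Step 1: merge runs of identical consecutive ids into one id.
    collapsed = [token for token, _ in groupby(frame_ids)]
    # Step 2: remove the blank token, which only separates repeats.
    chars = [id_to_char[token] for token in collapsed if token != blank_id]
    # Step 3: the delimiter token marks word boundaries.
    return "".join(chars).replace(word_delimiter, " ").strip()


# Hypothetical toy vocabulary, not the model's actual one.
TOY_VOCAB = {0: "<pad>", 1: "|", 2: "न", 3: "म", 4: "स", 5: "्", 6: "त", 7: "े"}

# Per-frame argmax ids for a short utterance (made up for illustration).
frames = [2, 2, 0, 3, 3, 0, 4, 5, 6, 6, 7, 1, 1]
decoded = ctc_greedy_decode(frames, TOY_VOCAB)  # "नमस्ते"
```

Because no language model rescoring is involved, spelling errors in the per-frame predictions pass straight through to the transcript.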
Benchmark Performance
On the Common Voice Hindi test set, the model achieves a Word Error Rate (WER) of 72.62%. For context, the MUCS 2021 challenge baseline on the MUCS Hindi test set was 37.2% WER, while the top entries reached 12–14% WER. Differences in evaluation datasets and preprocessing mean these figures are not directly comparable.
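For readers unfamiliar with the metric, WER is the word-level edit distance between reference and hypothesis (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below is a plain-Python illustration of that definition for a single sentence pair. It is not the evaluation script behind the reported figure, and the function name is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref_words)


# One substitution and one deletion against four reference words.
example_wer = word_error_rate("मेरा नाम राम है", "मेरा काम राम")  # 0.5
```

Corpus-level WER is normally computed as total errors over total reference words across all utterances, not as the mean of per-utterance scores. Text normalization such as punctuation removal can shift the result noticeably.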
Licensing
The base model (facebook/wav2vec2-large-xlsr-53) is released under the Apache-2.0 license. The fine-tuned Hindi model follows the same license.
Training Details
The fine-tuning script is publicly available as a Colab notebook. No external language model is used during inference.
best for
- Transcribing Hindi speech audio to text
- Building Hindi voice interfaces and applications
- Hindi language ASR research and fine-tuning
FAQ
Hindi automatic speech recognition (ASR), transcribing Hindi audio to text.
Speech audio sampled at 16 kHz, processed using the Wav2Vec2Processor.
Text transcription in Hindi (Devanagari script).
The model card does not specify a parameter count; the base Wav2Vec2-Large-XLSR-53 has approximately 300 million parameters.
Use the gigarouter OpenAI-compatible endpoint with your API key; send audio data as per the API documentation.
We're benchmarking and onboarding Wav2Vec2 Large XLSR Hindi as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.