Wav2Vec2 Large XLSR-53 Hungarian
jonatasgrosman/wav2vec2-large-xlsr-53-hungarian
published Mar 2022 · updated Dec 2022
Wav2Vec2 Large XLSR-53 Hungarian is an automatic speech recognition model fine-tuned for transcribing Hungarian speech audio.
specs
| Property | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2 Large XLSR-53 |
| Fine-tuned on | Common Voice 6.1 and CSS10 (Hungarian) |
| Sampling rate | 16 kHz |
| Evaluation (Common Voice test) | WER 31.40%, CER 6.20% |
about this model
jonatasgrosman/wav2vec2-large-xlsr-53-hungarian is an automatic speech recognition (ASR) model that transcribes Hungarian speech into text. It is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 trained on the train and validation splits of Common Voice 6.1 and CSS10 Hungarian. The model expects speech input sampled at 16 kHz.
Evaluation Results
On the Common Voice Hungarian test set, the model achieves a Word Error Rate (WER) of 31.40% and a Character Error Rate (CER) of 6.20%. The following table compares these results with other publicly available Hungarian ASR models (evaluated on the same data in April 2021):
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-hungarian | 31.40% | 6.20% |
| anton-l/wav2vec2-large-xlsr-53-hungarian | 42.39% | 9.39% |
| gchhablani/wav2vec2-large-xlsr-hu | 46.42% | 10.04% |
| birgermoell/wav2vec2-large-xlsr-hungarian | 46.93% | 10.31% |
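For readers unfamiliar with the two metrics, both are edit distances normalized by the length of the reference transcript: WER counts word-level substitutions, insertions, and deletions, while CER does the same at character level. The sketch below illustrates the definitions in plain Python. It is not the evaluation script behind the numbers above, and it assumes no text normalization and that spaces count as characters for CER, both of which vary between evaluation setups.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong word out of three, one missing character out of eighteen.
print(wer("jó reggelt kívánok", "jó reggel kívánok"))
print(cer("jó reggelt kívánok", "jó reggel kívánok"))
```

The example shows why CER is typically much lower than WER: a single wrong character makes the whole word count as an error.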
The model decodes with CTC and does not require an external language model, so its output can be used directly at inference time. It was trained with the wav2vec2-sprint training script and is available through gigarouter as a hosted, OpenAI-compatible API.
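To illustrate what CTC decoding without a language model involves, here is a minimal sketch of greedy CTC decoding: take the most likely token per audio frame, collapse consecutive repeats, then drop the blank token. The toy vocabulary and the choice of `0` as the blank id are assumptions made for this example and do not reflect the model's actual vocabulary.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated frame predictions, then remove CTC blank tokens."""
    # Step 1: merge runs of identical consecutive ids into a single id.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # Step 2: drop blanks and map the remaining ids to characters.
    return "".join(id_to_char[t] for t in collapsed if t != blank_id)


# Hypothetical vocabulary for illustration only.
toy_vocab = {0: "<blank>", 1: "s", 2: "z", 3: "i", 4: "a"}

# Per-frame argmax ids, as a model's output layer might produce them.
frames = [1, 1, 0, 2, 2, 3, 0, 4, 4]
print(ctc_greedy_decode(frames, toy_vocab))  # → szia
```

The blank token is what lets CTC emit a doubled letter: the frames `[1, 0, 1]` decode to `ss`, whereas `[1, 1]` decode to a single `s`.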
best for
- Transcribing Hungarian speech from audiobooks, podcasts, or voice recordings
- Building Hungarian voice-controlled applications and virtual assistants
- Automating subtitling or transcription of Hungarian media content
FAQ
It transcribes Hungarian speech audio into text using a fine-tuned Wav2Vec2 Large XLSR-53 model.
The model requires audio sampled at 16 kHz.
Word Error Rate (WER) is 31.40% and Character Error Rate (CER) is 6.20%.
You can call the OpenAI-compatible endpoint with an API key, sending Hungarian audio files for transcription.
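As a rough sketch of what such a call could look like, the snippet below builds a multipart request with Python's standard library. It assumes the service follows OpenAI's `audio/transcriptions` route convention with `model` and `file` form fields; the base URL, API key, and the helper name `build_transcription_request` are placeholders, so check the provider's documentation for the actual values.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, api_key, model, audio_bytes,
                                filename="audio.wav"):
    """Build (but do not send) a multipart POST for a transcription endpoint.

    Assumption: the endpoint mirrors OpenAI's /audio/transcriptions route.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model identifier.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # Form field carrying the audio file itself.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: audio/wav\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder values: replace with the real base URL, key, and audio bytes.
request = build_transcription_request(
    base_url="https://api.example.com/v1",
    api_key="YOUR_API_KEY",
    model="jonatasgrosman/wav2vec2-large-xlsr-53-hungarian",
    audio_bytes=b"",
)
```

The request can then be sent with `urllib.request.urlopen(request)`. If the response follows OpenAI's JSON format, the transcript would be in its `text` field, but that too is an assumption to verify.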