Wav2Vec2 Large XLSR-53 Hungarian

jonatasgrosman/wav2vec2-large-xlsr-53-hungarian

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Hungarian is an automatic speech recognition model fine-tuned for transcribing Hungarian speech audio.

status	coming soon
API providers	(none listed)
downloads / mo	3.4M
license	apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 Large XLSR-53
Fine-tuned on	Common Voice 6.1 and CSS10 (Hungarian)
Sampling Rate	16 kHz
Evaluation (Common Voice test)	WER 31.40%, CER 6.20%

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-hungarian is an automatic speech recognition (ASR) model that transcribes Hungarian speech into text. It is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 trained on the train and validation splits of Common Voice 6.1 and CSS10 Hungarian. The model expects speech input sampled at 16 kHz.

Evaluation Results

On the Common Voice Hungarian test set, the model achieves a Word Error Rate (WER) of 31.40% and a Character Error Rate (CER) of 6.20%. The following table compares these results with other publicly available Hungarian ASR models (evaluated on the same data in April 2021):

Model	WER	CER
jonatasgrosman/wav2vec2-large-xlsr-53-hungarian	31.40%	6.20%
anton-l/wav2vec2-large-xlsr-53-hungarian	42.39%	9.39%
gchhablani/wav2vec2-large-xlsr-hu	46.42%	10.04%
birgermoell/wav2vec2-large-xlsr-hungarian	46.93%	10.31%
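To make the two metrics in the table concrete, here is a minimal sketch of how WER and CER are commonly computed from an edit distance. It is not the evaluation script behind the numbers above: the function names are illustrative, text normalization is omitted, and counting spaces as characters in CER is an assumption (conventions differ).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()  # assumes a non-empty reference
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


ref = "a macska alszik"
hyp = "a macska alsik"   # one dropped letter
print(round(wer(ref, hyp), 3))  # 0.333
print(round(cer(ref, hyp), 3))  # 0.067
```

A single dropped letter makes one of three words wrong but only one of fifteen characters, which is why a model's CER is typically far lower than its WER.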

The model uses a CTC decoder without an external language model, so inference needs only the acoustic model's output, with no language-model rescoring step. It was trained using the wav2vec2-sprint training script. Hosting through gigarouter as an OpenAI-compatible API is planned but not yet live.
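Decoding CTC output without a language model usually means greedy decoding: take the best token per frame, collapse consecutive repeats, then drop the blank token. The sketch below illustrates that idea on a toy vocabulary. The vocabulary, the blank index, and the function name are assumptions for illustration; the model's real tokenizer is not reproduced here.

```python
def ctc_greedy_decode(frame_scores, vocab, blank_id=0):
    """Greedy CTC decoding over per-frame token scores."""
    # Step 1: pick the highest-scoring token id in every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__)
                for frame in frame_scores]
    # Step 2: collapse consecutive repeats, then drop blanks.
    decoded = []
    previous = None
    for token_id in best_ids:
        if token_id != previous and token_id != blank_id:
            decoded.append(vocab[token_id])
        previous = token_id
    return "".join(decoded)


# Toy vocabulary (assumption): index 0 plays the role of the CTC blank.
vocab = ["<blank>", "s", "z", "i", "a"]
ids = [1, 1, 0, 2, 3, 3, 0, 4]
# One-hot scores stand in for the network's per-frame output.
frames = [[1.0 if k == i else 0.0 for k in range(len(vocab))] for i in ids]
print(ctc_greedy_decode(frames, vocab))  # szia
```

Because repeats are collapsed, a genuinely doubled letter only survives when a blank frame separates its two occurrences.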

best for

·Transcribing Hungarian speech from audiobooks, podcasts, or voice recordings
·Building Hungarian voice-controlled applications and virtual assistants
·Automating subtitling or transcription of Hungarian media content

FAQ

What does this model do?

It transcribes Hungarian speech audio into text using a fine-tuned Wav2Vec2 Large XLSR-53 model.

What audio sample rate is required?

The model requires audio sampled at 16 kHz.
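A quick way to check a file before sending it for transcription is to read its sample rate from the header. The sketch below uses Python's standard `wave` module, which only reads uncompressed PCM WAV files; the function names are illustrative, and the resampling itself is left to an audio library of your choice.

```python
import wave

TARGET_RATE = 16_000  # the sampling rate the model expects


def wav_sample_rate(path):
    """Return the sample rate (Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, target_rate=TARGET_RATE):
    """True when the file's sample rate differs from the target rate."""
    return wav_sample_rate(path) != target_rate
```

For example, `needs_resampling("clip.wav")` returns `True` for a 44.1 kHz recording, which should then be converted to 16 kHz before transcription.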

What are the model's WER and CER on the Common Voice Hungarian test set?

Word Error Rate (WER) is 31.40% and Character Error Rate (CER) is 6.20%.

How can I use this model via the gigarouter API?

The hosted API is not yet live. Once it is, you will be able to call the OpenAI-compatible endpoint with an API key and send Hungarian audio files for transcription.
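As a rough sketch of what such a call could look like, the snippet below builds (but does not send) a request in the shape of the OpenAI audio transcription endpoint: a multipart POST to `/audio/transcriptions` with `model` and `file` fields and a Bearer token. The base URL is a placeholder and the model identifier is assumed to match the repository name; neither has been published for this model, so treat both as hypothetical.

```python
import urllib.request
import uuid

# Hypothetical placeholders: the real base URL and model id are not published yet.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build a multipart POST in the OpenAI audio-transcription shape."""
    boundary = uuid.uuid4().hex
    parts = [
        # Form field naming the model to use.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="model"'
         f'\r\n\r\n{model}\r\n').encode(),
        # Form field carrying the raw audio file.
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         f'Content-Type: application/octet-stream\r\n\r\n').encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ]
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once a real endpoint exists, the request could be sent with `urllib.request.urlopen(request)`; OpenAI-style transcription responses are JSON with the transcript in a `text` field, though the exact response format here remains to be confirmed.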

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Hungarian as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

·speaker-diarization-3.1
·wav2vec2-large-xlsr-53-japanese (6.1M dl/mo)
·wav2vec2-large-xlsr-53-polish (4.7M dl/mo)
·wav2vec2-large-xlsr-53-dutch (4.1M dl/mo)