
Wav2Vec2 Large XLSR-53 Hungarian

jonatasgrosman/wav2vec2-large-xlsr-53-hungarian

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Hungarian is an automatic speech recognition model fine-tuned for transcribing Hungarian speech audio.

Status: coming soon
API providers: 0
Downloads per month: 3.4M
License: apache-2.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Wav2Vec2 Large XLSR-53
Fine-tuned on: Common Voice 6.1 and CSS10 (Hungarian)
Sampling rate: 16 kHz
Evaluation (Common Voice Hungarian test set): WER 31.40%, CER 6.20%

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-hungarian is an automatic speech recognition (ASR) model that transcribes Hungarian speech into text. It is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 trained on the train and validation splits of Common Voice 6.1 and CSS10 Hungarian. The model expects speech input sampled at 16 kHz.
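Because the model expects 16 kHz input, it can help to check a file before transcribing it. The sketch below uses only Python's standard `wave` module and assumes uncompressed PCM WAV files. `check_wav` is a hypothetical helper name, and the mono check is an extra assumption (Wav2Vec2 models are normally fed a single-channel waveform) rather than a requirement stated on this page.

```python
import wave

TARGET_RATE = 16_000  # sampling rate the model expects (16 kHz)


def check_wav(path: str) -> tuple[bool, str]:
    """Check whether a PCM WAV file matches the model's expected input format.

    Returns (ok, reason). The mono requirement is an assumption about
    typical Wav2Vec2 preprocessing, not a documented requirement of this model.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != TARGET_RATE:
        return False, f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz"
    if channels != 1:
        return False, f"file has {channels} channels, assumed mono input"
    return True, "ok"
```

Files that fail the check need converting first, for example with `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`, which resamples to 16 kHz and downmixes to mono.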

Evaluation Results

On the Common Voice Hungarian test set, the model achieves a Word Error Rate (WER) of 31.40% and a Character Error Rate (CER) of 6.20%. The following table compares these results with other publicly available Hungarian ASR models (evaluated on the same data in April 2021):

Model                                              WER      CER
jonatasgrosman/wav2vec2-large-xlsr-53-hungarian    31.40%   6.20%
anton-l/wav2vec2-large-xlsr-53-hungarian           42.39%   9.39%
gchhablani/wav2vec2-large-xlsr-hu                  46.42%   10.04%
birgermoell/wav2vec2-large-xlsr-hungarian          46.93%   10.31%
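WER and CER are both edit distances normalized by reference length: the minimum number of substitutions, insertions, and deletions separating the hypothesis from the reference, counted over words for WER and over characters for CER. The sketch below illustrates these definitions in plain Python. It applies no text normalization (lowercasing, punctuation removal), so it is not a reproduction of the evaluation script behind the numbers above, and the function names are hypothetical.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (unit cost for each edit)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion (ref item missing from hyp)
                curr[j - 1] + 1,         # insertion (extra hyp item)
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(references, hypotheses) -> float:
    """Corpus-level WER: total word edits over total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words


ref = "jó reggelt kívánok"
hyp = "jó regelt kívánok"
print(f"WER = {wer(ref, hyp):.4f}, CER = {cer(ref, hyp):.4f}")
```

Here one of three words is wrong (WER 1/3, printed as 0.3333) while only one of 18 characters is missing (CER 1/18, printed as 0.0556). A single character error is enough to make a whole word count as wrong, which is why WER sits well above CER in the table.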

The model uses a CTC output head and is decoded without an external language model, so transcripts come directly from its per-frame predictions with no language-model rescoring step. It was trained with the wav2vec2-sprint training script. It is planned as a hosted, OpenAI-compatible API on gigarouter but is not live yet.
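Decoding without a language model means the transcript comes from the CTC output alone: take the most likely token for each audio frame, merge consecutive repeats, drop the blank token, and join what remains. The sketch below shows that greedy step on already-computed frame IDs. The blank ID of 0 and the `|` word delimiter are assumptions modeled on common Wav2Vec2 CTC vocabularies (check the model's `vocab.json`), and the toy vocabulary is hypothetical. In the Hugging Face `transformers` library, this roughly corresponds to calling `processor.batch_decode(predicted_ids)` after an `argmax` over the logits.

```python
from itertools import groupby


def ctc_greedy_decode(frame_ids, id_to_token, blank_id=0, word_delimiter="|"):
    """Greedy CTC decoding of per-frame argmax token IDs.

    blank_id=0 and word_delimiter="|" are assumptions; check the model's vocab.json.
    """
    # 1. Merge runs of the same ID emitted on consecutive frames.
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two identical IDs keeps both (e.g. "ll").
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    # 3. Map the word delimiter to spaces.
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy vocabulary (hypothetical, not the model's real vocab).
vocab = {0: "<pad>", 1: "h", 2: "a", 3: "l", 4: "ó", 5: "|"}
frames = [1, 1, 2, 3, 3, 0, 3, 4, 5, 5, 1, 2, 3, 0, 3, 4, 4]
print(ctc_greedy_decode(frames, vocab))  # halló halló
```

The blank is what lets CTC spell double letters: without the `0` between the two runs of `3`, "halló" would collapse to "haló".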


FAQ

What does this model do?

It transcribes Hungarian speech audio into text using a fine-tuned Wav2Vec2 Large XLSR-53 model.

What audio sample rate is required?

The model requires audio sampled at 16 kHz.

What are the model's WER and CER on the Common Voice Hungarian test set?

Word Error Rate (WER) is 31.40% and Character Error Rate (CER) is 6.20%.

How can I use this model via the gigarouter API?

Once the model is live, you will be able to call its OpenAI-compatible endpoint with an API key and send Hungarian audio files for transcription.
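The endpoint is not live yet, so the following is only a sketch of what a call could look like if it follows the shape of OpenAI's audio transcription API: a multipart POST to `/audio/transcriptions` with `model` and `file` fields and a bearer token. The base URL `https://api.gigarouter.example/v1` and the use of the Hugging Face repository name as the model ID are hypothetical placeholders, not documented values. The sketch uses only the Python standard library and builds the request without sending it.

```python
import urllib.request
import uuid

# Hypothetical placeholders: the real base URL and model ID are not published yet.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-hungarian"


def build_transcription_request(
    api_key: str, audio: bytes, filename: str = "audio.wav"
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    # Form field "model", then the start of form field "file".
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    # Closing boundary after the raw audio bytes.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=head + audio + tail,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Once an endpoint exists, `urllib.request.urlopen(req)` would send the request. An OpenAI-style endpoint would return JSON with the transcript in a `text` field, but that response format is also an assumption here.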

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Hungarian as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
