Question 1

What is the word error rate (WER) of this model?

Accepted Answer

On the combined NST + Common Voice test set, the WER is 2.5%. On the Common Voice test set alone, it is 8.49% without a language model and 7.37% with a 4-gram language model.
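WER is the word-level edit distance between a reference transcript and the model's output (substitutions + deletions + insertions) divided by the number of reference words; a WER of 2.5% means about 2.5 errors per 100 reference words. The sketch below is a minimal, illustrative implementation of that standard definition. It simply splits on whitespace; the text normalization used to produce the reported numbers is not stated here, and the Swedish example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Classic Levenshtein DP over words, kept one row at a time:
    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hyp_word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one deleted word ("en") and one substitution ("dag" -> "dagar").
    ref = "det är en fin dag"
    hyp = "det är fin dagar"
    print(f"WER = {word_error_rate(ref, hyp):.2f}")  # prints "WER = 0.40"
```

Multiplying the returned fraction by 100 gives the percentage form used in the figures above.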

Question 2

How do I use this model via the gigarouter API?

Accepted Answer

Use the gigarouter OpenAI-compatible endpoint and authenticate with your API key. Send audio sampled at 16 kHz as the input; the response contains the transcribed text.

Question 3

What license is the model released under?

Accepted Answer

The model is released under CC0-1.0, a public-domain dedication, so it can be used for any purpose, including commercial use.

Question 4

Does the model require a language model to work?

Accepted Answer

No. The model can be used on its own, without a language model, as in the usage example. A 4-gram language model is optional; on the Common Voice test set it lowers the WER from 8.49% to 7.37%.
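Without a language model, a CTC-trained wav2vec 2.0 model is typically decoded greedily: take the highest-scoring token for each audio frame, merge consecutive repeats, then drop the CTC blank token. This is roughly what the common Hugging Face pattern of `torch.argmax` over the logits followed by `processor.batch_decode` does. The pure-Python sketch below illustrates the idea only; the toy vocabulary, the `<pad>` blank token and the `|` word delimiter are assumptions modeled on typical wav2vec 2.0 vocabularies, not read from this model's files.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: str = "<pad>",        # assumed CTC blank token
    word_delimiter: str = "|",   # assumed word-boundary token
) -> str:
    """Best-path CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # 1. Highest-scoring vocabulary index for every frame.
    best_ids = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Collapse consecutive duplicates first, then skip blanks, so that a
    #    blank between two identical tokens keeps both (e.g. "l <pad> l" -> "ll").
    tokens: List[str] = []
    previous: Optional[int] = None
    for idx in best_ids:
        if idx != previous and vocab[idx] != blank:
            tokens.append(vocab[idx])
        previous = idx
    # 3. Map the word delimiter to spaces and tidy up whitespace.
    text = "".join(tokens).replace(word_delimiter, " ")
    return " ".join(text.split())


if __name__ == "__main__":
    vocab = ["<pad>", "|", "h", "e", "j", "d", "å"]  # hypothetical toy vocabulary
    # Hypothetical per-frame scores: each row peaks at one vocabulary entry.
    path = ["h", "h", "e", "<pad>", "j", "|", "d", "å", "å"]
    frames = [[1.0 if token == p else 0.0 for token in vocab] for p in path]
    print(greedy_ctc_decode(frames, vocab))  # prints "hej då"
```

An n-gram language model replaces step 1 with a beam search that also scores candidate word sequences, which is how the optional 4-gram model can improve accuracy.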

Question 5

What audio format and sampling rate does the model expect?

Accepted Answer

The model expects audio sampled at 16 kHz. Audio recorded at any other rate must be resampled first; the example code, for instance, uses torchaudio to resample 48 kHz Common Voice audio down to 16 kHz.

Wav2Vec 2.0 Large VoxRex Swedish

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec 2.0 Large
Input Sampling Rate	16 kHz
Language	Swedish
License	CC0-1.0