Wav2Vec2 Large XLSR-53 Dutch

jonatasgrosman/wav2vec2-large-xlsr-53-dutch

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Dutch is a speech recognition model fine-tuned for Dutch on the Common Voice 6.1 and CSS10 datasets.

status: coming soon
API providers: 0
downloads / mo: 4.1M
license: apache-2.0

specs

Task: Automatic Speech Recognition
Architecture: Wav2Vec2 Large XLSR-53
Language: Dutch
License: Apache 2.0

about this model

This model is a Dutch automatic speech recognition (ASR) model based on facebook/wav2vec2-large-xlsr-53, fine-tuned on the train and validation splits of Common Voice 6.1 and the CSS10 single-speaker dataset. It accepts speech audio sampled at 16 kHz and produces text transcriptions without requiring a separate language model, though an optional language model can further reduce error rates.
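Fine-tuned Wav2Vec2 models are normally trained with a CTC objective, which is why text can be read off the acoustic output without a language model: take the most likely label per audio frame, collapse repeats, and drop the blank symbol. The following is a minimal sketch of that greedy decoding step only. The vocabulary, the `<pad>` blank name, the `|` word delimiter, and the frame ids are illustrative assumptions, not values taken from this model.

```python
from itertools import groupby

# Toy vocabulary for illustration only; the real model ships its own character set.
BLANK = "<pad>"    # CTC blank symbol (assumed name)
WORD_SEP = "|"     # word delimiter (assumed symbol)
VOCAB = [BLANK, WORD_SEP, "d", "e", "o", "p"]


def ctc_greedy_decode(frame_ids, vocab=VOCAB):
    """Collapse repeated frame labels, drop blanks, and turn the word delimiter into a space."""
    collapsed = [idx for idx, _ in groupby(frame_ids)]          # merge consecutive repeats
    chars = [vocab[i] for i in collapsed if vocab[i] != BLANK]  # remove blank frames
    return "".join(chars).replace(WORD_SEP, " ").strip()


# Hypothetical per-frame argmax ids spelling: d d <pad> e | o <pad> p p
frames = [2, 2, 0, 3, 1, 4, 0, 5, 5]
print(ctc_greedy_decode(frames))  # de op
```

An external language model typically replaces this per-frame argmax with a beam search that rescores candidate word sequences, which is where the lower error rates reported with a language model come from.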

Key capabilities and performance

  • Trained on both multi-speaker (Common Voice) and single-speaker (CSS10) Dutch data, which exposes it to a range of voices, mostly in read speech.
  • On the Common Voice nl test set (clean read speech) the model achieves a word error rate (WER) of 15.72% and a character error rate (CER) of 5.35%. With an external language model, WER drops to 12.84% and CER to 4.64%.
  • On the Robust Speech Event dev data (more challenging, variable conditions) the model reports a WER of 35.79% and CER of 17.67%; with a language model, WER improves to 31.54% and CER to 16.37%.
  • Licensed under Apache 2.0.
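For readers unfamiliar with the metrics above, WER and CER are both edit distances divided by the reference length, counted over words and characters respectively. The sketch below is a plain-Python illustration of that definition, not the evaluation script behind the reported figures. It assumes punctuation has already been stripped and that spaces count as characters for CER; evaluation scripts differ on both points.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by the reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# One of the sample pairs from this page, with the final period removed.
reference = "DE TAMPONS ZIJN OP"
prediction = "DE TAPONT ZIJN OP"
print(wer(reference, prediction))  # 0.25 (one wrong word out of four)
print(round(cer(reference, prediction), 3))
```

A single misrecognized word costs a full word error but only a couple of character errors, which is why CER is consistently much lower than WER in the results above.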

Inference example

The table below shows sample transcriptions from the Common Voice test set (without a language model):

Reference | Prediction
DE ABORIGINALS ZIJN DE OORSPRONKELIJKE BEWONERS VAN AUSTRALIË. | DE ABBORIGENALS ZIJN DE OORSPRONKELIJKE BEWONERS VAN AUSTRALIË
MIJN TOETSENBORD ZIT VOL STOF. | MIJN TOETSENBORD ZIT VOL STOF
ZE HAD DE BANK BESCHADIGD MET HAAR SKATEBOARD. | ZE HAD DE BANK BESCHADIGD MET HAAR SCHEETBOORD
WAAR LAAT JIJ JE ONDERHOUD DOEN? | WAAR LAAT JIJ HET ONDERHOUD DOEN
NA HET LEZEN VAN VELE BEOORDELINGEN HAD ZE EINDELIJK HAAR OOG LATEN VALLEN OP EEN LAPTOP MET EEN QWERTY TOETSENBORD. | NA HET LEZEN VAN VELE BEOORDELINGEN HAD ZE EINDELIJK HAAR OOG LATEN VALLEN OP EEN LAPTOP MET EEN QUERTITOETSEMBORD
DE TAMPONS ZIJN OP. | DE TAPONT ZIJN OP
MARIJKE KENT OLIVIER NU AL MEER DAN TWEE JAAR. | MAARRIJKEN KENT OLIEVIER NU AL MEER DAN TWEE JAAR
HET VOEREN VAN BROOD AAN EENDEN IS EIGENLIJK ONGEZOND VOOR DE BEESTEN. | HET VOEREN VAN BEUROT AAN EINDEN IS EIGENLIJK ONGEZOND VOOR DE BEESTEN
PARKET MOET JE STOFZUIGEN, TEGELS MOET JE DWEILEN. | PARKET MOET JE STOF ZUIGEN MAAR TEGELS MOET JE DWEILEN
IN ONZE BUURT KENT IEDEREEN ELKAAR. | IN ONZE BUURT KENT IEDEREEN ELKAAR

gigarouter plans to host this model as a managed, OpenAI-compatible API; it is not yet live. The model is specialized for Dutch speech transcription and performs best on read speech. Accuracy is lower on noisy or spontaneous speech, as the Robust Speech Event results show.

FAQ

What input format does the model require?

Audio must be sampled at 16 kHz, mono or stereo, in common formats like wav or mp3.
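A quick way to confirm that a WAV file meets the 16 kHz requirement before uploading is to read its header with Python's standard `wave` module. This is only a sketch: it covers uncompressed WAV, not mp3, and the helper names and the silent test clip are illustrative assumptions.

```python
import io
import wave

TARGET_RATE = 16_000  # sample rate stated on this page


def wav_sample_rate(source):
    """Return (sample_rate_hz, channels) for a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def make_silent_wav(rate, channels=1, seconds=0.1):
    """Build an in-memory 16-bit PCM WAV of silence, a stand-in for real audio."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)
    buffer.seek(0)
    return buffer


rate, channels = wav_sample_rate(make_silent_wav(44_100, channels=2))
if rate != TARGET_RATE:
    print(f"Resample needed: file is {rate} Hz, model expects {TARGET_RATE} Hz")
```

In practice you would pass a file path such as a local recording instead of the generated buffer, and resample with an audio tool of your choice when the rates differ.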

How can I use this model via the gigarouter API?

Once the model is live, send audio to the OpenAI-compatible endpoint with your API key; the service returns the transcription as text.
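Because the API is not live and this page documents no base URL or route, the following can only be a sketch of what an OpenAI-style transcription request might look like. The base URL is a made-up placeholder, and the `/audio/transcriptions` route and the `file` and `model` form fields are assumed from OpenAI's own API convention rather than confirmed for gigarouter. The code builds the request without sending it.

```python
import uuid
import urllib.request

# Assumptions: neither the base URL nor the route is documented on this page.
BASE_URL = "https://api.gigarouter.example/v1"  # hypothetical placeholder
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-dutch"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a multipart POST in the style of OpenAI's transcription route."""
    boundary = uuid.uuid4().hex
    parts = [
        f'--{boundary}\r\nContent-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'.encode(),
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'.encode(),
        audio_bytes,
        f'\r\n--{boundary}--\r\n'.encode(),
    ]
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Placeholder bytes stand in for the contents of a 16 kHz audio file.
request = build_transcription_request(b"\x00\x01", "sample.wav", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending the request would be a call to `urllib.request.urlopen(request)` once a real endpoint exists; check the provider's documentation for the actual URL and response format before relying on any of these names.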

What are the model's word error rate (WER) results?

On the Common Voice nl test set: WER 15.72%, CER 5.35%; with an external language model: WER 12.84%, CER 4.64%.

Is the model available under a permissive license?

Yes, it is released under the Apache 2.0 license.

Can I use an external language model to improve accuracy?

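Yes, the model supports integration with an external language model, which reduces WER by about 3 percentage points on the Common Voice nl test set (15.72% to 12.84%) and by about 4 points on the Robust Speech Event dev data (35.79% to 31.54%).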
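The language-model gain can be stated either as an absolute drop in percentage points or as a relative reduction of the baseline error. The short calculation below uses only the WER figures reported on this page; the function name is illustrative.

```python
def wer_reduction(baseline, with_lm):
    """Return (absolute reduction in percentage points, relative reduction in percent)."""
    absolute = baseline - with_lm
    relative = 100 * absolute / baseline
    return absolute, relative


# (WER without language model, WER with language model), both in percent.
results = {
    "Common Voice nl test": (15.72, 12.84),
    "Robust Speech Event dev": (35.79, 31.54),
}
for name, (baseline, with_lm) in results.items():
    absolute, relative = wer_reduction(baseline, with_lm)
    print(f"{name}: -{absolute:.2f} points ({relative:.1f}% relative)")
# Common Voice nl test: -2.88 points (18.3% relative)
# Robust Speech Event dev: -4.25 points (11.9% relative)
```

The absolute gain is larger on the harder dataset, but the relative gain is smaller there, so the language model helps most in proportion on clean read speech.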

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Dutch as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
