
Wav2Vec2 XLSR-53 Large Portuguese

jonatasgrosman/wav2vec2-large-xlsr-53-portuguese

published Mar 2022 · updated Dec 2022

Wav2Vec2 XLSR-53 Large Portuguese is an automatic speech recognition model that transcribes Portuguese audio into text, fine-tuned on Common Voice 6.1.

status: coming soon
API providers: 0
downloads per month: 3.2M
license: apache-2.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Wav2Vec2-XLSR-53 Large
Language: Portuguese
Dataset: Common Voice 6.1
License: Apache 2.0

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-portuguese is an automatic speech recognition (ASR) model fine-tuned from Facebook’s Wav2Vec2-XLSR-53 large checkpoint on Portuguese speech. It transcribes Portuguese audio into text and expects input sampled at 16 kHz.

The model was fine-tuned on the train and validation splits of Mozilla Common Voice 6.1. It has been evaluated on two benchmarks, both without and with a language model (LM).

Benchmark Results

| Dataset | Metric | Without LM | With LM |
|---|---|---|---|
| Common Voice pt (test) | WER | 11.31% | 9.01% |
| Common Voice pt (test) | CER | 3.74% | 3.21% |
| Robust Speech Event Dev Data | WER | 42.1% | 36.92% |
| Robust Speech Event Dev Data | CER | 17.93% | 16.88% |

The model achieves a word error rate (WER) of 11.31% on the Common Voice Portuguese test set without a language model, improving to 9.01% with an LM. The corresponding character error rates (CER) are 3.74% and 3.21%. Error rates are considerably higher on the more challenging Robust Speech Event Dev Data: WER is 42.1% without an LM and 36.92% with one (CER 17.93% and 16.88%).
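WER and CER are both edit-distance metrics: the minimum number of substitutions, insertions and deletions needed to turn the reference into the model output, divided by the reference length in words (WER) or characters (CER). The sketch below illustrates the definitions. It is not the evaluation script used for the numbers above. It assumes whitespace tokenization and no text normalization, and it counts spaces as characters for CER. The helper names are illustrative.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance: minimum substitutions, insertions and deletions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free when tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance (assumes a non-empty reference)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters here)."""
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(pairs) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    total_words = sum(len(r.split()) for r, _ in pairs)
    return total_edits / total_words
```

For example, `wer("o gato subiu no telhado", "o gato subiu no telhados")` is 0.2 (one substituted word out of five). Benchmark figures are typically aggregated at the corpus level, as in `corpus_wer`, rather than by averaging per-utterance rates.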

This is a specialist model for Portuguese ASR. It is planned as a hosted API on gigarouter but is not yet live. The model is released under the Apache 2.0 license.


FAQ

What input sample rate does the model require?

The model expects speech input sampled at 16 kHz.
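Because the model expects 16 kHz input, it can help to check audio files before sending them. The sketch below uses only Python's standard `wave` module, so it handles uncompressed WAV files only. The function names are illustrative. Audio at another rate needs resampling first; for example, the upstream model card loads audio with `librosa.load(path, sr=16_000)`, which resamples on load.

```python
import io
import wave

TARGET_RATE = 16_000  # the model expects 16 kHz input


def check_wav(source) -> tuple[int, int, float]:
    """Return (sample_rate, channels, duration_s), raising if the rate is not 16 kHz.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    if rate != TARGET_RATE:
        raise ValueError(
            f"expected {TARGET_RATE} Hz audio, got {rate} Hz; resample before transcribing"
        )
    return rate, channels, duration


def make_silent_wav(rate: int, seconds: float) -> io.BytesIO:
    """Build an in-memory mono 16-bit PCM WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf
```

`check_wav(make_silent_wav(16_000, 1.5))` returns `(16000, 1, 1.5)`, while a 44.1 kHz file raises `ValueError`.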

Can I use the model without a language model?

Yes, it can be used directly without a language model, but adding one improves accuracy (WER drops from 11.31% to 9.01% on Common Voice pt).
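Without an LM, the output of a wav2vec2 CTC model is usually decoded greedily. The decoder takes the most likely token in each audio frame, merges consecutive repeats, and drops the CTC blank token. The sketch below shows this on a tiny hypothetical vocabulary rather than the model's real one. It assumes, as is common for Hugging Face wav2vec2 CTC tokenizers, that the padding token `<pad>` serves as the blank and `|` marks word boundaries.

```python
from itertools import groupby

# Hypothetical toy vocabulary (not the model's real one).
VOCAB = ["<pad>", "|", "a", "c", "o", "s"]
BLANK_ID = 0  # assumed: "<pad>" doubles as the CTC blank


def greedy_ctc_decode(frame_scores: list[list[float]]) -> str:
    """Decode per-frame scores (frames x vocab) without a language model."""
    # 1. Pick the highest-scoring token id in every frame.
    best_ids = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]
    # 2. Merge runs of the same id, then 3. drop blanks.
    kept = [tok for tok, _ in groupby(best_ids) if tok != BLANK_ID]
    text = "".join(VOCAB[i] for i in kept)
    # 4. Turn word delimiters into single spaces.
    return " ".join(word for word in text.split("|") if word)


def one_hot(token: str) -> list[float]:
    """Fake frame scores that put all mass on one token, for demonstration."""
    scores = [0.0] * len(VOCAB)
    scores[VOCAB.index(token)] = 1.0
    return scores


frames = [one_hot(t) for t in ["o", "o", "s", "<pad>", "s", "o", "|", "c", "a", "a", "s", "a"]]
```

The blank between the two `s` frames is what lets "osso" keep its double letter, so `greedy_ctc_decode(frames)` yields `"osso casa"`. With an LM, the per-frame argmax is typically replaced by a beam search that also scores candidate transcriptions with an n-gram language model.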

What is the license of this model?

Apache 2.0.

How do I call this model via the gigarouter API?

Once the model is live, it will be served through gigarouter's OpenAI-compatible endpoint and authenticated with your API key. Refer to the gigarouter documentation for the exact endpoint and request format.
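Because the model is not yet live, the sketch below is entirely hypothetical. It builds a request without sending it, and it assumes that gigarouter mirrors OpenAI's `POST /audio/transcriptions` multipart form with `model` and `file` fields. The base URL (a `.example` placeholder) and the model identifier are assumptions. Check the gigarouter documentation for the real values before using this.

```python
import urllib.request
import uuid

# Hypothetical values: gigarouter's real base URL and model naming are not documented here.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese"


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style multipart transcription request."""
    boundary = uuid.uuid4().hex
    parts = [
        b'Content-Disposition: form-data; name="model"\r\n\r\n' + MODEL_ID.encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode()
        + b"Content-Type: audio/wav\r\n\r\n"
        + audio,
    ]
    # Each part is preceded by the boundary; the body ends with the closing boundary.
    body = b"".join(b"--" + boundary.encode() + b"\r\n" + part + b"\r\n" for part in parts)
    body += b"--" + boundary.encode() + b"--\r\n"
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

When the endpoint exists, the request would be sent with `urllib.request.urlopen(req)`. An OpenAI client library pointed at gigarouter's base URL should be an equivalent option if the endpoint is fully compatible.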

What architecture is the model based on?

It is based on Wav2Vec2-XLSR-53 Large, a self-supervised speech representation model pre-trained on 53 languages.
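The wav2vec2 architecture starts with a stack of 1-D convolutions that turns the raw 16 kHz waveform into a sequence of frames. The Transformer and CTC output layer then operate on those frames. The sketch below assumes the standard wav2vec2 feature-encoder configuration (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2). It computes how many frames, and therefore CTC time steps, a clip produces.

```python
import math

# Assumed standard wav2vec2 feature encoder: (kernel, stride) for each 1-D convolution.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]
SAMPLE_RATE = 16_000

# Total downsampling factor: 5 * 2**6 = 320 samples, i.e. a 20 ms hop at 16 kHz.
HOP_SAMPLES = math.prod(stride for _, stride in CONV_LAYERS)


def num_output_frames(num_samples: int) -> int:
    """Frames produced by the feature encoder for a raw waveform of num_samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1  # output length of an unpadded convolution
    return length
```

One second of audio yields 49 frames, roughly one every 20 ms. This frame grid is tied to the 16 kHz sampling rate: audio at another rate that is fed in without resampling has distorted pitch and timing relative to the training data.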

not yet live

We're benchmarking and onboarding Wav2Vec2 XLSR-53 Large Portuguese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
