Wav2Vec2 XLSR-53 Large Portuguese
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese
published Mar 2022 · updated Dec 2022
Wav2Vec2 XLSR-53 Large Portuguese is an automatic speech recognition model that transcribes Portuguese audio into text, fine-tuned on Common Voice 6.1.
specs
| Field | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2-XLSR-53 Large |
| Language | Portuguese |
| Dataset | Common Voice 6.1 |
| License | Apache 2.0 |
about this model
jonatasgrosman/wav2vec2-large-xlsr-53-portuguese is an automatic speech recognition (ASR) model fine-tuned from Facebook’s Wav2Vec2-XLSR-53 large checkpoint on Portuguese speech. It transcribes Portuguese audio into text and is optimized for input sampled at 16 kHz.
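Because the model is tuned for 16 kHz input, it can help to check a recording's sample rate before sending it for transcription. The following is a minimal sketch using Python's standard `wave` module. It assumes uncompressed WAV input, and the helper names `wav_sample_rate` and `needs_resampling` are illustrative, not part of the model or of any gigarouter tooling.

```python
import wave

# Sample rate the model card says the model is optimized for.
EXPECTED_RATE_HZ = 16_000


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path, expected_rate=EXPECTED_RATE_HZ):
    """Return True when the file's sample rate differs from the expected one."""
    return wav_sample_rate(path) != expected_rate
```

A file for which `needs_resampling` returns `True` would need to be resampled to 16 kHz with an audio tool of your choice before transcription; the sketch only detects the mismatch.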
The model was fine-tuned on the train and validation splits of Mozilla Common Voice 6.1. It was evaluated on the Common Voice Portuguese test split and on the Robust Speech Event dev data, in both cases with and without a language model (LM).
Benchmark Results
| Dataset | Metric | Without LM | With LM |
|---|---|---|---|
| Common Voice pt (test) | WER | 11.31% | 9.01% |
| Common Voice pt (test) | CER | 3.74% | 3.21% |
| Robust Speech Event Dev Data | WER | 42.1% | 36.92% |
| Robust Speech Event Dev Data | CER | 17.93% | 16.88% |
Without a language model, the model reaches a word error rate (WER) of 11.31% on the Common Voice Portuguese test set; adding an LM lowers it to 9.01%. The corresponding character error rates (CER) are 3.74% and 3.21%. On the more challenging Robust Speech Event dev data, WER is 42.1% without an LM and 36.92% with one, and CER is 17.93% and 16.88%.
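For readers unfamiliar with the metrics, WER and CER are edit distances normalized by the length of the reference transcript, counted in words and in characters respectively. The sketch below shows the standard definitions in plain Python. It is not the evaluation script behind the published numbers, which may apply text normalization not described here, and the function names and the example sentence are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def relative_reduction(before, after):
    """Fraction by which an error rate drops, relative to the starting value."""
    return (before - after) / before


reference = "o gato está no telhado"
hypothesis = "o gato esta no telhado"  # one word differs (missing accent)
print(wer(reference, hypothesis))       # 1 wrong word out of 5
print(cer(reference, hypothesis))       # 1 wrong character out of 22
print(relative_reduction(11.31, 9.01))  # Common Voice WER, without LM vs. with LM
```

Applied to the table above, `relative_reduction` gives a relative WER drop of roughly 20% on Common Voice and roughly 12% on the Robust Speech Event dev data when the LM is added.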
This is a specialist model for Portuguese ASR, released under the Apache 2.0 license. It is being onboarded as a hosted, OpenAI-compatible API on gigarouter.
best for
- Transcribing Portuguese audio recordings such as calls and meetings
- Adding Portuguese speech-to-text to voice assistants, captioning, and transcription services
FAQ
What sample rate does the model expect?
The model expects speech input sampled at 16 kHz.
Can it be used without a language model?
Yes, it can be used directly without a language model, but adding one improves accuracy (WER drops from 11.31% to 9.01% on the Common Voice Portuguese test set).
What license is it released under?
Apache 2.0.
How do I call it through gigarouter?
Use the OpenAI-compatible endpoint with your API key; refer to the gigarouter documentation for the exact endpoint and request format.
What architecture is it based on?
It is based on Wav2Vec2-XLSR-53 Large, a self-supervised speech representation model pre-trained on 53 languages.
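As a rough illustration of what a call to an OpenAI-compatible endpoint might look like, the sketch below builds (but does not send) a multipart request in the shape of OpenAI's `POST /v1/audio/transcriptions` route, using only the Python standard library. Everything specific to gigarouter here is an assumption: the base URL is a placeholder, the model identifier is assumed to match the repository name, and it is not confirmed that the model is served through this route. The gigarouter documentation remains the authority on the exact endpoint and request format.

```python
import urllib.request
import uuid

# Placeholder only: take the real base URL from the gigarouter documentation.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model id matches the repository name.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-portuguese"


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Form field carrying the model name.
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        # Form field carrying the audio file itself.
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        audio_bytes + b"\r\n",
        # Closing boundary.
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the real base URL and a valid key are in place, the returned request could be sent with `urllib.request.urlopen(request)`; the response format is not described in this page, so parsing it is left out.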
We're benchmarking and onboarding Wav2Vec2 XLSR-53 Large Portuguese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.