Wav2Vec2 Large XLSR-53 Polish
jonatasgrosman/wav2vec2-large-xlsr-53-polish
published Mar 2022 · updated Dec 2022
Wav2Vec2 Large XLSR-53 Polish is an automatic speech recognition model fine-tuned for Polish speech transcription.
specs
| Spec | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2 Large XLSR-53 (based on facebook/wav2vec2-large-xlsr-53) |
| Language | Polish |
| Training Data | Common Voice 6.1 |
| Input Sample Rate | 16 kHz |
about this model
jonatasgrosman/wav2vec2-large-xlsr-53-polish is an automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 for Polish. It was trained on the train and validation splits of the Common Voice 6.1 dataset. The model expects audio input sampled at 16 kHz, so recordings at other rates need to be resampled first.
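Because the model expects 16 kHz input, it can help to check a recording's sample rate before transcription. The sketch below is a minimal illustration using only the Python standard library. The helper names `wav_sample_rate` and `resample_linear` are hypothetical, and the resampler uses plain linear interpolation with no anti-aliasing filter, so a dedicated audio library is likely a better choice for real downsampling.

```python
import wave
from typing import List, Sequence

TARGET_RATE_HZ = 16_000  # input sample rate stated on this model card


def wav_sample_rate(path: str) -> int:
    """Return the sample rate (Hz) declared in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def resample_linear(samples: Sequence[float], source_rate: int,
                    target_rate: int = TARGET_RATE_HZ) -> List[float]:
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if source_rate == target_rate or not samples:
        return list(samples)
    out_len = len(samples) * target_rate // source_rate
    step = source_rate / target_rate  # source samples per output sample
    last = len(samples) - 1
    resampled = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Blend the two neighbouring source samples.
        resampled.append(samples[left] * (1 - frac) + samples[right] * frac)
    return resampled
```

For example, one second of 48 kHz audio (48,000 samples) passed to `resample_linear(samples, 48_000)` comes back as 16,000 samples.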
The model is hosted by gigarouter as a managed API, enabling developers to transcribe Polish speech with a single API call. No local setup, model loading, or hardware management is required.
Example Transcriptions
The following table shows sample outputs from the model on Common Voice test data (ground-truth references vs. predictions):
| Reference | Prediction |
|---|---|
| CZY DRZWI BYŁY ZAMKNIĘTE? | PRZY DRZWI BYŁY ZAMKNIĘTE |
| GDZIEŻ TU POWÓD DO WYRZUTÓW? | WGDZIEŻ TO POM DO WYRYDÓ |
| O TEM JEDNAK NIE BYŁO MOWY. | O TEM JEDNAK NIE BYŁO MOWY |
| LUBIĘ GO. | LUBIĄ GO |
| — TO MI NIE POMAGA. | TO MNIE NIE POMAGA |
| WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM, Z MIASTA, Z PRAGI. | WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM Z MIASTA Z PRAGI |
| ALE ON WCALE INACZEJ NIE MYŚLAŁ. | ONY MONITCENIE PONACZUŁA NA MASU |
| A WY, CO TAK STOICIE? | A WY CO TAK STOICIE |
| A TEN PRZYRZĄD DO CZEGO SŁUŻY? | A TEN PRZYRZĄD DO CZEGO SŁUŻY |
| NA JUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU. | NAJUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU |
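Reference/prediction pairs like these are commonly scored with word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is one way to compute it in plain Python. The function names and the normalization (uppercasing and stripping punctuation, since the predictions carry none) are assumptions for illustration, not the evaluation procedure used by the model's author.

```python
import re
from typing import List, Sequence


def normalize(text: str) -> List[str]:
    """Uppercase, drop punctuation, and split into words."""
    return re.sub(r"[^\w\s]", "", text.upper()).split()


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        current = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def word_error_rate(reference: str, prediction: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref_words = normalize(reference)
    if not ref_words:
        raise ValueError("reference contains no words")
    return edit_distance(ref_words, normalize(prediction)) / len(ref_words)


print(word_error_rate("LUBIĘ GO.", "LUBIĄ GO"))  # 0.5
print(word_error_rate("O TEM JEDNAK NIE BYŁO MOWY.",
                      "O TEM JEDNAK NIE BYŁO MOWY"))  # 0.0
```

Under this normalization, the last row of the table scores 2/11: the missing space in NAJUTRZEJSZYM counts as one deleted word plus one substituted word, which shows how WER can penalize a near-correct transcription.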
The model was originally fine-tuned using the wav2vec2-sprint training script and is available for evaluation via the Hugging Face model hub. As a hosted endpoint on gigarouter, it provides an OpenAI-compatible API for straightforward integration into production pipelines.
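As a rough sketch of what a call to an OpenAI-compatible transcription endpoint could look like, the example below builds a multipart `POST` to an `/audio/transcriptions` route using only the Python standard library. The base URL is a placeholder and not taken from gigarouter documentation. The use of the repository name as the model identifier, the helper names, and the `text` field in the response are likewise assumptions based on the OpenAI API convention, so check the provider's documentation before relying on any of them.

```python
import json
import os
import urllib.request
import uuid

# ASSUMPTION: placeholder base URL, not a documented gigarouter address.
BASE_URL = "https://api.gigarouter.example/v1"
# ASSUMPTION: the hosted model is addressed by its repository name.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-polish"


def build_transcription_request(audio_bytes: bytes, filename: str, api_key: str,
                                base_url: str = BASE_URL,
                                model: str = MODEL_ID) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data transcription request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/audio/transcriptions",
        data=model_part + file_header + audio_bytes + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def transcribe(path: str, api_key: str) -> str:
    """Send a local audio file and return the transcription text."""
    with open(path, "rb") as audio_file:
        request = build_transcription_request(
            audio_file.read(), os.path.basename(path), api_key)
    with urllib.request.urlopen(request) as response:
        # ASSUMPTION: the response is JSON with a "text" field, as in the OpenAI API.
        return json.load(response)["text"]
```

Keeping request construction separate from the network call makes it possible to inspect the URL, headers, and body without an API key or a live endpoint.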
best for
- Transcribing Polish audio recordings, podcasts, and meetings
- Enabling voice commands in Polish-language applications
- Generating subtitles for Polish videos
- Speech-to-text for Polish customer service calls
FAQ
The model is trained for Polish speech recognition only.
The model expects audio sampled at 16 kHz. Any format that can be converted to a 16 kHz waveform works.
To use the model, call the gigarouter OpenAI-compatible endpoint with your API key and send the audio data in the request; the response contains the transcription.
The model is open-source, but usage on gigarouter may have associated costs. Check gigarouter's pricing.
The model is based on Wav2Vec2 Large XLSR-53, a transformer model pre-trained on multilingual speech and fine-tuned on the Polish portion of Common Voice 6.1.
We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Polish as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.