Wav2Vec2 Large XLSR-53 Polish

jonatasgrosman/wav2vec2-large-xlsr-53-polish

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Polish is an automatic speech recognition model fine-tuned for Polish speech transcription.

status

coming soon

downloads / mo

4.7M

license

apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 Large XLSR-53 (based on facebook/wav2vec2-large-xlsr-53)
Language	Polish
Training Data	Common Voice 6.1
Input Sample Rate	16 kHz

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-polish is an automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-large-xlsr-53 for Polish, using the train and validation splits of the Common Voice 6.1 dataset. The model expects audio input sampled at 16 kHz.

gigarouter plans to host the model as a managed API so that developers can transcribe Polish speech with a single API call, without local setup, model loading, or hardware management. The hosted endpoint is not yet live.

Example Transcriptions

The following table compares ground-truth references with model predictions on samples from the Common Voice test data. Predictions are unpunctuated, and quality ranges from exact matches to heavily garbled output:

Reference	Prediction
CZY DRZWI BYŁY ZAMKNIĘTE?	PRZY DRZWI BYŁY ZAMKNIĘTE
GDZIEŻ TU POWÓD DO WYRZUTÓW?	WGDZIEŻ TO POM DO WYRYDÓ
O TEM JEDNAK NIE BYŁO MOWY.	O TEM JEDNAK NIE BYŁO MOWY
LUBIĘ GO.	LUBIĄ GO
— TO MI NIE POMAGA.	TO MNIE NIE POMAGA
WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM, Z MIASTA, Z PRAGI.	WCIĄŻ LUDZIE WYSIADAJĄ PRZED ZAMKIEM Z MIASTA Z PRAGI
ALE ON WCALE INACZEJ NIE MYŚLAŁ.	ONY MONITCENIE PONACZUŁA NA MASU
A WY, CO TAK STOICIE?	A WY CO TAK STOICIE
A TEN PRZYRZĄD DO CZEGO SŁUŻY?	A TEN PRZYRZĄD DO CZEGO SŁUŻY
NA JUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU.	NAJUTRZEJSZYM KOLOKWIUM BĘDZIE PIĘĆ PYTAŃ OTWARTYCH I TEST WIELOKROTNEGO WYBORU
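
To put a number on how close a prediction is to its reference, a common choice is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a minimal, standard-library illustration and not the evaluation script used for this model. The function names `normalize` and `word_error_rate` are hypothetical, and stripping punctuation before scoring is an assumption made here because the predictions in the table carry no punctuation.

```python
import string

# Punctuation to drop before scoring: ASCII punctuation plus a few typographic
# marks (dashes and Polish-style quotes). This normalization is an assumption.
_DROP = string.punctuation + "\u2014\u2013\u201e\u201d"


def normalize(text):
    """Uppercase, strip punctuation, and split into words."""
    return text.upper().translate(str.maketrans("", "", _DROP)).split()


def word_error_rate(reference, prediction):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = normalize(reference), normalize(prediction)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the prediction.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("LUBIĘ GO.", "LUBIĄ GO"))  # one of two words differs
```

With this normalization, the pair "LUBIĘ GO." / "LUBIĄ GO" scores 0.5, while "O TEM JEDNAK NIE BYŁO MOWY." and its prediction score 0.0.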

The model was originally fine-tuned using the wav2vec2-sprint training script, and the weights are available on the Hugging Face model hub. Once the hosted endpoint is live on gigarouter, the model will be served through an OpenAI-compatible API for integration into production pipelines.

best for

·Transcribing Polish audio recordings, podcasts, and meetings
·Enabling voice commands in Polish-language applications
·Generating subtitles for Polish videos
·Speech-to-text for Polish customer service calls

FAQ

What languages does this model support?

Polish only. The model was fine-tuned exclusively for Polish speech recognition.

What audio format and sample rate are required?

The model expects audio sampled at 16 kHz. Any format that can be decoded and resampled to a 16 kHz waveform works.
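
As a rough illustration of what converting to 16 kHz involves, the sketch below resamples a mono waveform (a list of float samples) by linear interpolation using only the standard library. The function name `resample_linear` is hypothetical. Linear interpolation applies no anti-aliasing filter, so for real audio a dedicated resampler from an audio library is the safer choice; this is only meant to show the idea.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono waveform to dst_rate by linear interpolation.

    Illustrative only: no low-pass filtering is applied, so downsampling
    real recordings this way can introduce aliasing.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)

    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # clamp at the final sample
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # Four samples recorded at 8 kHz become eight samples at 16 kHz.
    print(len(resample_linear([0.0, 1.0, 2.0, 3.0], 8_000)))
```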

How can I use this model via API?

Once the model is live, call the gigarouter OpenAI-compatible endpoint with your API key and send the audio data in the request to receive a transcription.
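
The sketch below shows roughly what such a request could look like, assuming the endpoint follows the OpenAI audio transcription convention (a multipart POST to `/audio/transcriptions` with `model` and `file` fields). The base URL is a placeholder, the use of the Hugging Face ID as the hosted model ID is an assumption, and `build_transcription_request` is a hypothetical helper. The code only builds the request; it does not send it.

```python
import urllib.request
import uuid

# Placeholder base URL: the real gigarouter endpoint is not given in this page.
BASE_URL = "https://api.gigarouter.example/v1"
# Assumption: the hosted model ID matches the Hugging Face repository ID.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-polish"


def build_transcription_request(audio_bytes, api_key, filename="speech.wav"):
    """Build (but do not send) a multipart transcription request."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + closing

    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    req = build_transcription_request(b"RIFF....", api_key="YOUR_API_KEY")
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send it. An OpenAI-compatible endpoint typically responds with JSON containing a `text` field, though the exact response shape for this service is not confirmed here.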

Is this model free to use?

The model weights are released under the Apache 2.0 license, but usage on gigarouter may incur costs. Check gigarouter's pricing.

What is the model's architecture?

It is based on Wav2Vec2 Large XLSR-53, a transformer model pre-trained on multilingual speech, and was fine-tuned on the Polish portion of Common Voice 6.1.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Polish as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo

wav2vec2-indonesian-javanese-sundanese

4.1M dl/mo