Wav2Vec2 Large XLSR-53 Russian

jonatasgrosman/wav2vec2-large-xlsr-53-russian

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Russian is an automatic speech recognition (ASR) model for Russian, fine-tuned from Facebook's wav2vec2-large-xlsr-53 on the Common Voice and CSS10 datasets.

status

coming soon

downloads / mo

2.9M

license

apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 (Transformer-based)
License	Apache-2.0
Language	Russian

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-russian is an automatic speech recognition (ASR) model for Russian, fine-tuned from Facebook's wav2vec2-large-xlsr-53 checkpoint. It was trained on the train and validation splits of Common Voice 6.1 and on the CSS10 single-speaker dataset, using the wav2vec2-sprint training pipeline. Speech input must be sampled at 16 kHz.
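Because the model expects 16 kHz input, it can help to check a recording's sample rate before transcription. The sketch below is a minimal illustration using only Python's standard `wave` module; the helper names `wav_sample_rate` and `needs_resampling` are hypothetical, and it assumes the audio is stored as an uncompressed WAV file. It only detects a mismatch and leaves the resampling itself to an audio library of your choice.

```python
import wave

TARGET_RATE = 16_000  # sample rate stated in the model description (16 kHz)


def wav_sample_rate(path: str) -> int:
    """Return the sample rate in Hz recorded in a WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def needs_resampling(path: str, target_rate: int = TARGET_RATE) -> bool:
    """Return True when the file's sample rate differs from the expected rate."""
    return wav_sample_rate(path) != target_rate
```

A file for which `needs_resampling` returns True should be converted to 16 kHz before it is passed to the model.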

Benchmark performance

On the Common Voice Russian test set, the model achieves a word error rate (WER) of 13.3% and a character error rate (CER) of 2.88%. Decoding with a language model lowers WER to 9.57% and CER to 2.24%. On the Robust Speech Event dev data, WER is 40.22% and CER is 14.8% (with a language model: WER 33.61%, CER 13.5%). The model is licensed under Apache-2.0 and has been contributed to the robust-speech-event and xlsr-fine-tuning-week community challenges.
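For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed for a single utterance: the edit distance between reference and hypothesis, divided by the reference length in words or characters. This is a generic illustration, not the evaluation script behind the figures above; the function names are hypothetical, and counting spaces as characters in CER is an assumption, since conventions differ.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (works for lists and strings)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference item
                curr[j - 1] + 1,     # insertion of a hypothesis item
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Corpus-level scores are usually obtained by summing edit distances and reference lengths over all utterances before dividing, rather than by averaging per-utterance rates.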

Example transcriptions

The following table illustrates the model's output on samples from the Common Voice test set (without a language model):

Reference	Prediction
ОН РАБОТАТЬ, А ЕЕ НЕ УДЕРЖАТЬ НИКАК — БЕГАЕТ ЗА КЛЁШЕМ КАЖДОГО БУЛЬВАРНИКА.	ОН РАБОТАТЬ А ЕЕ НЕ УДЕРЖАТ НИКАК БЕГАЕТ ЗА КЛЕШОМ КАЖДОГО БУЛЬБАРНИКА
ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ, Я БУДУ СЧИТАТЬ, ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ.	ЕСЛИ НЕ БУДЕТ ВОЗРАЖЕНИЙ Я БУДУ СЧИТАТЬ ЧТО АССАМБЛЕЯ СОГЛАСНА С ЭТИМ ПРЕДЛОЖЕНИЕМ
ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ МИР С ИЗРАИЛЕМ, А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕННОСТИ.	ПАЛЕСТИНЦАМ НЕОБХОДИМО СНАЧАЛА УСТАНОВИТЬ С НИ МИР ФЕЗРЕЛЕМ А ЗАТЕМ ДОБИВАТЬСЯ ПРИЗНАНИЯ ГОСУДАРСТВЕНСКИ
У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО, ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРИБАВЛЯЮ.	У МЕНЯ БЫЛО ТАКОЕ ЧУВСТВО ЧТО ЧТО-ТО ТАКОЕ ОЧЕНЬ ВАЖНОЕ Я ПРЕДБАВЛЯЕТ
ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ.	ТОЛЬКО ВРЯД ЛИ ПОЙМЕТ
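The references in the table carry punctuation while the predictions do not, so a comparison is only meaningful after both sides are normalized. The sketch below shows one possible normalization; the rule set (uppercase, drop punctuation, keep hyphens inside words) is an assumption made for illustration and is not necessarily the rule used in the model's own evaluation. The function name `normalize` is hypothetical.

```python
import re

# Assumption: anything that is not a letter, digit, underscore, whitespace,
# or ASCII hyphen is treated as punctuation and replaced by a space.
_PUNCTUATION = re.compile(r"[^\w\s-]", flags=re.UNICODE)


def normalize(text: str) -> str:
    """Uppercase the text, drop punctuation, and collapse whitespace."""
    text = _PUNCTUATION.sub(" ", text.upper())
    # Keep hyphens inside words (e.g. "ЧТО-ТО") but drop free-standing ones.
    words = (word.strip("-") for word in text.split())
    return " ".join(word for word in words if word)
```

Under this normalization, the second and last references in the table match their predictions exactly, while the other rows still differ in one or more words.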

Once onboarding is complete, gigarouter will host this model as a managed, OpenAI-compatible API, so no custom inference code is required.
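As a rough idea of what an OpenAI-compatible call could look like, the sketch below builds (but does not send) a multipart upload in the shape of OpenAI's `/audio/transcriptions` route, using only the standard library. Everything specific to gigarouter here is an assumption: the base URL is a placeholder, and this page does not confirm the route, the model identifier accepted by the API, or the request fields. The function name `build_transcription_request` is hypothetical.

```python
import urllib.request
import uuid

# Hypothetical placeholder: the real base URL is not given on this page.
BASE_URL = "https://api.gigarouter.example/v1"
# Assumption: the hosted model is addressed by its repository name.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-russian"


def build_transcription_request(
    audio_bytes: bytes,
    filename: str,
    api_key: str,
    base_url: str = BASE_URL,
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build a multipart/form-data POST with a `model` field and a `file` field."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode("utf-8"),
        b"\r\n",
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: audio/wav\r\n\r\n",
        audio_bytes,
        b"\r\n",
        f"--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Once the endpoint is live and its details are confirmed, the returned request could be sent with `urllib.request.urlopen`, or the same call could be made through any OpenAI-compatible client library.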

best for

· Transcribing Russian-language audio content (e.g., meetings, lectures, interviews)
· Building voice-enabled applications for Russian speakers
· Automating subtitles and captions for Russian videos

FAQ

What is the input format for this model?

Speech audio sampled at 16 kHz. Audio recorded at another rate should be resampled first.

What languages does it support?

Russian only.

What license is it under?

Apache-2.0.

How can I use it via gigarouter?

Once the model is live, call the OpenAI-compatible endpoint with an API key.

What is the expected performance?

On the Common Voice Russian test set, the model reaches a word error rate (WER) of 13.3% without a language model and 9.57% with one.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Russian as a hosted, OpenAI-compatible API.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo