Wav2Vec2 Large XLSR-53 Persian

jonatasgrosman/wav2vec2-large-xlsr-53-persian

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Persian is an automatic speech recognition model that transcribes Persian speech to text.

status

coming soon

downloads / mo

2.5M

license

apache-2.0

specs

Task	Automatic Speech Recognition (ASR)
Architecture	Wav2Vec2 Large XLSR-53
License	Apache 2.0
Dataset	Common Voice 6.1 Persian (train and validation splits)

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-persian is an automatic speech recognition (ASR) model fine-tuned from the facebook/wav2vec2-large-xlsr-53 checkpoint on Persian speech data. It was trained on the train and validation splits of the Common Voice 6.1 dataset. The model accepts speech input sampled at 16 kHz and transcribes it into Persian text.
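
The description above can be tried locally with a short script. This is a minimal sketch, not an official recipe: it assumes the `transformers`, `torch`, and `librosa` packages are installed, and the `transcribe` helper name is made up for illustration. The heavy imports sit inside the function so the file can be imported without them.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"
TARGET_SAMPLE_RATE = 16_000  # the model accepts speech sampled at 16 kHz


def transcribe(path: str) -> str:
    """Transcribe one audio file (hypothetical helper, for illustration only).

    The checkpoint is downloaded from the Hugging Face Hub on first use.
    """
    # Assumed third-party dependencies, imported lazily.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # librosa converts to mono and resamples to the requested rate while loading.
    waveform, _ = librosa.load(path, sr=TARGET_SAMPLE_RATE)

    inputs = processor(
        waveform,
        sampling_rate=TARGET_SAMPLE_RATE,
        return_tensors="pt",
        padding=True,
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy CTC decoding: take the most likely token at each frame,
    # then let the processor collapse repeats and blanks into text.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (the file name is a placeholder) would return the Persian transcription as a string. Greedy decoding is used here for brevity; decoding with a language model is a separate choice that this sketch does not cover.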

Key strengths

On the Common Voice Persian test set, the model achieves a word error rate (WER) of 30.12% and a character error rate (CER) of 7.37%. Both figures are lower than those of the two other Persian wav2vec2 models in the benchmark below:

Model	WER	CER
jonatasgrosman/wav2vec2-large-xlsr-53-persian	30.12%	7.37%
m3hrdadfi/wav2vec2-large-xlsr-persian-v2	33.85%	8.79%
m3hrdadfi/wav2vec2-large-xlsr-persian	34.37%	8.98%
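
For readers unfamiliar with the two metrics, the sketch below shows how WER and CER are commonly computed: the edit distance between reference and hypothesis, divided by the reference length, counted in words or in characters. It uses only the standard library. The text normalization behind the published figures is not stated here, so this is an illustration of the definitions rather than a reproduction of the benchmark.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete an item from the reference
                curr[j - 1] + 1,     # insert an item from the hypothesis
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A single wrong character makes the whole word count as an error, which is one reason WER is typically much higher than CER for the same transcripts, as in the table above.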

This model is one of 17 language-specific fine-tuned models released by the same author; the languages include Arabic, Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Persian, Polish, Portuguese, Russian, and Spanish. It is released under the Apache 2.0 license and has a digital object identifier (DOI): 10.57967/hf/3576.

best for

·Transcribing Persian audio recordings such as interviews and lectures
·Voice-to-text for Persian-language applications
·Enabling search in Persian audio archives

FAQ

What input format does the model expect?

The model expects audio sampled at 16 kHz, provided as a waveform array or audio file.

What is the reported Word Error Rate (WER) on Common Voice Persian test data?

On the Common Voice Persian test set, the model achieves a WER of 30.12%, along with a Character Error Rate (CER) of 7.37%.

What license is the model released under?

It is released under the Apache 2.0 license.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Persian as a hosted, OpenAI-compatible API.

related speech-to-text models

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo