
Wav2Vec2 Large XLSR-53 Persian

jonatasgrosman/wav2vec2-large-xlsr-53-persian

published Mar 2022 · updated Dec 2022

Wav2Vec2 Large XLSR-53 Persian is an automatic speech recognition model that transcribes Persian speech to text.

status: coming soon
API providers: 0
downloads / mo: 2.5M
license: apache-2.0

specs

Task: Automatic Speech Recognition (ASR)
Architecture: Wav2Vec2 Large XLSR-53
License: Apache 2.0
Dataset: Common Voice 6.1 Persian (train and validation splits)

about this model

jonatasgrosman/wav2vec2-large-xlsr-53-persian is an automatic speech recognition (ASR) model fine-tuned from the facebook/wav2vec2-large-xlsr-53 checkpoint on Persian speech data. It was trained on the train and validation splits of the Common Voice 6.1 dataset. The model accepts speech input sampled at 16 kHz and transcribes it into Persian text.
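A minimal transcription sketch, assuming the checkpoint is loaded locally through the Hugging Face `transformers` library with `torch` and `librosa` installed. The function name `transcribe` and the file path in the usage note are hypothetical; this is not the hosted API, which is not yet live.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-persian"
SAMPLE_RATE = 16_000  # the model expects 16 kHz input


def transcribe(audio_path: str) -> str:
    """Transcribe one Persian audio file with greedy CTC decoding (illustrative helper)."""
    # Heavy dependencies are imported here so this module loads without them.
    import librosa
    import torch
    from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

    processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
    model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)
    model.eval()

    # librosa resamples to the requested rate and returns a mono float waveform.
    waveform, _ = librosa.load(audio_path, sr=SAMPLE_RATE)
    inputs = processor(
        waveform, sampling_rate=SAMPLE_RATE, return_tensors="pt", padding=True
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    # Greedy decoding: pick the most likely token per frame, then collapse via CTC.
    predicted_ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(predicted_ids)[0]
```

Calling `transcribe("sample.wav")` (a placeholder path) downloads the checkpoint on first use and returns the decoded Persian text. Greedy decoding is the simplest option; a language-model-backed decoder is a separate choice that this sketch does not cover.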

Key strengths

On the Common Voice Persian test set, the model achieves a Word Error Rate (WER) of 30.12% and a Character Error Rate (CER) of 7.37% (lower is better for both). Both figures are lower than those of the two other Persian Wav2Vec2 models in the benchmark below:

Model                                            WER      CER
jonatasgrosman/wav2vec2-large-xlsr-53-persian    30.12%   7.37%
m3hrdadfi/wav2vec2-large-xlsr-persian-v2         33.85%   8.79%
m3hrdadfi/wav2vec2-large-xlsr-persian            34.37%   8.98%
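For readers unfamiliar with the metrics in the benchmark, WER and CER are edit-distance ratios: the number of substitutions, insertions, and deletions needed to turn the model output into the reference, divided by the reference length in words or characters. The sketch below follows that standard definition. The page does not state which evaluation script or text normalization produced the reported numbers, so the helper names and the choice to count spaces as characters in CER are assumptions.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance over reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edit distance over reference length.

    Assumption: spaces count as characters; some toolkits strip them first.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

A single wrong character makes its whole word count as an error, which is why a model's WER is typically much higher than its CER, as in the table above.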

This model is part of a 17-language multilingual fine-tuning suite by the same author, which includes Arabic, Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Persian, Polish, Portuguese, Russian, and Spanish. It is released under the Apache 2.0 license and is registered under the DOI 10.57967/hf/3576.

FAQ

What input format does the model expect?

The model expects audio sampled at 16 kHz, provided as a waveform array or audio file.
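If a recording was captured at a different rate, it needs resampling before it reaches the model. The dependency-free sketch below uses linear interpolation only to illustrate the idea; `resample_linear` is a hypothetical helper, and it applies no anti-aliasing filter, so a dedicated audio resampler is the better choice when downsampling real recordings.

```python
TARGET_RATE = 16_000  # sampling rate the model expects, in Hz


def resample_linear(samples, source_rate: int, target_rate: int = TARGET_RATE):
    """Resample a mono waveform (sequence of floats) by linear interpolation."""
    if source_rate <= 0 or target_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if source_rate == target_rate or len(samples) == 0:
        return list(samples)

    n_out = int(round(len(samples) * target_rate / source_rate))
    step = source_rate / target_rate  # input samples advanced per output sample
    last = len(samples) - 1

    resampled = []
    for i in range(n_out):
        position = i * step
        left = min(int(position), last)
        right = min(left + 1, last)
        frac = position - left
        # Blend the two neighbouring input samples; past the end, hold the last value.
        resampled.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return resampled
```

For example, one second of 8 kHz audio (8,000 samples) becomes 16,000 samples, which can then be passed to the model as a waveform array.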

What is the reported Word Error Rate (WER) on Common Voice Persian test data?

The model achieves a WER of 30.12% and a Character Error Rate (CER) of 7.37%.

What license is the model released under?

It is released under the Apache 2.0 license.

not yet live

We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Persian as a hosted, OpenAI-compatible API.
