Wav2Vec2 Large XLSR-53 Japanese
jonatasgrosman/wav2vec2-large-xlsr-53-japanese
published Mar 2022 · updated Dec 2022
Wav2Vec2 Large XLSR-53 Japanese is a speech recognition model fine-tuned from Facebook's wav2vec2-large-xlsr-53 on Japanese datasets including Common Voice 6.1, CSS10, and JSUT.
specs
| Field | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2-Large-XLSR-53 |
| License | Apache 2.0 |
| Language | Japanese |
about this model
jonatasgrosman/wav2vec2-large-xlsr-53-japanese is an automatic speech recognition (ASR) model that transcribes Japanese speech into text. It is fine-tuned from facebook/wav2vec2-large-xlsr-53 on the train and validation splits of the Common Voice 6.1, CSS10, and JSUT datasets. The model requires audio input sampled at 16 kHz. It is released under the Apache-2.0 license and has a registered DOI: 10.57967/hf/3568.
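Because the model expects 16 kHz input, it can help to check a file before sending it for transcription. The following is a minimal sketch using only Python's standard `wave` module; it assumes uncompressed PCM WAV input, and the function name `wav_matches_model_input` is illustrative rather than part of any library. The mono requirement comes from the FAQ on this page.

```python
import wave

TARGET_RATE_HZ = 16_000  # sample rate the model expects


def wav_matches_model_input(path):
    """Return (ok, rate_hz, channels) for a PCM WAV file.

    ok is True only when the file is 16 kHz and mono.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return (rate == TARGET_RATE_HZ and channels == 1), rate, channels
```

Files that fail the check would need to be resampled or down-mixed first, for example with an audio library of your choice.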
Evaluation Results
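The model was evaluated on the Japanese test split of Common Voice 6.1 (evaluation date: 2021-05-10). Word Error Rate (WER) and Character Error Rate (CER) are reported below alongside results for other publicly available Japanese XLSR-53 fine-tuned models. WER can exceed 100% because inserted words count as errors; since written Japanese does not separate words with spaces, CER is usually the more informative metric here.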
| Model | WER | CER |
|---|---|---|
| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | 81.80% | 20.16% |
| vumichien/wav2vec2-large-xlsr-japanese | 1108.86% | 23.40% |
| qqhann/w2v_hf_jsut_xlsr53 | 1012.18% | 70.77% |
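For reference, both metrics are an edit distance divided by the length of the reference: words for WER, characters for CER. The sketch below is a plain-Python illustration of those definitions, not the evaluation script used to produce the table, and the example sentence is made up. It shows one way a WER above 100% can arise when the hypothesis is split into more "words" than the unsegmented reference.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate; words are whitespace-separated tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate over the raw strings."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


reference = "今日は晴れです"       # unsegmented: a single whitespace "word"
segmented = "今日 は 晴れ です"    # same characters, split into four tokens

print(wer(reference, segmented))                   # 4.0, i.e. 400%
print(cer(reference, segmented.replace(" ", "")))  # 0.0
```

Both functions assume a non-empty reference.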
Additional Context
This model is part of a family of 15 XLSR-53 fine-tuned models covering Arabic, Chinese, Dutch, Finnish, French, German, Greek, Hungarian, Italian, Japanese, Persian, Polish, Portuguese, Russian, and Spanish. The training and evaluation scripts are available in the wav2vec2-sprint repository, which is now deprecated in favor of the HuggingSound library. As a hosted API on gigarouter, the model can be called without local setup or dependency management.
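For readers who prefer to run the model locally, the HuggingSound library mentioned above offers a short path. The sketch below assumes `huggingsound` is installed and that `transcribe` returns one dictionary per file with a `"transcription"` key, as the library's documentation describes; the wrapper name `transcribe_locally` and the optional `model` argument are illustrative additions.

```python
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"


def transcribe_locally(audio_paths, model=None):
    """Transcribe audio files and return one text string per file."""
    if model is None:
        # Imported lazily so this module loads without huggingsound installed.
        from huggingsound import SpeechRecognitionModel

        model = SpeechRecognitionModel(MODEL_ID)  # downloads weights on first use
    results = model.transcribe(audio_paths)
    return [item["transcription"] for item in results]
```

Calling `transcribe_locally(["/path/to/file.wav"])` would load the model once and return a list of transcripts; passing an already loaded `model` avoids reloading it between calls.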
best for
- Transcribing Japanese audio recordings
- Subtitling Japanese videos
- Building voice interfaces for Japanese applications
FAQ
The model expects mono speech audio sampled at 16 kHz.
To call the model, use the gigarouter OpenAI-compatible endpoint with your API key and pass the audio data in the request.
On the Common Voice 6.1 Japanese test set, the model reports a WER of 81.80% and a CER of 20.16% (evaluated May 2021).
The model is released under the Apache 2.0 license.
The model does not support other languages; it is fine-tuned exclusively for Japanese speech recognition.
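A request to an OpenAI-compatible transcription endpoint might look like the sketch below, which uses the official `openai` Python client. Several details are assumptions, not documented values: the environment variable names `GIGAROUTER_BASE_URL` and `GIGAROUTER_API_KEY` are hypothetical, and the hosted model identifier is assumed to match the slug shown on this page.

```python
import os

# Assumed to match the slug on this page; the hosted identifier may differ.
MODEL_ID = "jonatasgrosman/wav2vec2-large-xlsr-53-japanese"


def transcribe_file(path, client=None, model=MODEL_ID):
    """Send one audio file to an OpenAI-compatible transcription endpoint."""
    if client is None:
        # Imported lazily so this module loads without the openai package.
        from openai import OpenAI

        client = OpenAI(
            base_url=os.environ["GIGAROUTER_BASE_URL"],  # hypothetical variable name
            api_key=os.environ["GIGAROUTER_API_KEY"],    # hypothetical variable name
        )
    with open(path, "rb") as audio:
        result = client.audio.transcriptions.create(model=model, file=audio)
    return result.text
```

Passing a preconfigured `client` keeps credentials handling outside the function and makes it easy to substitute a stub when testing.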
We're benchmarking and onboarding Wav2Vec2 Large XLSR-53 Japanese as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.