Wav2Vec2 XLS-R 300M Mixed
mesolitica/wav2vec2-xls-r-300m-mixed
published Jun 2022 · updated Jun 2022
Wav2Vec2 XLS-R 300M Mixed is an automatic speech recognition model that transcribes audio in Malay, Singlish, and Mandarin.
specs
| Task | Automatic Speech Recognition (ASR) |
| Architecture | wav2vec 2.0 |
| Parameters | 300M |
| License | Apache-2.0 |
about this model
wav2vec2-xls-r-300m-mixed is an automatic speech recognition (ASR) model fine-tuned from Facebook’s XLS-R 300M checkpoint on a mixed dataset of Malay, Singlish, and Mandarin speech. The base XLS-R model, released under Apache-2.0, uses the wav2vec 2.0 architecture, has 300 million parameters, and was pretrained on 436,000 hours of unlabeled speech in 128 languages. This fine-tuned variant specializes in those three languages and is being onboarded to gigarouter as a managed, OpenAI-compatible API.
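To make the architecture concrete: wav2vec 2.0 turns raw audio into a sequence of frames with a stack of strided 1-D convolutions, and an ASR head fine-tuned on top (assumed here to be the usual CTC head) predicts one token per frame. The sketch below counts those frames. It assumes the kernel and stride settings of the standard wav2vec 2.0 / XLS-R feature encoder, conv_kernel (10, 3, 3, 3, 3, 2, 2) and conv_stride (5, 2, 2, 2, 2, 2, 2), and 16 kHz input; check both against the checkpoint's config.json before relying on them.

```python
# Feature-encoder settings of the standard wav2vec 2.0 / XLS-R config
# (assumed; confirm against conv_kernel / conv_stride in the checkpoint's config.json).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)
SAMPLE_RATE = 16_000  # XLS-R was pretrained on 16 kHz audio


def num_output_frames(num_samples: int) -> int:
    """Return how many encoder frames a waveform of num_samples produces.

    Each unpadded 1-D convolution maps a length L to
    floor((L - kernel) / stride) + 1.
    """
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    for seconds in (1, 5, 30):
        print(f"{seconds:>2} s -> {num_output_frames(seconds * SAMPLE_RATE)} frames")
```

Under these assumptions 1 s of audio gives 49 frames, 5 s gives 249, and 30 s gives 1499. The product of the strides, 5 × 2⁶ = 320 samples, sets a hop of 20 ms at 16 kHz, so the model emits roughly 50 token predictions per second of speech.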
The model was trained on a single RTX 3090 Ti GPU (24 GB VRAM) and evaluated on held-out sets of 765 Malay, 3,579 Singlish, and 614 Mandarin utterances. An external language model, huseinzol05/language-model-bahasa-manglish-combined, can be used during decoding (LM decoding) to further reduce error rates.
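Without a language model, a CTC-trained wav2vec 2.0 model is usually decoded greedily: take the best token in each frame, merge consecutive repeats, and drop the blank token. The sketch below assumes a CTC head whose vocabulary uses id 0 (`<pad>`) as the blank and `|` as the word delimiter, the common convention for wav2vec 2.0 character vocabularies; the toy vocabulary and frame ids are hypothetical. LM decoding replaces the per-frame argmax with a beam search that also scores candidate words against the language model (for example pyctcdecode with a KenLM model), which is not shown here.

```python
from itertools import groupby
from typing import List, Sequence


def argmax_path(frame_scores: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring token id in every frame (greedy, no LM)."""
    return [max(range(len(row)), key=row.__getitem__) for row in frame_scores]


def collapse_ctc(ids: Sequence[int], vocab: Sequence[str],
                 blank_id: int = 0, word_delimiter: str = "|") -> str:
    """Apply the CTC rule: merge repeated ids, then drop blanks."""
    merged = [token_id for token_id, _ in groupby(ids)]
    chars = [vocab[i] for i in merged if i != blank_id]
    text = "".join(" " if c == word_delimiter else c for c in chars)
    return " ".join(text.split())  # normalise runs of delimiters to single spaces


if __name__ == "__main__":
    vocab = ["<pad>", "|", "a", "s", "t", "y"]  # hypothetical toy vocabulary; id 0 is the blank
    frame_ids = [3, 3, 2, 2, 0, 2, 4, 1, 3, 2, 2, 5, 2]
    print(collapse_ctc(frame_ids, vocab))
```

The example prints `saat saya`. The blank between the two `a` runs is what lets CTC spell the double letter in "saat"; without it, the repeats would merge into a single `a`.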
Benchmark Results
| Evaluation Set | CER | WER | CER (with LM) | WER (with LM) |
|---|---|---|---|---|
| Mixed | 0.0481 | 0.1322 | 0.0412 | 0.0988 |
| Malay | 0.0516 | 0.1956 | 0.0392 | 0.1271 |
| Singlish | 0.0495 | 0.1276 | 0.0427 | 0.0968 |
| Mandarin | 0.0356 | 0.0799 | 0.0349 | 0.0754 |
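All metrics are reported on the evaluation sets from the Malaya Speech STT preparation; lower is better. LM decoding lowers both CER and WER on every set, with the largest gain on Malay (WER 0.1956 → 0.1271) and the smallest on Mandarin (WER 0.0799 → 0.0754).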
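CER and WER are edit-distance rates: the minimum number of substitutions, deletions, and insertions that turn the hypothesis into the reference, divided by the number of reference characters or words. The sketch below computes both with a plain Levenshtein dynamic program. It is not the Malaya Speech evaluation script: the text normalization used for the reported numbers (casing, punctuation, how Mandarin is split into words) is not stated here, and corpus-level aggregation as total errors over total reference words is an assumption.

```python
from typing import Hashable, Iterable, Sequence, Tuple


def edit_distance(ref: Sequence[Hashable], hyp: Sequence[Hashable]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (free if tokens match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word edits over total reference words (not a mean of per-utterance WERs)."""
    errors = words = 0
    for reference, hypothesis in pairs:
        ref_words = reference.split()
        errors += edit_distance(ref_words, hypothesis.split())
        words += len(ref_words)
    return errors / words


if __name__ == "__main__":
    ref, hyp = "saya suka makan nasi", "saya suka makan nasik"
    print(f"WER={wer(ref, hyp):.2f} CER={cer(ref, hyp):.2f}")
```

One wrong word out of four gives WER 0.25, while the same error is a single inserted character out of 20, so CER is only 0.05. This is why every row in the table has a CER well below its WER.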
best for
- Transcribing Malay conversational audio
- Transcribing Singlish (colloquial Singapore English with Chinese, Malay, and Tamil influences)
- Transcribing Mandarin Chinese speech
FAQ
Which languages does it support?
Malay, Singlish, and Mandarin Chinese.
What audio input does it accept?
Audio files in common formats (e.g., WAV, MP3), sent as a binary upload or a base64-encoded string.
Can it use a language model?
Yes. It can optionally decode with an external language model (LM) to improve accuracy; LM-enhanced metrics are reported for each language.
How do I call it?
Use the OpenAI-compatible endpoint with your API key and specify the model name wav2vec2-xls-r-300m-mixed.
How accurate is it?
On the mixed evaluation set: CER 4.8% and WER 13.2%, or CER 4.1% and WER 9.9% with the LM. The per-language breakdown is in the benchmark table above.
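Calling the model means uploading an audio file to an OpenAI-compatible endpoint with an API key and the model name. The stdlib-only sketch below builds, but does not send, such a request. It assumes the OpenAI convention of `POST {base}/audio/transcriptions` with a multipart form carrying `model` and `file` fields; `BASE_URL` is a hypothetical placeholder because no gigarouter URL is given here, and the silent 16 kHz mono WAV stands in for real audio (16 kHz being the rate XLS-R was trained on).

```python
import io
import urllib.request
import uuid
import wave

# Hypothetical placeholder: the real gigarouter base URL is not stated in this document.
BASE_URL = "https://api.example.com/v1"
MODEL = "wav2vec2-xls-r-300m-mixed"


def silent_wav(seconds: float = 1.0, sample_rate: int = 16_000) -> bytes:
    """Return an in-memory 16-bit mono WAV of silence (stand-in for real audio)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def build_transcription_request(audio: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL}\r\n"
    ).encode()
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # adjust for MP3 or other formats
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/transcriptions",
        data=model_part + file_header + audio + closing,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_transcription_request(silent_wav(0.5), "clip.wav", "YOUR_API_KEY")
    print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Once the model is live, sending is a single `urllib.request.urlopen(request)` call. OpenAI's own transcription API answers with JSON containing a `text` field, and an OpenAI-compatible server would presumably do the same; the official `openai` client pointed at the same base URL is an alternative to hand-building the form.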
We're benchmarking and onboarding Wav2Vec2 XLS-R 300M Mixed as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.