Romanian Wav2Vec2
gigant/romanian-wav2vec2
published Mar 2022 · updated Sep 2023
Romanian Wav2Vec2 is an automatic speech recognition (ASR) model for Romanian, fine-tuned from Wav2Vec2-XLS-R-300M and paired with a 5-gram language model.
specs
| Spec | Value |
|---|---|
| Task | Automatic Speech Recognition (ASR) |
| Architecture | Wav2Vec2-XLS-R-300M with CTC head and 5-gram language model (pyctcdecode + kenlm) |
| Parameters | ~300 million |
about this model
gigant/romanian-wav2vec2 is an automatic speech recognition (ASR) model fine-tuned from facebook/wav2vec2-xls-r-300m on the Common Voice 8.0 Romanian subset and additional data from the Romanian Speech Synthesis dataset. The acoustic model has a CTC head, and decoding uses a 5-gram language model trained on the Romanian Corpora Parliament dataset (built with kenlm and applied with pyctcdecode). Audio input must be sampled at 16 kHz; output text is lowercase and unpunctuated.
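The CTC head emits one score vector per audio frame (Wav2Vec2 produces roughly one frame per 20 ms of 16 kHz audio). Without the language model, the simplest way to turn those scores into text is greedy (best-path) CTC decoding: take the best label per frame, merge consecutive repeats, and drop the blank token. The sketch below is illustrative only: it uses a hypothetical seven-symbol vocabulary and one-hot toy scores, following the Hugging Face Wav2Vec2 convention of `<pad>` as the blank and `|` as the word delimiter. It is not the model's real vocabulary, and the model's own decoding path uses pyctcdecode beam search with the 5-gram LM rather than this greedy step.

```python
from typing import List, Optional, Sequence

# Hypothetical toy vocabulary (NOT the model's real one). Index 0 "<pad>" is the
# CTC blank and "|" is the word delimiter, as in Hugging Face Wav2Vec2 tokenizers.
VOCAB = ["<pad>", "|", "a", "e", "m", "n", "r"]
BLANK_ID = 0


def argmax(row: Sequence[float]) -> int:
    """Index of the highest score in one frame."""
    return max(range(len(row)), key=lambda i: row[i])


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    chars: List[str] = []
    prev: Optional[int] = None
    for row in frame_scores:
        idx = argmax(row)
        # Emit a label only when it differs from the previous frame's label,
        # so "a a" collapses to "a" but "a <pad> a" keeps both a's.
        if idx != prev and idx != BLANK_ID:
            chars.append(VOCAB[idx])
        prev = idx
    # Turn word delimiters into spaces and trim the ends.
    return "".join(chars).replace("|", " ").strip()


def one_hot(label: str) -> List[float]:
    """Toy frame scores that put all the weight on a single label."""
    return [1.0 if v == label else 0.0 for v in VOCAB]


# Frames for "ana are": repeated labels and blanks collapse away.
frames = [one_hot(c) for c in ["a", "a", "<pad>", "n", "a", "|", "a", "r", "r", "e"]]
print(ctc_greedy_decode(frames))  # ana are
```

The blank token is what lets CTC spell genuine double letters: `m <pad> m` decodes to "mm", while `m m` decodes to "m".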
The model ranked top-1 for Romanian speech recognition in Hugging Face's Robust Speech Challenge (Speech Bench; Leaderboard). Without 5-gram LM decoding, evaluation on the Common Voice 8.0 Romanian test set yields:
- Loss: 0.1553
- Word error rate (WER): 0.1174
- Character error rate (CER): 0.0294
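WER and CER are edit distances normalized by reference length: the minimum number of substitutions, deletions, and insertions that turn the hypothesis into the reference, divided by the number of reference words (WER) or characters (CER). Below is a minimal pure-Python sketch. Over a test set, errors and reference lengths are usually summed across all utterances before dividing, as `corpus_wer` does. Since the model outputs lowercase text without punctuation, references would normally be normalized the same way before scoring; that is an assumption here, as the exact evaluation normalization is not described on this page.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, deletions, insertions), two-row DP."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, free when tokens match
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate for one utterance."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate for one utterance (spaces count as characters)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Test-set WER: total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return errors / words


print(wer("ana are mere", "ana are pere"))  # 0.3333333333333333
```

On this scale, the reported WER of 0.1174 means roughly 12 word errors per 100 reference words.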
Training hyperparameters: learning rate 0.003, effective batch size 48 (with 3 gradient-accumulation steps), Adam optimizer, linear learning-rate schedule with 500 warmup steps, 50 epochs, and mixed-precision training (AMP).
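The schedule and batch settings can be made concrete. A linear scheduler with warmup ramps the learning rate from 0 to its peak (0.003) over the first 500 steps, then decays it linearly to 0 at the final step; this is the shape of Transformers' `get_linear_schedule_with_warmup`. With 3 gradient-accumulation steps, an effective batch of 48 implies 16 examples per forward/backward pass, assuming a single device. The total step count is not given here, so `TOTAL_STEPS` below is a hypothetical placeholder.

```python
PEAK_LR = 3e-3        # learning rate listed above
WARMUP_STEPS = 500    # warmup steps listed above
EFFECTIVE_BATCH = 48  # examples per optimizer update
GRAD_ACCUM = 3        # gradient-accumulation steps per update

# Assuming a single device: each forward/backward pass sees 48 / 3 = 16 examples,
# and gradients from 3 passes are accumulated before one optimizer step.
PER_PASS_BATCH = EFFECTIVE_BATCH // GRAD_ACCUM

TOTAL_STEPS = 10_000  # hypothetical; the real value depends on dataset size and 50 epochs


def linear_warmup_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - WARMUP_STEPS)


for s in (0, 250, 500, 5_250, 10_000):
    print(s, linear_warmup_lr(s))
```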
You can test the model online via the Romanian Speech Recognition Space.
best for
- Transcribing Romanian audio recordings
- Building Romanian voice assistants
- Subtitling Romanian media content
- Automating Romanian call center transcription
FAQ
It is best for Romanian speech recognition, achieving top-1 performance on the Hugging Face Robust Speech Challenge. It outputs lowercase text without punctuation.
It expects audio sampled at 16 kHz and predicts text directly from the raw waveform; audio recorded at other sample rates should be resampled before transcription.
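Because the model expects 16 kHz input, it helps to check a file's format before sending it. Here is a minimal sketch using only Python's standard `wave` module, assuming 16-bit PCM WAV input. It rejects other sample rates rather than resampling them (a dedicated resampler should handle that conversion), downmixes multi-channel audio to mono by averaging (assuming the model takes a single-channel waveform), and scales samples to floats in [-1, 1). The function name and these format choices are illustrative, not part of any model API.

```python
import io
import struct
import wave
from typing import BinaryIO, List, Union

TARGET_RATE = 16_000  # the model expects 16 kHz input


def load_wav_16k_mono(source: Union[str, BinaryIO]) -> List[float]:
    """Read a 16-bit PCM WAV file and return mono samples scaled to [-1, 1).

    Raises ValueError for other sample rates or sample widths instead of
    resampling; convert such files with a proper resampler first.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        if rate != TARGET_RATE:
            raise ValueError(f"expected {TARGET_RATE} Hz audio, got {rate} Hz")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wav.readframes(wav.getnframes())
    # WAV stores little-endian signed 16-bit samples, interleaved by channel.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # Average the channels of each frame (downmix to mono) and scale to floats.
    return [
        sum(samples[i:i + channels]) / channels / 32768.0
        for i in range(0, len(samples), channels)
    ]


# Build a tiny in-memory stereo WAV to show the check in action.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(TARGET_RATE)
    out.writeframes(struct.pack("<4h", 16384, 0, -32768, -32768))
buf.seek(0)
print(load_wav_16k_mono(buf))  # [0.25, -1.0]
```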
Once the model is hosted, use the OpenAI-compatible endpoint with your API key: send audio bytes or a URL to the /v1/audio/transcriptions endpoint.
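In the OpenAI convention, a transcription request is a multipart/form-data POST with a `file` part holding the audio and a `model` field, authenticated with a bearer token. The sketch below builds such a request with the standard library but does not send it. The base URL, API key, and model id are hypothetical placeholders, since the hosted endpoint is not live yet, and it covers only the audio-bytes case, not the URL option.

```python
import urllib.request
import uuid

# Hypothetical placeholders: the hosted endpoint is not available yet.
BASE_URL = "https://api.example.com"
API_KEY = "YOUR_API_KEY"
MODEL_ID = "gigant/romanian-wav2vec2"


def build_transcription_request(audio: bytes, filename: str = "audio.wav") -> urllib.request.Request:
    """Build (without sending) a multipart POST to /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # Form field carrying the model id.
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{MODEL_ID}\r\n"
    ).encode()
    # Headers for the file part; the raw audio bytes follow them.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    closing = f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/transcriptions",
        data=model_part + file_header + audio + closing,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )


req = build_transcription_request(b"RIFF...")  # stand-in bytes, not real audio
print(req.get_method(), req.full_url)
```

Sending it with `urllib.request.urlopen(req)` would return the server's response. OpenAI's own API returns JSON with a `text` field, which a compatible service would presumably mirror.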
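The model has about 300 million parameters (the XLS-R-300M base plus a small CTC head). Speed depends on hardware and on decoding settings (greedy CTC decoding versus beam search with the 5-gram LM); the model can be used for batch processing and, on adequate hardware, for real-time transcription.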
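Two back-of-the-envelope numbers make the size and speed claims checkable. At ~300 million parameters, the weights alone take about 1.2 GB in fp32 and 0.6 GB in fp16; activations, the 5-gram LM, and runtime overhead come on top. Speed is commonly reported as a real-time factor (RTF), processing time divided by audio duration, where values below 1 mean faster than real time. The helpers are illustrative, and the timing values in the example are hypothetical.

```python
PARAMS = 300e6  # ~300 million parameters, per the specs

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_memory_gb(params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (1e9 bytes); excludes activations and the LM."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time / audio duration; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds


print(weight_memory_gb(PARAMS, "fp32"), weight_memory_gb(PARAMS, "fp16"))
# Hypothetical timing: 60 s of audio transcribed in 3 s gives an RTF of 0.05.
print(real_time_factor(3.0, 60.0))
```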
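Yes. It includes a 5-gram language model trained on Romanian parliamentary data (the Romanian Corpora Parliament dataset), which is applied during decoding to improve accuracy. Note that the reported WER of 0.1174 on the Common Voice 8.0 Romanian test set was measured without the language model.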
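The LM's contribution can be illustrated with shallow fusion: each candidate transcript is scored by its acoustic log-probability plus a weighted LM log-probability plus a per-word bonus. pyctcdecode applies this kind of weighting inside its beam search through its `alpha` (LM weight) and `beta` (length/word bonus) parameters. The sketch below is a simplified post-hoc rescoring of a fixed candidate list, not pyctcdecode's algorithm, and all scores and weights are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    acoustic_logprob: float  # log-probability from the CTC acoustic model
    lm_logprob: float        # log-probability of the word sequence under the n-gram LM


def fused_score(c: Candidate, alpha: float, beta: float) -> float:
    """Shallow fusion: acoustic score + alpha * LM score + beta * word count."""
    return c.acoustic_logprob + alpha * c.lm_logprob + beta * len(c.text.split())


def rescore(candidates: List[Candidate], alpha: float, beta: float) -> Candidate:
    """Pick the candidate with the highest fused score."""
    return max(candidates, key=lambda c: fused_score(c, alpha, beta))


# Invented scores: the acoustic model slightly prefers the similar-sounding "pere",
# while the LM strongly prefers the common phrase "ana are mere".
CANDIDATES = [
    Candidate("ana are mere", acoustic_logprob=-4.0, lm_logprob=-6.0),
    Candidate("ana are pere", acoustic_logprob=-3.8, lm_logprob=-11.0),
]

print(rescore(CANDIDATES, alpha=0.0, beta=1.0).text)  # ana are pere (acoustics only)
print(rescore(CANDIDATES, alpha=0.5, beta=1.0).text)  # ana are mere
```

With `alpha=0.0` the LM is ignored and the acoustically preferred candidate wins; raising `alpha` lets the LM override small acoustic differences.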
We're benchmarking and onboarding Romanian Wav2Vec2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.