F5-TTS Russian
Misha24-10/F5-TTS_RUSSIAN
published May 2025 · updated Jan 2026
F5-TTS Russian is a text-to-speech model fine-tuned for Russian speech synthesis with accent control.
specs
| Spec | Value |
|---|---|
| Task | Text-to-Speech (TTS) |
| Architecture | Diffusion Transformer (DiT) with ConvNeXt V2 text encoder and Flow Matching |
| License | CC-BY-NC-4.0 |
| Training Data | 5,000+ hours of Russian and English speech |
about this model
Misha24-10/F5-TTS_RUSSIAN is a text-to-speech model fine-tuned from F5-TTS (Diffusion Transformer with ConvNeXt V2 text encoder and Flow Matching) for Russian speech synthesis. It was trained on over 5,000 hours of combined Russian and English speech data.
Key strengths
The model supports explicit stress marking: place a + before the stressed vowel (e.g., молок+о). For automatic stress placement, the RUAccent library can be used. Three fine-tuned checkpoints are available: the base version, an accent-tuned version with 100% stress annotation in training data, and an additional fine-tuned version (v2, +16 epochs) with data filtering that removes ~5% of samples with artifacts.
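The stress markup described above can be prepared and sanity-checked before text is sent to the model. The following is a minimal sketch, not part of the model's tooling: the helper names `acute_to_plus` and `misplaced_plus_positions` are hypothetical, and it assumes the input marks stress with a combining acute accent (U+0301) placed after a Cyrillic vowel.

```python
# Hypothetical helpers for the "+ before the stressed vowel" convention.
RUSSIAN_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
COMBINING_ACUTE = "\u0301"  # accent mark that follows the stressed vowel


def acute_to_plus(text: str) -> str:
    """Rewrite 'vowel + combining acute' as '+vowel'."""
    out = []
    for ch in text:
        if ch == COMBINING_ACUTE and out and out[-1] in RUSSIAN_VOWELS:
            vowel = out.pop()
            out.append("+" + vowel)
        else:
            out.append(ch)
    return "".join(out)


def misplaced_plus_positions(text: str) -> list[int]:
    """Return indices of '+' signs that are not directly followed by a Russian vowel."""
    return [
        i
        for i, ch in enumerate(text)
        if ch == "+" and (i + 1 >= len(text) or text[i + 1] not in RUSSIAN_VOWELS)
    ]


print(acute_to_plus("молоко\u0301"))       # молок+о
print(misplaced_plus_positions("мол+ко"))  # [3]
```

A check like this only catches markup mistakes; it does not decide where the stress belongs, which is what an accent-placement library such as RUAccent is for.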
The underlying F5-TTS architecture uses "Sway Sampling", an inference-time strategy for selecting flow steps that can be applied without retraining, and is reported to be competitive in naturalness and prosody. The base F5-TTS model was pre-trained on 95K hours of English and Chinese speech; the Russian fine-tune was then trained on the following data:
| Source | Hours |
|---|---|
| Custom Russian dataset | 4,000 |
| Common Voice RU | 239 |
| Common Voice EN | 240 |
| Sova (RuDevices + RuAudiobooks) | 400 |
| LibriHeavy (partial) | 180 |
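As a quick check that the table agrees with the "over 5,000 hours" figure quoted elsewhere on this page, the per-source hours can be summed. The dictionary below simply transcribes the table; the variable names are illustrative.

```python
# Hours per source, copied from the table above.
hours_by_source = {
    "Custom Russian dataset": 4000,
    "Common Voice RU": 239,
    "Common Voice EN": 240,
    "Sova (RuDevices + RuAudiobooks)": 400,
    "LibriHeavy (partial)": 180,
}

total_hours = sum(hours_by_source.values())
print(total_hours)  # 5059
```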
A demo comparing F5-TTS_RUSSIAN against XTTS-v2 and FishSpeech is available on the project page. Note: the original F5-TTS is licensed under CC-BY-NC-4.0; the fine-tuned model card does not specify a separate license.
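As a rough illustration of the Sway Sampling mentioned earlier: the upstream F5-TTS work describes it as remapping uniformly spaced flow steps u in [0, 1] through u + s·(cos(πu/2) − 1 + u), where a negative coefficient s concentrates steps early in the flow. The sketch below assumes that formula and a coefficient of s = -1; the function names are illustrative and are not taken from the F5-TTS codebase.

```python
import math


def sway_sample(u: float, s: float = -1.0) -> float:
    """Remap a uniform flow step u in [0, 1] with sway coefficient s (assumed formula)."""
    return u + s * (math.cos(math.pi / 2 * u) - 1 + u)


def sway_schedule(num_steps: int, s: float = -1.0) -> list[float]:
    """Build num_steps + 1 time points from 0 to 1, remapped by sway_sample."""
    return [sway_sample(i / num_steps, s) for i in range(num_steps + 1)]


schedule = sway_schedule(8)
# With s = -1 the points still run from 0 to 1 but are denser near 0.
print([round(t, 3) for t in schedule])
```

With s = 0 the schedule reduces to plain uniform steps, which makes the effect of the coefficient easy to compare.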
best for
- Russian text-to-speech generation with natural prosody
- Speech synthesis with controllable word stress accents
- Voice applications requiring fluent Russian speech output
FAQ
F5-TTS Russian is a fine-tuned version of the F5-TTS model, adapted for Russian speech synthesis using over 5,000 hours of Russian and English audio data.
Place a + before the stressed vowel in a word (e.g., молок+о produces молокó). You can also use the RUAccent library for automatic accent placement.
The original F5-TTS is licensed under CC-BY-NC-4.0 (non-commercial). The fine-tuned model card does not specify a separate license, so treat the Russian fine-tune as subject to the same non-commercial terms and check the model card for any updates.
Once the hosted API is available, use the OpenAI-compatible endpoint with your gigarouter API key: send a POST request containing the input text, with any stress markup (the + notation) written directly in that text.
The model was trained on a custom 4,000-hour Russian dataset, plus Common Voice (RU and EN), Sova (RuDevices and RuAudiobooks), and partial LibriHeavy, totalling over 5,000 hours.
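The POST request mentioned in the FAQ could look roughly like the sketch below. Everything service-specific here is an assumption: the base URL is a placeholder, the `/audio/speech` path and the `model` and `input` fields follow the OpenAI speech API convention, and the model identifier is taken from this page. The hosted service may use different values and may require extra fields such as a voice. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Placeholder values (assumptions, not documented endpoints).
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "Misha24-10/F5-TTS_RUSSIAN"


def build_speech_request(text: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech POST request."""
    payload = {"model": MODEL_ID, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Stress markup travels inside the input text itself.
request = build_speech_request("молок+о", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` and writing the returned audio bytes to a file, once a real endpoint and key are available.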
We're benchmarking and onboarding F5-TTS Russian as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.