gigarouter

F5-TTS Russian

Misha24-10/F5-TTS_RUSSIAN

published May 2025 · updated Jan 2026

F5-TTS Russian is a text-to-speech model fine-tuned for Russian speech synthesis, with explicit control over word stress (accent).

status: coming soon
API providers: 0
downloads/month: 89.9K
license: cc-by-nc-4.0

specs

Task: Text-to-Speech (TTS)
Architecture: Diffusion Transformer (DiT) with ConvNeXt V2 text encoder and Flow Matching
License: CC-BY-NC-4.0
Training data: 5,000+ hours of Russian and English speech (fine-tuning)

about this model

Misha24-10/F5-TTS_RUSSIAN is a text-to-speech model fine-tuned from F5-TTS (Diffusion Transformer with ConvNeXt V2 text encoder and Flow Matching) for Russian speech synthesis. It was fine-tuned on over 5,000 hours of combined Russian and English speech data.
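For readers unfamiliar with Flow Matching, the training objective can be written as below. This is the generic conditional flow-matching loss with a linear (optimal-transport) path from Gaussian noise x_0 to a target mel-spectrogram x_1. To our understanding this is the form F5-TTS trains with, but the symbol c (conditioning on text and the unmasked surrounding audio) summarizes its infilling setup rather than reproducing it exactly.

```latex
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]

\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,x_1}\left\lVert v_\theta(x_t, t, c) - (x_1 - x_0) \right\rVert^2
```

At inference, the learned velocity field v_θ is integrated from noise at t = 0 to speech at t = 1 with a small number of ODE steps.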

Key strengths

The model supports explicit stress marking: place a + immediately before the stressed vowel (e.g., молок+о). For automatic stress placement, the RUAccent library can be used. Three fine-tuned checkpoints are available: the base version; an accent-tuned version trained on data in which every sample carries stress annotation; and a v2 version, fine-tuned for 16 additional epochs on data filtered to remove the ~5% of samples that contained artifacts.
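Hand-written stress markup is easy to get wrong, so a small checker can help. The sketch below uses hypothetical helper names (add_stress_mark, check_stress_markup, to_display) that are not part of F5-TTS or RUAccent. It assumes, as described above, that the model expects one ASCII + immediately before the stressed vowel; it inserts a +, flags misplaced + signs, and renders the markup with a combining acute accent for proofreading.

```python
# Hypothetical input-preparation helpers; not part of the F5-TTS or RUAccent APIs.
RU_VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")
COMBINING_ACUTE = "\u0301"  # U+0301, placed after a vowel to show stress


def add_stress_mark(word: str, vowel_index: int) -> str:
    """Insert '+' before word[vowel_index], which must be a Russian vowel."""
    if not 0 <= vowel_index < len(word) or word[vowel_index] not in RU_VOWELS:
        raise ValueError(f"position {vowel_index} in {word!r} is not a vowel")
    return word[:vowel_index] + "+" + word[vowel_index:]


def check_stress_markup(text: str) -> list[int]:
    """Return indices of '+' signs not immediately followed by a Russian vowel."""
    return [
        i for i, ch in enumerate(text)
        if ch == "+" and (i + 1 >= len(text) or text[i + 1] not in RU_VOWELS)
    ]


def to_display(text: str) -> str:
    """Replace '+' markup with a combining acute accent after the stressed vowel."""
    out = []
    stress_next = False
    for ch in text:
        if ch == "+":
            stress_next = True  # the next character should be the stressed vowel
            continue
        out.append(ch)
        if stress_next and ch in RU_VOWELS:
            out.append(COMBINING_ACUTE)
        stress_next = False
    return "".join(out)


marked = add_stress_mark("молоко", 5)  # "молок+о"
print(marked, check_stress_markup(marked), to_display(marked))
```

If you use RUAccent for automatic placement, check_stress_markup can serve as a sanity check on its output, assuming that output follows the same + convention.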

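The underlying F5-TTS architecture uses Sway Sampling, an inference-time strategy that spaces the flow-matching ODE steps non-uniformly (by default concentrating them early in the trajectory) to improve quality for a given number of steps, achieving competitive naturalness and prosody. The base model was pre-trained on 95K hours of English and Chinese speech, then fine-tuned on the following mix of Russian and English data: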

Source                            Hours
Custom Russian dataset            4,000
Common Voice RU                     239
Common Voice EN                     240
Sova (RuDevices + RuAudiobooks)     400
LibriHeavy (partial)                180
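As a quick arithmetic check on the table, the per-source hours sum to just over the 5,000-hour figure. The Russian/English split is our own labeling of each source (Common Voice EN and LibriHeavy counted as English, the rest as Russian); the model card does not state it.

```python
# Hours per source, copied from the table; the language tags are our own labeling.
FINE_TUNE_HOURS = {
    "Custom Russian dataset": ("ru", 4000),
    "Common Voice RU": ("ru", 239),
    "Common Voice EN": ("en", 240),
    "Sova (RuDevices + RuAudiobooks)": ("ru", 400),
    "LibriHeavy (partial)": ("en", 180),
}


def summarize(hours: dict[str, tuple[str, int]]) -> tuple[int, dict[str, int]]:
    """Return total hours and hours per language tag."""
    total = sum(h for _, h in hours.values())
    by_lang: dict[str, int] = {}
    for lang, h in hours.values():
        by_lang[lang] = by_lang.get(lang, 0) + h
    return total, by_lang


total, by_lang = summarize(FINE_TUNE_HOURS)
print(total, by_lang, f"{by_lang['ru'] / total:.1%}")
# prints: 5059 {'ru': 4639, 'en': 420} 91.7%
```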

A demo comparing F5-TTS_RUSSIAN against XTTS-v2 and FishSpeech is available on the project page. Note: the original F5-TTS is licensed under CC-BY-NC-4.0; the fine-tuned model card does not specify a separate license.
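The Sway Sampling step schedule mentioned above can be sketched in a few lines. The warping function f(u; s) = u + s·(cos(πu/2) - 1 + u) is our reading of the F5-TTS paper and reference code, and s = -1 as the default is likewise an assumption; sway_timesteps, euler_integrate, and the toy constant velocity field are illustrative names, not the F5-TTS or gigarouter API.

```python
import math
from typing import Callable


def sway_timesteps(nfe_steps: int, s: float = -1.0) -> list[float]:
    """Return nfe_steps + 1 time points in [0, 1] warped by Sway Sampling.

    f(u) = u + s * (cos(pi/2 * u) - 1 + u) keeps f(0) = 0 and f(1) = 1.
    s < 0 crowds points toward t = 0 (the noisy start); s = 0 is uniform.
    f stays monotone for s in [-1, 2 / (pi - 2)].
    """
    if nfe_steps < 1:
        raise ValueError("nfe_steps must be >= 1")
    if not -1.0 <= s <= 2.0 / (math.pi - 2.0):
        raise ValueError("s outside the range where the schedule is monotone")
    us = [i / nfe_steps for i in range(nfe_steps + 1)]
    return [u + s * (math.cos(math.pi / 2 * u) - 1 + u) for u in us]


def euler_integrate(velocity: Callable[[float, float], float],
                    x0: float, ts: list[float]) -> float:
    """Integrate dx/dt = velocity(x, t) over the time grid ts with explicit Euler."""
    x = x0
    for t0, t1 in zip(ts, ts[1:]):
        x = x + (t1 - t0) * velocity(x, t0)
    return x


ts = sway_timesteps(8)
print([round(t, 3) for t in ts])
# Toy field with constant velocity 2.0: any grid from 0 to 1 should land at 2.0.
print(euler_integrate(lambda x, t: 2.0, 0.0, ts))
```

With s = -1 the function reduces to 1 - cos(πu/2), whose slope is zero at u = 0, so the first steps are much smaller than the last ones.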


FAQ

What is F5-TTS Russian?

It is a fine-tuned version of the F5-TTS model, adapted for Russian speech synthesis using over 5,000 hours of Russian and English audio data.

How do I control word stress (accents) in generated speech?

Place a + immediately before the stressed vowel of a word (e.g., молок+о is pronounced молокó). You can also use the RUAccent library to place stress marks automatically.

What license does this model use?

The original F5-TTS is licensed under CC-BY-NC-4.0 (non-commercial use only). The Russian fine-tune's model card does not specify a separate license, so treat it as inheriting these terms, and check the model card for any updates.

How can I call this model via the gigarouter API?

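The model is not yet live on gigarouter. Once it is onboarded, it is planned to be served through the OpenAI-compatible endpoint: authenticate with your gigarouter API key and send a POST request with the input text (including any + stress markup) and optional parameters.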
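Because the endpoint is described only as OpenAI-compatible, the sketch below assumes it mirrors OpenAI's POST /v1/audio/speech route with model, input, voice, and response_format fields. The base URL, model id, and voice value are hypothetical placeholders (gigarouter has not published them), and how F5-TTS reference-voice cloning maps onto the voice field is unknown. Only the Python standard library is used.

```python
import json
import urllib.request

# Hypothetical placeholders: not published by gigarouter.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "Misha24-10/F5-TTS_RUSSIAN"


def build_speech_request(api_key: str, text: str, voice: str = "default",
                         response_format: str = "wav") -> urllib.request.Request:
    """Build an OpenAI-style speech request; '+' stress markup is sent as plain text."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        "voice": voice,  # assumption: voice/reference-audio handling is unknown
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(api_key: str, text: str, out_path: str) -> None:
    """Send the request and save the returned audio bytes (needs a live endpoint)."""
    req = build_speech_request(api_key, text)
    with urllib.request.urlopen(req, timeout=60) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


req = build_speech_request("YOUR_API_KEY", "Я люблю молок+о.")
print(req.get_method(), req.full_url)
```

build_speech_request can be inspected offline; synthesize will fail until the model is actually hosted and a real key and URL are substituted.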

What training data was used?

The model was fine-tuned on a custom 4,000-hour Russian dataset, plus Common Voice (RU and EN), Sova (RuDevices and RuAudiobooks), and part of LibriHeavy, totaling over 5,000 hours.

not yet live

We're benchmarking and onboarding F5-TTS Russian as a hosted, OpenAI-compatible API.
