VieNeu-TTS v3 Turbo

pnnbao-ump/VieNeu-TTS-v3-Turbo

published Jun 2026 · updated Jun 2026

VieNeu-TTS v3 Turbo is a Vietnamese TTS model that generates 48 kHz high-fidelity speech with instant voice cloning, built-in multi-speaker default voices, and experimental emotion cues.

est. price: ~$0.0075 (estimated, set at launch)
API providers: 0
downloads / mo: 135K
license: apache-2.0

specs

Task: Text-to-Speech (TTS)
Architecture: Original design by Phạm Nguyễn Ngọc Bảo, trained from scratch; uses MOSS-Audio-Tokenizer-Nano codec
Parameters: Not specified
License: Apache License 2.0

about this model

VieNeu-TTS-v3-Turbo is a text-to-speech model that generates 48 kHz high-fidelity Vietnamese and bilingual English–Vietnamese speech with instant voice cloning, built-in multi-speaker default voices, and experimental emotion cues.

The model is an original architecture designed and trained from scratch by Phạm Nguyễn Ngọc Bảo on approximately 10,000 hours of English–Vietnamese speech. It uses the MOSS-Audio-Tokenizer-Nano neural audio codec and the sea-g2p grapheme-to-phoneme converter.

Key capabilities

  • 48 kHz output — a substantial fidelity increase over the previous 24 kHz v2.
  • Built-in default voices — ten preset voices (male and female) addressed by dedicated speaker tokens; no reference clip required for these voices.
  • Instant voice cloning — clones a voice from a 3–5 second reference audio clip.
  • Emotion and non-verbal cues (experimental) — supports inline tags [cười] (laugh), [thở dài] (sigh), and [hắng giọng] (clear throat).
  • Batched generation — synthesises multiple chunks in one pass, batch size up to 32, including multi-speaker conversation mode.
  • Bilingual code-switching — seamless transitions between Vietnamese and English within a single utterance.
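Two of the capabilities above, the inline cue tags and the batch limit of 32, lend themselves to a small pre-processing step before synthesis. The following is a minimal sketch in plain Python that does not call the model; the helper names `unsupported_tags` and `batched` are hypothetical and are not part of the VieNeu-TTS SDK.

```python
import re
import unicodedata
from typing import Iterator, List

# The three experimental cue tags listed above, NFC-normalised so that
# composed and decomposed Vietnamese diacritics compare equal.
SUPPORTED_TAGS = {
    unicodedata.normalize("NFC", tag)
    for tag in ("cười", "thở dài", "hắng giọng")
}
MAX_BATCH_SIZE = 32  # upper bound stated in the capability list

_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unsupported_tags(text: str) -> List[str]:
    """Return bracketed tags in `text` that are not in the supported set."""
    found = _TAG_PATTERN.findall(unicodedata.normalize("NFC", text))
    return [tag for tag in found if tag.strip() not in SUPPORTED_TAGS]


def batched(chunks: List[str], batch_size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `batch_size` text chunks."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    for start in range(0, len(chunks), batch_size):
        yield chunks[start:start + batch_size]
```

Checking tags up front is a design choice for this sketch: since the cues are experimental, it may be safer to catch a mistyped tag before it reaches the model than to find out how the model reads it aloud.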

Default voices

Voice | Gender | Style
Ngọc Lan (default) | Female | Soft / gentle
Ngọc Linh | Female | Bright
Trúc Ly | Female | Youthful
Mỹ Duyên | Female | Smooth
Xuân Vĩnh | Male | Upbeat
Thái Sơn | Male | Firm
Gia Bảo | Male | Smooth
Đức Trí | Male | Clear
Trọng Hữu | Male | Knowledgeable
Bình An | Male | Even / calm

For any other voice, use voice cloning with a short reference clip.
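The choice between a preset voice and cloning can be sketched as a small lookup. This is an illustration only: the function `choose_voice_mode` is hypothetical, and the rule that a reference clip takes precedence over a voice name is an assumption, not documented behaviour. The voice names, genders, and styles come from the list of default voices.

```python
import unicodedata
from typing import Dict, Optional, Tuple


def _nfc(name: str) -> str:
    """Normalise a voice name so differently encoded diacritics compare equal."""
    return unicodedata.normalize("NFC", name.strip())


# Name -> (gender, style), copied from the default-voice list.
PRESET_VOICES: Dict[str, Tuple[str, str]] = {
    _nfc(name): info
    for name, info in {
        "Ngọc Lan": ("Female", "Soft / gentle"),
        "Ngọc Linh": ("Female", "Bright"),
        "Trúc Ly": ("Female", "Youthful"),
        "Mỹ Duyên": ("Female", "Smooth"),
        "Xuân Vĩnh": ("Male", "Upbeat"),
        "Thái Sơn": ("Male", "Firm"),
        "Gia Bảo": ("Male", "Smooth"),
        "Đức Trí": ("Male", "Clear"),
        "Trọng Hữu": ("Male", "Knowledgeable"),
        "Bình An": ("Male", "Even / calm"),
    }.items()
}
DEFAULT_VOICE = _nfc("Ngọc Lan")


def choose_voice_mode(voice: Optional[str] = None,
                      ref_audio: Optional[str] = None) -> Tuple[str, str]:
    """Return ("clone", path) or ("preset", name); raise for unknown names."""
    if ref_audio is not None:
        # Assumption: an explicit reference clip overrides any preset name.
        return ("clone", ref_audio)
    name = _nfc(voice) if voice else DEFAULT_VOICE
    if name in PRESET_VOICES:
        return ("preset", name)
    raise ValueError(f"{name!r} is not a preset voice; supply a reference clip")
```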

The model is distributed under the Apache License 2.0. A temperature of 0.8 is recommended for stable results; higher values add expressiveness but may reduce stability.

FAQ

What is the output audio quality of VieNeu-TTS v3 Turbo?

It generates 48 kHz high-fidelity speech, a significant upgrade from the 24 kHz of v2.

How do I clone a voice with this model?

Provide a 3–5 second reference audio clip via the ref_audio parameter in the SDK or API; no fine-tuning is needed.
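Before passing a clip as the reference, it may help to confirm that it falls in the 3–5 second range. The following sketch uses only Python's standard `wave` module, so it handles uncompressed PCM WAV files only; the helper names are hypothetical and this check is not part of the SDK.

```python
import wave

MIN_SECONDS = 3.0  # lower bound of the reference-clip range given above
MAX_SECONDS = 5.0  # upper bound of the reference-clip range given above


def wav_duration_seconds(path: str) -> float:
    """Duration of a PCM WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(path: str) -> bool:
    """True if the clip length lies within the 3-5 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```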

What are the built-in default voices and how do I use them?

There are 10 built-in voices (e.g., Ngọc Lan, Xuân Vĩnh) that can be selected by name via the voice parameter without any reference audio.

What is the license for this model?

It is distributed under Apache License 2.0; attribution must be kept for the original project and this Hugging Face package.

How can I call this model via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key, passing the model name and input text as parameters.
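As a rough sketch of such a call, the code below builds (but does not send) a request in the shape of the OpenAI text-to-speech endpoint, `POST /audio/speech` with `model`, `input`, and `voice` fields. The model is not yet live, so several things here are assumptions: `BASE_URL` is a placeholder rather than the real gigarouter address, the model identifier is taken from the repository name on this page, and the helper `build_speech_request` is hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"      # placeholder, NOT the real endpoint
MODEL_ID = "pnnbao-ump/VieNeu-TTS-v3-Turbo"  # assumed to match the repository name


def build_speech_request(text: str, api_key: str, voice: str = "Ngọc Lan",
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech request without sending it."""
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once a real base URL is available, passing the returned object to `urllib.request.urlopen` would send the request; the response format would need to be confirmed against the provider's documentation.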

not yet live

We're benchmarking and onboarding VieNeu-TTS v3 Turbo as a hosted, OpenAI-compatible API.
