VieNeu-TTS v3 Turbo

pnnbao-ump/VieNeu-TTS-v3-Turbo

published Jun 2026 · updated Jun 2026

VieNeu-TTS v3 Turbo is a Vietnamese TTS model that generates 48 kHz high-fidelity speech with instant voice cloning, built-in multi-speaker default voices, and experimental emotion cues.

est. price

~$0.0075

· estimated, set at launch

API providers

downloads / mo

135K

license

apache-2.0

specs

Task	Text-to-Speech (TTS)
Architecture	Original design by Phạm Nguyễn Ngọc Bảo, trained from scratch; uses MOSS-Audio-Tokenizer-Nano codec
Parameters	Not specified
License	Apache License 2.0

about this model

VieNeu-TTS-v3-Turbo is a text-to-speech model that generates 48 kHz high-fidelity Vietnamese and bilingual English–Vietnamese speech with instant voice cloning, built-in multi-speaker default voices, and experimental emotion cues.

The model is an original architecture designed and trained from scratch by Phạm Nguyễn Ngọc Bảo on approximately 10,000 hours of English–Vietnamese speech. It uses the MOSS-Audio-Tokenizer-Nano neural audio codec and the sea-g2p grapheme-to-phoneme converter.

Key capabilities

·48 kHz output: double the 24 kHz sample rate of v2.
·Built-in default voices: ten preset voices (four female, six male) addressed by dedicated speaker tokens; no reference clip is required for these voices.
·Instant voice cloning: clones a voice from a 3–5 second reference audio clip.
·Emotion and non-verbal cues (experimental): supports the inline tags [cười] (laugh), [thở dài] (sigh), and [hắng giọng] (clear throat).
·Batched generation: synthesises multiple chunks in one pass with a batch size of up to 32, including a multi-speaker conversation mode.
·Bilingual code-switching: seamless transitions between Vietnamese and English within a single utterance.
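
The inline cue tags and the batch limit in the capability list can be illustrated with a small pre-processing sketch. This is not part of the VieNeu-TTS SDK: the helper names (find_cues, split_chunks, make_batches), the naive sentence-splitting rule, and the constant names are assumptions made for illustration. Only the three tags and the batch size limit of 32 come from this page.

```python
import re
import unicodedata

# The three experimental cue tags listed on this page, normalised to NFC so
# that differently encoded Vietnamese diacritics still compare equal.
SUPPORTED_CUES = {
    unicodedata.normalize("NFC", tag)
    for tag in ("[cười]", "[thở dài]", "[hắng giọng]")
}
MAX_BATCH_SIZE = 32  # upper limit stated in the capability list


def find_cues(text):
    """Return (supported, unsupported) bracketed tags found in text, in order."""
    text = unicodedata.normalize("NFC", text)
    tags = re.findall(r"\[[^\[\]]+\]", text)
    supported = [t for t in tags if t in SUPPORTED_CUES]
    unsupported = [t for t in tags if t not in SUPPORTED_CUES]
    return supported, unsupported


def split_chunks(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def make_batches(chunks, batch_size=MAX_BATCH_SIZE):
    """Group chunks into lists of at most batch_size items, preserving order."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
```

A long script would be passed through split_chunks and then make_batches, with each resulting batch handed to whatever batched synthesis call the SDK provides. Checking tags up front with find_cues helps catch cues the model does not list as supported.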

Default voices

Voice	Gender	Style
Ngọc Lan (default)	Female	Soft / gentle
Ngọc Linh	Female	Bright
Trúc Ly	Female	Youthful
Mỹ Duyên	Female	Smooth
Xuân Vĩnh	Male	Upbeat
Thái Sơn	Male	Firm
Gia Bảo	Male	Smooth
Đức Trí	Male	Clear
Trọng Hữu	Male	Knowledgeable
Bình An	Male	Even / calm

For any other voice, use voice cloning with a short (3–5 second) reference clip.
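
The choice between a preset voice and cloning can be sketched as a simple lookup over the table above. The rows are copied from this page; the function name resolve_voice, the returned dictionary shape, and the case-insensitive matching are hypothetical conveniences, not the SDK's behaviour.

```python
import unicodedata

# (name, gender, style) rows copied from the default-voice table on this page.
DEFAULT_VOICES = [
    ("Ngọc Lan", "Female", "Soft / gentle"),
    ("Ngọc Linh", "Female", "Bright"),
    ("Trúc Ly", "Female", "Youthful"),
    ("Mỹ Duyên", "Female", "Smooth"),
    ("Xuân Vĩnh", "Male", "Upbeat"),
    ("Thái Sơn", "Male", "Firm"),
    ("Gia Bảo", "Male", "Smooth"),
    ("Đức Trí", "Male", "Clear"),
    ("Trọng Hữu", "Male", "Knowledgeable"),
    ("Bình An", "Male", "Even / calm"),
]
DEFAULT_VOICE = DEFAULT_VOICES[0][0]  # Ngọc Lan is marked as the default


def _key(name):
    """Normalise a voice name so lookups ignore case and Unicode encoding form."""
    return unicodedata.normalize("NFC", name).strip().casefold()


_INDEX = {_key(name): (name, gender, style) for name, gender, style in DEFAULT_VOICES}


def resolve_voice(name=None):
    """Describe how a requested voice would be synthesised (illustrative only)."""
    row = _INDEX.get(_key(name if name is not None else DEFAULT_VOICE))
    if row is None:
        # Not a preset: per this page, a reference clip is needed for cloning.
        return {"mode": "clone", "voice": None, "needs_ref_audio": True}
    voice, gender, style = row
    return {"mode": "preset", "voice": voice, "gender": gender,
            "style": style, "needs_ref_audio": False}
```

Calling resolve_voice("Thái Sơn") reports a preset voice, while an unknown name falls through to the cloning path and signals that reference audio is needed.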

The model is distributed under the Apache License 2.0. A temperature of 0.8 is recommended for stable results; higher values add expressiveness but may reduce stability.

best for

·Vietnamese text-to-speech with high-fidelity 48 kHz output
·Instant voice cloning from a 3–5 second audio clip
·Multi-speaker conversation generation with batched scripts
·Bilingual English–Vietnamese code-switching TTS

FAQ

What is the output audio quality of VieNeu-TTS v3 Turbo?

It generates 48 kHz high-fidelity speech, double the 24 kHz sample rate of v2.

How do I clone a voice with this model?

Provide a 3–5 second reference audio clip via the ref_audio parameter in the SDK or API; no fine-tuning is needed.
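
Before passing a clip as ref_audio, it can help to confirm that it falls in the 3–5 second range. The sketch below uses only Python's standard wave module and assumes the reference clip is a PCM WAV file. Whether the SDK requires WAV, or rejects clips outside this range, is not stated on this page. The helper names are hypothetical.

```python
import io
import wave

# Reference clip length recommended on this page.
MIN_REF_SECONDS = 3.0
MAX_REF_SECONDS = 5.0


def wav_duration_seconds(source):
    """Duration of a PCM WAV given a file path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_suitable_reference(source):
    """True if the clip length is within the recommended 3-5 second range."""
    duration = wav_duration_seconds(source)
    return MIN_REF_SECONDS <= duration <= MAX_REF_SECONDS


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer
```

In practice is_suitable_reference would be called with the path to a real recording; make_silent_wav exists only so the check can be exercised without an audio file on disk.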

What are the built-in default voices and how do I use them?

There are 10 built-in voices (e.g., Ngọc Lan, Xuân Vĩnh) that can be selected by name via the voice parameter without any reference audio.

What is the license for this model?

It is distributed under Apache License 2.0; attribution must be kept for the original project and this Hugging Face package.

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with your API key, passing the model name and input text as parameters.
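
As a rough sketch of what such a call might look like, the snippet below builds (but does not send) a request in the shape of the OpenAI speech endpoint, which takes model, input, and voice fields in a JSON body. Several details are assumptions: the base URL is a placeholder and not the real gigarouter address, the /audio/speech path and the use of the repository name as the model identifier are guesses based on OpenAI conventions, and this page lists the hosted API as not yet live.

```python
import json
import urllib.request

# Placeholder only: the real gigarouter base URL is not given on this page.
BASE_URL = "https://example.invalid/v1"
# Assumption: the API model name matches the repository name shown on this page.
MODEL = "pnnbao-ump/VieNeu-TTS-v3-Turbo"


def build_speech_request(text, api_key, voice="Ngọc Lan", base_url=BASE_URL):
    """Build an OpenAI-style text-to-speech POST request without sending it."""
    payload = {"model": MODEL, "input": text, "voice": voice}
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/speech",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Once the endpoint exists, the request could be sent with urllib.request.urlopen and the response body written to an audio file. The actual parameter names and response format should be taken from the provider's documentation when it is published.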

not yet live

We're benchmarking and onboarding VieNeu-TTS v3 Turbo as a hosted, OpenAI-compatible API.

related text-to-speech models

XTTS-v2

9.3M dl/mo

Qwen3-TTS-12Hz-1.7B-CustomVoice

2M dl/mo

Qwen3-TTS-12Hz-0.6B-CustomVoice