VieNeu TTS V2

pnnbao-ump/VieNeu-TTS-v2

published May 2026 · updated May 2026

VieNeu TTS V2 is a text-to-speech (TTS) model that synthesizes natural Vietnamese and bilingual English-Vietnamese speech, with instant voice cloning from 3–5 seconds of reference audio.

est. price

~$0.0075

· estimated, set at launch

API providers

downloads / mo

78.5K

license

apache-2.0

specs

Task	Text-to-Speech
Parameters	0.3 billion
Training Data	10,000+ hours English-Vietnamese
Features	Instant voice cloning, multi-speaker podcast mode, bilingual code-switching
Available Formats	PyTorch; GGUF Q4 (for CPU)

about this model

VieNeu-TTS-v2 is a Vietnamese text-to-speech model that generates natural bilingual speech with instant voice cloning. It supports multi-speaker conversations and seamless English-Vietnamese code-switching.

Capabilities

Trained on 10,000+ hours of bilingual English-Vietnamese data for natural prosody.
Zero-shot voice cloning from 3–5 seconds of reference audio.
Multi-speaker dialogue mode with automatic character detection and emotional nuance.
High-fidelity pronunciation of mixed English-Vietnamese text via the sea-g2p phonemizer.
Preset voices across Northern and Southern accents, both male and female.

Reference Voices

Name	Gender	Accent
Bình	Male	North
Tuyên	Male	North
Nguyên	Male	South
Hương	Female	North
Ngọc	Female	North
Đoan	Female	South
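
The preset table above maps naturally onto a small lookup structure. The sketch below is illustrative only: the `Voice` type and the `find_voices` helper are hypothetical names, and this page does not state which identifier the API expects for each voice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    name: str
    gender: str  # "Male" or "Female"
    accent: str  # "North" or "South"


# Rows copied from the Reference Voices table.
VOICES = (
    Voice("Bình", "Male", "North"),
    Voice("Tuyên", "Male", "North"),
    Voice("Nguyên", "Male", "South"),
    Voice("Hương", "Female", "North"),
    Voice("Ngọc", "Female", "North"),
    Voice("Đoan", "Female", "South"),
)


def find_voices(gender=None, accent=None):
    """Return the preset voices matching the given gender and/or accent."""
    return [
        v for v in VOICES
        if (gender is None or v.gender == gender)
        and (accent is None or v.accent == accent)
    ]


print([v.name for v in find_voices(gender="Female", accent="North")])
# prints ['Hương', 'Ngọc']
```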

VieNeu-TTS-v2 is developed by Phạm Nguyễn Ngọc Bảo. gigarouter is onboarding the model as a managed, OpenAI-compatible API (not yet live), so developers can integrate high-quality Vietnamese TTS without local GPU infrastructure.

best for

· Podcast-style multi-speaker dialogue with emotional nuance
· Real-time voice cloning using a 3–5 second audio sample
· Bilingual Vietnamese-English text-to-speech applications

FAQ

What input does the model accept?

It accepts text in Vietnamese or English, an optional reference audio file for voice cloning, and an emotion mode (natural or storytelling).
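
As a rough illustration of those three inputs, the sketch below validates them on the client side before a request is made. The `SynthesisInput` class and its field names are hypothetical, and the default emotion mode is an assumption; only the two mode names come from this page.

```python
from dataclasses import dataclass
from typing import Optional

EMOTION_MODES = ("natural", "storytelling")  # the two modes named in the FAQ


@dataclass
class SynthesisInput:
    text: str                              # Vietnamese, English, or mixed
    reference_audio: Optional[str] = None  # path to a cloning sample (hypothetical field)
    emotion: str = "natural"               # default chosen for this sketch only

    def __post_init__(self):
        # Reject empty text and unknown emotion modes early.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.emotion not in EMOTION_MODES:
            raise ValueError(
                f"emotion must be one of {EMOTION_MODES}, got {self.emotion!r}"
            )


req = SynthesisInput(text="Xin chào, welcome to the show!", emotion="storytelling")
print(req.emotion, req.reference_audio)
```

Validating locally keeps malformed requests from spending API credit; the server-side parameter names may differ.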

What audio format does it output?

Outputs WAV audio files.
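
Because the output is plain WAV, Python's standard `wave` module is enough to inspect it. The sketch below reads the header of a WAV payload; `wav_info` and `make_silence` are hypothetical helpers, and the 16 kHz silent clip is only a stand-in, since the model's actual sample rate is not stated here.

```python
import io
import wave


def wav_info(data: bytes) -> dict:
    """Read channel count, sample rate, and duration from WAV bytes."""
    with wave.open(io.BytesIO(data), "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "duration_s": wf.getnframes() / rate,
        }


def make_silence(seconds: float, rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV as a stand-in for real model output."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


info = wav_info(make_silence(4.0))
print(info)
```

The same duration check could be applied to a reference clip to confirm it falls in the 3–5 second range before using it for cloning.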

How can I call this model via API?

Use the gigarouter OpenAI-compatible endpoint with an API key.
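
A minimal sketch of such a call, assuming the endpoint mirrors OpenAI's `POST /audio/speech` request shape (`model`, `input`, `voice`, `response_format`). The base URL below is a placeholder rather than the real gigarouter address, and the model ID and voice name are assumptions taken from this page, not confirmed API values.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, NOT the real gigarouter URL


def build_speech_request(text, api_key, voice="Hương",
                         model="pnnbao-ump/VieNeu-TTS-v2"):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": model,            # assumed model ID
        "input": text,
        "voice": voice,            # assumed voice identifier
        "response_format": "wav",
    }
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_speech_request(
    "Xin chào! Today we talk about machine learning.",
    api_key="YOUR_API_KEY",
)
print(req.get_method(), req.full_url)

# To send the request against a real endpoint:
# with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```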

Does it support multilingual speech?

Yes, it supports seamless English-Vietnamese code-switching in a single utterance.

How large is the model?

The PyTorch model is approximately 180 MB with 0.3 billion parameters. A GGUF Q4 quantized version is available for CPU deployment.
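
For context, a back-of-the-envelope calculation relates parameter count to file size. The helper below is a rough sketch that counts weight storage only, ignoring metadata, tokenizer or codec files, and per-block quantization overhead.

```python
def weights_size_mb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6


N_PARAMS = 0.3e9  # parameter count from the spec table

for label, bits in [("fp32", 32), ("fp16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weights_size_mb(N_PARAMS, bits):.0f} MB")
# prints:
# fp32: ~1200 MB
# fp16: ~600 MB
# 4-bit: ~150 MB
```

Under these assumptions, the quoted 180 MB sits closer to a 4-bit footprint than to 16- or 32-bit weights, so it may describe the quantized file or only part of the model; the page does not say which.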

not yet live

VieNeu TTS V2 is being benchmarked and onboarded as a hosted, OpenAI-compatible API.

related text-to-speech models

XTTS-v2

9.3M dl/mo

Qwen3-TTS-12Hz-1.7B-CustomVoice

2M dl/mo

Qwen3-TTS-12Hz-0.6B-CustomVoice