VieNeu TTS V2
pnnbao-ump/VieNeu-TTS-v2
published May 2026 · updated May 2026
VieNeu TTS V2 is a TTS model that synthesizes natural Vietnamese and bilingual English-Vietnamese speech with instant voice cloning from 3-5 seconds of audio.
specs
| Spec | Value |
|---|---|
| Task | Text-to-Speech |
| Parameters | 0.3 billion |
| Training Data | 10,000+ hours English-Vietnamese |
| Features | Instant voice cloning, multi-speaker podcast mode, bilingual code-switching |
| Available Formats | PyTorch; GGUF Q4 (quantized, for CPU) |
about this model
VieNeu-TTS-v2 is a Vietnamese text-to-speech model that generates natural bilingual speech with instant voice cloning capabilities, supporting multi-speaker conversations and seamless English-Vietnamese code-switching.
Capabilities
- Trained on 10,000+ hours of bilingual English-Vietnamese data for natural prosody.
- Zero-shot voice cloning from 3–5 seconds of reference audio.
- Multi-speaker dialogue mode with automatic character detection and emotional nuance.
- High-fidelity pronunciation of mixed English-Vietnamese text via the sea-g2p phonemizer.
- Preset voices across Northern and Southern accents, both male and female.
Reference Voices
| Name | Gender | Accent |
|---|---|---|
| Bình | Male | North |
| Tuyên | Male | North |
| Nguyên | Male | South |
| Hương | Female | North |
| Ngọc | Female | North |
| Đoan | Female | South |
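To pick a preset programmatically, the table above can be mirrored as a small lookup. This is only an illustrative sketch: the `Voice` type and the `find_voices` helper are hypothetical names, not part of any VieNeu-TTS or gigarouter API, and the page does not say whether the service expects these display names or some other voice identifier.

```python
from typing import List, NamedTuple, Optional


class Voice(NamedTuple):
    name: str
    gender: str
    accent: str


# Preset voices copied from the Reference Voices table.
PRESET_VOICES = [
    Voice("Bình", "Male", "North"),
    Voice("Tuyên", "Male", "North"),
    Voice("Nguyên", "Male", "South"),
    Voice("Hương", "Female", "North"),
    Voice("Ngọc", "Female", "North"),
    Voice("Đoan", "Female", "South"),
]


def find_voices(gender: Optional[str] = None, accent: Optional[str] = None) -> List[str]:
    """Return preset voice names matching the given filters (case-insensitive).

    A filter left as None matches every voice.
    """
    return [
        v.name
        for v in PRESET_VOICES
        if (gender is None or v.gender.lower() == gender.lower())
        and (accent is None or v.accent.lower() == accent.lower())
    ]
```

For example, `find_voices(gender="female", accent="south")` selects the only Southern female preset in the table.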
VieNeu-TTS-v2 is developed by Phạm Nguyễn Ngọc Bảo. The model is hosted on gigarouter as a managed API compatible with OpenAI’s format, enabling developers to integrate high-quality Vietnamese TTS without local GPU infrastructure.
best for
- Podcast-style multi-speaker dialogue with emotional nuance
- Real-time voice cloning using a 3-5 second audio sample
- Bilingual Vietnamese-English text-to-speech applications
FAQ
The model accepts text in Vietnamese or English, optionally a reference audio file for voice cloning, and an emotion mode (natural or storytelling).
It outputs WAV audio files.
Use the gigarouter OpenAI-compatible endpoint with an API key.
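As a rough sketch of what such a call might look like, the snippet below builds a request in the shape of OpenAI's `/audio/speech` endpoint using only the Python standard library. Several things here are assumptions rather than documented facts: the base URL is a placeholder (this page does not give the real gigarouter endpoint), the `voice` value is assumed to be a preset name from the Reference Voices table, and the helper names are hypothetical. Reference audio for cloning and the emotion mode are not covered, since the page does not say how they are passed.

```python
import json
import urllib.request

# ASSUMPTION: placeholder base URL; replace with the endpoint from your gigarouter account.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "pnnbao-ump/VieNeu-TTS-v2"


def build_speech_request(text, voice, api_key, base_url=BASE_URL):
    """Build (without sending) a POST request shaped like OpenAI's /audio/speech call."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        "voice": voice,  # ASSUMPTION: a preset name such as "Hương"
        "response_format": "wav",  # the FAQ states that output is WAV
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, voice, api_key, out_path, base_url=BASE_URL):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, voice, api_key, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as f:
        f.write(audio_bytes)
    return out_path
```

Keeping request construction separate from sending makes it easy to inspect the payload before any credit is spent; `synthesize_to_file("Xin chào, welcome back!", "Hương", api_key, "out.wav")` would then perform the actual call once a real endpoint and key are in place.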
The model supports seamless English-Vietnamese code-switching within a single utterance.
The PyTorch model is approximately 180 MB with 0.3 billion parameters. A GGUF Q4 quantized version is available for CPU deployment.
We're benchmarking and onboarding VieNeu TTS V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.