MeloTTS Korean
myshell-ai/MeloTTS-Korean
published Feb 2024 · updated Feb 2024
MeloTTS Korean is a high-quality Korean text-to-speech model that generates natural-sounding speech from text, optimized for real-time CPU inference.
specs
| Property | Value |
|---|---|
| Task | Text-to-Speech (TTS) |
| Architecture | VITS-based (VITS, VITS2, Bert-VITS2) |
| Language | Korean |
| License | MIT |
about this model
myshell-ai/MeloTTS-Korean is the Korean member of the MeloTTS multilingual text-to-speech family, developed by MyShell.ai in collaboration with MIT and Tsinghua University. It synthesizes natural-sounding Korean speech from text and supports real-time inference on a CPU, which makes it suitable for low-latency applications on machines without a dedicated GPU.
Multilingual family
The Korean model is one of several language-specific variants. The full MeloTTS family covers the following languages with distinct accents where applicable:
| Language | Example audio |
|---|---|
| Korean | Example |
| English (American, British, Indian, Australian, Default) | Example |
| Spanish | Example |
| French | Example |
| Chinese (mixed EN) | Example |
| Japanese | Example |
Key capabilities
- CPU real-time inference: the model synthesizes speech fast enough for real-time use on a CPU, without requiring a GPU.
- Mixed-language support: within the MeloTTS family, the Chinese variant handles mixed Chinese and English text; this Korean model targets Korean input.
- Speed control: speaking speed can be adjusted with the `speed` parameter of the MeloTTS Python API.
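The source does not say how "real-time" was measured. A common yardstick is the real-time factor (RTF): wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below, using only Python's standard library, shows how you might check this on your own CPU. It assumes your synthesis call returns PCM WAV bytes; `fake_synthesize` is a hypothetical stand-in that returns silence, not part of MeloTTS, and the sample rate is illustrative only.

```python
import io
import time
import wave
from typing import Callable, Tuple


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the playback duration of an in-memory PCM WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds


def make_silent_wav(seconds: float, sample_rate: int = 44100) -> bytes:
    """Build a mono 16-bit silent WAV (illustrative sample rate, not a model spec)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


def time_synthesis(synthesize: Callable[[str], bytes], text: str) -> Tuple[bytes, float]:
    """Run a synthesis callable and measure its wall-clock time in seconds."""
    start = time.perf_counter()
    audio = synthesize(text)
    return audio, time.perf_counter() - start


def fake_synthesize(text: str) -> bytes:
    # Hypothetical stand-in: replace with a real TTS call that returns WAV bytes.
    return make_silent_wav(2.0)


if __name__ == "__main__":
    audio, elapsed = time_synthesis(fake_synthesize, "안녕하세요")
    rtf = real_time_factor(elapsed, wav_duration_seconds(audio))
    print(f"RTF = {rtf:.4f} ({'faster' if rtf < 1 else 'slower'} than real time)")
```

For a meaningful number, time several sentences of realistic length and discard the first call, which typically includes model loading and warm-up.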
Community adoption
The Korean model recorded 68,449 downloads in the past month and is used in 19 Hugging Face Spaces. Three quantized versions are also available, which reduce model size while aiming to preserve quality.
Authorship and citation
The MeloTTS project is authored by Wenliang Zhao (Tsinghua University), Xumin Yu (Tsinghua University), and Zengyi Qin (MIT and MyShell). The recommended citation is:
@software{zhao2024melo, author={Zhao, Wenliang and Yu, Xumin and Qin, Zengyi}, title={MeloTTS: High-quality Multi-lingual Multi-speaker Text-to-Speech}, year={2024}}
best for
- Generating Korean speech for virtual assistants and chatbots
- Adding realistic voiceover to Korean-language videos or podcasts
- Real-time TTS applications on CPU-limited devices
FAQ
This specific model supports Korean. Other MeloTTS models cover English, Spanish, French, Chinese, and Japanese.
The model is fast enough for real-time CPU inference, as stated in the official documentation.
It is licensed under MIT, allowing both commercial and non-commercial use.
Once the hosted API is available, send a request to its OpenAI-compatible endpoint with your API key and the input text; the response contains the audio in WAV format.
We're benchmarking and onboarding MeloTTS Korean as a hosted, OpenAI-compatible API.
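The FAQ describes an OpenAI-compatible endpoint that returns WAV audio. The sketch below uses only Python's standard library and assumes the hosted API mirrors the request shape of OpenAI's `POST /v1/audio/speech` call (`model`, `input`, `voice`, `response_format`, `speed`). Because the endpoint is not yet live, `BASE_URL`, `MODEL_ID`, and `VOICE` are hypothetical placeholders, not documented values.

```python
import json
import urllib.request

# Hypothetical placeholders: the hosted endpoint is not yet live, so these
# are assumptions to replace with the provider's documented values.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "myshell-ai/MeloTTS-Korean"
VOICE = "KR"


def build_speech_request(api_key: str, text: str, speed: float = 1.0) -> urllib.request.Request:
    """Build a POST request shaped like OpenAI's /v1/audio/speech call."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        "voice": VOICE,
        "response_format": "wav",  # the FAQ says responses are WAV audio
        "speed": speed,  # assumed to map to MeloTTS speed control
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        # ensure_ascii=False keeps Hangul readable; the body is UTF-8 encoded
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, out_path: str = "output.wav") -> None:
    """Send the request and write the WAV bytes from the response body to disk."""
    request = build_speech_request(api_key, text)
    with urllib.request.urlopen(request) as response:
        with open(out_path, "wb") as f:
            f.write(response.read())
```

Only `synthesize_to_file` touches the network, so `build_speech_request` can be inspected offline. Whether the hosted endpoint accepts `speed` and `voice` in this form is an assumption until its documentation is published.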