MeloTTS Korean

myshell-ai/MeloTTS-Korean

published Feb 2024 · updated Feb 2024

MeloTTS Korean is a high-quality Korean text-to-speech model that generates natural-sounding speech from text, optimized for real-time CPU inference.

status

coming soon

API providers

downloads / mo

68.4K

license

mit

specs

Task	Text-to-Speech (TTS)
Architecture	VITS-based (VITS, VITS2, Bert-VITS2)
Language	Korean
License	MIT

about this model

myshell-ai/MeloTTS-Korean is a text-to-speech model that generates natural Korean speech as part of the MeloTTS multilingual family developed by MyShell.ai in collaboration with MIT and Tsinghua University. The model produces high-quality audio for Korean text and supports CPU real-time inference, making it suitable for low-latency applications without dedicated GPU hardware.

Multilingual family

The Korean model is one of several language-specific variants. The full MeloTTS family covers the following languages with distinct accents where applicable:

Language	Example audio
Korean	Listen
English (American, British, Indian, Australian, Default)	Example
Spanish	Example
French	Example
Chinese (mixed EN)	Example
Japanese	Example

Key capabilities

CPU real-time inference: the model generates audio fast enough for real-time use on a CPU, without a dedicated GPU.
Mixed-language support: the Chinese variant can handle mixed Chinese and English text; the Korean model outputs natural Korean.
Speed control: speaking speed can be adjusted through a speed parameter in the API.
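
As a minimal sketch of the speed control listed above, assuming the open-source `melo` Python package from the MeloTTS repository is installed locally (this is local inference, not the hosted API). The `TTS` class, the `hps.data.spk2id` mapping, and `tts_to_file(..., speed=...)` follow the usage shown in the upstream model card. The helper names `clamp_speed` and `synthesize_korean` and the 0.5 to 2.0 bounds are illustrative assumptions, not documented limits.

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 2.0) -> float:
    """Keep the speed multiplier inside a conservative range.

    The 0.5-2.0 bounds are an assumption for this sketch, not a documented limit.
    """
    return max(low, min(high, float(speed)))


def synthesize_korean(text: str, output_path: str = "kr.wav",
                      speed: float = 1.0, device: str = "cpu") -> str:
    """Write Korean speech for `text` to `output_path` and return the path."""
    # Imported lazily so clamp_speed stays usable without the package installed.
    from melo.api import TTS

    model = TTS(language="KR", device=device)  # "KR" selects the Korean model
    speaker_ids = model.hps.data.spk2id        # maps speaker name -> speaker id
    model.tts_to_file(text, speaker_ids["KR"], output_path,
                      speed=clamp_speed(speed))
    return output_path


# Example call (downloads the model weights on first use):
# synthesize_korean("안녕하세요! 오늘은 날씨가 정말 좋네요.", "kr.wav", speed=1.2)
```

A value of 1.0 is the default rate; larger values should produce faster speech.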

Community adoption

The Korean model received 68,449 downloads in the past month and is used in 19 Hugging Face Spaces. Three quantized versions are available that reduce model size while maintaining quality.

Authorship and citation

The MeloTTS project is authored by Wenliang Zhao (Tsinghua University), Xumin Yu (Tsinghua University), and Zengyi Qin (MIT and MyShell). The recommended citation is:

@software{zhao2024melo,
  author = {Zhao, Wenliang and Yu, Xumin and Qin, Zengyi},
  title = {MeloTTS: High-quality Multi-lingual Multi-speaker Text-to-Speech},
  year = {2024}
}

best for

·Generating Korean speech for virtual assistants and chatbots
·Adding realistic voice to Korean-language videos or podcasts
·Real-time TTS applications on CPU-limited devices

FAQ

What languages does MeloTTS Korean support?

This specific model supports Korean. Other MeloTTS models cover English, Spanish, French, Chinese, and Japanese.

How fast is inference on CPU?

The model is fast enough for real-time CPU inference, as stated in the official documentation.
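
The page gives no timing figures, so one way to check the real-time claim on your own hardware is to measure the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means faster than real time. The sketch below uses only the Python standard library; `measure_rtf` and its `synthesize` argument are hypothetical names, and `synthesize` stands for any callable that writes a WAV file.

```python
import time
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the playback length of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent generating / length of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize, text: str, output_path: str) -> float:
    """Time one call to `synthesize(text, output_path)` and return its RTF.

    `synthesize` is any callable that writes a WAV file to `output_path`.
    """
    start = time.perf_counter()
    synthesize(text, output_path)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, wav_duration_seconds(output_path))
```

The first call to a TTS model usually includes model loading, so timing a second call gives a more representative figure.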

What is the license of MeloTTS Korean?

It is licensed under MIT, allowing both commercial and non-commercial use.

How can I call this model via the gigarouter API?

The hosted API is not yet live. Once it is, send a request containing your API key and the input text to the OpenAI-compatible endpoint; the response contains the audio in WAV format.
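
Because the endpoint has not launched, its URL, model identifier, and accepted fields are not published. The sketch below only shows what such a request might look like if the service mirrors OpenAI's `/audio/speech` request shape (`model`, `input`, `voice`, `response_format`, `speed`). The base URL, the `voice` value, and the function names are all hypothetical placeholders, and the request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: the hosted endpoint is not live, so none are confirmed.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "myshell-ai/MeloTTS-Korean"


def build_speech_request(text: str, api_key: str, speed: float = 1.0,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request in the OpenAI speech-API shape."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        "voice": "KR",             # assumed voice name
        "response_format": "wav",
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def looks_like_wav(data: bytes) -> bool:
    """Check the RIFF/WAVE magic bytes at the start of a response body."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# Sending the request once a real endpoint exists:
# with urllib.request.urlopen(build_speech_request("안녕하세요", "YOUR_API_KEY")) as resp:
#     audio = resp.read()
#     print(looks_like_wav(audio))
```

Checking the first bytes of the response is a cheap way to confirm that the body is WAV audio rather than a JSON error message.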

not yet live

We're benchmarking and onboarding MeloTTS Korean as a hosted, OpenAI-compatible API.

related text-to-speech models

XTTS-v2

9.3M dl/mo

Qwen3-TTS-12Hz-1.7B-CustomVoice

2M dl/mo

Qwen3-TTS-12Hz-0.6B-CustomVoice