Hosted text-to-speech models

37 models · 0 live as APIs · benchmarked & compared

Text-to-speech (TTS) models convert written text into natural-sounding speech. Typical uses include generating voiceovers for videos, powering screen readers for accessibility, driving interactive voice response systems, and providing real-time narration in navigation or e-learning applications. In production, TTS is typically integrated via API calls from applications that need audio on demand. Common architectures use a queue for batch jobs and low-latency streaming for conversational agents.

Choosing among the 37 models being onboarded (including coqui/XTTS-v2, several Qwen/Qwen3-TTS-12Hz variants such as Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign, OpenMOSS-Team/MOSS-TTS, k2-fsa/OmniVoice, SWivid/F5-TTS, and ai4bharat/indic-parler-tts) involves a trade-off between quality, speed, and model size. Larger models (e.g., the 1.7B Qwen3-TTS variants) generally produce richer, more expressive speech at the cost of higher latency and compute; smaller models (e.g., the 0.6B variants) offer faster inference and lower compute cost, which suits high-throughput or latency-sensitive applications.
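As a rough illustration of the size trade-off, the sketch below shortlists models under a parameter budget. The parameter counts are copied from a few rows of the compare table on this page; the helper name `shortlist` and the 1000M threshold are assumptions chosen for illustration, not part of any published API, and parameter count is only a proxy for latency and quality.

```python
# Parameter counts (in millions) copied from a subset of the compare table.
MODELS = [
    ("Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice", 1916.7),
    ("Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice", 905.8),
    ("OpenMOSS-Team/MOSS-TTS", 8489.8),
    ("k2-fsa/OmniVoice", 612.6),
    ("facebook/mms-tts-eng", 36.3),
    ("neuphonic/neutts-nano", 228.7),
]


def shortlist(models, max_params_m):
    """Return names of models at or under max_params_m, smallest first.

    Hypothetical helper: size is used here only as a stand-in for
    inference speed; it says nothing about output quality.
    """
    fitting = [(name, params) for name, params in models if params <= max_params_m]
    fitting.sort(key=lambda item: item[1])
    return [name for name, _ in fitting]


if __name__ == "__main__":
    # Arbitrary illustrative budget of 1000M parameters.
    for name in shortlist(MODELS, max_params_m=1000):
        print(name)
```

A shortlist like this is only a starting point; listening tests on your own text remain the real selection criterion.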

Using a hosted API eliminates the overhead of provisioning GPU infrastructure, managing model updates, and scaling for variable demand. For most workloads below tens of thousands of requests per minute, that typically makes it more economical than self-hosting.

compare

model	params	downloads/mo	price	status
coqui/XTTS-v2	-	9.3M	at launch	coming soon
Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice	1916.7M	2M	~$0.0075 / 1k chars	coming soon
Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice	905.8M	1.2M	~$0.0075 / 1k chars	coming soon
OpenMOSS-Team/MOSS-TTS	8489.8M	911.8K	~$0.0075 / 1k chars	coming soon
k2-fsa/OmniVoice	612.6M	902.4K	~$0.0075 / 1k chars	coming soon
SWivid/F5-TTS	-	799.1K	at launch	coming soon
ai4bharat/indic-parler-tts	937.8M	764.6K	~$0.0075 / 1k chars	coming soon
Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign	1916.7M	657.8K	~$0.0075 / 1k chars	coming soon
openbmb/VoxCPM2	2290M	640.8K	~$0.0075 / 1k chars	coming soon
microsoft/VibeVoice-Realtime-0.5B	1017.6M	638.1K	~$0.0075 / 1k chars	coming soon
onnx-community/Kokoro-82M-v1.0-ONNX	-	576.6K	at launch	coming soon
Qwen/Qwen3-TTS-12Hz-0.6B-Base	914.6M	571.2K	~$0.0075 / 1k chars	coming soon
fishaudio/s2-pro	4561.9M	434.2K	~$0.0075 / 1k chars	coming soon
sesame/csm-1b	1552.8M	308.2K	~$0.0075 / 1k chars	coming soon
microsoft/VibeVoice-1.5B	2704M	235.5K	~$0.0075 / 1k chars	coming soon
OpenMOSS-Team/MOSS-TTS-v1.5	8489.8M	205.8K	~$0.0075 / 1k chars	coming soon
bosonai/higgs-tts-2-3b-base	5771.3M	150.2K	~$0.0075 / 1k chars	coming soon
facebook/mms-tts-eng	36.3M	137K	~$0.0075 / 1k chars	coming soon
myshell-ai/MeloTTS-English	-	135.1K	at launch	coming soon
pnnbao-ump/VieNeu-TTS-v3-Turbo	130.9M	135K	~$0.0075 / 1k chars	coming soon
facebook/hf-seamless-m4t-medium	-	113.8K	at launch	coming soon
neuphonic/neutts-nano	228.7M	113.3K	~$0.0075 / 1k chars	coming soon
SWivid/E2-TTS	-	108.8K	at launch	coming soon
bosonai/higgs-tts-3-4b	4654.9M	108.6K	~$0.0075 / 1k chars	coming soon
Misha24-10/F5-TTS_RUSSIAN	-	89.9K	at launch	coming soon
canopylabs/3b-de-ft-research_release	3300.9M	86.4K	~$0.0075 / 1k chars	coming soon
OpenMOSS-Team/MOSS-TTS-Nano-100M	-	83.5K	at launch	coming soon
microsoft/speecht5_tts	-	80.8K	at launch	coming soon
moonshotai/Kimi-Audio-7B-Instruct	9766.3M	79K	~$0.0075 / 1k chars	coming soon
pnnbao-ump/VieNeu-TTS-v2	293.7M	78.5K	~$0.0075 / 1k chars	coming soon
mistralai/Voxtral-4B-TTS-2603	-	74.5K	at launch	coming soon
kenpath/svara-tts-v1	-	73.1K	at launch	coming soon
myshell-ai/MeloTTS-Spanish	-	71.3K	at launch	coming soon
myshell-ai/MeloTTS-Korean	-	68.4K	at launch	coming soon
Supertone/supertonic-3	-	65.8K	at launch	coming soon
multimodalart/higgs-audio-v3-tts-4b-transformers	4654.9M	62.9K	~$0.0075 / 1k chars	coming soon
sbintuitions/sarashina2.2-tts	809.9M	59.6K	~$0.0075 / 1k chars	coming soon
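
The approximate per-character price in the table can be turned into a quick budget estimate. The sketch below assumes the listed figure of about $0.0075 per 1,000 characters holds at launch and that billing is linear in character count; both are assumptions, and `estimate_cost_usd` is a hypothetical helper rather than part of any API.

```python
# Approximate listed price; assumed (not confirmed) to apply at launch.
PRICE_PER_1K_CHARS_USD = 0.0075


def estimate_cost_usd(num_chars, price_per_1k=PRICE_PER_1K_CHARS_USD):
    """Estimate synthesis cost, assuming linear per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 1000 * price_per_1k


if __name__ == "__main__":
    # A 1,000,000-character batch job at the approximate listed price.
    print(f"${estimate_cost_usd(1_000_000):.2f}")  # prints "$7.50"
```

Models marked "at launch" have no listed price yet, so the estimate does not apply to them.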
