Hosted text-to-speech models
37 models · 0 live as APIs · benchmarked & compared
Text-to-speech (TTS) models convert written text into natural-sounding speech. Typical uses include video voiceovers, screen readers for accessibility, interactive voice response systems, and real-time narration in navigation or e-learning applications. In production, applications usually call TTS through an API and receive audio on demand: batch jobs are commonly queued, while conversational agents rely on low-latency streaming.
Choosing among the 37 models being onboarded (including coqui/XTTS-v2, several Qwen/Qwen3-TTS-12Hz variants, OpenMOSS-Team/MOSS-TTS, k2-fsa/OmniVoice, SWivid/F5-TTS, and ai4bharat/indic-parler-tts) involves a trade-off among quality, speed, and model size. Larger models (e.g., the 1.7B Qwen3-TTS variants) generally produce richer, more expressive speech at the cost of higher latency and compute; smaller models (e.g., the 0.6B variants) offer faster inference and lower cost, which suits high-throughput or latency-sensitive applications. Note that the params column in the table can differ from the size in a model's name (for example, 1916.7M for the 1.7B variants).
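As a rough illustration of the size trade-off, the sketch below shortlists models under a parameter budget. The parameter counts are copied from a few rows of the comparison table on this page; the `within_budget` helper and the budget value are hypothetical, and parameter count is only assumed here to be a loose proxy for latency and compute, not a measured benchmark.

```python
# Parameter counts in millions, copied from the comparison table on this page.
MODELS = {
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice": 1916.7,
    "Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice": 905.8,
    "OpenMOSS-Team/MOSS-TTS": 8489.8,
    "k2-fsa/OmniVoice": 612.6,
    "facebook/mms-tts-eng": 36.3,
}


def within_budget(models, max_params_m):
    """Return model names at or below the parameter budget, smallest first.

    Hypothetical helper: size is used only as a rough stand-in for
    latency and compute (an assumption, not a benchmark result).
    """
    fits = [(params, name) for name, params in models.items()
            if params <= max_params_m]
    return [name for _, name in sorted(fits)]


if __name__ == "__main__":
    # Shortlist models with at most 1000M parameters.
    for name in within_budget(MODELS, 1000):
        print(name)
```

A shortlist like this is only a starting point; output quality and language coverage still need to be checked per model.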
Using a hosted API removes the overhead of provisioning GPU infrastructure, managing model updates, and scaling for variable demand. For most workloads this makes it more economical than self-hosting until call volume reaches the tens of thousands of requests per minute.
| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| coqui/XTTS-v2 | - | 9.3M | at launch | coming soon |
| Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | 1916.7M | 2M | ~$0.0075 / 1k chars | coming soon |
| Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice | 905.8M | 1.2M | ~$0.0075 / 1k chars | coming soon |
| OpenMOSS-Team/MOSS-TTS | 8489.8M | 911.8K | ~$0.0075 / 1k chars | coming soon |
| k2-fsa/OmniVoice | 612.6M | 902.4K | ~$0.0075 / 1k chars | coming soon |
| SWivid/F5-TTS | - | 799.1K | at launch | coming soon |
| ai4bharat/indic-parler-tts | 937.8M | 764.6K | ~$0.0075 / 1k chars | coming soon |
| Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | 1916.7M | 657.8K | ~$0.0075 / 1k chars | coming soon |
| openbmb/VoxCPM2 | 2290M | 640.8K | ~$0.0075 / 1k chars | coming soon |
| microsoft/VibeVoice-Realtime-0.5B | 1017.6M | 638.1K | ~$0.0075 / 1k chars | coming soon |
| onnx-community/Kokoro-82M-v1.0-ONNX | - | 576.6K | at launch | coming soon |
| Qwen/Qwen3-TTS-12Hz-0.6B-Base | 914.6M | 571.2K | ~$0.0075 / 1k chars | coming soon |
| fishaudio/s2-pro | 4561.9M | 434.2K | ~$0.0075 / 1k chars | coming soon |
| sesame/csm-1b | 1552.8M | 308.2K | ~$0.0075 / 1k chars | coming soon |
| microsoft/VibeVoice-1.5B | 2704M | 235.5K | ~$0.0075 / 1k chars | coming soon |
| OpenMOSS-Team/MOSS-TTS-v1.5 | 8489.8M | 205.8K | ~$0.0075 / 1k chars | coming soon |
| bosonai/higgs-tts-2-3b-base | 5771.3M | 150.2K | ~$0.0075 / 1k chars | coming soon |
| facebook/mms-tts-eng | 36.3M | 137K | ~$0.0075 / 1k chars | coming soon |
| myshell-ai/MeloTTS-English | - | 135.1K | at launch | coming soon |
| pnnbao-ump/VieNeu-TTS-v3-Turbo | 130.9M | 135K | ~$0.0075 / 1k chars | coming soon |
| facebook/hf-seamless-m4t-medium | - | 113.8K | at launch | coming soon |
| neuphonic/neutts-nano | 228.7M | 113.3K | ~$0.0075 / 1k chars | coming soon |
| SWivid/E2-TTS | - | 108.8K | at launch | coming soon |
| bosonai/higgs-tts-3-4b | 4654.9M | 108.6K | ~$0.0075 / 1k chars | coming soon |
| Misha24-10/F5-TTS_RUSSIAN | - | 89.9K | at launch | coming soon |
| canopylabs/3b-de-ft-research_release | 3300.9M | 86.4K | ~$0.0075 / 1k chars | coming soon |
| OpenMOSS-Team/MOSS-TTS-Nano-100M | - | 83.5K | at launch | coming soon |
| microsoft/speecht5_tts | - | 80.8K | at launch | coming soon |
| moonshotai/Kimi-Audio-7B-Instruct | 9766.3M | 79K | ~$0.0075 / 1k chars | coming soon |
| pnnbao-ump/VieNeu-TTS-v2 | 293.7M | 78.5K | ~$0.0075 / 1k chars | coming soon |
| mistralai/Voxtral-4B-TTS-2603 | - | 74.5K | at launch | coming soon |
| kenpath/svara-tts-v1 | - | 73.1K | at launch | coming soon |
| myshell-ai/MeloTTS-Spanish | - | 71.3K | at launch | coming soon |
| myshell-ai/MeloTTS-Korean | - | 68.4K | at launch | coming soon |
| Supertone/supertonic-3 | - | 65.8K | at launch | coming soon |
| multimodalart/higgs-audio-v3-tts-4b-transformers | 4654.9M | 62.9K | ~$0.0075 / 1k chars | coming soon |
| sbintuitions/sarashina2.2-tts | 809.9M | 59.6K | ~$0.0075 / 1k chars | coming soon |
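
The indicative price in the table can be turned into a quick budget estimate. The sketch below assumes billing is strictly linear in character count at ~$0.0075 per 1k characters, with no minimum charge or rounding; that billing model, and the `estimate_cost` helper itself, are assumptions for illustration, since final pricing is only stated as indicative or "at launch".

```python
from decimal import Decimal

# Indicative price from the table, in USD per 1,000 characters.
PRICE_PER_1K_CHARS = Decimal("0.0075")


def estimate_cost(text_chars, price_per_1k=PRICE_PER_1K_CHARS):
    """Estimate the USD cost of synthesizing `text_chars` characters.

    Assumes linear per-character billing (an assumption; actual billing
    rules are not stated on this page).
    """
    if text_chars < 0:
        raise ValueError("text_chars must be non-negative")
    return price_per_1k * Decimal(text_chars) / Decimal(1000)


if __name__ == "__main__":
    # Cost of one million characters at the indicative rate.
    print(f"${estimate_cost(1_000_000):.2f}")
```

Under these assumptions, one million characters works out to 1,000 × $0.0075 = $7.50. `Decimal` is used instead of floats to avoid rounding drift when summing many small charges.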