Higgs TTS 3

bosonai/higgs-tts-3-4b

published Jun 2026 · updated Jun 2026

Higgs TTS 3 is a TTS model that generates expressive conversational speech with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects across 100+ languages.

est. price

~$0.0075

· estimated, set at launch

API providers

downloads / mo

108.6K

license

other

specs

Task	Text-to-Speech (TTS)
Architecture	~4B autoregressive decoder with multi-codebook fused embedding/head, Higgs Tokenizer (8 codebooks, 25 fps)
Parameters	~4 billion
License	Boson Higgs TTS 3 Research and Non-Commercial License (non-commercial use only; commercial requires separate license)

about this model

bosonai/higgs-tts-3-4b is a text-to-speech (TTS) model designed for voice chat. It produces expressive conversational speech in more than 100 languages, with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects. The model is built on a ~4B autoregressive decoder backbone (36 layers, hidden size 2560, GQA 32/8) with a Higgs Tokenizer that encodes audio into 8 codebooks at 25 fps, which corresponds to 40 ms per frame. It outputs 24 kHz waveforms. gigarouter is onboarding it as a hosted, OpenAI-compatible API.
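The frame figures in the description can be checked with a little arithmetic. The sketch below uses only the numbers quoted above (25 fps, 8 codebooks, 24 kHz); the helper name `codes_for_duration` is illustrative, and how the fused head maps these codes onto decoder steps is not specified on this page.

```python
import math

# Figures quoted in the model description.
FRAME_RATE_HZ = 25        # tokenizer frames per second
NUM_CODEBOOKS = 8         # codebook entries per frame
SAMPLE_RATE_HZ = 24_000   # output waveform sample rate

frame_duration_ms = 1000 / FRAME_RATE_HZ             # duration of one frame
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_HZ  # waveform samples covered by one frame
codes_per_second = FRAME_RATE_HZ * NUM_CODEBOOKS     # codebook entries per second of audio


def codes_for_duration(seconds: float) -> int:
    """Codebook entries for a clip, rounding up to whole frames (illustrative helper)."""
    frames = math.ceil(seconds * FRAME_RATE_HZ)
    return frames * NUM_CODEBOOKS


print(frame_duration_ms, samples_per_frame, codes_per_second)
```

Under these figures, each frame spans 40 ms and 960 waveform samples, and one second of audio corresponds to 200 codebook entries.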

Multilingual Performance

The model achieves a single-digit word error rate (WER) or character error rate (CER) on 102 languages. Of these, 85 score below 5% (polished quality) and 17 score between 5% and 10% (usable). Benchmark results from the Boson blog show Higgs TTS 3 outperforming both its predecessor and the best non-Higgs model on four multilingual tests (lower is better):

Benchmark	Higgs TTS 3	Best non-Higgs
SeedTTS	1.11	1.21
CV3	4.41	4.60
MiniMax-Multilingual	2.74	2.98
Higgs-Multilingual (111 languages)	3.61	3.63
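To put the gaps in the table in perspective, the relative error reduction against the best non-Higgs model can be computed directly. This is a small sketch that uses only the table values; the function name `relative_reduction` is illustrative.

```python
# (Higgs TTS 3, best non-Higgs) error rates copied from the table above.
results = {
    "SeedTTS": (1.11, 1.21),
    "CV3": (4.41, 4.60),
    "MiniMax-Multilingual": (2.74, 2.98),
    "Higgs-Multilingual (111 languages)": (3.61, 3.63),
}


def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction by which `ours` is lower than `baseline`."""
    return (baseline - ours) / baseline


for name, (higgs, best_other) in results.items():
    print(f"{name}: {relative_reduction(higgs, best_other):.1%} lower error")
```

The margins are modest: under 10% relative on every benchmark, and well under 1% on Higgs-Multilingual.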

Conversational Benchmarks

In win-rate evaluations against models including Fish Audio S2 Pro, Qwen3-TTS-1.7B, IndexTTS-2, MOSS-TTS-v1.5, and OmniVoice, Higgs TTS 3 achieved an overall win rate of 53.65%. Results have been reproduced by the SGLang-Omni team.

Category	Win rate
Emotions	53.75%
Foreign Words	48.75%
Paralinguistics	68.57%
Complex Pronunciation	25.10%
Questions	61.43%
Syntactic Complexity	60.71%
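A quick way to read the category win rates is to separate those above and below the 50% parity line. The sketch below uses only the figures quoted above; the variable names are illustrative.

```python
# Category win rates (percent) as quoted in this section.
win_rates = {
    "Emotions": 53.75,
    "Foreign Words": 48.75,
    "Paralinguistics": 68.57,
    "Complex Pronunciation": 25.10,
    "Questions": 61.43,
    "Syntactic Complexity": 60.71,
}

# Categories where Higgs TTS 3 wins less than half of the comparisons.
below_parity = sorted(name for name, rate in win_rates.items() if rate < 50.0)

# Simple average of the six categories, for comparison with the reported overall rate.
unweighted_mean = sum(win_rates.values()) / len(win_rates)

print(below_parity)
print(f"{unweighted_mean:.2f}")
```

Two categories fall below parity: Foreign Words and Complex Pronunciation. The unweighted mean of the six categories (about 53.05%) differs slightly from the reported 53.65% overall, so the overall figure is presumably not a plain average of the categories.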

Control Tokens

[Figure: Higgs TTS 3 architecture diagram showing the autoregressive decoder, Higgs Tokenizer, multi-codebook fused embedding, and waveform output.]

best for

·Voice chatbots and conversational AI
·Expressive narration for audiobooks and stories
·Multilingual customer service voice responses
·Zero-shot voice cloning for personalized assistants

FAQ

What languages does Higgs TTS 3 support?

It supports 100+ languages with single-digit WER/CER on 102 languages; 85 languages achieve below 5% WER/CER.

Can I control emotion and style in the speech output?

Yes, by inserting control tokens like <|emotion:anger|> or <|style:whispering|> in the text.
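As a minimal sketch of how such tokens could be assembled, the helper below builds `<|kind:value|>` strings and prepends them to the text. Only the two tokens quoted above come from this page; the general `<|kind:value|>` pattern, the placement at the start of the text, and the helper names `control_token` and `with_controls` are assumptions for illustration.

```python
def control_token(kind: str, value: str) -> str:
    """Format one control token, assuming the <|kind:value|> pattern generalizes."""
    return f"<|{kind}:{value}|>"


def with_controls(text: str, **controls: str) -> str:
    """Prefix `text` with one control token per keyword argument (hypothetical helper)."""
    prefix = "".join(control_token(kind, value) for kind, value in controls.items())
    return prefix + text


angry = with_controls("Leave me alone.", emotion="anger")
quiet = with_controls("Don't wake the baby.", style="whispering")
print(angry)
print(quiet)
```

Check the model's own documentation for the supported token vocabulary and for where tokens may appear within the text before relying on this pattern.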

How does Higgs TTS 3 compare to other TTS models?

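It achieves lower WER/CER than its predecessor and the best non-Higgs model on four multilingual benchmarks (SeedTTS, CV3, MiniMax-Multilingual, and Higgs-Multilingual), and it has a 53.65% overall win rate in conversational evaluations against models such as Fish Audio S2 Pro and Qwen3-TTS-1.7B.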

How do I call Higgs TTS 3 via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, passing the text and optional control tokens.
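The sketch below shows roughly what such a request could look like. It is an assumption-heavy illustration: the base URL is a placeholder, the `/audio/speech` path and the `model`/`input`/`voice` fields mirror OpenAI's speech endpoint and may not match what gigarouter exposes, and the `voice` value is hypothetical. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Placeholder base URL; the real gigarouter endpoint is not given on this page.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "bosonai/higgs-tts-3-4b"


def build_speech_request(text: str, api_key: str, voice: str = "default") -> urllib.request.Request:
    """Build (but do not send) a POST request shaped like OpenAI's speech endpoint."""
    payload = {
        "model": MODEL_ID,
        "input": text,    # may include inline control tokens
        "voice": voice,   # hypothetical value; supported voices are not listed here
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("<|style:whispering|>Good night.", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Once the model is live, the request could be sent with `urllib.request.urlopen(request)` and the response body written to an audio file, assuming the endpoint returns raw audio bytes as OpenAI's does.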

What are the licensing terms for Higgs TTS 3?

It is released for research and non-commercial use under the Boson Higgs TTS 3 Research and Non-Commercial License; commercial use requires a separate license.

not yet live

We're benchmarking and onboarding Higgs TTS 3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text-to-speech models

XTTS-v2

9.3M dl/mo

Qwen3-TTS-12Hz-1.7B-CustomVoice

2M dl/mo

Qwen3-TTS-12Hz-0.6B-CustomVoice