
Higgs TTS 3

bosonai/higgs-tts-3-4b

published Jun 2026 · updated Jun 2026

Higgs TTS 3 is a TTS model that generates expressive conversational speech with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects across 100+ languages.

est. price: ~$0.0075 (estimated, set at launch)
API providers: 0
downloads / mo: 108.6K
license: other

specs

Task: Text-to-Speech (TTS)
Architecture: ~4B autoregressive decoder with multi-codebook fused embedding/head, Higgs Tokenizer (8 codebooks, 25 fps)
Parameters: ~4 billion
License: Boson Higgs TTS 3 Research and Non-Commercial License (non-commercial use only; commercial use requires a separate license)

about this model

bosonai/higgs-tts-3-4b is a text-to-speech (TTS) model designed for voice chat. It produces expressive conversational speech in over 100 languages, with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects. gigarouter plans to host it as an OpenAI-compatible API. The model is built on a ~4B autoregressive decoder backbone (36 layers, hidden size 2560, GQA 32/8) with a Higgs Tokenizer that encodes audio into 8 codebooks at 25 frames per second, so each frame covers 40 ms. Output is a 24 kHz waveform.
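The frame figures above follow from simple arithmetic. The sketch below works it through, assuming one code per codebook per frame (the page does not spell this out); the function and constant names are illustrative only.

```python
# Figures quoted in the model description.
SAMPLE_RATE_HZ = 24_000   # output waveform sample rate
FRAME_RATE_FPS = 25       # Higgs Tokenizer frame rate
NUM_CODEBOOKS = 8         # codebooks per frame


def frame_stats(duration_s: float) -> dict:
    """Derive per-frame and per-clip quantities for a clip of the given length.

    Assumption: every frame carries exactly one code from each codebook.
    """
    frames = round(duration_s * FRAME_RATE_FPS)
    return {
        "ms_per_frame": 1000 / FRAME_RATE_FPS,
        "samples_per_frame": SAMPLE_RATE_HZ // FRAME_RATE_FPS,
        "frames": frames,
        "codes": frames * NUM_CODEBOOKS,
    }


stats = frame_stats(10.0)
print(stats)
```

Under that assumption, ten seconds of audio corresponds to 250 frames of 960 samples each, or 2,000 codes in total.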

Multilingual Performance

The model achieves a single-digit word error rate (WER) or character error rate (CER) on 102 languages: 85 languages score below 5% (polished quality) and 17 score between 5% and 10% (usable). Benchmark results from the Boson blog show Higgs TTS 3 reaching lower error rates than both its predecessor and the best non-Higgs model on four multilingual tests. The table lists the best non-Higgs result for comparison; lower is better:

Benchmark                            Higgs TTS 3   Best non-Higgs
SeedTTS                              1.11          1.21
CV3                                  4.41          4.60
MiniMax-Multilingual                 2.74          2.98
Higgs-Multilingual (111 languages)   3.61          3.63
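To put the margins in the table in proportion, the following sketch computes the relative error reduction of Higgs TTS 3 over the best non-Higgs model on each benchmark. The numbers are copied from the table; the helper name is illustrative.

```python
# (Higgs TTS 3 score, best non-Higgs score) per benchmark, from the table above.
RESULTS = {
    "SeedTTS": (1.11, 1.21),
    "CV3": (4.41, 4.60),
    "MiniMax-Multilingual": (2.74, 2.98),
    "Higgs-Multilingual": (3.61, 3.63),
}


def relative_reduction(higgs: float, baseline: float) -> float:
    """Percentage by which the Higgs score undercuts the baseline score."""
    return (baseline - higgs) / baseline * 100


reductions = {
    name: relative_reduction(higgs, baseline)
    for name, (higgs, baseline) in RESULTS.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% lower error")
```

The relative margin ranges from about 0.6% on Higgs-Multilingual to about 8% on SeedTTS and MiniMax-Multilingual.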

Conversational Benchmarks

In win-rate evaluations against models including Fish Audio S2 Pro, Qwen3-TTS-1.7B, IndexTTS-2, MOSS-TTS-v1.5, and OmniVoice, Higgs TTS 3 achieved an overall win rate of 53.65%. Category breakdowns: Emotions 53.75%, Foreign Words 48.75%, Paralinguistics 68.57%, Complex Pronunciation 25.10%, Questions 61.43%, Syntactic Complexity 60.71%. Results have been reproduced by the SGLang-Omni team.
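A win rate above 50% means the model was preferred more often than not. As a quick reading aid, this sketch splits the quoted categories on that threshold; the figures are copied from the paragraph above and the variable names are illustrative.

```python
# Category win rates (percent) as quoted above.
WIN_RATES = {
    "Emotions": 53.75,
    "Foreign Words": 48.75,
    "Paralinguistics": 68.57,
    "Complex Pronunciation": 25.10,
    "Questions": 61.43,
    "Syntactic Complexity": 60.71,
}

# Categories sorted from strongest to weakest, then split at the 50% mark.
ranked = sorted(WIN_RATES, key=WIN_RATES.get, reverse=True)
majority_wins = [name for name in ranked if WIN_RATES[name] > 50]
majority_losses = [name for name in ranked if WIN_RATES[name] <= 50]

print("above 50%:", majority_wins)
print("at or below 50%:", majority_losses)
```

Four of the six categories sit above 50%; Foreign Words and Complex Pronunciation fall below it.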

Control Tokens

Inline control tokens follow <|category:value|> syntax and can be inserted mid-utterance. Supported categories include 21 emotions (e.g., <|emotion:elation|>, <|emotion:anger|>), 3 styles (<|style:singing|>, <|style:shouting|>, <|style:whispering|>), and 9 sound effects (e.g., <|sfx:laughter|>, <|sfx:cough|>). Detailed prompting examples are available in the model’s PROMPTING.md.
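As a minimal sketch of the <|category:value|> syntax, the helpers below build tokens and recover them from a prompt. The function names are hypothetical, and only values named on this page are used; the model's PROMPTING.md remains the reference for the full list of supported values.

```python
import re

# Matches tokens of the form <|category:value|>.
CONTROL_TOKEN_RE = re.compile(r"<\|(\w+):(\w+)\|>")


def control_token(category: str, value: str) -> str:
    """Format one inline control token, e.g. <|emotion:anger|>."""
    return f"<|{category}:{value}|>"


def find_control_tokens(text: str) -> list:
    """Return (category, value) pairs in the order they appear in the text."""
    return CONTROL_TOKEN_RE.findall(text)


# Tokens can be placed mid-utterance, between ordinary text.
prompt = (
    control_token("emotion", "elation") + "We actually won! "
    + control_token("sfx", "laughter") + " "
    + control_token("style", "whispering") + "Don't tell anyone yet."
)

print(prompt)
print(find_control_tokens(prompt))
```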

[Figure: Higgs TTS 3 architecture diagram showing the autoregressive decoder, Higgs Tokenizer, multi-codebook fused embedding, and waveform output.]

FAQ

What languages does Higgs TTS 3 support?

It covers 100+ languages. 102 languages reach a single-digit WER or CER, and 85 of those score below 5%.

Can I control emotion and style in the speech output?

Yes, by inserting control tokens like <|emotion:anger|> or <|style:whispering|> in the text.

How does Higgs TTS 3 compare to other TTS models?

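It reports lower error rates than its predecessor and the best non-Higgs model on the SeedTTS, CV3, MiniMax-Multilingual, and Higgs-Multilingual benchmarks. In conversational win-rate evaluations against models such as Fish Audio S2 Pro and Qwen3-TTS-1.7B, it achieved a 53.65% overall win rate.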

How do I call Higgs TTS 3 via the gigarouter API?

Once the model is live, send requests to gigarouter's OpenAI-compatible endpoint with your API key, passing the text together with any inline control tokens.
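The model is not yet live, so the following is only a sketch of what such a call might look like. It assumes the endpoint mirrors the OpenAI speech route (`POST /audio/speech` with `model` and `input` fields). The base URL is a placeholder, not a real gigarouter address, and any extra required fields (such as a voice) or the returned audio format are unknown.

```python
import json
import urllib.request

# Hypothetical placeholder; replace with the real base URL once published.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "bosonai/higgs-tts-3-4b"


def build_speech_request(text: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style speech request.

    Assumption: the route and field names follow the OpenAI speech API.
    """
    payload = {"model": MODEL_ID, "input": text}
    return urllib.request.Request(
        url=f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, api_key: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, api_key)) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)


# Control tokens travel inline with the text.
request = build_speech_request("<|emotion:elation|>It works!", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes it possible to inspect the payload before the service is available.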

What are the licensing terms for Higgs TTS 3?

It is released for research and non-commercial use under the Boson Higgs TTS 3 Research and Non-Commercial License; commercial use requires a separate license.

not yet live

We're benchmarking and onboarding Higgs TTS 3 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
