Question 1

What is E2 TTS?

Accepted Answer

E2 TTS is a fully non-autoregressive, zero-shot text-to-speech model. It uses flow matching to generate mel spectrograms from a character sequence padded with filler tokens, and it is reported to reach human-level naturalness.
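
To make the filler-token idea concrete, here is a minimal sketch of padding a character sequence so that it matches the number of mel frames to be generated. The filler symbol `<F>` and the helper name `pad_with_filler` are illustrative assumptions, not taken from any released implementation.

```python
# Illustrative sketch only: the filler symbol and function name are assumptions.
FILLER = "<F>"


def pad_with_filler(text: str, n_frames: int) -> list[str]:
    """Split text into characters and append filler tokens up to n_frames."""
    chars = list(text)
    if len(chars) > n_frames:
        raise ValueError("text has more characters than target frames")
    # Append filler tokens so the text sequence is as long as the mel sequence.
    return chars + [FILLER] * (n_frames - len(chars))


padded = pad_with_filler("hello", 8)
print(padded)
```

Because the padded text and the mel spectrogram have the same length, the model can consume them frame by frame without a separate duration model or explicit alignment step.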

Question 2

What input format does E2 TTS expect?

Accepted Answer

The model takes two inputs: the text to be spoken and an audio prompt of the voice to clone. It generates a mel spectrogram, which is then converted to a waveform.

Question 3

How can I use E2 TTS via the gigarouter API?

Accepted Answer

Send text-to-speech requests to the gigarouter OpenAI-compatible endpoint, authenticating with your API key.
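
As a rough sketch, a request could look like the following, assuming the endpoint mirrors the OpenAI `audio/speech` route and its `model`, `input`, and `voice` fields. The base URL, model identifier, and voice name below are placeholders rather than documented values, so check the gigarouter documentation for the real ones.

```python
import json
import urllib.request

# Placeholders: none of these values come from gigarouter's documentation.
BASE_URL = "https://api.gigarouter.example/v1"
MODEL_ID = "e2-tts"


def build_speech_request(text: str, api_key: str, voice: str = "default") -> urllib.request.Request:
    """Build an OpenAI-style text-to-speech POST request without sending it."""
    payload = {"model": MODEL_ID, "input": text, "voice": voice}
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text: str, api_key: str, out_path: str = "speech.bin") -> str:
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        audio_bytes = response.read()
    with open(out_path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return out_path
```

Calling `synthesize("Hello world", api_key)` with a valid key would write the returned audio to disk. How an audio prompt for voice cloning is passed is not covered by this sketch.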

Question 4

Is E2 TTS free to use?

Accepted Answer

The model is licensed under CC BY-NC 4.0, which permits non-commercial use with attribution. Commercial use is not covered by this license.

Question 5

How does E2 TTS compare to other zero-shot TTS models?

Accepted Answer

E2 TTS reports state-of-the-art zero-shot results, with speaker similarity and intelligibility comparable to Voicebox and NaturalSpeech 3, while being simpler in design and fully non-autoregressive.

Task	Text-to-Speech
Architecture	Flat-UNet Transformer
License	CC BY-NC 4.0

E2 TTS
