Question 1

What tasks can SeamlessM4T Medium perform?

Accepted Answer

It supports speech-to-speech translation (S2ST), speech-to-text translation (S2TT), text-to-speech translation (T2ST), text-to-text translation (T2TT), and automatic speech recognition (ASR) from a single model.

Question 2

How many languages does it support?

Accepted Answer

It covers 101 languages for speech input, 96 languages for text input/output, and 35 languages for speech output.

Question 3

How can I use this model via gigarouter?

Accepted Answer

Use the gigarouter OpenAI-compatible endpoint with an API key. Send requests with the required input (text or audio) and target language; the API returns translated text or speech.
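
As a rough illustration of the request flow described above, here is a minimal Python sketch that builds a text-to-speech translation request using only the standard library. The base URL, the model identifier, and the `target_language` field name are all assumptions made for illustration, not documented gigarouter values; the `/audio/speech` path is borrowed from OpenAI's speech API on the assumption that an OpenAI-compatible endpoint mirrors it. Check the gigarouter documentation for the real values before relying on any of this.

```python
import json
import urllib.request

# All three values below are placeholders, not documented gigarouter values.
BASE_URL = "https://api.gigarouter.example/v1"  # hypothetical base URL
MODEL_ID = "seamless-m4t-medium"                # hypothetical model identifier
SPEECH_PATH = "/audio/speech"                   # path used by OpenAI's speech API


def build_t2st_request(text, target_language, api_key, base_url=BASE_URL):
    """Build (but do not send) a text-to-speech translation request."""
    payload = {
        "model": MODEL_ID,
        "input": text,
        # Assumed field name: OpenAI's own schema has no target-language field.
        "target_language": target_language,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + SPEECH_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the raw response body (e.g. audio bytes)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A call such as `send(build_t2st_request("Hello", "fra", api_key))` would then return the response body, which for a speech output would presumably be audio bytes to write to a file. Building the request separately from sending it makes it easy to inspect the URL, headers, and payload before any network traffic happens.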

Question 4

What is the license for SeamlessM4T Medium?

Accepted Answer

It is released under the CC-BY-NC 4.0 license, which allows non-commercial use with attribution.

Question 5

How does this model compare to previous translation systems?

Accepted Answer

On FLEURS, SeamlessM4T achieves a 20% BLEU improvement over the prior state of the art in direct speech-to-text translation. Compared with strong cascaded models, it improves into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech.

Task	Text-to-Speech Translation / Multilingual Translation
Architecture	SeamlessM4T (encoder-decoder with w2v-BERT 2.0)
Parameters	1.2B
License	CC-BY-NC 4.0
