SeamlessM4T Medium

facebook/hf-seamless-m4t-medium

published Aug 2023 · updated Dec 2023

SeamlessM4T Medium is a unified multilingual translation model supporting text-to-speech, speech-to-speech, speech-to-text, and text-to-text translation, with up to 101 input languages for speech and 96 for text.

status
coming soon
API providers
0
downloads / mo
113.8K
license
cc-by-nc-4.0

specs

Task: Text-to-Speech Translation / Multilingual Translation
Architecture: SeamlessM4T (encoder-decoder with w2v-BERT 2.0)
Parameters: 1.2B
License: CC-BY-NC 4.0

about this model

hf-seamless-m4t-medium is a text-to-speech translation (T2ST) model: it takes text in one language and generates natural speech in one of 35 target languages. It is built on a unified multimodal translation architecture that also supports speech-to-speech, speech-to-text, and text-to-text translation. Developed by Meta and planned for hosting on gigarouter as a managed API (not yet live), it generates translated speech directly, without a cascade of separate ASR, translation, and TTS systems.

Capabilities

  • Accepts text input in 96 languages and produces speech output in 35 languages (with dedicated speaker identities).
  • Can return the intermediate translated text alongside the generated speech (set return_intermediate_token_ids=True in the Hugging Face transformers generate call).
  • Provides a single model for five tasks: S2ST, S2TT, T2ST, T2TT, and ASR, all to be served from the same API endpoint once the model is hosted.
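As a rough illustration of the capabilities above, the following sketch runs text-to-speech translation locally and also returns the intermediate translated text. It assumes the Hugging Face transformers integration (AutoProcessor and SeamlessM4TModel) with torch and sentencepiece installed; it is not the gigarouter API, which is not yet live. The function name and its return shape are illustrative choices, and the three-letter language codes in the usage note follow the convention of that integration.

```python
MODEL_ID = "facebook/hf-seamless-m4t-medium"


def text_to_speech_translation(text, src_lang, tgt_lang, model_id=MODEL_ID):
    """Translate `text` and return (waveform, translated_text, sampling_rate).

    Illustrative helper, not part of any library.
    """
    # Imported lazily: needs `transformers`, `torch`, and `sentencepiece`,
    # and the first call downloads the model weights.
    from transformers import AutoProcessor, SeamlessM4TModel

    processor = AutoProcessor.from_pretrained(model_id)
    model = SeamlessM4TModel.from_pretrained(model_id)

    # Tokenize the source text; src_lang tells the processor the input language.
    inputs = processor(text=text, src_lang=src_lang, return_tensors="pt")

    # With return_intermediate_token_ids=True, generate() returns an object
    # holding the waveform and the intermediate text token ids.
    output = model.generate(
        **inputs, tgt_lang=tgt_lang, return_intermediate_token_ids=True
    )

    waveform = output.waveform[0].cpu().numpy().squeeze()
    translated_text = processor.decode(
        output.sequences[0].tolist(), skip_special_tokens=True
    )
    return waveform, translated_text, model.config.sampling_rate
```

A call such as text_to_speech_translation("Hello, my dog is cute", "eng", "fra") would return the French audio samples, the French text, and the sampling rate needed to save or play the audio.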

Performance

On the FLEURS benchmark, the SeamlessM4T paper reports a 20% BLEU improvement over the previous state of the art in direct speech-to-text translation. Compared with strong cascaded systems, it improves into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. The speech encoder, w2v-BERT 2.0, was pre-trained with self-supervised learning on 1 million hours of open speech audio.

Additional Details

  • Model size: 1.2 billion parameters.
  • License: CC-BY-NC 4.0 (non-commercial).
  • Safety evaluations include gender bias and added toxicity metrics, reported in the SeamlessM4T paper.
  • Full evaluation metrics (BLEU, WER, chrF) are available for download.
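For readers unfamiliar with the metrics listed above, here is a minimal sketch of word error rate (WER), the usual ASR metric: the word-level edit distance between a reference transcript and a hypothesis, divided by the reference length. This is a simplified illustration that assumes whitespace tokenization and no text normalization; it is not the evaluation pipeline used for the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(round(word_error_rate("the cat sat on the mat", "the cat sit on mat"), 3))  # 0.333
```

Lower is better for WER, whereas BLEU and chrF are similarity scores where higher is better.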

FAQ

What tasks can SeamlessM4T Medium perform?

It supports speech-to-speech translation (S2ST), speech-to-text translation (S2TT), text-to-speech translation (T2ST), text-to-text translation (T2TT), and automatic speech recognition (ASR) from a single model.
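The five task codes differ only in their input and output modality and in whether the language changes. A small lookup table makes that explicit; the Task class and tasks_for helper below are illustrative names, not part of any SeamlessM4T or gigarouter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    source: str       # input modality: "speech" or "text"
    target: str       # output modality: "speech" or "text"
    translates: bool  # False when the output stays in the input language


TASKS = {
    "S2ST": Task("speech", "speech", True),
    "S2TT": Task("speech", "text", True),
    "T2ST": Task("text", "speech", True),
    "T2TT": Task("text", "text", True),
    "ASR": Task("speech", "text", False),  # transcription, same language
}


def tasks_for(source: str, target: str) -> list[str]:
    """Return the task codes that map `source` modality to `target` modality."""
    return sorted(
        code for code, task in TASKS.items()
        if task.source == source and task.target == target
    )


print(tasks_for("speech", "text"))  # ['ASR', 'S2TT']
print(tasks_for("text", "speech"))  # ['T2ST']
```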

How many languages does it support?

It covers 101 languages for speech input, 96 languages for text input/output, and 35 languages for speech output.

How can I use this model via gigarouter?

The model is not yet live. Once it is, use the gigarouter OpenAI-compatible endpoint with an API key: send a request with the input (text or audio) and the target language, and the API returns translated text or speech.
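Because the endpoint has not been published, the request shape can only be guessed at. The sketch below builds, without sending, a request modeled on OpenAI's audio speech route. The base URL is a placeholder, and the route, the target_language field, and the build_speech_request helper are all assumptions for illustration; check the gigarouter documentation for the real values once the model is available.

```python
import json
import urllib.request

# Placeholder values: the real base URL, route, and field names are not
# published yet, so everything here is an assumption for illustration.
BASE_URL = "https://gigarouter.example/v1"
MODEL_ID = "facebook/hf-seamless-m4t-medium"


def build_speech_request(text: str, target_language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a hypothetical text-to-speech translation request."""
    payload = {
        "model": MODEL_ID,
        "input": text,                       # field name borrowed from OpenAI's audio API
        "target_language": target_language,  # hypothetical field
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_speech_request("Hello, world", "fra", api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here because the placeholder host does not exist.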

What is the license for SeamlessM4T Medium?

It is released under the CC-BY-NC 4.0 license, which allows non-commercial use with attribution.

How does this model compare to previous translation systems?

On FLEURS, SeamlessM4T achieves a 20% BLEU improvement over prior SOTA in direct speech-to-text translation and improves into-English translation by 1.3 BLEU points in speech-to-text and 2.6 ASR-BLEU points in speech-to-speech compared to strong cascaded models.

not yet live

We're benchmarking and onboarding SeamlessM4T Medium as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
