Qwen3 ASR 1.7B

Qwen/Qwen3-ASR-1.7B

published Jan 2026 · updated Jan 2026

Qwen3 ASR 1.7B is an ASR model that supports language identification and speech recognition for 52 languages and dialects.

est. price

~$0.0034

· estimated, set at launch

API providers

downloads / mo

1.4M

license

apache-2.0

specs

Task	Automatic Speech Recognition
Architecture	Qwen3-Omni-based transformer
Parameters	1.7B
License	Apache 2.0

about this model

Qwen3-ASR-1.7B is an automatic speech recognition (ASR) model that supports language identification and speech-to-text for 52 languages and dialects, building on the audio understanding foundation of Qwen3-Omni.

Capabilities

The model handles 30 languages and 22 Chinese dialects, including English accents from multiple regions. It processes speech, singing voice, and songs with background music. A single model supports both offline and streaming inference, as well as long audio transcription.

Performance

Qwen3-ASR-1.7B achieves state-of-the-art results among open-source ASR models and is competitive with the strongest proprietary commercial APIs. The smaller Qwen3-ASR-0.6B offers an accuracy-efficiency trade-off, achieving an average time-to-first-token of 92 ms and transcribing 2,000 seconds of speech in 1 second at a concurrency of 128.

Architecture

Diagram showing Qwen3-ASR model overview and supported languages Architecture diagram of the Qwen3-ASR model

Forced Alignment

An optional companion, Qwen3-ForcedAligner-0.6B, predicts timestamps for arbitrary units within up to 5 minutes of speech across 11 languages, outperforming existing end-to-end forced-alignment models in accuracy.

License & Source

Released under Apache 2.0. Full technical details: arXiv:2601.21337.

best for

·Multilingual speech transcription for global applications
·Real-time streaming ASR for voice assistants
·Forced alignment for subtitle generation

FAQ

What languages does Qwen3 ASR 1.7B support?

It supports 30 languages and 22 Chinese dialects, totaling 52 languages and dialects.

How does its performance compare to other ASR models?

It achieves state-of-the-art results among open-source ASR models and is competitive with proprietary commercial APIs.

What is the license of this model?

It is released under the Apache 2.0 license.

How can I use this model via the gigarouter API?

Send requests to the gigarouter OpenAI-compatible endpoint with your API key; the endpoint accepts audio URLs, file paths, or raw audio data.

What input formats are accepted?

Audio can be provided as a local path, URL, base64-encoded data, or a (numpy array, sample rate) tuple.

not yet live

We're benchmarking and onboarding Qwen3 ASR 1.7B as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

compare all →

speaker-diarization-3.1

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo