gigarouter

Hosted speech-to-text models

53 models · 0 live as APIs · benchmarked & compared

Speech-to-text models convert spoken audio into written text, enabling applications such as real-time captioning, meeting transcription, voice-controlled interfaces, and automated subtitling. Speaker diarization models—such as pyannote/speaker-diarization-3.1—extend this by identifying who spoke when, which is critical for multi-speaker recordings like conference calls or interviews.

In production, these models are typically deployed in pipelines that include voice activity detection, language identification, and post-processing for punctuation and formatting. The choice among models involves a trade-off between transcription accuracy, latency, and computational cost. For example, openai/whisper-base offers a fast, compact option, while larger variants or specialized models like jonatasgrosman/wav2vec2-large-xlsr-53-japanese are tuned for specific languages or higher accuracy at the expense of speed and memory.
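As a rough illustration of the first pipeline stage mentioned above, the sketch below implements a simple energy-threshold voice activity detector in pure Python. This is a deliberately basic stand-in, not the method used by pyannote/voice-activity-detection or any model on this page; the function names, the 20 ms frame size, and the 0.02 RMS threshold are assumptions chosen for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def detect_speech(samples, sample_rate=16000, frame_ms=20, threshold=0.02):
    """Return (start_sec, end_sec) spans whose frame RMS meets the threshold.

    The threshold and frame size are illustrative assumptions, not tuned values.
    """
    frame_len = sample_rate * frame_ms // 1000
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None
    for i, energy in enumerate(energies):
        if energy >= threshold and seg_start is None:
            seg_start = i  # a speech span opens at this frame
        elif energy < threshold and seg_start is not None:
            segments.append((seg_start * frame_ms / 1000, i * frame_ms / 1000))
            seg_start = None
    if seg_start is not None:  # the span runs to the end of the audio
        segments.append((seg_start * frame_ms / 1000, len(energies) * frame_ms / 1000))
    return segments

# Synthetic input: 1 s of silence, 1 s of a 440 Hz tone, 1 s of silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
signal = [0.0] * 16000 + tone + [0.0] * 16000
print(detect_speech(signal))  # [(1.0, 2.0)]
```

In a real pipeline, only the detected spans would be passed on to language identification and transcription, which is one way to reduce the compute spent on silence.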

This page lists 53 speech-to-text and related audio models (0 currently live, all still being onboarded), including pyannote/speaker-diarization-3.1, argmaxinc/whisperkit-coreml, openai/whisper-base, and several wav2vec2 variants. Calling a

| model | params | downloads/mo | price | status |
| --- | --- | --- | --- | --- |
| pyannote/speaker-diarization-3.1 | - | 8.2M | at launch | coming soon |
| argmaxinc/whisperkit-coreml | - | 8M | at launch | coming soon |
| openai/whisper-base | 72.6M | 6.4M | ~$0.0034 / minute | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-japanese | - | 6.1M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-polish | - | 4.7M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-dutch | - | 4.1M | at launch | coming soon |
| indonesian-nlp/wav2vec2-indonesian-javanese-sundanese | - | 4.1M | at launch | coming soon |
| pyannote/speaker-diarization-community-1 | - | 4M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-arabic | - | 3.5M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-hungarian | - | 3.4M | at launch | coming soon |
| openai/whisper-small | 241.7M | 3.3M | ~$0.0034 / minute | coming soon |
| MahmoudAshraf/mms-300m-1130-forced-aligner | 315.5M | 3.2M | ~$0.0034 / minute | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-portuguese | - | 3.2M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-russian | - | 2.9M | at launch | coming soon |
| gigant/romanian-wav2vec2 | 315.5M | 2.8M | ~$0.0034 / minute | coming soon |
| anuragshas/wav2vec2-large-xlsr-53-telugu | - | 2.8M | at launch | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-persian | - | 2.5M | at launch | coming soon |
| KBLab/wav2vec2-large-voxrex-swedish | 315.5M | 2.5M | ~$0.0034 / minute | coming soon |
| kingabzpro/wav2vec2-large-xls-r-300m-Urdu | 315.5M | 2.3M | ~$0.0034 / minute | coming soon |
| theainerd/Wav2Vec2-large-xlsr-hindi | 315.5M | 2.1M | ~$0.0034 / minute | coming soon |
| pyannote/voice-activity-detection | - | 2M | at launch | coming soon |
| mistralai/Voxtral-Mini-4B-Realtime-2602 | 4429.7M | 2M | ~$0.0034 / minute | coming soon |
| imvladikon/wav2vec2-xls-r-300m-hebrew | 315.5M | 1.8M | ~$0.0034 / minute | coming soon |
| mesolitica/wav2vec2-xls-r-300m-mixed | - | 1.8M | at launch | coming soon |
| airesearch/wav2vec2-large-xlsr-53-th | - | 1.7M | at launch | coming soon |
| openai/whisper-tiny | 37.8M | 1.6M | ~$0.0034 / minute | coming soon |
| jonatasgrosman/wav2vec2-large-xlsr-53-chinese-zh-cn | - | 1.5M | at launch | coming soon |
| mlx-community/parakeet-tdt-0.6b-v2 | - | 1.5M | at launch | coming soon |
| arijitx/wav2vec2-xls-r-300m-bengali | - | 1.4M | at launch | coming soon |
| Systran/faster-whisper-base | - | 1.4M | at launch | coming soon |
| Qwen/Qwen3-ASR-1.7B | 2349.2M | 1.4M | ~$0.0034 / minute | coming soon |
| Qwen/Qwen3-ASR-0.6B | 938M | 941.1K | ~$0.0034 / minute | coming soon |
| nvidia/parakeet-ctc-1.1b | 1062.6M | 781.7K | ~$0.0034 / minute | coming soon |
| microsoft/Phi-4-multimodal-instruct | 5574.5M | 541.1K | ~$0.0034 / minute | coming soon |
| zai-org/GLM-ASR-Nano-2512 | 2257.8M | 133.7K | ~$0.0034 / minute | coming soon |
| openai/whisper-large-v2 | 1543.3M | 115K | ~$0.0034 / minute | coming soon |
| openai/whisper-medium.en | 763.9M | 50.1K | ~$0.0034 / minute | coming soon |
| openai/whisper-small.en | 241.7M | 45.8K | ~$0.0034 / minute | coming soon |
| UsefulSensors/moonshine-base | 61.5M | 40.6K | ~$0.0034 / minute | coming soon |
| nvidia/parakeet-rnnt-0.6b | 616.7M | 36.6K | ~$0.0034 / minute | coming soon |
| openai/whisper-large | 1543.3M | 35K | ~$0.0034 / minute | coming soon |
| openai/whisper-base.en | 72.6M | 30.8K | ~$0.0034 / minute | coming soon |
| nvidia/parakeet-ctc-0.6b | 608.8M | 15.3K | ~$0.0034 / minute | coming soon |
| UsefulSensors/moonshine-streaming-medium | 265.9M | 12.9K | ~$0.0034 / minute | coming soon |
| UsefulSensors/moonshine-streaming-small | 140.1M | 6.1K | ~$0.0034 / minute | coming soon |
| nvidia/canary-1b-flash | 811M | 3.9K | ~$0.0034 / minute | coming soon |
| distil-whisper/distil-large-v3.5 | 756.4M | 3K | ~$0.0034 / minute | coming soon |
| nvidia/parakeet-rnnt-1.1b | 1070.5M | 2.4K | ~$0.0034 / minute | coming soon |
| AutoArk-AI/ARK-ASR-3B | 4063.4M | 1.7K | ~$0.0034 / minute | coming soon |
| AutoArk-AI/ARK-ASR-0.6B | 1299.5M | 1.6K | ~$0.0034 / minute | coming soon |
| OpenMOSS-Team/MOSS-Transcribe-preview-2B | 2418.8M | 879 | ~$0.0034 / minute | coming soon |
| shunyalabs/pingala-v1-universal | 808.9M | 73 | ~$0.0034 / minute | coming soon |
| kyutai/stt-2.6b-en | 2617.1M | - | ~$0.0034 / minute | coming soon |
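
To make the listed price concrete, here is a small back-of-the-envelope estimate in Python. It uses the approximate ~$0.0034 / minute figure shown for the priced models and assumes billing scales linearly with audio duration; the page does not state rounding or minimum-charge rules, so the `estimate_cost` helper is a hypothetical sketch rather than the actual billing formula.

```python
PRICE_PER_MINUTE_USD = 0.0034  # approximate listed price ("~$0.0034 / minute")

def estimate_cost(audio_seconds, price_per_minute=PRICE_PER_MINUTE_USD):
    """Estimate transcription cost in USD.

    Assumes linear per-second proration, which is not confirmed by the page.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60 * price_per_minute

print(f"${estimate_cost(3600):.3f}")         # one hour of audio -> $0.204
print(f"${estimate_cost(1000 * 3600):.2f}")  # 1,000 hours of audio -> $204.00
```

Models marked "at launch" have no listed price yet, so this estimate applies only to rows that show a per-minute figure.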