rate card
Models & pricing
The specialist models we've benchmarked, hosted and priced, followed by the long tail we're onboarding next. Prices are in each model's native unit. Realtime is the on-demand rate; batch is a discounted, flexible tier (send the X-Tier: batch header).
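As a rough illustration of selecting the batch tier, the sketch below builds (but does not send) a request carrying the `X-Tier: batch` header. Only the header name and value come from this page. The endpoint URL, the JSON payload shape, and the assumption that omitting the header means realtime are hypothetical placeholders.

```python
import json
import urllib.request

# Hypothetical endpoint: this page does not state a base URL or request schema.
EXAMPLE_URL = "https://api.example.com/v1/infer"


def build_headers(tier: str = "realtime") -> dict:
    """Return request headers; the batch tier adds the X-Tier header.

    Assumption: realtime requests need no tier header at all.
    """
    if tier not in ("realtime", "batch"):
        raise ValueError(f"unknown tier: {tier!r}")
    headers = {"Content-Type": "application/json"}
    if tier == "batch":
        headers["X-Tier"] = "batch"  # header taken from the pricing page
    return headers


def build_request(payload: dict, tier: str = "realtime") -> urllib.request.Request:
    """Build a POST request object without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        EXAMPLE_URL, data=body, headers=build_headers(tier), method="POST"
    )


if __name__ == "__main__":
    # The payload keys here are illustrative only.
    req = build_request({"model": "example-model", "input": "hello"}, tier="batch")
    print(req.get_method(), req.full_url)
    print(req.header_items())
```

Keeping the tier choice in one small helper means a caller switches between realtime and batch pricing with a single argument rather than editing headers by hand.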
all · embeddings · speech-to-text · vision-language · zero-shot image · reranker · image-to-text · text-to-speech · object detection · depth estimation · text generation
53 matches in speech-to-text
No live models match this filter yet; see the roadmap below.
| model | task | tier | realtime | batch |
|---|---|---|---|---|
On the roadmap
53 models. High-demand specialist models with no hosted API. We benchmark and onboard them by task, and each has its own page; sign in and tell us which ones you need so they can jump the queue.
speech-to-text · 53
speaker-diarization-3.1 · whisperkit-coreml · whisper-base · wav2vec2-large-xlsr-53-japanese · wav2vec2-large-xlsr-53-polish · wav2vec2-large-xlsr-53-dutch · wav2vec2-indonesian-javanese-sundanese · speaker-diarization-community-1 · wav2vec2-large-xlsr-53-arabic · wav2vec2-large-xlsr-53-hungarian · whisper-small · mms-300m-1130-forced-aligner · wav2vec2-large-xlsr-53-portuguese · wav2vec2-large-xlsr-53-russian · romanian-wav2vec2 · wav2vec2-large-xlsr-53-telugu · wav2vec2-large-xlsr-53-persian · wav2vec2-large-voxrex-swedish · wav2vec2-large-xls-r-300m-Urdu · Wav2Vec2-large-xlsr-hindi · voice-activity-detection · Voxtral-Mini-4B-Realtime-2602 · wav2vec2-xls-r-300m-hebrew · wav2vec2-xls-r-300m-mixed · wav2vec2-large-xlsr-53-th · whisper-tiny · wav2vec2-large-xlsr-53-chinese-zh-cn · parakeet-tdt-0.6b-v2 · wav2vec2-xls-r-300m-bengali · faster-whisper-base · Qwen3-ASR-1.7B · Qwen3-ASR-0.6B · parakeet-ctc-1.1b · Phi-4-multimodal-instruct · GLM-ASR-Nano-2512 · whisper-large-v2 · whisper-medium.en · whisper-small.en · moonshine-base · parakeet-rnnt-0.6b · whisper-large · whisper-base.en · parakeet-ctc-0.6b · moonshine-streaming-medium · moonshine-streaming-small · canary-1b-flash · distil-large-v3.5 · parakeet-rnnt-1.1b · ARK-ASR-3B · ARK-ASR-0.6B · MOSS-Transcribe-preview-2B · pingala-v1-universal · stt-2.6b-en