WhisperKit

argmaxinc/whisperkit-coreml

published Feb 2024 · updated Apr 2026

WhisperKit is an automatic speech recognition (ASR) model that transcribes speech to text, optimized for on-device inference on Apple Silicon via CoreML.

status

coming soon

specs

Task	Automatic Speech Recognition (ASR)
Architecture	OpenAI Whisper (CoreML optimized)
License	MIT
Platform	Apple Silicon

about this model

WhisperKit is an automatic speech recognition (ASR) model that provides on-device speech-to-text transcription using OpenAI Whisper, optimized for Apple Silicon via CoreML. It is part of the larger Argmax Open-Source SDK, which also includes SpeakerKit for speaker diarization (based on Pyannote) and TTSKit for text-to-speech (based on Qwen-TTS). The model was presented at ICML 2025.

WhisperKit is designed for efficient, low-latency inference on Apple hardware, requiring macOS 14.0 or later and Xcode 16.0 or later. The SDK is released under the MIT License (Copyright 2024 argmax, inc.). For production use cases requiring real-time transcription with speaker identification and custom vocabulary, the Argmax Pro SDK extends these capabilities with additional models and advanced features.

WhisperKit is also being onboarded as a hosted API on gigarouter. Once live, it is intended to offer WhisperKit transcription through OpenAI-compatible endpoints, so existing workflows can integrate it without managing local Apple hardware.
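As a rough illustration of what "OpenAI-compatible" usually implies for speech-to-text, the sketch below builds (but does not send) a multipart `POST` to an `/audio/transcriptions` route with `model` and `file` form fields and a bearer token, following the OpenAI transcription convention. The base URL and the model identifier are assumptions for illustration only: the hosted endpoint is not yet live, and gigarouter's actual values may differ.

```python
import uuid
import urllib.request

# Hypothetical placeholders: the hosted endpoint is not yet live, so neither
# value below is a documented gigarouter setting.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "argmaxinc/whisperkit-coreml"  # assumed: repo name used as model id


def build_transcription_request(audio_bytes, filename, api_key,
                                base_url=BASE_URL, model=MODEL_ID):
    """Build, without sending, an OpenAI-style transcription request."""
    boundary = uuid.uuid4().hex

    # Plain form field carrying the model name.
    model_part = (
        f'--{boundary}\r\n'
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f'{model}\r\n'
    ).encode()

    # File field carrying the raw audio bytes.
    file_part = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode() + audio_bytes + b'\r\n'

    # Closing boundary that terminates the multipart body.
    closing = f'--{boundary}--\r\n'.encode()

    return urllib.request.Request(
        url=f"{base_url}/audio/transcriptions",
        data=model_part + file_part + closing,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


# Dummy bytes stand in for a real audio file; nothing is sent over the network.
request = build_transcription_request(b"dummy-audio", "memo.wav", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Once an endpoint exists, passing the request to `urllib.request.urlopen` would submit it. An OpenAI-compatible client library pointed at the provider's base URL would likely work as well.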

best for

· On-device speech-to-text on Apple devices
· Offline transcription of audio files (meetings, podcasts, voice memos)
· Building voice-controlled apps for macOS and iOS

FAQ

What is the primary use case for WhisperKit?

It is designed for on-device automatic speech recognition on Apple Silicon, converting speech to text without requiring cloud connectivity.

Is WhisperKit available as a hosted API?

Not yet. A hosted, OpenAI-compatible endpoint on gigarouter is coming soon; once live, it will be accessible with an API key.

What is the license for WhisperKit?

The model is released under the MIT License.

What are the system requirements for running WhisperKit?

It requires macOS 14.0 or later and Xcode 16.0 or later to run on Apple Silicon.

Does the open-source version of WhisperKit support real-time transcription?

No, the open-source version provides batch transcription. Real-time transcription with speaker diarization is available in the Argmax Pro SDK.

not yet live

We're benchmarking and onboarding WhisperKit as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related speech-to-text models

speaker-diarization-3.1

8.2M dl/mo

whisper-base

6.4M dl/mo

wav2vec2-large-xlsr-53-japanese

6.1M dl/mo

wav2vec2-large-xlsr-53-polish

4.7M dl/mo

wav2vec2-large-xlsr-53-dutch

4.1M dl/mo

wav2vec2-indonesian-javanese-sundanese

4.1M dl/mo