CLIP ViT g 14 laion2B s34B b88K
laion/CLIP-ViT-g-14-laion2B-s34B-b88K
published Mar 2023 · updated Jan 2025
A popular open zero-shot image classification model, with 28.1K downloads a month. gigarouter is benchmarking it for hosting as an OpenAI-compatible API.
About this model
CLIP-ViT-g-14-laion2B-s34B-b88K is a zero-shot image classification model that maps images and text into a shared embedding space, enabling classification without task-specific fine-tuning. It is a ViT-g/14 model trained with OpenCLIP on LAION-2B, the 2-billion-sample English subset of LAION-5B. Training ran on the JUWELS Booster supercomputer and stability.ai AWS clusters as part of research on reproducible scaling laws (CVPR 2023).
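The zero-shot mechanism can be sketched in a few lines: each class name is wrapped in a text prompt such as "a photo of a {label}", the text and image encoders embed their inputs, and the class whose prompt embedding has the highest cosine similarity to the image embedding wins. This is a minimal, dependency-free illustration under stated assumptions: the 3-dimensional embeddings are made up (the real model's are far larger), `zero_shot_classify` is a hypothetical helper, and `logit_scale=100.0` is an illustrative temperature rather than this checkpoint's learned value.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, class_text_embs, logit_scale=100.0):
    """Hypothetical helper: return {class name: probability} for one image.

    class_text_embs maps each class name to the text embedding of its prompt
    (e.g. "a photo of a cat"). logit_scale is an illustrative temperature.
    """
    img = normalize(image_emb)
    names = list(class_text_embs)
    logits = [logit_scale * dot(img, normalize(class_text_embs[n])) for n in names]
    return dict(zip(names, softmax(logits)))


# Made-up embeddings standing in for the outputs of the image and text encoders.
text_embs = {
    "cat": [0.9, 0.1, 0.0],   # prompt: "a photo of a cat"
    "dog": [0.7, 0.6, 0.1],   # prompt: "a photo of a dog"
    "car": [0.0, 0.2, 0.95],  # prompt: "a photo of a car"
}
image_emb = [0.85, 0.2, 0.05]

probs = zero_shot_classify(image_emb, text_embs)
print(max(probs, key=probs.get))  # cat
```

With OpenCLIP, the vectors would instead come from the model's `encode_image` and `encode_text` methods; the normalize, dot-product, and softmax steps are what this sketch is meant to show. Changing the class taxonomy only means changing the prompts, which is why no fine-tuning is needed.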
Key capabilities
- Zero-shot image classification across arbitrary class taxonomies defined by natural language prompts.
- Image and text retrieval (e.g., COCO, Flickr).
- Fine-tuning and linear probe classification for downstream tasks.
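Retrieval reuses the same embedding space: a caption is scored against every image (or vice versa) and the candidates are ranked by similarity. Retrieval benchmarks such as COCO and Flickr typically report Recall@K, the fraction of queries whose correct match appears among the top K results. The sketch below computes text-to-image Recall@K on hypothetical 2-D embeddings; `recall_at_k` and the data are illustrative assumptions, not part of any benchmark harness.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def recall_at_k(query_embs, gallery_embs, ground_truth, k):
    """Hypothetical helper: fraction of queries whose correct item ranks in the top k.

    query_embs: embeddings of the queries (e.g. captions).
    gallery_embs: embeddings of the candidates (e.g. images).
    ground_truth[i]: index of the correct gallery item for query i.
    """
    hits = 0
    for i, q in enumerate(query_embs):
        scores = [cosine(q, g) for g in gallery_embs]
        # Rank gallery indices by descending similarity to the query.
        ranked = sorted(range(len(gallery_embs)), key=lambda j: scores[j], reverse=True)
        if ground_truth[i] in ranked[:k]:
            hits += 1
    return hits / len(query_embs)


# Made-up embeddings: three captions, three images.
captions = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.6]]
images = [[0.9, 0.1], [0.6, 0.8], [0.1, 0.9]]
truth = [0, 2, 0]  # two captions describe image 0, as in multi-caption datasets

print(recall_at_k(captions, images, truth, k=1))
print(recall_at_k(captions, images, truth, k=2))  # 1.0
```

Here the third caption's best match is image 1 rather than image 0, so it counts as a miss at K=1 but a hit at K=2.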
Benchmark results
The model achieves 78.4% zero-shot top-1 accuracy on ImageNet-1k. This places it between the smaller ViT-H-14 variant (78.0%) and the larger ViT-bigG-14 variant (80.1%), all three trained on LAION-2B with similar numbers of samples seen. Full evaluations across the VTAB+ benchmark suite (38 datasets) and on retrieval tasks are documented in the LAION CLIP Benchmark notebook.
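Zero-shot top-1 accuracy counts an image as correct when its highest-scoring class prompt is the true label. A minimal sketch with made-up scores, where `top1_accuracy` is a hypothetical helper:

```python
def top1_accuracy(logits, labels):
    """Hypothetical helper.

    logits[i][c]: similarity of image i to the prompt for class c.
    labels[i]: index of the true class of image i.
    """
    correct = 0
    for row, label in zip(logits, labels):
        pred = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(pred == label)
    return correct / len(labels)


# Made-up image-to-prompt similarities for four images over three classes.
logits = [
    [0.31, 0.22, 0.10],
    [0.05, 0.28, 0.12],
    [0.20, 0.25, 0.27],
    [0.30, 0.29, 0.02],  # true class is 1, but class 0 scores higher
]
labels = [0, 1, 2, 1]
print(top1_accuracy(logits, labels))  # 0.75
```

Assuming evaluation on the standard 50,000-image ImageNet-1k validation split, 78.4% corresponds to roughly 39,200 correctly classified images.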
Training configuration
- Dataset: LAION-2B English subset of LAION-5B
- Samples seen: 34.5 billion (256 checkpoints × 135M samples each)
- Global batch size: 88,800 (1,480 GPUs, local batch size 60)
- Learning rate: 1e-3 with cosine annealing, weight decay 0.2
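The configuration figures are internally consistent, which a few lines of arithmetic confirm: 1,480 GPUs × 60 = 88,800 (the "b88K" in the model name), and 256 × 135M = 34.56B samples, reported above as 34.5 billion. The sketch also estimates the optimizer step count and shows a plain cosine-annealing schedule starting from the 1e-3 base rate. Both are simplifying assumptions, not the actual OpenCLIP training code: the step estimate assumes every step consumes one full global batch, and warmup is omitted because the card does not specify it.

```python
import math

# Figures taken from the training configuration above.
NUM_GPUS = 1_480
LOCAL_BATCH = 60
SAMPLES_PER_CHECKPOINT = 135_000_000
NUM_CHECKPOINTS = 256
BASE_LR = 1e-3

global_batch = NUM_GPUS * LOCAL_BATCH
samples_seen = SAMPLES_PER_CHECKPOINT * NUM_CHECKPOINTS
# Rough estimate only: assumes every optimizer step uses the full global batch.
approx_steps = samples_seen // global_batch


def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Plain cosine annealing from base_lr down to 0.

    Warmup is deliberately omitted (an assumption; the card gives no warmup length).
    """
    progress = min(step, total_steps) / total_steps
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


print(global_batch)  # 88800
print(samples_seen)  # 34560000000
print(approx_steps)  # 389189
print(cosine_lr(0, approx_steps), cosine_lr(approx_steps // 2, approx_steps))
```

Under these assumptions the run takes on the order of 389K optimizer steps, with the learning rate falling from 1e-3 at the start to about half that at the midpoint and to zero at the end.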
Trained for research purposes on an uncurated web-scale dataset; users should evaluate in-domain performance and consider safety implications before deployment.
We're benchmarking and onboarding CLIP ViT g 14 laion2B s34B b88K as a hosted, OpenAI-compatible API.