
CLIP ViT g 14 laion2B s34B b88K

laion/CLIP-ViT-g-14-laion2B-s34B-b88K

published Mar 2023 · updated Jan 2025

A popular open zero-shot image model, with 28.1K downloads a month. gigarouter is benchmarking it for hosting as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 28.1K
license: MIT

about this model

CLIP-ViT-g-14-laion2B-s34B-b88K is a zero-shot image classification model that maps images and text into a shared embedding space, enabling classification without task-specific fine-tuning. It is a ViT-g/14 model trained with OpenCLIP on the 2-billion-sample English subset of LAION-5B. Training ran on the JUWELS Booster supercomputer and stability.ai AWS clusters as part of research on reproducible scaling laws for contrastive language-image learning (CVPR 2023).
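To make the shared embedding space concrete, here is a minimal sketch of CLIP-style zero-shot classification, assuming you already have one image embedding and one text embedding per class prompt. Both are L2-normalized, so a dot product is a cosine similarity; the similarities are scaled and passed through a softmax. The tiny 3-dimensional vectors, the `logit_scale` of 100, and the function names are illustrative assumptions, not values or APIs from this model (real embeddings are much higher-dimensional and come from the model's image and text encoders).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_probs(image_emb, class_text_embs, logit_scale=100.0):
    """Return one probability per class from image/text embedding similarity.

    logit_scale=100.0 is an assumed temperature; CLIP-style models learn this value.
    """
    img = l2_normalize(image_emb)
    logits = []
    for t in class_text_embs:
        txt = l2_normalize(t)
        cosine = sum(a * b for a, b in zip(img, txt))
        logits.append(logit_scale * cosine)
    # Numerically stable softmax over the class logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy embeddings. In practice each text row would be the text
# encoder's output for a prompt such as "a photo of a cat".
class_names = ["cat", "dog", "car"]
text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0], [0.0, 0.0, 1.0]]
image_emb = [0.9, 0.2, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
best = max(range(len(class_names)), key=lambda i: probs[i])
print(class_names[best])  # prints "cat"
```

Changing the class taxonomy only means encoding a different list of prompts; no weights are retrained.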

Key capabilities

  • Zero-shot image classification across arbitrary class taxonomies defined by natural language prompts.
  • Image and text retrieval (e.g., COCO, Flickr).
  • Fine-tuning and linear probe classification for downstream tasks.
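Retrieval uses the same embedding space: a text query ranks images by cosine similarity, and benchmarks such as COCO and Flickr are typically scored with Recall@K (is the correct item in the top K?). The sketch below is a simplified version assuming exactly one matching image per query; the toy vectors and helper names are hypothetical, and real COCO/Flickr evaluations pair each image with several captions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank_images(query_emb, image_embs):
    """Return image indices sorted from most to least similar to the query."""
    return sorted(
        range(len(image_embs)),
        key=lambda i: cosine(query_emb, image_embs[i]),
        reverse=True,
    )


def recall_at_k(query_embs, image_embs, targets, k):
    """Fraction of queries whose matching image (targets[q]) is ranked in the top k."""
    hits = 0
    for q, emb in enumerate(query_embs):
        if targets[q] in rank_images(emb, image_embs)[:k]:
            hits += 1
    return hits / len(query_embs)


# Hypothetical toy embeddings: query i should retrieve image targets[i].
image_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_embs = [[0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
targets = [0, 1, 2]

print(recall_at_k(query_embs, image_embs, targets, k=1))
print(recall_at_k(query_embs, image_embs, targets, k=2))
```

Here the third query ranks its target second, so Recall@1 is 2/3 while Recall@2 is 1.0.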

Benchmark results

The model achieves 78.4% zero-shot top-1 accuracy on ImageNet-1k. This places it between the ViT-H-14 variant (78.0%) and the larger ViT-bigG-14 variant (80.1%), all trained on the same LAION-2B dataset with similar sample counts. Comprehensive evaluations across the VTAB+ benchmark suite (38 datasets) and retrieval tasks are documented in the LAION CLIP Benchmark notebook.

Training configuration

  • Dataset: LAION-2B English subset of LAION-5B
  • Samples seen: 34.5 billion (135M × 256 checkpoints)
  • Global batch size: 88,800 (1,480 GPUs, local batch size 60)
  • Learning rate: 1e-3 with cosine annealing, weight decay 0.2
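The figures above are mutually consistent, which a few lines of arithmetic confirm: 1,480 GPUs × 60 samples gives the 88,800 global batch, and 135M × 256 gives about 34.56 billion samples (reported as 34.5B). The sketch also shows a plain cosine-annealing schedule from the 1e-3 peak. It assumes no warmup and derives the step count by dividing samples by batch size; both are simplifications, and the actual run's warmup and exact step count are not stated here.

```python
import math

# Figures taken from the training configuration above.
GPUS = 1_480
LOCAL_BATCH = 60
SAMPLES_PER_CHECKPOINT = 135_000_000
CHECKPOINTS = 256
BASE_LR = 1e-3

global_batch = GPUS * LOCAL_BATCH                     # 88,800
samples_seen = SAMPLES_PER_CHECKPOINT * CHECKPOINTS   # 34,560,000,000
# Estimated optimizer steps, ignoring any partial final batch (an assumption).
total_steps = samples_seen // global_batch


def cosine_lr(step, total_steps, base_lr=BASE_LR):
    """Cosine annealing from base_lr at step 0 down to 0 at total_steps (no warmup)."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / total_steps))


print(global_batch, samples_seen, total_steps)
print(cosine_lr(total_steps // 2, total_steps))  # roughly half the peak rate
```

Under these assumptions the run covers roughly 389K optimizer steps; weight decay (0.2) is applied by the optimizer and does not affect this schedule.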

The model was trained for research purposes on an uncurated, web-scale dataset; users should evaluate in-domain performance and consider safety implications before deployment.

not yet live

We're benchmarking and onboarding CLIP ViT g 14 laion2B s34B b88K as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
