Hosted zero-shot image models
49 models · 0 live as APIs · benchmarked & compared
Zero-shot image models enable classification, search, and similarity matching without task-specific fine-tuning. By jointly embedding images and text into a shared vector space, they allow queries such as "find images similar to this product photo" or "label all user uploads as containing 'parking lot' or 'green space'." These capabilities are used in production for content moderation, visual product search, recommendation systems, and automated metadata tagging.
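As a rough illustration of how a shared embedding space supports labeling without fine-tuning, the sketch below picks the label whose text embedding lies closest to an image embedding. The three-dimensional vectors are made-up stand-ins for real model outputs, and `cosine` and `zero_shot_label` are hypothetical helper names, not part of any listed model's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def zero_shot_label(image_vec, label_vecs):
    """Return the best-matching label and the score for every label."""
    scores = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Toy vectors standing in for text embeddings of the two prompts (assumed values).
label_vecs = {
    "parking lot": [0.9, 0.1, 0.0],
    "green space": [0.1, 0.9, 0.2],
}
# Toy vector standing in for the embedding of one uploaded image (assumed values).
image_vec = [0.2, 0.8, 0.1]

best, scores = zero_shot_label(image_vec, label_vecs)
print(best, scores)
```

With a real model, the image vector and the label vectors would come from the same model's image and text encoders; only the comparison step is shown here.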
In production, models like openai/clip-vit-base-patch32 or google/siglip-so400m-patch14-384 are typically deployed as embedding endpoints: images are encoded into vectors, then matched against stored embeddings or compared with text prompts. The choice between models depends on a size–quality–speed trade-off: larger variants (e.g., openai/clip-vit-large-patch14) generally offer higher accuracy at the cost of slower inference and a higher price; smaller ones (e.g., openai/clip-vit-base-patch16) are faster and cheaper with slightly lower precision.
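The matching step against stored embeddings can be sketched as a small top-k search. This is a minimal in-memory version under stated assumptions: the item ids and vectors are invented for illustration, `normalize` and `top_k` are hypothetical helpers, and a production system would more likely use a vector database or an approximate nearest-neighbor index.

```python
import math

def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def top_k(query, index, k=2):
    """Return the k stored items most similar to the query, best first."""
    q = normalize(query)
    scored = [
        (sum(a * b for a, b in zip(q, normalize(vec))), item_id)
        for item_id, vec in index.items()
    ]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]

# Hypothetical stored embeddings keyed by product id (toy 3-d values).
index = {
    "sku-1": [1.0, 0.0, 0.0],
    "sku-2": [0.0, 1.0, 0.0],
    "sku-3": [0.7, 0.7, 0.0],
}

# Toy embedding of a query image, e.g. a product photo.
results = top_k([0.9, 0.2, 0.0], index, k=2)
print(results)
```

Normalizing stored vectors once at indexing time, rather than on every query as done here for brevity, avoids repeated work when the index is large.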
Examples of models listed:
- openai/clip-vit-base-patch32
- openai/clip-vit-large-patch14
- laion/CLIP-ViT-B-32-laion2B-s34B-b79K
- yuvalkirstain/PickScore_v1
- patrickjohncyh/fashion-clip
For most call volumes, calling a hosted API through gigarouter removes the operational burden of managing GPU infrastructure, scaling, and model updates, while still providing OpenAI-compatible endpoints that fit into existing client code.
compare
| model | params | downloads/mo | price | status |
|---|---|---|---|---|
| openai/clip-vit-base-patch32 | - | 22.3M | at launch | coming soon |
| openai/clip-vit-large-patch14 | 427.6M | 12.4M | ~$0.094 / 1k images | coming soon |
| laion/CLIP-ViT-B-32-laion2B-s34B-b79K | 151.3M | 4M | ~$0.047 / 1k images | coming soon |
| openai/clip-vit-large-patch14-336 | - | 3.4M | at launch | coming soon |
| yuvalkirstain/PickScore_v1 | 986.1M | 3.2M | ~$0.235 / 1k images | coming soon |
| patrickjohncyh/fashion-clip | 151.3M | 2.9M | ~$0.047 / 1k images | coming soon |
| google/siglip-so400m-patch14-384 | 878M | 1.8M | ~$0.235 / 1k images | coming soon |
| openai/clip-vit-base-patch16 | - | 1.6M | at launch | coming soon |
| google/siglip2-giant-opt-patch16-384 | 1871.9M | 1.5M | ~$0.626 / 1k images | coming soon |
| google/siglip-base-patch16-224 | 203.2M | 1.4M | ~$0.094 / 1k images | coming soon |
| google/siglip2-base-patch16-naflex | 375.2M | 796.3K | ~$0.094 / 1k images | coming soon |
| google/siglip2-so400m-patch16-naflex | 1135.7M | 732.4K | ~$0.235 / 1k images | coming soon |
| google/siglip2-so400m-patch14-384 | 1136M | 692.9K | ~$0.235 / 1k images | coming soon |
| Marqo/marqo-fashionSigLIP | 203.2M | 642.8K | ~$0.094 / 1k images | coming soon |
| laion/CLIP-convnext_base_w-laion2B-s13B-b82K-augreg | - | 588.3K | at launch | coming soon |
| microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224 | - | 565.8K | at launch | coming soon |
| google/siglip2-so400m-patch16-256 | 1135.7M | 521.6K | ~$0.235 / 1k images | coming soon |
| laion/CLIP-ViT-H-14-laion2B-s32B-b79K | 986.1M | 416K | ~$0.235 / 1k images | coming soon |
| google/siglip2-base-patch16-224 | 375.2M | 408.5K | ~$0.094 / 1k images | coming soon |
| laion/CLIP-ViT-B-16-laion2B-s34B-b88K | - | 402.8K | at launch | coming soon |
| laion/CLIP-ViT-L-14-laion2B-s32B-b82K | 427.6M | 373.8K | ~$0.094 / 1k images | coming soon |
| google/siglip2-so400m-patch16-512 | 1136.6M | 312.9K | ~$0.235 / 1k images | coming soon |
| wkcn/TinyCLIP-ViT-8M-16-Text-3M-YFCC15M | 23.4M | 311.3K | ~$0.047 / 1k images | coming soon |
| facebook/PE-Core-L14-336 | - | 262.4K | at launch | coming soon |
| timm/ViT-SO400M-14-SigLIP-384 | - | 242.3K | at launch | coming soon |
| google/siglip2-base-patch16-256 | 375.2M | 212.4K | ~$0.094 / 1k images | coming soon |
| OFA-Sys/chinese-clip-vit-base-patch16 | - | 199.7K | at launch | coming soon |
| q-future/one-align | - | 188.9K | at launch | coming soon |
| apple/MobileCLIP-S2-OpenCLIP | - | 185.4K | at launch | coming soon |
| timm/ViT-B-16-SigLIP2-256 | - | 156.4K | at launch | coming soon |
| Xenova/clip-vit-base-patch32 | - | 154.7K | at launch | coming soon |
| google/siglip2-base-patch16-512 | 375.8M | 123.9K | ~$0.094 / 1k images | coming soon |
| laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup | - | 118.7K | at launch | coming soon |
| timm/ViT-SO400M-14-SigLIP | - | 116.5K | at launch | coming soon |
| BAAI/AltCLIP | - | 112.2K | at launch | coming soon |
| timm/ViT-B-16-SigLIP | - | 101.4K | at launch | coming soon |
| google/siglip2-large-patch16-256 | 881.5M | 100.8K | ~$0.235 / 1k images | coming soon |
| laion/CLIP-ViT-bigG-14-laion2B-39B-b160k | - | 100.7K | at launch | coming soon |
| laion/CLIP-ViT-L-14-DataComp.XL-s13B-b90K | - | 59K | at launch | coming soon |
| laion/CLIP-ViT-B-32-DataComp.XL-s13B-b90K | - | 47.1K | at launch | coming soon |
| google/siglip-large-patch16-384 | 652.5M | 39.4K | ~$0.235 / 1k images | coming soon |
| laion/CLIP-ViT-B-16-DataComp.XL-s13B-b90K | - | 37.3K | at launch | coming soon |
| google/siglip-large-patch16-256 | 652.2M | 30.2K | ~$0.235 / 1k images | coming soon |
| laion/CLIP-ViT-g-14-laion2B-s34B-b88K | - | 28.1K | at launch | coming soon |
| google/siglip-base-patch16-256 | 203.2M | 26.9K | ~$0.094 / 1k images | coming soon |
| google/siglip-base-patch16-256-multilingual | 370.6M | 22.6K | ~$0.094 / 1k images | coming soon |
| google/siglip-base-patch16-384 | 203.4M | 20.6K | ~$0.094 / 1k images | coming soon |
| kakaobrain/align-base | - | 12.3K | at launch | coming soon |
| google/siglip-base-patch16-512 | 203.8M | 8.5K | ~$0.094 / 1k images | coming soon |
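To compare price tiers for a given workload, the per-1k-image prices can be turned into a rough monthly estimate. The sketch below copies four approximate prices from the table; the 2-million-image workload and the `monthly_cost` helper are assumptions for illustration, and since every model is still marked "coming soon", actual pricing may differ.

```python
# Approximate USD prices per 1k images, copied from the comparison table.
# All listed models are marked "coming soon", so treat these as estimates.
PRICE_PER_1K = {
    "laion/CLIP-ViT-B-32-laion2B-s34B-b79K": 0.047,
    "openai/clip-vit-large-patch14": 0.094,
    "google/siglip-so400m-patch14-384": 0.235,
    "google/siglip2-giant-opt-patch16-384": 0.626,
}

def monthly_cost(model, images_per_month):
    """Estimated monthly spend in USD, rounded to cents."""
    return round(images_per_month / 1000 * PRICE_PER_1K[model], 2)

# Hypothetical workload: 2 million images per month.
estimates = {model: monthly_cost(model, 2_000_000) for model in PRICE_PER_1K}

for model, cost in estimates.items():
    print(f"{model}: ~${cost:,.2f} / month")
```

At this assumed volume the gap between the cheapest and most expensive tier is roughly an order of magnitude, which is the size–quality–speed trade-off expressed in dollars.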