siglip large patch16 256

google/siglip-large-patch16-256

published Jan 2024 · updated Sep 2024

A popular open zero-shot image model, with 30.2K downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.

est. price

~$0.235

/ 1k images · estimated, set at launch

API providers

downloads / mo

30.2K

license

apache-2.0

about this model

google/siglip-large-patch16-256 is a zero-shot image classification and image-text retrieval model that extends the CLIP architecture with a pairwise sigmoid loss function for language-image pre-training. Unlike standard contrastive learning with softmax normalization, the sigmoid loss operates solely on image-text pairs without requiring a global view of pairwise similarities for normalization, enabling effective training at both very large and small batch sizes.

Architecture and Training

The model is pre-trained on the English image-text pairs of the WebLI dataset at 256x256 resolution. Images are resized and normalized across RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5); text is tokenized and padded to 64 tokens. Training used 16 TPU-v4 chips over three days.

Key Strengths

The sigmoid loss decouples batch size from normalization, allowing scaling up to 1 million pairs while finding 32k sufficient for strong performance.
A SigLiT variant (SigLIP with Locked-image Tuning) achieves 84.5% ImageNet zero-shot accuracy using only 4 TPUv4 chips trained in 2 days.
Presented as an ICCV'23 Oral paper.

Benchmark Comparison

The following comparison from the paper shows SigLIP evaluation results against CLIP:

Evaluation results comparing SigLIP to CLIP across multiple benchmarks

Model Variants

Version v4 of the paper adds results for SigLIP Base, Large, and Shape-Optimized 400M model variants. This specific model is the large-sized variant at 256x256 resolution.

not yet live

We're benchmarking and onboarding siglip large patch16 256 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related zero-shot image models

compare all →

clip-vit-base-patch32

22.3M dl/mo

clip-vit-large-patch14

12.4M dl/mo

CLIP-ViT-B-32-laion2B-s34B-b79K

4M dl/mo

clip-vit-large-patch14-336