
blip itm large coco

Salesforce/blip-itm-large-coco

published Dec 2022 · updated Feb 2025

A popular open specialist model, with 4.6K downloads a month. gigarouter is benchmarking it and plans to host it as an OpenAI-compatible API.

status: coming soon
API providers: 0
downloads / mo: 4.6K
license: bsd-3-clause

about this model

BLIP (Bootstrapping Language-Image Pre-training) is a vision-language model that performs image-text matching (ITM) and retrieval, using a large ViT-L backbone fine-tuned on the COCO dataset. It outputs an ITM score and a cosine similarity score for a given image-text pair, enabling both fine-grained matching and ranking.
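
As a minimal sketch of how the two scores described above are typically read, the snippet below assumes the ITM head emits two logits per image-text pair (index 0 for "no match", index 1 for "match") and that image and text embeddings are plain float vectors. The function names `itm_match_probability`, `cosine_similarity`, and `rank_captions` are hypothetical helpers for illustration, not part of any BLIP or gigarouter API.

```python
import math


def itm_match_probability(logits):
    """Softmax over two ITM logits; returns the probability of the 'match' class.

    Assumption: logits are ordered (no_match, match).
    """
    no_match, match = logits
    shift = max(no_match, match)  # subtract the max for numerical stability
    exp_no_match = math.exp(no_match - shift)
    exp_match = math.exp(match - shift)
    return exp_match / (exp_no_match + exp_match)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_captions(image_embedding, caption_embeddings):
    """Return caption indices sorted from most to least similar to the image."""
    scores = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

A common pattern is to use the cheaper cosine score to rank a large candidate pool and then apply the ITM probability to the top few candidates for a finer-grained decision.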

The BLIP framework unifies vision-language understanding and generation tasks, and uses a captioner-filter scheme to bootstrap noisy web data: a captioner generates synthetic captions and a filter removes noisy ones. On the COCO image-text retrieval benchmark it reports state-of-the-art results: text retrieval recall@1 of 82.0, recall@5 of 95.8, and recall@10 of 98.1; image retrieval recall@1 of 64.5, recall@5 of 86.0, and recall@10 of 91.7. The framework also reports 78.23 on the VQA v2 test-dev set, and a CIDEr of 133.5 and BLEU@4 of 39.9 on COCO image captioning. The architecture is described in the BLIP paper, arXiv:2201.12086.

Benchmark performance (from LAVIS evaluation)

| Task | Metric | Score |
| --- | --- | --- |
| COCO Text Retrieval | Recall@1 | 82.0 |
| COCO Text Retrieval | Recall@5 | 95.8 |
| COCO Text Retrieval | Recall@10 | 98.1 |
| COCO Image Retrieval | Recall@1 | 64.5 |
| COCO Image Retrieval | Recall@5 | 86.0 |
| COCO Image Retrieval | Recall@10 | 91.7 |
| VQA v2 | Test-dev | 78.23 |
| COCO Captioning | CIDEr | 133.5 |
| COCO Captioning | BLEU@4 | 39.9 |
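
For readers unfamiliar with the retrieval metric in the table, here is a small sketch of how recall@K is commonly computed from a query-by-candidate similarity matrix. It assumes each query has a set of correct candidate indices (on COCO, an image has several reference captions) and counts a query as a hit if any correct candidate appears in its top K. The function name `recall_at_k` and the toy data are illustrative assumptions, not the LAVIS evaluation code.

```python
def recall_at_k(similarity, relevant, k):
    """Percentage of queries with at least one relevant candidate in the top k.

    similarity: list of rows, one per query, each a list of scores per candidate.
    relevant:   list of sets, relevant[i] holds the correct candidate indices
                for query i.
    """
    hits = 0
    for row, gold in zip(similarity, relevant):
        # Candidate indices ordered from highest to lowest score, cut at k.
        top_k = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        if gold.intersection(top_k):
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 queries scored against 3 candidates.
toy_similarity = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.1],
]
toy_relevant = [{0}, {1}, {2}]
```

Recall@K can only grow as K increases, which is why the recall@1, recall@5, and recall@10 scores in each retrieval block rise monotonically.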

gigarouter plans to host this model as a managed API, which would remove the need for local setup. The official BLIP repository is deprecated; the model is integrated into the LAVIS library (BSD 3-Clause license).

not yet live

We're benchmarking and onboarding blip itm large coco as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
