BLIP ITM Base COCO

Salesforce/blip-itm-base-coco

published Dec 2022 · updated Feb 2025

BLIP ITM Base COCO is a vision-language model with a ViT-B image backbone, fine-tuned on the COCO dataset for image-text matching.

status: coming soon
API providers: 0
downloads / mo: 24.6K
license: bsd-3-clause

specs

Task: Image-Text Matching
Architecture: BLIP with ViT-B backbone
Parameters: Not specified
License: BSD 3-Clause (bsd-3-clause)

FAQ

What is the input format for this model?

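The model expects one image and one text caption per pair. Use the BlipProcessor from the Hugging Face transformers library to preprocess both: it resizes and normalizes the image into pixel_values and tokenizes the caption into input_ids and attention_mask tensors.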
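As a minimal sketch of that preprocessing step, the function below loads the checkpoint locally with the Hugging Face transformers library and scores one image-caption pair. It assumes torch, Pillow, and transformers are installed; the helper name score_pair and the convention that index 1 of the ITM output is the "match" class (as in the original BLIP code) are assumptions of this example, not something this page specifies.

```python
MODEL_ID = "Salesforce/blip-itm-base-coco"


def score_pair(image_path: str, caption: str) -> float:
    """Return the ITM match probability for one image-caption pair.

    Hypothetical helper for illustration only.
    """
    # Heavy imports live inside the function so that importing this
    # module does not itself require torch or transformers.
    import torch
    from PIL import Image
    from transformers import BlipForImageTextRetrieval, BlipProcessor

    processor = BlipProcessor.from_pretrained(MODEL_ID)
    model = BlipForImageTextRetrieval.from_pretrained(MODEL_ID)
    model.eval()

    # The processor turns the raw image and caption into tensors:
    # pixel_values, input_ids, and attention_mask.
    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, text=caption, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)  # the ITM head is used by default

    # itm_score holds two logits per pair; index 1 is assumed to be "match".
    probs = torch.softmax(outputs.itm_score, dim=1)
    return probs[0, 1].item()
```

A call might look like score_pair("photo.jpg", "a dog running on a beach"), where the file name is a placeholder. The first call downloads the checkpoint, so it needs network access.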

How do I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is hosted, you will call it through the gigarouter OpenAI-compatible endpoint with your API key, sending the image and text as part of the request.

What is the difference between the ITM head and cosine similarity scores?

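The ITM head is a binary classifier on top of the image-grounded text encoder, which fuses the image and the caption through cross-attention; it outputs match / no-match logits for the pair. Cosine similarity instead compares the separately encoded (unimodal) image and text embeddings. It is cheaper to compute and suited to ranking many candidates, but it does not model fine-grained interactions between the image and the text.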
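To show how the two kinds of score are read, here is a small pure-Python sketch. It does not run the model: the logits and embeddings are supplied by the caller. Both function names are hypothetical, and the [no_match, match] ordering of the logits is an assumption taken from the original BLIP code.

```python
import math
from typing import Sequence


def itm_match_probability(logits: Sequence[float]) -> float:
    """Softmax over two ITM logits, assumed ordered [no_match, match].

    Returns the probability of the "match" class, a value in [0, 1].
    """
    if len(logits) != 2:
        raise ValueError("expected exactly two logits")
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[1] / sum(exps)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings, a value in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

The two scores are on different scales: the ITM value is a probability for one specific pair, while cosine similarity is a relative measure that is mainly useful for ranking candidates against each other.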

What license does this model use?

It is released under the BSD 3-Clause license (bsd-3-clause).

Can this model be used for image captioning?

No, this specific checkpoint is for image-text matching only; other BLIP variants handle captioning.

not yet live

We're benchmarking and onboarding BLIP ITM Base COCO as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
