
CLIPSeg RD64 Refined

CIDAS/clipseg-rd64-refined

published Nov 2022 · updated Dec 2024

CLIPSeg RD64 Refined is a segmentation model that generates binary segmentation maps from text or image prompts without requiring task-specific training.

est. price: ~$0.047 / 1k images (estimated, set at launch)
API providers: 0
downloads / mo: 1M
license: apache-2.0

specs

Task: Image Segmentation (zero-shot, one-shot, referring expression)
Architecture: CLIP backbone with transformer-based decoder (reduce_dim=64, complex_trans_conv)
Parameters: 150.7M
License: Apache 2.0

about this model

CIDAS/clipseg-rd64-refined is a zero-shot and one-shot image segmentation model that generates binary segmentation maps from arbitrary text or image prompts. It is part of the CLIPSeg family introduced in the paper "Image Segmentation Using Text and Image Prompts" (Lüddecke and Ecker, CVPR 2022). The model builds on CLIP with a transformer-based decoder. The "rd64" in its name refers to a reduce dimension of 64, and "refined" refers to a more complex transposed-convolution stage (complex_trans_conv) in the decoder, used for finer-grained predictions.

Capabilities

Unlike traditional segmentation models that require training on fixed object classes, this model can segment any object, affordance, or property described by a free-text query or demonstrated by an image with a mask. It unifies three common tasks within a single trained model:

  • Referring expression segmentation
  • Zero-shot segmentation
  • One-shot segmentation

The model was trained on an extended version of the PhraseCut dataset and outputs a binary segmentation map for any prompt at test time. The refined variant's more complex decoder convolution is intended to produce finer boundary delineation than the non-refined CLIPSeg weights.
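To illustrate how a binary map is typically obtained, here is a minimal sketch. It assumes the model's raw output is a grid of per-pixel logits (as in the Hugging Face transformers implementation) and that a probability cutoff of 0.5 is acceptable; both the helper name `logits_to_mask` and the default threshold are illustrative assumptions, not part of the model.

```python
import math


def logits_to_mask(logits, threshold=0.5):
    """Turn a 2-D grid of per-pixel logits into a 0/1 mask.

    `threshold` is a probability in (0, 1); 0.5 is an assumed default.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    # sigmoid(v) >= threshold is equivalent to v >= log(t / (1 - t)),
    # so the comparison can be done on logits without computing exp().
    cutoff = math.log(threshold / (1.0 - threshold))
    return [[1 if value >= cutoff else 0 for value in row] for row in logits]


example_logits = [[2.0, -1.0], [0.0, -3.5]]
print(logits_to_mask(example_logits))
```

Raising the threshold makes the mask more conservative; the right value depends on the prompt and image and usually needs tuning.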

Key details

The model contains 150.7 million parameters (603 MB, safetensors format) and is released under the Apache 2.0 license. The accompanying paper was published at CVPR 2022, and the model has been downloaded over 149 million times.
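The parameter count and file size quoted above are consistent with each other if the weights are stored as 32-bit floats. The short check below makes that arithmetic explicit; the 4 bytes per parameter and the use of decimal megabytes (1 MB = 10^6 bytes) are assumptions, and the function name is illustrative.

```python
def checkpoint_size_mb(num_params, bytes_per_param=4):
    """Estimate checkpoint size in decimal megabytes.

    bytes_per_param=4 assumes float32 weights (an assumption, not stated
    on this page); file-format overhead is ignored.
    """
    return num_params * bytes_per_param / 1e6


size_mb = checkpoint_size_mb(150.7e6)
print(f"{size_mb:.1f} MB")
```

Under the same assumptions, a half-precision copy (2 bytes per parameter) would be roughly half that size.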

FAQ

What tasks can CLIPSeg RD64 Refined perform?

It can perform zero-shot segmentation, one-shot segmentation, and referring expression segmentation using text or image prompts.

How large is the model?

It has 150.7 million parameters and a size of 603 MB.

What license is the model released under?

It is released under the Apache 2.0 license on Hugging Face.

How do I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it launches, you will call the OpenAI-compatible endpoint with your API key, specifying the model ID and your prompt; the API will return a segmentation mask.

What input formats does the model support?

It accepts a text prompt (e.g., "dog") or an image prompt (with a reference mask) to define the segmentation target.
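Until the hosted API is available, the text-prompt path can be tried locally. The sketch below assumes the Hugging Face `transformers` and `torch` packages are installed and uses the `CLIPSegProcessor` and `CLIPSegForImageSegmentation` classes; the helper names `pair_image_with_prompts` and `segment_with_text` are illustrative, and this is not the gigarouter API. It covers text prompts only, not image prompts.

```python
def pair_image_with_prompts(image, prompts):
    """Repeat one image so that each text prompt has a matching image."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("at least one text prompt is required")
    return [image] * len(prompts), prompts


def segment_with_text(image, prompts, model_id="CIDAS/clipseg-rd64-refined"):
    """Return per-pixel logits, one map per prompt, for a PIL image."""
    # Heavy dependencies are imported lazily so the helper above stays
    # usable without them installed.
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(model_id)
    model = CLIPSegForImageSegmentation.from_pretrained(model_id)

    images, texts = pair_image_with_prompts(image, prompts)
    inputs = processor(text=texts, images=images, padding=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # A single prompt may come back without a batch dimension; restore it.
    if logits.dim() == 2:
        logits = logits.unsqueeze(0)
    return logits
```

A call such as `segment_with_text(Image.open("photo.jpg"), ["dog", "grass"])` (with `Image` from Pillow and a hypothetical file name) would return one logits map per prompt, which can then be thresholded into a binary mask.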

not yet live

We're benchmarking and onboarding CLIPSeg RD64 Refined as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
