CLIPSeg RD64 Refined

CIDAS/clipseg-rd64-refined

published Nov 2022 · updated Dec 2024

CLIPSeg RD64 Refined is a segmentation model that generates binary segmentation maps from text or image prompts without requiring task-specific training.

est. price

~$0.047 / 1k images · estimated, set at launch

license

apache-2.0

specs

Task	Image Segmentation (zero-shot, one-shot, referring expression)
Architecture	CLIP backbone with transformer-based decoder (reduce_dim=64, complex_trans_conv)
Parameters	150.7M
License	Apache 2.0

about this model

CIDAS/clipseg-rd64-refined is a zero-shot and one-shot image segmentation model that generates binary segmentation maps from arbitrary text or image prompts. It belongs to the CLIPSeg family introduced in the paper "Image Segmentation Using Text and Image Prompts" (Lüddecke and Ecker, CVPR 2022). The model extends a CLIP backbone with a transformer-based decoder. This variant uses a reduce dimension of 64 (reduce_dim=64, the "rd64" in the name) and a refined, more complex transposed-convolution stage (complex_trans_conv) for finer-grained predictions.

Capabilities

Unlike traditional segmentation models that require training on fixed object classes, this model can segment any object, affordance, or property described by a free-text query or demonstrated by an image with a mask. It unifies three common tasks within a single trained model:

Referring expression segmentation
Zero-shot segmentation
One-shot segmentation

The model was trained on an extended version of the PhraseCut dataset and outputs a binary segmentation map for any prompt at test time. The "refined" in the name refers to its more complex convolutional decoder stage, which produces more accurate boundary delineation than earlier CLIPSeg weights; "rd64" refers to the reduce dimension of 64.
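As a rough illustration of text-prompted segmentation, here is a minimal sketch that assumes the Hugging Face transformers library (plus torch and Pillow) and its CLIPSegProcessor and CLIPSegForImageSegmentation classes. The helper names segment_with_text and logits_to_binary_mask and the 0.5 threshold are illustrative choices, not part of the model or of any hosted API. The model returns raw logits, so the sketch thresholds them to get a 0/1 map.

```python
import math

MODEL_ID = "CIDAS/clipseg-rd64-refined"


def logits_to_binary_mask(logits, threshold=0.5):
    """Turn a 2D grid of raw logits into a 0/1 mask.

    sigmoid(x) > threshold is equivalent to x > log(threshold / (1 - threshold)),
    so the comparison is done in logit space to avoid overflow in exp().
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    cutoff = math.log(threshold / (1.0 - threshold))
    return [[1 if value > cutoff else 0 for value in row] for row in logits]


def segment_with_text(image, prompts, threshold=0.5):
    """Return {prompt: 0/1 mask} for a PIL image and a list of text prompts.

    Heavy dependencies are imported lazily so the helper above stays usable
    without them. Masks come back at the model's output resolution, not the
    original image size.
    """
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(MODEL_ID)
    model = CLIPSegForImageSegmentation.from_pretrained(MODEL_ID)

    # One copy of the image per prompt, so each prompt gets its own map.
    inputs = processor(
        text=prompts,
        images=[image] * len(prompts),
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # Some library versions drop the batch dimension for a single prompt.
    if logits.dim() == 2:
        logits = logits.unsqueeze(0)

    return {
        prompt: logits_to_binary_mask(grid.tolist(), threshold)
        for prompt, grid in zip(prompts, logits)
    }


# Example (hypothetical file name):
#   from PIL import Image
#   masks = segment_with_text(Image.open("photo.jpg").convert("RGB"),
#                             ["a dog", "something to sit on"])
```

Raising the threshold gives a more conservative mask; lowering it includes more uncertain pixels.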

Key details

The model contains 150.7 million parameters (603 MB, safetensors format) and is released under the Apache 2.0 license. The accompanying paper was accepted to CVPR 2022, and the model has been downloaded over 149 million times.
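The parameter count and file size quoted above are consistent with each other if the weights are stored as 32-bit floats. That storage precision is an assumption here, as is the use of decimal megabytes (1 MB = 1,000,000 bytes); file metadata is ignored. A quick check, with checkpoint_size_mb as a hypothetical helper name:

```python
def checkpoint_size_mb(num_params, bytes_per_param=4):
    """Approximate weight size in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


NUM_PARAMS = 150.7e6  # 150.7 million parameters, as listed in the specs

# Assumed float32 storage: 4 bytes per parameter.
print(f"float32: {checkpoint_size_mb(NUM_PARAMS):.1f} MB")  # float32: 602.8 MB

# For comparison, half precision would need about half of that.
print(f"float16: {checkpoint_size_mb(NUM_PARAMS, bytes_per_param=2):.1f} MB")  # float16: 301.4 MB
```

The float32 figure rounds to the listed 603 MB.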

best for

·Segmenting objects described by free-form text prompts
·One-shot segmentation using an example image with a mask
·Zero-shot segmentation for novel object categories not in training data

FAQ

What tasks can CLIPSeg RD64 Refined perform?

It can perform zero-shot segmentation, one-shot segmentation, and referring expression segmentation using text or image prompts.

How large is the model?

It has 150.7 million parameters and a size of 603 MB.

What license is the model released under?

It is released under the Apache 2.0 license on Hugging Face.

How do I call this model via the gigarouter API?

The model is not yet live. Once it launches, call the OpenAI-compatible endpoint with your API key, specifying the model ID (CIDAS/clipseg-rd64-refined) and your prompt; the API will return a segmentation mask.

What input formats does the model support?

It accepts a text prompt (e.g., "dog") or an image prompt (with a reference mask) to define the segmentation target.

not yet live

We're benchmarking and onboarding CLIPSeg RD64 Refined as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
