CLIPSeg RD64 Refined
CIDAS/clipseg-rd64-refined
published Nov 2022 · updated Dec 2024
CLIPSeg RD64 Refined is a segmentation model that generates binary segmentation maps from text or image prompts without requiring task-specific training.
specs
| Spec | Value |
| --- | --- |
| Task | Image Segmentation (zero-shot, one-shot, referring expression) |
| Architecture | CLIP backbone with transformer-based decoder (reduce_dim=64, complex_trans_conv) |
| Parameters | 150.7M |
| License | Apache 2.0 |
about this model
CIDAS/clipseg-rd64-refined is a zero-shot and one-shot image segmentation model that generates binary segmentation maps from arbitrary text or image prompts. It is part of the CLIPSeg family introduced in the paper "Image Segmentation Using Text and Image Prompts" (Lüddecke and Ecker, CVPR 2022). The model extends CLIP with a transformer-based decoder. The "rd64" in its name refers to the decoder's reduce dimension of 64, and "refined" refers to a more complex convolutional output stage aimed at finer-grained predictions.
Capabilities
Unlike traditional segmentation models that require training on fixed object classes, this model can segment any object, affordance, or property described by a free-text query or demonstrated by an image with a mask. It unifies three common tasks within a single trained model:
- Referring expression segmentation
- Zero-shot segmentation
- One-shot segmentation
The model was trained on an extended version of the PhraseCut dataset and outputs a binary segmentation map for any prompt at test time. Its refined decoder (the "refined" in the model ID; "rd64" denotes the reduce dimension of 64) is intended to delineate boundaries more accurately than earlier CLIPSeg weights.
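For readers who want to try text prompts against the open weights locally, a minimal sketch follows. It assumes the Hugging Face `transformers` library (with `torch` and Pillow installed) and its `CLIPSegProcessor` and `CLIPSegForImageSegmentation` classes. The helper names `binarize` and `segment_with_text` and the 0.5 threshold are illustrative choices, not part of the model or of this listing.

```python
import math

MODEL_ID = "CIDAS/clipseg-rd64-refined"


def binarize(logit_rows, threshold=0.5):
    """Turn a 2-D grid of raw logits into a 0/1 mask.

    sigmoid(v) > threshold is equivalent to v > log(threshold / (1 - threshold)),
    so the comparison is done in logit space to avoid overflow in exp().
    Requires 0 < threshold < 1.
    """
    cutoff = math.log(threshold / (1.0 - threshold))
    return [[1 if v > cutoff else 0 for v in row] for row in logit_rows]


def segment_with_text(image, prompts, threshold=0.5):
    """Return one binary mask (nested lists of 0/1) per text prompt.

    image: a PIL image. prompts: a list of strings.
    """
    # Heavy dependencies are imported lazily so binarize() works without them.
    import torch
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    processor = CLIPSegProcessor.from_pretrained(MODEL_ID)
    model = CLIPSegForImageSegmentation.from_pretrained(MODEL_ID)

    # The same image is paired with every prompt, one batch entry per prompt.
    inputs = processor(
        text=prompts,
        images=[image] * len(prompts),
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits

    # With a single prompt the batch dimension may be missing; restore it.
    if logits.dim() == 2:
        logits = logits.unsqueeze(0)
    return [binarize(mask.tolist(), threshold) for mask in logits]
```

A call such as `segment_with_text(Image.open("photo.jpg"), ["a dog", "something to sit on"])` would return two masks; the file name and prompts are placeholders. The masks come back at the processor's working resolution, so they usually need resizing to match the original image.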
Key details
The model contains 150.7 million parameters (603 MB, safetensors format) and is released under the Apache 2.0 license. The accompanying paper was accepted to CVPR 2022, and the model has been downloaded over 149 million times.
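The parameter count and file size quoted above are consistent with each other if the weights are stored as 32-bit floats. That storage format is an assumption here, not something the listing states. A quick arithmetic check:

```python
PARAMS = 150_700_000      # parameter count quoted in the listing
BYTES_PER_PARAM = 4       # assumption: weights stored as 32-bit floats


def checkpoint_size_mb(params, bytes_per_param):
    """Approximate checkpoint size in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6


size_mb = checkpoint_size_mb(PARAMS, BYTES_PER_PARAM)
print(f"{size_mb:.1f} MB")  # prints "602.8 MB", which rounds to the quoted 603 MB
```

The estimate ignores file metadata, which is small relative to the weights.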
best for
- Segmenting objects described by free-form text prompts
- One-shot segmentation using an example image with a mask
- Zero-shot segmentation for novel object categories not in training data
FAQ
It can perform zero-shot segmentation, one-shot segmentation, and referring expression segmentation using text or image prompts.
It has 150.7 million parameters and a size of 603 MB.
It is released under the Apache 2.0 license on Hugging Face.
Once the hosted API is available, use the OpenAI-compatible endpoint with your API key, specifying the model ID and your prompt; the API will return a segmentation mask.
It accepts a text prompt (e.g., "dog") or an image prompt (with a reference mask) to define the segmentation target.
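The image-prompt path can be sketched against the open weights run locally (not the hosted API, whose request format is not described here). The sketch assumes the Hugging Face `transformers` library, `torch`, and Pillow, and uses the processor's `visual_prompt` argument. The source does not say how the reference mask is combined with the example image; dimming everything outside the mask, as the hypothetical helper `dim_background` does, is one simple option chosen for illustration. `segment_with_example` is likewise an illustrative name.

```python
MODEL_ID = "CIDAS/clipseg-rd64-refined"


def dim_background(pixels, mask, factor=0.1):
    """Scale RGB pixels outside the mask by `factor`; keep masked pixels unchanged.

    pixels: rows of (r, g, b) integer tuples. mask: rows of 0/1, same shape.
    """
    if len(pixels) != len(mask) or any(
        len(p) != len(m) for p, m in zip(pixels, mask)
    ):
        raise ValueError("pixels and mask must have the same shape")
    return [
        [px if keep else tuple(int(c * factor) for c in px)
         for px, keep in zip(prow, mrow)]
        for prow, mrow in zip(pixels, mask)
    ]


def segment_with_example(query_image, example_pixels, example_mask):
    """Segment `query_image` (a PIL image) using a masked example as the prompt.

    Returns the raw logits tensor; threshold it to obtain a binary mask.
    """
    # Heavy dependencies are imported lazily so dim_background() works alone.
    import torch
    from PIL import Image
    from transformers import CLIPSegForImageSegmentation, CLIPSegProcessor

    # Build the visual prompt: the example image with its background dimmed.
    rows = dim_background(example_pixels, example_mask)
    prompt_image = Image.new("RGB", (len(rows[0]), len(rows)))
    prompt_image.putdata([px for row in rows for px in row])

    processor = CLIPSegProcessor.from_pretrained(MODEL_ID)
    model = CLIPSegForImageSegmentation.from_pretrained(MODEL_ID)
    inputs = processor(
        images=[query_image], visual_prompt=[prompt_image], return_tensors="pt"
    )
    with torch.no_grad():
        return model(**inputs).logits
```

Text and visual prompts are alternatives in this sketch: each call passes one or the other to the processor, not both.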
We're benchmarking and onboarding CLIPSeg RD64 Refined as a hosted, OpenAI-compatible API.