YOLOS Fashionpedia
valentinafevu/yolos-fashionpedia
published Nov 2022 · updated Feb 2026
YOLOS Fashionpedia is an object detection model, fine-tuned from YOLOS, that recognizes 46 fashion-related categories, including apparel, accessories, and garment parts.
specs
| Task | Object Detection |
| Architecture | YOLOS (You Only Look at One Sequence) |
| License | CC-BY-4.0 (dataset) |
about this model
valentinafevu/yolos-fashionpedia is an object detection model fine-tuned on the Fashionpedia dataset to detect fashion items and accessories. It builds on YOLOS (You Only Look at One Sequence), a Vision Transformer-based detector that reported competitive performance on COCO object detection.
The model was trained on Fashionpedia, a dataset of 46,781 images and 342,182 bounding boxes covering 46 categories, licensed under CC-BY-4.0. The dataset is split into roughly 45,600 training images and 1,160 validation images. Its ontology comprises 27 main apparel categories (e.g., dress, jacket, pants), 19 apparel parts (e.g., collar, sleeve, pocket), and 294 fine-grained attributes. Research presented at ECCV 2020 showed that instance segmentation models pre-trained on Fashionpedia transfer better to other fashion datasets than models pre-trained on ImageNet.
The model detects the following 46 categories: shirt/blouse, top/t-shirt/sweatshirt, sweater, cardigan, jacket, vest, pants, shorts, skirt, coat, dress, jumpsuit, cape, glasses, hat, headband/head covering/hair accessory, tie, glove, watch, belt, leg warmer, tights/stockings, sock, shoe, bag/wallet, scarf, umbrella, hood, collar, lapel, epaulette, sleeve, pocket, neckline, buckle, zipper, applique, bead, bow, flower, fringe, ribbon, rivet, ruffle, sequin, and tassel.
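The category list can be turned into a simple label lookup. The sketch below is illustrative only: it assumes 0-based ids assigned in the order the names appear here, and it assumes the first 27 entries are the main apparel categories and the last 19 are apparel parts, in line with the 27 + 19 split of the ontology. The names `CATEGORIES`, `NUM_MAIN`, `ID2LABEL`, `LABEL2ID`, and `is_part` are hypothetical; the label mapping shipped with the model itself is the authoritative source.

```python
# Category names copied from this page, in the order listed.
CATEGORIES = [
    "shirt/blouse", "top/t-shirt/sweatshirt", "sweater", "cardigan", "jacket",
    "vest", "pants", "shorts", "skirt", "coat", "dress", "jumpsuit", "cape",
    "glasses", "hat", "headband/head covering/hair accessory", "tie", "glove",
    "watch", "belt", "leg warmer", "tights/stockings", "sock", "shoe",
    "bag/wallet", "scarf", "umbrella",
    # Assumption: everything from "hood" onward is an apparel part.
    "hood", "collar", "lapel", "epaulette", "sleeve", "pocket", "neckline",
    "buckle", "zipper", "applique", "bead", "bow", "flower", "fringe",
    "ribbon", "rivet", "ruffle", "sequin", "tassel",
]

# Assumption: 27 main apparel categories come first, then 19 apparel parts.
NUM_MAIN = 27

# Assumption: ids are 0-based and follow list order.
ID2LABEL = dict(enumerate(CATEGORIES))
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def is_part(label_id: int) -> bool:
    """Return True if the id falls in the assumed apparel-part range."""
    return label_id >= NUM_MAIN


if __name__ == "__main__":
    parts = [ID2LABEL[i] for i in ID2LABEL if is_part(i)]
    print(len(CATEGORIES), "categories,", len(parts), "of them parts")
```

A lookup like this makes it easy to separate whole garments and accessories from garment parts when post-processing detections.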

best for
- Detecting and localizing fashion items like shirts, pants, shoes, and accessories in images
- Identifying garment parts such as collars, sleeves, pockets, and zippers for detailed fashion analysis
FAQ
The model detects 46 categories including apparel (e.g., shirt, pants, dress), accessories (e.g., glasses, watch, bag), and garment parts (e.g., collar, sleeve, pocket).
The model was fine-tuned on Fashionpedia, which contains 46,781 images and 342,182 bounding boxes across 46 categories.
Once the model is hosted, send an image URL or a base64-encoded image to the gigarouter OpenAI-compatible endpoint, authenticating with your API key.
Input is an image; output is a list of detected objects with bounding boxes, class labels, and confidence scores.
The dataset is licensed under CC-BY-4.0; the model card does not specify a separate model license.
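The output described above (bounding boxes, class labels, confidence scores) usually needs light post-processing before use. The sketch below is a minimal illustration, not the hosted API's response format, which this page does not specify. It assumes raw boxes arrive in the normalized (center_x, center_y, width, height) layout that DETR-style detectors such as YOLOS commonly emit, and the names `Detection`, `cxcywh_to_xyxy`, and `filter_detections` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One detected object (hypothetical container, not an API type)."""
    label: str
    score: float
    box: tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) in pixels


def cxcywh_to_xyxy(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to pixel corner coordinates.

    Assumption: inputs are fractions of the image size in [0, 1].
    """
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def filter_detections(detections, threshold=0.5):
    """Keep detections scoring at least `threshold`, highest score first."""
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.score, reverse=True)


if __name__ == "__main__":
    # Made-up example values for a 200x100 image.
    raw = [
        ("collar", 0.35, (0.5, 0.2, 0.2, 0.1)),
        ("dress", 0.92, (0.5, 0.5, 0.5, 0.5)),
    ]
    dets = [Detection(lbl, s, cxcywh_to_xyxy(b, 200, 100)) for lbl, s, b in raw]
    for det in filter_detections(dets, threshold=0.5):
        print(det.label, det.score, det.box)
```

The threshold of 0.5 is an arbitrary starting point; small garment parts such as zippers or buckles may need a lower value, which is worth tuning on your own images.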
We're benchmarking and onboarding YOLOS Fashionpedia as a hosted, OpenAI-compatible API.