YOLOS Fashionpedia

valentinafevu/yolos-fashionpedia

published Nov 2022 · updated Feb 2026

YOLOS Fashionpedia is an object detection model fine-tuned from YOLOS that detects 46 fashion-related categories, including apparel, accessories, and garment parts.

status

coming soon

API providers

downloads / mo

21.4K

license

mit

specs

Task	Object Detection
Architecture	YOLOS (You Only Look at One Sequence)
Dataset license	CC-BY-4.0

about this model

valentinafevu/yolos-fashionpedia is an object detection model fine-tuned on the Fashionpedia dataset to detect fashion items and accessories. It builds on YOLOS (You Only Look at One Sequence), a transformer-based detector that showed a Vision Transformer with minimal modifications can reach competitive performance on COCO object detection.

The model was trained on Fashionpedia, a dataset of 46,781 images and 342,182 bounding boxes covering 46 categories. The dataset is split into roughly 45,600 training images and 1,160 validation images and is licensed under CC-BY-4.0. Its ontology comprises 27 main apparel categories (e.g., dress, jacket, pants), 19 apparel parts (e.g., collar, sleeve, pocket), and 294 fine-grained attributes. Research presented at ECCV 2020 showed that instance segmentation models pre-trained on Fashionpedia transfer better to other fashion datasets than models pre-trained on ImageNet.
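As a quick sanity check on the figures quoted above, the short sketch below recomputes the category total and the average number of bounding boxes per image. The constant names are illustrative only; the numbers are the ones stated in the text.

```python
# Figures quoted in the text above; the variable names are illustrative.
NUM_IMAGES = 46_781
NUM_BOXES = 342_182
NUM_MAIN_CATEGORIES = 27
NUM_PART_CATEGORIES = 19

# The two ontology groups together should give the 46 detection categories.
num_categories = NUM_MAIN_CATEGORIES + NUM_PART_CATEGORIES

# Average annotation density across the dataset.
boxes_per_image = NUM_BOXES / NUM_IMAGES

print(f"{num_categories} categories, {boxes_per_image:.2f} boxes per image")
```

On average, then, each image carries a little over seven annotated boxes, which is consistent with a dataset that labels garment parts as well as whole garments.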

The model detects the following 46 categories: shirt/blouse, top/t-shirt/sweatshirt, sweater, cardigan, jacket, vest, pants, shorts, skirt, coat, dress, jumpsuit, cape, glasses, hat, headband/head covering/hair accessory, tie, glove, watch, belt, leg warmer, tights/stockings, sock, shoe, bag/wallet, scarf, umbrella, hood, collar, lapel, epaulette, sleeve, pocket, neckline, buckle, zipper, applique, bead, bow, flower, fringe, ribbon, rivet, ruffle, sequin, and tassel.
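The category list above can be turned into a simple lookup table. The sketch below is a minimal illustration and rests on two assumptions: that the order in which the categories are listed matches the model's label ids, and that the first 27 entries are the main categories while the remaining 19 are garment parts. Both should be checked against the model's own configuration before use. The names `CATEGORIES`, `ID2LABEL`, and `is_garment_part` are hypothetical helpers, not part of the model.

```python
# Category names in the order the text lists them.
# Assumption: this order mirrors the model's label ids; verify before relying on it.
CATEGORIES = [
    "shirt/blouse", "top/t-shirt/sweatshirt", "sweater", "cardigan", "jacket",
    "vest", "pants", "shorts", "skirt", "coat", "dress", "jumpsuit", "cape",
    "glasses", "hat", "headband/head covering/hair accessory", "tie", "glove",
    "watch", "belt", "leg warmer", "tights/stockings", "sock", "shoe",
    "bag/wallet", "scarf", "umbrella",
    # Assumption: the 19 garment parts start here.
    "hood", "collar", "lapel", "epaulette", "sleeve", "pocket", "neckline",
    "buckle", "zipper", "applique", "bead", "bow", "flower", "fringe",
    "ribbon", "rivet", "ruffle", "sequin", "tassel",
]

NUM_MAIN = 27  # main apparel categories, per the dataset description

ID2LABEL = dict(enumerate(CATEGORIES))
MAIN_CATEGORIES = CATEGORIES[:NUM_MAIN]
GARMENT_PARTS = CATEGORIES[NUM_MAIN:]


def is_garment_part(label: str) -> bool:
    """Return True if the label belongs to the assumed garment-part group."""
    return label in GARMENT_PARTS
```

A split like this makes it easy to keep only whole garments, or only parts such as collars and pockets, when post-processing detections.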

Example detection output from yolos-fashionpedia on a fashion image

best for

· Detecting and localizing fashion items like shirts, pants, shoes, and accessories in images
· Identifying garment parts such as collars, sleeves, pockets, and zippers for detailed fashion analysis

FAQ

What categories does YOLOS Fashionpedia detect?

It detects 46 categories including apparel (e.g., shirt, pants, dress), accessories (e.g., glasses, watch, bag), and garment parts (e.g., collar, sleeve, pocket).

What dataset was this model fine-tuned on?

It was fine-tuned on Fashionpedia, which contains 46,781 images and 342,182 bounding boxes across 46 categories.

How can I call this model via the gigarouter API?

The model is not yet live. Once it is available, use the gigarouter OpenAI-compatible endpoint with your API key and send an image URL or a base64-encoded image for detection.
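For the base64 option, a local image has to be encoded before it is sent. The sketch below shows one common way to do that using only the Python standard library. Whether the endpoint expects a data URL, a bare base64 string, or another wrapper is not stated here, so the data URL form is an assumption, and `to_data_url` is a hypothetical helper name.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL.

    Assumption: the API accepts data URLs; check the provider's
    documentation for the exact format it requires.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage: read a file from disk and encode it (the path is illustrative).
# with open("outfit.jpg", "rb") as f:
#     payload_image = to_data_url(f.read())
```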

What is the input and output format?

Input is an image; output is a list of detected objects with bounding boxes, class labels, and confidence scores.
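A typical way to consume such output is to drop low-confidence detections and convert boxes to pixel coordinates. The sketch below assumes each detection is a dict with `label`, `score`, and `box` keys, and that boxes arrive as normalized (center x, center y, width, height), which is how YOLOS-style models emit raw predictions. The hosted API may return a different shape, so treat the field names and box format as assumptions.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box, image_width: int, image_height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def filter_detections(detections: List[Dict], min_score: float = 0.5) -> List[Dict]:
    """Keep detections at or above min_score, highest score first."""
    kept = [d for d in detections if d["score"] >= min_score]
    return sorted(kept, key=lambda d: d["score"], reverse=True)


# Usage with made-up detections (the values are illustrative, not model output).
sample = [
    {"label": "sleeve", "score": 0.6, "box": (0.5, 0.5, 0.25, 0.5)},
    {"label": "dress", "score": 0.9, "box": (0.5, 0.5, 0.5, 0.75)},
    {"label": "bead", "score": 0.3, "box": (0.25, 0.25, 0.125, 0.125)},
]
confident = filter_detections(sample, min_score=0.5)
pixel_boxes = [cxcywh_to_xyxy(d["box"], 800, 600) for d in confident]
```

The 0.5 threshold is only a starting point; small garment parts often score lower than whole garments, so the cutoff may need tuning per category.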

Is the model license open?

This page lists the model license as MIT; the Fashionpedia dataset is licensed under CC-BY-4.0.

not yet live

We're benchmarking and onboarding YOLOS Fashionpedia as a hosted, OpenAI-compatible API.

related object detection models

table-transformer-structure-recognition

1.8M dl/mo

table-transformer-detection

1.5M dl/mo

yolos-small

713.6K dl/mo

PP-DocLayoutV3_safetensors

341.1K dl/mo

rtdetr_v2_r50vd

309.8K dl/mo

rtdetr_r50vd_coco_o365

254.5K dl/mo