PickScore v1

yuvalkirstain/PickScore_v1

published Apr 2023 · updated May 2023

PickScore v1 is a zero-shot image scoring model that predicts human preferences for images generated from text prompts.

est. price

~$0.235

/ 1k images · estimated, set at launch

downloads / mo

3.2M

specs

Task	Zero-shot image scoring for human preference prediction
Architecture	CLIP-H (ViT-H-14) fine-tuned on Pick-a-Pic dataset
Training Data	Pick-a-Pic v1 dataset
Paper	Pick-a-Pic (arXiv:2305.01569)

about this model

PickScore v1 is a zero-shot scoring model for text-to-image generation: given a text prompt and a generated image, it outputs a score intended to reflect how strongly people would prefer that image for that prompt. It was fine-tuned from CLIP-H (ViT-H-14) on Pick-a-Pic, a large, open dataset of real user preferences over text-to-image generations. It can serve as a general scoring function for human preference prediction, model evaluation, and image ranking.

Performance

The Pick-a-Pic paper reports that PickScore achieves superhuman performance at predicting which of two generated images a person prefers, and that it agrees with human rankings better than other automatic evaluation metrics. The authors present this as evidence that it can be used to assess text-to-image models without human raters, recommend it for evaluating future text-to-image models, and show that ranking generations with it can improve existing models.
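Agreement with human choices is usually measured on held-out preference pairs. The sketch below computes one simple measure, pairwise accuracy: the fraction of pairs where the higher-scored image is also the one the human picked. It is an illustrative metric, not necessarily the exact protocol used in the paper; the function name and the convention that tied human labels are filtered out beforehand are assumptions.

```python
from typing import Sequence


def pairwise_accuracy(scores_a: Sequence[float],
                      scores_b: Sequence[float],
                      human_prefers_a: Sequence[bool]) -> float:
    """Fraction of image pairs where the scorer picks the same image as the human.

    Pair i compares image A (score scores_a[i]) with image B (score scores_b[i]).
    A score tie counts as the scorer preferring B. Tied human labels are assumed
    to have been removed before calling this (hypothetical convention).
    """
    n = len(scores_a)
    if n == 0 or len(scores_b) != n or len(human_prefers_a) != n:
        raise ValueError("need equal-length, non-empty inputs")
    # (a > b) is the scorer's choice; compare it with the human's choice.
    agreements = sum(
        (a > b) == prefers_a
        for a, b, prefers_a in zip(scores_a, scores_b, human_prefers_a)
    )
    return agreements / n
```

For example, `pairwise_accuracy([21.0, 19.5, 22.3], [20.1, 20.4, 18.0], [True, False, False])` returns 2/3: the scorer agrees with the human on the first two pairs and disagrees on the third.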

How It Works

The model takes a text prompt and one or more images as input. It embeds the prompt with the fine-tuned CLIP-H text encoder and each image with the fine-tuned CLIP-H image encoder, L2-normalizes the embeddings, and scores each image as the dot product between its embedding and the text embedding, multiplied by the learned logit scale. When several images are scored against the same prompt, a softmax over their scores gives relative preference probabilities.
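The scoring step described above can be sketched with plain Python once the embeddings exist. This is a stdlib reimplementation of the arithmetic only, not the model: the function names are hypothetical, the embeddings are toy vectors, and it assumes (as in CLIP-style models) that the learned logit scale is stored as a log value and exponentiated before use.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def preference_scores(text_emb: Sequence[float],
                      image_embs: Sequence[Sequence[float]],
                      log_logit_scale: float) -> List[float]:
    """Score each image against one prompt: exp(log_logit_scale) * cosine(text, image).

    Assumption: the learned scale is stored in log space, as in CLIP-style models.
    """
    text = l2_normalize(text_emb)
    scale = math.exp(log_logit_scale)
    scores = []
    for emb in image_embs:
        if len(emb) != len(text):
            raise ValueError("text and image embeddings must have the same dimension")
        image = l2_normalize(emb)
        # Dot product of unit vectors = cosine similarity.
        scores.append(scale * sum(t * i for t, i in zip(text, image)))
    return scores


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn scores for the same prompt into relative preference probabilities."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With the real model, `text_emb` and `image_embs` would come from PickScore's text and image encoders. The softmax is only meaningful across images scored for the same prompt; raw scores from different prompts are not directly comparable probabilities.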

Training Data

PickScore was trained on the Pick-a-Pic v1 dataset, which contains prompts together with real user preferences over images generated for them, collected through a dedicated web application. The dataset is publicly available.
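Preference data of this kind is typically turned into training targets pairwise. The sketch below assumes each record holds one prompt, two generated images, and a choice that can also be a tie, and uses an objective common for preference models: the KL divergence between the label distribution and a softmax over the two images' scores. The label encoding and function names are hypothetical, and whether this matches PickScore's exact training recipe should be checked against the paper.

```python
import math
from typing import List, Optional, Sequence


def preference_target(preferred: Optional[int]) -> List[float]:
    """Map a pairwise label to a target distribution over (image 0, image 1).

    Hypothetical encoding: 0 or 1 names the preferred image, None marks a tie.
    """
    if preferred is None:
        return [0.5, 0.5]
    if preferred == 0:
        return [1.0, 0.0]
    if preferred == 1:
        return [0.0, 1.0]
    raise ValueError("preferred must be 0, 1, or None")


def preference_loss(scores: Sequence[float], target: Sequence[float]) -> float:
    """KL(target || softmax(scores)) for the two images of one prompt.

    Terms where the target probability is 0 contribute nothing, by convention.
    """
    peak = max(scores)
    # log of the softmax normalizer, computed stably
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return sum(p * (math.log(p) - (s - log_z))
               for p, s in zip(target, scores) if p > 0)
```

A tie pushes the two scores toward each other, while a clear preference pushes the chosen image's score above the other's; the loss is zero only when the softmax matches the label exactly.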

best for

·Predicting human preferences for generated images
·Ranking multiple generated images from a text prompt
·Evaluating text-to-image generation models
·Enhancing text-to-image models via reranking
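Ranking and reranking follow directly from the per-image scores. The sketch below shows best-of-n selection: score every candidate generated for a prompt and keep the highest. `score_fn` is a hypothetical stand-in for a PickScore call; any function returning a higher-is-better float for a (prompt, image) pair fits, and candidates can be any image handle (paths, arrays, and so on).

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Candidate = TypeVar("Candidate")


def rerank(prompt: str,
           candidates: Sequence[Candidate],
           score_fn: Callable[[str, Candidate], float]) -> List[Tuple[Candidate, float]]:
    """Score every candidate for the prompt and sort best-first.

    score_fn is a hypothetical stand-in for a PickScore scoring call.
    """
    scored = [(candidate, score_fn(prompt, candidate)) for candidate in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def best_of_n(prompt: str,
              candidates: Sequence[Candidate],
              score_fn: Callable[[str, Candidate], float]) -> Candidate:
    """Keep only the top-scoring candidate (best-of-n selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return rerank(prompt, candidates, score_fn)[0][0]
```

In practice the candidates would be several images sampled from a text-to-image model for the same prompt, so the extra cost is n generations plus n scoring passes per prompt.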

FAQ

What is PickScore v1 best used for?

It is a scoring function for images generated from text, used for human preference prediction, image ranking, and model evaluation.

How does PickScore v1 compare to other evaluation metrics?

It correlates better with human rankings than other automatic metrics, as shown in the Pick-a-Pic paper.

What are the input and output formats?

Input: a text prompt and one or more images. Output: a score (logit) for each image and, when several images are scored against the same prompt, softmax probabilities across them.

How can I call this model via the API?

PickScore v1 is not yet live. Once onboarded, it will be served through the gigarouter OpenAI-compatible endpoint and called with your API key; refer to the gigarouter documentation for endpoint details.

Under what license is PickScore v1 released?

The model card does not specify a license; check the repository or paper for details.

not yet live

We're benchmarking and onboarding PickScore v1 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related zero-shot image models

clip-vit-base-patch32 · 22.3M dl/mo
clip-vit-large-patch14 · 12.4M dl/mo
CLIP-ViT-B-32-laion2B-s34B-b79K · 4M dl/mo
clip-vit-large-patch14-336 · 3.4M dl/mo
fashion-clip · 2.9M dl/mo
siglip-so400m-patch14-384 · 1.8M dl/mo