OneAlign

q-future/one-align

published Dec 2023 · updated May 2024

OneAlign is a zero-shot-image model that unifies image quality assessment, image aesthetic assessment, and video quality assessment into a single large multi-modality model using discrete text-defined rating levels.

status

coming soon

API providers

downloads / mo

188.9K

license

mit

specs

Task	Image Quality Assessment, Image Aesthetic Assessment, Video Quality Assessment
Architecture	mPLUG-Owl2-based LMM
Parameters	Not specified in card
License	LLaMA-2 license (for commercial use)

about this model

OneAlign is a large multi-modality model for zero-shot visual scoring that unifies image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA) using a discrete-level-based syllabus. Developed by Nanyang Technological University, Shanghai Jiao Tong University, and Sensetime Research, the model is built on the mPLUG-Owl2 architecture and trained on a combination of datasets including KonIQ, SPAQ, KADID, AVA, and LSVQ. Instead of regressing direct scores, it emulates human subjective studies by teaching large multi-modality models (LMMs) with text-defined rating levels, achieving state-of-the-art or competitive performance across all three tasks.

The model scores images and videos on a 1–5 scale. Below is an example input image and the syllabus illustration used during training:

Example image used for a scoring demonstration with the model
Syllabus diagram showing the discrete-level-based training approach that maps text-defined levels to scores

Benchmark Performance

OneAlign achieves leading results on multiple IQA, IAA, and VQA benchmarks. The following table compares its performance against previous state-of-the-art methods across seven IQA datasets, using Spearman/Pearson/Kendall correlations:

Dataset	KonIQ (NR-IQA, seen)	SPAQ (NR-IQA, seen)	KADID (FR-IQA, seen)	LIVE-C (NR-IQA, unseen)	LIVE (FR-IQA, unseen)	CSIQ (FR-IQA, unseen)	AGIQA (AIGC, unseen)
Previous SOTA	0.916/0.928 (MUSIQ)	0.922/0.919 (LIQE)	0.934/0.937 (CONTRIQUE)	NA	NA	NA	NA
OneAlign	0.941/0.950/0.791	0.932/0.935/0.766	0.941/0.942/0.791	0.881/0.894/0.699	0.887/0.856/0.699	0.881/0.906/0.699	0.801/0.838/0.602

On the AVA_test aesthetic benchmark, OneAlign achieves a Spearman correlation of 0.823 and Pearson of 0.819, surpassing prior methods. For video quality assessment, it sets new state-of-the-art on LSVQ_test (0.886/0.886), LSVQ_1080p (0.803/0.837), KoNViD-1k (0.876/0.888), and MaxWell_test (0.781/0.786).

Users must comply with the LLaMA-2 license when using this model commercially. The model is hosted as a managed API on gigarouter, requiring no local installation.

best for

·Scoring image quality on a 1-5 scale
·Assessing image aesthetics
·Evaluating video quality

FAQ

What tasks does OneAlign support?

It supports image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA) in a single model.

How do I call OneAlign via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, passing an image or video input and specifying the task (quality or aesthetics).

What is the output format?

The model outputs a score in the range [1, 5], where higher is better.

What license applies to OneAlign?

You must comply with LLaMA-2 licenses if using the checkpoints commercially.

How does OneAlign compare to Q-Align?

OneAlign unifies IQA, IAA, and VQA into one model, achieving state-of-the-art results across multiple benchmarks.

not yet live

We're benchmarking and onboarding OneAlign as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related zero-shot image models

compare all →

clip-vit-base-patch32

22.3M dl/mo

clip-vit-large-patch14

12.4M dl/mo

CLIP-ViT-B-32-laion2B-s34B-b79K

4M dl/mo

clip-vit-large-patch14-336