skip to content
gigarouter gigarouter
models / zero-shot image · coming soon

OneAlign

q-future/one-align

published Dec 2023 · updated May 2024

OneAlign is a zero-shot-image model that unifies image quality assessment, image aesthetic assessment, and video quality assessment into a single large multi-modality model using discrete text-defined rating levels.

status
coming soon
API providers
0
downloads / mo
188.9K
license
mit

specs

TaskImage Quality Assessment, Image Aesthetic Assessment, Video Quality Assessment
ArchitecturemPLUG-Owl2-based LMM
ParametersNot specified in card
LicenseLLaMA-2 license (for commercial use)

about this model

OneAlign is a large multi-modality model for zero-shot visual scoring that unifies image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA) using a discrete-level-based syllabus. Developed by Nanyang Technological University, Shanghai Jiao Tong University, and Sensetime Research, the model is built on the mPLUG-Owl2 architecture and trained on a combination of datasets including KonIQ, SPAQ, KADID, AVA, and LSVQ. Instead of regressing direct scores, it emulates human subjective studies by teaching large multi-modality models (LMMs) with text-defined rating levels, achieving state-of-the-art or competitive performance across all three tasks.

The model scores images and videos on a 1–5 scale. Below is an example input image and the syllabus illustration used during training:

Example image used for a scoring demonstration with the model
Syllabus diagram showing the discrete-level-based training approach that maps text-defined levels to scores

Benchmark Performance

OneAlign achieves leading results on multiple IQA, IAA, and VQA benchmarks. The following table compares its performance against previous state-of-the-art methods across seven IQA datasets, using Spearman/Pearson/Kendall correlations:

DatasetKonIQ (NR-IQA, seen)SPAQ (NR-IQA, seen)KADID (FR-IQA, seen)LIVE-C (NR-IQA, unseen)LIVE (FR-IQA, unseen)CSIQ (FR-IQA, unseen)AGIQA (AIGC, unseen)
Previous SOTA0.916/0.928 (MUSIQ)0.922/0.919 (LIQE)0.934/0.937 (CONTRIQUE)NANANANA
OneAlign0.941/0.950/0.7910.932/0.935/0.7660.941/0.942/0.7910.881/0.894/0.6990.887/0.856/0.6990.881/0.906/0.6990.801/0.838/0.602

On the AVA_test aesthetic benchmark, OneAlign achieves a Spearman correlation of 0.823 and Pearson of 0.819, surpassing prior methods. For video quality assessment, it sets new state-of-the-art on LSVQ_test (0.886/0.886), LSVQ_1080p (0.803/0.837), KoNViD-1k (0.876/0.888), and MaxWell_test (0.781/0.786).

Users must comply with the LLaMA-2 license when using this model commercially. The model is hosted as a managed API on gigarouter, requiring no local installation.

best for

FAQ

What tasks does OneAlign support?

It supports image quality assessment (IQA), image aesthetic assessment (IAA), and video quality assessment (VQA) in a single model.

How do I call OneAlign via the gigarouter API?

Use the gigarouter OpenAI-compatible endpoint with your API key, passing an image or video input and specifying the task (quality or aesthetics).

What is the output format?

The model outputs a score in the range [1, 5], where higher is better.

What license applies to OneAlign?

You must comply with LLaMA-2 licenses if using the checkpoints commercially.

How does OneAlign compare to Q-Align?

OneAlign unifies IQA, IAA, and VQA into one model, achieving state-of-the-art results across multiple benchmarks.

not yet live

We're benchmarking and onboarding OneAlign as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related zero-shot image models

compare all →