
YOLOE

jameslahm/yoloe

published Mar 2025 · updated Mar 2025

YOLOE is a real-time open-vocabulary object detection and segmentation model that supports text prompts, visual prompts, and a prompt-free mode.

status: coming soon
API providers: 0
downloads / mo: 4.3K
license: agpl-3.0

specs

Task: Object Detection and Segmentation
Architecture: YOLOv8 / YOLO11 with RepRTA, SAVPE, LRPC
Parameters: 10M to 50M (depending on variant)
License: AGPL-3.0

about this model

jameslahm/yoloe is a real-time object detection and segmentation model that supports text-prompted, visual-prompted, and prompt-free detection within a single unified architecture. It is being onboarded to Gigarouter as a hosted, OpenAI-compatible API.

Key capabilities

  • RepRTA (Re-parameterizable Region-Text Alignment) for text prompts: refines textual embeddings with a lightweight auxiliary network that can be re-parameterized away, so it adds no inference or transfer overhead.
  • SAVPE (Semantic-Activated Visual Prompt Encoder) for visual prompts: decouples the semantic and activation branches to improve visual embedding accuracy with minimal added complexity.
  • LRPC (Lazy Region-Prompt Contrast) for prompt-free detection: uses a built-in large vocabulary and a specialized embedding to identify all objects without depending on a costly language model.
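
The re-parameterization idea behind RepRTA can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not YOLOE's actual code: the auxiliary network is reduced to a hypothetical residual linear map, the classification head to a plain matrix, and all names (`refine`, `head_k`, `aux_w`) are invented for illustration. It shows only that refined prompt embeddings can be folded into the head offline, so inference runs a single projection.

```python
# Illustrative sketch only: names, shapes, and the linear auxiliary
# network are assumptions, not YOLOE's implementation.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]

def refine(text_emb, aux_w):
    """Hypothetical auxiliary network: residual linear map, P' = P + P @ W."""
    delta = matmul(text_emb, aux_w)
    return [[p + d for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(text_emb, delta)]

# Two region features (dim 3), a head projection K (3 -> 2),
# and two text prompt embeddings (dim 2).
regions = [[1.0, 0.0, 2.0], [0.5, 1.0, -1.0]]
head_k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_emb = [[1.0, 2.0], [0.0, -1.0]]
aux_w = [[0.5, 0.0], [0.0, 0.5]]

# Two-step path: project regions, refine prompts, then contrast them.
refined = refine(text_emb, aux_w)
logits_two_step = matmul(matmul(regions, head_k), transpose(refined))

# Folded path: merge the refined prompts into the head once, offline.
folded_k = matmul(head_k, transpose(refined))  # shape: 3 x num_prompts
logits_folded = matmul(regions, folded_k)

print(logits_folded == logits_two_step)
```

Because the folded matrix has the same form as an ordinary classification head, the auxiliary network and the prompt embeddings do not need to be evaluated at inference time in this toy setting.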

Benchmark highlights

On the LVIS minival set (zero-shot detection):

Model        Input size   AP (text)   AP (visual)
YOLOE-v8-S   640          27.9        26.2
YOLOE-v8-M   640          32.6        31.0
YOLOE-v8-L   640          35.9        34.2

YOLOE-v8-S surpasses YOLO-Worldv2-S by 3.5 AP on LVIS, with 3× less training cost and a 1.4× inference speedup (305.8 FPS on a T4 with TensorRT). On COCO downstream transfer (full tuning, 80 epochs), YOLOE-v8-L achieves 53.0 box AP and 42.7 mask AP, outperforming closed-set YOLOv8-L by +0.6 box AP and +0.4 mask AP with nearly 4× less training time. Prompt-free evaluation on LVIS yields up to 27.2 AP (YOLOE-v8-L) at 25.3 FPS (T4, PyTorch).

Figure: comparison of performance, training cost, and inference efficiency between YOLOE and YOLO-Worldv2 with open text prompts.
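
The quoted margins imply baseline scores that are not stated directly. The following back-of-envelope sketch derives them from the figures above; it assumes the 3.5 AP margin is measured against the text-prompt AP in the table, and the variable names are illustrative only. The results are implied values, not independently measured ones.

```python
# Back-of-envelope arithmetic using only figures quoted on this page.
# Assumption: the 3.5 AP margin refers to the text-prompt AP column.
yoloe_s_text_ap = 27.9           # YOLOE-v8-S, LVIS minival, text prompt
gain_over_yolo_world_s = 3.5     # stated margin over YOLO-Worldv2-S

yoloe_l_coco_ap = (53.0, 42.7)   # the two COCO AP figures for YOLOE-v8-L
gain_over_yolov8_l = (0.6, 0.4)  # stated margins over closed-set YOLOv8-L

# Subtract each stated gain to recover the implied baseline score.
implied_yolo_world_s_ap = round(yoloe_s_text_ap - gain_over_yolo_world_s, 1)
implied_yolov8_l_ap = tuple(
    round(ap - gain, 1) for ap, gain in zip(yoloe_l_coco_ap, gain_over_yolov8_l)
)

print(implied_yolo_world_s_ap)
print(implied_yolov8_l_ap)
```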

The model is licensed under AGPL-3.0. The underlying research has been accepted at ICCV 2025. Once the hosted API is live, no installation or local setup will be required: the model will be accessed through Gigarouter's API endpoint.

FAQ

What prompt types does YOLOE support?

YOLOE supports text prompts, visual prompts (e.g., an image patch), and a prompt-free mode where it detects all objects using a built-in vocabulary.

How does YOLOE compare to YOLO-World in terms of speed and accuracy?

YOLOE achieves higher accuracy with lower training cost and faster inference. For example, YOLOE-v8-S surpasses YOLO-Worldv2-S by 3.5 AP on LVIS with 3× less training cost and a 1.4× inference speedup.

What is the input format for the hosted API on Gigarouter?

Input is an image file or URL along with optional text or visual prompts. The API endpoint is OpenAI-compatible; use an API key for authentication.
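
Since the API is not yet live, its request schema is not published. The sketch below only illustrates the shape described above (an image file or URL, optional text prompts, an API key). Every field name (`image_url`, `image_b64`, `text_prompts`) and both helper functions are hypothetical; the `Authorization: Bearer` header follows the usual OpenAI-style convention. Visual prompts are omitted because their format is not described. The code builds a request body and headers without sending anything.

```python
import base64
import json

def build_detection_request(image_bytes=None, image_url=None, text_prompts=None):
    """Assemble a JSON-serializable request body (hypothetical schema)."""
    # Exactly one image source must be given: raw bytes or a URL.
    if (image_bytes is None) == (image_url is None):
        raise ValueError("provide exactly one of image_bytes or image_url")
    body = {"model": "jameslahm/yoloe"}
    if image_url is not None:
        body["image_url"] = image_url  # hypothetical field name
    else:
        # Hypothetical field name; binary data is base64-encoded for JSON.
        body["image_b64"] = base64.b64encode(image_bytes).decode("ascii")
    if text_prompts:
        body["text_prompts"] = list(text_prompts)  # hypothetical field name
    return body

def build_headers(api_key):
    """OpenAI-style bearer authentication headers."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

body = build_detection_request(
    image_url="https://example.com/street.jpg",
    text_prompts=["person", "bus"],
)
payload = json.dumps(body)
headers = build_headers("YOUR_API_KEY")
```

Leaving `text_prompts` out in this sketch corresponds to the prompt-free mode, where the model relies on its built-in vocabulary.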

What license governs use of YOLOE?

The model is released under the AGPL-3.0 license.

Can YOLOE be used for both detection and segmentation?

Yes, YOLOE jointly outputs bounding boxes and instance masks, supporting both tasks in a single model.

not yet live

We're benchmarking and onboarding YOLOE as a hosted, OpenAI-compatible API.
