gigarouter

E5 Omni 7B

Haon-Chen/e5-omni-7B

published Jan 2026 · updated Apr 2026

E5 Omni 7B is a visual-document-retrieval model that produces unified embeddings for text, images, audio, and video, enabling cross-modal retrieval.

status: coming soon
API providers: 0
downloads / mo: 61.9K
license: MIT

specs

Task: Visual Document Retrieval
Architecture: Qwen2.5-Omni-7B
Parameters: 7B

about this model

e5-omni-7B is an omni-modal embedding model built on Qwen2.5-Omni-7B that produces a unified embedding space for text, images, audio, and video, enabling cross-modal retrieval with a single model.
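To make "cross-modal retrieval with a single model" concrete, here is a minimal sketch of ranking candidates by cosine similarity in a shared embedding space. The vectors are toy stand-ins, not real e5-omni-7B outputs, and the helper names `l2_normalize` and `rank_by_cosine` are hypothetical; the point is only that once every modality maps into one space, a text query can be scored against image, audio, or video candidates with the same dot product.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def rank_by_cosine(query: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted from most to least similar to the query."""
    q = l2_normalize(query)
    c = l2_normalize(candidates)
    scores = c @ q  # cosine similarity, since both sides are unit length
    return np.argsort(-scores, kind="stable")


# Toy 3-d vectors standing in for embeddings (assumption: real ones are much larger).
text_query = np.array([1.0, 0.0, 0.0])
candidates = np.array([
    [0.0, 1.0, 0.0],  # stand-in for an image embedding
    [0.9, 0.1, 0.0],  # stand-in for an audio embedding
    [0.5, 0.5, 0.0],  # stand-in for a video embedding
])

order = rank_by_cosine(text_query, candidates)
print(order)
```

Normalizing first keeps the score independent of vector length, which is the usual choice for embedding retrieval.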

Key Strengths

Unlike models that rely on implicit alignment from vision-language backbones, e5-omni applies three explicit alignment techniques:

  • modality-aware temperature calibration to align similarity scales across modalities
  • a controllable negative curriculum with debiasing that focuses on confusing negatives while reducing false negative impact
  • batch whitening with covariance regularization to match cross-modal geometry in the shared embedding space

These components address common issues in omni-modal embeddings: inconsistent score sharpness, imbalanced in-batch negative hardness, and mismatched first- and second-order statistics across modalities.
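Two of the ideas above can be illustrated with generic building blocks. The sketch below is not the paper's formulation: it shows a standard InfoNCE loss where the temperature is looked up per modality (addressing score sharpness), and plain ZCA whitening of a batch (addressing first- and second-order statistics). The temperature values, the `TEMPERATURES` table, and the function names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical per-modality temperatures; the actual values are not given in the source.
TEMPERATURES = {"image": 0.05, "audio": 0.10}


def info_nce(queries: np.ndarray, targets: np.ndarray, temperature: float) -> float:
    """Mean InfoNCE loss where row i of `queries` matches row i of `targets`."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    t = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = (q @ t.T) / temperature  # lower temperature gives sharper scores
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def whiten(batch: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """ZCA-whiten a batch: zero mean and approximately identity covariance."""
    centered = batch - batch.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (batch.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    transform = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return centered @ transform


# Toy usage: the same aligned pairs scored under two modality temperatures.
pairs = np.eye(3)
loss_image = info_nce(pairs, pairs, TEMPERATURES["image"])
loss_audio = info_nce(pairs, pairs, TEMPERATURES["audio"])
```

The same similarity matrix yields different losses under different temperatures, which is why a single shared temperature can leave similarity scales inconsistent across modalities. Whitening each modality's batch gives them comparable mean and covariance before they are compared.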

Benchmark Performance

The model achieves strong results on the MMEB-V2 and AudioCaps benchmarks.

[Figure: Performance comparison on the MMEB-V2 benchmark (radar chart of per-modality scores)]
[Figure: Performance comparison on the AudioCaps benchmark (bar chart of recall metrics)]

Full experimental results and comparisons to bi-modal and omni-modal baselines are documented in the associated paper (arXiv:2601.03666). The explicit alignment recipe also transfers to other VLM backbones.

FAQ

What modalities does E5 Omni 7B support?

It supports text, image, audio, and video.

How can I use this model via the gigarouter API?

E5 Omni 7B is not yet live on gigarouter. Once it is, use the gigarouter OpenAI-compatible endpoint with an API key; refer to the gigarouter documentation for details.
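As a rough sketch of what an OpenAI-compatible embeddings call generally looks like, the snippet below builds (but does not send) a POST request to an `/embeddings` route using only the Python standard library. The base URL is a placeholder, and using `Haon-Chen/e5-omni-7B` as the API model ID is an assumption; the real values, and how non-text inputs are passed, should come from the gigarouter documentation.

```python
import json
import urllib.request

# Placeholder base URL (assumption): replace with the endpoint from the provider docs.
BASE_URL = "https://api.example.invalid/v1"
# Assumption: the hosted model ID matches the repository name shown on this page.
MODEL_ID = "Haon-Chen/e5-omni-7B"


def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build an OpenAI-style embeddings request without sending it."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(["a dog barking"], api_key="YOUR_API_KEY")
# Sending would be: urllib.request.urlopen(request), once a real endpoint exists.
```

Building the request separately from sending it makes it easy to inspect the payload before any credentials leave the machine.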

What license is this model released under?

The model is released under the MIT license.

not yet live

We're benchmarking and onboarding E5 Omni 7B as a hosted, OpenAI-compatible API.
