OmniEmbed v0.1

Tevatron/OmniEmbed-v0.1

published Apr 2025 · updated Apr 2026

OmniEmbed v0.1 is a multi-modal embedding model that unifies text, image, video, and audio retrieval, built on Qwen2.5-Omni-7B.

status

coming soon

downloads / mo

381

license

mit

specs

Task	Visual-Document-Retrieval (Multi-modal Retrieval)
Architecture	Qwen2.5-Omni-7B
Parameters	7B
License	Apache-2.0

about this model

OmniEmbed is a multimodal embedding model for visual-document-retrieval, built on Qwen2.5-Omni-7B via the Tevatron toolkit. It generates unified embeddings across multilingual text, images, audio, and video, enabling cross-modal retrieval. The model is described as the first embedding model to unify all four modalities and has been accepted at SIGIR 2025 (Demo track).
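
To illustrate what cross-modal retrieval over unified embeddings means in practice, here is a minimal sketch that ranks items of different modalities against a text query. It assumes cosine similarity as the scoring function. The file names and the 3-dimensional vectors are made up for illustration and are not OmniEmbed outputs; real embeddings would come from the model and have far more dimensions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, corpus):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (hypothetical values).
corpus = {
    "slide.png": [0.9, 0.1, 0.0],    # image document
    "lecture.mp4": [0.2, 0.8, 0.1],  # video
    "podcast.wav": [0.0, 0.3, 0.9],  # audio
}
query = [0.1, 0.9, 0.2]  # hypothetical embedding of a text query

ranking = rank(query, corpus)
print([doc_id for doc_id, _ in ranking])
```

Because every modality lands in the same vector space, a single similarity function and a single index can serve all of them.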

On text, multilingual, and image-document benchmarks, OmniEmbed scores within one point of baselines optimized for those individual tasks, and it leads the listed baselines on video and audio retrieval:

Benchmark	Task	Metric	OmniEmbed	Baseline
BEIR-13	Text Retrieval	nDCG@10	58.2	MistralE5 (59.0)
MIRACL	Multilingual Retrieval	nDCG@10	69.1	BGE‑M3 (69.2)
VIDORE	Image Document Retrieval	nDCG@5	85.8	DSE‑Qwen2 (85.8)
MSRVTT	Video Retrieval	R@1	51.3	CLIP (31.2)
AudioCaps	Audio Retrieval	R@1	34.0	23.1
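
For readers unfamiliar with the metrics in the table, the sketch below shows one common way to compute R@k and nDCG@k for a single query. It assumes the hit-rate form of R@k (which equals recall when each query has one relevant item) and linear gain in DCG; the evaluation scripts behind the reported numbers may differ in detail. Table scores are averages over all queries, scaled to 0-100.

```python
import math

def recall_at_k(ranked_ids, relevant_ids, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant_ids for doc in ranked_ids[:k]) else 0.0

def dcg_at_k(gains, k):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_ids, relevance, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    gains = [relevance.get(doc, 0) for doc in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical query: two relevant documents, retrieved at ranks 2 and 3.
ranked = ["d3", "d1", "d2"]
relevance = {"d1": 1, "d2": 1}

print(recall_at_k(ranked, set(relevance), k=1))
print(round(ndcg_at_k(ranked, relevance, k=10), 4))
```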

The underlying model is a LoRA fine-tuned version of Qwen2.5-Omni-7B. Training data and code are fully open-source. Once it is live as a hosted API on gigarouter, OmniEmbed will require no local installation: call the OpenAI-compatible endpoint to embed queries and documents of any supported modality.

best for

·Cross-modal search across text, images, audio, and video
·Multilingual text retrieval
·Image document retrieval (charts, PDFs, screenshots)
·Video retrieval for tutorials or instructional content

FAQ

What modalities does OmniEmbed support?

It supports text, image, audio, video, and unified multimodal inputs (e.g., text+video).

How does OmniEmbed compare to CLIP on video retrieval?

It achieves 51.3 R@1 on MSRVTT, about 20 points above CLIP (31.2).

What is the base model and size?

It is built on Qwen2.5-Omni-7B with 7 billion parameters.

What license is OmniEmbed released under?

The Tevatron toolkit is Apache-2.0; OmniEmbed follows the same license.

How can I call it via the gigarouter API?

Once the model is live, send requests to the gigarouter OpenAI-compatible endpoint with your API key, following standard embedding API conventions.
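
As a rough sketch of what "standard embedding API conventions" could look like, the snippet below builds a request in the OpenAI embeddings format using only the Python standard library. The base URL is a placeholder, not a real gigarouter address, and the model ID is assumed to match the identifier shown on this page. Only text input is shown, because the request format for image, audio, and video inputs is not specified here.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid/v1"  # placeholder, NOT a real gigarouter URL
MODEL_ID = "Tevatron/OmniEmbed-v0.1"         # assumed to match the ID on this page

def build_embedding_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def parse_embeddings(response_body):
    """Extract vectors from an OpenAI-style body: {"data": [{"index": i, "embedding": [...]}]}."""
    items = sorted(json.loads(response_body)["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]

request = build_embedding_request(["how to tie a bowline knot"], api_key="YOUR_API_KEY")

# Sending is left out because the endpoint is not live yet. Once it is:
#   with urllib.request.urlopen(request) as resp:
#       vectors = parse_embeddings(resp.read())
```

Any OpenAI-compatible client library should work equally well by pointing its base URL at the provider; the standard-library version is used here only to keep the example dependency-free.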

not yet live

We're benchmarking and onboarding OmniEmbed v0.1 as a hosted, OpenAI-compatible API.
