hosted inference · 0-provider models
The most-downloaded models nobody serves — now an API.
Rerankers and embedders that everyone self-hosts because no provider offers them. Priced in the unit that fits — per document, per token — and OpenAI-compatible, so you swap a URL and go.
30-second start
# rerank documents against a query: no GPU, no self-hosting
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2","query":"capital of France",
       "documents":["Paris is the capital of France.","Bananas are yellow."]}'
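For callers who would rather not shell out to curl, the same request can be sketched in Python with only the standard library. The URL, headers, and JSON fields mirror the curl command. The function names `build_rerank_request` and `rerank` are illustrative, and because the response schema is not described here, the sketch returns the decoded JSON untouched.

```python
import json
import urllib.request

# Endpoint copied from the curl example.
RERANK_URL = "https://gigarouter.ai/v1/rerank"


def build_rerank_request(query, documents, model, api_key):
    """Build (but do not send) the POST request from the curl example."""
    payload = {"model": model, "query": query, "documents": documents}
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def rerank(query, documents, model, api_key, timeout=30):
    """Send the request and return the decoded JSON body as-is.

    Nothing is assumed about the response fields; inspect the
    returned object to see what the service actually sends back.
    """
    request = build_rerank_request(query, documents, model, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `rerank("capital of France", [...], "cross-encoder/ms-marco-MiniLM-L6-v2", os.environ["GR_KEY"])` would send the same body as the curl command. Keeping the request builder separate from the network call makes the payload easy to inspect or unit test offline.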
live catalog
cross-encoder/ms-marco-MiniLM-L6-v2 · reranker · $0.008 / 1k docs
jinaai/jina-reranker-v2-base-multilingual · reranker · $0.008 / 1k docs
Qwen/Qwen3-Reranker-0.6B · reranker · $0.008 / 1k docs
Qwen/Qwen3-Embedding-0.6B · embeddings · $0.008 / 1M tokens
BAAI/bge-reranker-base · reranker · $0.008 / 1k docs
Only seller
These models have zero hosted providers — the ones with millions of downloads and no API. We serve them so you don't run a GPU for a call you make a hundred times a day.
Right unit
Reranking is billed per document, not per token. Embeddings are billed per token. You pay in the unit the task is actually measured in.
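To make the two billing units concrete, here is a rough cost estimate in Python. It is a sketch under assumptions: the prices are the ones listed in the catalog, billing is assumed to scale linearly with no minimums or rounding (the page does not say), and the helper names are made up for illustration.

```python
from decimal import Decimal

# Prices copied from the catalog; confirm against the live catalog before relying on them.
RERANK_USD_PER_1K_DOCS = Decimal("0.008")
EMBED_USD_PER_1M_TOKENS = Decimal("0.008")


def rerank_cost(num_documents):
    """Estimated USD cost of reranking `num_documents` documents (assumes linear billing)."""
    return RERANK_USD_PER_1K_DOCS * num_documents / 1000


def embedding_cost(num_tokens):
    """Estimated USD cost of embedding `num_tokens` tokens (assumes linear billing)."""
    return EMBED_USD_PER_1M_TOKENS * num_tokens / 1_000_000


# Hypothetical workload: 100 rerank calls a day, 50 documents each, for 30 days.
monthly_docs = 100 * 50 * 30
monthly_rerank_usd = rerank_cost(monthly_docs)
```

Under those assumptions the hypothetical workload covers 150,000 documents and comes to $1.20 a month, and embedding 5 million tokens comes to $0.04.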
Drop-in
OpenAI-style endpoints. Point your existing client at the base URL, keep your code.
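As an illustration of the base-URL swap, the sketch below builds the same OpenAI-style embeddings request against two base URLs. Assumptions to note: `https://gigarouter.ai/v1` is inferred from the rerank URL in the quick start, and the `/embeddings` path with a `{"model", "input"}` body follows the OpenAI convention rather than anything this page confirms. `embeddings_request` is a hypothetical helper, and nothing is sent over the network.

```python
import json
import urllib.request

OPENAI_BASE_URL = "https://api.openai.com/v1"
# Assumed base URL, inferred from the /v1/rerank endpoint in the quick start.
GIGAROUTER_BASE_URL = "https://gigarouter.ai/v1"


def embeddings_request(base_url, api_key, model, texts):
    """Build an OpenAI-style POST {base_url}/embeddings request without sending it."""
    body = {"model": model, "input": texts}
    return urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same code path for both providers; only the base URL (and key) change.
texts = ["Paris is the capital of France."]
before = embeddings_request(OPENAI_BASE_URL, "sk-placeholder", "Qwen/Qwen3-Embedding-0.6B", texts)
after = embeddings_request(GIGAROUTER_BASE_URL, "gr-placeholder", "Qwen/Qwen3-Embedding-0.6B", texts)
```

The two requests carry identical bodies and differ only in host, which is the property a drop-in endpoint relies on. An existing OpenAI SDK client would typically get the same effect by setting its base URL option.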