Docs

Base URL: https://gigarouter.ai/v1. Authenticate by sending the header Authorization: Bearer <key> on every request. Get a key (free credit included).

Reranking and embeddings are live below. Grounding, detection, OCR, and the rest of the specialist catalog are being benchmarked and onboarded — see the models page, and tell us what you need at [email protected].

machine-readable spec: /v1/openapi.json (OpenAPI 3.0 — paste it into your tooling or hand it to a coding agent)

Rerank (curl)

# scores documents by relevance to the query; billed per document
curl https://gigarouter.ai/v1/rerank \
  -H "Authorization: Bearer $GR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2",
       "query":"how do I reset my password",
       "documents":["Password reset steps...","Billing FAQ..."]}'

Rerank (Python)

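# plain requests - no SDK needed
import os
import requests

KEY = os.environ["GR_KEY"]  # same variable the curl example uses
query = "how do I reset my password"
docs = ["Password reset steps...", "Billing FAQ..."]

r = requests.post("https://gigarouter.ai/v1/rerank",
  headers={"Authorization": f"Bearer {KEY}"},
  json={"model": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "query": query,
        "documents": docs},
  timeout=30)  # don't hang forever on a stalled connection
r.raise_for_status()  # fail loudly on 4xx/5xx instead of a KeyError below
for hit in r.json()["results"]:
    print(hit["index"], hit["relevance_score"])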

Embeddings (OpenAI client)

# the OpenAI SDK works - just change base_url
import os
from openai import OpenAI

KEY = os.environ["GR_KEY"]  # your gr- key
client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=KEY)
v = client.embeddings.create(
  model="Qwen/Qwen3-Embedding-0.6B", input=["hello world"])
print(v.data[0].embedding[:4])  # first four dimensions of the vector

endpoints

method	path	what
POST	/v1/rerank	score documents against a query, billed per document
POST	/v1/embeddings	embed text (OpenAI-compatible), billed per token (rates quoted per 1M tokens)
GET	/v1/models	list available models with live pricing
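
A quick way to confirm a key works is to fetch the model list. The sketch below uses only the Python standard library; the helper names are illustrative, not part of any SDK. The response schema is not described on this page, so the code returns the parsed JSON as-is rather than assuming field names (check /v1/openapi.json for the actual shape).

```python
import json
import urllib.request

BASE_URL = "https://gigarouter.ai/v1"


def models_request(key: str) -> urllib.request.Request:
    """Build (but do not send) the GET /v1/models request."""
    return urllib.request.Request(
        f"{BASE_URL}/models",
        headers={"Authorization": f"Bearer {key}"},
        method="GET",
    )


def list_models(key: str):
    """Send the request and return the parsed JSON body, whatever its shape."""
    with urllib.request.urlopen(models_request(key), timeout=30) as resp:
        return json.load(resp)
```

Calling list_models(os.environ["GR_KEY"]) and printing the result is enough to see which models and prices the API currently reports.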

headers

header	effect
Authorization: Bearer <key>	required. your API key (it starts with gr-).
X-Tier: batch	bill the discounted flexible tier instead of realtime (see pricing).
Prefer: wait=30	if the model is cold, hold the request for up to the given number of seconds (30 here) while it warms, instead of returning a 503.
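
The headers in the table can be assembled with a small helper. This is a sketch: build_headers and its parameter names are hypothetical, and any upper bound on the wait value is not stated on this page.

```python
from typing import Dict, Optional


def build_headers(key: str, batch: bool = False,
                  wait_seconds: Optional[int] = None) -> Dict[str, str]:
    """Assemble request headers from the table above."""
    headers = {"Authorization": f"Bearer {key}"}
    if batch:
        # Opt in to the discounted flexible tier.
        headers["X-Tier"] = "batch"
    if wait_seconds is not None:
        # Ask the server to hold the request while a cold model warms.
        headers["Prefer"] = f"wait={wait_seconds}"
    return headers
```

Pass the result as headers= to requests.post. When sending Prefer: wait=30, set the client timeout comfortably above 30 seconds so the client does not give up before the server does.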

status codes

code	meaning
200	success. usage + charge are in the response and your dashboard.
401	missing or unknown API key.
402	out of credit. add credit.
404	model not found or not currently served.
429	rate or daily-budget limit reached. back off and retry after the interval given in the Retry-After header.
503	model is warming (cold start ~90s). retry, or send Prefer: wait=30.
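
One way to act on the table above is a small retry wrapper. This is a sketch under assumptions: the function names are hypothetical, the 15-second poll interval for 503 and the 60-second backoff cap are arbitrary choices, and Retry-After is only honored when it is a plain number of seconds. The send argument is any zero-argument callable returning an object with status_code and headers, such as a requests response.

```python
import time
from typing import Any, Callable, Mapping, Optional


def retry_delay(status: int, headers: Mapping[str, str],
                attempt: int) -> Optional[float]:
    """Seconds to wait before retrying, or None if retrying will not help."""
    if status == 429:
        value = headers.get("Retry-After", "").strip()
        if value.isdigit():
            return float(value)  # server told us how long to wait
        return float(min(2 ** attempt, 60))  # assumed fallback: capped backoff
    if status == 503:
        return 15.0  # assumed poll interval while the model warms
    return None  # 200 is success; 401/402/404 need a fix, not a retry


def call_with_retry(send: Callable[[], Any], max_retries: int = 8,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send(), retrying on 429/503 up to max_retries times."""
    resp = send()
    for attempt in range(max_retries):
        delay = retry_delay(resp.status_code, resp.headers, attempt)
        if delay is None:
            return resp
        sleep(delay)
        resp = send()
    return resp  # still failing after all retries; caller inspects the status
```

With requests, send can be a lambda wrapping the requests.post call from the rerank example. Eight retries at 15 seconds covers the quoted ~90s cold start with some margin.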

billing

Prepaid — you're only charged per call; failed requests cost nothing. See your balance and per-call usage on the dashboard.
Units — rerank is billed per document, embeddings per 1M tokens. Each model's rate is on its page.
Top up — self-serve via card on the dashboard (Stripe). Credit never expires.
Cold start — rarely-used models sleep to keep costs down; the first call warms them (~90s). Popular models stay warm.
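
To make the billing units concrete, the arithmetic can be sketched as below. The function names are illustrative, and any rates you plug in are your own inputs: real rates are on each model's page and in GET /v1/models.

```python
def rerank_cost(num_documents: int, rate_per_document: float) -> float:
    """Rerank is billed per document scored."""
    return num_documents * rate_per_document


def embedding_cost(num_tokens: int, rate_per_million_tokens: float) -> float:
    """Embeddings are billed by token count, with rates quoted per 1M tokens."""
    return num_tokens / 1_000_000 * rate_per_million_tokens
```

For example, with a placeholder rate of 2.0 per 1M tokens (not a real price), embedding 250,000 tokens would cost 0.5 in the same currency unit.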