documentation
Docs
Base URL: https://gigarouter.ai/v1. Authenticate by sending the header Authorization: Bearer <key> with every request. Get a key (free credit included).
Reranking and embeddings are live below. Grounding, detection, OCR, and the rest of the specialist catalog are being benchmarked and onboarded — see the models page, and tell us what you need at [email protected].
machine-readable spec: /v1/openapi.json (OpenAPI 3.0 — paste it into your tooling or hand it to a coding agent)
Rerank (curl)
    # scores documents by relevance to the query; billed per document
    curl https://gigarouter.ai/v1/rerank \
      -H "Authorization: Bearer $GR_KEY" \
      -H "Content-Type: application/json" \
      -d '{"model":"cross-encoder/ms-marco-MiniLM-L6-v2",
           "query":"how do I reset my password",
           "documents":["Password reset steps...","Billing FAQ..."]}'
Rerank (Python)
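    # plain requests - no SDK needed
    import os
    import requests

    KEY = os.environ["GR_KEY"]  # same key the curl example reads from $GR_KEY
    query = "how do I reset my password"
    docs = ["Password reset steps...", "Billing FAQ..."]

    r = requests.post(
        "https://gigarouter.ai/v1/rerank",
        headers={"Authorization": f"Bearer {KEY}"},
        json={"model": "cross-encoder/ms-marco-MiniLM-L6-v2",
              "query": query, "documents": docs},
    )
    r.raise_for_status()  # surface 4xx/5xx instead of failing on a missing "results" key
    for hit in r.json()["results"]:
        print(hit["index"], hit["relevance_score"])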
Embeddings (OpenAI client)
    # the OpenAI SDK works - just change base_url
    import os
    from openai import OpenAI

    KEY = os.environ["GR_KEY"]  # your gr- key
    client = OpenAI(base_url="https://gigarouter.ai/v1", api_key=KEY)
    v = client.embeddings.create(
        model="Qwen/Qwen3-Embedding-0.6B",
        input=["hello world"])
    print(v.data[0].embedding[:4])  # first four dimensions of the vector
endpoints
| method | path | what |
|---|---|---|
| POST | /v1/rerank | score documents against a query, billed per document |
| POST | /v1/embeddings | embed text (OpenAI-compatible), billed per token |
| GET | /v1/models | list available models with live pricing |
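
The rerank and embeddings endpoints have samples above; a minimal sketch for GET /v1/models could look like the following. It uses only the standard library and returns the decoded JSON untouched, because the response schema is not described on this page. The helper names `build_models_request` and `fetch_models`, the placeholder key `gr-example`, and the assumption that this endpoint takes the same Authorization header are illustrative, not confirmed by the docs.

```python
import json
import urllib.request


def build_models_request(key):
    """Build (but do not send) a GET /v1/models request."""
    return urllib.request.Request(
        "https://gigarouter.ai/v1/models",
        headers={"Authorization": f"Bearer {key}"},  # assumed to be required here too
        method="GET",
    )


def fetch_models(key, timeout=30):
    """Send the request and return the decoded JSON body as-is."""
    with urllib.request.urlopen(build_models_request(key), timeout=timeout) as resp:
        return json.load(resp)


# Building the request needs no network access; "gr-example" is a placeholder key.
req = build_models_request("gr-example")
```

Calling `fetch_models(KEY)` with a real key performs the request; check /v1/openapi.json for the actual response fields before relying on any of them.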
headers
| header | effect |
|---|---|
| Authorization: Bearer <key> | required. your gr- key. |
| X-Tier: batch | bill the discounted flexible tier instead of realtime (see pricing). |
| Prefer: wait=30 | if the model is cold, hold the request for up to the given number of seconds (30 here) while it warms, instead of returning a 503. |
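
As a rough illustration of the headers in this table, the sketch below builds a rerank request with the optional X-Tier and Prefer headers attached. The function name `build_rerank_request`, its parameters, and the placeholder key are hypothetical, and whether the two optional headers may be combined on one request is an assumption this page does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://gigarouter.ai/v1"


def build_rerank_request(key, query, documents, batch=False, wait_seconds=None):
    """Build (but do not send) a POST /v1/rerank request with optional headers."""
    headers = {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
    if batch:
        headers["X-Tier"] = "batch"  # bill the discounted flexible tier
    if wait_seconds is not None:
        # hold the request while a cold model warms, instead of getting a 503
        headers["Prefer"] = f"wait={int(wait_seconds)}"
    body = json.dumps({
        "model": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "query": query,
        "documents": documents,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/rerank", data=body, headers=headers, method="POST"
    )


# "gr-example" is a placeholder, not a real key.
req = build_rerank_request(
    "gr-example",
    "how do I reset my password",
    ["Password reset steps...", "Billing FAQ..."],
    batch=True,
    wait_seconds=30,
)
```

To send it, pass `req` to `urllib.request.urlopen` with a client timeout longer than the wait value, so the client does not give up before the server does.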
status codes
| code | meaning |
|---|---|
| 200 | success. usage + charge are in the response and your dashboard. |
| 401 | missing or unknown API key. |
| 402 | out of credit. add credit. |
| 404 | model not found or not currently served. |
| 429 | rate limit or daily budget reached. back off and retry after the interval in the Retry-After header. |
| 503 | model is warming (cold start ~90s). retry, or send Prefer: wait=30. |
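
One way to act on these codes is sketched below: retry on 429 and 503, give up on everything else. The 15-second polling interval, the 60-second backoff cap, and the assumption that Retry-After carries a whole number of seconds (HTTP also allows a date there) are illustrative choices, not documented behavior; `retry_delay` and `send_with_retries` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request


def retry_delay(status, retry_after=None, attempt=0):
    """Return seconds to wait before retrying, or None if a retry will not help."""
    if status == 429:
        # Honor Retry-After when it is a plain number of seconds (assumed format).
        if retry_after is not None and retry_after.strip().isdigit():
            return float(retry_after)
        return min(2.0 ** attempt, 60.0)  # otherwise exponential backoff, capped
    if status == 503:
        # Model is warming (cold start ~90s per the table); poll at a fixed interval.
        return 15.0
    return None  # 401, 402, 404 and anything else: fix the request or account first


def send_with_retries(req, max_attempts=8, opener=urllib.request.urlopen, sleep=time.sleep):
    """Send req, retrying on 429/503; re-raise the HTTPError otherwise."""
    for attempt in range(max_attempts):
        try:
            with opener(req) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            delay = retry_delay(err.code, err.headers.get("Retry-After"), attempt)
            if delay is None or attempt == max_attempts - 1:
                raise
            sleep(delay)
```

With the defaults, eight attempts at 15 seconds apart cover a cold start of about 90 seconds; sending Prefer: wait=30 on each attempt would reduce the number of round trips.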
billing
- Prepaid — you're only charged per call; failed requests cost nothing. See your balance and per-call usage on the dashboard.
- Units — rerank is billed per document, embeddings per 1M tokens. Each model's rate is on its page.
- Top up — self-serve via card on the dashboard (Stripe). Credit never expires.
- Cold start — rarely-used models sleep to keep costs down; the first call warms them (~90s). Popular models stay warm.