STSB DistilRoBERTa Base

cross-encoder/stsb-distilroberta-base

published Mar 2022 · updated Apr 2025

STSB DistilRoBERTa Base is a cross-encoder rerank model that predicts a semantic similarity score between 0 and 1 for pairs of sentences, trained on the STS benchmark dataset.

est. price	~$0.008 / 1k docs · estimated, set at launch
API providers
downloads / mo	95.4K
license	apache-2.0

specs

Task	text-ranking
Architecture	Cross-Encoder (DistilRoBERTa)
Parameters	82M
License	Apache 2.0
Language	English

about this model

cross-encoder/stsb-distilroberta-base is a text-ranking cross-encoder that computes a semantic similarity score between 0 and 1 for a pair of sentences. Gigarouter is onboarding it as a managed, OpenAI-compatible API for reranking tasks; it is not yet live.

Model architecture and training

Built on the distilroberta-base backbone (82 million parameters), the model was fine-tuned on the SentenceTransformers STS benchmark dataset (sentence-transformers/stsb). It is English-only and licensed under Apache 2.0.

Performance and use

The model accepts two sentences and returns a single similarity score, so it can slot into semantic reranking pipelines where each candidate passage is scored against a query. Because it was trained on the STS benchmark, a widely used evaluation for semantic textual similarity, its scores measure how close two sentences are in meaning rather than whether a passage answers a query.
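The pair-scoring pattern described above can be sketched as follows, assuming the model is run locally through the sentence-transformers `CrossEncoder` class rather than through a hosted API. The helper names `load_scorer` and `rerank` are illustrative and not part of any library. `rerank` accepts any callable that maps a list of (query, passage) pairs to scores, so it can be tried out without downloading weights.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[List[Pair]], Sequence[float]]


def load_scorer(model_name: str = "cross-encoder/stsb-distilroberta-base") -> Scorer:
    """Return a scorer backed by sentence-transformers (downloads weights on first use)."""
    # Imported lazily so the rest of this sketch runs without the dependency.
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)
    # CrossEncoder.predict takes a list of sentence pairs and returns one score per pair.
    return lambda pairs: model.predict(pairs)


def rerank(query: str, passages: List[str], scorer: Scorer) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and sort passages by descending score."""
    pairs = [(query, passage) for passage in passages]
    scores = scorer(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked]
```

With the dependency installed, a call such as `rerank("How do I reset my password?", candidates, load_scorer())` would return the candidates ordered from most to least similar to the query.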

The model has over 4.5 million total downloads, and weights are published for PyTorch, JAX, ONNX, and OpenVINO. The ONNX and quantized variants allow lighter-weight deployment.

As a cross-encoder, it feeds both sentences through the transformer together, which typically yields more accurate scores than bi-encoder alternatives at the cost of higher latency per pair. Gigarouter plans to serve the model from an optimized inference stack, so developers can make low-latency API calls without managing infrastructure.
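Because each candidate costs one full transformer pass, a common way to contain cross-encoder latency is to shortlist candidates with a cheaper first stage and score only the shortlist. The sketch below illustrates that pattern under stated assumptions: the word-overlap function is a deliberately crude stand-in for a bi-encoder or keyword retriever, and `overlap_score`, `two_stage_rank`, and `shortlist_size` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def overlap_score(query: str, passage: str) -> float:
    """Cheap first-stage score: fraction of query words that appear in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / max(len(query_words), 1)


def two_stage_rank(
    query: str,
    passages: List[str],
    pair_scorer: Callable[[List[Pair]], Sequence[float]],
    shortlist_size: int = 10,  # hypothetical default; tune to your latency budget
) -> List[Tuple[str, float]]:
    """Shortlist with the cheap score, then rerank only the shortlist with pair_scorer."""
    shortlist = sorted(passages, key=lambda p: overlap_score(query, p), reverse=True)
    shortlist = shortlist[:shortlist_size]
    # The expensive model sees at most shortlist_size pairs, not len(passages).
    scores = pair_scorer([(query, p) for p in shortlist])
    ranked = sorted(zip(shortlist, scores), key=lambda item: item[1], reverse=True)
    return [(p, float(s)) for p, s in ranked]
```

Here `pair_scorer` is any function with the shape of a cross-encoder's batch prediction, taking a list of (query, passage) pairs and returning one score per pair.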

best for

·Semantic similarity scoring of sentence pairs
·Reranking search results by relevance
·Duplicate question detection
·Paraphrase identification
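
For duplicate-question or paraphrase detection, one simple approach is to score every unordered pair and keep those at or above a cutoff. The sketch below assumes a scorer with the same shape as a cross-encoder's batch prediction. The 0.8 threshold and the name `find_duplicates` are illustrative assumptions, and a real cutoff would need tuning on labeled data. Since n sentences produce n(n-1)/2 pairs, this is practical only for small sets.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def find_duplicates(
    sentences: List[str],
    pair_scorer: Callable[[List[Pair]], Sequence[float]],
    threshold: float = 0.8,  # hypothetical cutoff, not a recommended value
) -> List[Tuple[str, str, float]]:
    """Return (sentence_a, sentence_b, score) for every pair scoring at or above threshold."""
    pairs = list(combinations(sentences, 2))  # n * (n - 1) / 2 unordered pairs
    if not pairs:
        return []
    scores = pair_scorer(pairs)
    return [(a, b, float(s)) for (a, b), s in zip(pairs, scores) if s >= threshold]
```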

FAQ

What is the output range of this model?

The model outputs a score between 0 and 1, where higher indicates greater semantic similarity.

What input format does the model expect?

It accepts sentence pairs, e.g., ("Sentence 1", "Sentence 2").

Is the model suitable for languages other than English?

No, it was trained on the English STS benchmark dataset and is intended for English text only.

What is the license for this model?

The model is released under the Apache 2.0 license.

How can I call this model via the gigarouter API?

Once the model is live, call the gigarouter OpenAI-compatible endpoint with your API key; the request format follows the rerank API pattern.

not yet live

We're benchmarking and onboarding STSB DistilRoBERTa Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models

ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo