STSB DistilRoBERTa Base
cross-encoder/stsb-distilroberta-base
published Mar 2022 · updated Apr 2025
STSB DistilRoBERTa Base is a cross-encoder reranking model that predicts a semantic similarity score between 0 and 1 for pairs of sentences, trained on the STS benchmark dataset.
specs
| Spec | Value |
|---|---|
| Task | text-ranking |
| Architecture | Cross-Encoder (DistilRoBERTa) |
| Parameters | 82M |
| License | Apache 2.0 |
| Language | English |
about this model
cross-encoder/stsb-distilroberta-base is a text-ranking cross-encoder model that computes a semantic similarity score between 0 and 1 for pairs of sentences. It is hosted on gigarouter as a managed, OpenAI-compatible API for reranking tasks.
Model architecture and training
Built on the distilroberta-base backbone (82 million parameters), the model was fine-tuned on the SentenceTransformers STS benchmark dataset (sentence-transformers/stsb). It is English-only and licensed under Apache 2.0.
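The source names the dataset but not the training objective. The sketch below assumes the setup used in the sentence-transformers cross-encoder examples for STSB: gold ratings on the 0 to 5 scale are divided by 5, and the model's single output logit is trained with binary cross-entropy against that soft target, so a sigmoid maps predictions into the 0 to 1 range. The helper names and example numbers are hypothetical; this illustrates the objective, not the actual training script.

```python
import math

# STS benchmark gold ratings run from 0 (completely dissimilar) to 5 (completely equivalent).
STSB_MAX = 5.0


def normalize_label(gold: float) -> float:
    """Rescale a 0-5 STSB rating to the 0-1 range the model predicts."""
    if not 0.0 <= gold <= STSB_MAX:
        raise ValueError(f"STSB rating out of range: {gold}")
    return gold / STSB_MAX


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1) without overflowing for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy between sigmoid(logit) and a soft target in [0, 1].

    Uses the numerically stable form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


# Hypothetical example: annotators rated a pair 3.8 out of 5 and the model emits a logit of 1.2.
target = normalize_label(3.8)        # about 0.76
prediction = sigmoid(1.2)            # the score the model would report for this pair
loss = bce_with_logits(1.2, target)  # training loss for this single pair
```

Because the target is a soft label rather than 0 or 1, the loss is minimized when `sigmoid(logit)` equals the rescaled rating, which is why the model's output reads as a similarity on a 0 to 1 scale.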
Performance and use
The model takes two sentences and returns a single similarity score, which makes it suitable for semantic reranking pipelines where candidate passages are scored against a query. It was trained on the STS benchmark, a widely used evaluation for semantic textual similarity whose human ratings run from 0 to 5; the model's 0 to 1 output corresponds to that scale divided by 5.
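A minimal reranking sketch, assuming the sentence-transformers `CrossEncoder` class, whose `predict` method scores a list of sentence pairs. The `rerank` and `cross_encoder_scorer` helpers are hypothetical illustrations, not part of that library or of gigarouter's API; the scorer is passed in as a function so the ranking logic can be exercised without downloading weights.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A scorer maps a list of (query, passage) pairs to one similarity score per pair.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def cross_encoder_scorer(model_name: str = "cross-encoder/stsb-distilroberta-base") -> Scorer:
    """Build a scorer backed by sentence-transformers' CrossEncoder (downloads the model)."""
    from sentence_transformers import CrossEncoder  # imported lazily: heavy dependency

    model = CrossEncoder(model_name)
    return lambda pairs: model.predict(pairs)


def rerank(
    query: str,
    passages: List[str],
    scorer: Scorer,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return passages sorted by descending score."""
    pairs = [(query, p) for p in passages]
    scores = scorer(pairs)
    ranked = sorted(
        zip(passages, (float(s) for s in scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return ranked[:top_k]  # slicing with None keeps the full list
```

With `sentence-transformers` installed and network access, `rerank("How do I reset my password?", passages, cross_encoder_scorer(), top_k=3)` returns the three highest-scoring passages with their scores. Keeping the scorer pluggable also makes it easy to swap in a remote API call later.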
The model has over 4.5 million total downloads, and weights are published for PyTorch, JAX, ONNX, and OpenVINO. ONNX and quantized variants are available for more efficient deployment.
As a cross-encoder, it processes both sentences jointly in a single transformer pass, which typically yields more accurate relevance scores than bi-encoder alternatives. The trade-off is cost: every pair needs its own forward pass, so scores cannot be precomputed and latency grows with the number of candidates. Gigarouter hosts the model on an optimized inference stack, so developers get low-latency API calls without managing infrastructure.
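The cost difference can be made concrete: a bi-encoder embeds each passage once, offline, while a cross-encoder needs a fresh forward pass for every (query, passage) pair. A common way to combine them, not described in the source, is retrieve-then-rerank: shortlist candidates with cheap bi-encoder similarity, then rescore only the shortlist with the cross-encoder. In this sketch the embeddings, the `cross_score` callable, and the `shortlist` and `top_k` defaults are hypothetical placeholders.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_then_rerank(
    query: str,
    query_vec: Vector,
    passage_vecs: Dict[str, Vector],  # passage text -> embedding, computed once offline
    cross_score: Callable[[List[Tuple[str, str]]], Sequence[float]],
    shortlist: int = 100,
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1 (bi-encoder): cheap cosine similarity against precomputed embeddings.
    candidates = sorted(
        passage_vecs,
        key=lambda p: cosine(query_vec, passage_vecs[p]),
        reverse=True,
    )[:shortlist]
    # Stage 2 (cross-encoder): one joint forward pass per shortlisted pair only.
    scores = cross_score([(query, p) for p in candidates])
    ranked = sorted(zip(candidates, map(float, scores)), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]
```

Only `shortlist` pairs reach the cross-encoder per query, so its cost is bounded by the shortlist size rather than the corpus size, while the final order still comes from the more accurate cross-encoder scores.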
best for
- Semantic similarity scoring of sentence pairs
- Reranking search results by relevance
- Duplicate question detection
- Paraphrase identification
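For duplicate question detection or paraphrase identification, one approach is to score every pair in a set and keep pairs at or above a cutoff. The 0.8 threshold below is a hypothetical default, not a value from the source; in practice it would be tuned on labeled pairs. `scorer` can be any function with the same shape as `CrossEncoder.predict` (a list of sentence pairs in, one score per pair out), and `find_duplicates` is an illustrative name.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def find_duplicates(
    questions: List[str],
    scorer: PairScorer,
    threshold: float = 0.8,  # hypothetical cutoff; tune on labeled data
) -> List[Tuple[str, str, float]]:
    """Return question pairs whose similarity score is at or above the threshold."""
    pairs = list(combinations(questions, 2))  # n*(n-1)/2 pairs, so keep the input small
    if not pairs:
        return []
    scores = scorer(pairs)
    return [(a, b, float(s)) for (a, b), s in zip(pairs, scores) if s >= threshold]
```

Because the number of pairs grows quadratically, this exhaustive approach suits small sets; for large collections a retrieval step would normally narrow the candidates first.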
FAQ
What does the model output?
The model outputs a score between 0 and 1, where a higher score indicates greater semantic similarity.
What input does it take?
It takes a pair of sentences as input, e.g., ("Sentence 1", "Sentence 2").
Does it support languages other than English?
No. It was trained on the English STS benchmark dataset and is intended for English text only.
What license is it released under?
The model is released under the Apache 2.0 license.
How do I call it on gigarouter?
Use the gigarouter OpenAI-compatible endpoint with your API key; the request format follows the rerank API pattern.
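The source does not document gigarouter's request schema. The sketch below builds, but does not send, a POST request in the shape many rerank APIs use (`model`, `query`, `documents`, optional `top_n`) with bearer-token auth. The URL, field names, and `build_rerank_request` helper are all hypothetical assumptions; check gigarouter's API reference before relying on any of them.

```python
import json
import urllib.request
from typing import List, Optional

# HYPOTHETICAL: the endpoint URL and field names are assumptions, not gigarouter's documented schema.
RERANK_URL = "https://api.example.com/v1/rerank"
MODEL_ID = "cross-encoder/stsb-distilroberta-base"


def build_rerank_request(
    api_key: str,
    query: str,
    documents: List[str],
    top_n: Optional[int] = None,
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request in a common rerank-API shape."""
    body = {"model": MODEL_ID, "query": query, "documents": documents}
    if top_n is not None:
        body["top_n"] = top_n  # assumed name for "return only the N best documents"
    return urllib.request.Request(
        RERANK_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a call to `urllib.request.urlopen(req)`; the response format is likewise unspecified in the source, so parsing it is left out.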
We're benchmarking and onboarding STSB DistilRoBERTa Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.