STSB RoBERTa Base
cross-encoder/stsb-roberta-base
published Mar 2022 · updated Apr 2025
STSB RoBERTa Base is a cross-encoder rerank model that scores the semantic similarity between sentence pairs, trained on the STS benchmark dataset.
specs
| Task | Text Ranking |
| Architecture | RoBERTa Base |
| Parameters | 124.6M |
| License | Apache-2.0 |
| Language | English |
about this model
cross-encoder/stsb-roberta-base is a cross-encoder reranking model that computes a semantic similarity score between two sentences, outputting a value between 0 and 1. It is built on the FacebookAI/roberta-base architecture and was trained on the STS benchmark dataset (sentence-transformers/stsb). The model is designed to evaluate the degree of semantic equivalence between a query and a candidate document, making it well suited for reranking tasks in search and retrieval pipelines.
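As a rough illustration of the reranking use described above, the sketch below scores (query, candidate) pairs and sorts the candidates by score. It assumes the model is run locally through the `sentence-transformers` package and its `CrossEncoder` class. `load_scorer` and `rerank` are hypothetical helper names introduced here, not part of any library.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
Scorer = Callable[[Sequence[Pair]], List[float]]


def load_scorer(model_name: str = "cross-encoder/stsb-roberta-base") -> Scorer:
    """Return a function that scores sentence pairs with the cross-encoder.

    Assumes `sentence-transformers` is installed. The import is deferred so
    that `rerank` below can be used with any other scoring function.
    """
    from sentence_transformers import CrossEncoder

    model = CrossEncoder(model_name)

    def score(pairs: Sequence[Pair]) -> List[float]:
        # predict() takes a list of (text_a, text_b) pairs and returns one score per pair.
        return [float(s) for s in model.predict(list(pairs))]

    return score


def rerank(query: str, candidates: Sequence[str], score: Scorer) -> List[Tuple[str, float]]:
    """Pair the query with every candidate and sort by score, highest first."""
    scores = score([(query, candidate) for candidate in candidates])
    return sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)


# Example usage (downloads the model weights on first run):
# scorer = load_scorer()
# for text, s in rerank("A man is eating food.",
#                       ["A plane is taking off.", "A man is eating pasta."],
#                       scorer):
#     print(f"{s:.3f}  {text}")
```

Because a cross-encoder needs one forward pass per pair, a sketch like this is usually applied to a short list of candidates returned by a cheaper first-stage retriever rather than to a whole corpus.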
Key Strengths
- Tuned for semantic similarity: Trained directly on the STS benchmark, whose labels are human similarity ratings, so its scores are meant to track human judgments of how close two sentences are in meaning.
- Joint pair encoding: Processes both texts in a single forward pass, so attention can relate tokens across the two inputs. This typically yields more accurate relevance scores than bi-encoder alternatives, at the cost of one model call per pair.
- Proven adoption: Over 5.3 million downloads on Hugging Face (as of April 2025), with support for PyTorch, JAX, ONNX, and OpenVINO and weights available in the Safetensors format.
Model Details
| Attribute | Value |
|---|---|
| Parameters | 124,646,915 |
| Model file size | ~498.6 MB |
| Language | English |
| License | Apache-2.0 |
| Created | 2022-03-02 |
Usage via API
gigarouter hosts cross-encoder/stsb-roberta-base as a managed, OpenAI-compatible API endpoint. Submit pairs of texts and receive a similarity score – no local installation or model loading required. The model is also available in a quantized version for lower latency.
best for
- Reranking search results by semantic relevance
- Duplicate detection in text pairs
- Semantic similarity scoring for question-answering pairs
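For the duplicate-detection use case, one possible approach is to flag any pair whose similarity score clears a cutoff. The sketch below assumes `score` is any callable that returns one 0-to-1 score per pair (for example, a wrapper around this model). The name `find_duplicates` and the default threshold of 0.8 are illustrative assumptions, and the cutoff should be tuned on your own data.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def find_duplicates(
    pairs: Sequence[Pair],
    score: Callable[[Sequence[Pair]], Sequence[float]],
    threshold: float = 0.8,  # illustrative cutoff, not a recommended value
) -> List[Tuple[Pair, float]]:
    """Return the pairs whose similarity score is at or above `threshold`."""
    scores = score(pairs)
    return [(pair, float(s)) for pair, s in zip(pairs, scores) if s >= threshold]
```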
FAQ
What is STSB RoBERTa Base best used for?
It is best used for semantic textual similarity tasks, such as reranking search results or identifying duplicate sentences, based on a similarity score from 0 to 1.
How many parameters does it have?
It has approximately 124.6 million parameters.
What license is it released under?
It is released under the Apache-2.0 license.
What input does the model expect?
The model expects sentence pairs as input, with each pair given as two strings. It outputs a similarity score between 0 and 1 for each pair.
How do I use it through the API?
Use the gigarouter OpenAI-compatible endpoint with your API key, sending the sentence pairs in the request body.
We're benchmarking and onboarding STSB RoBERTa Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.