STSB RoBERTa Base

cross-encoder/stsb-roberta-base

published Mar 2022 · updated Apr 2025

STSB RoBERTa Base is a cross-encoder reranking model, trained on the STS benchmark dataset, that scores the semantic similarity of sentence pairs.

est. price

~$0.008 / 1k docs · estimated, set at launch

API providers

downloads / mo

182.5K

license

apache-2.0

specs

Task	Text Ranking
Architecture	RoBERTa Base
Parameters	124.6M
License	Apache-2.0
Language	English

about this model

cross-encoder/stsb-roberta-base is a cross-encoder reranking model that computes a semantic similarity score between two sentences, outputting a value between 0 and 1. It is built on FacebookAI/roberta-base and was trained on the STS benchmark dataset (sentence-transformers/stsb). Because it scores how closely two texts match in meaning, it can be used to rerank candidate documents against a query in search and retrieval pipelines.
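As a rough illustration of the reranking use described above, the sketch below scores each (query, candidate) pair and sorts the candidates by score. It assumes the sentence-transformers package and its `CrossEncoder` class for loading the model; the helper names `load_stsb_scorer` and `rerank` are illustrative and not part of any library.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# Any callable that maps a list of text pairs to one score per pair.
ScoreFn = Callable[[List[Pair]], Sequence[float]]


def load_stsb_scorer() -> ScoreFn:
    """Build a scorer backed by cross-encoder/stsb-roberta-base.

    Assumes `pip install sentence-transformers`; weights download on first use.
    """
    from sentence_transformers import CrossEncoder  # imported lazily on purpose

    model = CrossEncoder("cross-encoder/stsb-roberta-base")
    return lambda pairs: model.predict(pairs)


def rerank(query: str, candidates: List[str], score_fn: ScoreFn) -> List[Tuple[str, float]]:
    """Return (candidate, score) tuples sorted from most to least similar."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = score_fn(pairs)
    scored = [(c, float(s)) for c, s in zip(candidates, scores)]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rerank(query, docs, load_stsb_scorer())` would run the real model locally. Passing the scorer in as an argument is a design choice that keeps the sorting logic usable with any stand-in scorer.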

Key Strengths

High precision on semantic similarity: Trained directly on the STS benchmark, whose labels are human similarity ratings, so its scores are intended to track human judgment.
Joint encoding of both texts: Processes each pair of texts together in one forward pass, so the two inputs interact at every layer. This typically gives more accurate relevance assessment than bi-encoder alternatives, at the cost of one forward pass per pair.
Proven adoption: Over 5.3 million downloads on Hugging Face (as of April 2025), with weights and exports available for PyTorch, JAX, ONNX, Safetensors, and OpenVINO.
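The accuracy gained by processing pairs jointly comes with a compute cost: a cross-encoder needs one forward pass per pair, whereas a bi-encoder embeds each text once. The back-of-the-envelope sketch below counts model calls only. It ignores batching and the differing cost per call, and the function names are illustrative.

```python
def cross_encoder_passes(n_sentences: int) -> int:
    """Forward passes to score every unordered pair with a cross-encoder."""
    return n_sentences * (n_sentences - 1) // 2


def bi_encoder_passes(n_sentences: int) -> int:
    """Forward passes to embed every sentence once with a bi-encoder."""
    return n_sentences


def rerank_passes(n_queries: int, top_k: int) -> int:
    """Forward passes when a cross-encoder only rescores top_k candidates per query."""
    return n_queries * top_k
```

For 10,000 sentences, all-pairs scoring takes 49,995,000 cross-encoder passes versus 10,000 bi-encoder passes. This is why cross-encoders are usually applied to a short candidate list produced by a cheaper first stage.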

Model Details

Attribute	Value
Parameters	124,646,915
Model file size	~498.6 MB
Language	English
License	Apache-2.0
Created	2022-03-02

Usage via API

gigarouter is onboarding cross-encoder/stsb-roberta-base as a managed, OpenAI-compatible API endpoint. Once it is live, you submit pairs of texts and receive a similarity score, with no local installation or model loading required. The model is also available in a quantized version for lower latency.

best for

·Reranking search results by semantic relevance
·Duplicate detection in text pairs
·Semantic similarity scoring for question-answering pairs
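Duplicate detection with a similarity scorer usually reduces to thresholding the score. A minimal sketch, assuming scores in the 0 to 1 range have already been computed for each pair; the function name and the 0.8 cutoff are illustrative assumptions to be tuned on your own data, and the scores in the example are made up.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]


def flag_duplicates(
    pairs: Sequence[Pair],
    scores: Sequence[float],
    threshold: float = 0.8,  # assumed cutoff, not a recommended value
) -> List[Tuple[str, str, float]]:
    """Keep the pairs whose similarity score meets or exceeds the threshold."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [
        (a, b, float(s))
        for (a, b), s in zip(pairs, scores)
        if float(s) >= threshold
    ]


# Illustrative input: these scores are invented, not model output.
example_pairs = [
    ("How do I reset my password?", "How can I change my password?"),
    ("How do I reset my password?", "What is the refund policy?"),
]
example_scores = [0.91, 0.07]
likely_duplicates = flag_duplicates(example_pairs, example_scores)
```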

FAQ

What is STSB RoBERTa Base best used for?

It is best used for semantic textual similarity tasks such as reranking search results or identifying duplicate sentences based on a similarity score from 0 to 1.

How many parameters does the model have?

It has approximately 124.6 million parameters.

What license is this model released under?

It is released under the Apache-2.0 license.

What input format does the model expect?

The model expects a list of sentence pairs as input, each pair given as two strings. It returns one similarity score between 0 and 1 per pair.

How can I call this model via the gigarouter API?

Once the model is live, use the gigarouter OpenAI-compatible endpoint with your API key, sending the sentence pairs in the request body.

not yet live

We're benchmarking and onboarding STSB RoBERTa Base as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models

ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo