STSB RoBERTa Large

cross-encoder/stsb-roberta-large

published Mar 2022 · updated Apr 2025

STSB RoBERTa Large is a cross-encoder model that computes a semantic similarity score between 0 and 1 for a pair of sentences.

est. price

~$0.008

/ 1k docs · estimated, set at launch

downloads / mo

286.8K

license

apache-2.0

specs

Task	Semantic Textual Similarity / Text Ranking
Architecture	RoBERTa Large (cross-encoder)
Parameters	355 million
License	Apache 2.0
Model Size	~1.42 GB (safetensors)
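As a rough consistency check on the specs above, the listed file size follows from the parameter count if the weights are stored as 32-bit floats. The sketch below assumes 4 bytes per parameter (fp32), treats the rounded 355M figure as exact, and ignores the small metadata overhead of the safetensors file. The fp16 figure only shows what a half-precision cast would take; it is not a file listed on this card.

```python
# Rough check: does the listed file size match the parameter count?
# Assumptions: fp32 weights (4 bytes each), rounded parameter count,
# negligible safetensors header/metadata overhead.

PARAMS = 355_000_000  # from the specs table (rounded)


def weights_size_gb(num_params: int, bytes_per_param: int) -> float:
    """Size of the raw weight tensors in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


fp32_gb = weights_size_gb(PARAMS, 4)  # matches the ~1.42 GB listed above
fp16_gb = weights_size_gb(PARAMS, 2)  # hypothetical half-precision cast

print(f"fp32: {fp32_gb:.2f} GB, fp16: {fp16_gb:.2f} GB")
```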

about this model

cross-encoder/stsb-roberta-large is a cross-encoder model for reranking that predicts a semantic textual similarity score between 0 and 1 for a given pair of sentences.

The model is fine-tuned from FacebookAI/roberta-large (355M parameters, ~1.42 GB in safetensors format) on the STS benchmark (STSbenchmark). It assigns a single scalar similarity score to each input pair, which makes it usable for reranking candidate passages or documents against a query, although it was trained on sentence-level similarity rather than query-passage relevance.
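Reranking with a pair scorer amounts to scoring each (query, candidate) pair and sorting by score. The following is a minimal sketch, not an official API: `rerank` and `toy_overlap_scorer` are hypothetical names, and the toy scorer (word-overlap Jaccard similarity) only stands in for the model so the example runs offline. With the sentence-transformers library installed, one could instead pass the `predict` method of `CrossEncoder("cross-encoder/stsb-roberta-large")`, which takes a list of sentence pairs and returns one score per pair.

```python
from typing import Callable, List, Sequence, Tuple

# A scorer maps a list of (text_a, text_b) pairs to one similarity score per pair.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, passages: List[str], score_pairs: Scorer,
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every (query, passage) pair and return the top_k passages, best first."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [(passage, float(score)) for passage, score in ranked[:top_k]]


def toy_overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for the model: Jaccard overlap of lowercased words, in [0, 1]."""
    scores = []
    for a, b in pairs:
        words_a, words_b = set(a.lower().split()), set(b.lower().split())
        union = words_a | words_b
        scores.append(len(words_a & words_b) / len(union) if union else 0.0)
    return scores


# With sentence-transformers installed, something like
#   CrossEncoder("cross-encoder/stsb-roberta-large").predict
# could be passed as score_pairs instead of the toy scorer.
query = "a man is playing guitar"
passages = [
    "a man plays the guitar",
    "a woman is slicing onions",
    "a man is playing a guitar on stage",
]
top = rerank(query, passages, toy_overlap_scorer, top_k=2)
print(top)
```

Keeping the scorer as a plain callable lets the same ranking logic run against the real model, a hosted endpoint, or a test stub.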

Key Characteristics

Licensed under Apache 2.0.
Supported backends: PyTorch, JAX, ONNX, and OpenVINO; weights are also provided in Safetensors format.
A quantized version is available.
Originally created on 2022-03-02 (last modified 2025-04-15).

As a cross-encoder, it encodes both sentences jointly and so models the interaction between them directly. This typically yields higher accuracy for semantic similarity scoring than bi-encoder approaches, but it is more expensive: every pair needs its own forward pass, and per-sentence embeddings cannot be precomputed and reused. The model outputs a score on a continuous 0–1 scale.
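To make the cost trade-off concrete, here is a back-of-the-envelope count of encoder forward passes when every query is scored against a corpus, alongside the common pattern of retrieving candidates with a bi-encoder and reranking only the top k with a cross-encoder. It is a simplified sketch: it counts forward passes only, ignores that cross-encoder inputs are longer (both texts concatenated), and ignores the cost of nearest-neighbour search over embeddings. The function names and workload sizes are hypothetical.

```python
def cross_encoder_passes(num_queries: int, num_docs: int) -> int:
    """Cross-encoder: one forward pass per (query, document) pair."""
    return num_queries * num_docs


def bi_encoder_passes(num_queries: int, num_docs: int) -> int:
    """Bi-encoder: each text is embedded once; pair scores are cheap vector comparisons."""
    return num_queries + num_docs


def retrieve_then_rerank_passes(num_queries: int, num_docs: int, top_k: int) -> int:
    """Bi-encoder retrieval over the corpus, then cross-encoder scoring of each
    query's top_k candidates (document embeddings computed once and reused)."""
    return bi_encoder_passes(num_queries, num_docs) + num_queries * top_k


queries, docs, k = 100, 1_000_000, 50  # hypothetical workload
full = cross_encoder_passes(queries, docs)
bi = bi_encoder_passes(queries, docs)
hybrid = retrieve_then_rerank_passes(queries, docs, k)
print(f"cross-encoder only: {full:,}")
print(f"bi-encoder only:    {bi:,}")
print(f"retrieve + rerank:  {hybrid:,}")
```

Under these assumptions, reranking only 50 candidates per query adds 5,000 cross-encoder passes on top of the bi-encoder work, instead of the 100,000,000 needed to score every pair.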

best for

· Reranking search results by semantic relevance
· Detecting duplicate or semantically equivalent sentences
· Paraphrase identification and scoring
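For duplicate detection, a common approach is to score candidate pairs and keep those above a similarity threshold. The sketch below is illustrative only: `find_duplicates` and `toy_scorer` are hypothetical, the 0.8 threshold is an assumed placeholder that would need tuning on labeled data, and the toy scorer stands in for the model so the example runs offline. Note that scoring all pairs grows quadratically, n(n-1)/2 pairs for n sentences, which is why large collections are usually pre-filtered before a cross-encoder is applied.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# A scorer maps a list of (text_a, text_b) pairs to one similarity score per pair.
Scorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def find_duplicates(sentences: List[str], score_pairs: Scorer,
                    threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Score all n*(n-1)/2 unordered sentence pairs; keep those at or above threshold."""
    pairs = list(combinations(sentences, 2))
    scores = score_pairs(pairs)
    return [(a, b, float(s)) for (a, b), s in zip(pairs, scores) if s >= threshold]


def toy_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Hypothetical stand-in for the model: 1.0 if both texts use the same word set."""
    return [1.0 if set(a.lower().split()) == set(b.lower().split()) else 0.0
            for a, b in pairs]


sentences = [
    "the cat sat on the mat",
    "on the mat the cat sat",
    "a dog barked at the mailman",
]
print(find_duplicates(sentences, toy_scorer))
```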

FAQ

What is this model best used for?

Scoring sentence pairs for semantic similarity, e.g., to rerank information-retrieval results or to detect duplicates.

How does this model compare to smaller cross-encoders?

It uses the RoBERTa Large architecture with 355M parameters, which generally offers higher accuracy than base-sized variants but requires more compute and memory.

What license is the model under?

Apache 2.0.

What input format does the model expect?

The model expects two sentences as a pair and outputs a similarity score between 0 and 1.
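Because a cross-encoder reads the two sentences as one combined input, swapping their order can change the score slightly. If order-independent scores matter, one optional heuristic (not something this card prescribes) is to score both orders and average them, at twice the compute. In this sketch `symmetric_scores` and `toy_asymmetric_scorer` are hypothetical, and the toy scorer is deliberately order-sensitive so the effect is visible without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
# A scorer maps a list of (text_a, text_b) pairs to one similarity score per pair.
Scorer = Callable[[List[Pair]], Sequence[float]]


def symmetric_scores(pairs: List[Pair], score_pairs: Scorer) -> List[float]:
    """Score each pair as (a, b) and as (b, a), then average the two scores.
    Costs two forward passes per pair instead of one."""
    forward = score_pairs(pairs)
    backward = score_pairs([(b, a) for a, b in pairs])
    return [(float(f) + float(r)) / 2 for f, r in zip(forward, backward)]


def toy_asymmetric_scorer(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in for the model, deliberately order-sensitive:
    share of words in the first text that also appear in the second."""
    scores = []
    for a, b in pairs:
        words_a, words_b = a.lower().split(), set(b.lower().split())
        scores.append(sum(w in words_b for w in words_a) / len(words_a) if words_a else 0.0)
    return scores


pairs = [("a man is eating", "a man is eating pasta")]
print(toy_asymmetric_scorer(pairs))  # order (a, b) only
print(symmetric_scores(pairs, toy_asymmetric_scorer))
```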

How can I call this model via the gigarouter API?

Once hosting is live, use the OpenAI-compatible endpoint with your gigarouter API key and send a request containing the model name and the sentence pair.

not yet live

We're benchmarking and onboarding STSB RoBERTa Large as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related reranker models

ms-marco-MiniLM-L6-v2

81.5M dl/mo · live

ms-marco-MiniLM-L4-v2

4.8M dl/mo

gte-reranker-modernbert-base

2.7M dl/mo

ms-marco-MiniLM-L12-v2

2.3M dl/mo

jina-reranker-v2-base-multilingual

1.8M dl/mo · live

Qwen3-Reranker-4B

1.8M dl/mo