LENS-d8000

yibinlei/LENS-d8000

published Dec 2024 · updated Jan 2025

LENS-d8000 is a lexicon-based text embedding model that produces 8000-dimensional representations where each dimension corresponds to a token cluster.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)
API providers: 0
downloads / mo: 108
license: apache-2.0

specs

Task: Text Embedding
Architecture: Mistral-based bidirectional
License: Apache-2.0

about this model

LENS-d8000 is a text embedding model built on a large language model. It produces 8000-dimensional lexicon-based embeddings (LENS) in which each dimension corresponds to a cluster of semantically similar tokens. The model consolidates the LLM's vocabulary space by clustering token embeddings, which addresses token redundancy and yields compact representations with dimensionality comparable to dense embeddings.
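The clustering idea described above can be illustrated with a toy sketch. It assumes plain k-means over a made-up 2-D token embedding table, and it assumes the resulting centroids then serve as the output projection. The vocabulary, the vectors, and the helper name `kmeans` are hypothetical and are not taken from the LENS code.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain k-means over the rows of x; returns (centroids, assignment)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct rows of x.
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared Euclidean distance from every row to every centroid: (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for c in range(k):
            members = x[assign == c]
            if len(members):
                centroids[c] = members.mean(0)
    return centroids, assign

# Hypothetical toy vocabulary with made-up 2-D token embeddings.
vocab = ["cat", "cats", "kitten", "car", "cars", "truck"]
token_emb = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])

centroids, assign = kmeans(token_emb, k=2)

# Assumption: scoring a hidden state against centroids instead of every
# token gives one output dimension per cluster (2 here, 8000 in LENS-d8000).
hidden = np.array([0.8, 0.2])
cluster_logits = hidden @ centroids.T
```

In this toy setup the redundant surface forms ("cat", "cats", "kitten") fall into one cluster, so they share a single output dimension rather than occupying three.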

Key Strengths

  • Outperforms dense embeddings on the Massive Text Embedding Benchmark (MTEB) at a comparable embedding size.
  • Supports efficient embedding dimension pruning without specialized objectives such as Matryoshka Representation Learning.
  • When combined with dense embeddings, achieves state-of-the-art performance on the retrieval subset of MTEB (BEIR).
  • Uses bidirectional attention and max-pooling over query tokens to produce high-quality embeddings for retrieval and classification tasks.
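Two of the points above, max-pooling over tokens and dimension pruning, can be sketched with NumPy. This is an illustration under stated assumptions rather than the model's actual code: the per-token scores are made up, the helper names `max_pool` and `prune` are hypothetical, and keeping the first `keep` dimensions is only one possible pruning rule.

```python
import numpy as np

def max_pool(token_scores, mask):
    """Element-wise max over the tokens selected by mask.

    token_scores: (seq_len, dim) per-token scores over cluster dimensions.
    mask: (seq_len,) boolean, True for tokens that take part in pooling.
    Assumes at least one True entry in mask.
    """
    masked = np.where(mask[:, None], token_scores, -np.inf)
    return masked.max(axis=0)

def prune(embedding, keep):
    """Keep the first `keep` dimensions (an assumed rule) and L2-normalise."""
    v = embedding[:keep]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Made-up scores for three tokens over four cluster dimensions.
token_scores = np.array([
    [0.2, 1.5, 0.0, 0.3],
    [0.9, 0.1, 0.4, 0.0],
    [5.0, 5.0, 5.0, 5.0],   # e.g. a padding or instruction token
])
mask = np.array([True, True, False])  # pool over the first two tokens only

pooled = max_pool(token_scores, mask)   # [0.9, 1.5, 0.4, 0.3]
small = prune(pooled, keep=2)           # 2-D, unit length
```

The masked third row never reaches the pooled vector, which is the point of restricting pooling to a chosen span of tokens.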

Benchmark Results

Task                                Metric    Score
AmazonCounterfactualClassification  Accuracy  93.69
AmazonPolarityClassification        Accuracy  97.07
AmazonReviewsClassification         Accuracy  63.61
ArguAna Retrieval                   MAP@10    69.89
ArguAna Retrieval                   MRR@10    70.23

Additional MTEB results are available in the model's paper and Hugging Face model-index.

FAQ

What is LENS-d8000 used for?

It produces 8000-dimensional text embeddings for tasks like retrieval, clustering, and classification, with each dimension aligned to a token cluster.

How does LENS-d8000 compare to dense embeddings?

Although it is lexicon-based, it outperforms dense embeddings on MTEB at a comparable embedding size, and it can be combined with dense embeddings for state-of-the-art results on BEIR, the retrieval subset of MTEB.
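One simple way to combine a lexicon-based vector with a dense one is to L2-normalise each and concatenate them, so that a dot product between two combined vectors equals the sum of the two cosine similarities. The sketch below assumes that scheme. This page does not say how the combination is actually done, and the helper names and toy vectors are hypothetical.

```python
import numpy as np

def l2_normalize(v):
    """Scale v to unit length; zero vectors are returned unchanged."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def combine(lens_vec, dense_vec):
    """Concatenate the normalised lexicon-based and dense vectors."""
    return np.concatenate([l2_normalize(lens_vec), l2_normalize(dense_vec)])

# Toy vectors: the lexicon-based parts match, the dense parts are orthogonal.
query = combine(np.array([3.0, 4.0]), np.array([1.0, 0.0, 0.0]))
doc = combine(np.array([3.0, 4.0]), np.array([0.0, 1.0, 0.0]))

# Dot product = cosine(lexicon parts) + cosine(dense parts) = 1 + 0.
score = float(query @ doc)
```

Normalising before concatenation keeps either representation from dominating the score purely because of its scale.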

What is the license for LENS-d8000?

Apache-2.0.

How can I use LENS-d8000 via the gigarouter API?

The model is not yet live. Once it is, send requests to the OpenAI-compatible endpoint with your API key, using the model name "yibinlei/LENS-d8000".
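As a rough sketch of what such a request could look like, the snippet below builds (but does not send) an embeddings call using only the Python standard library. The base URL is a hypothetical placeholder, and the `/embeddings` path and the `model`/`input` fields are assumed from the usual OpenAI-compatible convention. Check the provider's documentation for the real values.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # hypothetical placeholder, not a real endpoint
API_KEY = "YOUR_API_KEY"                 # replace with your own key

def build_embedding_request(texts, model="yibinlei/LENS-d8000"):
    """Build a POST request in the assumed OpenAI-compatible embeddings shape."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_embedding_request(["What is a lexicon-based embedding?"])
# Sending is left out on purpose; with a real endpoint it would be:
# with urllib.request.urlopen(req) as resp:
#     body = json.load(resp)
```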

What is the input format for this model?

Text formatted with special <instruct> and <query> markers, which is then tokenized; the instruction part can carry a task-specific prompt.
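A small helper can make the format concrete. The exact layout used here (the instruction after `<instruct>`, a newline, then the query after `<query>`) is an assumption, and the helper name `format_query` is hypothetical. Confirm the template against the model card before relying on it.

```python
def format_query(instruction: str, query: str) -> str:
    """Wrap a task instruction and a query in the assumed marker layout."""
    return f"<instruct>{instruction}\n<query>{query}"

text = format_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "how do lexicon-based embeddings work",
)
```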

not yet live

We're benchmarking and onboarding LENS-d8000 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
