LLM2Vec Sheared LLaMA Supervised

McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised

published Apr 2024 · updated Apr 2024

LLM2Vec Sheared LLaMA Supervised is an embedding model built by converting a decoder-only LLM (Sheared-LLaMA) into a text encoder using bidirectional attention and supervised contrastive learning.

status

coming soon

API providers

downloads / mo

440

license

mit

specs

Task	Text Embedding
Architecture	Sheared-LLaMA (decoder-only transformer converted to bidirectional encoder)
License	MIT

about this model

LLM2Vec-Sheared-LLaMA-mntp-supervised is an embedding model obtained by transforming a decoder-only large language model into a text encoder. It was produced with the LLM2Vec recipe developed by McGill NLP; the LLM2Vec paper was accepted at COLM 2024.

Approach

The LLM2Vec recipe applies three steps to a decoder-only base model, here Sheared-LLaMA: enabling bidirectional attention, masked next token prediction (MNTP), and unsupervised contrastive learning. This supervised variant applies supervised contrastive fine-tuning on top of the MNTP step to improve embedding quality.
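The first step, enabling bidirectional attention, amounts to changing which positions each token may attend to. The snippet below is a toy illustration of that difference using plain 0/1 masks; the function names are hypothetical and this is not the LLM2Vec implementation.

```python
def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
    for row in bidirectional_mask(4):
        print(row)
```

With the causal mask, the first token never sees later tokens; with the bidirectional mask, every token representation can draw on the whole input, which is what an embedding model needs.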

Performance

The LLM2Vec approach achieves state-of-the-art results on the Massive Text Embedding Benchmark (MTEB) among models trained exclusively on publicly available data (as of May 24, 2024). It also establishes a new unsupervised state of the art on MTEB. Comparisons against encoder-only models show significant gains on word-level tasks.

Architecture

Built on Sheared-LLaMA, the model uses mean pooling and a maximum sequence length of 512 tokens. The project supports additional base architectures including Meta-Llama-3, Gemma, and Qwen-2. The model is released under the MIT license.
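As a rough illustration of mean pooling with a 512-token limit, the sketch below averages per-token vectors while skipping padding positions. It is a simplified stand-in written with plain Python lists; `mean_pool` and `MAX_LEN` are hypothetical names, and the model's actual pooling code may treat some tokens differently.

```python
MAX_LEN = 512  # maximum sequence length stated for this model


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of tokens whose mask value is 1.

    Inputs longer than MAX_LEN are truncated before pooling.
    """
    token_vectors = token_vectors[:MAX_LEN]
    attention_mask = attention_mask[:MAX_LEN]
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


if __name__ == "__main__":
    vectors = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]  # last row is padding
    print(mean_pool(vectors, [1, 1, 0]))
```

The result is one fixed-size vector per input, regardless of how many tokens the text contains.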

best for

·Semantic text search and retrieval
·Document and query embedding for RAG
·Sentence similarity and clustering
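For the search and similarity uses listed above, embeddings are typically compared with a similarity score. The sketch below ranks documents against a query by cosine similarity, which is a common choice and an assumption here rather than something this page specifies; the vectors are made-up toy values, not model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    print(rank_documents(query, docs))
```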

FAQ

What input format does LLM2Vec Sheared LLaMA Supervised accept?

It accepts text strings. For queries, prepend an instruction; for documents, no instruction is needed. Maximum sequence length is 512 tokens.
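A minimal sketch of that asymmetry between queries and documents is shown below. The helper name, the example instruction, and the single-space separator are all assumptions made for illustration; check the model's own documentation for the exact instruction wording and how it expects the instruction to be joined to the query.

```python
def prepare_inputs(queries, documents, instruction):
    """Prefix each query with the instruction; leave documents unchanged."""
    query_inputs = [f"{instruction} {q}".strip() for q in queries]
    return query_inputs, list(documents)


if __name__ == "__main__":
    # Hypothetical instruction text, used only as an example.
    instruction = "Given a web search query, retrieve relevant passages:"
    q, d = prepare_inputs(
        ["how does mean pooling work"],
        ["Mean pooling averages token representations."],
        instruction,
    )
    print(q)
    print(d)
```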

How do I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending a POST request with the model name and input text in the standard embedding format.
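The sketch below builds the headers and JSON body for such a request without sending it. It follows the standard OpenAI embeddings request shape (`model` and `input` fields, bearer-token authorization); the model identifier is assumed to match the repository name shown on this page, and the endpoint URL is deliberately left out because it is not given here.

```python
import json

# Assumed identifier: the repository name listed on this page.
DEFAULT_MODEL = "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised"


def build_embedding_request(api_key, texts, model=DEFAULT_MODEL):
    """Return (headers, body) for an OpenAI-style embeddings POST request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": texts})
    return headers, body


if __name__ == "__main__":
    headers, body = build_embedding_request("YOUR_API_KEY", ["hello world"])
    print(headers)
    print(body)
```

The returned pair can be passed to any HTTP client once the model is live and the endpoint URL is known.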

What is the license for this model?

The model is released under the MIT license.

How does LLM2Vec compare to other embedding models?

It achieves state-of-the-art results on MTEB among models trained only on publicly available data (as of May 2024) and outperforms encoder-only models on word-level tasks.

What pooling strategy does the model use by default?

The default pooling method is mean pooling, applied to the token representations.
