LLM2Vec Sheared LLaMA Supervised
McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised
published Apr 2024 · updated Apr 2024
LLM2Vec Sheared LLaMA Supervised is an embedding model built by converting a decoder-only LLM (Sheared-LLaMA) into a text encoder using bidirectional attention and supervised contrastive learning.
specs
| Task | Text Embedding |
| Architecture | Sheared-LLaMA (decoder-only transformer converted to bidirectional encoder) |
| License | MIT |
about this model
LLM2Vec-Sheared-LLaMA-mntp-supervised is an embedding model obtained by transforming a decoder-only large language model into a text encoder. It is produced with the LLM2Vec recipe developed by McGill NLP; the LLM2Vec work was accepted at COLM 2024.
Approach
The LLM2Vec recipe applies three steps to a decoder-only base model, here Sheared-LLaMA: enabling bidirectional attention, training with masked next token prediction (MNTP), and unsupervised contrastive learning. This supervised variant applies supervised contrastive fine-tuning on top of the MNTP-trained model to improve embedding quality.
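Two of the ideas above can be illustrated with a small, dependency-free sketch: the difference between a causal and a bidirectional attention mask, and a generic in-batch contrastive (InfoNCE-style) loss. This is an illustration only, not the training code used for the model. The function names, the toy vectors, and the temperature of 0.05 are all assumptions made for the example.

```python
import math

def causal_mask(n):
    """Decoder-style mask: token i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n):
    """Encoder-style mask: every token may attend to every position."""
    return [[True] * n for _ in range(n)]

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(query, positive, negatives, temperature=0.05):
    """Generic InfoNCE-style loss: cross-entropy of picking the positive
    among the positive plus all negatives. The temperature is an assumed value."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over all candidates.
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_sum_exp - logits[0]

# Toy usage: the loss is low when the positive is closer to the query
# than the negatives are, and high when it is the other way round.
good = contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad = contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
print(good < bad)
```

With a causal mask, early tokens never see later context, which limits how well their representations summarize the whole input. Lifting that restriction is what lets the pooled output act as a sequence-level embedding.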
Performance
LLM2Vec achieves state-of-the-art results on the Massive Text Embedding Benchmark (MTEB) among models trained exclusively on publicly available data (as of May 24, 2024). It also establishes a new unsupervised state of the art on MTEB. Comparisons against encoder-only models show significant gains on word-level tasks.
Architecture
Built on Sheared-LLaMA, the model uses mean pooling and a maximum sequence length of 512 tokens. The project supports additional base architectures including Meta-Llama-3, Gemma, and Qwen-2. The model is released under the MIT license.
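As a rough illustration of mean pooling with a 512-token limit, the sketch below averages per-token vectors while ignoring padding positions. It is a generic masked mean, not the model's actual implementation. The function name `mean_pool`, the truncation strategy (keep the first 512 tokens), and the toy vectors are assumptions.

```python
MAX_TOKENS = 512  # maximum sequence length stated on this card

def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens (mask == 1), skipping padding.

    Assumes inputs longer than MAX_TOKENS are truncated from the right.
    """
    if not token_vectors:
        raise ValueError("no tokens to pool")
    token_vectors = token_vectors[:MAX_TOKENS]
    attention_mask = attention_mask[:MAX_TOKENS]

    dim = len(token_vectors[0])
    totals = [0.0] * dim
    count = 0
    for vector, keep in zip(token_vectors, attention_mask):
        if keep:
            count += 1
            for k in range(dim):
                totals[k] += vector[k]
    if count == 0:
        raise ValueError("no unmasked tokens to pool")
    return [total / count for total in totals]

# Toy usage: the third position is padding and does not affect the result.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], [1, 1, 0])
print(pooled)
```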
best for
- Semantic text search and retrieval
- Document and query embedding for RAG
- Sentence similarity and clustering
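The use cases above typically come down to comparing embedding vectors. A minimal sketch of ranking documents against a query by cosine similarity follows. The vectors are toy values standing in for real embeddings, and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_documents(query_vector, document_vectors):
    """Return document indices ordered from most to least similar."""
    scored = [
        (cosine_similarity(query_vector, vector), index)
        for index, vector in enumerate(document_vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored]

# Toy usage with 2-dimensional stand-ins for real embeddings.
ranking = rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
print(ranking)
```

The same similarity function can feed a clustering routine or a nearest-neighbor index when the collection grows beyond what a linear scan handles comfortably.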
FAQ
It accepts text strings. For queries, prepend an instruction; for documents, no instruction is needed. Maximum sequence length is 512 tokens.
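A small sketch of the query and document convention described above. This card does not specify how the instruction is joined to the query, so the single-space separator, the helper names, and the example instruction text are all assumptions.

```python
def format_query(instruction, query):
    """Prepend a task instruction to a query.

    Joining with a single space is an assumption; check the serving
    API for the exact format it expects.
    """
    return f"{instruction.strip()} {query.strip()}"

def format_document(document):
    """Documents take no instruction; only surrounding whitespace is trimmed."""
    return document.strip()

# Hypothetical instruction text, used only for illustration.
instruction = "Given a web search query, retrieve relevant passages that answer the query:"
print(format_query(instruction, "what is masked next token prediction?"))
print(format_document("  MNTP is a training objective used in LLM2Vec.  "))
```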
Use the OpenAI-compatible endpoint with your API key, sending a POST request with the model name and input text in the standard embedding format.
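The request described above might be assembled as follows. This sketch only builds the request and does not send it. The base URL is a placeholder, and using the repository name as the hosted model name is an assumption; both should be replaced with the values from your provider. The payload and header shapes follow the standard OpenAI embeddings format.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, not a real endpoint
MODEL = "McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised"  # assumed hosted name

def build_embedding_request(texts, api_key):
    """Build a POST request in the OpenAI-compatible embeddings format."""
    body = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_embedding_request(["hello world"], api_key="YOUR_API_KEY")
print(request.get_method(), request.full_url)

# To actually send it (requires a real endpoint and key):
#   with urllib.request.urlopen(request) as response:
#       payload = json.load(response)
#   In the OpenAI format, each vector is at payload["data"][i]["embedding"].
```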
The model is released under the MIT license.
It achieves state-of-the-art on MTEB among models trained only on publicly available data (as of May 2024) and outperforms encoder-only models on word-level tasks.
The default pooling method is mean pooling, applied to the token representations.