LLM2Vec Llama 2 7B Chat (Supervised)
McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised
published Apr 2024 · updated Apr 2024
LLM2Vec Llama 2 7B Chat (Supervised) is a text embedding model built by converting the decoder-only Llama-2-7b-chat LLM into a text encoder using bidirectional attention, masked next token prediction, and supervised contrastive learning.
specs
| Spec | Value |
| --- | --- |
| Task | Embedding |
| Architecture | Decoder-only LLM with bidirectional attention (LLM2Vec) + LoRA |
| Parameters | 7B |
| License | MIT |
| Max Sequence Length | 512 tokens |
| Pooling | Mean |
about this model
McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised is a text embedding model obtained by converting the Llama-2-7b-chat decoder-only language model into a bidirectional text encoder with the LLM2Vec recipe, then applying supervised contrastive fine-tuning on publicly available data.
Method
The LLM2Vec approach (BehnamGhader et al., COLM 2024) transforms any decoder-only LLM into a universal text encoder through three steps: enabling bidirectional attention, training with masked next token prediction (MNTP), and unsupervised contrastive learning (SimCSE). The supervised variant adds a second LoRA adapter trained on public E5 data with supervised contrastive learning, building on the MNTP LoRA weights.
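The first two steps can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names are hypothetical, and the MNTP helper assumes the convention described in the LLM2Vec paper, where the masked token at position i is predicted from the representation at position i - 1.

```python
def causal_mask(n):
    """Decoder-only default: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """After enabling bidirectional attention, every position sees every other one."""
    return [[True] * n for _ in range(n)]


def mntp_targets(token_ids, masked_positions):
    """Pair each masked position i with the position whose logits predict it (i - 1).

    Returns (logit_position, target_token_id) tuples. Position 0 has no
    predecessor, so it is skipped in this sketch.
    """
    return [(i - 1, token_ids[i]) for i in masked_positions if i > 0]
```

In a real model the boolean masks would be applied inside each attention layer; here they only show which token pairs are allowed to interact.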
Performance
The LLM2Vec paper reports that the unsupervised recipe sets a new state-of-the-art among unsupervised models on the Massive Text Embeddings Benchmark (MTEB). Combined with supervised contrastive learning, the approach used for this model, the recipe reaches state-of-the-art on MTEB among models trained exclusively on publicly available data (as of May 24, 2024). The model uses mean pooling with a maximum sequence length of 512 tokens and is optimized for embedding tasks such as retrieval, semantic similarity, and classification.
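Mean pooling with a length cap can be sketched as follows. This is a simplified illustration in plain Python, not the model's own code: `mean_pool` and its arguments are hypothetical names, and truncation to the first 512 tokens is assumed as the way the limit is enforced.

```python
MAX_LEN = 512  # maximum sequence length stated for this model


def mean_pool(token_vectors, attention_mask, max_len=MAX_LEN):
    """Average the per-token vectors of the non-padding tokens.

    token_vectors: list of equal-length float lists, one per token.
    attention_mask: list of 1 (real token) or 0 (padding), same length.
    Tokens beyond max_len are dropped before pooling (an assumption).
    """
    token_vectors = token_vectors[:max_len]
    attention_mask = attention_mask[:max_len]
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]
```

The result is a single fixed-size vector per input text, which is what downstream retrieval or similarity code consumes.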
Architecture and Usage
Built on Llama-2-7b-chat-hf with bidirectional attention enabled, the model uses LoRA adapters (MNTP + supervised) merged into the base model. It is available under the MIT license and is being onboarded to gigarouter as a managed API, where inference requires only an API call.
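Merging a LoRA adapter into a base weight can be sketched with the standard LoRA update, W' = W + (alpha / r) * B * A. The sketch below uses plain Python lists and illustrative values; the actual rank, alpha, and merge procedure for this checkpoint are not stated here, so treat every number as an assumption.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a).

    weight: d_out x d_in, lora_b: d_out x rank, lora_a: rank x d_in.
    """
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]
```

Applying `merge_lora` twice, once per adapter, mirrors the idea of stacking the MNTP and supervised adapters on the same base weights.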
best for
- Web search query and passage retrieval
- Semantic similarity between texts
- Document clustering and classification
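For the retrieval and similarity use cases listed above, embeddings are typically compared with cosine similarity. The following is a minimal sketch with toy vectors; `cosine` and `rank_passages` are hypothetical helper names, and real embeddings from the model would replace the hand-written lists.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored]
```

The same `cosine` function can feed a clustering or nearest-neighbour classifier, since both only need pairwise similarities.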
FAQ
LLM2Vec is a recipe for converting decoder-only LLMs into text encoders by enabling bidirectional attention, training with masked next token prediction, and applying contrastive learning.
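The supervised variant adds LoRA weights on top of the MNTP model, trained with supervised contrastive learning on public data. The LLM2Vec paper reports state-of-the-art MTEB results for this approach among models trained only on publicly available data (as of May 24, 2024).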
The model supports a maximum sequence length of 512 tokens.
Call the OpenAI-compatible endpoint with your API key, passing text inputs in the standard embeddings request format.
The model is released under the MIT license.
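The OpenAI-compatible request mentioned in the FAQ could look roughly like the sketch below. The request and response shapes follow the standard OpenAI embeddings format (`model` and `input` in the body, embeddings under `data`). The base URL and the hosted model id are placeholders, since this page gives neither; substitute the values from your account.

```python
import json
import urllib.request

# Placeholders (assumptions): this page does not state the real endpoint
# or the model id used by the hosted API.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised"


def build_embeddings_request(texts, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) a POST request in the OpenAI embeddings format."""
    body = json.dumps({"model": model, "input": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_embeddings(response_json):
    """Extract embedding vectors from a parsed response, ordered by input index."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]
```

To send the request, pass the returned object to `urllib.request.urlopen`, decode the body with `json.load`, and hand the result to `parse_embeddings`.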
We're benchmarking and onboarding LLM2Vec Llama 2 7B Chat (Supervised) as a hosted, OpenAI-compatible API.