LLM2Vec Llama 2 7B Chat (Supervised)

McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised

published Apr 2024 · updated Apr 2024

LLM2Vec Llama 2 7B Chat (Supervised) is a text embedding model built by turning a decoder-only LLM into a text encoder through bidirectional attention, masked next token prediction, and supervised contrastive learning.

status

coming soon

license

mit

specs

Task	Embedding
Architecture	Decoder-only LLM with bidirectional attention (LLM2Vec) + LoRA
Parameters	7B
License	MIT
Max Sequence Length	512 tokens
Pooling	Mean

about this model

McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised is a text embedding model built by converting the Llama-2-7b-chat decoder-only language model into a bidirectional text encoder with the LLM2Vec recipe, then fine-tuning it with supervised contrastive learning on publicly available data.

Method

The LLM2Vec approach (BehnamGhader et al., COLM 2024) transforms any decoder-only LLM into a universal text encoder through three steps: enabling bidirectional attention, training with masked next token prediction (MNTP), and unsupervised contrastive learning (SimCSE). The supervised variant adds a second LoRA adapter trained on public E5 data with supervised contrastive learning, building on the MNTP LoRA weights.
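The first step, enabling bidirectional attention, comes down to changing which positions each token may attend to. The sketch below is illustrative only and is not the LLM2Vec implementation: `attention_mask` is a hypothetical helper that contrasts the lower-triangular mask of a decoder-only model with the all-ones mask of a bidirectional encoder.

```python
def attention_mask(seq_len: int, bidirectional: bool) -> list[list[int]]:
    """Return a seq_len x seq_len mask; 1 means query row i may attend to key column j."""
    return [
        # Causal: token i sees only positions j <= i. Bidirectional: it sees every position.
        [1 if bidirectional or j <= i else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


causal = attention_mask(3, bidirectional=False)  # lower-triangular mask
full = attention_mask(3, bidirectional=True)     # all-ones mask

for name, mask in (("causal", causal), ("bidirectional", full)):
    print(name)
    for row in mask:
        print(" ", row)
```

With the all-ones mask, every token representation can draw on context from both sides, which is the property the MNTP and contrastive training steps then adapt the model to use.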

Performance

The LLM2Vec authors report that the unsupervised recipe set a new unsupervised state-of-the-art on the Massive Text Embeddings Benchmark (MTEB). Combined with supervised contrastive learning, as in this model, it reached state-of-the-art on MTEB among models trained exclusively on publicly available data (as of May 24, 2024). The model uses mean pooling with a maximum sequence length of 512 tokens and is optimized for embedding tasks such as retrieval, semantic similarity, and classification.
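Mean pooling with a 512-token limit can be sketched in a few lines. This is a generic masked mean pool under simplifying assumptions, not the model's actual pooling code: `mean_pool` is a hypothetical helper, vectors are plain Python lists, and truncation is applied to the token vectors here, whereas real pipelines usually truncate during tokenization.

```python
MAX_SEQ_LEN = 512  # maximum sequence length from the specs above


def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    # Simplification: keep only the first MAX_SEQ_LEN positions.
    token_vectors = token_vectors[:MAX_SEQ_LEN]
    attention_mask = attention_mask[:MAX_SEQ_LEN]

    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")

    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Two real tokens followed by one padding token (mask 0).
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
print(pooled)
```

The padding vector is excluded from the average, so padded and unpadded copies of the same text produce the same embedding.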

Architecture and Usage

Built on Llama-2-7b-chat-hf with bidirectional attention enabled, the model uses LoRA adapters (MNTP + supervised) merged into the base model. It is released under the MIT license and is being onboarded to gigarouter as a managed, OpenAI-compatible API, so inference will need only a single API call once the model is live.
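Merging a LoRA adapter into a base weight matrix follows the standard LoRA update, W' = W + (alpha / r) * B A. The sketch below shows that arithmetic on a toy 2x2 matrix. The helper names, matrix values, `alpha`, and `rank` are all illustrative assumptions; the adapter hyperparameters actually used for this model are not stated here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA adapter into a weight matrix: W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x r) @ (r x in) -> out x in
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight, 2 x 2
lora_a = [[1.0, 2.0]]              # r x in, with r = 1
lora_b = [[0.5], [1.0]]            # out x r
merged = merge_lora(weight, lora_a, lora_b, alpha=2.0, rank=1)
print(merged)
```

After merging, the adapter adds no extra matrices at inference time; the model runs with a single set of weights per layer.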

best for

·Web search query and passage retrieval
·Semantic similarity between texts
·Document clustering and classification
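All three use cases reduce to comparing embedding vectors. A minimal sketch of passage ranking by cosine similarity follows; it assumes the embeddings have already been computed and uses toy 2-dimensional vectors in place of real model output. `cosine` and `rank_passages` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return passage indices ordered from most to least similar to the query."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(passage_vecs)]
    return [idx for _, idx in sorted(scored, reverse=True)]


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0]
passages = [[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0]]
ranking = rank_passages(query, passages)
print(ranking)
```

The same similarity scores can feed a clustering algorithm or a nearest-neighbour classifier instead of a ranked list.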

FAQ

What is LLM2Vec?

LLM2Vec is a recipe for converting decoder-only LLMs into text encoders in three steps: enabling bidirectional attention, training with masked next token prediction (MNTP), and applying unsupervised contrastive learning.

What is the difference between the supervised and unsupervised variants?

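The unsupervised variant is trained with MNTP and unsupervised contrastive learning only. The supervised variant adds a second set of LoRA weights on top of the MNTP model, trained with supervised contrastive learning on public data. This combination reached state-of-the-art on MTEB among models trained exclusively on publicly available data (as of May 24, 2024).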

What is the maximum input length for this embedding model?

The model supports a maximum sequence length of 512 tokens.

How can I use this model via the gigarouter API?

The model is not yet live. Once it is, call the OpenAI-compatible embeddings endpoint with your API key, passing text inputs in the standard embeddings request format.
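As a rough sketch of what such a request could look like, the snippet below builds (but does not send) a POST in the standard OpenAI embeddings format, with a JSON body containing `model` and `input`. The base URL is a placeholder, and the model identifier accepted by the API is an assumption; check the gigarouter documentation for the real values once the model is available.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com/v1"  # placeholder, NOT the real gigarouter URL
MODEL_ID = "McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised"  # assumed identifier


def build_embeddings_request(texts, api_key):
    """Build an OpenAI-style embeddings request without sending it."""
    payload = {"model": MODEL_ID, "input": texts}
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embeddings_request(["what is LLM2Vec?"], api_key="YOUR_API_KEY")
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(body)
```

Sending the request with `urllib.request.urlopen(request)` against a live OpenAI-compatible endpoint would return JSON in which each entry of `data` carries an `embedding` list, one per input text.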

What license does this model use?

The model is released under the MIT license.

not yet live

We're benchmarking and onboarding LLM2Vec Llama 2 7B Chat (Supervised) as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related embeddings models

nomic-embed-text-v1.5