LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE)
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse
published Apr 2024 · updated Apr 2024
LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE) is a text embedding model that converts a decoder-only LLM into a bidirectional text encoder using masked next token prediction and unsupervised contrastive learning.
specs
| Spec | Value |
|---|---|
| Task | Text embedding / encoding |
| Architecture | Llama 3 8B with bidirectional attention, MNTP and SimCSE LoRA adapters |
| Parameters | 8 billion (base model) |
| License | MIT |
| Max Sequence Length | 512 tokens |
about this model
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse is an unsupervised text embedding model built by converting a decoder-only large language model into a text encoder with a three-step recipe: (1) enable bidirectional attention, (2) adapt the model to bidirectional attention with masked next token prediction (MNTP), and (3) train with unsupervised contrastive learning (SimCSE). Starting from Meta-Llama-3-8B-Instruct, this variant merges the MNTP LoRA weights into the base model and then applies the unsupervised SimCSE LoRA weights on top.
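The first two steps can be illustrated with a toy, stdlib-only sketch; it is not the LLM2Vec implementation. A causal mask lets token i attend only to positions up to i, while a bidirectional mask lets every token attend to every position. In MNTP, the token masked at position i is predicted from the model's output at position i-1, so each label is shifted one position to the left. The helper names, the mask token id, the masking probability, and the `-100` "ignore" label are assumptions chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # label value excluded from the loss (a common PyTorch convention, assumed here)

def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: position i may attend to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]

def mntp_example(
    token_ids: Sequence[int], mask_id: int, mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Mask random tokens and place each target one position to the left.

    The token masked at position i is predicted from the output at
    position i - 1, so position 0 is never masked in this sketch.
    """
    inputs = list(token_ids)
    labels = [IGNORE_INDEX] * len(token_ids)
    for i in range(1, len(token_ids)):
        if rng.random() < mask_prob:
            inputs[i] = mask_id
            labels[i - 1] = token_ids[i]  # target lives at the previous position
    return inputs, labels

print(causal_mask(3))         # [[1, 0, 0], [1, 1, 0], [1, 1, 1]]
print(bidirectional_mask(3))  # [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
inputs, labels = mntp_example([11, 12, 13, 14], mask_id=0, mask_prob=1.0, rng=random.Random(0))
print(inputs, labels)         # [11, 0, 0, 0] [12, 13, 14, -100]
```

With `mask_prob=1.0` every eligible token is masked, which makes the one-position label shift easy to see.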
The method, introduced in the paper “LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders” (accepted at COLM 2024), shows that decoder-only LLMs can be adapted into universal text encoders without expensive labeled data or synthetic training data. On the Massive Text Embeddings Benchmark (MTEB), this unsupervised variant achieves state-of-the-art performance among unsupervised models. The default maximum sequence length is 512 tokens.
Architecture and Usage
The model mean-pools its final-layer token representations into a single embedding and expects queries to carry a task instruction. For retrieval tasks, queries are prefixed with a task instruction (e.g., “Given a web search query, retrieve relevant passages that answer the query:”), while documents are encoded without instructions. Query and document embeddings are then ranked by cosine similarity.
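The retrieval flow above (instruction-prefixed queries, mean pooling, cosine ranking) can be sketched in plain Python. The 2-d "token vectors" are made up for illustration; with the real model, pooling runs over the encoder's 4096-dimensional final hidden states. Joining the instruction and query with a single space, and the helper names, are assumptions, not the library's API.

```python
import math
from typing import List, Sequence

Vector = List[float]

# Instruction text from this model card; how it is joined to the query is an assumption.
INSTRUCTION = "Given a web search query, retrieve relevant passages that answer the query:"

def format_query(query: str) -> str:
    """Queries carry the task instruction; documents are encoded without one."""
    return f"{INSTRUCTION} {query}"

def mean_pool(token_vectors: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose mask entry is 1, skipping padding (mask 0)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    return [sum(col) / len(kept) for col in zip(*kept)]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product divided by the product of the L2 norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank(query_vec: Vector, doc_vecs: Sequence[Vector]) -> List[int]:
    """Document indices ordered from most to least similar to the query."""
    return sorted(range(len(doc_vecs)), key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)

# Toy token vectors; the third token is padding and is ignored by the pooling.
query_vec = mean_pool([[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]], [1, 1, 0])
print(query_vec)  # [0.5, 0.5]
print(rank(query_vec, [[0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]))  # [1, 0, 2]
```

Masking out padding before averaging matters: without it, the padded `[9.0, 9.0]` row would dominate the pooled vector.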
Key Strengths
- The LLM2Vec recipe can turn any compatible decoder-only LLM into a text encoder while adding only a small number of trainable LoRA parameters.
- Bidirectional attention enables rich contextualized representations from a decoder architecture.
- Unsupervised training eliminates the need for labeled data; further supervised fine-tuning is possible (a supervised variant is also available).
- Beyond Llama 3, the LLM2Vec recipe also supports Llama 3.1, Llama 3.2, Gemma, and Qwen-2 as base models.
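To make "a small number of LoRA parameters" concrete, here is back-of-the-envelope arithmetic for one adapted weight matrix. A LoRA adapter represents the update to a d_out × d_in weight as two low-rank factors, B (d_out × r) and A (r × d_in), so it adds r·(d_in + d_out) trainable parameters. The 4096 × 4096 shape matches Llama 3 8B's attention query projection; the rank r = 16 is an assumption for illustration, not necessarily the value used to train this checkpoint.

```python
def lora_params(d_in: int, d_out: int, lora_rank: int) -> int:
    """Trainable parameters in LoRA factors A (lora_rank x d_in) and B (d_out x lora_rank)."""
    return lora_rank * (d_in + d_out)

def dense_params(d_in: int, d_out: int) -> int:
    """Parameters in the frozen weight matrix being adapted (bias ignored)."""
    return d_in * d_out

hidden = 4096  # Llama 3 8B hidden size; the query projection is hidden x hidden
r = 16         # assumed LoRA rank, for illustration only

added = lora_params(hidden, hidden, r)
dense = dense_params(hidden, hidden)
print(added, dense, f"{100 * added / dense:.2f}%")  # 131072 16777216 0.78%
```

The same ratio applies to every adapted projection of this shape, which is why the total adapter size stays a small fraction of the 8B base weights.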
Licensed under MIT. For research and production embedding tasks requiring high-quality unsupervised representations, this model provides a strong, parameter-efficient alternative to encoder-only architectures.
best for
- Retrieving relevant passages for web search queries
- Measuring semantic similarity between sentences
- Clustering or classifying documents by meaning
FAQ
The supervised variant is trained with supervised contrastive learning on public E5 data, while this unsupervised version uses only unsupervised contrastive learning (SimCSE).
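Unsupervised SimCSE builds positive pairs by encoding the same sentence twice with different dropout masks; the other sentences in the batch serve as negatives, and the loss is cross-entropy over temperature-scaled cosine similarities (InfoNCE). The stdlib-only sketch below computes that loss on precomputed embeddings. The `noisy_view` helper is a hypothetical stand-in that adds Gaussian noise instead of running a real dropout pass, and the temperature of 0.05 (the default from the original SimCSE work) is an assumption, not necessarily this model's training setting.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def simcse_loss(view1: Sequence[Vector], view2: Sequence[Vector], temperature: float = 0.05) -> float:
    """InfoNCE over a batch: view1[i]'s positive is view2[i]; every other view2[j] is a negative."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n

def noisy_view(vec: Vector, sigma: float, rng: random.Random) -> Vector:
    """Hypothetical stand-in for a dropout pass: add small Gaussian noise."""
    return [x + rng.gauss(0.0, sigma) for x in vec]

rng = random.Random(0)
batch = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # toy sentence embeddings
view1 = [noisy_view(v, 0.05, rng) for v in batch]
view2 = [noisy_view(v, 0.05, rng) for v in batch]
print(simcse_loss(view1, view2))  # small: each row is closest to its own second view
```

Shuffling `view2` so the pairs no longer line up drives the loss up sharply, which is the signal that pulls two views of the same sentence together.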
Queries should be prefixed with an instruction (e.g., "Given a web search query, retrieve relevant passages that answer the query:") followed by the query text. Documents do not require instructions.
The model outputs a fixed-size embedding vector (4096 dimensions, the hidden size of Llama 3 8B). Cosine similarity can be used to compare query and document vectors.
Use the OpenAI-compatible endpoint with your API key, set the model name to "llm2vec-meta-llama-3-8b-instruct-unsup-simcse", and pass the text to embed as the input, with queries prefixed by the task instruction and documents sent as-is.
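Because the hosted endpoint is not live yet, the sketch below only builds the JSON body that an OpenAI-compatible embeddings request would carry and parses an OpenAI-style response; it sends nothing. The `model`/`input` request fields and the `data[].index`/`data[].embedding` response fields follow the OpenAI embeddings convention; the helper names are hypothetical, and prepending the instruction with a single space is an assumption about what the hosted service expects.

```python
import json
from typing import Dict, List

MODEL = "llm2vec-meta-llama-3-8b-instruct-unsup-simcse"
INSTRUCTION = "Given a web search query, retrieve relevant passages that answer the query:"

def build_embeddings_request(queries: List[str], documents: List[str]) -> Dict:
    """Queries get the instruction prefix; documents are sent unchanged."""
    inputs = [f"{INSTRUCTION} {q}" for q in queries] + list(documents)
    return {"model": MODEL, "input": inputs}

def parse_embeddings_response(body: str) -> List[List[float]]:
    """Extract vectors from an OpenAI-style response, ordered by each item's `index`."""
    data = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(data, key=lambda d: d["index"])]

payload = build_embeddings_request(
    ["what is llm2vec"], ["LLM2Vec turns decoder-only LLMs into text encoders."]
)
print(json.dumps(payload, indent=2))
```

Once a base URL is published, the same `model` and `input` values could be passed to the official `openai` Python client's `client.embeddings.create(...)` call.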
It is released under the MIT license.
We're benchmarking and onboarding LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE) as a hosted, OpenAI-compatible API.