
LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE)

McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse

published Apr 2024 · updated Apr 2024

LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE) is a text embedding model that converts a decoder-only LLM into a bidirectional text encoder using masked next token prediction and unsupervised contrastive learning.

status: coming soon
API providers: 0
downloads / mo: 446
license: MIT

specs

Task: Text embedding / encoding
Architecture: Llama 3 8B with bidirectional attention, MNTP and SimCSE LoRA adapters
Parameters: 8 billion (base model)
License: MIT
Max Sequence Length: 512 tokens

about this model

LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-unsup-simcse is an unsupervised text embedding model that transforms a decoder-only large language model into a text encoder using a three-step recipe: enabling bidirectional attention, training with masked next token prediction (MNTP), and applying unsupervised contrastive learning (SimCSE). Built on Meta-Llama-3-8B-Instruct, this variant first merges the MNTP LoRA weights into the base model and then adds the unsupervised SimCSE LoRA weights on top.
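The adapter stacking described above can be illustrated with a small, dependency-free sketch of the standard LoRA merge, W + (alpha / rank) * B @ A, applied twice. The 2x2 weight, the rank-1 adapters, the scaling values, and the names `merge_lora`, `W_base`, `mntp_*`, and `simcse_*` are all hypothetical toy values chosen for illustration; they are not taken from the released checkpoint.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Fold a LoRA update into a dense weight: W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 base weight with rank-1 adapters (all values are hypothetical).
W_base = [[1.0, 0.0], [0.0, 1.0]]
mntp_a, mntp_b = [[1.0, 2.0]], [[0.5], [0.0]]      # A is 1x2, B is 2x1
simcse_a, simcse_b = [[0.0, 1.0]], [[0.0], [2.0]]  # A is 1x2, B is 2x1

# Step 1: merge the MNTP adapter into the base weight.
W_mntp = merge_lora(W_base, mntp_a, mntp_b, alpha=2, rank=1)

# Step 2: apply the SimCSE adapter on top of the already-merged weight.
W_final = merge_lora(W_mntp, simcse_a, simcse_b, alpha=2, rank=1)
```

Because each adapter only adds a low-rank delta, the second adapter can be trained and shipped separately while the first is baked into the weights it builds on.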

The method, introduced in the paper “LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders” (accepted at COLM 2024), demonstrates that decoder-only LLMs can be adapted into universal text encoders without costly supervised data or synthetic data generation. On the Massive Text Embedding Benchmark (MTEB), this unsupervised variant is reported to set a new state of the art among unsupervised models at the time of release. The default maximum sequence length is 512 tokens.

Architecture and Usage

The model uses mean pooling and is optimized for instruction-based queries. For retrieval tasks, queries are prefixed with a task instruction (e.g., “Given a web search query, retrieve relevant passages that answer the query:”) while documents are encoded without instructions. Cosine similarity between query and document embeddings is used for ranking.
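As a rough illustration of the pooling and ranking steps described above, the following sketch averages token vectors while ignoring padding, then orders documents by cosine similarity to the query. The two-dimensional vectors are hypothetical stand-ins for real model outputs, and the helper names (`mean_pool`, `cosine`, `rank_documents`) are illustrative rather than part of any library.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average token vectors, skipping padding positions (mask == 0)."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep]
    return [sum(dim) / len(kept) for dim in zip(*kept)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_documents(query_vec, doc_vecs):
    """Return document indices sorted by descending cosine similarity."""
    scores = [cosine(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical token vectors for one query; the last position is padding.
query_tokens = [[1.0, 0.0], [0.0, 1.0], [9.0, 9.0]]
query_mask = [1, 1, 0]
query_vec = mean_pool(query_tokens, query_mask)

# Hypothetical, already-pooled document embeddings.
doc_vecs = [[1.0, 0.0], [-1.0, -1.0], [2.0, 2.0]]
ranking = rank_documents(query_vec, doc_vecs)
```

Cosine similarity ignores vector length, so the third document ranks first here even though its norm differs from the query's.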

Key Strengths

  • Transforms any compatible decoder-only LLM into a text encoder with minimal parameter overhead via LoRA.
  • Bidirectional attention enables rich contextualized representations from a decoder architecture.
  • Unsupervised training eliminates the need for labeled data; further supervised fine-tuning is possible (a supervised variant is also available).
  • The LLM2Vec recipe is not tied to this checkpoint: it also supports Llama 3.1, Llama 3.2, Gemma, and Qwen-2 architectures as base models.
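To show what bidirectional attention changes relative to a standard decoder, here is a minimal sketch contrasting a causal mask with a fully bidirectional one. The function names are hypothetical, and real implementations build these masks as tensors inside the attention layers rather than as Python lists.

```python
def causal_mask(n):
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every token may attend to every token, including later ones."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask, i):
    """List the positions that token i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


seq_len = 4
causal_view = visible_positions(causal_mask(seq_len), 1)
bidirectional_view = visible_positions(bidirectional_mask(seq_len), 1)
```

Under the causal mask the second token sees only itself and the first token; under the bidirectional mask it sees the whole sequence, which is what lets every token representation reflect the full input.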

Licensed under MIT. For research and production embedding tasks requiring high-quality unsupervised representations, this model provides a strong, parameter-efficient alternative to encoder-only architectures.

FAQ

What is the difference between this model and the supervised version of LLM2Vec?

The supervised variant is trained with supervised contrastive learning on public E5 data, while this unsupervised version uses only unsupervised contrastive learning (SimCSE).
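For readers unfamiliar with unsupervised SimCSE, the sketch below shows the usual form of its contrastive objective: each sentence is encoded twice with different dropout noise, the two views form a positive pair, and the other sentences in the batch act as negatives. The toy vectors stand in for the two dropout views, and the temperature of 0.05 is an assumed value for illustration, not a setting confirmed for this model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def simcse_loss(view_a, view_b, temperature=0.05):
    """Mean InfoNCE loss where view_b[i] is the positive for view_a[i].

    All other rows of view_b serve as in-batch negatives.
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical embeddings of the same two sentences under two dropout masks.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = simcse_loss(view_a, view_b)           # positives match their anchors
mismatched = simcse_loss(view_a, view_b[::-1])  # positives deliberately swapped
```

The loss is near zero when each anchor is closest to its own second view and grows when the pairs are swapped. A supervised variant would keep the same loss shape but draw positives from labeled pairs instead of dropout views.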

What input format does the model expect?

Queries should be prefixed with an instruction (e.g., "Given a web search query, retrieve relevant passages that answer the query:") followed by the query text. Documents do not require instructions.
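A minimal sketch of preparing inputs in this format follows. The `[instruction, text]` pair layout is an assumption modeled on the LLM2Vec library's usage examples; a hosted API may instead expect a single concatenated string. The helper names and the sample texts are hypothetical.

```python
INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query:"
)


def prepare_queries(queries, instruction=INSTRUCTION):
    """Pair each query with the task instruction as [instruction, text]."""
    return [[instruction, query] for query in queries]


def prepare_documents(documents):
    """Documents are passed through as plain strings, with no instruction."""
    return list(documents)


query_inputs = prepare_queries(["how do solar panels work"])
document_inputs = prepare_documents(
    ["Solar panels convert sunlight into electricity using photovoltaic cells."]
)
```

Keeping the instruction separate from the query text makes it easy to swap in a different task instruction without touching the queries themselves.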

What is the output of this model?

The model outputs one fixed-size embedding vector per input text, obtained by mean pooling over the token representations. Cosine similarity can be used to compare query and document vectors.

How can I call this model via the gigarouter API?

The model is not yet live on gigarouter. Once it is available, use the OpenAI-compatible endpoint with your API key, set the model name to "llm2vec-meta-llama-3-8b-instruct-unsup-simcse", and pass the text to embed as input, using the instruction format described above for queries.

What is the license of this model?

It is released under the MIT license.

not yet live

We're benchmarking and onboarding LLM2Vec Meta Llama 3 8B Instruct (Unsupervised SimCSE) as a hosted, OpenAI-compatible API.
