LLM2Vec Meta Llama 3 8B Instruct Supervised
McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised
published Apr 2024 · updated Apr 2024
LLM2Vec Meta Llama 3 8B Instruct Supervised is a text embedding model obtained by converting a decoder-only LLM into a bidirectional encoder with masked next token prediction (MNTP) and supervised contrastive learning.
specs
| Task | Text Embedding (dense retrieval) |
| Architecture | Meta-Llama-3-8B-Instruct with bidirectional attention and LoRA adapters |
| Parameters | 8B |
| License | MIT |
| Pooling | Mean (default) |
| Max Sequence Length | 512 tokens |
about this model
LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised is a text embedding model built with the LLM2Vec recipe, which converts a decoder-only large language model into a bidirectional encoder, and then fine-tuned with supervised contrastive learning. It is built on Meta-Llama-3-8B-Instruct. The LLM2Vec recipe combines bidirectional attention, masked next token prediction, and unsupervised contrastive learning; this model adds a supervised LoRA adapter trained on public E5 data.
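To make "bidirectional attention" concrete, here is a minimal sketch of the difference between a decoder-style causal mask and an encoder-style bidirectional mask. It is a toy illustration in plain Python, not the LLM2Vec implementation; the function names `causal_mask` and `bidirectional_mask` are made up for this example.

```python
def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


# Print both masks for a 4-token sequence (1 = may attend, 0 = blocked).
for name, mask in (("causal", causal_mask(4)), ("bidirectional", bidirectional_mask(4))):
    print(name)
    for row in mask:
        print(" ", row)
```

With the causal mask, a token's representation never sees the tokens that follow it; with the bidirectional mask, every token sees the whole input, which is the property the conversion is meant to provide for embedding tasks.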
Key Strengths
The LLM2Vec approach lets decoder-only LLMs (1.3B to 8B parameters in the paper's experiments) serve as universal text encoders without expensive adaptation or synthetic data. This model achieves state-of-the-art performance on the Massive Text Embedding Benchmark (MTEB) among models trained exclusively on publicly available data (as of May 24, 2024). The LLM2Vec paper also reports a new unsupervised state of the art on MTEB for the approach, and a large margin over encoder-only models on word-level tasks.
Benchmark Performance
On MTEB, the LLM2Vec paper reports state-of-the-art results in both the unsupervised setting and the supervised setting restricted to public data; this model belongs to the supervised setting. Word-level benchmarks show substantial gains over traditional encoder-only architectures. All evaluations are documented in the LLM2Vec paper (accepted to COLM 2024).
Technical Details
- Pooling: mean pooling, default max sequence length 512 tokens.
- Bidirectional attention enabled by default.
- License: MIT.
- Repository: github.com/McGill-NLP/llm2vec
- Paper: arXiv:2404.05961
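As an illustration of the pooling and length settings listed above, the following sketch mean-pools per-token vectors while skipping padding and truncating to 512 tokens. It uses plain Python lists in place of real hidden-state tensors, and the helper `mean_pool` is a hypothetical name for this example, not a function from the LLM2Vec library.

```python
MAX_LENGTH = 512  # max sequence length stated in this card


def mean_pool(token_vectors, attention_mask, max_length=MAX_LENGTH):
    """Average the vectors of non-padding tokens after truncating to max_length."""
    vectors = token_vectors[:max_length]
    mask = attention_mask[:max_length]
    kept = [vec for vec, m in zip(vectors, mask) if m == 1]
    if not kept:
        raise ValueError("no non-padding tokens to pool")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(mean_pool(tokens, mask))  # [2.0, 3.0]
```

The attention mask matters here: averaging over padding positions would drag the sentence embedding toward whatever values the padding tokens carry.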
best for
- Semantic search and retrieval-augmented generation (RAG)
- Document clustering and similarity comparison
- Sentence and passage embedding for classification
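For the search and similarity use cases above, embeddings are typically compared with cosine similarity. The sketch below ranks a few documents against a query using hand-written 2-dimensional toy vectors; real embeddings from this model would be far higher-dimensional, and the helpers `cosine` and `rank` are illustrative names only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine(query_vec, doc) for doc in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Toy vectors standing in for a query embedding and three document embeddings.
query = [1.0, 0.0]
docs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
print(rank(query, docs))  # [1, 2, 0]
```

In a retrieval pipeline, the top-ranked indices would select the passages handed to the generator in a RAG setup.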
FAQ
The model accepts an instruction paired with the query text for queries, and plain text for documents. Both are encoded as sequences of up to 512 tokens and mean-pooled.
Use the OpenAI-compatible endpoint with your API key: set the model to "LLM2Vec Meta Llama 3 8B Instruct Supervised" and send a request containing the input text and, optionally, an instruction.
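A request of this kind might be assembled as in the sketch below. Several parts are assumptions, since this page does not specify them: the base URL `https://api.example.com/v1` is a placeholder, the `/embeddings` path follows the usual OpenAI-style convention, and the `instruction` field name is a guess (it is not part of the standard OpenAI embeddings schema). The code only builds the request object and does not send it.

```python
import json
import urllib.request

# Hypothetical values: the endpoint URL and request schema are not given on this page.
BASE_URL = "https://api.example.com/v1"
MODEL_NAME = "LLM2Vec Meta Llama 3 8B Instruct Supervised"


def build_embedding_request(text, api_key, instruction=None):
    """Build (but do not send) a POST request for an OpenAI-style embeddings endpoint."""
    payload = {"model": MODEL_NAME, "input": text}
    if instruction is not None:
        # Assumed field name for the optional instruction.
        payload["instruction"] = instruction
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_embedding_request(
    "how much protein should a female eat",
    api_key="YOUR_API_KEY",
    instruction="Given a web search query, retrieve relevant passages that answer the query:",
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen` once a real endpoint and key are available; documents would be sent the same way without the instruction.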
The maximum sequence length is 512 tokens (default in the LLM2Vec wrapper).
It is released under the MIT License.
It achieves state-of-the-art results on the MTEB benchmark among models trained only on publicly available data, outperforming encoder-only models on word-level tasks.
We're benchmarking and onboarding LLM2Vec Meta Llama 3 8B Instruct Supervised as a hosted, OpenAI-compatible API.