Llama Nemotron Embed 1B V2

nvidia/llama-nemotron-embed-1b-v2

published Oct 2025 · updated May 2026

Llama Nemotron Embed 1B V2 is a multilingual embedding model optimized for question-answering retrieval with support for long documents up to 8192 tokens and dynamic embedding sizes.

est. price: ~$0.008 / 1M tokens (estimated, set at launch)

downloads / mo: 658.5K

license: other (NVIDIA Open Model License; see specs)

specs

Task	Embedding
Architecture	Fine-tuned Llama 3.2 1B Transformer
Parameters	1B
Context Length	8192 tokens
License	NVIDIA Open Model License

about this model

llama-nemotron-embed-1b-v2 is an embedding model optimized for multilingual and cross-lingual question-answering retrieval, supporting long documents up to 8192 tokens and dynamic embedding sizes (Matryoshka embeddings). It is a fine-tuned version of Llama 3.2 1B used as a transformer encoder and trained with contrastive learning in a bi-encoder setup, in which queries and passages are embedded independently.
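In a bi-encoder setup, queries and passages are embedded separately, so passage vectors can be computed once and cached, and search reduces to comparing one query vector against many stored vectors. The sketch below shows only that scoring step. It assumes cosine similarity as the comparison function (a common choice, not confirmed here for this model), and the toy 4-dimensional vectors and the `rank_passages` helper are hypothetical stand-ins for real model outputs.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_passages(query_vec: Sequence[float],
                  passage_vecs: Sequence[Sequence[float]],
                  top_k: int = 5) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs, best match first.

    Passage vectors are precomputed independently of the query,
    which is what makes the bi-encoder cheap at search time.
    """
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for model outputs (hypothetical values).
query = [0.9, 0.1, 0.0, 0.2]
passages = [
    [0.1, 0.9, 0.3, 0.0],  # passage 0: mostly unrelated
    [0.8, 0.2, 0.1, 0.3],  # passage 1: close to the query
    [0.0, 0.0, 1.0, 0.0],  # passage 2: orthogonal to the query
]
print(rank_passages(query, passages, top_k=2))  # passage 1 ranks first
```

A real pipeline would store the passage vectors in a vector index rather than scanning a Python list, but the ranking logic is the same.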

Key Capabilities

Multilingual support across 26 languages, including English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.
Context length of up to 8192 tokens, so passages up to that length can be embedded without chunking.
Dynamic embedding size configurable to 384, 512, 768, 1024, or 2048 dimensions, with a reported storage-footprint reduction of up to 35×; on its own, shrinking vectors from 2048 to 384 dimensions cuts per-vector storage by about 5.3×.
Trained on a blend of public QA datasets with commercial-friendly licenses, using 12M samples for semi-supervised pre-training and 1M samples for fine-tuning.
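Matryoshka-style embeddings are usually shortened by keeping the leading components of the full vector and re-normalizing the result. The sketch below assumes this model's smaller sizes can be obtained that way from a 2048-dimensional output on the client side; if the hosted API accepts a size parameter, requesting the size directly is the safer route. The helper names and the fake input vector are hypothetical. The storage calculation shows the raw float32 arithmetic behind the 2048-to-384 reduction.

```python
import math
from typing import List, Sequence

SUPPORTED_DIMS = (384, 512, 768, 1024, 2048)  # sizes listed for this model
BYTES_PER_FLOAT32 = 4


def truncate_embedding(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and L2-normalize (Matryoshka-style)."""
    if dim not in SUPPORTED_DIMS:
        raise ValueError(f"dim must be one of {SUPPORTED_DIMS}")
    if len(vec) < dim:
        raise ValueError("embedding is shorter than the requested dimension")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        return head
    return [x / norm for x in head]


def storage_bytes(num_vectors: int, dim: int) -> int:
    """Raw float32 storage for `num_vectors` embeddings of size `dim`."""
    return num_vectors * dim * BYTES_PER_FLOAT32


# A fake 2048-d vector standing in for a real model output (hypothetical).
full = [((i % 7) - 3) / 10 for i in range(2048)]
short = truncate_embedding(full, 384)
print(len(short), round(sum(x * x for x in short), 6))  # 384 1.0

one_million = 1_000_000
ratio = storage_bytes(one_million, 2048) / storage_bytes(one_million, 384)
print(f"{storage_bytes(one_million, 2048) / 1e9:.2f} GB vs "
      f"{storage_bytes(one_million, 384) / 1e9:.2f} GB ({ratio:.2f}x smaller)")
```

For one million vectors this prints 8.19 GB versus 1.54 GB, a 5.33× reduction from dimension alone; any further savings (such as lower-precision storage) would come from techniques not covered here.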

Evaluation Results

The model was evaluated offline on A100 GPUs using PyTorch checkpoints. The table below reports Recall@5 averaged over four QA benchmarks (NQ, HotpotQA, FiQA, TechQA).

Open & Commercial Retrieval Models	Average Recall@5 (NQ, HotpotQA, FiQA, TechQA)
llama-nemotron-embed-1b-v2 (embedding dim 2048)	68.60%
llama-nemotron-embed-1b-v2 (embedding dim 384)	64.48%
llama-3.2-nv-embedqa-1b-v1 (embedding dim 2048)	68.97%
nv-embedqa-mistral-7b-v2	72.97%
nv-embedqa-mistral-7B-v1	64.93%
nv-embedqa-e5-v5	62.07%
nv-embedqa-e5-v4	57.65%
e5-large-unsupervised	48.03%
BM25	44.67%
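Recall@5 asks, for each query, how many of its relevant passages show up among the top five retrieved results. The sketch below uses one common definition (fraction of a query's relevant passages found in the top k, averaged over queries) and then takes a simple mean across datasets; the exact aggregation behind the table is not stated, so that part is an assumption. The dataset names and retrieval runs are made up.

```python
from statistics import mean
from typing import List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        raise ValueError("each query needs at least one relevant passage")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def dataset_recall(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Mean Recall@k over (retrieved, relevant) pairs for one dataset."""
    return mean(recall_at_k(ret, rel, k) for ret, rel in runs)


# Made-up retrieval runs for two toy datasets (hypothetical data).
toy = {
    "dataset_a": [
        (["d1", "d7", "d3", "d9", "d2"], {"d3"}),        # hit    -> 1.0
        (["d4", "d5", "d6", "d8", "d0"], {"d1"}),        # miss   -> 0.0
    ],
    "dataset_b": [
        (["p2", "p1", "p9", "p4", "p5"], {"p1", "p7"}),  # 1 of 2 -> 0.5
    ],
}
per_dataset = {name: dataset_recall(runs) for name, runs in toy.items()}
average = mean(per_dataset.values())
print(per_dataset, f"average Recall@5 = {average:.2%}")
```

Averaging per dataset first keeps a large benchmark from dominating the headline number, which matches the "average over four benchmarks" framing of the table.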

The model is part of the NVIDIA NeMo Retriever collection, designed for production-ready information retrieval pipelines with enterprise support. gigarouter is onboarding it as a hosted, OpenAI-compatible API, which will remove the need for local installation or hardware management.

best for

·Multilingual question-answering over large text corpora
·Retrieval-Augmented Generation (RAG) pipelines
·Cross-lingual document retrieval

FAQ

What is this model best used for?

It is designed for multilingual and cross-lingual text retrieval, particularly question-answering over large document collections.

How do I call this model via the gigarouter API?

Use the OpenAI-compatible endpoint with your API key, sending text input as a list of strings.
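The request below follows the shape of the OpenAI embeddings API (a POST to `/embeddings` with `model` and `input`). Because the hosted endpoint is not yet live, the base URL, the environment variable names, the exact model ID on gigarouter, and whether the optional `dimensions` field is honored are all assumptions. The sketch builds the request with the Python standard library and only performs network I/O if you call `send`.

```python
import json
import os
import urllib.request
from typing import List, Optional

# Hypothetical values: the real gigarouter base URL and model ID may differ.
BASE_URL = os.environ.get("GIGAROUTER_BASE_URL", "https://api.gigarouter.example/v1")
MODEL_ID = "nvidia/llama-nemotron-embed-1b-v2"


def build_embeddings_request(texts: List[str], api_key: str,
                             dimensions: Optional[int] = None) -> urllib.request.Request:
    """Build an OpenAI-style POST /embeddings request (not sent here)."""
    if not texts or not all(isinstance(t, str) for t in texts):
        raise ValueError("input must be a non-empty list of strings")
    body = {"model": MODEL_ID, "input": texts, "encoding_format": "float"}
    if dimensions is not None:
        # Assumption: the endpoint accepts OpenAI's optional `dimensions` field.
        body["dimensions"] = dimensions
    return urllib.request.Request(
        url=f"{BASE_URL}/embeddings",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (performs network I/O)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_embeddings_request(["What is RAG?"], api_key="YOUR_KEY", dimensions=384)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

NVIDIA's own hosted endpoints for its NeMo Retriever embedding models also take an `input_type` field (`query` or `passage`); whether gigarouter expects or forwards it is not stated, so it is left out of this sketch.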

What input and output formats does the model support?

Input is a list of strings; output is a list of float arrays representing embedding vectors of configurable dimension (384, 512, 768, 1024, or 2048).
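OpenAI-compatible embeddings responses typically return vectors under `data`, each tagged with an `index` that points back to its position in `input`. The sketch below assumes that response shape, restores input order, and checks that every vector has one of the documented sizes. The `extract_embeddings` helper is hypothetical and the sample response is fabricated for illustration.

```python
from typing import Any, Dict, List

ALLOWED_DIMS = {384, 512, 768, 1024, 2048}  # sizes listed for this model


def extract_embeddings(response: Dict[str, Any], expected_count: int) -> List[List[float]]:
    """Return embedding vectors in input order from an OpenAI-style response."""
    items = sorted(response["data"], key=lambda item: item["index"])
    if len(items) != expected_count:
        raise ValueError(f"expected {expected_count} embeddings, got {len(items)}")
    vectors = [list(map(float, item["embedding"])) for item in items]
    dims = {len(v) for v in vectors}
    # All vectors should share one size, and that size should be a documented one.
    if len(dims) != 1 or not dims <= ALLOWED_DIMS:
        raise ValueError(f"unexpected embedding sizes: {sorted(dims)}")
    return vectors


# Fabricated response in the OpenAI embeddings shape, returned out of order.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.2] * 384},
        {"object": "embedding", "index": 0, "embedding": [0.1] * 384},
    ],
    "model": "nvidia/llama-nemotron-embed-1b-v2",
}
vecs = extract_embeddings(sample, expected_count=2)
print(len(vecs), len(vecs[0]), vecs[0][0])  # 2 384 0.1
```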

Which languages does the model support?

It was evaluated on 26 languages including English, Arabic, Chinese, French, German, Japanese, Spanish, and many others.

What is the license for this model?

It is governed by the NVIDIA Open Model License Agreement, with additional terms from the Llama 3.2 Community License.

not yet live

We're benchmarking and onboarding Llama Nemotron Embed 1B V2 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.
