ColBERT Zero supervised
lightonai/ColBERT-Zero-supervised
published Feb 2026 · updated Feb 2026
A popular open embeddings model, with 84 downloads a month. gigarouter benchmarks and hosts it as an OpenAI-compatible API.
about this model
Controlled Comparison
The training was deliberately designed to isolate the impact of multi‑vector pre‑training. It uses the same public data, the same ModernBERT base, and the same pipeline as the dense ModernBERT‑embed model, so the only variable is the contrastive objective. The dense baseline scores 52.89 nDCG@10 on BEIR; ColBERT‑Zero reaches 55.43, which is 2.54 points higher, and in doing so closes a 2.4‑point data quality gap.
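For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@10 can be computed for a single query. It assumes linear gain (the relevance label itself) and a log2 rank discount; the function names and the toy relevance labels are illustrative and are not taken from the model card or from any BEIR tooling. Benchmark scores such as those above are averages of this per-query value, reported on a 0 to 100 scale.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k relevance labels, in ranked order."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """nDCG@k: DCG of the system ranking divided by DCG of the ideal ranking."""
    ideal = sorted(all_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical query: five retrieved documents, 1 = relevant, 0 = not relevant.
ranked = [1, 0, 1, 0, 0]
score = ndcg_at_k(ranked, all_relevances=ranked, k=10)
print(f"nDCG@10 = {score:.4f}")
```

A ranking that places every relevant document first scores 1.0; pushing a relevant document down the list lowers the score through the rank discount.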
Three‑Phase Pipeline
ColBERT‑Zero‑supervised is the output of Phase 2 (supervised contrastive fine‑tuning with mined hard negatives). The full pipeline has three phases:
- Phase 1 – Unsupervised contrastive pre‑training with effective batch sizes of ~16k via GradCache and cross‑GPU gathering.
- Phase 2 – Supervised fine‑tuning on Nomic’s supervised data with mined hard negatives.
- Phase 3 – Knowledge distillation from a Gemma‑based teacher using the MaxSim operator.
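The phases above all rely on scoring a query against a document with MaxSim. The sketch below shows that operator on toy token embeddings, together with a generic InfoNCE-style contrastive loss over the resulting scores. The loss form, the temperature, and all names and values here are assumptions for illustration; the model card does not specify the exact objective, and real training would run on batched tensors rather than Python lists.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take its best-matching
    document token (max dot product), then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def info_nce(scores, positive_index, temperature=1.0):
    """Contrastive loss for one query: negative log-softmax of the positive
    document's score among all candidate scores (assumed loss form)."""
    logits = [s / temperature for s in scores]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[positive_index]

# Toy 2-dimensional token embeddings (made-up values).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_pos = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # matches both query tokens
doc_neg = [[-1.0, 0.0], [0.0, -1.0]]            # a hard negative would score closer

scores = [maxsim(query, doc_pos), maxsim(query, doc_neg)]
loss = info_nce(scores, positive_index=0)
print(scores, round(loss, 4))
```

In a distillation setting, the same MaxSim scores would be the student outputs compared against the teacher's scores, rather than against a one-hot positive label.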
Key Findings
Supervised contrastive fine‑tuning followed by distillation (the combination that includes this checkpoint) achieves 55.12 nDCG@10, which is 99.4% of the full model’s performance at roughly 10% of the compute cost (~40 vs. ~408 GH200‑hours). Prompt alignment is critical: mismatched prompts can silently degrade performance by more than 0.8 points.
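The quality and cost ratios quoted above can be checked with a few lines of arithmetic. This sketch takes 55.43 (the ColBERT‑Zero score reported earlier in this card) as the full model's score, an assumption that is consistent with the 99.4% figure; the GH200‑hour values are the approximate ones given in the text.

```python
full_ndcg = 55.43    # full pipeline score (assumed to be the "full model")
short_ndcg = 55.12   # supervised fine-tuning + distillation
full_hours = 408     # approximate GH200-hours, full pipeline
short_hours = 40     # approximate GH200-hours, shorter pipeline

relative_quality = short_ndcg / full_ndcg
relative_cost = short_hours / full_hours
summary = f"{relative_quality:.1%} of the quality at {relative_cost:.1%} of the compute"
print(summary)  # 99.4% of the quality at 9.8% of the compute
```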
This supervised checkpoint is released for researchers studying the incremental impact of each training phase and prompt alignment.