DeepSeek V4 Pro

deepseek-ai/DeepSeek-V4-Pro-DSpark

published Jun 2026 · updated Jul 2026

DeepSeek V4 Pro is a Mixture-of-Experts text-generation model that supports up to one million tokens context length, with 1.6T total parameters and 49B activated parameters.

status

coming soon

API providers

downloads / mo

9.4K

license

mit

specs

Task	Text Generation
Architecture	Mixture-of-Experts (MoE) with Hybrid Attention (Compressed Sparse Attention + Heavily Compressed Attention)
Parameters	1.6 trillion total, 49 billion activated
License	MIT
Context Length	1 million tokens

about this model

DeepSeek-V4-Pro-DSpark is a text-generation model from the DeepSeek-V4 series, a Mixture-of-Experts (MoE) language model with 1.6 trillion total parameters (49B activated) supporting a context length of one million tokens. It incorporates a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency, Manifold-Constrained Hyper-Connections (mHC) for enhanced signal propagation, and the Muon optimizer for training stability. The model was pre-trained on over 32 trillion tokens and post-trained via a two-stage paradigm of domain-specific expert cultivation followed by unified consolidation through on-policy distillation.

Key Strengths

In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. The model supports three reasoning effort modes: Non-think (fast, intuitive), Think High (conscious logical analysis), and Think Max (maximum reasoning).

Benchmark Results

DeepSeek-V4-Pro-Max achieves competitive results against frontier models across knowledge, reasoning, coding, and agentic benchmarks:

LiveCodeBench (Pass@1): 93.5 (highest among compared models)
Codeforces (Rating): 3206 (highest among compared models)
Apex Shortlist (Pass@1): 90.2 (highest among compared models)
MMLU-Pro (EM): 87.5
GPQA Diamond (Pass@1): 90.1
SWE Verified (Resolved): 80.6
LongBench-V2 (EM): 51.5 (base model)

DeepSeek-V4-Pro-DSpark is the same checkpoint with an additional speculative decoding module attached for faster inference. It is hosted as an OpenAI-compatible API on gigarouter.

best for

·Complex reasoning and problem-solving
·Code generation and analysis
·Long-document analysis and summarization
·Agentic workflow automation

FAQ

What is DeepSeek V4 Pro best for?

It excels at complex reasoning, coding, long-context tasks, and agentic workflows.

What context length does DeepSeek V4 Pro support?

It supports up to one million tokens.

How does DeepSeek V4 Pro compare to DeepSeek V3.2 in efficiency?

In the 1M-token context setting, DeepSeek V4 Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek V3.2.

What is the license for DeepSeek V4 Pro?

It is licensed under MIT.

How can I call DeepSeek V4 Pro via the API?

Use the gigarouter OpenAI-compatible endpoint with your API key.

not yet live

We're benchmarking and onboarding DeepSeek V4 Pro as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.

related text generation models

tiny-Qwen2ForCausalLM-2.5

9.2M dl/mo

deepseek-v4-gguf

6.4M dl/mo

Qwen3.6-35B-A3B-NVFP4

6.2M dl/mo

gemma-3-270m

5.1M dl/mo