Krea 2 Raw
krea/Krea-2-Raw
published Jun 2026 · updated Jun 2026
Krea 2 Raw is a text-to-image diffusion model that generates images from text descriptions, designed as a base checkpoint for fine-tuning and post-training.
specs
| Task | Text-to-Image |
| Architecture | Diffusion Transformer (MMDiT) with 48 attention heads, 12 KV heads, Qwen3-VL text encoder, and Qwen-Image VAE |
| Parameters | 12 billion |
| License | Krea 2 Community License |
about this model
Krea-2-Raw is a text-to-image diffusion model with 12 billion parameters, built on a Diffusion Transformer architecture (single-stream MMDiT) and serving as the base checkpoint of the Krea 2 model family. It is designed primarily for fine-tuning, post-training, and LoRA training, offering high diversity and malleability prior to distillation. The model can generate images up to 1k resolution and is recommended for use with 52 denoising steps and a classifier-free guidance scale of 3.5.
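To make the recommended settings concrete, here is a minimal sketch that applies the standard classifier-free guidance combination to toy noise predictions. `RawSamplingConfig` and `apply_cfg` are hypothetical names for illustration and not part of any Krea API. The sketch assumes "1k" means a 1024-pixel side and that the model's sampler uses the standard guidance formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawSamplingConfig:
    """Hypothetical container for the settings recommended in this card."""
    num_steps: int = 52          # recommended denoising steps
    guidance_scale: float = 3.5  # recommended classifier-free guidance scale
    max_side: int = 1024         # "1k" resolution, assumed to mean 1024 px


def apply_cfg(uncond, cond, scale):
    """Standard CFG: move the unconditional prediction toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


config = RawSamplingConfig()
# Toy per-element predictions; a real model would return tensors.
guided = apply_cfg([0.0, 1.0], [1.0, 1.0], config.guidance_scale)
print(guided)
```

With a scale above 1, elements where the two predictions differ are extrapolated past the conditional value, while elements where they agree are left unchanged.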
Key Strengths
- Ranked #1 text-to-image model from an independent lab on the Artificial Analysis leaderboard (source: Krea 2 repository).
- Uses a Qwen3-VL text encoder and Qwen-Image VAE, with grouped-query attention (48 attention heads, 12 KV heads) and sigmoid-gated attention for efficient processing.
- Trained through a multi-stage pipeline including pretraining, supervised fine-tuning, preference optimization, and reinforcement learning.
- Supports tensor parallelism for distributed inference across 1, 2, or 4 devices.
- Integrates with SGLang’s Cache-DiT acceleration for caching during inference.
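The head counts and device counts listed above can be related with a short sketch. It shows how 48 query heads might share 12 KV heads under grouped-query attention, and how those heads would divide across 1, 2, or 4 tensor-parallel devices. The contiguous grouping and head-wise sharding are assumptions for illustration, and `kv_head_for` and `heads_per_device` are hypothetical helpers, not functions from the Krea 2 repository or SGLang.

```python
NUM_Q_HEADS = 48   # attention (query) heads, from the spec table
NUM_KV_HEADS = 12  # key/value heads, from the spec table
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # query heads sharing one KV head


def kv_head_for(q_head: int) -> int:
    """Map a query head to its KV head, assuming contiguous grouping."""
    return q_head // GROUP_SIZE


def heads_per_device(tp_size: int) -> tuple[int, int]:
    """Split heads evenly across tensor-parallel ranks (assumed head-wise sharding)."""
    if NUM_Q_HEADS % tp_size or NUM_KV_HEADS % tp_size:
        raise ValueError(f"tp_size={tp_size} does not divide the head counts")
    return NUM_Q_HEADS // tp_size, NUM_KV_HEADS // tp_size


for tp in (1, 2, 4):
    print(tp, heads_per_device(tp))
```

Under this assumed scheme, each listed device count divides both head counts evenly, so every rank holds whole groups of four query heads together with their shared KV head.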
Architecture and Training
The model employs a 12-billion-parameter Diffusion Transformer with lightweight timestep modulation and multilayer feature aggregation for text-encoder features. Training data combines publicly available, licensed, and synthetic datasets, filtered for quality and safety. The Raw checkpoint is released under the Krea 2 Community License; commercial licensing is available by contacting the developer.
Risks and Limitations
Krea-2-Raw is a new technology. Outputs may not always match prompts, and the model is not intended for factual information. Deployers must implement content filtering as required by the license. Safety evaluations were conducted for categories including sexually explicit content, non-consensual imagery, and child safety, with high resilience observed against violative inputs.
best for
- Fine-tuning and LoRA training for custom image generation
- Post-training and domain-specific adaptation
- Foundation for building specialized text-to-image models
FAQ
Krea 2 Raw is a base checkpoint intended for fine-tuning and LoRA training, not for direct inference. Use the Turbo variant for high-quality generation.
Krea 2 Raw uses a Diffusion Transformer (MMDiT) with 12 billion parameters, 48 attention heads, grouped-query attention, and a Qwen3-VL text encoder.
Krea 2 Raw is released under the Krea 2 Community License, which imposes content filtering obligations on deployers.
The Raw model generates images up to 1k resolution (e.g., 1024x1024).
Use the gigarouter OpenAI-compatible endpoint with your API key, specifying the model ID `krea/Krea-2-Raw` and standard text-to-image parameters.
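As a rough illustration of what such a call might look like, the sketch below builds an OpenAI-style image generation request without sending it. The base URL is a placeholder, not the provider's real address. The `/images/generations` path and the `size` and `n` fields follow the OpenAI Images API shape; that this endpoint accepts them is an assumption. Only the model ID `krea/Krea-2-Raw` comes from this page.

```python
import json
import urllib.request

# Hypothetical values: replace with the provider's real base URL and your key.
BASE_URL = "https://api.example.invalid/v1"
API_KEY = "YOUR_API_KEY"


def build_image_request(prompt: str, size: str = "1024x1024") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style image generation request."""
    payload = {
        "model": "krea/Krea-2-Raw",  # model ID from this page
        "prompt": prompt,
        "size": size,                # assumed parameter; 1k is the stated maximum
        "n": 1,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_image_request("a lighthouse at dusk, film photograph")
print(request.full_url)
print(json.loads(request.data)["model"])
```

The sketch stops short of sending anything; passing `request` to `urllib.request.urlopen` would issue the call once a real endpoint and key are in place.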
We're benchmarking and onboarding Krea 2 Raw as a hosted, OpenAI-compatible API.