Stable Diffusion V1.5
Comfy-Org/stable-diffusion-v1-5-archive
published Aug 2024 · updated Dec 2025
Stable Diffusion V1.5 is a text-to-image latent diffusion model that generates 512x512 images from text prompts, archived for legacy compatibility.
specs
| Task | Text-to-Image Generation |
| Architecture | Latent Diffusion Model with 860M UNet and CLIP ViT-L/14 text encoder |
| Parameters | 860M (UNet) + 123M (CLIP text encoder) |
| License | CreativeML OpenRAIL-M |
about this model
Capabilities and Architecture
Stable Diffusion v1.5 runs the diffusion process in a compressed latent space rather than in pixel space, which keeps high-resolution image synthesis computationally efficient. It was initialized from the v1-2 weights and fine-tuned for 595k steps at 512x512 resolution on the LAION-Aesthetics V2 5+ dataset, with 10% text-conditioning dropout. The model is based on the CVPR 2022 paper "High-Resolution Image Synthesis With Latent Diffusion Models" (arXiv 2112.10752).
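To make the latent-space and text-dropout ideas above concrete, here is a minimal sketch. It assumes the commonly cited Stable Diffusion v1.x settings of an 8x spatial downsampling factor and 4 latent channels, neither of which is stated in this card. The helper names are illustrative and are not part of any library.

```python
import random

# Assumed SD v1.x constants (not stated in this card): the autoencoder
# shrinks each spatial side by 8 and emits 4 latent channels.
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4


def latent_shape(height, width):
    """Return (channels, height, width) of the latent for an image size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image sides must be multiples of 8")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def compression_ratio(height, width, image_channels=3):
    """Ratio of pixel-space values to latent-space values."""
    c, h, w = latent_shape(height, width)
    return (image_channels * height * width) / (c * h * w)


def drop_text_conditioning(prompts, p_drop=0.10, rng=None):
    """Replace each prompt with an empty string with probability p_drop.

    This mirrors the 10% text-conditioning dropout used in training, which
    is what lets the model also produce unconditional predictions.
    """
    rng = rng or random.Random()
    return ["" if rng.random() < p_drop else p for p in prompts]
```

Under these assumptions, `latent_shape(512, 512)` gives a 4x64x64 latent, so the UNet processes 48 times fewer values than a 512x512 RGB image contains.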
Key Strengths
- Widely adopted foundation model: 872 fine-tuned models are based on this archive, and it receives over 5.8 million monthly downloads on Hugging Face.
- Proven baseline for legacy model testing and reproducibility, with exact hash-identical weights preserved from the original RunwayML upload.
- Available in both FP32 and FP16 precision formats for flexible deployment.
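The hash-identical claim above can be checked locally once a reference digest is available. The following is a minimal sketch using SHA-256 from the Python standard library. This card does not publish a digest, so the expected value must come from a source you trust, and the function names are illustrative.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large checkpoints fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest against a reference hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

A call such as `matches_expected("v1-5-pruned-emaonly.safetensors", reference_digest)` returns `True` only if the downloaded file is byte-for-byte identical to the one the reference digest was computed from.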
Benchmark and Usage Context
As a 2022-generation model, Stable Diffusion v1.5 is several major upgrades behind current state-of-the-art systems. It is best suited for legacy model testing, reproducibility studies, and as a baseline for fine-tuning. The model is English-only and licensed under CreativeML OpenRAIL-M. It is not currently deployed by any major inference provider.
Model Variants
| File | Description |
|---|---|
| v1-5-pruned-emaonly.safetensors | Exact hash-identical original model as uploaded by RunwayML (FP32). |
| v1-5-pruned-emaonly-fp16.safetensors | FP16 conversion with added metadata header. |
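As a rough guide to how the two variants differ in size, the sketch below estimates weight storage from the parameter counts listed in the specs. It covers only the UNet and the CLIP text encoder. The autoencoder, activations, and framework overhead are not included, so real file sizes and VRAM use will be higher.

```python
# Parameter counts taken from the specs table of this card.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000

# Storage cost per parameter for each precision format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_bytes(precision, n_params=UNET_PARAMS + TEXT_ENCODER_PARAMS):
    """Bytes needed to store n_params weights at the given precision."""
    return n_params * BYTES_PER_PARAM[precision]


def weight_gb(precision):
    """Same estimate in decimal gigabytes (1 GB = 10**9 bytes)."""
    return weight_bytes(precision) / 1e9
```

With these counts the estimate is about 3.9 GB of weights in FP32 and about 2.0 GB in FP16, which is why the FP16 file is the usual choice on memory-constrained GPUs.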
Academic Reference
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis With Latent Diffusion Models. CVPR. arXiv:2112.10752.
best for
- Generating 512x512 images from English text prompts for legacy model testing
- Reproducing results from the original Stable Diffusion v1.5 research
FAQ
It is best for generating 512x512 images from English text prompts, particularly for legacy model testing and reproducing original Stable Diffusion v1.5 results.
It requires at least 10GB of VRAM on a GPU to run the 860M parameter UNet and 123M parameter CLIP text encoder.
It uses the CreativeML OpenRAIL-M license.
Use the gigarouter OpenAI-compatible endpoint with your API key to send text prompts and receive generated images.
It accepts English text prompts as input and outputs 512x512 images.
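For the API answer above, here is a hedged sketch of what a request to an OpenAI-compatible image endpoint could look like. The base URL and model identifier are placeholders invented for illustration, since this page gives neither. The `/images/generations` path and the JSON fields follow the OpenAI Images API convention, which the hosted endpoint is assumed to mirror. The code only builds the request and does not send it.

```python
import json
import urllib.request

# Hypothetical placeholders: this page does not state the real values.
BASE_URL = "https://api.example.invalid/v1"
MODEL_ID = "stable-diffusion-v1-5"


def build_image_request(prompt, api_key, base_url=BASE_URL, model=MODEL_ID):
    """Build (but do not send) an OpenAI-style image generation request."""
    payload = {
        "model": model,
        "prompt": prompt,    # English text prompt
        "n": 1,              # number of images requested
        "size": "512x512",   # the model's native output resolution
    }
    return urllib.request.Request(
        url=f"{base_url}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Once the real base URL and model identifier are known, the request can be sent with `urllib.request.urlopen(req)` and the JSON response parsed for the image data.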
We're benchmarking and onboarding Stable Diffusion V1.5 as a hosted, OpenAI-compatible API. Sign in for free credit and be ready when it lands, or tell us you want it and we'll prioritize it.