Stable Diffusion V1.5

Comfy-Org/stable-diffusion-v1-5-archive

published Aug 2024 · updated Dec 2025

Stable Diffusion V1.5 is a text-to-image latent diffusion model that generates 512x512 images from text prompts, archived for legacy compatibility.

status

coming soon

downloads / mo

5.8M

license

creativeml-openrail-m

specs

Task	Text-to-Image Generation
Architecture	Latent Diffusion Model with 860M UNet and CLIP ViT-L/14 text encoder
Parameters	860M (UNet) + 123M (CLIP text encoder)
License	CreativeML OpenRAIL-M

about this model

Stable Diffusion v1.5 is a latent text-to-image diffusion model that generates 512x512 images from English text prompts. The model was developed by Robin Rombach and Patrick Esser; this repository is an archival re-upload of the original RunwayML release, preserved for legacy model testing and technical accessibility. It pairs an 860M-parameter UNet with a frozen 123M-parameter CLIP ViT-L/14 text encoder and needs a GPU with at least 10GB of VRAM for inference.
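To show what "latent" means in practice, the arithmetic below compares a 512x512 RGB image with its latent representation. It assumes the standard SD v1 autoencoder configuration (8x spatial downsampling, 4 latent channels). `latent_shape` and `compression_ratio` are illustrative helpers, not part of any library.

```python
# Sketch: size of the SD v1.5 latent compared with the pixel image.
# Assumption: the v1 autoencoder downsamples by a factor of 8 and uses
# 4 latent channels (the standard SD v1 configuration).
VAE_DOWNSAMPLE = 8
LATENT_CHANNELS = 4
IMAGE_CHANNELS = 3  # RGB


def latent_shape(height: int, width: int) -> tuple:
    """Return (channels, height, width) of the latent for a given image size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def compression_ratio(height: int, width: int) -> float:
    """Number of pixel values per latent value."""
    c, h, w = latent_shape(height, width)
    return (IMAGE_CHANNELS * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(512, 512))       # (4, 64, 64)
    print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the UNet denoises a 4x64x64 tensor instead of a 3x512x512 image, and the autoencoder's decoder maps the final latent back to pixels.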

Capabilities and Architecture

Stable Diffusion v1.5 operates in a compressed latent space, enabling efficient high-resolution image synthesis. It was initialized from the v1-2 weights and fine-tuned for 595k steps at 512x512 resolution on the LAION-Aesthetics V2 5+ dataset, with the text conditioning dropped for 10% of training samples to improve classifier-free guidance sampling. The model is based on the CVPR 2022 paper "High-Resolution Image Synthesis With Latent Diffusion Models" (arXiv 2112.10752).
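The 10% text-conditioning dropout lets one network learn both prompt-conditioned and unconditional noise predictions, which classifier-free guidance blends at sampling time. The sketch below is a simplified, pure-Python illustration of that idea; `maybe_drop_caption`, `guided_noise`, and the example guidance scale of 7.5 are hypothetical choices, not values or code from the original training setup.

```python
import random

# Hypothetical helper names; a sketch of the idea, not the original training code.
DROP_PROB = 0.10  # fraction of training captions replaced by the empty prompt


def maybe_drop_caption(caption: str, rng: random.Random, p: float = DROP_PROB) -> str:
    """Training time: swap the caption for the empty prompt with probability p."""
    return "" if rng.random() < p else caption


def guided_noise(eps_uncond: list, eps_cond: list, scale: float) -> list:
    """Sampling time: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    captions = [maybe_drop_caption("a photo of a cat", rng) for _ in range(10_000)]
    print(captions.count("") / len(captions))  # roughly 0.10
    print(guided_noise([0.0, 1.0], [1.0, 1.0], scale=7.5))  # [7.5, 1.0]
```

A scale of 1.0 reduces to the plain conditional prediction; larger scales push samples further toward the prompt.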

Key Strengths

Widely adopted foundation model: 872 fine-tuned models are based on this archive, and it receives about 5.8 million downloads per month on Hugging Face.
Proven baseline for legacy model testing and reproducibility: the FP32 weights are hash-identical to the original RunwayML upload.
Available in both FP32 and FP16 precision formats for flexible deployment.
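One way to see the FP32/FP16 trade-off is to round-trip values through half precision. This sketch uses Python's `struct` format codes `f` (IEEE 754 single) and `e` (IEEE 754 half); it only illustrates the storage and precision effect and is not the tooling used to create the archived FP16 file.

```python
import struct

# Sketch only: effect of FP16 storage on size and precision using the standard
# struct codes "f" (32-bit float) and "e" (16-bit half float).


def to_fp16_and_back(x: float) -> float:
    """Round-trip a value through half precision (out-of-range values raise OverflowError)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def packed_size(values: list, fmt_char: str) -> int:
    """Bytes needed to store the values with the given struct format character."""
    return struct.calcsize(f"<{len(values)}{fmt_char}")


if __name__ == "__main__":
    weights = [0.1, -1.2345678, 3.0e-5]
    print(packed_size(weights, "f"), packed_size(weights, "e"))  # 12 6
    for w in weights:
        print(w, "->", to_fp16_and_back(w))
```

Half precision halves the bytes per value while keeping only about three significant decimal digits.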

Benchmark and Usage Context

As a 2022-generation model, Stable Diffusion v1.5 is several generations behind current state-of-the-art systems. It is best suited for legacy model testing, reproducibility studies, and as a baseline for fine-tuning. The model is English-only and licensed under CreativeML OpenRAIL-M. It is not currently deployed by any major inference provider.

Model Variants

File	Description
`v1-5-pruned-emaonly.safetensors`	Exact hash-identical original model as uploaded by RunwayML (FP32).
`v1-5-pruned-emaonly-fp16.safetensors`	FP16 conversion with added metadata header.
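To check that a downloaded file matches its description, you can hash it and read its header with the standard library alone. This sketch assumes the published safetensors layout: an 8-byte little-endian header length followed by a UTF-8 JSON header whose optional `__metadata__` entry holds string key/value pairs. Expected hashes are not listed on this page; compare against the SHA-256 shown for the file on its hosting page. The toy file in the demo, including its metadata, is made up.

```python
import hashlib
import json
import struct
import tempfile
from pathlib import Path


def read_safetensors_header(path: Path) -> dict:
    """Return the parsed JSON header of a .safetensors file."""
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        return json.loads(f.read(header_len).decode("utf-8"))


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so multi-GB checkpoints are never fully loaded into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Toy file: one F16 tensor of two values plus a made-up metadata entry.
    header = {"__metadata__": {"format": "pt"},
              "w": {"dtype": "F16", "shape": [2], "data_offsets": [0, 4]}}
    raw = json.dumps(header).encode("utf-8")
    with tempfile.TemporaryDirectory() as tmp:
        p = Path(tmp) / "toy.safetensors"
        p.write_bytes(struct.pack("<Q", len(raw)) + raw + struct.pack("<2e", 1.0, 2.0))
        print(read_safetensors_header(p)["__metadata__"])  # {'format': 'pt'}
        print(sha256_of(p) == hashlib.sha256(p.read_bytes()).hexdigest())  # True
```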

Academic Reference

Rombach, R., Blattmann, A., Lorenz, D., Esser, P., & Ommer, B. (2022). High-Resolution Image Synthesis With Latent Diffusion Models. CVPR. arXiv:2112.10752.

best for

·Generating 512x512 images from English text prompts for legacy model testing
·Reproducing results from the original Stable Diffusion v1.5 research

FAQ

What is Stable Diffusion V1.5 best used for?

It is best for generating 512x512 images from English text prompts, particularly for legacy model testing and reproducing original Stable Diffusion v1.5 results.

What are the hardware requirements for this model?

It needs a GPU with at least 10GB of VRAM to run the 860M-parameter UNet and the 123M-parameter CLIP text encoder.
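The arithmetic below estimates how much of that budget the weights alone take. It counts only the two components named on this page and ignores the autoencoder, activations, and framework overhead, so it is a lower bound rather than a measurement.

```python
# Back-of-the-envelope weight memory for the UNet and CLIP text encoder.
# Assumption: weights only; the VAE, activations, and runtime overhead are ignored.
UNET_PARAMS = 860_000_000
TEXT_ENCODER_PARAMS = 123_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weight_gib(precision: str) -> float:
    """Weight memory in GiB for the given storage precision."""
    total = UNET_PARAMS + TEXT_ENCODER_PARAMS
    return total * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in ("fp32", "fp16"):
        print(f"{precision}: {weight_gib(precision):.2f} GiB")
    # fp32: 3.66 GiB
    # fp16: 1.83 GiB
```

With weights at roughly 3.7 GiB in FP32, the rest of the 10GB guideline is presumably headroom for activations and runtime overhead during sampling.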

What license does this model use?

It uses the CreativeML OpenRAIL-M license.

How do I call this model via the API?

Once the model is live, you will be able to call it through the gigarouter OpenAI-compatible endpoint, sending text prompts with your API key and receiving generated images.
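Endpoint details for this model are not published yet, so the sketch below only builds a request without sending it. It assumes an OpenAI-style image-generation payload (`model`, `prompt`, `n`, `size`) with Bearer-token auth; `BASE_URL`, the `/images/generations` path, and `MODEL_ID` are hypothetical placeholders to replace with the documented values once the model is live.

```python
import json
import urllib.request

# Hypothetical placeholders, not a documented gigarouter API.
BASE_URL = "https://api.example.com/v1"
MODEL_ID = "stable-diffusion-v1-5"


def build_image_request(prompt: str, api_key: str, n: int = 1) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style image-generation POST request."""
    payload = {"model": MODEL_ID, "prompt": prompt, "n": n, "size": "512x512"}
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_image_request("a watercolor lighthouse at dusk", api_key="YOUR_KEY")
    print(req.get_method(), req.full_url)
    # To send it once the endpoint exists: urllib.request.urlopen(req)
```

The `size` is fixed at 512x512 because that is the model's native output resolution.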

What input and output formats does the model support?

It accepts English text prompts as input and outputs 512x512 images.

not yet live

We're benchmarking and onboarding Stable Diffusion V1.5 as a hosted, OpenAI-compatible API.
