Stable Diffusion is a free, open-source AI image generation model that runs on your own hardware. Unlike Midjourney or DALL-E, there are no per-image costs, no content filtering beyond what you choose, and your prompts stay private on your machine. AUTOMATIC1111’s Stable Diffusion Web UI is one of the most popular interfaces for running it: a feature-rich, browser-based UI that works on Windows and Linux, with Apple Silicon Macs also supported. This guide gets you from zero to generating images.

Hardware Requirements

GPU	Performance	Notes
NVIDIA RTX 3060 (12 GB)	Good	~4–8 sec per 512×512 image
NVIDIA RTX 4070 (12 GB)	Excellent	~2–4 sec per 512×512 image
NVIDIA RTX 4090 (24 GB)	Outstanding	<1 sec per 512×512 image
AMD RX 7800 XT (16 GB)	Good	~5–10 sec (slower than CUDA)
Apple M2/M3/M4	Decent	Metal acceleration, 16+ GB unified memory helpful
CPU only	Very slow	Minutes per image — not recommended

VRAM is the key constraint. SDXL, which generates at a native 1024×1024, needs at least 8 GB of VRAM. Standard SD 1.5 models (512×512 base resolution) can run in 4 GB of VRAM with the --medvram flag.

Installation on Windows

Prerequisites

Install Python 3.10.x from python.org (check “Add to PATH” during install)
Install Git from git-scm.com
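Install current NVIDIA drivers if using an NVIDIA GPU (most RTX users already have them). A separate CUDA toolkit install is not normally required, because the PyTorch build installed on first launch bundles its own CUDA runtime.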

Clone the Repository

cd C:\
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui

Launch for First Time

webui-user.bat

This script automatically:

Creates a Python virtual environment
Installs all dependencies (PyTorch, transformers, etc.)
Downloads a starter model (SD 1.5) if no model is found

First launch takes 5–15 minutes. Subsequent launches take 30–60 seconds. When ready, the terminal shows Running on local URL: http://127.0.0.1:7860 — open that in your browser.
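
Because the first launch can take a while, it can be handy to have a script tell you when the UI is up. The following is a minimal sketch, not part of the Web UI itself: the helper name wait_for_port and its timeout defaults are assumptions chosen for illustration. It simply polls the local address shown above until a TCP connection succeeds.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860,
                  timeout_s: float = 900.0, poll_s: float = 2.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    wait_for_port is a hypothetical helper; 7860 is the port from the
    "Running on local URL" message, and 900 s mirrors the 15-minute upper bound.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(poll_s)
```

Calling wait_for_port() with its defaults polls 127.0.0.1:7860 for up to 15 minutes. A listening port only shows that the server process is up, so treat a True result as "probably ready" and still check the page in your browser.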

Installation on Linux

# Install dependencies (Debian/Ubuntu)
sudo apt install wget git python3 python3-venv libgl1 libglib2.0-0

# Clone the repository
cd ~
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui

# Launch
./webui.sh

Downloading Models

The stock SD 1.5 model is functional but limited. The Stable Diffusion community has released thousands of fine-tuned models.

Downloading from Civitai

Civitai (civitai.com) is the largest repository of Stable Diffusion models, LoRAs, and embeddings.

Browse to a model (e.g., “Realistic Vision v6”, “DreamShaper XL”)
Click Download — save to stable-diffusion-webui/models/Stable-diffusion/

After downloading, click the refresh button next to the checkpoint dropdown at the top of the Web UI, then select the model from that dropdown.
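
If a downloaded model does not show up in the dropdown, it helps to confirm the file actually landed in the right folder. Here is a small sketch that lists checkpoint files under models/Stable-diffusion. The function name and the set of extensions are assumptions for this example; .safetensors and .ckpt are the two checkpoint formats you will most commonly see.

```python
from pathlib import Path

# Checkpoint extensions this sketch looks for (an assumption, not an exhaustive list).
CHECKPOINT_EXTENSIONS = {".safetensors", ".ckpt"}


def list_checkpoints(webui_root: str) -> list[str]:
    """Return sorted checkpoint file names found in models/Stable-diffusion."""
    model_dir = Path(webui_root) / "models" / "Stable-diffusion"
    if not model_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in model_dir.iterdir()
        if p.is_file() and p.suffix.lower() in CHECKPOINT_EXTENSIONS
    )
```

For the Windows install path used earlier, list_checkpoints(r"C:\stable-diffusion-webui") would return the file names the dropdown should offer after a refresh.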

Recommended Starter Models

RealisticVision v6.0 (SD 1.5 based) — excellent photorealistic portraits
DreamShaper XL (SDXL based) — versatile, handles both photorealism and stylized art
SDXL 1.0 Base + Refiner — Stability AI’s official 1024×1024 model
Pony Diffusion XL — popular for anime/illustration style

Basic Image Generation

In the WebUI, go to the txt2img tab:

Positive Prompt: Describe what you want to see

portrait of a woman, professional photograph, studio lighting, sharp focus, 8k

Negative Prompt: Describe what to avoid

blurry, deformed, ugly, watermark, text, low quality, worst quality, bad anatomy

Key settings:

Sampling Method: DPM++ 2M Karras or Euler a — both work well
Sampling Steps: 20–30 for most models (more steps = slower, diminishing returns after 30)
CFG Scale: 7–9 — how strongly to follow the prompt (higher = more literal, less creative)
Width/Height: 512×512 for SD 1.5; 1024×1024 for SDXL
Batch size: Number of images generated in parallel per run (larger batches need more VRAM)

Click Generate — your image appears in seconds to minutes depending on your GPU.
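
The same settings can also be sent from a script. The sketch below assumes the Web UI was started with the --api flag (not covered in this guide), which exposes an HTTP endpoint at /sdapi/v1/txt2img. The function names are made up for this example, and the field names are the ones that endpoint accepts to the best of my knowledge. Sampler names can differ between Web UI versions, so the default here is Euler a.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", steps=25, cfg_scale=7.0,
                          width=512, height=512, batch_size=1,
                          sampler_name="Euler a"):
    """Collect the txt2img settings described above into a JSON-ready dict."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,              # 20-30 suits most models
        "cfg_scale": cfg_scale,      # 7-9: how strongly to follow the prompt
        "width": width,              # 512 for SD 1.5, 1024 for SDXL
        "height": height,
        "batch_size": batch_size,
        "sampler_name": sampler_name,
    }


def generate(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to the txt2img endpoint and return PNG bytes per image.

    Assumes the server is running locally with --api enabled.
    """
    request = urllib.request.Request(
        base_url + "/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # The response carries each image as a base64-encoded string.
    return [base64.b64decode(image) for image in body["images"]]
```

A typical call would be generate(build_txt2img_payload("portrait of a woman, studio lighting", negative_prompt="blurry, watermark")), after which each returned bytes object can be written to a .png file.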

ControlNet Extension

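ControlNet is one of the most powerful extensions. It lets you control the pose, depth, edges, and composition of generated images using reference images.

Install via Extensions → Install from URL → paste: https://github.com/Mikubill/sd-webui-controlnet

After installing, apply the change and restart the UI from the Extensions → Installed tab. A ControlNet panel then appears in txt2img and img2img. Each control type also needs its matching ControlNet model file, which is downloaded separately. Use cases: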

OpenPose: Generate a person in a specific pose from a reference image
Canny/Lineart: Use an edge map or line drawing to guide composition
Depth: Match the depth map of a reference scene

img2img: Transform Existing Images

The img2img tab takes an existing image and transforms it with a prompt:

Upload an image
Write a prompt describing how to change it
Set Denoising Strength (0.0 = no change; 1.0 = complete regeneration; 0.5–0.7 is the sweet spot for transformation)

This is useful for upscaling, style transfer, and fixing specific parts of generated images.
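
For scripted use, the three steps above map onto a small request body. This is a sketch under the assumption that the Web UI runs with the --api flag (not covered in this guide) and that you POST the result to /sdapi/v1/img2img. The helper name is hypothetical, and the range check simply enforces the 0.0 to 1.0 scale described above.

```python
import base64


def build_img2img_payload(image_bytes: bytes, prompt: str,
                          denoising_strength: float = 0.6) -> dict:
    """Build a JSON-ready img2img request from raw image bytes and a prompt."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0.0 and 1.0")
    return {
        # The source image travels as a base64 string inside a list.
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        # 0.0 keeps the input unchanged, 1.0 regenerates it completely.
        "denoising_strength": denoising_strength,
    }
```

Reading a file with open("input.png", "rb").read() and passing the bytes in gives a payload you can serialize with json.dumps. The default of 0.6 sits in the 0.5–0.7 range suggested above.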

Inpainting

The inpainting tool lets you repaint specific regions:

In the img2img tab, open the Inpaint sub-tab and upload your image
Use the mask brush to paint over the area you want to change
Write a prompt describing the replacement content
Click Generate. Only the masked area is regenerated, and its edges are blended into the surrounding image
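
To make "only the masked area regenerates" concrete, here is a conceptual sketch of mask compositing on tiny grayscale images stored as nested lists. It illustrates the general idea only and is not the Web UI's actual implementation. The function name is hypothetical, and a mask value of 1.0 means "use the new pixel" while 0.0 means "keep the original".

```python
def composite(original, generated, mask):
    """Blend two same-sized grayscale images using a per-pixel mask in [0, 1]."""
    return [
        [m * g + (1.0 - m) * o for o, g, m in zip(orig_row, gen_row, mask_row)]
        for orig_row, gen_row, mask_row in zip(original, generated, mask)
    ]


original = [[10, 20]]
generated = [[100, 200]]
mask = [[0.0, 1.0]]  # keep the first pixel, replace the second
result = composite(original, generated, mask)
```

Mask values between 0.0 and 1.0, such as those from a blurred mask edge, mix the two images, which is what softens the seam between the repainted region and its surroundings.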

VRAM Optimization Flags

If you get CUDA out-of-memory errors, edit webui-user.bat and add flags to its COMMANDLINE_ARGS line:

set COMMANDLINE_ARGS=--medvram

Or for very limited VRAM:

set COMMANDLINE_ARGS=--lowvram

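--medvram reduces VRAM usage at a moderate speed cost. --lowvram cuts usage further, for GPUs with 4 GB or less where --medvram still runs out of memory, but is significantly slower. On Linux, set the same flags in webui-user.sh instead: export COMMANDLINE_ARGS="--medvram"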
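
If you switch flags often, the edit to webui-user.bat can be scripted. The following is a minimal sketch with a hypothetical helper name. It assumes the file sets the variable on a line beginning with set COMMANDLINE_ARGS= and, if no such line exists, adds one before the first call line so the variable is defined before the UI launches.

```python
def set_commandline_args(bat_text: str, args: str) -> str:
    """Return bat_text with its COMMANDLINE_ARGS line set to args."""
    new_line = f"set COMMANDLINE_ARGS={args}"
    lines = bat_text.splitlines()

    # Replace an existing assignment if there is one.
    for i, line in enumerate(lines):
        if line.strip().upper().startswith("SET COMMANDLINE_ARGS="):
            lines[i] = new_line
            return "\n".join(lines) + "\n"

    # Otherwise insert the assignment ahead of the first "call ..." line,
    # or append it if the file has no such line.
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("call "):
            lines.insert(i, new_line)
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

Read the file's text, pass it through set_commandline_args(text, "--medvram"), and write the result back. Keeping a backup copy of the original file first is a sensible precaution.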

Prompt Engineering Tips

Effective Stable Diffusion prompts follow this structure:

[Subject], [Style/Medium], [Lighting], [Camera/Lens], [Quality tags]

Example:

A medieval knight in armor, oil painting, dramatic lighting, highly detailed, artstation, 8k resolution, Greg Rutkowski style
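
The structure above is easy to turn into a small helper when you generate many prompts from a script. This is only a sketch: the function name and parameter names are hypothetical and simply mirror the bracketed slots in the template. Empty slots are skipped.

```python
def build_prompt(subject, style="", lighting="", camera="", quality_tags=()):
    """Join the template slots into one comma-separated prompt string."""
    parts = [subject, style, lighting, camera, *quality_tags]
    # Drop empty slots so the prompt has no stray commas.
    return ", ".join(part.strip() for part in parts if part and part.strip())


prompt = build_prompt(
    "A medieval knight in armor",
    style="oil painting",
    lighting="dramatic lighting",
    quality_tags=["highly detailed", "8k resolution"],
)
```

Here the camera slot is left empty, just as in the example prompt, and the resulting string can be pasted straight into the positive prompt box.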

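Use parentheses to increase a term's weight and square brackets to decrease it: (sharp focus) multiplies the term's weight by 1.1, (sharp focus:1.4) sets an explicit weight of 1.4, and [blurry] divides the weight by 1.1.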
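
The weighting arithmetic can be sketched in a few lines. This is a simplified illustration, not the Web UI's real prompt parser: the function name is hypothetical, and it assumes the usual convention that each pair of parentheses multiplies the weight by 1.1, each pair of square brackets divides it by 1.1, and (text:number) sets the weight explicitly. It handles a single wrapped fragment and ignores escapes and other bracket syntax.

```python
def emphasis_weight(fragment: str) -> float:
    """Return the attention weight implied by the wrappers around one fragment."""
    weight = 1.0
    text = fragment.strip()
    while len(text) >= 2:
        if text[0] == "(" and text[-1] == ")":
            inner = text[1:-1]
            _, separator, number = inner.rpartition(":")
            if separator:
                try:
                    # "(text:1.4)" sets the weight explicitly.
                    return weight * float(number)
                except ValueError:
                    pass  # the colon was part of the text, not a weight
            weight *= 1.1
            text = inner
        elif text[0] == "[" and text[-1] == "]":
            weight /= 1.1
            text = text[1:-1]
        else:
            break
    return weight
```

Under these rules, emphasis_weight("(sharp focus:1.4)") gives 1.4, doubled parentheses such as "((sharp focus))" give 1.1 × 1.1, and "[blurry]" gives 1 / 1.1.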

Generating locally with AUTOMATIC1111 costs nothing per image after setup: no API costs, no monthly fees, and no usage limits, only the electricity your GPU draws. With a mid-range GPU, you can generate hundreds of images per hour in full privacy.
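
As a quick check of the throughput claim, the arithmetic is just one hour divided by the time per image. The helper below is a trivial sketch with a hypothetical name. It uses the per-image times from the hardware table and ignores overhead such as model loading and saving files.

```python
def images_per_hour(seconds_per_image: float) -> float:
    """Convert a per-image generation time into an hourly image count."""
    if seconds_per_image <= 0:
        raise ValueError("seconds_per_image must be positive")
    return 3600.0 / seconds_per_image


# RTX 3060 range from the hardware table: roughly 4 to 8 seconds per 512×512 image.
fast = images_per_hour(4)
slow = images_per_hour(8)
```

For the RTX 3060 figures this works out to between 450 and 900 images per hour at 512×512.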