Stable Diffusion is a free, open-source AI image generation model that runs on your own hardware. Unlike Midjourney or DALL-E, there are no per-image costs, no content filtering beyond what you choose, and your prompts stay private on your machine. AUTOMATIC1111’s Stable Diffusion Web UI is among the most popular interfaces for running it: a feature-rich, browser-based UI that works on Windows, Linux, and Apple Silicon Macs. This guide gets you from zero to generating images.
Hardware Requirements
| GPU | Performance | Notes |
|---|---|---|
| NVIDIA RTX 3060 (12 GB) | Good | ~4–8 sec per 512×512 image |
| NVIDIA RTX 4070 (12 GB) | Excellent | ~2–4 sec per 512×512 image |
| NVIDIA RTX 4090 (24 GB) | Outstanding | <1 sec per 512×512 image |
| AMD RX 7800 XT (16 GB) | Good | ~5–10 sec (slower than CUDA) |
| Apple M2/M3/M4 | Decent | Metal acceleration, 16+ GB unified memory helpful |
| CPU only | Very slow | Minutes per image — not recommended |
VRAM is the key constraint. SDXL (the newer model family, trained at 1024×1024) needs at least 8 GB of VRAM. Standard SD 1.5 models (512×512 base) run in 4 GB of VRAM with the --medvram flag.
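As a rough sketch of the rule above, the following Python helper maps a VRAM figure to this section's guidance. The function name `suggest_setup` is hypothetical, and the thresholds are simply the ones stated in this guide (8 GB for SDXL, 4 GB for SD 1.5 with --medvram), not values read from the Web UI.

```python
def suggest_setup(vram_gb: float) -> str:
    """Map available VRAM (in GB) to the rough guidance given in this guide."""
    if vram_gb >= 8:
        return "SDXL (1024x1024) or SD 1.5"
    if vram_gb >= 4:
        return "SD 1.5 (512x512) with --medvram"
    return "below this guide's minimum; expect out-of-memory errors"


# Print the suggestion for a few example cards.
for gb in (24, 12, 6, 2):
    print(f"{gb} GB -> {suggest_setup(gb)}")
```

On NVIDIA systems, running nvidia-smi in a terminal reports the total memory figure to pass in.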
Installation on Windows
Prerequisites
- Install Python 3.10.x from python.org (check “Add to PATH” during install)
- Install Git from git-scm.com
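- Install current NVIDIA drivers if using an NVIDIA GPU (most RTX users already have them). A separate CUDA Toolkit install is normally not needed, because the PyTorch build that the launcher installs bundles its own CUDA runtime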
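Before cloning, it can help to confirm that the prerequisites are actually visible from the command line. The check below is a minimal sketch: the helper names `python_is_supported` and `missing_tools` are hypothetical, and the 3.10 requirement is taken from the list above.

```python
import shutil
import sys


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is a 3.10.x release, as this guide recommends."""
    return tuple(version_info[:2]) == (3, 10)


def missing_tools(tools=("git",)) -> list:
    """Return the names of required command-line tools that are not on PATH."""
    return [name for name in tools if shutil.which(name) is None]


print("Python 3.10.x:", python_is_supported())
print("Missing tools:", missing_tools() or "none")
```

Run it with the same Python interpreter you intend to use for the Web UI, since a machine can have several versions installed.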
Clone the Repository
cd C:\
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
Launch for First Time
webui-user.bat
This script automatically:
- Creates a Python virtual environment
- Installs all dependencies (PyTorch, transformers, etc.)
- Downloads a starter model (SD 1.5) if no model is found
First launch takes 5–15 minutes. Subsequent launches take 30–60 seconds. When ready, the terminal shows Running on local URL: http://127.0.0.1:7860 — open that in your browser.
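If you would rather not watch the terminal during the first launch, a small script can poll the local URL until the UI answers. This is a sketch using only the Python standard library; the function name `wait_for_webui` and the default timeout of 15 minutes are assumptions chosen to match the timings above.

```python
import time
import urllib.request


def wait_for_webui(url="http://127.0.0.1:7860", timeout_s=900.0, interval_s=5.0) -> bool:
    """Poll the URL until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, timeout, or HTTP error: not ready yet.
            pass
        time.sleep(interval_s)
    return False


# Example (blocks until the UI is up or 15 minutes pass):
# print("ready" if wait_for_webui() else "timed out")
```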
Installation on Linux
# Install dependencies (Debian/Ubuntu)
sudo apt install wget git python3 python3-venv libgl1 libglib2.0-0
# Clone the repository
cd ~
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui
cd stable-diffusion-webui
# Launch
./webui.sh
Downloading Models
The stock SD 1.5 model is functional but limited. The Stable Diffusion community has released thousands of fine-tuned models.
Downloading from Civitai
Civitai (civitai.com) is the largest repository of Stable Diffusion models, LoRAs, and embeddings.
- Browse to a model (e.g., “Realistic Vision v6”, “DreamShaper XL”)
- Click Download and save the file to stable-diffusion-webui/models/Stable-diffusion/
After downloading, click Refresh in the WebUI and select the model from the checkpoint dropdown.
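If a downloaded model does not show up in the dropdown, checking what is actually in the folder is a quick first step. The sketch below lists files that look like checkpoints; the helper name `list_checkpoints` is hypothetical, and it assumes the two common checkpoint extensions, .safetensors and .ckpt.

```python
from pathlib import Path

# Assumed checkpoint extensions; adjust if your files use something else.
CHECKPOINT_SUFFIXES = {".safetensors", ".ckpt"}


def list_checkpoints(model_dir) -> list:
    """Return sorted file names in model_dir that look like checkpoint files."""
    folder = Path(model_dir)
    if not folder.is_dir():
        return []
    return sorted(
        path.name
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() in CHECKPOINT_SUFFIXES
    )


# Path is relative to wherever you cloned the repository.
print(list_checkpoints("stable-diffusion-webui/models/Stable-diffusion"))
```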
Recommended Starter Models
- RealisticVision v6.0 (SD 1.5 based) — excellent photorealistic portraits
- DreamShaper XL (SDXL based) — versatile, handles both photorealism and stylized art
- SDXL 1.0 Base + Refiner — Stability AI’s official 1024×1024 model
- Pony Diffusion XL — popular for anime/illustration style
Basic Image Generation
In the WebUI, go to the txt2img tab:
Positive Prompt: Describe what you want to see
portrait of a woman, professional photograph, studio lighting, sharp focus, 8k
Negative Prompt: Describe what to avoid
blurry, deformed, ugly, watermark, text, low quality, worst quality, bad anatomy
Key settings:
- Sampling Method: DPM++ 2M Karras or Euler a (both work well)
- Sampling Steps: 20–30 for most models (more steps = slower, diminishing returns after 30)
- CFG Scale: 7–9 — how strongly to follow the prompt (higher = more literal, less creative)
- Width/Height: 512×512 for SD 1.5; 1024×1024 for SDXL
- Batch size: number of images generated in parallel per run (higher values need more VRAM)
Click Generate — your image appears in seconds to minutes depending on your GPU.
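The same settings can also be sent from a script. The sketch below assumes the Web UI was started with the --api flag in COMMANDLINE_ARGS, which exposes an HTTP endpoint at /sdapi/v1/txt2img; that flag is not covered elsewhere in this guide, so check the field names against the /docs page of your own install. The helper names `build_txt2img_payload` and `generate` are hypothetical.

```python
import base64
import json
import urllib.request


def build_txt2img_payload(prompt, negative_prompt="", steps=25, cfg_scale=7.0,
                          width=512, height=512, batch_size=1,
                          sampler_name="Euler a"):
    """Collect the txt2img settings discussed above into one JSON-ready dict."""
    return {
        "prompt": prompt,
        "negative_prompt": negative_prompt,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "width": width,
        "height": height,
        "batch_size": batch_size,
        "sampler_name": sampler_name,
    }


def generate(payload, base_url="http://127.0.0.1:7860"):
    """POST the payload to the txt2img endpoint and return decoded image bytes."""
    request = urllib.request.Request(
        f"{base_url}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: images come back base64-encoded under the "images" key.
    return [base64.b64decode(image) for image in body["images"]]


payload = build_txt2img_payload(
    "portrait of a woman, professional photograph, studio lighting, sharp focus, 8k",
    negative_prompt="blurry, deformed, ugly, watermark, text, low quality, "
                    "worst quality, bad anatomy",
)
# images = generate(payload)  # needs a running Web UI started with --api
```

Keeping payload construction separate from the network call makes it easy to inspect or log the settings before anything is generated.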
ControlNet Extension
ControlNet is one of the most powerful extensions. It lets you control the pose, depth, edges, and composition of generated images using reference images.
Install via Extensions → Install from URL → paste:
https://github.com/Mikubill/sd-webui-controlnet
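After installing, click Apply and restart UI; a ControlNet panel then appears in txt2img and img2img. Each control type also needs its matching ControlNet model file, which is downloaded separately. Use cases: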
- OpenPose: Generate a person in a specific pose from a reference image
- Canny/Lineart: Use an edge map or line drawing to guide composition
- Depth: Match the depth map of a reference scene
img2img: Transform Existing Images
The img2img tab takes an existing image and transforms it with a prompt:
- Upload an image
- Write a prompt describing how to change it
- Set Denoising Strength (0.0 = no change; 1.0 = complete regeneration; 0.5–0.7 is the sweet spot for transformation)
This is useful for upscaling, style transfer, and fixing specific parts of generated images.
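For scripted use, the three inputs above (image, prompt, denoising strength) map onto a small request body. The sketch below assumes the Web UI's optional HTTP API (enabled with the --api flag and served at /sdapi/v1/img2img), which this guide does not otherwise cover; the helper name `build_img2img_payload` is hypothetical, and the field names should be checked against your install's /docs page.

```python
import base64


def build_img2img_payload(image_bytes: bytes, prompt: str,
                          denoising_strength: float = 0.6) -> dict:
    """Package an input image, a prompt, and a denoising strength for img2img."""
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0.0 and 1.0")
    return {
        # Assumption: the API expects base64-encoded images in "init_images".
        "init_images": [base64.b64encode(image_bytes).decode("ascii")],
        "prompt": prompt,
        "denoising_strength": denoising_strength,
    }


# Example with placeholder bytes; in practice, read a real PNG or JPEG file.
example = build_img2img_payload(b"placeholder-image-bytes", "watercolor painting")
print(sorted(example))
```

The default of 0.6 sits inside the 0.5–0.7 range suggested above; lower it to stay closer to the source image.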
Inpainting
The inpainting tool lets you repaint specific regions:
- In the img2img tab, open the Inpaint sub-tab and upload your image
- Use the mask brush to paint over the area to change
- Write a prompt describing the replacement content
- Click Generate; only the masked area regenerates, blending with the surroundings
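Conceptually, the mask decides per pixel how much of the new image replaces the original. The toy example below illustrates that idea on a single row of grayscale values; it is a simplified sketch of mask-based compositing, not the Web UI's actual implementation, and the function name `composite` is hypothetical.

```python
def composite(original, generated, mask):
    """Blend equal-length pixel rows: mask 1.0 takes generated, 0.0 keeps original."""
    if not (len(original) == len(generated) == len(mask)):
        raise ValueError("all inputs must have the same length")
    return [
        m * g + (1.0 - m) * o
        for o, g, m in zip(original, generated, mask)
    ]


original = [10.0, 10.0, 10.0, 10.0]
generated = [90.0, 90.0, 90.0, 90.0]
mask = [0.0, 0.5, 1.0, 1.0]  # a soft edge at index 1

print(composite(original, generated, mask))  # [10.0, 50.0, 90.0, 90.0]
```

The intermediate mask value at index 1 shows why a slightly blurred mask edge gives a smoother transition than a hard one.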
VRAM Optimization Flags
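If you get CUDA out-of-memory errors, add flags to the COMMANDLINE_ARGS line in webui-user.bat (on Linux, use the equivalent export COMMANDLINE_ARGS="..." line in webui-user.sh):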
set COMMANDLINE_ARGS=--medvram
Or for very limited VRAM:
set COMMANDLINE_ARGS=--lowvram
--medvram reduces VRAM usage at a moderate speed cost. --lowvram runs on 4 GB GPUs but is significantly slower.
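The line to edit differs slightly between Windows and Linux. As a sketch, the helper below builds the matching line for either platform; the function name `commandline_args_line` is hypothetical, and the Linux form assumes the export COMMANDLINE_ARGS line found in webui-user.sh.

```python
def commandline_args_line(flags, platform="windows") -> str:
    """Build the COMMANDLINE_ARGS line for webui-user.bat or webui-user.sh."""
    joined = " ".join(flags)
    if platform == "windows":
        return f"set COMMANDLINE_ARGS={joined}"
    if platform == "linux":
        return f'export COMMANDLINE_ARGS="{joined}"'
    raise ValueError(f"unsupported platform: {platform}")


print(commandline_args_line(["--medvram"]))           # set COMMANDLINE_ARGS=--medvram
print(commandline_args_line(["--lowvram"], "linux"))  # export COMMANDLINE_ARGS="--lowvram"
```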
Prompt Engineering Tips
Effective Stable Diffusion prompts follow this structure:
[Subject], [Style/Medium], [Lighting], [Camera/Lens], [Quality tags]
Example:
A medieval knight in armor, oil painting, dramatic lighting, highly detailed,
artstation, 8k resolution, Greg Rutkowski style
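When generating many variations, it can be convenient to assemble prompts from the structure above rather than typing them by hand. A minimal sketch follows; the function name `build_prompt` and its parameter names are hypothetical and simply mirror the bracketed slots.

```python
def build_prompt(subject, style="", lighting="", camera="", quality_tags=()):
    """Join the non-empty prompt parts in the order this guide recommends."""
    parts = [subject, style, lighting, camera, *quality_tags]
    return ", ".join(part.strip() for part in parts if part and part.strip())


print(build_prompt(
    "A medieval knight in armor",
    style="oil painting",
    lighting="dramatic lighting",
    quality_tags=("highly detailed", "8k resolution"),
))
# A medieval knight in armor, oil painting, dramatic lighting, highly detailed, 8k resolution
```

Empty slots are skipped, so the camera part can be left out for non-photographic styles.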
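Use parentheses to increase a term's weight and square brackets to decrease it. Each pair of parentheses multiplies the weight by 1.1 and each pair of square brackets divides it by 1.1, so [blurry] is slightly de-emphasized. To set an exact weight, add a number inside parentheses: (sharp focus:1.4).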
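To make the weighting arithmetic concrete: in the Web UI's emphasis syntax, each pair of parentheses multiplies a term's weight by 1.1, each pair of square brackets divides it by 1.1, and (text:1.4) sets the multiplier explicitly. The sketch below computes the resulting weight for a single wrapped term. It is a simplified illustration, not the Web UI's own parser, and the function name `token_weight` is hypothetical.

```python
import re

# Matches an explicit weight such as "(sharp focus:1.4)".
_EXPLICIT = re.compile(r"\((.+):\s*([0-9]*\.?[0-9]+)\)")


def token_weight(token: str):
    """Return (text, weight) for one term wrapped in (), [], or (text:number)."""
    weight = 1.0
    token = token.strip()
    while len(token) >= 2:
        explicit = _EXPLICIT.fullmatch(token)
        if explicit:
            return explicit.group(1).strip(), weight * float(explicit.group(2))
        if token[0] == "(" and token[-1] == ")":
            weight *= 1.1  # one level of emphasis
        elif token[0] == "[" and token[-1] == "]":
            weight /= 1.1  # one level of de-emphasis
        else:
            break
        token = token[1:-1].strip()
    return token, weight


for example in ("(sharp focus:1.4)", "((sharp focus))", "[blurry]"):
    text, weight = token_weight(example)
    print(f"{example!r}: {text!r} at weight {weight:.3f}")
```

It handles one term at a time; a full prompt with several weighted terms would need to be split first.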
Generating locally with AUTOMATIC1111 is free after setup apart from electricity: no API costs, no monthly fees, no usage limits. With a mid-range GPU, you can generate hundreds of 512×512 images per hour in full privacy.