Replacing Midjourney with Local AI Image Generation: A Complete Guide to Stable Diffusion and FLUX

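Paying $10 to $60 a month for Midjourney? Watching the DALL-E 3 API bill you for every image? If you own a reasonably capable GPU, you can keep that money and use open-source tools to generate images locally, with results that are comparable and, for some workloads, better. This guide walks you through setting up a local AI image generation environment from scratch.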

I. Pricing of the Paid Tools

Midjourney

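Basic Plan: $10/month, roughly 200 images per month (about 3.3 hours of fast GPU time)
Standard Plan: $30/month, 15 hours of fast generation
Pro Plan: $60/month, 30 hours of fast generation plus Stealth Mode
Cost over time: on the Standard Plan, $360 per year and $1,080 over three years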

DALL-E 3

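Through ChatGPT Plus: $20/month (image generation is included in the subscription)
API calls: about $0.04 per image for standard 1024x1024, and about $0.08 per image for HD 1024x1024 or standard 1024x1792
50 HD images per day at $0.08 each: about $120/month (50 x 30 x $0.08)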
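
The monthly figure above is simple per-image arithmetic. A minimal sketch, assuming a flat per-image price and a 30-day month (the function name `monthly_api_cost` is made up for illustration):

```python
def monthly_api_cost(images_per_day: int, price_per_image: float, days: int = 30) -> float:
    """Estimated monthly spend for an API that bills a flat price per image."""
    return round(images_per_day * price_per_image * days, 2)


# 50 HD images per day at $0.08 each
hd_heavy_use = monthly_api_cost(50, 0.08)
# 10 standard images per day at $0.04 each (about 300 images a month)
standard_light_use = monthly_api_cost(10, 0.04)
print(hd_heavy_use, standard_light_use)
```

Plug in your own daily volume to see where a subscription or a local setup starts to make more sense.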

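Pain points: recurring fees add up, generation volume is capped, the style is shaped by the platform, and ownership of the generated images is ambiguous.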

II. Free Alternatives

1. Stable Diffusion WebUI (AUTOMATIC1111)

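The most mature open-source option, with the largest community ecosystem. It supports LoRA, ControlNet, inpainting, and a large number of extensions. GitHub repository: https://github.com/AUTOMATIC1111/stable-diffusion-webui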

2. ComfyUI

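A node-based workflow engine aimed at advanced users. It gives precise control over every processing step, supports complex workflow orchestration, and is well optimized for performance. GitHub repository: https://github.com/comfyanonymous/ComfyUI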

3. FLUX (Black Forest Labs)

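A newer generation of models built by members of the original Stable Diffusion team. FLUX.1-dev comes close to Midjourney v6 in quality, and its text rendering is far ahead of the SD family. GitHub repository: https://github.com/black-forest-labs/flux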

4. Fooocus

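The most streamlined choice: a Midjourney-like interface with one-click generation. It works out of the box and suits users who do not want to tinker. GitHub repository: https://github.com/lllyasviel/Fooocus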

III. Hardware Requirements

Component	Minimum	Recommended
GPU VRAM	6GB (SD 1.5)	12GB+ (SDXL/FLUX)
RAM	16GB	32GB
Disk	20GB	100GB+ (multiple models)
GPU model	GTX 1060 6GB	RTX 3060 12GB or better
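
To turn the table into a quick check, here is a minimal sketch that maps available VRAM to the model families listed above. The thresholds are taken straight from the table and are rough guidance, not hard limits; the function name `supported_models` is hypothetical.

```python
# (minimum VRAM in GB, model family), as listed in the hardware table
VRAM_TIERS = [
    (6, "SD 1.5"),
    (12, "SDXL"),
    (12, "FLUX"),
]


def supported_models(vram_gb: float) -> list[str]:
    """Return the model families whose table threshold fits in the given VRAM."""
    return [name for min_gb, name in VRAM_TIERS if vram_gb >= min_gb]


print(supported_models(6))
print(supported_models(12))
```

If PyTorch is installed with CUDA support, `torch.cuda.get_device_properties(0).total_memory / 1024**3` gives the VRAM figure to pass in.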

IV. Installation and Deployment

Option 1: Stable Diffusion WebUI (recommended for beginners)

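# 1. Install base dependencies (Ubuntu/Debian)
sudo apt update && sudo apt install -y wget git python3 python3-venv libgl1 libglib2.0-0

# 2. Clone the repository
cd ~
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui

# 3. Download a model (SD 1.5 as an example)
# The original runwayml repository is no longer hosted on Hugging Face;
# this URL assumes the stable-diffusion-v1-5 community mirror.
mkdir -p models/Stable-diffusion
wget -O models/Stable-diffusion/v1-5-pruned-emaonly.safetensors \
  https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors

# 4. Launch the WebUI (the first run installs dependencies automatically)
# --xformers enables memory-efficient attention; --api exposes the /sdapi endpoints
./webui.sh --xformers --api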

Once it has started, open http://localhost:7860 in your browser.
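
The first launch can take several minutes while dependencies install, so scripts that call the API need to wait for the server. A minimal sketch of a readiness poll using only the standard library; the function names are hypothetical, and using `/sdapi/v1/sd-models` as the health-check URL is an assumption that relies on the WebUI having been started with `--api`.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if the URL answers with HTTP 200, False on any connection error."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(
    url: str,
    probe: Callable[[str], bool] = http_probe,
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
) -> bool:
    """Poll `url` until the probe succeeds or `timeout_s` seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Example (assumed endpoint, needs a running WebUI):
# wait_until_ready("http://127.0.0.1:7860/sdapi/v1/sd-models")
```

The probe is injectable so the polling logic can be exercised without a running server.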

Option 2: Fooocus (simplest install)

# 1. Clone the repository
cd ~
git clone https://github.com/lllyasviel/Fooocus.git
cd Fooocus

# 2. Create a virtual environment and install dependencies
python3 -m venv venv
source venv/bin/activate
pip install -r requirements_versions.txt

# 3. Launch (the first run downloads the model automatically)
python entry_with_update.py --preset default

Option 3: ComfyUI (advanced users)

# 1. Clone the repository
cd ~
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI

# 2. Install dependencies (PyTorch build for CUDA 12.1)
python3 -m venv venv
source venv/bin/activate
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt

# 3. Download the SDXL base model
wget -O models/checkpoints/sd_xl_base_1.0.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# 4. Launch (default port 8188)
# --listen 0.0.0.0 accepts connections from other machines; omit it for local-only use
python main.py --listen 0.0.0.0

Option 4: Installing FLUX

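# Building on the ComfyUI install above, download a FLUX model
cd ~/ComfyUI
mkdir -p models/unet models/clip

# Download FLUX.1-schnell (the fast variant, no token required)
wget -O models/unet/flux1-schnell.safetensors \
  https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors

# Or download FLUX.1-dev (requires a Hugging Face token and accepting the model license)
pip install huggingface_hub
huggingface-cli login
huggingface-cli download black-forest-labs/FLUX.1-dev --local-dir models/flux-dev

# Note: ComfyUI's FLUX workflows also need the text encoders (models/clip)
# and the VAE (models/vae); see the ComfyUI FLUX examples for the exact files.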
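
Large downloads fail silently more often than one would like, so it helps to confirm the files landed where ComfyUI expects them. A minimal sketch: only `flux1-schnell.safetensors` comes from the commands above; the text encoder and VAE file names are assumptions based on commonly used FLUX workflows, so adjust the list to match your own setup.

```python
from pathlib import Path

# Paths relative to the ComfyUI root. Everything except the first entry is an
# assumed file name, not something this guide downloaded.
FLUX_SCHNELL_FILES = [
    "models/unet/flux1-schnell.safetensors",
    "models/clip/clip_l.safetensors",
    "models/clip/t5xxl_fp8_e4m3fn.safetensors",
    "models/vae/ae.safetensors",
]


def missing_files(comfy_root: str, expected: list[str] = FLUX_SCHNELL_FILES) -> list[str]:
    """Return the expected model files that do not exist under comfy_root."""
    root = Path(comfy_root).expanduser()
    return [rel for rel in expected if not (root / rel).is_file()]


for rel in missing_files("~/ComfyUI"):
    print(f"missing: {rel}")
```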

V. Usage Examples

Text-to-image with Stable Diffusion WebUI

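import base64

import requests

# Call the local WebUI API (the WebUI must be launched with --api)
txt2img_url = "http://127.0.0.1:7860/sdapi/v1/txt2img"
payload = {
    "prompt": "a cyberpunk city at night, neon lights, rain, cinematic lighting",
    "negative_prompt": "blurry, low quality, deformed",
    "steps": 30,
    "width": 512,
    "height": 512,
    "cfg_scale": 7.5,
    "sampler_name": "DPM++ 2M Karras"
}

# Generation can be slow on older GPUs, so allow a generous timeout
response = requests.post(txt2img_url, json=payload, timeout=600)
response.raise_for_status()  # fail early on HTTP errors
result = response.json()

# Images come back as base64-encoded strings; decode and save the first one
with open("output.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
print("Image saved to output.png")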
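
The request above saves only `images[0]`. When the payload asks for several images (for example by adding `"batch_size": 4`), the response carries one base64 string per image. A minimal sketch of handling the whole list; the helper names and the `outputs` directory are hypothetical.

```python
import base64
from pathlib import Path


def decode_images(result: dict) -> list[bytes]:
    """Decode every base64 image in a txt2img response into raw PNG bytes."""
    return [base64.b64decode(item) for item in result.get("images", [])]


def save_images(result: dict, out_dir: str = "outputs", prefix: str = "txt2img") -> list[Path]:
    """Write each decoded image to out_dir as prefix_000.png, prefix_001.png, ..."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, png_bytes in enumerate(decode_images(result)):
        path = out / f"{prefix}_{index:03d}.png"
        path.write_bytes(png_bytes)
        paths.append(path)
    return paths


# Usage, with `result` taken from response.json():
# for path in save_images(result):
#     print(f"saved {path}")
```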

Calling the ComfyUI API

import uuid

import requests

# A workflow is a graph of numbered nodes; ["2", 0] means "output 0 of node 2"
workflow = {
    "client_id": str(uuid.uuid4()),
    "prompt": {
        # Sampler: denoises the empty latent using the model and both prompts
        "1": {
            "class_type": "KSampler",
            "inputs": {
                "seed": 42, "steps": 25, "cfg": 7.5,
                "sampler_name": "dpmpp_2m", "scheduler": "karras",
                "denoise": 1.0, "model": ["2", 0],
                "positive": ["3", 0], "negative": ["4", 0],
                "latent_image": ["5", 0]
            }
        },
        # Checkpoint loader: outputs are model (0), CLIP (1), VAE (2)
        "2": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
        # Positive and negative prompts
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "a beautiful sunset over mountains", "clip": ["2", 1]}},
        "4": {"class_type": "CLIPTextEncode",
              "inputs": {"text": "ugly, blurry", "clip": ["2", 1]}},
        # Empty 1024x1024 latent to start from
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        # Decode the sampled latent to pixels and save it
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["1", 0], "vae": ["2", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "comfyui", "images": ["6", 0]}}
    }
}

# Queue the workflow; the response includes the prompt_id of the queued job
resp = requests.post("http://127.0.0.1:8188/prompt", json=workflow, timeout=30)
resp.raise_for_status()
print(f"Job submitted: {resp.json()}")
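
Submitting the workflow only queues it. To fetch the result, a client can look up the job under `/history/<prompt_id>` and download each image through `/view`. The sketch below covers the parsing half, assuming the history response nests image entries under `outputs` per node as shown in the sample; the helper names are hypothetical.

```python
from urllib.parse import urlencode

COMFY_URL = "http://127.0.0.1:8188"


def image_refs(history: dict, prompt_id: str) -> list[dict]:
    """Collect the image entries from every output node of one finished job."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    refs = []
    for node_output in outputs.values():
        refs.extend(node_output.get("images", []))
    return refs


def view_url(ref: dict, base_url: str = COMFY_URL) -> str:
    """Build the /view download URL for one image entry."""
    query = urlencode({
        "filename": ref["filename"],
        "subfolder": ref.get("subfolder", ""),
        "type": ref.get("type", "output"),
    })
    return f"{base_url}/view?{query}"


# Sample of the assumed history shape for a job whose SaveImage node is "7"
sample_history = {
    "abc123": {"outputs": {"7": {"images": [
        {"filename": "comfyui_00001_.png", "subfolder": "", "type": "output"}
    ]}}}
}
for ref in image_refs(sample_history, "abc123"):
    print(view_url(ref))
```

In a real client, `history` would come from `requests.get(f"{COMFY_URL}/history/{prompt_id}").json()`, polled until the `prompt_id` key appears.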

VI. Feature Comparison

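Feature	Midjourney	DALL-E 3	SD WebUI	ComfyUI	Fooocus	FLUX
Monthly fee	$10-60	$20+/API	Free	Free	Free	Free
Prompt understanding	Excellent	Top-tier	Moderate	Moderate	Moderate	Excellent
Image quality	Top-tier	Excellent	Excellent	Excellent	Excellent	Top-tier
Text rendering	Poor	Excellent	Poor	Poor	Poor	Excellent
LoRA support	❌	❌	✅	✅	✅	✅
ControlNet	❌	❌	✅	✅	✅	✅
API access	❌ (no official public API)	✅	✅	✅	✅	✅
Runs locally	❌	❌	✅	✅	✅	✅
Image privacy	Cloud	Cloud	Fully local	Fully local	Fully local	Fully local
Learning curve	Low	Low	Medium	High	Low	Medium
Extensions	❌	❌	Very rich	Rich	Limited	Via ComfyUI
VRAM required	None	None	6GB+	6GB+	6GB+	12GB+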

VII. Savings Calculator

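Assume the following monthly usage: about 300 images at Standard-tier quality. The DALL-E 3 API row assumes standard 1024x1024 images at $0.04 each.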

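Option	Monthly cost	Annual cost	3-year total
Midjourney Standard	$30	$360	$1,080
DALL-E 3 API	$12	$144	$432
Local SD/FLUX (electricity only, existing GPU)	~$3	~$36	~$108
Local SD/FLUX (including GPU purchase)	~$3 plus a one-time ~$250 for an RTX 3060	-	$250 + $108 = $358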

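Conclusion: even with the GPU purchase included, the local setup pays for itself within the first year compared with Midjourney Standard (about 10 months, at a saving of roughly $27/month). Against the cheaper DALL-E 3 API figure it takes a little over two years. After that, the only running cost is electricity. If you already own a gaming GPU, the savings start immediately.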
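
The break-even point depends on which paid option you compare against. A minimal sketch of that arithmetic, using the table's figures ($250 GPU, about $3/month electricity); the function name is hypothetical.

```python
import math
from typing import Optional


def break_even_months(hardware_cost: float, paid_monthly: float, local_monthly: float) -> Optional[int]:
    """Whole months until the monthly savings cover the one-time hardware cost.

    Returns None when the local setup is not cheaper per month.
    """
    saving = paid_monthly - local_monthly
    if saving <= 0:
        return None
    return math.ceil(hardware_cost / saving)


print(break_even_months(250, 30, 3))  # versus Midjourney Standard
print(break_even_months(250, 12, 3))  # versus the DALL-E 3 API figure
```

Swap in your own hardware price and electricity estimate; heavier usage on a per-image API shortens the payback period considerably.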

VIII. Going Further

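LoRA fine-tuning: train a small adapter on your own photos to reproduce a specific style or person
ControlNet: precisely control composition, pose, and edges
ComfyUI workflows: build complex image-processing pipelines for batch production
AI upscaling: use Real-ESRGAN to enlarge small images (commonly 2x or 4x) while keeping them sharp
Image to video: use AnimateDiff to turn still images into short clips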
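
As a small illustration of the LoRA item above: in Stable Diffusion WebUI, a trained LoRA placed in `models/Lora` is activated from the prompt with a `<lora:name:weight>` tag. A minimal sketch of building such a prompt for the txt2img payload; the helper name and the LoRA name `my_style` are hypothetical.

```python
def with_lora(prompt: str, lora_name: str, weight: float = 0.8) -> str:
    """Append a WebUI-style LoRA activation tag to a prompt."""
    return f"{prompt}, <lora:{lora_name}:{weight:g}>"


# "my_style" stands in for the file name of your own trained LoRA
styled_prompt = with_lora("portrait of a woman, soft light", "my_style", 0.8)
print(styled_prompt)
```

Lower weights blend the LoRA more subtly; values around 0.6 to 1.0 are a common starting range.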

Summary

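Open-source AI image tools are now mature enough that individuals and small teams have little reason to pay for Midjourney or DALL-E. The only real barrier is a decent GPU, and given the long-term running costs, an RTX 3060 12GB (about $200 used) is the most cost-effective investment.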

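If you only generate images occasionally, Fooocus is the lowest-effort choice; if you want maximum flexibility, ComfyUI is the way to go; and if you want the newest, strongest model quality, FLUX is worth trying.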