Running sd-server on any legacy GPU via Vulkan + OpenWebUI integration — full Windows walkthrough (no CUDA/ROCm) #1685

aivisionslab-studios · 2026-06-21T03:49:11Z

aivisionslab-studios
Jun 21, 2026

Local AI stack on any GPU via Vulkan — LLM + image generation + OpenWebUI (Windows, no CUDA/ROCm)

This is a complete local AI stack — text generation + image generation — running entirely on your GPU through Vulkan. No CUDA. No ROCm. No cloud. Works on AMD, Nvidia, and Intel GPUs, including older cards dropped from official AI support.

What you'll have at the end: a chat interface (OpenWebUI) connected to a local LLM and a local image generator, both accelerated by your GPU.

How this works

The standard AI stack (PyTorch, ROCm, CUDA) locks you out if your GPU isn't on the supported list. This guide goes around it entirely.

llama.cpp and stable-diffusion.cpp have a Vulkan compute backend — an open graphics standard that every GPU vendor ships drivers for, even for hardware they've officially dropped from their AI stacks. If your GPU has a working Vulkan driver, it runs inference.

Terminal rules — read this first

This guide uses two different terminals for different steps. Using the wrong one is the most common cause of errors.

Compilation (building the engines): use Developer PowerShell for VS 2022 — search it in the Start menu. This is NOT regular PowerShell.
Everything else (running servers, Docker, scripts): use regular PowerShell or CMD.
WSL2 is not needed for this guide. Everything runs natively on Windows.

Part 1 — Install prerequisites

Install these in order. All free.

Visual Studio Community 2022 — during install, select the workload "Desktop development with C++". Nothing else required.
CMake — download from cmake.org. During install, check "Add CMake to system PATH for all users".
Git for Windows — official installer at git-scm.com, default options.
Vulkan SDK — download from vulkan.lunarg.com. Install to the default path (C:\VulkanSDK\). Do not change the path.
Docker Desktop — download from docker.com. Required for OpenWebUI.

After installing, validate Vulkan detected your GPU. Open regular PowerShell:

vulkaninfo --summary

Your GPU should appear under Physical Devices. If the command isn't found — restart your machine and try again. If your GPU doesn't appear — update your GPU driver.

Part 2 — Build the LLM engine (llama.cpp)

Open Developer PowerShell for VS 2022 and run:

cd E:\
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j8

Replace -j8 with your core count — more threads = faster compile.

Confirm your GPU was detected:

cd build\bin\Release
.\llama-cli.exe --list-devices

Expected output:
Vulkan0: [YOUR GPU NAME]

If nothing appears — your build didn't include Vulkan. Verify the cmake line included -DGGML_VULKAN=ON.

Part 3 — Get an LLM model

You need a .gguf file. Download from HuggingFace — search for any model with Q4_K_M quantization.

VRAM	Recommended model
4GB or less	Qwen3 1.7B Q4_K_M or Phi-3 Mini Q4_K_M
6–8GB	Mistral 7B Q4_K_M or Llama 3 8B Q4_K_M
8GB+	anything up to the size your VRAM fits

Save the .gguf file on SSD or NVMe — not HDD. Load time difference is significant.

Part 4 — Start the LLM server

Open regular PowerShell:

C:\llama.cpp\build\bin\Release\llama-server.exe -m "E:\models\your-model.gguf" --host 0.0.0.0 --port 8081 --device Vulkan0

Confirm GPU is active in the output:
ggml_vulkan: Found 1 Vulkan device(s)

ggml_vulkan: 0 = [YOUR GPU NAME] | VRAM: [X]MB

llama server listening at http://0.0.0.0:8081/

If you see 3–5 tok/s with no ggml_vulkan line — it fell back to CPU. The --device Vulkan0 flag is missing or the build didn't include Vulkan.

Part 5 — Build the image engine (stable-diffusion.cpp)

Open Developer PowerShell for VS 2022:

cd E:\
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
mkdir build
cd build
cmake .. -DGGML_VULKAN=ON -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release -j8

The --recursive flag is required — it pulls the ggml submodule. Do not skip it.

Successful build ends with:
-- Found Vulkan: C:/VulkanSDK/.../vulkan-1.lib

[100%] Built target sd-server

Part 6 — Get an image model

Download DreamShaper 8 from Civitai in .safetensors format. SD 1.5 model — fast, low VRAM, works on anything with 4GB+.

Save on SSD/NVMe. Then convert to GGUF (optional but recommended):

E:\stable-diffusion.cpp\build\bin\Release\sd-cli.exe -M convert -m "E:\models\DreamShaper_8.safetensors" -o "E:\models\dreamshaper8.gguf" --type q8_0

Part 7 — Start the image server

Open regular PowerShell:

E:
cd "E:\stable-diffusion.cpp\build\bin\Release"
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 -m "E:\models\dreamshaper8.gguf"

Confirm GPU detected:
ggml_vulkan: Found 1 Vulkan device(s)

ggml_vulkan: 0 = [YOUR GPU NAME] | VRAM: [X]MB

Server listening on http://0.0.0.0:7860/

Flag note: Older builds use --host and --port. Newer builds (master-600+) use --listen-ip and --listen-port. Run sd-server.exe --help if you get an unknown argument error.

Part 8 — Install OpenWebUI

Open regular PowerShell:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Open http://localhost:3000 in your browser. Create an admin account on first launch.

Part 9 — Connect LLM to OpenWebUI

Go to: Admin Panel → Settings → Connections

Under OpenAI API, click +:

URL: http://host.docker.internal:8081/v1
API Key: sk-local

Click the refresh icon — green badge confirms the connection.

Why host.docker.internal and not 127.0.0.1: Docker runs in an isolated network. From inside the container, 127.0.0.1 points to the container itself, not your machine. host.docker.internal is Docker's built-in bridge to the host.

Part 10 — Connect image server to OpenWebUI

Go to: Admin Panel → Settings → Images

Engine: Automatic1111
URL: http://192.168.x.x:7860/ — use your machine's actual LAN IP (find it with ipconfig in CMD), NOT 127.0.0.1, with trailing slash

Green badge = connected.

Part 11 — Fix Windows Firewall

Windows Defender blocks Docker's internal subnet (172.x.x.x) by default. Without this rule, OpenWebUI can't reach sd-server even with the correct IP.

Open PowerShell as Administrator:

New-NetFirewallRule -DisplayName "sd-server local" -Direction Inbound -Protocol TCP -LocalPort 7860 -Action Allow

Restart sd-server after adding the rule.

Part 12 — Automate startup

Save this as start_ai.bat on your Desktop:

@echo off
title Local AI Stack
cls
taskkill /f /im llama-server.exe 2>nul
taskkill /f /im sd-server.exe 2>nul
timeout /t 2 /nobreak >nul

start "LLM Server" C:\llama.cpp\build\bin\Release\llama-server.exe -m "E:\models\your-model.gguf" --host 0.0.0.0 --port 8081 --device Vulkan0

timeout /t 3 /nobreak >nul

E:
cd "E:\stable-diffusion.cpp\build\bin\Release"
sd-server.exe --listen-ip 0.0.0.0 --listen-port 7860 -m "E:\models\dreamshaper8.gguf"
pause

CMD rules that matter:

Never use .\ before executables in CMD — it breaks execution
Always jump drive with E: before cd — CMD doesn't change drives automatically
taskkill at the start clears stuck processes holding VRAM or ports

Troubleshooting

Image generation returns no results / terminal freezes
Known bug in sd-server with random seed (-1). Fix: set a fixed integer seed in OpenWebUI advanced image options — 42, 1337, any number works.

LLM running at 3–5 tok/s
It fell back to CPU. Confirm --device Vulkan0 is in your llama-server command and the build included -DGGML_VULKAN=ON.

OpenWebUI can't reach the servers
Two possible causes: wrong IP (use host.docker.internal for LLM, LAN IP for image server) or Windows Firewall blocking port 7860. See Part 11.

FLUX model fails to load with "new_sd_ctx_t failed"
You downloaded a GGUF from the city96 repository — those only work in ComfyUI. For sd-server, download FLUX weights from the leejet repository on HuggingFace instead.

DirectML / ROCm
DirectML crashes with OpaqueTensorImpl errors and hasn't had a meaningful update since Sep 2024. ROCm dropped support for older AMD architectures in v5.x and has no Windows support. Both are dead ends — Vulkan is the working path.

Performance reference

Tested on an AMD RX 580 8GB (2017, officially unsupported for AI) as a worst-case baseline:

Workload	Result
LLM — Mistral 7B Q4_K_M	17–18 tok/s via Vulkan (vs 3–5 tok/s CPU-only)
Image — SD 1.5 DreamShaper 512x512 20 steps	~72 seconds
Image — FLUX.1-schnell hybrid GPU+CPU 1024x1024	~14 minutes

If your GPU is newer or has more VRAM, expect better numbers. On 4GB VRAM, stick to 512x512 for image generation.

Questions about specific hardware, error messages, or model recommendations — reply here.

wbruna · 2026-06-21T14:46:50Z

wbruna
Jun 21, 2026

Compilation (building the engines)

Why not trying the pre-compiled releases first? Both llama.cpp and stable-diffusion.cpp provide Vulkan binaries.

Alternatively, if you are already using Docker for OpenWebUI, you could run the official Docker images for llama.cpp and stable-diffusion.cpp too.

Known bug in sd-server with random seed (-1). Fix: set a fixed integer seed in OpenWebUI advanced image options — 42, 1337, any number works.

If you are referring to a known sd.cpp bug, please link to the issue, so this section can be removed when it gets fixed.

This guide would also benefit from direct links to the models: hunting them through huggingface or civitai can take a lot of time, especially for beginners.

1 reply

aivisionslab-studios Jun 21, 2026
Author

Really good feedback, thanks for taking the time — going point by point:
Pre-compiled releases: Fair point, and honestly the guide should lead with that as the default path. Source build was chosen mainly to guarantee the Vulkan flag is actually set (some release artifacts only ship CPU/CUDA variants depending on platform) and to have full control over build flags, but for most people the pre-compiled Vulkan binaries are simpler and skip the whole VS Build Tools + CMake setup entirely. I'll restructure this section to recommend the binary first, with source build as the fallback for anyone who needs a specific flag combination or the binaries don't cover their platform.
Docker images for llama.cpp/stable-diffusion.cpp: Agreed this simplifies things if you're already in Docker for OpenWebUI. The reason the guide kept the GPU-bound services native on Windows is that Docker Desktop's GPU passthrough is built mainly around the NVIDIA/CUDA path through WSL2 — Vulkan passthrough for AMD cards specifically has been less consistent in my testing. I'll add the official Docker images as an alternative path and note that caveat, since for NVIDIA users it'll likely just work.
Seed -1 bug / issue link: Good catch, and I went looking before replying — I couldn't find an existing tracked issue in the repo that matches this specific behavior. That means either it's already fixed upstream and I'm describing stale behavior, or it was never filed. I'll re-test on the latest build, and if it's still reproducible I'll file an issue properly and link it here; if it's fixed, I'll pull the section. Appreciate you pushing on this instead of letting an uncited claim sit.
Direct model links: Agreed, will add direct HF links for the specific models referenced (DreamShaper 8, the leejet FLUX weights, etc.) so people aren't hunting through search results.
Will push a revision with all of this. Thanks for the thorough read.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Running sd-server on any legacy GPU via Vulkan + OpenWebUI integration — full Windows walkthrough (no CUDA/ROCm) #1685

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment 1 reply

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Running sd-server on any legacy GPU via Vulkan + OpenWebUI integration — full Windows walkthrough (no CUDA/ROCm) #1685

Uh oh!

aivisionslab-studios Jun 21, 2026

Local AI stack on any GPU via Vulkan — LLM + image generation + OpenWebUI (Windows, no CUDA/ROCm)

How this works

Terminal rules — read this first

Part 1 — Install prerequisites

Part 2 — Build the LLM engine (llama.cpp)

Part 3 — Get an LLM model

Part 4 — Start the LLM server

Part 5 — Build the image engine (stable-diffusion.cpp)

Part 6 — Get an image model

Part 7 — Start the image server

Part 8 — Install OpenWebUI

Part 9 — Connect LLM to OpenWebUI

Part 10 — Connect image server to OpenWebUI

Part 11 — Fix Windows Firewall

Part 12 — Automate startup

Troubleshooting

Performance reference

Replies: 1 comment · 1 reply

Uh oh!

wbruna Jun 21, 2026

Uh oh!

aivisionslab-studios Jun 21, 2026 Author

aivisionslab-studios
Jun 21, 2026

Replies: 1 comment 1 reply

wbruna
Jun 21, 2026

aivisionslab-studios Jun 21, 2026
Author