Running sd-server on any legacy GPU via Vulkan + OpenWebUI integration — full Windows walkthrough (no CUDA/ROCm) #1685
Why not try the pre-compiled releases first? Both llama.cpp and stable-diffusion.cpp provide Vulkan binaries. Alternatively, if you are already using Docker for OpenWebUI, you could run the official Docker images for llama.cpp and stable-diffusion.cpp too.
If you are referring to a known sd.cpp bug, please link to the issue so that section can be removed once it is fixed. This guide would also benefit from direct links to the models: hunting for them on HuggingFace or Civitai can take a lot of time, especially for beginners.
Local AI stack on any GPU via Vulkan — LLM + image generation + OpenWebUI (Windows, no CUDA/ROCm)
This is a complete local AI stack — text generation + image generation — running entirely on your GPU through Vulkan. No CUDA. No ROCm. No cloud. Works on AMD, Nvidia, and Intel GPUs, including older cards dropped from official AI support.
What you'll have at the end: a chat interface (OpenWebUI) connected to a local LLM and a local image generator, both accelerated by your GPU.
How this works
The standard AI stack (PyTorch, ROCm, CUDA) locks you out if your GPU isn't on the supported list. This guide goes around it entirely.
llama.cpp and stable-diffusion.cpp both have a Vulkan compute backend. Vulkan is an open graphics and compute standard that every GPU vendor ships drivers for, even for hardware they've officially dropped from their AI stacks. If your GPU has a working Vulkan driver, it can run inference.
Terminal rules — read this first
This guide uses two different terminals for different steps. Using the wrong one is the most common cause of errors.
Developer PowerShell for VS 2022: search for it in the Start menu. This is NOT regular PowerShell.
Part 1 — Install prerequisites
Install these in order. All free.
C:\VulkanSDK\). Do not change the path.
After installing, verify that Vulkan detected your GPU. Open regular PowerShell:
vulkaninfo --summary
Your GPU should appear under Physical Devices. If the command isn't found, restart your machine and try again. If your GPU doesn't appear, update your GPU driver.
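The same check can be scripted. The following is a minimal sketch, not part of the original guide: it assumes `vulkaninfo` is on your PATH and that its summary prints one `deviceName = ...` line per GPU, which may vary between SDK versions. The function names are hypothetical.

```python
import re
import subprocess

# Assumption: each GPU appears in the summary as a "deviceName = <name>" line.
DEVICE_LINE = re.compile(r"^[ \t]*deviceName[ \t]*=[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def parse_device_names(summary_text: str) -> list[str]:
    """Return every deviceName value found in `vulkaninfo --summary` output."""
    return DEVICE_LINE.findall(summary_text)


def detected_gpus() -> list[str]:
    """Run vulkaninfo and list the devices it reports.

    Returns an empty list when the tool cannot be started (for example,
    when it is not on PATH yet because the machine was not restarted).
    """
    try:
        result = subprocess.run(
            ["vulkaninfo", "--summary"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return []
    return parse_device_names(result.stdout)
```

Calling `detected_gpus()` from a regular PowerShell Python session should return a list containing your GPU name; an empty list points to the same two causes described above (PATH not refreshed, or outdated driver).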
Part 2 — Build the LLM engine (llama.cpp)
Open Developer PowerShell for VS 2022 and run:
Replace `-j8` with your core count; more threads means a faster compile.
Confirm your GPU was detected:
Expected output:
Vulkan0: [YOUR GPU NAME]
If nothing appears, your build didn't include Vulkan. Verify that the cmake line included `-DGGML_VULKAN=ON`.
Part 3 — Get an LLM model
You need a `.gguf` file. Download one from HuggingFace: search for any model with Q4_K_M quantization.
Save the `.gguf` file on an SSD or NVMe drive, not an HDD. The difference in load time is significant.
Part 4 — Start the LLM server
Open regular PowerShell:
Confirm GPU is active in the output:
ggml_vulkan: Found 1 Vulkan device(s)
ggml_vulkan: 0 = [YOUR GPU NAME] | VRAM: [X]MB
llama server listening at http://0.0.0.0:8081/
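If you redirect the server output to a file, the check above can be automated. This is a small sketch that assumes the log contains the `ggml_vulkan: Found N Vulkan device(s)` line exactly as shown in the expected output; the function names are hypothetical and not part of llama.cpp.

```python
import re

# Matches the device-count line shown in the expected output above.
DEVICE_COUNT = re.compile(r"ggml_vulkan: Found (\d+) Vulkan device\(s\)")


def vulkan_device_count(log_text: str) -> int:
    """Return the device count reported by ggml_vulkan, or 0 if the line is absent."""
    match = DEVICE_COUNT.search(log_text)
    return int(match.group(1)) if match else 0


def running_on_gpu(log_text: str) -> bool:
    """True when the startup log reports at least one Vulkan device."""
    return vulkan_device_count(log_text) > 0
```

A result of `False` means the log has no `ggml_vulkan` device line, which is the CPU-fallback case.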
If you see 3–5 tok/s with no `ggml_vulkan` line, it fell back to CPU. Either the `--device Vulkan0` flag is missing or the build didn't include Vulkan.
Part 5 — Build the image engine (stable-diffusion.cpp)
Open Developer PowerShell for VS 2022:
The `--recursive` flag is required because it pulls the ggml submodule. Do not skip it.
A successful build ends with:
-- Found Vulkan: C:/VulkanSDK/.../vulkan-1.lib
[100%] Built target sd-server
Part 6 — Get an image model
Download DreamShaper 8 from Civitai in `.safetensors` format. It is an SD 1.5 model: fast, low VRAM, and works on anything with 4GB+.
Save it on an SSD or NVMe drive. Then convert it to GGUF (optional but recommended):
Part 7 — Start the image server
Open regular PowerShell:
Confirm GPU detected:
ggml_vulkan: Found 1 Vulkan device(s)
ggml_vulkan: 0 = [YOUR GPU NAME] | VRAM: [X]MB
Server listening on http://0.0.0.0:7860/
Part 8 — Install OpenWebUI
Open regular PowerShell:
Open http://localhost:3000 in your browser. Create an admin account on first launch.
Part 9 — Connect LLM to OpenWebUI
Go to: Admin Panel → Settings → Connections
Under OpenAI API, click `+`:
URL: `http://host.docker.internal:8081/v1`
Key: `sk-local`
Click the refresh icon. A green badge confirms the connection.
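Before blaming OpenWebUI, it can help to confirm the LLM server answers on its OpenAI-compatible API directly from the host. The sketch below assumes the server from Part 4 is listening on port 8081 and serves the standard `/v1/models` listing; `host.docker.internal` only resolves inside the container, so the default here is `localhost`. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def models_url(base_url: str) -> str:
    """Build the model-listing URL from an OpenAI-style base URL."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(body: str) -> list[str]:
    """Extract model ids from an OpenAI-style {"data": [{"id": ...}]} response."""
    payload = json.loads(body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = "http://localhost:8081/v1", timeout: float = 5.0) -> list[str]:
    """Ask the running server which models it exposes (raises if unreachable)."""
    with urlopen(models_url(base_url), timeout=timeout) as response:
        return parse_model_ids(response.read().decode("utf-8"))
```

If `list_model_ids()` returns your model from the host but OpenWebUI still shows no green badge, the problem is more likely the URL entered in the Connections page than the server itself.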
Part 10 — Connect image server to OpenWebUI
Go to: Admin Panel → Settings → Images
Engine: Automatic1111
Base URL: `http://192.168.x.x:7860/`. Use your machine's actual LAN IP (find it with `ipconfig` in CMD), NOT `127.0.0.1`, and keep the trailing slash.
Green badge = connected.
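For testing the image server without OpenWebUI, a request body in the Automatic1111 style can be built as below. This is a sketch under assumptions: it presumes sd-server accepts the usual Automatic1111 `/sdapi/v1/txt2img` route and field names, which is what the engine setting above implies but is not confirmed here. It also replaces a random seed of `-1` with a fixed one, since this guide's troubleshooting notes report that `-1` can hang sd-server. The helper names and the default of 20 steps are hypothetical.

```python
def normalize_seed(seed: int, fallback: int = 42) -> int:
    """Replace the random-seed sentinel (any negative value) with a fixed integer."""
    return fallback if seed < 0 else seed


def txt2img_url(base_url: str) -> str:
    """Build the Automatic1111-style txt2img endpoint (assumed to exist on sd-server)."""
    return base_url.rstrip("/") + "/sdapi/v1/txt2img"


def build_txt2img_payload(
    prompt: str,
    width: int = 512,
    height: int = 512,
    steps: int = 20,
    seed: int = -1,
) -> dict:
    """Return a JSON-serializable request body with a guaranteed fixed seed."""
    return {
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,
        "seed": normalize_seed(seed),
    }
```

The 512x512 default matches the resolution this guide recommends for 4GB cards.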
Part 11 — Fix Windows Firewall
Windows Defender Firewall blocks inbound traffic from Docker's internal subnet (172.x.x.x) by default. Without this rule, OpenWebUI can't reach sd-server even with the correct IP.
Open PowerShell as Administrator:
Restart sd-server after adding the rule.
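To see whether the port is actually reachable after adding the rule, a plain TCP connection test is enough. This is a generic sketch using only the Python standard library; the function name is hypothetical. Note the assumption that matters: the result only reflects the firewall path if you run it from where OpenWebUI runs (inside the container), not from the host itself.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, `port_is_reachable("<your LAN IP>", 7860)` returning `False` while sd-server is running suggests the firewall rule is missing or not yet applied.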
Part 12 — Automate startup
Save this as `start_ai.bat` on your Desktop:
CMD rules that matter:
No `.\` before executables in CMD: it breaks execution.
`E:` before `cd`: CMD doesn't change drives automatically.
`taskkill` at the start clears stuck processes holding VRAM or ports.
Troubleshooting
Image generation returns no results / terminal freezes
Known bug in sd-server with a random seed (`-1`). Fix: set a fixed integer seed in the OpenWebUI advanced image options. `42`, `1337`, or any other number works.
LLM running at 3–5 tok/s
It fell back to CPU. Confirm that `--device Vulkan0` is in your llama-server command and that the build included `-DGGML_VULKAN=ON`.
OpenWebUI can't reach the servers
Two possible causes: a wrong IP (use `host.docker.internal` for the LLM and the LAN IP for the image server) or Windows Firewall blocking port 7860. See Part 11.
FLUX model fails to load with "new_sd_ctx_t failed"
You downloaded a GGUF from the city96 repository — those only work in ComfyUI. For sd-server, download FLUX weights from the leejet repository on HuggingFace instead.
DirectML / ROCm
DirectML crashes with `OpaqueTensorImpl` errors and hasn't had a meaningful update since Sep 2024. ROCm dropped support for older AMD architectures in v5.x and has no Windows support. Both are dead ends; Vulkan is the working path.
Performance reference
Tested on an AMD RX 580 8GB (2017, officially unsupported for AI) as a worst-case baseline:
If your GPU is newer or has more VRAM, expect better numbers. On 4GB VRAM, stick to 512x512 for image generation.
Questions about specific hardware, error messages, or model recommendations — reply here.