fix: disable VMM on WSL2 to prevent cuMemAddressReserve crash #610

Open
wzgrx wants to merge 2 commits into withcatai:master from wzgrx:fix/wsl2-vmm-crash

wzgrx commented May 17, 2026

Problem

When running on WSL2 with CUDA, node-llama-cpp often fails with a hard crash, either during the build or at runtime:

CUDA error: out of memory
cuMemAddressReserve failure

This is documented in Issue #580.

The root cause is that WSL2/WDDM drivers cannot reserve the 32 GB virtual memory pool that llama.cpp requests by default. Even when physical VRAM is available, the virtual address reservation fails and the process aborts.

Solution

This PR automatically detects WSL2 environments and injects -DGGML_CUDA_NO_VMM=ON into the CMake build options when compiling for CUDA.
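A minimal sketch of the injection step, not the actual BuildCommand.ts code. The helper names (`applyWsl2CudaWorkaround`, `toCmakeArgs`), the `GpuType` union, and the `Record`-based option shape are assumptions made for illustration. Leaving an explicitly set `GGML_CUDA_NO_VMM` untouched is also a choice made here, not something the PR states.

```typescript
// Hypothetical types and helpers for illustration; not the project's actual API.
type GpuType = "cuda" | "vulkan" | "metal" | false;

// Returns a copy of the CMake options with GGML_CUDA_NO_VMM=ON added
// when building for CUDA on WSL2. A value already set by the user is kept.
function applyWsl2CudaWorkaround(
    cmakeOptions: Record<string, string>,
    gpu: GpuType,
    isWsl2: boolean
): Record<string, string> {
    const result: Record<string, string> = {...cmakeOptions};

    if (gpu === "cuda" && isWsl2 && !("GGML_CUDA_NO_VMM" in result))
        result["GGML_CUDA_NO_VMM"] = "ON";

    return result;
}

// Converts the option map into -DKEY=VALUE arguments for the cmake command line.
function toCmakeArgs(options: Record<string, string>): string[] {
    return Object.keys(options).map((key) => `-D${key}=${options[key]}`);
}

// Example: a CUDA build on WSL2 picks up the extra flag.
const exampleArgs = toCmakeArgs(
    applyWsl2CudaWorkaround({GGML_CUDA: "ON"}, "cuda", true)
);
console.log(exampleArgs.join(" ")); // -DGGML_CUDA=ON -DGGML_CUDA_NO_VMM=ON
```

Returning a copy rather than mutating the input keeps the caller's options intact, so the same base options can be reused for a non-WSL2 or non-CUDA build.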

Changes

  1. WSL2 Detection: Added a wsl2 flag to getPlatformInfo(), set by checking /proc/sys/kernel/osrelease for the substrings microsoft or wsl.
  2. Automatic VMM Disabling: Modified BuildCommand.ts to set GGML_CUDA_NO_VMM=ON when gpu=cuda and platformInfo.wsl2 is true.
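The detection step can be sketched roughly as follows. This illustrates the substring check described above and is not the code added to getPlatformInfo(). The function names `isWslRelease` and `detectWsl` are hypothetical, and the case-insensitive match is an assumption. As far as I know, WSL1 kernels also report a Microsoft release string, so a check like this may not tell WSL1 and WSL2 apart.

```typescript
import * as fs from "fs";

// Hypothetical helper: true if a kernel release string looks like a WSL kernel.
// Matches the substrings "microsoft" or "wsl", ignoring case (an assumption).
function isWslRelease(osRelease: string): boolean {
    return /microsoft|wsl/i.test(osRelease);
}

// Hypothetical helper: reads the kernel release file and applies the check.
// Returns false on non-Linux platforms or if the file cannot be read.
function detectWsl(releasePath: string = "/proc/sys/kernel/osrelease"): boolean {
    if (process.platform !== "linux")
        return false;

    try {
        return isWslRelease(fs.readFileSync(releasePath, "utf8"));
    } catch {
        // Missing or unreadable file: treat as not WSL rather than failing the build.
        return false;
    }
}

console.log("Running under WSL:", detectWsl());
```

Splitting the string check from the file read keeps the matching logic testable without a WSL machine.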

Testing

  • Tested on WSL2 (Ubuntu 24.04, RTX 5090, CUDA 13.2).
  • Embedding tasks now complete successfully without OOM crashes.
  • GPU offloading remains fully functional.

Fixes #580

wzgrx added 2 commits May 17, 2026 15:02
WSL2/WDDM drivers fail to reserve the 32GB virtual memory pool
that llama.cpp attempts to allocate by default, causing a hard crash
with 'cuMemAddressReserve failure' (Issue withcatai#580).

This change detects WSL2 environments and automatically passes
-DGGML_CUDA_NO_VMM=ON to CMake when building for CUDA,
ensuring stable operation on WSL2 without user intervention.

Fixes: withcatai#580
Development

Successfully merging this pull request may close these issues.

Windows CUDA: cuMemAddressReserve failure in VMM pool causes hard abort (GGML_CUDA_NO_VMM workaround)
