fix: disable VMM on WSL2 to prevent cuMemAddressReserve crash #610
Open
wzgrx wants to merge 2 commits into
WSL2/WDDM drivers fail to reserve the 32GB virtual memory pool that llama.cpp attempts to allocate by default, causing a hard crash with 'cuMemAddressReserve failure' (Issue withcatai#580). This change detects WSL2 environments and automatically passes -DGGML_CUDA_NO_VMM=ON to CMake when building for CUDA, ensuring stable operation on WSL2 without user intervention. Fixes: withcatai#580
Problem
When running on WSL2 (especially with CUDA), node-llama-cpp compilation or runtime often fails with a hard crash. This is documented in Issue #580.
The root cause is that WSL2/WDDM drivers cannot reserve the 32 GB virtual memory pool that llama.cpp attempts to reserve by default. Even if physical VRAM is available, the virtual address reservation fails, leading to an abort.
Solution
This PR automatically detects WSL2 environments and injects -DGGML_CUDA_NO_VMM=ON into the CMake build options when compiling for CUDA.
Changes
- Added a wsl2 flag to getPlatformInfo() by checking /proc/sys/kernel/osrelease for microsoft or wsl substrings.
- Updated BuildCommand.ts to set GGML_CUDA_NO_VMM=ON when gpu=cuda and platformInfo.wsl2 is true.
Testing
Fixes #580
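For reviewers, the detection and flag injection described under Changes can be sketched roughly as follows. This is a minimal illustration and not the PR's actual diff: the function names isWslKernelRelease, detectWsl2, and getExtraCmakeOptions are hypothetical, and the case-insensitive match and the fallback to false on read errors are assumptions. Only the /proc/sys/kernel/osrelease path, the microsoft/wsl substrings, and the -DGGML_CUDA_NO_VMM=ON option come from the description above.

```typescript
import * as fs from "node:fs";

// Hypothetical helper: decide whether a kernel release string looks like WSL.
// The PR describes checking for "microsoft" or "wsl" substrings;
// lowercasing before the comparison is an assumption of this sketch.
function isWslKernelRelease(release: string): boolean {
    const normalized = release.toLowerCase();
    return normalized.includes("microsoft") || normalized.includes("wsl");
}

// Hypothetical helper: read the kernel release file and classify it.
// Any read error (missing file, non-Linux host) is treated as "not WSL2".
function detectWsl2(osReleasePath: string = "/proc/sys/kernel/osrelease"): boolean {
    if (process.platform !== "linux") return false;

    try {
        return isWslKernelRelease(fs.readFileSync(osReleasePath, "utf8"));
    } catch {
        return false;
    }
}

// Hypothetical helper: extra CMake options for a given GPU backend.
// VMM is disabled only for CUDA builds on WSL2; other combinations get nothing extra.
function getExtraCmakeOptions(gpu: string | false, wsl2: boolean): string[] {
    return gpu === "cuda" && wsl2 ? ["-DGGML_CUDA_NO_VMM=ON"] : [];
}
```

Calling getExtraCmakeOptions("cuda", detectWsl2()) would then return the extra option on a WSL2 host and an empty list elsewhere. Keeping the string check separate from the file read makes the substring logic testable without a WSL2 machine.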