Git commit
563137a (master-691-563137a)
Operating System & Version
Debian 13
GGML backends
Vulkan, HIP
Command-line arguments used
sd-cli --backend ROCm0 -p 'sunset' --diffusion-model z_image_turbo-Q8_0.gguf --llm Qwen3-4B-UD-Q4_K_XL.gguf --vae ae_bf16.safetensors --cfg-scale 1 --steps 9 -W 1024 -H 1024 --fa --mmap --offload-to-cpu
Steps to reproduce
Since master-691-563137a, offloading with mmap is using a lot of resident memory, instead of the expected shared memory. As measured by top during inference:
| | RES | SHR |
|---|---|---|
| master-690 | 7.7G | <1.0G |
| master-690 --mmap | 10.0G | 9.4G |
| master-691 | 7.6G | <1.0G |
| master-691 --mmap | 16.1G | <1.0G |
(Note: RES is expected to be larger with mmap, since it also counts the memory-mapped areas, and those are never released from memory. But that growth should appear as shared memory, as in master-690, not as a plain increase in resident memory usage.)
The log keeps showing mmap being used for the models:
[INFO ] stable-diffusion.cpp:491 - Version: Z-Image
[DEBUG] model_loader.cpp:813 - using mmap for I/O
[INFO ] model_loader.cpp:819 - using mmap for 'z_image_turbo-Q8_0.gguf'
[DEBUG] model_loader.cpp:813 - using mmap for I/O
[INFO ] model_loader.cpp:819 - using mmap for 'Qwen3-4B-UD-Q4_K_XL.gguf'
[DEBUG] model_loader.cpp:813 - using mmap for I/O
[INFO ] model_loader.cpp:819 - using mmap for 'ae_bf16.safetensors'
Later releases show different values, but similar behavior: increased RES usage with low shared memory.
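To narrow down where the extra resident memory comes from, the RES figure can be split into anonymous and file-backed parts. The sketch below is not part of sd-cli; it assumes a Linux kernel that exposes `RssAnon`, `RssFile` and `RssShmem` in `/proc/<pid>/status` (4.5 or later), and the helper names are made up for illustration. The SHR column in top roughly corresponds to `RssFile + RssShmem`, so a working mmap path should show most of the model size under `RssFile`, while a regression that copies the weights would show it under `RssAnon`.

```python
import sys

# Resident-memory fields reported in /proc/<pid>/status (values are in kB).
_FIELDS = ("VmRSS", "RssAnon", "RssFile", "RssShmem")


def parse_status(text):
    """Extract the resident-memory fields (in kB) from /proc/<pid>/status text."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in _FIELDS:
            values[key] = int(rest.split()[0])
    return values


def rss_breakdown(pid):
    """Read and parse /proc/<pid>/status for the given pid (or 'self')."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())


def format_breakdown(values):
    """Summarize the breakdown in GiB; 'shared' approximates top's SHR column."""
    to_gib = lambda kb: kb / (1024 * 1024)
    shared_kb = values.get("RssFile", 0) + values.get("RssShmem", 0)
    return (
        f"RES {to_gib(values.get('VmRSS', 0)):.1f}G  "
        f"anon {to_gib(values.get('RssAnon', 0)):.1f}G  "
        f"shared {to_gib(shared_kb):.1f}G"
    )


if __name__ == "__main__":
    # Usage: python rss_breakdown.py <pid of the running sd-cli process>
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    print(format_breakdown(rss_breakdown(pid)))
```

Running it against the sd-cli process during inference, once with master-690 and once with master-691, should show whether the 16.1G resident set is mostly anonymous memory.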
What you expected to happen
With --mmap, the model data is held in shared (memory-mapped) memory, reducing RAM pressure.
What actually happened
Increased RAM usage: with --mmap, RES grows to 16.1G while SHR stays below 1.0G.
Logs / error messages / stack trace
No response
Additional context / environment details
No response