I'm using a 7900 XT with 20 GB of VRAM. I have tried both the HIP and Vulkan backends, but I cannot get a noticeable speed increase with the default recommended settings from the guide:
"$BEE_SERVER" --host 0.0.0.0 --port $PORT \
-m $MODEL -md $DRAFT \
--jinja --chat-template-kwargs '{"enable_thinking":true}' \
-ngld all -ngl all -np 1 --reasoning on --cache-ram 0 \
--spec-type dflash --spec-dflash-cross-ctx 512 \
--kv-unified -b 2048 -ub 256 \
--spec-draft-n-max 3 \
--log-timestamps --log-prefix --log-colors off \
--no-mmap --mlock --no-host \
--temp 0.6 --top-k 20 --min-p 0.0 \
-ctk turbo3 -ctv turbo3 \
-fa on --metrics -c 64000
MODEL=Qwen3.6-27B-Q4_K_M.gguf
DRAFT=dflash-draft-3.6-q4_k_m.gguf
I have the context set to 64k because that is the minimum Hermes requires.