fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds#9626
fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds#9626russell wants to merge 2 commits intomudler:masterfrom
Conversation
| export CC=$(ROCM_HOME)/llvm/bin/clang | ||
| AMDGPU_TARGETS?=gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201 | ||
| ifeq ($(strip $(AMDGPU_TARGETS)),) | ||
| $(error AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101) |
There was a problem hiding this comment.
this is to turn any empty value into an error
so you'll be able to pickup misconfigurations where the default value in this file will be ignored because an empty string is being passed in
There was a problem hiding this comment.
Good catch, this is a very useful addition to catch regressions. Thanks!
There was a problem hiding this comment.
This now seems to fire on the CI for this PR https://github.com/mudler/LocalAI/actions/runs/25190720180/job/73859791925
d64 = arm64 ']'
0.061 + '[' hipblas = hipblas ']'
0.061 + cd /LocalAI/backend/cpp/llama-cpp
0.061 + make llama-cpp-fallback
0.062 Makefile:38: *** AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101. Stop.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile.llama-cpp:258
--------------------
257 |
258 | >>> RUN <<'EOT' bash
259 | >>> set -euxo pipefail
260 | >>>
There was a problem hiding this comment.
Thanks, sorry, I missed the other workflow file that runs on branches, but now those runs seem to be hung
Edit: sorry here is a link https://github.com/mudler/LocalAI/actions/runs/25203597690/job/73899405891?pr=9626#logs
There was a problem hiding this comment.
Actually, i can just force push to trigger another run
There was a problem hiding this comment.
I've been retriggering manually too, but seems there are issues on the Ubuntu repositories and unrelated to the PR changes. This happens time to time, will retrigger it and make sure it gets in 👍
b9e3a02 to
8180fb4
Compare
399c1de wired amdgpu-targets through the backend_build workflow_call interface, intending the input's default value to cover matrix entries that don't specify targets. However, GitHub Actions only applies a workflow_call input default when the caller omits the input entirely. When backend.yml passes `amdgpu-targets: ${{ matrix.amdgpu-targets }}` and the matrix entry has no amdgpu-targets key, the expression evaluates to an empty string, which is treated as an explicit value — bypassing the default. The result is Docker receiving AMDGPU_TARGETS="" which in turn causes Make's ?= default to be skipped (since the variable is already set in the environment, even to empty), and cmake gets -DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an indeterminate target rather than the intended GPU list. Fix this at two levels: 1. backend.yml: use a || fallback in the expression so that an undefined matrix.amdgpu-targets never reaches the reusable workflow as an empty string. The target list is the canonical default and lives here. 2. backend_build.yml: remove the now-misleading default value from the input declaration. The default never fired due to the above bug, so keeping it implied a guarantee that didn't exist. 3. backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard after the ?= assignment so that if AMDGPU_TARGETS is empty (whether from environment or any future CI wiring mistake) the build fails immediately with a clear message rather than silently producing a binary compiled for an unknown GPU target. Assisted-by: Claude Code:claude-sonnet-4-6 Signed-off-by: Russell Sim <rsl@simopolis.xyz>
8180fb4 to
83de3c8
Compare
Reduces ~30% flakiness from transient network timeouts against Ubuntu mirrors and the git-core PPA. Also drops the redundant apt-get update between software-properties-common install and add-apt-repository. Assisted-by: Claude Sonnet 4.6 [claude-code] Signed-off-by: Russell Sim <rsl@simopolis.xyz>
b385b47 to
5eb75f0
Compare
TLDR; matrix valued caused a whitespace string to be passed through,
I'm validating your images by checking them with
The latest image is still broken :( The test above might seem janky, but seems to work quite well, my custom complied value shows the 2 gpus i target.
399c1de wired amdgpu-targets through the backend_build workflow_call interface, intending the input's default value to cover matrix entries that don't specify targets. However, GitHub Actions only applies a workflow_call input default when the caller omits the input entirely. When backend.yml passes
amdgpu-targets: ${{ matrix.amdgpu-targets }}and the matrix entry has no amdgpu-targets key, the expression evaluates to an empty string, which is treated as an explicit value — bypassing the default. The result is Docker receiving AMDGPU_TARGETS="" which in turn causes Make's ?= default to be skipped (since the variable is already set in the environment, even to empty), and cmake gets -DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an indeterminate target rather than the intended GPU list.Fix this at two levels:
backend.yml: use a || fallback in the expression so that an undefined matrix.amdgpu-targets never reaches the reusable workflow as an empty string. The target list is the canonical default and lives here.
backend_build.yml: remove the now-misleading default value from the input declaration. The default never fired due to the above bug, so keeping it implied a guarantee that didn't exist.
backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard after the ?= assignment so that if AMDGPU_TARGETS is empty (whether from environment or any future CI wiring mistake) the build fails immediately with a clear message rather than silently producing a binary compiled for an unknown GPU target.
Assisted-by: Claude Code:claude-sonnet-4-6
Signed commits