Skip to content

fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds#9626

Open
russell wants to merge 2 commits intomudler:masterfrom
russell:fix-default-amdgpu-variants
Open

fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds#9626
russell wants to merge 2 commits intomudler:masterfrom
russell:fix-default-amdgpu-variants

Conversation

@russell
Copy link
Copy Markdown
Contributor

@russell russell commented Apr 30, 2026

TLDR; matrix valued caused a whitespace string to be passed through,

I'm validating your images by checking them with

$ strings rocm-llama-cpp-development/llama-cpp-fallback | grep gfx | sort | uniq 
amdgcn-amd-amdhsa--gfx906
hipv4-amdgcn-amd-amdhsa--gfx906

The latest image is still broken :( The test above might seem janky, but seems to work quite well, my custom complied value shows the 2 gpus i target.

399c1de wired amdgpu-targets through the backend_build workflow_call interface, intending the input's default value to cover matrix entries that don't specify targets. However, GitHub Actions only applies a workflow_call input default when the caller omits the input entirely. When backend.yml passes amdgpu-targets: ${{ matrix.amdgpu-targets }} and the matrix entry has no amdgpu-targets key, the expression evaluates to an empty string, which is treated as an explicit value — bypassing the default. The result is Docker receiving AMDGPU_TARGETS="" which in turn causes Make's ?= default to be skipped (since the variable is already set in the environment, even to empty), and cmake gets -DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an indeterminate target rather than the intended GPU list.

Fix this at two levels:

  1. backend.yml: use a || fallback in the expression so that an undefined matrix.amdgpu-targets never reaches the reusable workflow as an empty string. The target list is the canonical default and lives here.

  2. backend_build.yml: remove the now-misleading default value from the input declaration. The default never fired due to the above bug, so keeping it implied a guarantee that didn't exist.

  3. backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard after the ?= assignment so that if AMDGPU_TARGETS is empty (whether from environment or any future CI wiring mistake) the build fails immediately with a clear message rather than silently producing a binary compiled for an unknown GPU target.

Assisted-by: Claude Code:claude-sonnet-4-6

Signed commits

  • Yes, I signed my commits.

export CC=$(ROCM_HOME)/llvm/bin/clang
AMDGPU_TARGETS?=gfx908,gfx90a,gfx942,gfx950,gfx1030,gfx1100,gfx1101,gfx1102,gfx1151,gfx1200,gfx1201
ifeq ($(strip $(AMDGPU_TARGETS)),)
$(error AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is to turn any empty value into an error

so you'll be able to pickup misconfigurations where the default value in this file will be ignored because an empty string is being passed in

Copy link
Copy Markdown
Owner

@mudler mudler May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, this is a very useful addition to catch regressions. Thanks!

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now seems to fire on the CI for this PR https://github.com/mudler/LocalAI/actions/runs/25190720180/job/73859791925

d64 = arm64 ']'
0.061 + '[' hipblas = hipblas ']'
0.061 + cd /LocalAI/backend/cpp/llama-cpp
0.061 + make llama-cpp-fallback
0.062 Makefile:38: *** AMDGPU_TARGETS is empty — set it to a comma-separated list of gfx targets e.g. gfx1100,gfx1101.  Stop.
------
WARNING: No output specified with docker-container driver. Build result will only remain in the build cache. To push result image into registry use --push or to load image into docker use --load
Dockerfile.llama-cpp:258
--------------------
 257 |     
 258 | >>> RUN <<'EOT' bash
 259 | >>> set -euxo pipefail
 260 | >>> 

Copy link
Copy Markdown
Contributor Author

@russell russell May 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, sorry, I missed the other workflow file that runs on branches, but now those runs seem to be hung

Edit: sorry here is a link https://github.com/mudler/LocalAI/actions/runs/25203597690/job/73899405891?pr=9626#logs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, i can just force push to trigger another run

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been retriggering manually too, but seems there are issues on the Ubuntu repositories and unrelated to the PR changes. This happens time to time, will retrigger it and make sure it gets in 👍

@russell russell force-pushed the fix-default-amdgpu-variants branch from b9e3a02 to 8180fb4 Compare May 1, 2026 05:26
399c1de wired amdgpu-targets through the backend_build workflow_call
interface, intending the input's default value to cover matrix entries
that don't specify targets. However, GitHub Actions only applies a
workflow_call input default when the caller omits the input entirely.
When backend.yml passes `amdgpu-targets: ${{ matrix.amdgpu-targets }}`
and the matrix entry has no amdgpu-targets key, the expression evaluates
to an empty string, which is treated as an explicit value — bypassing
the default. The result is Docker receiving AMDGPU_TARGETS="" which in
turn causes Make's ?= default to be skipped (since the variable is
already set in the environment, even to empty), and cmake gets
-DAMDGPU_TARGETS= with no targets, so the HIP backend compiles for an
indeterminate target rather than the intended GPU list.

Fix this at two levels:

1. backend.yml: use a || fallback in the expression so that an undefined
   matrix.amdgpu-targets never reaches the reusable workflow as an empty
   string. The target list is the canonical default and lives here.

2. backend_build.yml: remove the now-misleading default value from the
   input declaration. The default never fired due to the above bug, so
   keeping it implied a guarantee that didn't exist.

3. backend/cpp/llama-cpp/Makefile: add an explicit $(error ...) guard
   after the ?= assignment so that if AMDGPU_TARGETS is empty (whether
   from environment or any future CI wiring mistake) the build fails
   immediately with a clear message rather than silently producing a
   binary compiled for an unknown GPU target.

Assisted-by: Claude Code:claude-sonnet-4-6
Signed-off-by: Russell Sim <rsl@simopolis.xyz>
@russell russell force-pushed the fix-default-amdgpu-variants branch from 8180fb4 to 83de3c8 Compare May 1, 2026 12:56
Reduces ~30% flakiness from transient network timeouts against Ubuntu
mirrors and the git-core PPA. Also drops the redundant apt-get update
between software-properties-common install and add-apt-repository.

Assisted-by: Claude Sonnet 4.6 [claude-code]
Signed-off-by: Russell Sim <rsl@simopolis.xyz>
@russell russell force-pushed the fix-default-amdgpu-variants branch from b385b47 to 5eb75f0 Compare May 1, 2026 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants