Skip to content

Add config selection method when more than ipxeBootConfigList/httpBootConfigList returned#312

Open
atd9876 wants to merge 6 commits intomainfrom
272-improve-the-handling-of-bootconfiguration-when-multiple-are-present
Open

Add config selection method when more than ipxeBootConfigList/httpBootConfigList returned#312
atd9876 wants to merge 6 commits intomainfrom
272-improve-the-handling-of-bootconfiguration-when-multiple-are-present

Conversation

@atd9876
Copy link
Copy Markdown
Contributor

@atd9876 atd9876 commented Apr 20, 2026

Proposed Changes

  • Add Config selector for when more than one ipxeBootConfigList/httpBootConfigList returned

Fixes #
272

Summary by CodeRabbit

  • New Features

    • Generic boot-config selection that deterministically picks the correct config when multiple candidates exist, preferring maintenance configs during maintenance mode.
  • Bug Fixes

    • Improved error handling: handlers log clear errors and return 500 when selection fails; status updates now reflect the actual selected config instead of assuming a first item.
  • Tests

    • Expanded unit tests and test setup to cover selection logic, owner-resolution, orphaned candidates, and related error cases.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Replaced implicit Items[0] usage with a generic selectBootConfig that resolves owning ServerBootConfiguration and target Server to choose between maintenance vs workload configs; handlers now return 500 on selection failure and update status with the selected config object.

Changes

Cohort / File(s) Summary
Boot server handlers
server/bootserver.go
Replaced direct Items[0] usage with selectBootConfig(...) in IPXE/HTTP/ignition handlers; added error handling returning 500 on selection failure; pass the selected config into status updates.
Boot config selection logic
server/config_selector.go
New generic selection utilities: selectBootConfig[T], preferredBootConfigIndex, resolveServer, ownerSBCName, and toPointers — resolve SBC owners, discard orphaned configs, and prefer maintenance vs workload configs.
Test setup
server/bootserver_suit_test.go
Added newTestClient(...) helper creating a fake controller-runtime client with corev1, bootv1alpha1, and metalv1alpha1 schemes; register metal types in BeforeSuite.
Unit tests
server/bootserver_test.go
Added ConfigSelector test suite covering single-item passthrough, maintenance vs workload preference, orphaned-candidate handling, and SBC/Server resolution error cases.

Sequence Diagram(s)

sequenceDiagram
    participant Handler as Handler (bootserver.go)
    participant Selector as selectBootConfig
    participant K8s as K8s Client
    participant SBC as ServerBootConfiguration
    participant Server as metalv1alpha1.Server

    Handler->>Selector: selectBootConfig(ctx, client, log, configs)
    alt single item
        Selector-->>Handler: return config
    else multiple items
        Selector->>SBC: extract owner SBC names (from OwnerReferences)
        loop for each owner
            Selector->>K8s: Get SBC by name
            K8s-->>Selector: SBC or NotFound
            alt SBC found
                Selector->>K8s: Get Server referenced by SBC
                K8s-->>Selector: Server or NotFound
            end
        end
        Selector->>Server: inspect spec.bootConfigurationRef / spec.maintenanceBootConfigurationRef
        alt maintenance ref matches
            Selector-->>Handler: return maintenance-backed config
        else workload ref matches
            Selector-->>Handler: return workload-backed config
        else no valid match
            Selector-->>Handler: return error (orphaned/unresolvable)
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

enhancement, size/L

Suggested reviewers

  • hardikdr
  • defo89
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 38.46% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding a config selection method to handle multiple boot config list returns, which is the core purpose of the PR.
Description check ✅ Passed The description matches the template structure and covers the main objective (adding a config selector for multiple ipxeBootConfigList/httpBootConfigList returns) and includes issue reference #272.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch 272-improve-the-handling-of-bootconfiguration-when-multiple-are-present

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
server/config_selector.go (2)

138-155: Distinguish NotFound from transient errors in resolveServer.

k8sClient.Get at Line 144 silently continues on any error, including network blips, permission errors, or API-server hiccups. That can hide real infrastructure problems behind the generic “none of the N boot configs have a resolvable ServerBootConfiguration owner” error. Only NotFound should trigger the “try next SBC” fallback; other errors should surface.

♻️ Proposed fix
 import (
 	"context"
 	"fmt"

 	"github.com/go-logr/logr"
 	bootv1alpha1 "github.com/ironcore-dev/boot-operator/api/v1alpha1"
 	metalv1alpha1 "github.com/ironcore-dev/metal-operator/api/v1alpha1"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/types"
 	"sigs.k8s.io/controller-runtime/pkg/client"
 )
 		sbc := &metalv1alpha1.ServerBootConfiguration{}
 		if err := k8sClient.Get(ctx, types.NamespacedName{Name: name, Namespace: namespace}, sbc); err != nil {
-			// This SBC might be an orphan that's already been deleted.
-			// Try the next one.
-			continue
+			if apierrors.IsNotFound(err) {
+				// This SBC might be an orphan that's already been deleted. Try the next one.
+				continue
+			}
+			return nil, fmt.Errorf("failed to get ServerBootConfiguration %q: %w", name, err)
 		}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config_selector.go` around lines 138 - 155, In resolveServer, don't
swallow all errors from the first k8sClient.Get call for
ServerBootConfiguration: when the Get for sbc returns an error, check
apierrors.IsNotFound(err) and only continue to the next sbc in that case; for
any other error (network/API/perm issues) return that error wrapped with context
so it surfaces instead of being treated as an orphan; keep the existing behavior
for the second Get that looks up the Server (returning a wrapped error on
failure).

21-54: Consider consolidating selectIPXEBootConfig and selectHTTPBootConfig via generics.

The two functions are byte-for-byte identical except for the slice element type. Since Go 1.18+ generics are available, a single helper parameterized over client.Object with GetNamespace()/GetOwnerReferences() accessors (both IPXEBootConfig and HTTPBootConfig already implement them) would remove the duplication and keep future tweaks in one place. Optional — the current code is clear enough to ship.

Also minor: items[0].Namespace at Line 30/49 will panic if len(items) == 0. Current callers guard with an empty-list check, but a defensive if len(items) == 0 { return zero, nil-or-err } would make these helpers safe regardless of the caller.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config_selector.go` around lines 21 - 54, Consolidate the nearly
identical selectIPXEBootConfig and selectHTTPBootConfig into one generic
function (e.g., selectBootConfig[T client.Object]) parameterized over the
concrete types IPXEBootConfig and HTTPBootConfig; inside the generic use T's
GetNamespace and GetOwnerReferences (already implemented) to build sbcNames via
ownerSBCName and call preferredBootConfigIndex, then return items[idx]; update
the two original functions to call the generic wrapper. Also add a defensive
check for len(items) == 0 at the top of the generic (return the zero value and
an appropriate error) to avoid panics when accessing items[0]. Ensure references
to ownerSBCName and preferredBootConfigIndex remain unchanged.
server/bootserver.go (1)

139-144: Selector integration looks good.

All four call sites consistently replace the fragile list.Items[0] assumption with selectIPXEBootConfig/selectHTTPBootConfig, log the failure with enough context, and return 500. Symmetric and easy to follow.

One optional thought: for the "all orphaned / no recognized config" failure mode, 404 Not Found arguably describes the situation better than 500 (from the client’s perspective no valid boot config exists for this UUID/IP). Not a blocker — 500 is defensible since it signals a server-side data inconsistency — but worth considering if you want clients/clusters to treat it as a non-retriable lookup miss rather than a transient server error.

Also applies to: 216-221, 328-333, 483-488

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/bootserver.go` around lines 139 - 144, The handlers calling
selectIPXEBootConfig and selectHTTPBootConfig should distinguish "no valid
config found" from other failures and return 404 for the former; update the
selector functions (selectIPXEBootConfig/selectHTTPBootConfig) to return a
sentinel error (e.g., ErrNoConfigFound) or wrap errors so they can be detected
with errors.Is, then in the call sites in bootserver.go (the blocks currently
logging "Failed to select IPXEBootConfig"/"Failed to select HTTPBootConfig")
check if errors.Is(err, ErrNoConfigFound) and respond with http.StatusNotFound,
otherwise log and respond with http.StatusInternalServerError as before.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/config_selector_test.go`:
- Around line 187-192: The test description is inaccurate: the It block
currently calls preferredBootConfigIndex with sbcNames := []string{"", ""} which
triggers resolveServer and the "resolvable ServerBootConfiguration owner" error,
but does not exercise selectIPXEBootConfig/ownerSBCName on
IPXEBootConfig/HTTPBootConfig objects; update the test by renaming the It
description to reflect the actual behavior (for example: It("returns an error
when all candidate SBC names are empty", ...) ), or alternatively extend the
test to construct IPXEBootConfig/HTTPBootConfig items with empty
ServerBootConfiguration owner refs and pass them through selectIPXEBootConfig so
ownerSBCName is exercised; refer to preferredBootConfigIndex, resolveServer,
selectIPXEBootConfig, and ownerSBCName when making the change.
- Around line 21-46: This test file creates a standalone Ginkgo suite and
duplicate helpers; instead move the tests and helpers into the existing
bootserver_suit_test.go suite so they run under the shared RunSpecs setup:
remove the top-level Describe in config_selector_test.go and relocate its tests
into an existing *_test.go in the same package that already uses RunSpecs, reuse
the existing newTestClient helper (or merge this newTestClient and ownerRef into
the shared helpers in the suite) and drop any standalone scheme/setup
duplication; ensure tests reference the existing shared context/log and keep
function names newTestClient and ownerRef unchanged so callers remain valid.

---

Nitpick comments:
In `@server/bootserver.go`:
- Around line 139-144: The handlers calling selectIPXEBootConfig and
selectHTTPBootConfig should distinguish "no valid config found" from other
failures and return 404 for the former; update the selector functions
(selectIPXEBootConfig/selectHTTPBootConfig) to return a sentinel error (e.g.,
ErrNoConfigFound) or wrap errors so they can be detected with errors.Is, then in
the call sites in bootserver.go (the blocks currently logging "Failed to select
IPXEBootConfig"/"Failed to select HTTPBootConfig") check if errors.Is(err,
ErrNoConfigFound) and respond with http.StatusNotFound, otherwise log and
respond with http.StatusInternalServerError as before.

In `@server/config_selector.go`:
- Around line 138-155: In resolveServer, don't swallow all errors from the first
k8sClient.Get call for ServerBootConfiguration: when the Get for sbc returns an
error, check apierrors.IsNotFound(err) and only continue to the next sbc in that
case; for any other error (network/API/perm issues) return that error wrapped
with context so it surfaces instead of being treated as an orphan; keep the
existing behavior for the second Get that looks up the Server (returning a
wrapped error on failure).
- Around line 21-54: Consolidate the nearly identical selectIPXEBootConfig and
selectHTTPBootConfig into one generic function (e.g., selectBootConfig[T
client.Object]) parameterized over the concrete types IPXEBootConfig and
HTTPBootConfig; inside the generic use T's GetNamespace and GetOwnerReferences
(already implemented) to build sbcNames via ownerSBCName and call
preferredBootConfigIndex, then return items[idx]; update the two original
functions to call the generic wrapper. Also add a defensive check for len(items)
== 0 at the top of the generic (return the zero value and an appropriate error)
to avoid panics when accessing items[0]. Ensure references to ownerSBCName and
preferredBootConfigIndex remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 71c31639-6619-4339-bd4f-4ab0254c2b01

📥 Commits

Reviewing files that changed from the base of the PR and between 7b346d6 and 7fdd43c.

📒 Files selected for processing (3)
  • server/bootserver.go
  • server/config_selector.go
  • server/config_selector_test.go

Comment thread server/config_selector_test.go Outdated
Comment thread server/config_selector_test.go Outdated
@atd9876
Copy link
Copy Markdown
Contributor Author

atd9876 commented Apr 20, 2026

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/bootserver_test.go`:
- Around line 128-147: Add focused tests for the multi-item branches of
selectIPXEBootConfig and selectHTTPBootConfig: create test cases with multiple
IPXEBootConfig/HTTPBootConfig objects (at least two) where one or more have
OwnerReferences pointing to a ServerBootConfiguration and verify the selector
returns the item associated with the ServerBootConfiguration (and returns an
error or fallback when no matching owner-ref exists). Use the existing test
helpers (ctx, newTestClient(), log) and call
selectIPXEBootConfig/selectHTTPBootConfig directly; also add a case that
exercises preferredBootConfigIndex/owner-ref extraction to ensure the correct
item is chosen when more than one candidate exists.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9d668e2-7f26-474e-82b6-ae2474910a4c

📥 Commits

Reviewing files that changed from the base of the PR and between 7fdd43c and 008abda.

📒 Files selected for processing (2)
  • server/bootserver_suit_test.go
  • server/bootserver_test.go

Comment thread server/bootserver_test.go Outdated
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/config_selector.go`:
- Around line 31-35: The selector currently drops namespace context by using
ownerSBCName(items[i].GetOwnerReferences()) and then calling
preferredBootConfigIndex(ctx, k8sClient, log, items[0].GetNamespace(),
sbcNames); change this to collect and pass per-item namespace/name pairs (e.g.,
slice of structs {Namespace, Name}) instead of a flat name list, call
preferredBootConfigIndex with each item's namespace (or accept the slice so it
can match by namespace), and update the matching logic that checks
Server.Spec.BootConfigurationRef and Server.Spec.MaintenanceBootConfigurationRef
to compare both Namespace and Name (honor the Namespace field when present) so
lookups and comparisons always include namespace scoping (update the code paths
around ownerSBCName, preferredBootConfigIndex, and the matching branches that
previously assumed items[0].GetNamespace()).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4415be92-7707-42a8-b37c-7796f0e47ff6

📥 Commits

Reviewing files that changed from the base of the PR and between 008abda and 8d287a2.

📒 Files selected for processing (3)
  • server/bootserver.go
  • server/bootserver_test.go
  • server/config_selector.go

Comment thread server/config_selector.go Outdated
@github-actions github-actions Bot added size/XL and removed size/L labels Apr 21, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/config_selector.go`:
- Around line 168-171: resolveServer currently aborts selection when
k8sClient.Get fails to fetch the Server referenced by an SBC; change it to treat
a NotFound error as a stale reference and skip that SBC (log/debug and continue
iterating owners) instead of returning an error, while preserving behavior for
other errors (return on non-NotFound). Update the code around the k8sClient.Get
call in resolveServer to check apierrors.IsNotFound(err) and continue, and add a
focused unit test (e.g., TestResolveServer_SkipsStaleServerRef) that sets up two
owners where the first SBC's Server is deleted and the second resolves
successfully.
- Around line 65-69: The ownerSBCName function currently matches owner
references by Kind only, which can incorrectly match resources from other API
groups; update ownerSBCName to require both metav1.OwnerReference.Kind ==
"ServerBootConfiguration" and metav1.OwnerReference.APIVersion == the
ServerBootConfiguration API version used in this project (use the existing
constant or group/version value your types define) before returning ref.Name so
identity uses API version + kind + name; ensure you reference the
metav1.OwnerReference.APIVersion field in the ownerSBCName function when
implementing this check.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b6e8294a-0898-4b8c-8674-8a202d565cb4

📥 Commits

Reviewing files that changed from the base of the PR and between 8d287a2 and 5698192.

📒 Files selected for processing (2)
  • server/bootserver_test.go
  • server/config_selector.go

Comment thread server/config_selector.go
Comment thread server/config_selector.go
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
server/config_selector.go (1)

113-115: Nit: non-ASCII em-dash in error message.

The error string uses (U+2014). Error messages are generally easier to grep/inspect when kept ASCII. Consider a plain - or :.

Proposed tweak
-		return 0, fmt.Errorf("all %d boot configs are orphaned — none match Server %q boot configuration refs", len(owners), server.Name)
+		return 0, fmt.Errorf("all %d boot configs are orphaned: none match Server %q boot configuration refs", len(owners), server.Name)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/config_selector.go` around lines 113 - 115, The error message returned
when len(validIndices) == 0 uses a non-ASCII em-dash (—); update the fmt.Errorf
call that builds the error (the one referencing validIndices, owners length and
server.Name) to use an ASCII separator (e.g., "-" or ":") instead of the em-dash
so the string is plain ASCII and easier to grep/inspect.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/config_selector.go`:
- Around line 153-180: resolveServer currently returns the first Server resolved
from owners which can let a stale cross-server SBC determine the anchor; modify
resolveServer to tally ServerRef.Name (and namespace if relevant) from all
successfully fetched ServerBootConfiguration objects, choose the Server that is
referenced by the majority (or at least by N>=2 owners) and only return that
Server; if no Server meets the threshold, return an error so
preferredBootConfigIndex (and subsequent orphan filtering) cannot treat others
as orphans against a possibly-stale single-owner choice. Ensure you update the
lookup logic around resolveServer and preserve error handling for k8sClient.Get
failures.

---

Nitpick comments:
In `@server/config_selector.go`:
- Around line 113-115: The error message returned when len(validIndices) == 0
uses a non-ASCII em-dash (—); update the fmt.Errorf call that builds the error
(the one referencing validIndices, owners length and server.Name) to use an
ASCII separator (e.g., "-" or ":") instead of the em-dash so the string is plain
ASCII and easier to grep/inspect.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 68507703-3b55-4482-866e-116c1b84cc6c

📥 Commits

Reviewing files that changed from the base of the PR and between 5698192 and 281d5b2.

📒 Files selected for processing (2)
  • server/bootserver_test.go
  • server/config_selector.go
✅ Files skipped from review due to trivial changes (1)
  • server/bootserver_test.go

Comment thread server/config_selector.go
@atd9876 atd9876 requested a review from hardikdr April 21, 2026 11:17
Copy link
Copy Markdown
Member

@hardikdr hardikdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick first pass, looks good, just a small comment inline.
Thanks for taking care of it, @atd9876

Comment thread server/config_selector.go
}

// Multiple valid configs: prefer the maintenance one if the server is in maintenance.
if server.Spec.MaintenanceBootConfigurationRef != nil {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should we also include server.Status.State == "Maintenance" as a check here? It feels like a better indicator of the server being in maintenance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Improve the handling of bootConfiguration when multiple are present

2 participants