fix: support name look-up on InstanceExists() and InstanceShutdown() if no providerID set#245
Conversation
sudomateo
left a comment
There was a problem hiding this comment.
Left some minor suggestions. Overall this looks great. I was a bit on the fence about returning true, nil in InstanceShutdown when the instance did not exist but I think that's the correct move to let the CCM stop worrying about the node.
| if errors.Is(err, oxide.ErrObjectNotFound) { | ||
| return true, nil | ||
| } | ||
| return false, err |
There was a problem hiding this comment.
Either here or in getInstance we should decorate the error with the instance ID we tried to fetch. I don't remember if the outer Kubernetes calling code has that information already.
There was a problem hiding this comment.
getInstance returns what we try to look up in the wrapped error. If it can't parse the provider ID, it's printed. If it uses the fallback path when the providerID is not set, it wraps the error returned from the SDK. The wrapped SDK error is admittedly ugly, so will need to figure out how to format that better into a structured line...
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd E0529 21:35:34.515314 1 node_controller.go:288] Error getting instance metadata for node addresses: failed viewing oxide instance: GET https://oxide<blah>/v1/instances/46221934-d503-43ea-825f-730ada60d85c │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd ----------- RESPONSE ----------- │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd Status: 404 ObjectNotFound │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd Message: not found: instance with id "46221934-d503-43ea-825f-730ada60d85c" │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd RequestID: 9163f53f-95b7-4c6b-8d61-538df4c4c408 │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd ------- RESPONSE HEADERS ------- │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd Content-Type: [application/json] │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd X-Request-Id: [9163f53f-95b7-4c6b-8d61-538df4c4c408] │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd Date: [Fri, 29 May 2026 21:35:34 GMT] │
│ oxide-ccm-oxide-cloud-controller-manager-776766d446-2zmhd Content-Length: [177]
There was a problem hiding this comment.
It's coming from oxide.go here: https://github.com/oxidecomputer/oxide.go/blob/93d8204aae4d37f5c56c90ea94764f5d4b0cce3a/oxide/errors.go#L96-L133
We have RFD-614 to discuss how we'd like to improve custom errors but we've stalled a bit on finishing it since we've implemented some of the stuff discussed there already. Ideally we'd pass the error to some structured logging package and it calls some structured printing method instead of Error() string. Either way that's another discussion for oxide.go.
Changes
InstanceExists()andInstanceShutdown().Why?
There is a comment on the interface of InstanceV2 saying providerID and name should be supported.
Apart from the interface comment, the edge case where a name look-up helps:
InstanceMetadata()andInstanceExists()return errors since there is no providerID set.Ideally, we should clean-up the node in this case since it in-fact does not exist.
NOTE: The name look-up only works if the node is registered with the
Nameof the Oxide instance. This should be pretty standard though. It's more standard to use thehostnameas the node name of a K8s node, which in most cases will be the Oxide instance name. We could match based onhostname, but the query would be much more expensive since we'd have to resort to a List rather than a direct Fetch (View).Testing
v0.6.0 the node will continue to exist forever and CCM logs show: