refactor(monitoring): delete per-SND ServiceMonitor wrapper #257
Merged
Removes the per-SeiNodeDeployment ServiceMonitor reconciler. The
platform-owned cluster-wide PodMonitor (platform PR #577) covers
every seid pod via the `sei.io/role` label, and the per-SND wrapper
has been silently broken anyway (selecting an external Service that
strips the `metrics` port — confirmed in cross-review).
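The label-based coverage can be sketched minimally. This is an illustrative stand-in for the platform PodMonitor's selector, not its actual manifest; the function name and selector logic are hypothetical, only the `sei.io/role` label key comes from the PR:

```go
package main

import "fmt"

// matchesPodMonitor sketches how a cluster-wide PodMonitor whose selector
// keys on the sei.io/role label would cover a seid pod: any pod carrying a
// non-empty sei.io/role label is a scrape target, regardless of which
// SeiNodeDeployment owns it.
func matchesPodMonitor(podLabels map[string]string) bool {
	role, ok := podLabels["sei.io/role"]
	return ok && role != ""
}

func main() {
	fmt.Println(matchesPodMonitor(map[string]string{"app": "seid", "sei.io/role": "rpc"}))
	fmt.Println(matchesPodMonitor(map[string]string{"app": "other"}))
}
```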
File deletions:
- `internal/controller/nodedeployment/monitoring.go` (153 LOC reconciler)
- `internal/controller/nodedeployment/monitoring_test.go` (191 LOC tests)
- `api/v1alpha1/monitoring_types.go` (`MonitoringConfig` + `ServiceMonitorConfig`)
Field/constant removals:
- `Monitoring *MonitoringConfig` on `SeiNodeDeploymentSpec`
- `ConditionServiceMonitorReady` constant
- `+kubebuilder:rbac` marker for `monitoring.coreos.com/servicemonitors`
- `reconcileMonitoring(...)` call site in `controller.go`
- `serviceMonitorGVK()` from the orphan-GVK slice at `networking.go:404`
- `monitoring:` block from the sample `pacific-1-rpc-group.yaml`
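For readers unfamiliar with the orphan-GVK slice, here is a self-contained sketch of its shape (a local struct stands in for apimachinery's `schema.GroupVersionKind`, and the specific entries are illustrative, not the repo's actual list):

```go
package main

import "fmt"

// GroupVersionKind mirrors the shape of k8s.io/apimachinery's
// schema.GroupVersionKind, defined locally to keep this sketch self-contained.
type GroupVersionKind struct {
	Group, Version, Kind string
}

// orphanGVKs sketches the slice the controller iterates when garbage-collecting
// owned objects. This PR removes the monitoring.coreos.com/v1 ServiceMonitor
// entry from that list.
func orphanGVKs() []GroupVersionKind {
	return []GroupVersionKind{
		{Group: "", Version: "v1", Kind: "Service"},
		{Group: "gateway.networking.k8s.io", Version: "v1", Kind: "HTTPRoute"},
		// Removed by this PR:
		// {Group: "monitoring.coreos.com", Version: "v1", Kind: "ServiceMonitor"},
	}
}

func main() {
	for _, g := range orphanGVKs() {
		fmt.Printf("%s/%s %s\n", g.Group, g.Version, g.Kind)
	}
}
```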
Regenerated:
- `config/crd/sei.io_seinodedeployments.yaml` (drops monitoring schema block)
- `manifests/sei.io_seinodedeployments.yaml` (same)
- `config/rbac/role.yaml` + `manifests/role.yaml` (drops servicemonitors rule)
- `api/v1alpha1/zz_generated.deepcopy.go` (drops MonitoringConfig DeepCopy)
Net: -499 LOC, full test suite still green.
Post-deploy aftercare: one-shot sweep to clean orphan ServiceMonitor
objects that the deleted reconciler will no longer manage:
kubectl delete servicemonitor -n pacific-1 \
archive-0 archive-1 archive-2 node-0 shadow-giga syncer-0
Platform manifest cleanup landed first in sei-protocol/platform#579.
Final correctness sweep (cross-reviewed by k8s + platform experts) caught 6 stale references the initial deletion left behind:
- `DeletionPolicy` doc in `seinodedeployment_types.go` said "networking/monitoring resources" — now just networking
- `controller.go` Reconcile comment said "during networking/monitoring reconciliation" — drop /monitoring
- `labels.go` groupSelector doc said "Service, HTTPRoutes, and ServiceMonitor" — SM is gone, just Service and HTTPRoutes
- `networking.go` `externalPortsForMode` rationale referenced the deleted wrapper's double-scrape bug — rewritten to reference the platform PodMonitor as the current scrape mechanism
- `noderesource.go` `deriveRole` doc claimed it mirrors `nodedeployment.deriveComponent` (deleted) — rewritten to describe its actual role: stamp `sei.io/role` onto the pod template, lifted by the platform PodMonitor relabeling
- `pacific-1-rpc-group.yaml` sample header said "and Prometheus monitoring" — dropped; the SND no longer provisions any

CRD/RBAC regen captures the type-doc change.
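The `deriveRole` behavior described above can be sketched as a label-stamping step. This is a hedged sketch under stated assumptions: the function name, signature, and role value here are illustrative, not the repo's actual `noderesource.go` code; only the `sei.io/role` label key is from the PR:

```go
package main

import "fmt"

// stampRoleLabel sketches what the deriveRole path effectively does: set the
// sei.io/role label on the pod template's labels so the platform-owned
// PodMonitor relabeling can lift it into the scrape target's metadata.
func stampRoleLabel(labels map[string]string, role string) map[string]string {
	if labels == nil {
		labels = map[string]string{}
	}
	labels["sei.io/role"] = role
	return labels
}

func main() {
	l := stampRoleLabel(map[string]string{"app": "seid"}, "rpc")
	fmt.Println(l["sei.io/role"])
}
```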
What
Removes the per-SeiNodeDeployment ServiceMonitor reconciler entirely. The platform-owned cluster-wide PodMonitor (platform PR #577) covers every seid pod via the `sei.io/role` label, and the per-SND wrapper was silently broken anyway (selecting an external Service that strips the `metrics` port — confirmed in cross-review by k8s + observability experts).

Platform-side manifest cleanup already merged: sei-protocol/platform#579. This is the controller-side companion.
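Why the broken wrapper matched nothing can be shown with a small sketch of the port-stripping behavior. This is illustrative only: the function name and port names are hypothetical, standing in for `externalPortsForMode`'s effect of exposing a whitelist of ports on the external Service that excludes `metrics`:

```go
package main

import "fmt"

// externalPorts sketches why the deleted ServiceMonitor wrapper was silently
// broken: the external Service it selected exposes only a whitelist of ports,
// and "metrics" is not on it, so the ServiceMonitor's metrics endpoint had
// nothing to scrape.
func externalPorts(all []string) []string {
	allowed := map[string]bool{"rpc": true, "grpc": true, "rest": true}
	var out []string
	for _, p := range all {
		if allowed[p] {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	fmt.Println(externalPorts([]string{"rpc", "grpc", "rest", "metrics"}))
}
```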
Net change: −499 LOC, 12 files
Deleted files
- `internal/controller/nodedeployment/monitoring.go` (153 LOC reconciler)
- `internal/controller/nodedeployment/monitoring_test.go` (191 LOC tests)
- `api/v1alpha1/monitoring_types.go` (`MonitoringConfig` + `ServiceMonitorConfig`)

Field/constant/call-site removals
- `Monitoring *MonitoringConfig` field on `SeiNodeDeploymentSpec`
- `ConditionServiceMonitorReady` constant
- `+kubebuilder:rbac` marker for `monitoring.coreos.com/servicemonitors`
- `reconcileMonitoring(...)` call site in `controller.go`
- `serviceMonitorGVK()` from the orphan-GVK slice at `networking.go:404`
- `monitoring:` block from the sample `pacific-1-rpc-group.yaml`

Regenerated (`make generate manifests`)
- `config/crd/sei.io_seinodedeployments.yaml` — drops monitoring schema block
- `manifests/sei.io_seinodedeployments.yaml` — same
- `config/rbac/role.yaml` + `manifests/role.yaml` — drops servicemonitors RBAC rule
- `api/v1alpha1/zz_generated.deepcopy.go` — drops `MonitoringConfig`/`ServiceMonitorConfig` DeepCopy

Post-deploy aftercare
The 6 currently-live ServiceMonitor objects in the `pacific-1` namespace will become orphans once this controller deploys (no reconcile logic deletes them). One-shot kubectl sweep:

```
kubectl delete servicemonitor -n pacific-1 \
  archive-0 archive-1 archive-2 node-0 shadow-giga syncer-0
```

Both K8s and Platform experts vetoed adding in-controller cleanup logic — it would be 20 LOC added and immediately deleted next release.
Test plan
- `go build ./...` — clean
- `go test ./...` — full suite passes
- `make generate manifests` — CRDs + RBAC regenerated cleanly
- `count(up{sei_role!=""}) by (chain_id, sei_role)` still returns the same K8s scrape targets (PodMonitor unchanged)

Cross-review
Both [k8s-specialist] and [platform-engineer] reviewed the deletion scope + sequencing. Key findings:
- No `PreserveUnknownFields` markers, so the API server prunes the dropped field silently from existing SNDs after the CRD update
- `networking.go:404` was the only hidden call site