
postgres database controllers#1760

Draft
mploski wants to merge 48 commits into develop from
feature/database-controllers

Conversation


@mploski mploski commented Mar 9, 2026

Description

Introduces the MVP PostgreSQL controller stack to the Splunk Operator. Rather than
implementing PostgreSQL cluster management from scratch, this PR builds a Kubernetes-native
wrapper on top of CloudNativePG (CNPG) — a production-grade
PostgreSQL operator. The Splunk Operator layer adds three new CRDs and two new controllers
that translate high-level declarative intent (cluster classes, logical databases, managed
roles) into CNPG primitives, while handling the Splunk-specific lifecycle concerns:
credential provisioning, connection metadata publishing, and multi-phase status tracking.


Key Changes

api/v4/postgresclusterclass_types.go (new)

Defines the Go API schema for PostgresClusterClass — a cluster-scoped CRD that
standardises PostgreSQL cluster configuration (instance count, storage, Postgres version,
resource limits, pg_hba rules, connection pooler) across environments. Operators define
dev and prod classes; teams reference them by name.
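For illustration, a dev class might look like the following sketch. The field names here are assumptions modeled on the description above, not the authoritative schema; see api/v4/postgresclusterclass_types.go and the files under config/samples/ for the real fields.

```yaml
# Hypothetical sketch of a cluster-scoped dev class; field names are
# illustrative only, not the authoritative schema.
apiVersion: enterprise.splunk.com/v4
kind: PostgresClusterClass
metadata:
  name: dev
spec:
  instances: 1          # a single instance is enough for development
  postgresVersion: "16"
  storage: 10Gi
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
```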

api/v4/postgrescluster_types.go (new)

Defines the Go API schema for PostgresCluster — the namespaced CRD teams use to request
a PostgreSQL cluster. Declares spec fields for merging class defaults with per-cluster
overrides, and status fields for provisioner reference, connection pooler state, and
managed role tracking.
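A team-facing cluster request might then reference a class by name and override selected defaults. Again a sketch with illustrative field names; the real schema lives in api/v4/postgrescluster_types.go.

```yaml
# Hypothetical sketch: per-cluster overrides are merged over the defaults
# inherited from the referenced PostgresClusterClass.
apiVersion: enterprise.splunk.com/v4
kind: PostgresCluster
metadata:
  name: postgresql-cluster-dev
  namespace: default
spec:
  classRef: dev     # inherit defaults from a PostgresClusterClass named "dev"
  instances: 3      # per-cluster override of the class default
```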

api/v4/postgresdatabase_types.go (new)

Defines the Go API schema for PostgresDatabase — the CRD that declares one or more
logical databases on a PostgresCluster, including reclaim policy and the list of
database definitions. Status tracks per-database secret and ConfigMap references.
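A declaration covering the two databases exercised in the testing section below might look like this sketch (field names are illustrative, not the authoritative schema from api/v4/postgresdatabase_types.go):

```yaml
# Hypothetical sketch; field names are illustrative only.
apiVersion: enterprise.splunk.com/v4
kind: PostgresDatabase
metadata:
  name: splunk-databases
  namespace: default
spec:
  clusterRef: postgresql-cluster-dev
  databases:
    - name: kvstore
      deletionPolicy: Retain   # keep data when the CR is deleted
    - name: analytics
      deletionPolicy: Delete   # drop the database on CR deletion
```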

internal/controller/postgresoperator_common_types.go (new)

Shared constants, phase enums, condition types, and condition reasons used across both
controllers. Single source of truth for retry delays, finalizer names, endpoint suffixes,
and all status strings.

internal/controller/postgrescluster_controller.go (new)

Reconciles PostgresCluster CRs against CNPG:

  • Provisions a PostgreSQL cluster by translating the PostgresCluster spec into a CNPG
    cluster, merging any defaults inherited from the referenced PostgresClusterClass with
    per-cluster overrides
  • Manages the optional connection pooler (PgBouncer) lifecycle alongside the cluster,
    enabling high-throughput workloads to share a bounded pool of database connections
  • Handles safe deletion by cleaning up all dependent CNPG resources before removing the CR,
    preventing orphaned infrastructure
  • Exposes observable status conditions (ClusterReady, PoolerReady) so the
    PostgresDatabase controller and operators can wait on a confirmed healthy cluster before
    proceeding
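The condition-gating pattern described above can be sketched as follows, using a simplified stand-in for metav1.Condition rather than the real apimachinery type:

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition, kept minimal
// so the gating logic is easy to see.
type Condition struct {
	Type   string
	Status string // "True", "False", or "Unknown"
}

// isConditionTrue reports whether the named condition is present and True.
// The PostgresDatabase controller can use this kind of check to wait on
// ClusterReady before provisioning any database resources.
func isConditionTrue(conds []Condition, condType string) bool {
	for _, c := range conds {
		if c.Type == condType {
			return c.Status == "True"
		}
	}
	return false
}

func main() {
	status := []Condition{
		{Type: "ClusterReady", Status: "True"},
		{Type: "PoolerReady", Status: "False"},
	}
	fmt.Println(isConditionTrue(status, "ClusterReady")) // true
	fmt.Println(isConditionTrue(status, "PoolerReady"))  // false
}
```

In the actual controllers the equivalent check would go through meta.FindStatusCondition on the CR's status, but the gating decision is the same.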

internal/controller/postgresdatabase_controller.go (new)

Reconciles PostgresDatabase CRs against CNPG:

  • Ensures the target PostgresCluster is healthy before creating any database resources,
    so partial provisioning is never left in an inconsistent state
  • Provisions database roles (admin and read-write) by configuring them on the cluster —
    each role is scoped to its own database to enforce least-privilege access
  • Generates secure credentials for each role and stores them as Kubernetes Secrets,
    ready to be mounted by workloads that need database access
  • Publishes connection details (host, port, database name, usernames, and pooler endpoints
    when available) into a ConfigMap per database, giving consuming services a stable
    reference without hardcoding hostnames
  • Provisions the logical databases on the PostgreSQL cluster and respects a configurable
    deletion policy that controls whether the database is physically dropped when the CR is
    deleted
  • Exposes observable status conditions (ClusterReady, SecretsReady, ConfigMapsReady,
    UsersReady, DatabasesReady) so operators can track exactly which phase reconciliation
    is in at any point

cmd/main.go

Registers both new controllers with the manager.

config/rbac/

Nine new RBAC files: admin, editor, and viewer roles for PostgresCluster,
PostgresClusterClass, and PostgresDatabase.

config/samples/

Five sample CRs: dev and prod cluster classes, default and override cluster
instances, and a PostgresDatabase example.


Testing and Verification

Manual Testing

End-to-end flow verified by applying the full CR chain in order:

# Deploy Class
kubectl apply -f config/samples/enterprise_v4_postgresclusterclass_dev.yaml
# Deploy Postgres Cluster
kubectl apply -f config/samples/enterprise_v4_postgrescluster_default.yaml
# Deploy Postgres Databases
kubectl apply -f config/samples/enterprise_v4_postgresdatabase.yaml

1. Verify the PostgresCluster is ready

kubectl get postgrescluster postgresql-cluster-dev -n default

2. Verify CNPG provisioned the underlying cluster

kubectl get clusters.postgresql.cnpg.io postgresql-cluster-dev -n default

3. Verify the PostgresDatabase status and conditions

kubectl get postgresdatabase splunk-databases -n default
kubectl get postgresdatabase splunk-databases -n default \
  -o jsonpath='{.status.conditions}' | jq

4. Verify CNPG Database CRs were created for each declared database

kubectl get databases.postgresql.cnpg.io -n default

5. Inspect Secrets and ConfigMaps created per database

# Secrets: {postgresdatabase-name}-{db-name}-{role}
kubectl get secret splunk-databases-kvstore-admin -n default -o jsonpath='{.data}' | jq
kubectl get secret splunk-databases-kvstore-rw -n default -o jsonpath='{.data}' | jq

# ConfigMaps: {postgresdatabase-name}-{db-name}
kubectl get configmap splunk-databases-kvstore -n default -o jsonpath='{.data}' | jq
kubectl get configmap splunk-databases-analytics -n default -o jsonpath='{.data}' | jq

6. Connect to the database using credentials and endpoints from the ConfigMap and Secret

HOST=$(kubectl get configmap splunk-databases-kvstore -n default \
  -o jsonpath='{.data.rw-host}')
DB=$(kubectl get configmap splunk-databases-kvstore -n default \
  -o jsonpath='{.data.dbname}')
USER=$(kubectl get configmap splunk-databases-kvstore -n default \
  -o jsonpath='{.data.rw-user}')
PASS=$(kubectl get secret splunk-databases-kvstore-rw -n default \
  -o jsonpath='{.data.password}' | base64 -d)

psql "postgresql://$USER:$PASS@$HOST/$DB"
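A consuming service can do the same assembly in Go. This sketch assumes the ConfigMap keys shown above (rw-host, dbname, rw-user) and a password already decoded from the Secret; net/url takes care of escaping special characters in credentials.

```go
package main

import (
	"fmt"
	"net/url"
)

// dsn builds a PostgreSQL connection URL from the pieces published in the
// per-database ConfigMap and Secret. url.UserPassword percent-escapes any
// special characters in the credentials.
func dsn(user, pass, host, db string) string {
	u := url.URL{
		Scheme: "postgresql",
		User:   url.UserPassword(user, pass),
		Host:   host,
		Path:   "/" + db,
	}
	return u.String()
}

func main() {
	fmt.Println(dsn("kvstore_rw", "s3cret", "postgresql-cluster-dev-rw", "kvstore"))
	// postgresql://kvstore_rw:s3cret@postgresql-cluster-dev-rw/kvstore
}
```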

Additional scenarios verified:

  • pooler-rw-host and pooler-ro-host keys appear in ConfigMap only when the cluster
    has connection pooling enabled
  • Controller fully reconciles on first apply with no hang after finalizer write
  • Deleting a PostgresDatabase CR triggers clean-up respecting the deletionPolicy
    declared per database (Retain preserves data, Delete drops the database)

Automated Tests — Planned Next

Unit and integration tests covering the full reconciliation lifecycle, error paths, and
status condition transitions are scheduled as the next deliverable for this feature branch.

Related Issues

Jira tickets, GitHub issues, Support tickets...

PR Checklist

  • Code changes adhere to the project's coding standards.
  • Relevant unit and integration tests are included.
  • Documentation has been updated accordingly.
  • All tests pass locally.
  • The PR description follows the project's guidelines.

@mploski mploski marked this pull request as draft March 9, 2026 20:25

github-actions Bot commented Mar 9, 2026

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contribution License Agreement before we can accept your contribution. You can sign the CLA by just posting a Pull Request Comment with the exact sentence copied from below.


I have read the CLA Document and I hereby sign the CLA


1 out of 4 committers have signed the CLA.
@DmytroPI-dev
@mploski
@limak9182
@M4KIF
You can retrigger this bot by commenting recheck in this Pull Request

@mploski mploski force-pushed the feature/database-controllers branch 2 times, most recently from dbf7479 to 33a8d1d Compare March 9, 2026 20:48
@mploski mploski changed the title Feature/database controllers postgres database controllers Mar 10, 2026
if apierrors.IsNotFound(getPGClusterErr) {
logger.Info("PostgresCluster deleted, skipping reconciliation")
return ctrl.Result{}, nil
}
Collaborator

Please move the logic to pkg/postgresql or pkg/splunk/postgresql. We tried to avoid making changes in the controller code, as this code is mostly auto-generated; upgrades become hard when we have to regenerate it.

Collaborator Author

Logic moved; followed the pattern already used by the other controllers.

// PostgresDatabaseReconciler reconciles a PostgresDatabase object
type PostgresDatabaseReconciler struct {
client.Client
Scheme *runtime.Scheme
Collaborator

Add a Recorder record.EventRecorder field; it helps with generating events.

Collaborator Author

we will add it

For(&enterprisev4.PostgresDatabase{}).WithEventFilter(predicate.GenerationChangedPredicate{}).
Owns(&cnpgv1.Database{}).
Owns(&corev1.Secret{}).
Named("postgresdatabase").
Collaborator

Missing a setting for how many concurrent reconciliations you want, e.g.:
MaxConcurrentReconciles: enterpriseApi.TotalWorker

Collaborator Author

added

return ctrl.NewControllerManagedBy(mgr).
For(&enterprisev4.PostgresDatabase{}).WithEventFilter(predicate.GenerationChangedPredicate{}).
Owns(&cnpgv1.Database{}).
Owns(&corev1.Secret{}).
Collaborator

I am unsure whether your controller owns the Secret here. Check the other controllers for reference:

Watches(&corev1.Secret{},
			handler.EnqueueRequestForOwner(
				mgr.GetScheme(),
				mgr.GetRESTMapper(),
				&enterpriseApi.ClusterManager{},
			)).

also look into predicates

Collaborator Author

We leverage predicates now for filtering events out.

logger.Info("Finalizer removed, cleanup complete")
return nil
}

Collaborator

Missing metric data collection; check the other controllers. We might have to improve that code, but it explains what we are trying to do.

Collaborator Author

Will be working on this in the next sprint

return ctrl.NewControllerManagedBy(mgr).
For(&enterprisev4.PostgresCluster{}).
Owns(&cnpgv1.Cluster{}).
Owns(&cnpgv1.Pooler{}).
Collaborator

@vivekr-splunk vivekr-splunk Mar 10, 2026

I am unsure whether PostgresCluster owns the Pooler, ConfigMap, and Secret here for all the Poolers/ConfigMaps/Secrets. Check the example Watches in the other controllers.

Collaborator Author

Yes, PostgresCluster is the owner of the Pooler, ConfigMap, and Secrets. If I understand correctly, you use this pattern to control what we trigger:

		Watches(&corev1.Pod{},
			handler.EnqueueRequestForOwner(
				mgr.GetScheme(),
				mgr.GetRESTMapper(),
				&enterpriseApiV3.LicenseMaster{},
			)).

and it is equivalent to the Owns we used, isn't it?

postgresCluster := &enterprisev4.PostgresCluster{}

BeforeEach(func() {
By("creating the custom resource for the Kind PostgresCluster")
Collaborator

Add a test case where you do not own the PostgreSQL database, but another party also uses the PostgreSQL operator to create and manage databases. How will this controller respond?

Collaborator Author

@M4KIF, worth adding to the tests you are working on.

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ManagedRole represents a PostgreSQL role to be created and managed in the cluster.
Collaborator

Add annotations for maintenance-mode support, for cases where we do not want the operator to do anything while a customer or support engineer is debugging the PostgreSQL database/cluster.

Collaborator

Do you mean the <kind>.enterprise.splunk.com/paused annotation, or the maintenance_mode field in the IDX/SHC spec?

Collaborator Author

Could you please share a reference doc so I can understand this pattern, and what and how we should use for this mechanism?

logger.Info("PostgresCluster is being deleted, cleanup complete")
return ctrl.Result{}, nil
}

Collaborator

Add support for maintenance mode: if the user has set the annotation, return without running any logic in the reconciliation loop.

Collaborator Author

Any reference on how to achieve this?

Comment thread api/v4/postgrescluster_types.go Outdated
Comment thread api/v4/postgrescluster_types.go
Comment thread api/v4/postgresclusterclass_types.go
Instances *int32 `json:"instances,omitempty"`

// Storage is the size of persistent volume for each instance.
// Cannot be decreased after cluster creation (PostgreSQL limitation).
Collaborator

Could there be a validation for that?

Collaborator Author

It can, but it would require Kubernetes admission webhooks; we decided to postpone that for now, and we have a ticket for it. Do we already have a standard for using admission webhooks with SOK?


Comment thread api/v4/postgresclusterclass_types.go
// "restart" - tolerate brief downtime (suitable for development)
// "switchover" - minimal downtime via automated failover (production-grade)
//
// NOTE: When using "switchover", ensure clusterConfig.instances > 1.
Collaborator

Could there be a validation for that?

Collaborator Author

Yes, in an admission webhook. Work postponed for now, but we have a ticket for it on our board.

Comment thread api/v4/postgresdatabase_types.go
Comment thread bundle/manifests/enterprise.splunk.com_databaseclasses.yaml Outdated
Comment thread cmd/main.go Outdated
Comment thread cmd/main.go
Comment thread cmd/main.go Outdated
Comment thread config/crd/bases/enterprise.splunk.com_databaseclasses.yaml
Comment thread config/manifests/bases/splunk-operator.clusterserviceversion.yaml Outdated
Comment thread config/rbac/kustomization.yaml Outdated
Comment thread config/rbac/postgrescluster_admin_role.yaml Outdated
@DmytroPI-dev DmytroPI-dev force-pushed the feature/database-controllers branch from 2eb9368 to 34097e9 Compare April 17, 2026 15:19
M4KIF and others added 7 commits April 17, 2026 17:30
logging changed, pureness fix attempt

removed redundant sync at the end

incremental state building with limited redundancy

logging align

cluster unit adj

merge adjustments

event emmision placed back, fortified with tests

allign with requirements on state building

review and rebase changes

merge alignment

review changes
…gic-patch-cleanup

Cleanup introduced cluster smelly returns
…nk-operator into feature/database-controllers
8 participants