Phase 5: Dynamic Webhook Middleware Kubernetes Controller #4564

Draft
Sanskarzz wants to merge 1 commit into stacklok:main from Sanskarzz:dynamicwebhook5
Sanskarzz (Contributor) commented Apr 4, 2026

[WIP] Depends on the Phase 3 and Phase 4 PRs being merged first.

Summary

This PR implements the fifth phase of the dynamic webhook middleware configuration system (RFC THV-0017). It introduces Kubernetes custom resource definitions (CRDs), the controllers that reconcile them, and their integration into the core MCPServer lifecycle.

Fixes #3401

Large PR Justification

This is a new feature package with a large test suite, and it needs to land as one coherent phase.

Key Changes

  1. MCPWebhookConfig CRD Creation

    • Introduced the MCPWebhookConfig CRD in api/v1alpha1, matching the specification in RFC THV-0017.
    • Lets users declaratively specify sets of validating and mutating webhooks.
    • Includes configuration for security integrations:
      • HMACSecretRef for signing request payloads.
      • TLSConfig (CA, client certificate, and key Secrets) for mTLS connections.
    • Fix: Updated CRD markers to use lowercase fail/ignore for FailurePolicy to align with the runner's runtime validation requirements.
  2. Controller Logic and Finalizers

    • Created the MCPWebhookConfigReconciler in cmd/thv-operator/controllers/.
    • The controller maintains .Status.ConfigHash, which it recomputes to detect changes to the configuration.
    • Injects finalizers and tracks all referencing MCPServers via .Status.ReferencingServers.
    • Blocks deletion of an MCPWebhookConfig while any MCPServer still references it.
  3. MCPServer Controller Integration

    • Added WebhookConfigRef to MCPServerSpec.
    • Updated MCPServerStatus to track the hash of the referenced configuration, which is propagated via annotations.
    • Adapted the Pod environment builder (deploymentNeedsUpdate) to track webhook Secret updates.
    • Extended createRunConfigFromMCPServer to translate webhook settings using newly extracted utility functions in pkg/controllerutil/webhook.go.
    • Fix: buildWebhookConfig now lowercases FailurePolicy so the value is compatible with thv-proxyrunner regardless of the case used in the CRD.
  4. Testing and Verification

    • Added unit tests for the types (mcpwebhookconfig_types_test.go), the controller logic (mcpwebhookconfig_controller_test.go), and the utilities (webhook_test.go).
    • Added end-to-end chainsaw tests that check valid configurations are created successfully and that malformed specs are rejected early by CEL validation rules.
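
To make the ConfigHash and FailurePolicy points above concrete, here is a minimal Go sketch. It is not the code in this PR: webhookSpec, its JSON field names, and the choice of SHA-256 over the JSON encoding are assumptions for illustration, and the real types in api/v1alpha1 carry more fields.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// webhookSpec is a hypothetical, trimmed-down stand-in for one webhook entry
// in MCPWebhookConfig. The field names are assumptions, not the real CRD schema.
type webhookSpec struct {
	Name          string `json:"name"`
	URL           string `json:"url"`
	FailurePolicy string `json:"failurePolicy"`
}

// normalizeFailurePolicy lowercases the policy so that "Fail" and "fail"
// are treated alike before the value is handed to the runner.
func normalizeFailurePolicy(p string) string {
	return strings.ToLower(strings.TrimSpace(p))
}

// configHash returns a hex-encoded SHA-256 digest of the JSON-encoded webhooks.
// encoding/json emits struct fields in declaration order, so equal inputs
// always produce equal hashes. SHA-256 is an assumption made for this sketch.
func configHash(hooks []webhookSpec) (string, error) {
	raw, err := json.Marshal(hooks)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	hooks := []webhookSpec{
		{Name: "audit", URL: "https://echo.default.svc/validate", FailurePolicy: "Fail"},
	}
	hooks[0].FailurePolicy = normalizeFailurePolicy(hooks[0].FailurePolicy)

	before, err := configHash(hooks)
	if err != nil {
		panic(err)
	}

	// Changing any field of the spec changes the hash, which is the signal
	// a controller can use to decide that dependents need to be refreshed.
	hooks[0].URL = "https://echo.default.svc/v2/validate"
	after, err := configHash(hooks)
	if err != nil {
		panic(err)
	}

	fmt.Println(hooks[0].FailurePolicy)      // prints "fail"
	fmt.Println("changed:", before != after) // prints "changed: true"
}
```

Hashing a serialized form of the spec keeps change detection cheap: a reconciler only needs to compare the stored hash with a freshly computed one instead of diffing every field.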

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)
  • Manual testing (describe below)

Manual Verification

Manual testing was performed using a local Kind cluster and the fetch MCPServer.

  1. Setup:
    • Deployed the operator using task operator-deploy-local.
    • Deployed an echo webhook server: kubectl apply -f manual-testing-phase5/echo-server.yaml.
     spec:
       containers:
       - name: echo
         image: ealen/echo-server:latest
    
  2. Configuration:
    • Created an MCPWebhookConfig pointing to the echo server with insecureSkipVerify: true.
    • Created a fetch MCPServer referencing the config.
  3. Execution:
    • Verified that the operator successfully reconciled the MCPWebhookConfig and generated a configHash.
    • Verified that the fetch server picked up the configuration and started the thv-proxyrunner.
    • Result: Inspected the fetch pod logs and confirmed that the mutating webhook middleware was active and invoking the echo server. The logs show "denied request" entries, as expected, because the echo server does not return a valid allowed: true response.
  4. Dynamic Updates:
    • Updated the MCPWebhookConfig (e.g., changed the failure policy or URL).
    • Verified that the operator detected the change and restarted the fetch pod automatically to load the new settings.
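
For anyone repeating the manual test who wants to see an allowed request instead of "denied request" logs, a stand-in webhook could look roughly like the sketch below. Only the allowed field comes from the description above. The webhookResponse type and the handler are hypothetical, and the full response schema is defined by RFC THV-0017, which is not reproduced here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// webhookResponse is a hypothetical response body. Only the "allowed" field
// is taken from the PR description; other fields the RFC may require are omitted.
type webhookResponse struct {
	Allowed bool `json:"allowed"`
}

// allowBody returns the JSON body for an "allow" decision.
func allowBody() []byte {
	b, _ := json.Marshal(webhookResponse{Allowed: true}) // cannot fail for this type
	return b
}

// allowHandler drains the request body and always answers with allowed: true.
func allowHandler(w http.ResponseWriter, r *http.Request) {
	_, _ = io.Copy(io.Discard, r.Body)
	w.Header().Set("Content-Type", "application/json")
	_, _ = w.Write(allowBody())
}

func main() {
	// An in-process test server keeps the sketch self-contained and short-lived.
	srv := httptest.NewServer(http.HandlerFunc(allowHandler))
	defer srv.Close()

	resp, err := http.Post(srv.URL, "application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // prints {"allowed":true}
}
```

A plain echo server returns the request payload unchanged, so it never produces this field, which matches the denials seen in the logs.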

Signed-off-by: Sanskarzz <sanskar.gur@gmail.com>
@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 13, 2026

@github-actions github-actions bot left a comment

Large PR Detected

This PR exceeds 1000 lines of changes and requires justification before it can be reviewed.

How to unblock this PR:

Add a section to your PR description with the following format:

## Large PR Justification

[Explain why this PR must be large, such as:]
- Generated code that cannot be split
- Large refactoring that must be atomic
- Multiple related changes that would break if separated
- Migration or data transformation

Alternative:

Consider splitting this PR into smaller, focused changes (< 1000 lines each) for easier review and reduced risk.

See our Contributing Guidelines for more details.


This review will be automatically dismissed once you add the justification section.

@github-actions github-actions bot added size/XL Extra large PR: 1000+ lines changed and removed size/XL Extra large PR: 1000+ lines changed labels Apr 13, 2026
@github-actions

✅ Large PR justification has been provided. The size review has been dismissed and this PR can now proceed with normal review.

@github-actions github-actions bot dismissed their stale review April 13, 2026 11:47

Large PR justification has been provided. Thank you!

codecov bot commented Apr 13, 2026

Codecov Report

❌ Patch coverage is 58.42697% with 111 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.91%. Comparing base (1d494c4) to head (b10c8ae).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...d/thv-operator/controllers/mcpserver_controller.go | 7.01% | 50 Missing and 3 partials ⚠️ |
| ...perator/controllers/mcpwebhookconfig_controller.go | 61.20% | 32 Missing and 13 partials ⚠️ |
| cmd/thv-operator/pkg/controllerutil/webhook.go | 92.77% | 3 Missing and 3 partials ⚠️ |
| cmd/thv-operator/main.go | 0.00% | 5 Missing ⚠️ |
| ...md/thv-operator/controllers/mcpserver_runconfig.go | 50.00% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4564      +/-   ##
==========================================
- Coverage   69.00%   68.91%   -0.10%     
==========================================
  Files         518      521       +3     
  Lines       54744    55011     +267     
==========================================
+ Hits        37777    37910     +133     
- Misses      14071    14189     +118     
- Partials     2896     2912      +16     

Labels

size/XL Extra large PR: 1000+ lines changed

Development

Successfully merging this pull request may close these issues.

Webhook Middleware Phase 5: Kubernetes CRD and controller integration