Cache https.Agent and Dispatcher per cluster/user pair to fix FD leaks on Watch reconnection #2904

Draft
Copilot wants to merge 3 commits into main from copilot/fix-socket-fd-leak-in-kubeconfig

Conversation

Copilot AI commented Jun 19, 2026

Contributor

KubeConfig.createAgent() and createDispatcher() unconditionally construct a new instance on every call. With Watch's reconnect pattern, each reconnect cycle therefore leaks a new socket pool. The orphaned agents are not promptly garbage-collected, because reference chains through node-fetch response body streams keep them alive, so the file descriptor (FD) count grows steadily until the process fails with EMFILE.

Changes

src/config.ts

  • Add agentCache: Map<string, Agent> and dispatcherCache: Map<string, Dispatcher | undefined> to KubeConfig, keyed by "clusterName::userName" (the tuple that uniquely determines TLS config and auth)
  • createAgent() and createDispatcher() now return a cached instance on subsequent calls for the same cluster/user pair, constructing only on first use
  • Add private getAgentCacheKey(cluster) helper
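
The caching described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's actual diff: `ClusterRef`, `UserRef`, `cacheKey`, and `KeyedCache` are hypothetical names, and a plain counter object stands in for `https.Agent` so the snippet has no dependencies. Only the `"clusterName::userName"` key format and the possibly-`undefined` dispatcher value are taken from the description.

```typescript
// Hypothetical minimal shapes; the real KubeConfig cluster/user objects carry more fields.
interface ClusterRef { name: string; }
interface UserRef { name: string; }

// Builds the "clusterName::userName" key named in the PR description.
function cacheKey(cluster: ClusterRef, user: UserRef): string {
    return `${cluster.name}::${user.name}`;
}

// Generic cache: constructs a value on first use, returns the same instance afterwards.
class KeyedCache<T> {
    private readonly entries = new Map<string, T>();
    private readonly create: (cluster: ClusterRef, user: UserRef) => T;

    constructor(create: (cluster: ClusterRef, user: UserRef) => T) {
        this.create = create;
    }

    get(cluster: ClusterRef, user: UserRef): T {
        const key = cacheKey(cluster, user);
        // has() rather than a truthiness check, so a cached `undefined`
        // (as allowed by Map<string, Dispatcher | undefined>) is not rebuilt.
        if (this.entries.has(key)) {
            return this.entries.get(key) as T;
        }
        const value = this.create(cluster, user);
        this.entries.set(key, value);
        return value;
    }
}

// Stand-in for https.Agent (assumption): each construction represents a new socket pool.
let poolsCreated = 0;
const agents = new KeyedCache<{ id: number }>(() => ({ id: ++poolsCreated }));

const prod: ClusterRef = { name: 'prod' };
const alice: UserRef = { name: 'alice' };
const bob: UserRef = { name: 'bob' };

const first = agents.get(prod, alice);
const second = agents.get(prod, alice); // simulated reconnect: same instance is returned
const other = agents.get(prod, bob);    // different user: a separate instance is built
```

In this sketch, repeated lookups for the same pair never call the factory again, which is the property that keeps the number of socket pools bounded by the number of distinct cluster/user pairs.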

src/config_test.ts

  • 4 new tests: same-instance reuse for both agent and dispatcher, distinct instances across different cluster/user contexts

Behavior

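import * as k8s from '@kubernetes/client-node';

const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const watch = new k8s.Watch(kc);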

// Before: new https.Agent created on every reconnection → FD growth
// After:  same https.Agent reused for the same cluster/user → stable FD count
async function startWatch() {
  await watch.watch('/apis/...', {}, (type, obj) => {}, (err) => {
    setTimeout(startWatch, 1000); // reconnect — agent is now reused
  });
}

The cache is keyed by cluster+user names rather than a single global entry, so multi-cluster configs that switch context get the correct per-endpoint agent.
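
To illustrate why a single global entry would be wrong here, the sketch below contrasts it with a per-context cache. Everything in it is hypothetical: `Context`, `FakeAgent`, both cache classes, and the example server URLs are invented for illustration, and `FakeAgent` merely records the endpoint it was built for instead of holding TLS and auth settings.

```typescript
// Hypothetical shape of an active kubeconfig context.
interface Context { cluster: string; user: string; server: string; }

// Stand-in for https.Agent (assumption): remembers which endpoint it was configured for.
interface FakeAgent { server: string; }

// Single global entry: whatever was built first is returned for every later context.
class GlobalAgentCache {
    private agent: FakeAgent | undefined;

    get(ctx: Context): FakeAgent {
        if (this.agent === undefined) {
            this.agent = { server: ctx.server };
        }
        return this.agent;
    }
}

// Per-context cache: one entry per "cluster::user" key.
class PerContextAgentCache {
    private readonly agents = new Map<string, FakeAgent>();

    get(ctx: Context): FakeAgent {
        const key = `${ctx.cluster}::${ctx.user}`;
        let agent = this.agents.get(key);
        if (agent === undefined) {
            agent = { server: ctx.server };
            this.agents.set(key, agent);
        }
        return agent;
    }
}

const staging: Context = { cluster: 'staging', user: 'dev', server: 'https://staging.example.invalid' };
const production: Context = { cluster: 'production', user: 'ops', server: 'https://production.example.invalid' };

// With one global entry, switching context still yields the staging agent.
const globalCache = new GlobalAgentCache();
globalCache.get(staging);
const staleAgent = globalCache.get(production);

// With per-context keys, each context gets an agent built for its own endpoint.
const keyedCache = new PerContextAgentCache();
const stagingAgent = keyedCache.get(staging);
const productionAgent = keyedCache.get(production);
```

The global variant hands back an agent configured for the wrong endpoint after a context switch, while the keyed variant keeps the two apart and still reuses each one on repeated calls.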

k8s-ci-robot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Jun 19, 2026

linux-foundation-easycla Bot commented Jun 19, 2026


CLA Not Signed

k8s-ci-robot added the size/XS (denotes a PR that changes 0-9 lines, ignoring generated files) and cncf-cla: no (indicates the PR's author has not signed the CNCF CLA) labels on Jun 19, 2026
…ubeConfig

Fixes socket/FD leaks caused by creating a new https.Agent on every
Watch reconnection or API call.

- Add `agentCache` (Map keyed by "clusterName::userName") to reuse
  agent instances across repeated calls with the same cluster/user.
- Add `dispatcherCache` (same key) to reuse undici Dispatcher instances
  across repeated applySecurityAuthentication() calls.
- Add private `getAgentCacheKey()` helper that builds the key from the
  current cluster name and user name — consistent with the comment from
  @brendandburns about keying off the user/cluster tuple.
- Add four new tests in config_test.ts verifying same-instance reuse
  and distinct instances for different cluster/user combinations.
@k8s-ci-robot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Copilot
Once this PR has been reviewed and has the lgtm label, please ask for approval from brendandburns. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) and removed the size/XS label (denotes a PR that changes 0-9 lines, ignoring generated files) on Jun 19, 2026
Copilot AI changed the title from "[WIP] Fix socket and FD leaks in KubeConfig.createAgent()" to "Cache https.Agent and Dispatcher per cluster/user pair to fix FD leaks on Watch reconnection" on Jun 19, 2026
Copilot AI requested a review from brendandburns June 19, 2026 18:36
Labels

  • cncf-cla: no (indicates the PR's author has not signed the CNCF CLA)
  • do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress)
  • size/L (denotes a PR that changes 100-499 lines, ignoring generated files)

3 participants