Refactor test setup: explicit LocalStack installation and improved diagnostics #32

Merged
adityathebe merged 3 commits into main from claude/batch-runner-test-failure-4wh179 on Jul 3, 2026
@adityathebe

Member

Summary

This PR refactors the Helm test suite setup to explicitly install LocalStack as a separate Helm chart rather than relying on implicit service setup, and adds comprehensive Kubernetes diagnostics output for debugging failed deployments.

Key Changes

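  • Removed sync-based parallelization: Dropped the sync.WaitGroup that ran the Docker build and kind cluster creation in parallel. These steps now run sequentially, so the LocalStack install no longer competes with them for Docker/Kubernetes resources and a failure is easier to attribute to a single step.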

  • Explicit LocalStack Helm installation: Added dedicated LocalStack Helm chart installation in BeforeSuite with:

    • Pinned image version (localstack/localstack:4.12.0) so the install does not fall back to the localstack/localstack-pro image, which requires an auth token
    • Longer wait timeout (10 * time.Minute, up from the default 5 minutes)
    • Proper image pull strategy (IfNotPresent) to use pre-loaded images
  • Image pre-loading optimization: Added explicit docker pull of the LocalStack image on the host before loading into kind, preventing rate-limited unauthenticated pulls from Docker Hub within the cluster.

  • Enhanced diagnostics: Added dumpKubeDiagnostics() function that captures:

    • Pod status and details (kubectl get pods)
    • Kubernetes events (sorted by timestamp)
    • Deployment descriptions
    • LocalStack pod logs (last 100 lines)

    This function is called whenever a Helm installation fails, so that the underlying pod state appears in the CI logs.

  • Improved error handling: Wrapped both LocalStack and Batch Runner Helm installations with explicit error checking and diagnostics output before assertions.

Implementation Details

  • Constants added for LocalStack configuration (wait timeout, image repo, and tag)
  • LocalStack image is now loaded into the kind cluster just like the batch-runner image
  • Diagnostics are only dumped on installation failure, keeping normal test output clean
  • The refactoring maintains the same test functionality while improving debuggability and reducing implicit dependencies

https://claude.ai/code/session_01PYtCgvgDiYzJY84y1qecB4

adityathebe and others added 3 commits July 2, 2026 00:05
Dependabot reported critical pgx and grpc advisories in the Go module.

Upgrade pgx and grpc to patched versions so the module no longer resolves vulnerable releases.
The CI e2e suite timed out while the kind helper installed Localstack with the default 5-minute wait, because other setup work was still competing for Docker/Kubernetes resources.

Install Localstack explicitly after the image build and kind cluster setup complete, and give the Helm release a longer wait timeout so the deployment can become ready on slower runners.
The unpinned localstack chart install started failing after
localstack/helm-charts#148 (Mar 2026) switched the chart's default image
to localstack/localstack-pro, which requires an auth token and never
passes readiness without one, so helm --wait timed out with the
Deployment stuck InProgress.

- Pin image.repository/tag to the community localstack/localstack:4.12.0
  (current when this suite was introduced) via chart values
- Pre-pull the image on the host and kind-load it so the node never
  pulls from Docker Hub (unauthenticated in-node pulls are rate-limited
  on shared CI runners)
- Dump pods, events, deployment describe and localstack logs when a
  helm install fails so the next failure shows the underlying pod state

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01PYtCgvgDiYzJY84y1qecB4
@adityathebe adityathebe marked this pull request as ready for review July 3, 2026 09:12
@adityathebe adityathebe merged commit d828cf3 into main Jul 3, 2026
6 checks passed
@adityathebe adityathebe deleted the claude/batch-runner-test-failure-4wh179 branch July 3, 2026 09:12