Adding fix-flaky-go-test skill #22010
Open
kalverra wants to merge 2 commits into develop from fixFlakyTestsSkill
+155
−0
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,155 @@ | ||
| --- | ||
| name: fix-flaky-go-test | ||
| description: >- | ||
| Fix flaky Go tests in Chainlink: stress, Postgres, -shuffle, race (tools/bin), | ||
| build tags. Use for intermittent failures, CI-only, -count/-shuffle issues, | ||
| races, noisy output. | ||
| --- | ||
|
|
||
| # Fix flaky Go tests (Chainlink) | ||
|
|
||
| <scope> | ||
| Reproduce before refactors. Fix determinism, isolation, time, concurrency. | ||
| Do not widen assertions or add blind retries. | ||
| Core tests need Postgres and usually CL_DATABASE_URL. CI uses tools/bin (gotestsum, race, integration), not only go test ./... | ||
| Read README.md Running tests, .github/workflows/ci-core.yml, tools/bin for parity. | ||
| </scope> | ||
|
|
||
| <trunk> | ||
| ## Trunk.io — gather context before touching code | ||
|
|
||
| Trunk.io tracks flaky test history, failure rates, and AI-generated root cause analysis. | ||
| Always check Trunk first — it may already have a fix recommendation. | ||
|
|
||
| ### Finding the Trunk test link | ||
|
|
||
| Jira tickets for flaky tests almost always contain a Trunk link. Look for: | ||
| - A URL matching `https://app.trunk.io/chainlink/flaky-tests/test/<uuid>/` | ||
| - The UUID in that URL is the **test case ID** (not a fix/investigation ID) | ||
|
|
||
| To extract it from a Jira ticket: | ||
| ``` | ||
| mcp__atlassian__getJiraIssue issue: "CCIP-XXXX" | ||
| ``` | ||
| Then look for `app.trunk.io` URLs in the description or comments. | ||
|
|
||
| ### Reading test history and failure data | ||
|
|
||
| Open the test case page directly — it shows failure rate, timeline, and recent CI runs: | ||
| ``` | ||
| https://app.trunk.io/chainlink/flaky-tests/test/<test-case-uuid>/ | ||
| ``` | ||
|
|
||
| Use the Scrapling MCP to fetch it (JS-rendered page): | ||
| ``` | ||
| mcp__ScraplingServer__fetch url: "https://app.trunk.io/chainlink/flaky-tests/test/<uuid>/" | ||
| ``` | ||
|
|
||
| ### Getting an AI fix recommendation (Trunk MCP) | ||
|
|
||
| The Trunk MCP tool `fix-flaky-test` requires a **fix/investigation ID**, which is different | ||
| from the test case ID in the URL. Investigations must be triggered from the Trunk UI first. | ||
|
|
||
| If an investigation exists, call: | ||
| ``` | ||
| mcp__plugin_trunk_trunk__fix-flaky-test | ||
| repoName: "smartcontractkit/chainlink" | ||
| orgSlug: "chainlink" | ||
| fixId: "<fix-or-investigation-uuid>" | ||
| ``` | ||
|
|
||
| If the tool returns "Investigation not found", the investigation has not been triggered yet. | ||
| Ask the reporter to open the test case page and click "Investigate" — or proceed with | ||
| code-level analysis using the workflow below. | ||
|
|
||
| ### What to read from Trunk | ||
|
|
||
| - **Failure rate** — how often it fails (e.g. 12% over last 30 days) | ||
| - **Failure pattern** — does it cluster around certain times, branches, or PR authors? | ||
| - **First seen / last seen** — did it regress recently after a change? | ||
| - **CI job name** — which workflow step fails (unit, race, integration, ccip-deployment) | ||
| - **Trunk root cause label** — if already classified (race, timing, docker, network, etc.) | ||
| </trunk> | ||
|
|
||
| <setup> | ||
| Run README prep: pnpm, make mockery, make generate, Postgres, make setup-testdb, source .dbenv, make testdb after pulls. Use make testdb-force if DB stuck. | ||
| Unset env vars except CL_DATABASE_URL when tests act wrong. | ||
| CL_DATABASE_URL must target a *_test database (preparetest). | ||
| Modules: repo root, integration-tests/, core/scripts/. Run go test from the correct module root. | ||
| </setup> | ||
|
|
||
| <requirements> | ||
| If unknown, ask: package path, test name, module root, whether file is //go:build integration, whether test uses pgtest/cltest/SqlxDB or is -short safe. | ||
| State your assumptions when you start. | ||
| </requirements> | ||
|
|
||
| <principles> | ||
| Stress with plain go test -count/-failfast/-shuffle; gotestsum --rerun-fails in tools/bin/go_core_tests can hide flakes on PRs. | ||
| Treat flakes as production bugs until disproved. | ||
| Prefer injected time, IO, randomness; per-test resources; scoped state. | ||
| Do not loosen timeouts or assertions without a named cause. | ||
| </principles> | ||
|
|
||
| <classify> | ||
| Append --tags integration to every go test below if the file has //go:build integration. | ||
| deployment/ CCIP: use tools/bin/go_core_ccip_deployment_tests pattern (cd deployment, CL_RESERVE_PORTS=128). | ||
| Optional CI parity: GODEBUG=goindex=0 on go test (see ci-core.yml). | ||
| If the file uses //go:build dev or trace, add matching --tags when reproducing. | ||
| </classify> | ||
|
|
||
| <workflow> | ||
| <reproduce> | ||
| Stop when you have a stable repro. Add -v when needed. | ||
| Record package, -run regex, failure mode. | ||
|
|
||
| 1. No DB quick path: | ||
| ```sh | ||
| go test -short ./path/to/pkg -run '^TestName$' -count 100 -failfast | ||
| ``` | ||
|
|
||
| 2. With DB from repo root: | ||
| ```sh | ||
| source .dbenv && make testdb | ||
| go test ./path/to/pkg -run '^TestName$' -count 100 -failfast | ||
| ``` | ||
|
|
||
| 3. Whole package: same DB prep then go test ./path/to/pkg -count 100 -failfast | ||
|
|
||
| 4. Shuffle: add -shuffle on; bisect with -shuffle N | ||
|
|
||
| 5. Race (fail if race.* exists): | ||
| ```sh | ||
| GORACE="log_path=$PWD/race" go test -race -shuffle on -timeout 10s -count 100 ./path/to/pkg -run '^TestName$' -failfast | ||
| ``` | ||
|
|
||
| 6. Parallelism probe: -cpu 1,2,4 and -parallel 4 with -shuffle on -count 50 -failfast | ||
|
|
||
| 7. Optional full unit job after local repro: GODEBUG=goindex=0 ./tools/bin/go_core_tests ./... (see script for GITHUB_EVENT_NAME flags) | ||
| </reproduce> | ||
|
|
||
| <fix> | ||
| Apply fix_patterns. Avoid permanent time.Sleep as the main fix. | ||
| Re-run the same repro command. Record shuffle seed in commit or comment if order-dependent. | ||
| </fix> | ||
| </workflow> | ||
|
|
||
| <root_causes> | ||
| General: package init and globals, t.Parallel plus shared fixtures, wall clock without fakes, port or path collisions, map order assumptions, leaked env or cwd, goroutines after test end. | ||
|
|
||
| Chainlink: shared Postgres or stale schema; missing pgtest.NewSqlxDB(t); cltest.TestApplication teardown or leaked HTTP; ports without :0 or CL_RESERVE_PORTS; stress without --tags integration on integration files; wrong module root. | ||
|
|
||
| Docker/Solana: WithSolanaContainerN port conflicts or slow startup; sync.Once download helpers that mark a failed download as done (causing cascading file-not-found failures in parallel runs); LoadCCIPPrograms network timeouts. If a test spins up Docker/Solana but the code under test early-exits before any chain interaction, remove the unnecessary infra. | ||
| </root_causes> | ||
|
|
||
| <fix_patterns> | ||
| Scope state per test. Use t.Cleanup only when needed and obvious. Inject time, randomness, net, fs. Use t.TempDir and :0 listeners. Serialize or drop t.Parallel on shared resources. Prefer channels, WaitGroup, explicit sync over sleep polls. | ||
|
|
||
| Chainlink: pgtest.NewSqlxDB(t) and core/internal/testutils/pgtest helpers; testutils.Context(t); core/internal/cltest TestApplication and matching cleanup; configtest and evmtest under core/internal/testutils; core/utils/testutils/heavyweight for ORM-heavy tests. | ||
| </fix_patterns> | ||
|
|
||
| <verify> | ||
| Write the exact repro go test line including -run and --tags integration when relevant. | ||
| Race: GORACE log_path, go test -race -shuffle on, confirm no race.* or document skip. | ||
|
Comment on lines +150 to +152
|
||
| Optional: TIMEOUT and COUNT with ./tools/bin/go_core_race_tests. | ||
| Do not merge unexplained timeout or assertion loosening. | ||
| </verify> | ||
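The `root_causes` section of the skill mentions `sync.Once` download helpers that mark a failed download as done. A minimal sketch of that failure mode and one possible alternative is below. The names `onceOK` and `flakyDownload` are hypothetical and do not come from the Chainlink repository; the sketch only shows the general idea of caching success but not failure.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// onceOK is a hypothetical helper: it runs fn until it succeeds once,
// then caches the success. Unlike sync.Once, a failed attempt is not
// recorded as done, so a later caller can retry.
type onceOK struct {
	mu   sync.Mutex
	done bool
}

// Do calls fn unless an earlier call already succeeded.
// The mutex is held while fn runs, so concurrent callers are serialized.
func (o *onceOK) Do(fn func() error) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if o.done {
		return nil
	}
	if err := fn(); err != nil {
		return err
	}
	o.done = true
	return nil
}

// flakyDownload is a hypothetical stand-in for a network download:
// it fails on the first call and succeeds on every later call.
func flakyDownload(calls *int) func() error {
	return func() error {
		*calls++
		if *calls == 1 {
			return errors.New("simulated network timeout")
		}
		return nil
	}
}

func main() {
	// sync.Once: the failed first attempt still counts as done,
	// so the download is never retried.
	var once sync.Once
	var onceErr error
	calls := 0
	dl := flakyDownload(&calls)
	for i := 0; i < 3; i++ {
		once.Do(func() { onceErr = dl() })
	}
	fmt.Println("sync.Once attempts:", calls, "err:", onceErr)

	// onceOK: the second caller retries and succeeds; the third is a no-op.
	var ok onceOK
	calls = 0
	dl = flakyDownload(&calls)
	var lastErr error
	for i := 0; i < 3; i++ {
		lastErr = ok.Do(dl)
	}
	fmt.Println("onceOK attempts:", calls, "err:", lastErr)
}
```

With `sync.Once`, every caller after the first sees the stale failure, which matches the cascading file-not-found pattern described in the skill. Whether retrying on failure is the right behavior for a given helper is a judgment call; returning the cached error to all callers can be preferable when the failure is not transient.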
`go test` uses the `-tags` flag (e.g., `-tags=integration`), not `--tags`. As written, `--tags integration` will be rejected by `go test` and would break the reproduction instructions (and any agent automation relying on them).