Context:
The diagnostics system currently supports manual checks plus automatic re-checks after source edits when unresolved issues already exist. Users report stale diagnostics in real workflows, especially when alternating lint and typecheck runs, fixing issues one at a time, switching tabs, and re-running checks. We want to keep the auto re-check behavior after an error is discovered, but eliminate stale/incorrect results and align the diagnostics UI with the multi-tab model.
Important framing:
Treat this as if no prior fixes exist yet. Start from root causes and design for correctness and maintainability, not patch stacking.
Primary goals:
- Preserve this behavior:
Automatic re-checks continue after edits only when unresolved issues were previously detected.
- Eliminate stale diagnostics:
No outdated lint/type results should overwrite newer editor state.
- Redesign diagnostics drawer UX for multi-tab workflows:
The drawer should reflect the active tab language and show only relevant sections/actions.
Functional requirements:
1. Stale diagnostics correctness
1.1 Runs must always evaluate the current source for the target tab/path.
1.2 In-flight runs must be invalidated or aborted on relevant source changes.
1.3 Late async completions must be ignored if they do not match the latest run identity.
1.4 Pending auto-recheck timers must not overwrite newer manual runs.
1.5 Tab switches must not “fix” correctness by accident; correctness must hold without switching tabs.
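As an illustration of 1.2 and 1.3, one possible shape for a stale-result guard is sketched below. Every name here (RunTracker, RunToken, sourceVersion) is hypothetical and not taken from the existing code; the sketch assumes the editor exposes a per-path version counter that increments on each edit.

```typescript
// Hypothetical sketch: none of these names come from the existing codebase.
type Domain = "lint" | "typecheck";

interface RunToken {
  readonly domain: Domain;
  readonly path: string;          // target tab/path the run was started for
  readonly runId: number;         // monotonically increasing run identity
  readonly sourceVersion: number; // editor version captured when the run started
}

class RunTracker {
  private nextRunId = 1;
  // Latest run id per "domain:path"; any older run is stale by definition.
  private latest: Record<string, number> = {};

  private key(domain: Domain, path: string): string {
    return domain + ":" + path;
  }

  /** Register a new run; it supersedes any earlier run for the same domain and path. */
  begin(domain: Domain, path: string, sourceVersion: number): RunToken {
    const token: RunToken = { domain, path, runId: this.nextRunId++, sourceVersion };
    this.latest[this.key(domain, path)] = token.runId;
    return token;
  }

  /** Forget the in-flight run, e.g. when a relevant source edit arrives. */
  invalidate(domain: Domain, path: string): void {
    delete this.latest[this.key(domain, path)];
  }

  /** True only if the token is the latest run and the source has not changed since it started. */
  isCurrent(token: RunToken, currentSourceVersion: number): boolean {
    const latestId = this.latest[this.key(token.domain, token.path)];
    return latestId === token.runId && token.sourceVersion === currentSourceVersion;
  }
}
```

Under this sketch, every completion handler would call isCurrent before writing results, and both manual runs and timer-driven runs would go through begin, so a late completion from an older run is simply dropped.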
2. Auto re-check policy
2.1 If unresolved type errors exist and component source changes, schedule re-check.
2.2 If unresolved lint issues exist and relevant source changes, schedule re-check.
2.3 If no unresolved issues exist, mark diagnostics stale and require explicit user action.
2.4 Keep this behavior predictable and visible in status/pending UI.
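The policy in 2.1 to 2.3 can be expressed as a small pure function, which keeps it easy to test and easy to surface in the status UI. The sketch below is only one way to frame it; decideOnEdit, EditDecision, and the 400 ms delay are assumptions made for illustration, not values from the existing code.

```typescript
// Hypothetical sketch of the auto re-check decision for a single domain.
type EditDecision =
  | { action: "schedule-recheck"; delayMs: number }
  | { action: "mark-stale" }
  | { action: "none" };

// Assumed debounce delay; the real value is a product decision.
const RECHECK_DELAY_MS = 400;

/**
 * unresolvedCount: issues reported by the most recent applied run for this domain.
 * editIsRelevant: whether the edited source feeds this domain for the target tab/path.
 */
function decideOnEdit(unresolvedCount: number, editIsRelevant: boolean): EditDecision {
  if (!editIsRelevant) {
    return { action: "none" };
  }
  if (unresolvedCount > 0) {
    // 2.1 / 2.2: issues were previously detected, so keep re-checking after edits.
    return { action: "schedule-recheck", delayMs: RECHECK_DELAY_MS };
  }
  // 2.3: nothing unresolved, so results become stale until the user re-runs.
  return { action: "mark-stale" };
}
```

Returning a decision value instead of starting a timer directly means the same result can drive both the scheduler and the pending/stale indicator required by 2.4.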
3. Drawer behavior by active tab language
3.1 Read active tab language from workspace tab records.
3.2 If active tab language is css or css dialect:
Show one diagnostics section for lint only.
Show one reset button.
3.3 If active tab language is javascript-jsx/tsx/react-like:
Show two sections named Typecheck and Lint.
Show three reset buttons named Reset types, Reset styles, Reset all.
3.4 Remove old section naming that no longer fits multi-tab (Component/Styles labels in the drawer for jsx tabs).
3.5 Ensure reset actions map cleanly to what is visible and expected in each mode.
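A minimal sketch of how 3.1 to 3.5 could be captured as a single mapping from the active tab record to a drawer layout. The language identifiers ("css", "less", "scss", "javascript-jsx", "tsx"), the TabRecord shape, the action ids, and the label of the single CSS reset button are all assumptions; only the section names and the three JSX/TSX button labels come from the requirements above.

```typescript
// Hypothetical sketch: shapes and language ids are assumed, not read from the codebase.
interface TabRecord {
  id: string;
  language: string;
}

interface ResetAction {
  id: "reset-lint" | "reset-types" | "reset-styles" | "reset-all";
  label: string;
}

interface DrawerLayout {
  sections: string[];
  resets: ResetAction[];
}

const CSS_LANGUAGES = ["css", "less", "scss"];   // assumed dialect ids
const JSX_LANGUAGES = ["javascript-jsx", "tsx"]; // assumed react-like ids

function drawerLayoutFor(tab: TabRecord | undefined): DrawerLayout {
  const language = tab ? tab.language : "";
  if (CSS_LANGUAGES.indexOf(language) !== -1) {
    // 3.2: lint only, one reset button (its label is an assumption).
    return { sections: ["Lint"], resets: [{ id: "reset-lint", label: "Reset" }] };
  }
  if (JSX_LANGUAGES.indexOf(language) !== -1) {
    // 3.3: two sections and three reset buttons, named as required.
    return {
      sections: ["Typecheck", "Lint"],
      resets: [
        { id: "reset-types", label: "Reset types" },
        { id: "reset-styles", label: "Reset styles" },
        { id: "reset-all", label: "Reset all" },
      ],
    };
  }
  // Unknown language or no active tab: showing nothing is an assumed fallback.
  return { sections: [], resets: [] };
}
```

Deriving both sections and reset actions from one function keeps them from drifting apart, which is the property 3.5 asks for.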
4. State model and ownership
4.1 Make diagnostics ownership explicit:
Which run produced which state, for which tab/path, and at what version.
4.2 Avoid cross-contamination between tabs and between lint/type domains.
4.3 Keep status text and drawer content synchronized and non-contradictory.
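One way to make ownership explicit (4.1), keep tabs and domains separate (4.2), and stop status text from contradicting drawer content (4.3) is to store an owner record next to each result and derive the status text from that same state. The sketch below assumes a store keyed by tab id; all type names, phases, and status strings are hypothetical.

```typescript
// Hypothetical sketch of an ownership-aware diagnostics store.
type CheckDomain = "lint" | "typecheck";

interface Issue {
  message: string;
  line: number;
}

interface RunOwner {
  runId: number;
  tabId: string;
  path: string;
  sourceVersion: number;
}

interface DomainState {
  phase: "idle" | "pending" | "fresh" | "stale";
  owner: RunOwner | null; // the run that produced `issues`, if any
  issues: Issue[];
}

type TabDiagnostics = Record<CheckDomain, DomainState>;
type DiagnosticsStore = Record<string, TabDiagnostics>; // keyed by tab id

function emptyDomain(): DomainState {
  return { phase: "idle", owner: null, issues: [] };
}

/** Returns a new store in which only the owner's tab and the given domain change. */
function applyResult(
  store: DiagnosticsStore,
  domain: CheckDomain,
  owner: RunOwner,
  issues: Issue[],
): DiagnosticsStore {
  const previous = store[owner.tabId];
  const next: TabDiagnostics = {
    lint: previous ? previous.lint : emptyDomain(),
    typecheck: previous ? previous.typecheck : emptyDomain(),
  };
  next[domain] = { phase: "fresh", owner, issues };
  const result: DiagnosticsStore = {};
  Object.keys(store).forEach((tabId) => { result[tabId] = store[tabId]; });
  result[owner.tabId] = next;
  return result;
}

/** Status text is derived from the same state the drawer renders (wording is assumed). */
function statusText(state: DomainState): string {
  if (state.phase === "idle") return "Not checked";
  if (state.phase === "pending") return "Checking...";
  if (state.phase === "stale") return "Out of date";
  return state.issues.length === 0 ? "No issues" : state.issues.length + " issue(s)";
}
```

Because statusText reads the same DomainState that the drawer lists issues from, the two cannot disagree, and a result for one tab or domain never touches another.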
Non-goals:
- Do not remove auto re-check entirely.
- Do not add new dependencies without explicit approval.
- Do not do broad unrelated refactors.
Implementation guidance:
- Identify and document root causes before coding:
Run ordering, source/path mismatch, timer races, and UI state overwrites.
- Introduce a unified freshness contract:
Run identity + target identity (tab/path) + cancellation/invalidation rules.
- Prefer small, composable utilities for:
Abort/invalidate, timer lifecycle, source resolution, and stale-result guards.
- Keep changes localized and minimal in surface area.
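As an example of the "timer lifecycle" utility mentioned above, the sketch below wraps a single pending auto re-check so that scheduling replaces any earlier timer and a manual run can cancel it outright. RecheckTimer and Scheduler are hypothetical names; the scheduler is injected so the utility can be exercised without real timers, and in the browser it would presumably be backed by setTimeout and clearTimeout.

```typescript
// Hypothetical sketch of a single-slot re-check timer.
interface Scheduler {
  set(callback: () => void, delayMs: number): number;
  clear(handle: number): void;
}

class RecheckTimer {
  private handle: number | null = null;
  private readonly scheduler: Scheduler;

  constructor(scheduler: Scheduler) {
    this.scheduler = scheduler;
  }

  /** Schedule a re-check, replacing any timer that is still pending. */
  schedule(run: () => void, delayMs: number): void {
    this.cancel();
    this.handle = this.scheduler.set(() => {
      this.handle = null; // the timer has fired, so nothing is pending any more
      run();
    }, delayMs);
  }

  /** Cancel the pending re-check, e.g. right before a manual run starts. */
  cancel(): void {
    if (this.handle !== null) {
      this.scheduler.clear(this.handle);
      this.handle = null;
    }
  }

  /** Whether a re-check is waiting to fire; can drive the pending indicator. */
  get pending(): boolean {
    return this.handle !== null;
  }
}
```

Calling cancel at the start of every manual run is one way to satisfy the rule that pending auto-recheck timers must not overwrite newer manual runs.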
Diagnostics drawer UX acceptance:
- CSS/dialect active tab:
Exactly one diagnostics section (lint) and one reset button.
- JSX/TSX active tab:
Two sections labeled Typecheck and Lint.
Buttons: Reset types, Reset styles, Reset all.
- No confusing legacy labels tied to pre-multi-tab assumptions.
Correctness acceptance criteria:
- Repro flow A:
Start with type + lint issues, fix one, run the other, fix remaining issue, rerun both.
Result must never show stale old errors after current-source checks complete.
- Repro flow B:
While a check is pending, edit source.
Outdated completion must not overwrite latest state.
- Repro flow C:
Switch tabs during stale/pending states and return.
Diagnostics remain correct; switching does not act as hidden repair.
- Repro flow D:
The "missing button type" lint issue appears whenever the problem is present and disappears once it is fixed, reliably.
Testing requirements:
- Add/adjust Playwright diagnostics tests for all repro flows above.
- Include at least one test that edits during an in-flight check.
- Include tab-language-driven drawer layout assertions.
- Keep existing relevant diagnostics tests passing.
- Run lint and targeted diagnostics e2e suite before completion.
Deliverables:
- Code changes for stale-result prevention and drawer UX updates.
- Updated tests covering race conditions and new drawer behavior.
- Brief implementation notes:
Root causes found, strategy chosen, and why it is race-safe.
- Validation summary with commands run and key results.
Quality bar:
Prioritize deterministic correctness and user trust over aggressive live updates. Keep the UX simple, context-aware, and aligned with multi-tab editing.