feat(docs): add markdown-link-check to docs CI (#259)#300
feat(docs): add markdown-link-check to docs CI (#259)#300ClintEastman02 wants to merge 8 commits into
Conversation
Adds a broken-link checker that validates internal cross-references and external URLs across all Markdown sources at PR time, preventing silent link rot in documentation that also serves as agent context.
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #300 +/- ##
=======================================
Coverage ? 86.37%
=======================================
Files ? 176
Lines ? 40811
Branches ? 4106
=======================================
Hits ? 35251
Misses ? 5560
Partials ? 0 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
The link #domain--requiresrepo was missing the underscore; the heading renders as #domain--requires_repo.
krokoko
left a comment
There was a problem hiding this comment.
Thanks for picking up CA-12 — this is exactly the kind of cheap fitness function the docs need, and the WORKFLOWS.md anchor fix already proves the gate has real targets. A few findings beyond the existing comments (the suggestions to run the CI step through mise and fold it into drift-prevention would also resolve the command duplication and the concurrency concern, so I've left those out):
1. The pipeline fails open — the gate can pass while checking nothing (docs/mise.toml:31, docs/package.json:12)
find … | xargs markdown-link-check runs without pipefail, so the pipeline's exit code is xargs's alone. If find errors (e.g. a docs directory is renamed) the job still goes green — reproducible with sh -c "find /nonexistent | xargs echo"; echo $? → 0. Suggested hardening:
- run under
bash -o pipefail(orset -euo pipefailin the script), - use
find … -print0 | xargs -0 -r(NUL-safe, and-rskips the run on empty input instead of invoking the checker with no args), - assert a minimum file count before checking (corpus is ~46 today) so a shrunk scan fails loudly rather than silently passing.
2. The aws-cdk bootstrap ignore pattern hides a genuinely dead link (docs/.markdown-link-check.json:9)
The ignored URL is used in docs/decisions/ADR-002-least-privilege-bootstrap-policies.md:71 and currently returns 404 (the bootstrap code moved within the aws-cdk repo). This is the exact rot class the gate exists to catch, suppressed on day one. Suggest repointing ADR-002 to a SHA-pinned permalink (or the published CDK docs) and deleting the ignore entry — allowlist entries are best reserved for categorically-unverifiable URLs.
3. aliveStatusCodes treats 301/302/307/308 as alive (docs/.markdown-link-check.json:23)
A permanently-moved link whose destination 404s will still pass. Suggest dropping the 3xx codes and letting the checker follow redirects to validate the terminal status.
4. Blanket npmjs.com ignore (docs/.markdown-link-check.json:6)
A typo'd package URL passes silently, and retryOn429 already covers the rate-limiting that motivates this. Worth narrowing or removing.
5. -maxdepth 1 will silently drop future subdirectories (docs/mise.toml:31, docs/package.json:12)
guides/, design/, decisions/ are flat today so coverage is currently complete, but the first nested folder vanishes from the check with no signal. Either drop -maxdepth 1 for the named dirs (keep it only on ..), or rely on the min-file-count assertion from point 1. Related: docs/README.md is never scanned (no . start point).
6. The failure path was never demonstrated
The test plan validates only the green path. Since the gate's entire value is in failing, it would be great to link a throwaway commit with a deliberately broken link showing the job go red — that also empirically confirms point 1's exit-code handling.
Nit: the ^link$ ignore is legitimate (it masks the intentional placeholder in ADR-004's example prose) but opaque — a one-line note in the config or Developer Guide would save a future maintainer the archaeology.
One consideration for the drift-prevention suggestion: external-URL checking is non-hermetic (network flakiness, transient 5xx), so folding it into the main build puts that flakiness on every build's critical path. Splitting internal-link checking (hermetic, belongs in build/drift-prevention) from external-URL checking (scheduled or PR-only) might be the sweet spot.
Happy to discuss any of these — points 1 and 2 are the ones I'd consider blocking.
|
Thanks for the thorough review @krokoko — addressed all points: Inline comments (drift-prevention integration):
1. [BLOCKING] Pipeline fails open:
2. [BLOCKING] aws-cdk dead link:
3. aliveStatusCodes:
4. Blanket npmjs.com ignore:
5. -maxdepth 1:
6. Failure path demonstration:
7. Nit (^link$ opaque):
Architecture (internal vs external split):
|
…prevention Remove dedicated link-check CI job from docs.yml and wire it into the existing drift-prevention task so it runs as part of every build. Harden the script with pipefail, temp-file approach, min-file-count assertion, and xargs -r. Fix dead aws-cdk link in ADR-002, narrow npmjs ignore to package paths only, drop 3xx from aliveStatusCodes, and document the ^link$ ignore pattern.
Summary
markdown-link-checkas a devDependency in the docs packagelink-checkCI job runs on pull requests touching documentation, validating internal cross-references and external URLs across source Markdown (docs/guides/,docs/design/,docs/decisions/, root.mdfiles).markdown-link-check.json) handles known-flaky external hosts (npmjs bot-blocking, rate-limited APIs) and intentional patterns (Starlight site paths, placeholder text)mise //docs:link-checktask for local runsdocs/design/WORKFLOWS.md(caught by the new checker)Closes #259.
Test plan