Skip to content

feat(docs): add markdown-link-check to docs CI (#259)#300

Open
ClintEastman02 wants to merge 8 commits into
aws-samples:mainfrom
ClintEastman02:feat/259-markdown-link-check
Open

feat(docs): add markdown-link-check to docs CI (#259)#300
ClintEastman02 wants to merge 8 commits into
aws-samples:mainfrom
ClintEastman02:feat/259-markdown-link-check

Conversation

@ClintEastman02

@ClintEastman02 ClintEastman02 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Adds markdown-link-check as a devDependency in the docs package
  • New link-check CI job runs on pull requests touching documentation, validating internal cross-references and external URLs across source Markdown (docs/guides/, docs/design/, docs/decisions/, root .md files)
  • Allowlist config (.markdown-link-check.json) handles known-flaky external hosts (npmjs bot-blocking, rate-limited APIs) and intentional patterns (Starlight site paths, placeholder text)
  • New mise //docs:link-check task for local runs
  • Documented in the Developer Guide
  • Bonus: fixed a pre-existing broken anchor in docs/design/WORKFLOWS.md (caught by the new checker)

Closes #259.

Test plan

  • Full link-check passes locally (46 files, 0 broken links)
  • Docs sync + build pass cleanly
  • CI green on this PR

Adds a broken-link checker that validates internal cross-references and
external URLs across all Markdown sources at PR time, preventing silent
link rot in documentation that also serves as agent context.
@ClintEastman02 ClintEastman02 requested a review from a team as a code owner June 9, 2026 20:44
@codecov-commenter

codecov-commenter commented Jun 9, 2026

Copy link
Copy Markdown

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@14820c8). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #300   +/-   ##
=======================================
  Coverage        ?   86.37%           
=======================================
  Files           ?      176           
  Lines           ?    40811           
  Branches        ?     4106           
=======================================
  Hits            ?    35251           
  Misses          ?     5560           
  Partials        ?        0           

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread .github/workflows/docs.yml Outdated
Comment thread docs/mise.toml

@krokoko krokoko left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking up CA-12 — this is exactly the kind of cheap fitness function the docs need, and the WORKFLOWS.md anchor fix already proves the gate has real targets. A few findings beyond the existing comments (the suggestions to run the CI step through mise and fold it into drift-prevention would also resolve the command duplication and the concurrency concern, so I've left those out):

1. The pipeline fails open — the gate can pass while checking nothing (docs/mise.toml:31, docs/package.json:12)
find … | xargs markdown-link-check runs without pipefail, so the pipeline's exit code is xargs's alone. If find errors (e.g. a docs directory is renamed) the job still goes green — reproducible with sh -c "find /nonexistent | xargs echo"; echo $?0. Suggested hardening:

  • run under bash -o pipefail (or set -euo pipefail in the script),
  • use find … -print0 | xargs -0 -r (NUL-safe, and -r skips the run on empty input instead of invoking the checker with no args),
  • assert a minimum file count before checking (corpus is ~46 today) so a shrunk scan fails loudly rather than silently passing.

2. The aws-cdk bootstrap ignore pattern hides a genuinely dead link (docs/.markdown-link-check.json:9)
The ignored URL is used in docs/decisions/ADR-002-least-privilege-bootstrap-policies.md:71 and currently returns 404 (the bootstrap code moved within the aws-cdk repo). This is the exact rot class the gate exists to catch, suppressed on day one. Suggest repointing ADR-002 to a SHA-pinned permalink (or the published CDK docs) and deleting the ignore entry — allowlist entries are best reserved for categorically-unverifiable URLs.

3. aliveStatusCodes treats 301/302/307/308 as alive (docs/.markdown-link-check.json:23)
A permanently-moved link whose destination 404s will still pass. Suggest dropping the 3xx codes and letting the checker follow redirects to validate the terminal status.

4. Blanket npmjs.com ignore (docs/.markdown-link-check.json:6)
A typo'd package URL passes silently, and retryOn429 already covers the rate-limiting that motivates this. Worth narrowing or removing.

5. -maxdepth 1 will silently drop future subdirectories (docs/mise.toml:31, docs/package.json:12)
guides/, design/, decisions/ are flat today so coverage is currently complete, but the first nested folder vanishes from the check with no signal. Either drop -maxdepth 1 for the named dirs (keep it only on ..), or rely on the min-file-count assertion from point 1. Related: docs/README.md is never scanned (no . start point).

6. The failure path was never demonstrated
The test plan validates only the green path. Since the gate's entire value is in failing, it would be great to link a throwaway commit with a deliberately broken link showing the job go red — that also empirically confirms point 1's exit-code handling.

Nit: the ^link$ ignore is legitimate (it masks the intentional placeholder in ADR-004's example prose) but opaque — a one-line note in the config or Developer Guide would save a future maintainer the archaeology.

One consideration for the drift-prevention suggestion: external-URL checking is non-hermetic (network flakiness, transient 5xx), so folding it into the main build puts that flakiness on every build's critical path. Splitting internal-link checking (hermetic, belongs in build/drift-prevention) from external-URL checking (scheduled or PR-only) might be the sweet spot.

Happy to discuss any of these — points 1 and 2 are the ones I'd consider blocking.

@krokoko krokoko added the approved When an issue has been approved and ready label Jun 10, 2026
@ClintEastman02

ClintEastman02 commented Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @krokoko — addressed all points:

Inline comments (drift-prevention integration):

  • Removed the dedicated link-check job from docs.yml — no more separate CI step
  • Added { task = "//docs:link-check" } to root drift-prevention task so it runs as part of every mise run build
  • docs/package.json script now delegates to mise run link-check

1. [BLOCKING] Pipeline fails open:

  • set -euo pipefail (bash strict mode)
  • Temp-file approach (bash variables can't hold NUL bytes, so -print0 in a subshell doesn't work — used mktemp with trap cleanup instead)
  • Min-file-count assertion (< 10 → error + exit 1)
  • xargs -r to skip invocation on empty input

2. [BLOCKING] aws-cdk dead link:

  • Replaced dead GitHub link in ADR-002 with live AWS docs URL (https://docs.aws.amazon.com/cdk/v2/guide/bootstrapping-env.html)
  • Removed the aws-cdk ignore pattern from config

3. aliveStatusCodes:

  • Dropped 3xx codes — now [200, 206] only, checker follows redirects to validate terminal status

4. Blanket npmjs.com ignore:

  • Narrowed from www.npmjs.com to www.npmjs.com/package/ — npmjs returns 403 (not 429) to automated requests so retryOn429 doesn't help, but typo'd non-package URLs will now get caught

5. -maxdepth 1:

  • Already resolved — named dirs (guides, design, decisions) have no depth limit; -maxdepth 1 only applies to root .. (correct behavior)

6. Failure path demonstration:

  • Will push a throwaway commit with a broken link to show the gate going red

7. Nit (^link$ opaque):

  • Added _comment field explaining it masks the intentional placeholder in ADR-004 example prose

Architecture (internal vs external split):

  • Acknowledged — since external URL checking is non-hermetic, this may need splitting in a follow-up if flakiness becomes an issue. For now retryCount: 3 + fallbackRetryDelay: 10s provides some resilience.

ClintEastman02 and others added 3 commits June 10, 2026 13:40
…prevention

Remove dedicated link-check CI job from docs.yml and wire it into the
existing drift-prevention task so it runs as part of every build. Harden
the script with pipefail, temp-file approach, min-file-count assertion,
and xargs -r. Fix dead aws-cdk link in ADR-002, narrow npmjs ignore to
package paths only, drop 3xx from aliveStatusCodes, and document the
^link$ ignore pattern.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved When an issue has been approved and ready

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(docs): add markdown-link-check/linkinator to docs CI (CA-12)

3 participants