From 93fbd516964c75c7fbe1832a697706f0eb006edd Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Tue, 17 Feb 2026 22:26:56 +0000 Subject: [PATCH 1/2] Add link-checker workflow source and Glossary Maintainer workflow Phase 1 - Consistency Fix: - Added missing workflows/link-checker.md (docs existed but workflow source was missing) Phase 2 - New Workflow: - Added Glossary Maintainer workflow from Peli's Agent Factory - Perfect 100% merge rate (10/10 PRs) in original repository - Generalized for any project with glossary/terminology documentation - Added workflows/glossary-maintainer.md - Added docs/glossary-maintainer.md - Updated README.md with new workflow entry Source: https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/glossary-maintainer.md Blog: https://github.github.com/gh-aw/blog/2026-01-13-meet-the-workflows-documentation/ Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- README.md | 1 + docs/glossary-maintainer.md | 173 +++++++++++++++++++++ workflows/glossary-maintainer.md | 248 +++++++++++++++++++++++++++++++ workflows/link-checker.md | 230 ++++++++++++++++++++++++++++ 4 files changed, 652 insertions(+) create mode 100644 docs/glossary-maintainer.md create mode 100644 workflows/glossary-maintainer.md create mode 100644 workflows/link-checker.md diff --git a/README.md b/README.md index 052715c..94ed56d 100644 --- a/README.md +++ b/README.md @@ -49,6 +49,7 @@ You can use the "/plan" agent to turn the reports into actionable issues which c - [📖 Regular Documentation Update](docs/update-docs.md) - Update documentation automatically - [📖 Daily Documentation Updater](docs/daily-doc-updater.md) - Automatically update documentation based on recent code changes and merged PRs +- [📖 Glossary Maintainer](docs/glossary-maintainer.md) - Automatically maintain project glossary based on codebase changes - [🔗 Link Checker](docs/link-checker.md) - Daily automated link checker that finds and fixes broken links in documentation - [✨ 
Code Simplifier](docs/code-simplifier.md) - Automatically simplify recently modified code for improved clarity and maintainability - [⚡ Daily Progress](docs/daily-progress.md) - Automated daily feature development following a structured roadmap diff --git a/docs/glossary-maintainer.md b/docs/glossary-maintainer.md new file mode 100644 index 0000000..8137acc --- /dev/null +++ b/docs/glossary-maintainer.md @@ -0,0 +1,173 @@ +# 📖 Glossary Maintainer + +> For an overview of all available workflows, see the [main README](../README.md). + +The [Glossary Maintainer workflow](../workflows/glossary-maintainer.md?plain=1) automatically maintains project glossary or terminology documentation by scanning code changes and keeping technical terms up-to-date. + +## Installation + +```bash +# Install the 'gh aw' extension +gh extension install github/gh-aw + +# Add the workflow to your repository +gh aw add-wizard githubnext/agentics/glossary-maintainer +``` + +This walks you through adding the workflow to your repository. + +You can start a run of this workflow immediately by running: + +```bash +gh aw run glossary-maintainer +``` + +## What It Does + +The Glossary Maintainer workflow runs daily on weekdays and: + +1. **Scans Recent Changes** - Reviews commits and PRs from the last 24 hours (daily) or 7 days (weekly on Mondays) +2. **Identifies New Terms** - Finds technical terminology, configuration options, and project-specific concepts +3. **Updates Definitions** - Adds new terms or updates existing definitions based on code changes +4. **Maintains Consistency** - Ensures glossary follows existing structure and style +5. 
**Creates Pull Requests** - Proposes glossary updates for review when new terms are found + +## How It Works + +### Incremental vs Full Scans + +- **Daily (Mon-Fri)**: Incremental scan of last 24 hours +- **Monday**: Full scan of last 7 days for comprehensive review + +### What Gets Added + +The workflow identifies terms that are: +- Used in user-facing documentation or code +- Project-specific or domain-specific +- Technical terms requiring explanation +- Newly introduced in recent changes + +### What Gets Skipped + +The workflow avoids adding: +- Generic programming terms +- Self-evident terminology +- Internal implementation details +- Terms only in code comments + +## Configuration + +This workflow works out of the box and automatically: +- Locates your glossary file (common paths: `docs/glossary.md`, `docs/reference/glossary.md`, `GLOSSARY.md`) +- Follows your existing glossary structure and style +- Maintains alphabetical or categorical organization +- Uses cache memory to avoid duplicate work + +You can edit the workflow to customize: +- The glossary file path if it's in a non-standard location +- The scanning timeframe (daily vs weekly) +- The PR title prefix and labels + +After editing, run `gh aw compile` to apply your changes. + +## What it reads from GitHub + +- Recent commits and their diffs +- Merged pull requests from the specified timeframe +- PR descriptions and comments +- Code changes that introduce new terminology + +## What it creates + +- Pull requests with glossary updates +- Cache memory of processed commits to avoid duplicates + +## When it runs + +- **Daily on weekdays** (incremental scan of last 24 hours) +- **Mondays** (full scan of last 7 days) +- **Manually** via workflow_dispatch + +## Permissions required + +- `contents: read` - To read repository files +- `issues: read` - To review issue discussions +- `pull-requests: read` - To analyze PR descriptions +- `actions: read` - To check workflow runs + +## Benefits + +1. 
**Automatic maintenance**: Keeps terminology documentation current without manual tracking +2. **Consistency**: Ensures definitions stay aligned with code changes +3. **Accessibility**: Helps new contributors understand project-specific terms +4. **Time savings**: Eliminates manual glossary updates +5. **Historical context**: Uses cache memory to track what's been processed + +## Example Output + +When the workflow finds new terms, it creates a PR like: + +**Title:** `[docs] Update glossary - daily scan` + +**Body:** +```markdown +### Glossary Updates + +**Scan Type**: Incremental (daily) + +**Terms Added**: +- **Safe Outputs**: Security mechanism for controlling workflow write operations +- **Cache Memory**: Persistent storage for workflow state across runs + +**Terms Updated**: +- **Frontmatter**: Updated to include new engine configuration options + +**Changes Analyzed**: +- Reviewed 5 commits from last 24 hours +- Analyzed 2 merged PRs +- Processed 1 new feature (cache-memory tool) + +**Related Changes**: +- abc1234: Add cache-memory tool support +- PR #123: Implement safe-outputs validation +``` + +## Source and Success Rate + +This workflow is adapted from Peli's Agent Factory, where it achieved: + +- **100% merge rate** (10 out of 10 PRs merged) +- **Perfect accuracy** in identifying relevant terminology +- **High value** for documentation consistency + +Source: [Glossary Maintainer](https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/glossary-maintainer.md) from [github/gh-aw](https://github.com/github/gh-aw) + +Read more: [Meet the Workflows: Continuous Documentation](https://github.github.com/gh-aw/blog/2026-01-13-meet-the-workflows-documentation/) + +## Generalization Notes + +This workflow has been adapted from the gh-aw repository to work with any project: + +- **Flexible glossary location**: Automatically finds glossary files in common locations +- **Structure-agnostic**: Works with alphabetical or categorical organization +- **No external 
dependencies**: Doesn't require Serena or project-specific tools +- **Universal terminology**: Identifies terms based on usage patterns, not hard-coded lists + +## Use Cases + +This workflow is valuable for: + +- **Open source projects** with growing terminology +- **Technical documentation** that needs to stay synchronized with code +- **API projects** with evolving configuration options +- **Developer tools** with CLI commands and options +- **Frameworks** with specialized concepts and patterns + +## Future Enhancements + +Possible improvements: +- Suggest related terms when adding new definitions +- Detect inconsistent terminology usage across documentation +- Generate glossary if one doesn't exist +- Link term usage to specific code locations +- Track term popularity and usage frequency diff --git a/workflows/glossary-maintainer.md b/workflows/glossary-maintainer.md new file mode 100644 index 0000000..7b127b0 --- /dev/null +++ b/workflows/glossary-maintainer.md @@ -0,0 +1,248 @@ +--- +name: Glossary Maintainer +description: Maintains and updates the documentation glossary based on codebase changes +on: + schedule: daily on weekdays + workflow_dispatch: + +permissions: + contents: read + issues: read + pull-requests: read + actions: read + +network: + allowed: + - node + - python + - github + +safe-outputs: + create-pull-request: + expires: 2d + title-prefix: "[docs] " + labels: [documentation, glossary] + draft: false + noop: + +tools: + cache-memory: true + github: + toolsets: [default] + edit: + bash: true + +timeout-minutes: 20 + +--- + +# Glossary Maintainer + +You are an AI documentation agent that maintains the project glossary or terminology reference documentation. + +## Your Mission + +Keep the glossary up-to-date by: +1. Scanning recent code changes for new technical terms +2. Performing incremental updates daily (last 24 hours) +3. Performing comprehensive full scan on Mondays (last 7 days) +4. 
Adding new terms and updating definitions based on repository changes + +## Task Steps + +### 1. Locate the Glossary File + +First, find the glossary file in the repository. Common locations include: +- `docs/glossary.md` +- `docs/reference/glossary.md` +- `GLOSSARY.md` +- `docs/terminology.md` +- Look for files with "glossary", "terminology", or "definitions" in the name + +Use bash to search: + +````bash +find . -iname "*glossary*" -o -iname "*terminology*" -o -iname "*definitions*" | grep -v node_modules | grep -v .git +```` + +If no glossary file exists, check if the project would benefit from one by examining the documentation structure. If so, you may create a new glossary file. + +### 2. Determine Scan Scope + +Check what day it is: +- **Monday**: Full scan (review changes from last 7 days) +- **Other weekdays**: Incremental scan (review changes from last 24 hours) + +Use bash commands to check recent activity: + +````bash +# For incremental (daily) scan +git log --since='24 hours ago' --oneline + +# For full (weekly) scan on Monday +git log --since='7 days ago' --oneline +```` + +### 3. Load Cache Memory + +You have access to cache-memory to track: +- Previously processed commits +- Terms that were recently added +- Terms that need review + +Check your cache to avoid duplicate work: +- Load the list of processed commit SHAs +- Skip commits you've already analyzed + +### 4. 
Scan Recent Changes + +Based on the scope (daily or weekly): + +**Use GitHub tools to:** +- List recent commits using `list_commits` for the appropriate timeframe +- Get detailed commit information using `get_commit` for commits that might introduce new terminology +- Search for merged pull requests using `search_pull_requests` +- Review PR descriptions and comments for new terminology + +**Look for:** +- New configuration options or settings +- New command names or API endpoints +- New tool names or dependencies +- New concepts or features +- Technical acronyms that need explanation +- Specialized terminology unique to this project +- Terms that appear multiple times in recent changes + +### 5. Review Current Glossary + +If a glossary exists, read it to understand the current structure: + +````bash +cat [path-to-glossary-file] +```` + +**Check for:** +- Terms that are missing from the glossary +- Terms that need updated definitions +- Outdated terminology +- Inconsistent definitions +- The organizational structure (alphabetical, by category, etc.) + +### 6. Identify New Terms + +Based on your scan of recent changes, create a list of: + +1. **New terms to add**: Technical terms introduced in recent changes +2. **Terms to update**: Existing terms with changed meaning or behavior +3. **Terms to clarify**: Terms with unclear or incomplete definitions + +**Criteria for inclusion:** +- The term is used in user-facing documentation or code +- The term requires explanation (not self-evident) +- The term is specific to this project or domain +- The term is likely to confuse users without a definition + +**Do NOT add:** +- Generic programming terms (unless used in a specific way) +- Self-evident terms +- Internal implementation details +- Terms only used in code comments + +### 7. Update the Glossary + +For each term identified: + +1. 
**Determine the correct location** in the glossary: + - Follow the existing organizational structure + - If alphabetical, place in alphabetical order + - If categorized, choose the appropriate category + +2. **Write the definition** following these guidelines: + - Start with what the term is (not what it does) + - Use clear, concise language + - Include context if needed + - Add a simple example if helpful + - Link to related documentation if available + +3. **Maintain consistency** with existing entries: + - Follow the same formatting pattern + - Use similar tone and style + - Keep definitions at a similar level of detail + +4. **Use the edit tool** to update the glossary file + +### 8. Save Cache State + +Update your cache-memory with: +- Commit SHAs you processed +- Terms you added or updated +- Date of last full scan +- Any notes for next run + +This prevents duplicate work and helps track progress. + +### 9. Create Pull Request or Report + +If you made any changes to the glossary: + +**Use safe-outputs create-pull-request** to create a PR with: + +**PR Title Format**: +- Daily: `[docs] Update glossary - daily scan` +- Weekly: `[docs] Update glossary - weekly full scan` + +**PR Description Template**: +````markdown +### Glossary Updates + +**Scan Type**: [Incremental (daily) / Full scan (weekly)] + +**Terms Added**: +- **Term Name**: Brief explanation of why it was added + +**Terms Updated**: +- **Term Name**: What changed and why + +**Changes Analyzed**: +- Reviewed X commits from [timeframe] +- Analyzed Y merged PRs +- Processed Z new features + +**Related Changes**: +- Commit SHA: Brief description +- PR #NUMBER: Brief description +```` + +**If no new terms are identified**, use the `noop` safe output with a message like: +- "All terminology is current - no new terms identified in recent changes" +- "Glossary is up-to-date after reviewing [X] commits" + +### 10. 
Handle Edge Cases + +- **No glossary file exists**: Consider if the project would benefit from a glossary. If yes, create one with initial terms. If no, use `noop` to report that no glossary exists. +- **No new terms**: Exit gracefully using `noop` +- **Unclear terms**: Add them with a note that they may need review +- **Conflicting definitions**: Note both meanings if a term has multiple uses + +## Guidelines + +- **Be Selective**: Only add terms that genuinely need explanation +- **Be Accurate**: Ensure definitions match actual implementation and usage +- **Be Consistent**: Follow existing glossary style and structure +- **Be Complete**: Don't leave terms partially defined +- **Be Clear**: Write for users who are learning, not experts +- **Follow Structure**: Maintain the existing organizational pattern +- **Use Cache**: Track your work to avoid duplicates +- **Link Appropriately**: Add references to related documentation where helpful + +## Important Notes + +- You have edit tool access to modify the glossary +- You have GitHub tools to search and review changes +- You have bash commands to explore the repository +- You have cache-memory to track your progress +- The safe-outputs create-pull-request will create a PR automatically +- Focus on user-facing terminology and concepts +- Review recent changes to understand what's actively being developed + +Your work helps users understand project-specific terminology and concepts, making documentation more accessible and consistent. 
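The cache-memory bookkeeping described in steps 3 and 8 (skip commits recorded by earlier runs, widen the window on Mondays, record new SHAs afterwards) can be sketched as a small shell routine. This is a hypothetical illustration only — the `CACHE_DIR` variable, the `processed-commits.txt` file name, and the flags used are assumptions, not part of the workflow spec:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cache-memory bookkeeping in steps 3 and 8.
# Assumes the cache directory is exposed via $CACHE_DIR; the default path
# and file name are illustrative only.
CACHE_FILE="${CACHE_DIR:-/tmp/cache-memory}/processed-commits.txt"
mkdir -p "$(dirname "$CACHE_FILE")"
touch "$CACHE_FILE"

# Widen the window for the Monday full scan (date +%u prints 1 for Monday).
SINCE='24 hours ago'
if [ "$(date +%u)" = "1" ]; then SINCE='7 days ago'; fi

# Candidate commits in the window, minus SHAs recorded by earlier runs.
NEW_COMMITS=$(git log --since="$SINCE" --format='%H' 2>/dev/null \
  | grep -vxF -f "$CACHE_FILE" || true)

# ...terminology analysis of $NEW_COMMITS would happen here...

# Record the processed SHAs so the next run does no duplicate work.
if [ -n "$NEW_COMMITS" ]; then
  printf '%s\n' $NEW_COMMITS >> "$CACHE_FILE"
fi
```

The `grep -vxF -f` call treats the cache file as a list of exact-match filters, which keeps the dedup logic to a single pipeline with no JSON parsing.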
diff --git a/workflows/link-checker.md b/workflows/link-checker.md new file mode 100644 index 0000000..b371613 --- /dev/null +++ b/workflows/link-checker.md @@ -0,0 +1,230 @@ +--- +description: Daily automated link checker that finds and fixes broken links in documentation files +on: + schedule: daily on weekdays +permissions: read-all +timeout-minutes: 60 +network: + allowed: + - node + - python + - github +steps: + - name: Checkout repository + uses: actions/checkout@v4 + with: + fetch-depth: 0 + + - name: Check and test all documentation links + id: link-check + run: | + echo "# Link Check Results" > /tmp/link-check-results.md + echo "" >> /tmp/link-check-results.md + + # Find all markdown files in docs directory and README + echo "Finding all markdown files..." + MARKDOWN_FILES=$(find docs README.md -type f -name "*.md" 2>/dev/null || echo "") + + if [ -z "$MARKDOWN_FILES" ]; then + echo "No markdown files found" + echo "no_files=true" >> $GITHUB_OUTPUT + exit 0 + fi + + # Extract all links from markdown files + echo "## Links Found" >> /tmp/link-check-results.md + echo "" >> /tmp/link-check-results.md + + # Use grep to find markdown links and HTTP(S) URLs + for file in $MARKDOWN_FILES; do + echo "Checking $file..." 
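# NOTE (illustrative comment, not part of the original script): the two-stage
# grep below mis-extracts links whose *text* contains parentheses — for
# '[foo (bar)](https://x.y/)' it yields 'bar' instead of the URL. A stricter
# single-pass alternative (an assumption, requires GNU grep with -P) would be:
#   grep -oP '\]\(\K[^)]+(?=\))' "$file"
# (this still truncates URLs that themselves contain a ')').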
+ # Extract markdown links [text](url) + grep -oP '\[([^\]]+)\]\(([^\)]+)\)' "$file" | grep -oP '\(([^\)]+)\)' | tr -d '()' >> /tmp/all-links.txt 2>/dev/null || true + # Extract plain HTTP(S) URLs + grep -oP 'https?://[^\s<>"]+' "$file" >> /tmp/all-links.txt 2>/dev/null || true + done + + # Remove duplicates and sort + if [ -f /tmp/all-links.txt ]; then + sort -u /tmp/all-links.txt > /tmp/unique-links.txt + LINK_COUNT=$(wc -l < /tmp/unique-links.txt) + echo "Found $LINK_COUNT unique links" >> /tmp/link-check-results.md + echo "" >> /tmp/link-check-results.md + else + echo "No links found" >> /tmp/link-check-results.md + echo "no_links=true" >> $GITHUB_OUTPUT + exit 0 + fi + + # Test each link + echo "## Link Test Results" >> /tmp/link-check-results.md + echo "" >> /tmp/link-check-results.md + echo "Testing links..." >> /tmp/link-check-results.md + + BROKEN_COUNT=0 + WORKING_COUNT=0 + + while IFS= read -r url; do + # Skip relative links and anchors + if [[ "$url" == "#"* ]] || [[ "$url" != "http"* ]]; then + continue + fi + + # Test the link with curl + HTTP_CODE=$(curl -L -s -o /dev/null -w "%{http_code}" --max-time 10 "$url" 2>/dev/null || echo "000") + + if [[ "$HTTP_CODE" =~ ^2 ]] || [[ "$HTTP_CODE" =~ ^3 ]]; then + WORKING_COUNT=$((WORKING_COUNT + 1)) + echo "✅ $url (HTTP $HTTP_CODE)" >> /tmp/link-check-results.md + else + BROKEN_COUNT=$((BROKEN_COUNT + 1)) + echo "❌ $url (HTTP $HTTP_CODE)" >> /tmp/link-check-results.md + fi + done < /tmp/unique-links.txt + + echo "" >> /tmp/link-check-results.md + echo "**Summary:** $WORKING_COUNT working, $BROKEN_COUNT broken" >> /tmp/link-check-results.md + + # Output results + echo "broken_count=$BROKEN_COUNT" >> $GITHUB_OUTPUT + echo "working_count=$WORKING_COUNT" >> $GITHUB_OUTPUT + + cat /tmp/link-check-results.md + shell: bash + +tools: + github: + toolsets: [default] + cache-memory: true + web-fetch: + +safe-outputs: + create-pull-request: + title-prefix: "[link-checker] " + labels: [documentation, automated] + draft: 
false + if-no-changes: "warn" + noop: --- # Daily Link Checker & Fixer You are an automated link checker and fixer agent. Your job is to find and fix broken links in the documentation files of this repository. ## Your Mission Your workflow has already collected and tested all links in the previous step. Use the test results to identify broken links and fix them where possible. ## Step 1: Review Link Check Results The link check step has already run and created a report at `/tmp/link-check-results.md`. Read this file to see: - All links found in the documentation - Which links are working (✅) and which are broken (❌) - HTTP status codes for each link Use bash to read the file: ```bash cat /tmp/link-check-results.md ``` ## Step 2: Load Cache Memory Check cache memory for previously identified unfixable broken links: - Load the cache memory to see which broken links we've previously tried and failed to fix - These are links that are permanently broken or removed from the internet - Skip these links to avoid repeated attempts The cache memory should store a JSON object with this structure: ```json { "unfixable_links": [ { "url": "https://example.com/removed-page", "reason": "404 Not Found - content removed", "first_seen": "2026-02-17" } ], "last_run": "2026-02-17" } ``` ## Step 3: Research and Fix Broken Links For each broken link found in the test results (but NOT in the unfixable list): 1. **Investigate the link:** - Determine what the link was supposed to point to based on: - The link text in the markdown - The context around the link - The surrounding documentation 2. **Search for alternatives:** - Use web-fetch to check whether the content has moved to a new URL - Try common alternatives (www vs non-www, http vs https, with/without trailing slash) - Look for redirects or updated documentation - Check if there's an official replacement 3. 
**Fix the link:** + - If you find a working replacement URL, use the `edit` tool to update the markdown file + - Replace the broken URL with the working one + - Make sure to preserve the link text and formatting + +4. **Document unfixable links:** + - If a link truly cannot be fixed (content permanently removed, no alternatives found): + - Add it to the unfixable_links list in cache memory + - Include the URL, reason, and date + - This prevents future runs from wasting time on the same broken link + +## Step 4: Update Cache Memory + +After processing all broken links: +- Update the cache memory with any new unfixable links +- Update the "last_run" timestamp +- Save the updated cache memory + +## Step 5: Create Pull Request or Noop + +Based on your work: + +**If you fixed any links:** +- Use the `create-pull-request` safe output to create a PR with your fixes +- In the PR body, include: + - A summary of how many links were fixed + - A list of the broken links and their replacements + - Any links that were added to the unfixable list +- Title format: "Fix broken documentation links" + +**If no links needed fixing:** +- Use the `noop` safe output with a clear message like: + - "All documentation links are working correctly" (if no broken links found) + - "All broken links are in the unfixable list, no new fixes available" (if broken links exist but can't be fixed) + +## Important Guidelines + +- **Be thorough:** Check every broken link carefully +- **Preserve context:** When replacing links, make sure the new URL points to equivalent or better content +- **Document everything:** Keep the cache memory up to date with unfixable links +- **Be selective:** Only add links to the unfixable list if you've genuinely tried to find alternatives +- **Use web-fetch wisely:** Try to fetch the broken URL and check for redirects or alternatives +- **Relative links:** Focus only on HTTP(S) links. 
Skip relative links and anchors (they're tested differently) + +## Example Cache Memory Update + +```json +{ + "unfixable_links": [ + { + "url": "https://old-docs.example.com/api/v1", + "reason": "Documentation site shut down, no replacement found despite searching", + "first_seen": "2026-02-17" + } + ], + "last_run": "2026-02-17" +} +``` + +## Context + +- Repository: `${{ github.repository }}` +- Run daily on weekdays to catch broken links early +- Link test results are available at `/tmp/link-check-results.md` From d7fac53020e1ecbd2c1beb585adffa9b7d79dc33 Mon Sep 17 00:00:00 2001 From: Don Syme Date: Wed, 18 Feb 2026 00:02:51 +0000 Subject: [PATCH 2/2] Update glossary-maintainer.md --- docs/glossary-maintainer.md | 21 ++------------------- 1 file changed, 2 insertions(+), 19 deletions(-) diff --git a/docs/glossary-maintainer.md b/docs/glossary-maintainer.md index 8137acc..73367a1 100644 --- a/docs/glossary-maintainer.md +++ b/docs/glossary-maintainer.md @@ -132,26 +132,9 @@ When the workflow finds new terms, it creates a PR like: - PR #123: Implement safe-outputs validation ``` -## Source and Success Rate +## Source -This workflow is adapted from Peli's Agent Factory, where it achieved: - -- **100% merge rate** (10 out of 10 PRs merged) -- **Perfect accuracy** in identifying relevant terminology -- **High value** for documentation consistency - -Source: [Glossary Maintainer](https://github.com/github/gh-aw/blob/v0.45.5/.github/workflows/glossary-maintainer.md) from [github/gh-aw](https://github.com/github/gh-aw) - -Read more: [Meet the Workflows: Continuous Documentation](https://github.github.com/gh-aw/blog/2026-01-13-meet-the-workflows-documentation/) - -## Generalization Notes - -This workflow has been adapted from the gh-aw repository to work with any project: - -- **Flexible glossary location**: Automatically finds glossary files in common locations -- **Structure-agnostic**: Works with alphabetical or categorical organization -- **No external 
dependencies**: Doesn't require Serena or project-specific tools -- **Universal terminology**: Identifies terms based on usage patterns, not hard-coded lists +This workflow is adapted from Peli's Agent Factory. Read more: [Meet the Workflows: Continuous Documentation](https://github.github.com/gh-aw/blog/2026-01-13-meet-the-workflows-documentation/) ## Use Cases