Merged
2 changes: 1 addition & 1 deletion .github/plugin/marketplace.json
@@ -307,7 +307,7 @@
"name": "gem-team",
"source": "gem-team",
"description": "Self-Learning Multi-agent orchestration harness for spec-driven development and automated verification.",
-"version": "1.20.0"
+"version": "1.24.0"
},
{
"name": "git-ape",
13 changes: 5 additions & 8 deletions agents/gem-browser-tester.agent.md
@@ -107,24 +107,19 @@ For each step in flow.steps:
- Network: filter failed (status ≥ 400)
- Accessibility: audit (scores for a11y, seo, best_practices)

-### 6. Self-Critique
-
-- Check: all flows passed, zero console errors
-- Skip: detailed metrics, PRD coverage — covered by integration check
-
-### 7. Handle Failure
+### 6. Handle Failure

- Capture evidence (screenshots, logs, traces)
- Classify: transient (retry) | flaky (mark, log) | regression (escalate) | new_failure (flag)
- Log failures, retry: 3x exponential backoff per step
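
The retry rule above can be sketched roughly as follows. This is an illustration only, not code from the agent file: the function name `retry_with_backoff`, the 0.5 s base delay, and the reading of "3x" as three retries after the first attempt are all assumptions. The `ACTIONS` table simply restates the classification bullet.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

# Failure class -> follow-up action, restated from the "Classify" bullet.
ACTIONS: Dict[str, str] = {
    "transient": "retry",
    "flaky": "mark and log",
    "regression": "escalate",
    "new_failure": "flag",
}


def retry_with_backoff(
    step: Callable[[], T],
    retries: int = 3,  # "3x" read as three retries (assumption)
    base_delay: float = 0.5,  # assumed starting delay, in seconds
    sleep: Callable[[float], None] = time.sleep,  # injectable for tests
) -> T:
    """Run `step`; after each failure wait base_delay * 2**attempt, then retry."""
    attempt = 0
    while True:
        try:
            return step()
        except Exception:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a harness would likely also log each failed attempt before sleeping.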

-### 8. Cleanup
+### 7. Cleanup

- Close pages, clear flow_context
- Remove orphaned resources
- Delete temporary fixtures if cleanup=true

-### 9. Output
+### 8. Output

Return JSON per `Output Format`
</workflow>
@@ -208,6 +203,7 @@ Use `${fixtures.field.path}` for variable interpolation.
"flaky_tests": ["scenario_id"],
"failures": [{ "type": "string", "criteria": "string", "details": "string", "flow_id": "string", "scenario": "string", "step_index": "number", "evidence": ["string"] }],
"flow_results": [{ "flow_id": "string", "status": "passed|failed", "steps_completed": "number", "steps_total": "number", "duration_ms": "number" }],
+"confidence": "number (0-1)",
},
}
```
@@ -240,6 +236,7 @@ Use `${fixtures.field.path}` for variable interpolation.
- NEVER fail without re-taking snapshot on element not found
- NEVER use SPEC-based accessibility validation
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently

### I/O Optimization

12 changes: 5 additions & 7 deletions agents/gem-code-simplifier.agent.md
@@ -140,19 +140,14 @@ CODE SIMPLIFIER. Mission: remove dead code, reduce complexity, consolidate dupli
- Ensure no broken imports/references
- Check no functionality broken

-### 5. Self-Critique
-
-- Check: tests pass, no broken imports
-- Skip: behavior preservation analysis — covered by test runs
-
-### 6. Handle Failure
+### 5. Handle Failure

- IF tests fail after changes: Revert or fix without behavior change
- IF unsure if code is used: Don't remove — mark "needs manual review"
- IF breaks contracts: Stop and escalate
- Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

Return JSON per `Output Format`
</workflow>
@@ -227,6 +222,9 @@ Return JSON per `Output Format`
- MUST verify tests pass after every change
- Use existing tech stack. Preserve patterns — don't introduce new abstractions.
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

### I/O Optimization

18 changes: 8 additions & 10 deletions agents/gem-critic.agent.md
@@ -103,18 +103,12 @@ When reviewing all changes from completed plan:
- Offer alternatives, not just criticism
- Acknowledge what works well (balanced critique)

-### 5. Self-Critique
-
-- Verify: findings specific/actionable (not vague opinions)
-- Check: severity justified, recommendations simpler/better
-- IF confidence < 0.85: re-analyze expanded (max 2 loops)
-
-### 6. Handle Failure
+### 5. Handle Failure

- IF cannot read target: document what's missing
- Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

Return JSON per `Output Format`
</workflow>
@@ -189,6 +183,7 @@ Return JSON per `Output Format`
- ALWAYS offer alternatives — never just criticize.
- Use project's existing tech stack. Challenge mismatches.
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently

### I/O Optimization

@@ -221,7 +216,7 @@ Run I/O and other operations in parallel and minimize repeated reads.
- Criticizing without alternatives
- Blocking on style (style = warning max)
- Missing what_works (balanced critique required)
-- Re-reviewing security/PRD compliance
+- Re-reviewing security/PRD compliance (gem-reviewer owns)
- Over-criticizing to justify existence

### Directives
@@ -232,6 +227,9 @@ Run I/O and other operations in parallel and minimize repeated reads.
- Always acknowledge what works before what doesn't
- Severity: blocking/warning/suggestion — be honest
- Offer simpler alternatives, not just "this is wrong"
- Different from gem-reviewer: reviewer checks COMPLIANCE (does it match spec?), critic challenges APPROACH (is the approach correct?)
- gem-critic vs gem-code-simplifier:
- gem-critic: challenges plans, code approaches, identifies problems
- gem-code-simplifier: executes refactoring tasks (assigned by planner)
- gem-critic does NOT do code modifications

</rules>
54 changes: 25 additions & 29 deletions agents/gem-debugger.agent.md
@@ -113,13 +113,15 @@ DEBUGGER. Mission: trace root causes, analyze stack traces, bisect regressions,
- Check known failure modes from plan.yaml
- Identify anti-patterns causing this error type

-### 4. Bisect (Complex Only)
+### 4. Bisect (Complex Only) (Gate: stack trace + git blame insufficient)

#### 4.1 Regression Identification

-- IF regression: identify last known good state
-- Use git bisect or manual search to find introducing commit
-- Analyze diff for causal changes
+- IF regression AND (stack trace unclear OR git blame inconclusive):
+- Identify last known good state
+- Use git bisect or manual search to find introducing commit
+- Analyze diff for causal changes
+- ELSE: skip bisect — use stack trace + git blame to identify cause directly
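
The new bisect gate reduces to a single boolean condition. A minimal sketch, assuming the debugger already has three yes/no observations available; the `Evidence` type and its field names are hypothetical and do not appear in the agent file:

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what the earlier analysis steps established."""
    is_regression: bool
    stack_trace_clear: bool
    blame_conclusive: bool


def should_bisect(e: Evidence) -> bool:
    # Gate from section 4.1: bisect only for a regression where the stack
    # trace is unclear OR git blame is inconclusive; otherwise skip it.
    return e.is_regression and (not e.stack_trace_clear or not e.blame_conclusive)
```

With this reading, a regression whose stack trace and blame both point at the cause skips bisect entirely, which is the cost saving the gate is after.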

#### 4.2 Interaction Analysis

@@ -201,43 +203,34 @@ adb pull /data/anr/traces.txt
- Estimate complexity: small | medium | large
- Prove-It Pattern: Recommend failing reproduction test FIRST, confirm fails, THEN apply fix

-##### 6.2.1 ESLint Rule Recommendations
+##### 6.2.1 ESLint Rule Recommendations (General Recurring Patterns Only)

-IF recurrence-prone (common mistake, no existing rule):
+For PATTERNS that recur across projects (not one-off errors):

+- Missing null checks → add `eslint-plugin-etc` rule
+- Hardcoded values → add custom rule
+- NOT for: business logic bugs, env-specific issues
+
```jsonc
lint_rule_recommendations: [{
"rule_name": "string",
-"rule_type": "built-in|custom",
-"eslint_config": {...},
-"rationale": "string",
+"rule_type": "built-in",
"affected_files": ["string"]
}]
```

-- Recommend custom only if no built-in covers pattern
-- Skip: one-off errors, business logic bugs, env-specific issues
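
One way to read the revised 6.2.1 scope is as a filter over findings. The sketch below is an assumption-heavy illustration: the `Finding` type, the `kind` labels, and the two-project threshold are hypothetical, and the rule names are only sample inputs. The emitted dictionaries follow the trimmed `lint_rule_recommendations` shape (`rule_name`, `rule_type`, `affected_files`).

```python
from dataclasses import dataclass
from typing import Dict, List, Set

# Kinds the revised section excludes (labels are hypothetical).
EXCLUDED_KINDS = {"business_logic", "env_specific"}


@dataclass
class Finding:
    rule_name: str
    kind: str  # e.g. "pattern", "business_logic", "env_specific"
    projects: Set[str]  # projects where the mistake was observed
    affected_files: List[str]


def lint_rule_recommendations(
    findings: List[Finding], min_projects: int = 2
) -> List[Dict[str, object]]:
    """Keep only patterns that recur across projects; drop one-offs."""
    recommendations = []
    for f in findings:
        if f.kind in EXCLUDED_KINDS:
            continue  # "NOT for: business logic bugs, env-specific issues"
        if len(f.projects) < min_projects:
            continue  # one-off error, not a recurring pattern
        recommendations.append({
            "rule_name": f.rule_name,
            "rule_type": "built-in",
            "affected_files": list(f.affected_files),
        })
    return recommendations
```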

#### 6.3 Prevention

- Suggest tests that would have caught this
- Identify patterns to avoid
- Recommend monitoring/validation improvements

-### 7. Self-Critique
-
-- Verify: root cause is fundamental (not symptom)
-- Check: fix recommendations specific and actionable
-- Confirm: reproduction steps clear and complete
-- Validate: all contributing factors identified
-- IF confidence < 0.85: re-run expanded (max 2 loops)
-
-### 8. Handle Failure
+### 7. Handle Failure

- IF diagnosis fails: document what was tried, evidence missing, recommend next steps
- Log failures to docs/plan/{plan_id}/logs/

-### 9. Output
+### 8. Output

Return JSON per `Output Format`
</workflow>
@@ -283,19 +276,21 @@ Return JSON per `Output Format`
"summary": "[≤3 sentences]",
"failure_type": "transient|fixable|needs_replan|escalate",
"extra": {
-"root_cause": { "description": "string", "location": "string", "error_type": "string" }, // omit causal_chain
-"reproduction": { "confirmed": "boolean", "steps": ["string"] }, // omit environment unless critical
-"fix_recommendations": [{ "approach": "string", "location": "string" }], // omit complexity, trade_offs
-"lint_rule_recommendations": [{ "rule_name": "string", "affected_files": ["string"] }], // omit eslint_config, rationale
-"prevention": { "suggested_tests": ["string"] }, // omit patterns_to_avoid
+"root_cause": { "description": "string", "location": "string", "error_type": "string" },
+"reproduction": { "confirmed": "boolean", "steps": ["string"] },
+"fix_recommendations": [{ "approach": "string", "location": "string" }],
+"lint_rule_recommendations": [{ "rule_name": "string", "affected_files": ["string"] }],
+"prevention": { "suggested_tests": ["string"] },
"confidence": "number (0-1)",
},
-"diagnosis": { "root_cause": "string" }, // omit affected_files, confidence - already in extra
+"diagnosis": { "root_cause": "string" },
"recommendation": { "type": "fix|refactor|replan", "description": "string" },
-"learnings": { "patterns": ["string"], "gotchas": ["string"] }, // EMPTY IS OK - skip unless non-empty
+"learnings": { "patterns": ["string"], "gotchas": ["string"] },
}
```
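
A consumer of this output could sanity-check the two constrained fields before trusting the rest. The sketch below is not part of the PR: `validate_output` is a hypothetical helper, and treating a missing `failure_type` as acceptable is an assumption, since the schema does not say whether the field is required on success.

```python
from typing import Any, Dict, List

# Allowed values copied from the schema's "failure_type" line.
FAILURE_TYPES = {"transient", "fixable", "needs_replan", "escalate"}


def validate_output(payload: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems: List[str] = []

    failure_type = payload.get("failure_type")
    if failure_type is not None and failure_type not in FAILURE_TYPES:
        problems.append(f"unknown failure_type: {failure_type!r}")

    confidence = payload.get("extra", {}).get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 1:
        problems.append("extra.confidence must be a number in [0, 1]")

    return problems
```

The same check would apply to the other agents in this change set that gain an `extra.confidence` field.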

+NOTE: ESLint recommendations are for general recurring patterns only (not project-specific bugs).

</output_format>

<rules>
@@ -323,6 +318,7 @@ Return JSON per `Output Format`
- NEVER implement fixes — only diagnose and recommend
- Cite sources for every claim
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently

### I/O Optimization

3 changes: 3 additions & 0 deletions agents/gem-designer-mobile.agent.md
@@ -366,6 +366,9 @@ Return JSON per `Output Format`
- For patterns: Component architecture, state management, responsive patterns
- Use project's existing tech stack. No new styling solutions.
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

### I/O Optimization

3 changes: 3 additions & 0 deletions agents/gem-designer.agent.md
@@ -305,6 +305,9 @@ Return JSON per `Output Format`
- For patterns: Use component architecture, state management, responsive patterns
- Use project's existing tech stack. No new styling solutions.
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

### I/O Optimization

16 changes: 8 additions & 8 deletions agents/gem-devops.agent.md
@@ -154,17 +154,12 @@ Production Readiness:

- Run health checks, verify resources allocated, check CI/CD status

-### 5. Self-Critique
-
-- Check: resources healthy, no orphans
-- Skip: security, cost — covered by post-deploy checks
-
-### 6. Handle Failure
+### 5. Handle Failure

- Apply mitigation strategies from failure_modes
- Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

Return JSON per `Output Format`
</workflow>
@@ -201,7 +196,9 @@ Return JSON per `Output Format`
"plan_id": "[plan_id]",
"summary": "[≤3 sentences]",
"failure_type": "transient|fixable|needs_replan|escalate",
-"extra": {},
+"extra": {
+"confidence": "number (0-1)",
+},
}
```

@@ -230,6 +227,9 @@ Return JSON per `Output Format`
- Atomic operations preferred
- Verify health checks pass before completing
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

### I/O Optimization

13 changes: 6 additions & 7 deletions agents/gem-documentation-writer.agent.md
@@ -71,6 +71,7 @@ DOCUMENTATION WRITER. Mission: write technical docs, generate diagrams, maintain
#### 2.5 AGENTS.md Maintenance

- Read findings to add, type (architectural_decision|pattern|convention|tool_discovery)
+- Follow AGENTS.md standard: Setup cmds, Code style, Testing, PR instructions — concise, agent-focused
- Check for duplicates, append concisely

#### 2.6 Memory Update
@@ -136,16 +137,11 @@ DOCUMENTATION WRITER. Mission: write technical docs, generate diagrams, maintain
- Documentation: verify code parity
- Update: verify delta parity

-### 5. Self-Critique
-
-- Check: coverage_matrix addressed, no missing sections
-- Skip: readability — subjective; no deep parity check
-
-### 6. Handle Failure
+### 5. Handle Failure

- Log failures to docs/plan/{plan_id}/logs/

-### 7. Output
+### 6. Output

Return JSON per `Output Format`

@@ -211,6 +207,7 @@ Return JSON per `Output Format`
"memory_updated": [{ "path": "string", "type": "patterns|gotchas|fixes|user_prefs", "count": "number" }],
"parity_verified": "boolean",
"coverage_percentage": "number",
+"confidence": "number (0-1)",
},
}
```
@@ -320,6 +317,8 @@ metadata:
- NEVER use generic boilerplate (match project style)
- Document actual tech stack, not assumed
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- minimum content, nothing speculative

### I/O Optimization

17 changes: 8 additions & 9 deletions agents/gem-implementer-mobile.agent.md
@@ -65,15 +65,10 @@ IMPLEMENTER-MOBILE. Mission: write mobile code using TDD (Red-Green-Refactor) fo

#### 3.4 Verify

-- get_errors, lint, unit tests (FILTERED: use patterns, names, or file paths to run only relevant tests as per available test environment and tools.)
-- Pre-existing failures: Fix them too — code in your scope is your responsibility
-- Check acceptance criteria
-- Verify on simulator/emulator (Metro clean, no redbox)
-
-#### 3.5 Self-Critique
-
-- Check: no hardcoded values/dimensions
-- Skip: edge cases, platform compliance — covered by integration check
+- get_errors (syntax only)
+- Verify against acceptance_criteria
+- Platform sanity: Metro clean, no redbox
+- SKIP: lint, unit tests, build verification (Reviewer owns per 6.1.3)

### 4. Error Recovery

@@ -127,6 +122,7 @@ Return JSON per `Output Format`
"extra": {
"execution_details": { "files_modified": "number", "lines_changed": "number", "time_elapsed": "string" },
"test_results": { "total": "number", "passed": "number", "failed": "number", "coverage": "string" },
+"confidence": "number (0-1)",
"platform_verification": { "ios": "pass|fail|skipped", "android": "pass|fail|skipped", "metro_output": "string" },
"learnings": {
"patterns": [
@@ -193,6 +189,9 @@ Return JSON per `Output Format`
- Use existing tech stack, test frameworks, build tools
- Cite sources for every claim
- Always use established library/framework patterns
+- State assumptions explicitly; never guess silently
+- Minimum code, nothing speculative
+- Surgical changes, don't refactor adjacent code

### I/O Optimization
