feat: Goe rewrite by jacobseunglee · Pull Request #5 · Akshay-Rohatgi/GameOfEverything

jacobseunglee · 2026-06-14T20:24:20Z

No description provided.

Implements the GoE v2 foundation: Pydantic v2 models for the full entity graph + procedure DSL, a step-by-step procedure executor with interpolation/assertions/output capture, and a thin TestEnvironment adapter over v1 TestEnvironmentTool. 74 tests passing (3 browser/listen xfailed with documented root causes).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
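As a rough illustration of what "interpolation/assertions/output capture" could mean for a step-by-step executor, here is a minimal sketch. It is not the PR's code: the `Step` dataclass, the `run_procedure` function, the `expect` field, and the `${steps.<id>.output}` reference form are all assumptions made for this example (the PR's real models are Pydantic v2 classes, and the command runner here is an injected callable rather than a container).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """Hypothetical step shape, for illustration only."""
    id: str
    command: str                  # may reference ${steps.<id>.output}
    expect: Optional[str] = None  # substring that must appear in the output


# Matches references to the captured output of an earlier step.
_STEP_REF = re.compile(r"\$\{steps\.([A-Za-z0-9_]+)\.output\}")


def run_procedure(steps: List[Step], run: Callable[[str], str]) -> Dict[str, str]:
    """Run steps in order, feeding captured outputs into later commands."""
    outputs: Dict[str, str] = {}
    for step in steps:
        # Interpolation: a reference to an unknown step id raises KeyError.
        command = _STEP_REF.sub(lambda m: outputs[m.group(1)], step.command)
        output = run(command)
        # Assertion: fail fast when the expected substring is missing.
        if step.expect is not None and step.expect not in output:
            raise AssertionError(
                f"step {step.id!r}: expected {step.expect!r} in output"
            )
        # Output capture: make the result available to later steps.
        outputs[step.id] = output
    return outputs
```

Passing the runner in as a callable keeps the sketch testable without a live environment; a real executor would presumably dispatch to the TestEnvironment adapter instead.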

Adds the v2 evaluation suite (goe/eval), metrics instrumentation (goe/metrics), single-system orchestration/packaging (goe/flow, goe/packaging), workflow artifacts (goe/artifacts), and their tests and fixtures.

Correctness fixes from code review:
- build.py: a non-zero deploy exit no longer falls through to a possible PASS; it routes into the retry loop as a design_flaw.
- runtimes/registry.py: create parent dirs for nested source-file paths (set -e no longer aborts); raise on unknown db_type.
- eval/llm_judge.py: print_judge_result tolerates missing keys.
- eval/golden.py: edge coverage requires a real connecting edge, not independent provides/requires matches.
- eval/runner.py: capture real run start time (durations were ~0).
- flow/orchestrator.py + checkpoint.py: persist and restore failure_category on the resume path.

Cleanups:
- metrics/collector.py: drop dead capture_artifacts ternary.
- bedrock.py: cache the bedrock-runtime client per region/creds instead of rebuilding it on every call.
- runtimes: consolidate per-runtime knowledge into the template YAMLs (target_image, deps_install_template, pre_start); deploy() is now table-driven and _RUNTIME_IMAGES is removed.

Co-authored-by: Jacob Lee <66867022+jacobseunglee@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
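The eval/golden.py fix is easiest to see with a small example. The sketch below is an assumption about the shape of the problem, not the repository's code: `Entity`, `covered_loosely`, and `covered_by_edge` are hypothetical names, and edges are modeled as plain `(source, target)` name pairs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity: a name plus the capabilities it provides/requires."""
    name: str
    provides: set[str] = field(default_factory=set)
    requires: set[str] = field(default_factory=set)


def covered_loosely(cap: str, entities: list[Entity]) -> bool:
    """Old-style check: any provider and any consumer, even if unconnected."""
    return (any(cap in e.provides for e in entities)
            and any(cap in e.requires for e in entities))


def covered_by_edge(cap: str, entities: list[Entity],
                    edges: list[tuple[str, str]]) -> bool:
    """Stricter check: one edge must join a provider of cap to a consumer of cap."""
    by_name = {e.name: e for e in entities}
    return any(
        cap in by_name[src].provides and cap in by_name[dst].requires
        for src, dst in edges
        if src in by_name and dst in by_name
    )
```

With a `web` entity that provides `db_creds`, an `ssh` entity that requires it, and only a `web → db` edge in the graph, the loose check reports coverage while the edge-based check does not, which is the kind of false positive the commit message describes.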

Copilot

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

Combines planner improvements (atom catalog + grading step) with Phase 4 increment 1 (multi-entity chain test + multi-system packaging).

## Planner Improvements

Fixes planner producing wrong runtimes (apache_php for SSH) and invented atoms by grounding all prompts in actual atom inventory with few-shot examples.

**New pipeline structure:**
- Step 1: plan_entities → includes runtime + atoms in stubs
- Step 1.5: grade_stubs → LLM validates/corrects stubs (5-check rubric)
- Step 2: specify_entities → adds edges with validated runtime/atoms

**Atom catalog** (goe/planner/_atom_catalog.py):
- Parses 13 web vuln atoms from atoms/web_vulnerabilities/*.md
- Extracts descriptions, compatible runtimes (from code examples), capabilities
- Provides rich markdown table for prompt injection

**Prompts rewritten** (all 4 steps + grading):
- design_systems: port-to-runtime mapping + 2 examples
- plan_entities: atom catalog injection, runtime selection rules, 2 examples
- grade_stubs (new): 5-check rubric (atom exists, runtime matches, web vs system, single responsibility, chain logic)
- specify_entities: rich atom table, runtime affinity rules, edge consistency
- connect_edges: edge type selection guide, 2 examples

**Fixes:**
- resolve.py: match structural port to exposed ports (not always first)
- topology_environment.py: create containers on network directly (not none→connect)

## Phase 4: Multi-System Orchestration

**Chain test** (goe/flow/chain_test.py):
- L3 validation after all entities pass L2
- Gates overall run (FAILED → RunResult.success=False, CLI exits non-zero)
- TopologyEnvironment: one ubuntu:22.04 container per system + shared Kali attacker
- Chain attacker agent (Opus) synthesizes end-to-end procedure
- Retries up to 2× on failure

**Multi-system packaging** (goe/packaging/packager.py):
- Single-system: unchanged (deploy.sh + playbook.yaml)
- Multi-system: per-system deploy scripts + docker-compose.yml + chain_playbook.yaml
- Port collision detection scoped per system_id

**Cross-system addressing** (goe/executor/interpolation.py):
- ${system.<system_id>.host} / ${system.<system_id>.port}
- Existing ${target_host}, ${edge.*}, ${steps.*} unchanged

**Orchestrator** (goe/flow/orchestrator.py):
- Runs chain test when len(built) > 1
- Chain test result gates success
- Persists chain_test in checkpoint

**CLI** (goe/flow/__main__.py):
- goe flow test <output_dir>: replays packaged runs
- Auto-detects chain_playbook.yaml for multi-system replay

## Verification

End-to-end test: "SQLi → SSH" scenario that was failing before:
- ✓ SSH entity now has runtime=ubuntu (was apache_php)
- ✓ Both entities build successfully
- ✓ L3 chain test completes (synthesizes SQLi→SSH attack chain)
- ✓ Output includes chain_playbook.yaml with cross-system addressing

Co-Authored-By: Jacob Lee <66867022+jacobseunglee@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>
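The cross-system placeholders listed above could be resolved roughly as follows. This is a sketch under assumptions, not the contents of goe/executor/interpolation.py: the function name `interpolate_systems`, the `systems` mapping shape, and the character set allowed in a system id are all invented for the example. Only the `${system.<system_id>.host}` and `${system.<system_id>.port}` forms come from the commit message.

```python
import re
from typing import Dict, Mapping

# Only the ${system.<id>.host} and ${system.<id>.port} forms are handled here.
_SYSTEM_REF = re.compile(r"\$\{system\.([A-Za-z0-9_-]+)\.(host|port)\}")


def interpolate_systems(text: str, systems: Mapping[str, Dict[str, object]]) -> str:
    """Replace system placeholders; leave every other ${...} form untouched."""
    def resolve(match: "re.Match[str]") -> str:
        system_id, attr = match.group(1), match.group(2)
        if system_id not in systems:
            # Fail loudly rather than emit a half-resolved command.
            raise KeyError(f"unknown system_id: {system_id}")
        return str(systems[system_id][attr])

    return _SYSTEM_REF.sub(resolve, text)
```

Because the pattern matches only the `system.` prefix, placeholders such as `${target_host}` or `${steps.*}` pass through unchanged and can be handled by a separate pass, in line with the note that the existing forms are unchanged.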

Akshay-Rohatgi and others added 8 commits May 20, 2026 19:37

test(procedures): Tests for basic execution, SSH logins, step chaining, and SUID privesc

3f05d8f

Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>

feat(custom_app,attack_procedure_tests): Construction crew for applications implemented and tests for attack procedures added

dd5fbb8

Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>

fix(listener,headless chrome): Components for admin bot cookie stealing working: detached background processes, attacker container reset on retry, chromium browser installed via PPA

183a242

Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>

feat(self-review) Custom app generation now incorporates a mandatory self-review to address commonly seen custom app development pitfalls

47020f9

Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>

feat(graph_model,graph_visualization): Graph models with corresponding tests and visualizers

7d48f6e

Signed-off-by: Akshay Rohatgi <52616034+Akshay-Rohatgi@users.noreply.github.com>

feat: model change (#4)

788ffcb

* feat: model change * delete game_of_everything/goe/jacobtest.yaml * fix: workflow * fix: switch off plan determined runtime

Akshay-Rohatgi requested a review from Copilot June 14, 2026 20:25

Copilot AI reviewed Jun 14, 2026

jacobseunglee wants to merge 9 commits into main from goe-rewrite

3 participants
