add E2E testing framework #26
Merged
13 commits
b489f19
Improve LLM tool parameter guidance and add E2E testing framework
janisz 2cafa48
use gevals
janisz c8c4798
Optimize LLM tool descriptions to reduce unnecessary API calls
janisz e566126
Configure Claude via Vertex AI for E2E testing with improved tool des…
janisz 03fe2e4
fix
janisz 2463626
Address code review feedback from PR #26
janisz 5c4ab07
Update e2e-tests/scripts/build-gevals.sh
janisz a9fffdb
Apply suggestion from @janisz
janisz 9c7a6e1
Apply suggestion from @janisz
janisz 6d9a913
fix
janisz 6c74e90
Migrate e2e tests from gevals to mcpchecker v0.0.4
janisz 3761990
Add E2E smoke test for CI validation
janisz a29703e
Consolidate E2E smoke test into main test workflow
janisz
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| version: 2 | ||
| updates: | ||
| # Monitor root Go module | ||
| - package-ecosystem: "gomod" | ||
| directory: "/" | ||
| schedule: | ||
| interval: "daily" | ||
| commit-message: | ||
| prefix: "chore" | ||
| prefix-development: "chore" | ||
| include: "scope" | ||
|
|
||
| # Monitor e2e-tests tools Go module | ||
| - package-ecosystem: "gomod" | ||
| directory: "/e2e-tests/tools" | ||
| schedule: | ||
| interval: "daily" | ||
| commit-message: | ||
| prefix: "chore" | ||
| prefix-development: "chore" | ||
| include: "scope" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| # StackRox MCP E2E Testing | ||
|
|
||
| End-to-end tests for the StackRox MCP server using [mcpchecker](https://github.com/mcpchecker/mcpchecker). | ||
|
|
||
| ## Quick Start | ||
|
|
||
| ### Smoke Test (No Agent Required) | ||
|
|
||
| Validate configuration and build without running actual agents: | ||
|
|
||
| ```bash | ||
| cd e2e-tests | ||
| ./scripts/smoke-test.sh | ||
| ``` | ||
|
|
||
| This is useful for CI and quickly checking that everything compiles. | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| - Go 1.25+ | ||
| - Google Cloud Project with Vertex AI enabled (for Claude agent) | ||
| - OpenAI API Key (for LLM judge) | ||
| - StackRox API Token | ||
|
|
||
| ## Setup | ||
|
|
||
| ### 1. Build mcpchecker | ||
|
|
||
| ```bash | ||
| cd e2e-tests | ||
| ./scripts/build-mcpchecker.sh | ||
| ``` | ||
|
|
||
| ### 2. Configure Environment | ||
|
|
||
| Create a `.env` file: | ||
|
|
||
| ```bash | ||
| # Required: GCP Project for Vertex AI (Claude agent) | ||
| ANTHROPIC_VERTEX_PROJECT_ID=<GCP Project ID> | ||
|
|
||
| # Required: StackRox Central API Token | ||
| STACKROX_MCP__CENTRAL__API_TOKEN=<StackRox API Token> | ||
|
|
||
| # Required: OpenAI API Key (for LLM judge) | ||
| OPENAI_API_KEY=<OpenAI API Key> | ||
|
|
||
| # Optional: Vertex AI region (defaults to us-east5) | ||
| CLOUD_ML_REGION=us-east5 | ||
|
|
||
| # Optional: Judge configuration (defaults to OpenAI) | ||
| JUDGE_MODEL_NAME=gpt-5-nano | ||
| ``` | ||
|
|
||
| ## Running Tests | ||
|
|
||
| ```bash | ||
| ./scripts/run-tests.sh | ||
| ``` | ||
|
|
||
| Results are saved to `mcpchecker/mcpchecker-stackrox-mcp-e2e-out.json`. | ||
|
|
||
| ### View Results | ||
|
|
||
| ```bash | ||
| # Summary | ||
| jq '.[] | {taskName, taskPassed}' mcpchecker/mcpchecker-stackrox-mcp-e2e-out.json | ||
|
|
||
| # Tool calls | ||
| jq '[.[] | .callHistory.ToolCalls[]? | {name: .request.Params.name, arguments: .request.Params.arguments}]' mcpchecker/mcpchecker-stackrox-mcp-e2e-out.json | ||
| ``` | ||
|
|
||
| ## Test Cases | ||
|
|
||
| | Test | Description | Tool | | ||
| |------|-------------|------| | ||
| | `list-clusters` | List all clusters | `list_clusters` | | ||
| | `cve-detected-workloads` | CVE detected in deployments | `get_deployments_for_cve` | | ||
| | `cve-detected-clusters` | CVE detected in clusters | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-nonexistent` | Handle non-existent CVE | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-does-exist` | CVE with cluster filter (cluster exists) | `list_clusters`, `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-does-not-exist` | CVE with cluster filter (cluster does not exist) | `list_clusters` | | ||
| | `cve-clusters-general` | General CVE query | `get_clusters_with_orchestrator_cve` | | ||
| | `cve-cluster-list` | CVE across clusters | `get_clusters_with_orchestrator_cve` | | ||
|
|
||
| ## Configuration | ||
|
|
||
| - **`mcpchecker/eval.yaml`**: Main test configuration, agent settings, assertions | ||
| - **`mcpchecker/mcp-config.yaml`**: MCP server configuration | ||
| - **`mcpchecker/tasks/*.yaml`**: Individual test task definitions | ||
|
|
||
| ## How It Works | ||
|
|
||
| mcpchecker uses a proxy architecture to intercept MCP tool calls: | ||
|
|
||
| 1. AI agent receives task prompt | ||
| 2. Agent calls MCP tool | ||
| 3. mcpchecker proxy intercepts and records the call | ||
| 4. Call forwarded to StackRox MCP server | ||
| 5. Server executes and returns result | ||
| 6. mcpchecker validates assertions and response quality | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| **Tests fail - no tools called** | ||
| - Verify StackRox Central is accessible | ||
| - Check API token permissions | ||
|
|
||
| **Build errors** | ||
| ```bash | ||
| go mod tidy | ||
|
||
| ./scripts/build-mcpchecker.sh | ||
| ``` | ||
|
|
||
| ## Further Reading | ||
|
|
||
| - [mcpchecker Documentation](https://github.com/mcpchecker/mcpchecker) | ||
| - [StackRox MCP Server](../README.md) | ||
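
Review note: the proxy flow listed under "How It Works" in this README can be sketched in a few lines. The following is a minimal illustration only, not mcpchecker's implementation. `RecordingProxy`, `RecordedCall`, and `fake_server` are hypothetical names, and the fake server stands in for the StackRox MCP server.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class RecordedCall:
    """One intercepted tool call (hypothetical record shape)."""
    name: str
    arguments: dict
    result: Any = None


@dataclass
class RecordingProxy:
    """Sits between the agent and the server, recording every call."""
    backend: Callable[[str, dict], Any]
    history: list = field(default_factory=list)

    def call_tool(self, name: str, arguments: dict) -> Any:
        record = RecordedCall(name=name, arguments=dict(arguments))
        self.history.append(record)                     # step 3: intercept and record
        record.result = self.backend(name, arguments)   # steps 4-5: forward, get result
        return record.result


def fake_server(name: str, arguments: dict) -> Any:
    """Stand-in for the real server; the returned data is made up."""
    if name == "list_clusters":
        return {"clusters": ["staging-central-cluster"]}
    raise ValueError(f"unknown tool: {name}")


proxy = RecordingProxy(backend=fake_server)
result = proxy.call_tool("list_clusters", {})
print([call.name for call in proxy.history])
```

Because every call passes through `call_tool`, the recorded history is what a later assertion step (step 6) can inspect.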
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,110 @@ | ||
| kind: Eval | ||
| metadata: | ||
| name: "stackrox-mcp-e2e" | ||
| config: | ||
| agent: | ||
| type: "builtin.claude-code" | ||
| model: "claude-sonnet-4-5" | ||
|
||
| llmJudge: | ||
| env: | ||
| baseUrlKey: JUDGE_BASE_URL | ||
| apiKeyKey: JUDGE_API_KEY | ||
| modelNameKey: JUDGE_MODEL_NAME | ||
| mcpConfigFile: mcp-config.yaml | ||
| taskSets: | ||
| # Assertion Fields Explained: | ||
|
||
| # - toolsUsed: List of tools that MUST be called at least once | ||
| # - minToolCalls: Minimum TOTAL number of tool calls across ALL tools (not per-tool) | ||
| # - maxToolCalls: Maximum TOTAL number of tool calls across ALL tools (prevents runaway tool usage) | ||
| # Example: If maxToolCalls=3, the agent can make up to 3 tool calls total in the test, | ||
| # regardless of which tools are called. | ||
|
|
||
| # Test 1: List clusters | ||
| - path: tasks/list-clusters.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "list_clusters" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 1 | ||
|
|
||
| # Test 2: CVE detected in workloads | ||
| # Claude does comprehensive CVE checking (orchestrator, deployments, nodes) | ||
| - path: tasks/cve-detected-workloads.yaml | ||
| assertions: | ||
|
||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_deployments_for_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2021-31805" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 3 | ||
|
|
||
| # Test 3: CVE detected in clusters - basic | ||
| - path: tasks/cve-detected-clusters.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2016-1000031" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 3 | ||
|
||
|
|
||
| # Test 4: Non-existent CVE | ||
| # Allows up to 3 calls because "Is CVE detected in my clusters?" can trigger a comprehensive check | ||
| # (orchestrator, deployments, nodes). The LLM cannot know beforehand whether the CVE exists. | ||
| - path: tasks/cve-nonexistent.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2099-00001" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 3 | ||
|
|
||
| # Test 5: CVE with specific cluster filter (does exist) | ||
| # Claude does comprehensive checking even for single cluster (orchestrator, deployments, nodes) | ||
| - path: tasks/cve-cluster-does-exist.yaml | ||
|
||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "list_clusters" | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2016-1000031" | ||
| minToolCalls: 2 | ||
| maxToolCalls: 4 | ||
|
|
||
| # Test 6: CVE with specific cluster filter (does not exist) | ||
| - path: tasks/cve-cluster-does-not-exist.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "list_clusters" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 2 | ||
|
||
|
|
||
| # Test 7: CVE detected in clusters - general | ||
| - path: tasks/cve-clusters-general.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2021-31805" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 5 | ||
|
|
||
| # Test 8: CVE check with cluster list reference | ||
| - path: tasks/cve-cluster-list.yaml | ||
| assertions: | ||
| toolsUsed: | ||
| - server: stackrox-mcp | ||
| toolPattern: "get_clusters_with_orchestrator_cve" | ||
| argumentsMatch: | ||
| cveName: "CVE-2024-52577" | ||
| minToolCalls: 1 | ||
| maxToolCalls: 5 | ||
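
Review note: the assertion fields explained at the top of this file (`toolsUsed`, `minToolCalls`, `maxToolCalls`) can be illustrated with a small sketch. It is written under assumptions that the diff does not confirm: `toolPattern` is treated as a regular expression that must match the whole tool name, and `argumentsMatch` as exact equality on the listed keys. `ToolCall`, `tool_used`, and `check_assertions` are hypothetical names, not mcpchecker APIs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A recorded tool call (hypothetical shape)."""
    server: str
    name: str
    arguments: dict = field(default_factory=dict)


def tool_used(calls, server, tool_pattern, arguments_match=None):
    """True if at least one call satisfies the expectation."""
    for call in calls:
        if call.server != server:
            continue
        if not re.fullmatch(tool_pattern, call.name):  # assumption: full regex match
            continue
        if arguments_match and any(
            call.arguments.get(key) != value           # assumption: exact equality
            for key, value in arguments_match.items()
        ):
            continue
        return True
    return False


def check_assertions(calls, tools_used, min_tool_calls=None, max_tool_calls=None):
    """Return a list of failure messages; an empty list means the test passes."""
    failures = []
    for expected in tools_used:
        if not tool_used(calls, expected["server"], expected["toolPattern"],
                         expected.get("argumentsMatch")):
            failures.append(f"expected tool not called: {expected['toolPattern']}")
    total = len(calls)  # counted across ALL tools, not per tool
    if min_tool_calls is not None and total < min_tool_calls:
        failures.append(f"too few tool calls: {total} < {min_tool_calls}")
    if max_tool_calls is not None and total > max_tool_calls:
        failures.append(f"too many tool calls: {total} > {max_tool_calls}")
    return failures


# Mirrors Test 5 (cve-cluster-does-exist): two required tools, 2 to 4 calls in total.
expected = [
    {"server": "stackrox-mcp", "toolPattern": "list_clusters"},
    {"server": "stackrox-mcp", "toolPattern": "get_clusters_with_orchestrator_cve",
     "argumentsMatch": {"cveName": "CVE-2016-1000031"}},
]
calls = [
    ToolCall("stackrox-mcp", "list_clusters"),
    ToolCall("stackrox-mcp", "get_clusters_with_orchestrator_cve",
             {"cveName": "CVE-2016-1000031"}),
]
print(check_assertions(calls, expected, min_tool_calls=2, max_tool_calls=4))
```

The point of the sketch is that the call-count bounds apply to the total across all tools, so a test with two required tools needs `minToolCalls` of at least 2 to be meaningful.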
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| mcpServers: | ||
| stackrox-mcp: | ||
| command: go | ||
| args: | ||
| - run | ||
| - ../../cmd/stackrox-mcp/... | ||
| - --config | ||
| - ../stackrox-mcp-e2e-config.yaml | ||
| # API token loaded from parent shell environment (.env file) | ||
| # No env section = full environment inheritance | ||
| # Auto-approve all tools | ||
| enableAllTools: true |
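
Review note: the comment in this config relies on the child process inheriting the parent shell's environment when no `env` section is given. The sketch below shows the same general process-model behavior using Python's `subprocess` module as an analogy. It does not exercise mcpchecker itself, and the token value is a dummy.

```python
import os
import subprocess
import sys

TOKEN_VAR = "STACKROX_MCP__CENTRAL__API_TOKEN"

# Child program: print the token if it is visible, otherwise a marker.
PRINT_TOKEN = (
    "import os; "
    f"print(os.environ.get('{TOKEN_VAR}', '<unset>'))"
)


def run_child(code, env=None):
    """Run a short Python snippet in a child process and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,               # None means: inherit the parent's environment
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.strip()


# Simulate a shell that has loaded the .env file (dummy value).
os.environ[TOKEN_VAR] = "dummy-token"

# No explicit env: the child sees the token.
inherited = run_child(PRINT_TOKEN)

# Explicit env: the child sees only what is passed, so the token is gone.
without_token = {k: v for k, v in os.environ.items() if k != TOKEN_VAR}
isolated = run_child(PRINT_TOKEN, env=without_token)

print(inherited, isolated)
```

If an explicit `env` section were added to this config later, the token would presumably have to be listed there as well; that is an inference from the comment, not something the diff states.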
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-does-exist" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2016-1000031 detected in cluster name staging-central-cluster?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2016-1000031 is detected in staging-central-cluster" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-does-not-exist" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2024-52577 detected in cluster name prod-hal-9000?" | ||
| verify: | ||
| contains: "A response indicating that CVE-2024-52577 is not detected in prod-hal-9000 cluster, or that the cluster does not exist" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-cluster-list" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2024-52577 detected in any of my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2024-52577 is detected in clusters" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-clusters-general" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2021-31805 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2021-31805 is detected in any clusters, with specific cluster names if detected, or a statement that no clusters have this CVE." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-detected-clusters" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2016-1000031 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2016-1000031 is detected in clusters, nodes, or deployments" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-detected-workloads" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is this CVE-2021-31805 detected in my workloads?" | ||
| verify: | ||
| contains: "A response indicating whether CVE-2021-31805 is detected in workloads/deployments, with specific deployment names if detected, or a statement that no deployments have this CVE." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "cve-nonexistent" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "Is CVE-2099-00001 detected in my clusters?" | ||
| verify: | ||
| contains: "A response indicating that CVE-2099-00001 is not found or not detected in any clusters" |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,9 @@ | ||
| kind: Task | ||
| metadata: | ||
| name: "list-clusters" | ||
| difficulty: easy | ||
| steps: | ||
| prompt: | ||
| inline: "List my clusters" | ||
| verify: | ||
| contains: "A response containing a list of cluster names" |
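
Review note: all eight task files share one shape (`kind`, `metadata.name`, `metadata.difficulty`, a prompt, and a verification string). A structural check along the following lines could catch a malformed task before any agent runs. This is a hypothetical sketch, not what `smoke-test.sh` does. The nesting `steps.prompt.inline` and `steps.verify.contains` is assumed, since indentation is not visible in this diff view, and the task is written as a Python dict rather than parsed from YAML to keep the example dependency-free.

```python
# Field paths every task is assumed to need (nesting is an assumption).
REQUIRED_PATHS = [
    ("kind",),
    ("metadata", "name"),
    ("metadata", "difficulty"),
    ("steps", "prompt", "inline"),
    ("steps", "verify", "contains"),
]


def missing_fields(task):
    """Return dotted paths of required fields absent from a task dict."""
    missing = []
    for path in REQUIRED_PATHS:
        node = task
        for key in path:
            if not isinstance(node, dict) or key not in node:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing


# The list-clusters task from this PR, expressed as a dict.
list_clusters_task = {
    "kind": "Task",
    "metadata": {"name": "list-clusters", "difficulty": "easy"},
    "steps": {
        "prompt": {"inline": "List my clusters"},
        "verify": {"contains": "A response containing a list of cluster names"},
    },
}

print(missing_fields(list_clusters_task))
```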