Skip to content

V2 Crawlers: Configurable Timeout + Command-Line Execution#288

Open
TheAuditorTool wants to merge 2 commits intoOWASP-Benchmark:mainfrom
TheAuditorTool:feat/v2-crawlers-timeout-cli
Open

V2 Crawlers: Configurable Timeout + Command-Line Execution#288
TheAuditorTool wants to merge 2 commits intoOWASP-Benchmark:mainfrom
TheAuditorTool:feat/v2-crawlers-timeout-cli

Conversation

@TheAuditorTool
Copy link
Copy Markdown

V2 Crawlers: Configurable Timeout + Command-Line Execution

Closes #3 (Add Timeout to Crawlers)
Closes #1 (Add new command line crawler and verification crawler)


Overview

This adds three new Java files alongside the existing crawlers. Nothing existing is changed — the current run-crawler and run-verification-crawler Maven goals continue to work exactly as before. The new functionality is available through separate -v2 goals when you're ready to use it.

What's New

Configurable Timeout (Issue #3)

The current crawlers have a hardcoded 15-second timeout. If a test case takes longer (common on slower machines), the request is dropped.

The v2 crawlers add a -T flag that lets you set the timeout yourself:

Usage What Happens
No -T flag (the default) No timeout at all — the crawler waits as long as the test case needs
-T 300 Each request gets up to 5 minutes before it's dropped
-T 60 Each request gets up to 1 minute

When a request does time out, the crawler logs it and moves on to the next test case, just as Jonathon described in the issue.

Command-Line Test Case Execution (Issue #1)

The existing crawlers work by sending HTTP requests to web endpoints. The v2 crawlers can also run test cases as command-line programs — useful for benchmarking non-web applications like Python scripts or CLI tools.

To use this, test cases in the XML file use tcType="CLI" instead of SERVLET, SPRINGWS, or JERSEYWS:

<benchmarkTest tcType="CLI" tcName="PythonTest00001"
    tcCommand="python3" tcCommandArgs="test001.py"
    tcCategory="sqli" tcVulnerable="true"
    tcAttackSuccess="SQL injection detected" ...>
    <formparam name="input" value="' OR 1=1 --" safeValue="hello"/>
</benchmarkTest>

The crawler runs the command as a subprocess, captures its output, and uses the same attack/safe verification logic that the HTTP crawler uses. The formparam elements still switch between attack and safe values — they're passed to the command as --name value arguments.

New Files

All three are in plugin/src/main/java/org/owasp/benchmarkutils/tools/:

File What It Does
CommandLineTestCaseRequest.java Represents a CLI test case. Extends AbstractTestCaseRequest so it fits into the existing test suite loading and sorting infrastructure.
BenchmarkCrawler_newv2.java The basic crawler with timeout support and CLI execution. Extends BenchmarkCrawler. Maven goal: run-crawler-v2.
BenchmarkCrawlerVerification_newv2.java The verification crawler — sends both attack and safe requests, checks results. Extends BenchmarkCrawler_newv2. Maven goal: run-verification-crawler-v2.

How to Try It

Once installed (mvn install), you can use the new goals in place of the existing ones:

# Basic crawl, no timeout (same behavior as before, minus the 15s cap)
mvn org.owasp:benchmarkutils-maven-plugin:run-crawler-v2 \
    -DcrawlerFile=data/benchmark-crawler-http.xml

# Basic crawl with a 5-minute timeout per request
mvn org.owasp:benchmarkutils-maven-plugin:run-crawler-v2 \
    -DcrawlerFile=data/benchmark-crawler-http.xml -T 300

# Verification crawl with timeout
mvn org.owasp:benchmarkutils-maven-plugin:run-verification-crawler-v2 \
    -DcrawlerFile=data/benchmark-attack-http.xml -T 300

All existing flags (-f, -n, -t) still work. The -T flag is the only addition.

What's Not Changed

  • BenchmarkCrawler.java — untouched
  • BenchmarkCrawlerVerification.java — untouched
  • AbstractTestCaseRequest.java — untouched
  • All other existing files — untouched
  • Existing XML test suite files — fully compatible with the v2 crawlers

Design Notes

  • The v2 crawlers separate connect timeout (always 30 seconds) from response timeout (the -T flag). This means a server that's down still fails fast, while a slow response can take as long as needed.
  • CLI test cases reuse the existing ResponseInfo class: stdout goes into responseString, exit code goes into statusCode, and elapsed wall-clock time goes into timeInSeconds.
  • The verification crawler uses RegressionTesting.verifyResponse() for CLI test cases. This is the same string-matching logic used for HTTP — checking whether the attack success indicator appears in the output.

…port

Closes OWASP-Benchmark#3 (Add Timeout to Crawlers)
Closes OWASP-Benchmark#1 (Add new command line crawler and verification crawler)

New files (zero changes to existing code):
- CommandLineTestCaseRequest.java: CLI test case support (tcType="CLI")
- BenchmarkCrawler_newv2.java: configurable -T/--timeout, CLI dispatch
- BenchmarkCrawlerVerification_newv2.java: verification with timeout + CLI
@TheAuditorTool TheAuditorTool force-pushed the feat/v2-crawlers-timeout-cli branch from 6c7eeed to fc3f61b Compare April 13, 2026 17:28
@davewichers
Copy link
Copy Markdown
Contributor

@dandersonaspect - @TheAuditorTool stepped up and created a command line version of the verification crawler for Benchmark. Can you review/test to see if this seems to do the job we need for running command line verification?

Copy link
Copy Markdown
Contributor

@darkspirit510 darkspirit510 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code itself looks good. I don't understand why this is a separate class.

@davewichers can't run https://github.com/OWASP-Benchmark/BenchmarkJava/blob/master/VMs/runDockerImage.sh on my Mac, so I can't verify this code actually works.

* <p>Existing HTTP test suites work identically — this is a drop-in replacement.
*/
@Mojo(name = "run-crawler-v2", requiresProject = false, defaultPhase = LifecyclePhase.COMPILE)
public class BenchmarkCrawler_newv2 extends BenchmarkCrawler {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a weird class name 🤪

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats on me... I guess i was to afraid to directly replace @davewichers previous crawlers and the intention at the time was to create _newv2 so he could test it against his old versions before deciding if he liked/wanted them.
I see in hindsight how it was probably bit to cautious.
Happy to merge the v2 changes into the existing BenchmarkCrawler and BenchmarkCrawlerVerification classes if you and Dave prefer that.

* <p>Existing HTTP test suites work identically — this is a drop-in replacement.
*/
@Mojo(name = "run-crawler-v2", requiresProject = false, defaultPhase = LifecyclePhase.COMPILE)
public class BenchmarkCrawler_newv2 extends BenchmarkCrawler {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I never had a look into BenchmarkCrawler.java before. Why is this a separate class?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This became an artifact of my above choose.
It's a separate class on purpose — it registers as a different Maven goal (run-crawler-v2 vs run-crawler). The existing BenchmarkCrawler and BenchmarkCrawlerVerification are untouched.
Happy to make both official on your and daves approval.

HttpHost httpHost = null;
String pHost = System.getProperty("proxyHost");
String pPort = System.getProperty("proxyPort");
if (pHost != null && pPort != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a fan of DRY. Can you refactor this code so it does the first three commands (until connection manager) and then put the setProxy in the if?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I'll have it fixed. Thank you.

if (selectedTestCaseName != null) {
for (AbstractTestCaseRequest request : this.testSuite.getTestCases()) {
if (request.getName().equals(selectedTestCaseName)) {
java.util.List<AbstractTestCaseRequest> requests =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this require full qualified class name?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not. My mistake of using inline instead of seeing the list. I'll have it fixed.

JAXBContextFactory.createContext(
new Class[] {TestSuite.class, CommandLineTestCaseRequest.class}, null);
Unmarshaller unmarshaller = context.createUnmarshaller();
unmarshaller.setEventHandler(new javax.xml.bind.helpers.DefaultValidationEventHandler());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the same full qualified name. Can't you just import the Handler?

Copy link
Copy Markdown
Author

@TheAuditorTool TheAuditorTool Apr 22, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was following the pattern from BenchmarkCrawler.java for consistency, but you're right that importing it is cleaner. I'll have it corrected.

if (thisInstance == null) thisInstance = this;

if (null == this.crawlerFile) {
System.out.println("ERROR: A crawlerFile parameter must be specified.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this will end the application in error state, maybe return a non-zero exit code?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. I'll have it corrected.

@davewichers
Copy link
Copy Markdown
Contributor

Code itself looks good. I don't understand why this is a separate class.

@davewichers can't run https://github.com/OWASP-Benchmark/BenchmarkJava/blob/master/VMs/runDockerImage.sh on my Mac, so I can't verify this code actually works.

Can you try @TheAuditorTool pull request which fixes/upgrades the Docker Image: OWASP-Benchmark/BenchmarkJava#445 and see if that works for you? If it does, I can merge it in. Let me know if this fixes your issue.

PR OWASP-Benchmark#288 review raised 7 items. 2 were design questions answered in
comments (class name rationale, why separate class). The remaining 5
are code fixes in this commit.

THE PROBLEM:
BenchmarkCrawler_newv2 had three categories of issue:
1. Inline FQNs (java.util.List, java.util.ArrayList,
   javax.xml.bind.helpers.DefaultValidationEventHandler) instead of
   imports — the DefaultValidationEventHandler pattern was copied from
   Utils.java:183 which uses the same inline FQN, but reviewer wants
   imports in new code
2. Duplicated HttpClients.custom() builder chain across proxy/no-proxy
   branches — inherited from parent BenchmarkCrawler.java:204-228
   which has the same duplication
3. execute() prints error on null crawlerFile but returns silently
   with exit code 0 — same pattern as parent BenchmarkCrawler.java:350
   but reviewer wants explicit failure

THE SOLUTION:
- Add imports for ArrayList, DefaultValidationEventHandler,
  HttpClientBuilder and replace all inline FQNs with short names
- Extract HttpClientBuilder as a local variable, build once, then
  conditionally call .setProxy() — eliminates the duplicated 4-line
  builder chain
- Add System.exit(-1) after the error println in both execute()
  methods (crawler + verification crawler)

KEY DECISIONS:
- System.exit(-1) not System.exit(1): codebase has 19 uses of
  System.exit(-1) for errors, zero uses of System.exit(1)
- System.exit(-1) not throw MojoExecutionException: MojoExecutionException
  is declared on every execute() but never thrown anywhere in the
  codebase (0 occurrences). Introducing a new error pattern is scope creep.
- HttpClientBuilder (explicit type) not var: Java 11 target supports var
  but 0 usages exist in codebase. Legacy explicit-type style throughout.
- Spotless (google-java-format 1.17.0 AOSP, pom.xml:222-314) will
  normalize import order on mvn compile. Manual placement is correct
  but academic — formatter is the safety net.

VERIFICATION:
  mvn compile  (spotless-apply runs at compile phase)
  mvn test     (267+ tests, 0 failures expected)
  git diff --stat shows 2 files, +16/-15 lines
@TheAuditorTool
Copy link
Copy Markdown
Author

Addressed code review items:

  • Replaced inline FQNs with proper imports (ArrayList, DefaultValidationEventHandler)
  • Refactored proxy setup to build once, conditionally add .setProxy()
  • Added System.exit(-1) on missing crawlerFile in both execute() methods
  • Fully qualified class name for DefaultValidationEventHandler was copied from Utils.java:183 which uses the same pattern...fixed in our code per your feedback

The class name (_newv2 suffix) and separate class design are intentional. I didn't want to directly replace the existing crawlers. Happy to merge/replace the existing ones but please review and approve first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Timeout to Crawlers Create new command line crawler and verification crawler

3 participants