Implement Code Block Failure Detection (Issue #5) and Extra CWE Detector (Issue #6) #289
Open
TheAuditorTool wants to merge 1 commit into OWASP-Benchmark:main from
…extra CWE detector (OWASP-Benchmark#6)

Issue OWASP-Benchmark#5 - CalculateToolCodeBlocksSupport:
- Add single-unknown isolation (Pass 2): for each FN, if exactly 1 snippet is unsupported, isolate it as the root cause
- Add FP isolation (Stage 2): identify safe snippets that tools fail to recognize, with sanity check for FPs where all snippets are supported
- Add scorecard directory support: -r now accepts a directory to process all tool scorecards automatically
- Track actual test case names per snippet for bidirectional mapping

Issue OWASP-Benchmark#6 - DetectExtraCWEs (new):
- Detect CWE findings outside expected test cases using existing Reader parsers
- Normal mode: known CWEs in wrong test cases (e.g. CWE-89 in a hash test)
- Hard mode: any CWE not in the benchmark's expected set
be7426e to
f813694
PR: Implement Code Block Failure Detection (Issue #5) and Extra CWE Detector (Issue #6)
Summary
This PR completes the CalculateToolCodeBlocksSupport tool (Issue #5) and adds a new DetectExtraCWEs tool (Issue #6). Both are standalone Java tools in the tools/ package that analyze which code constructs cause security tools to fail on OWASP Benchmark test suites.
What Changed
1. CodeBlockSupportResults.java (modified)
Added members for Issue #5's isolation analysis:
- Set<String> fnTestCases -- tracks which FN test cases use this snippet
- Set<String> isolatedFnCause -- FNs where THIS snippet is the single unsupported one (isolated root cause)
- Set<String> isolatedFpCause -- FPs where THIS snippet is the single safe snippet the tool fails to recognize
- toIsolationString() -- formatted output for isolated root cause reporting
These track actual test case names (not just counts), enabling bidirectional lookup: given a snippet, find all test cases it causes to fail; given a test case, find which snippet is the likely root cause.
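The bidirectional lookup described above can be illustrated with a minimal sketch. The class SnippetIsolationIndex and its method names are hypothetical and are not part of this PR; in the PR itself the sets live on CodeBlockSupportResults, one instance per snippet. The sketch assumes each isolated FN maps to exactly one snippet.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration only; not the PR's CodeBlockSupportResults class.
class SnippetIsolationIndex {
    // snippet name -> FN test cases for which that snippet was isolated as the root cause
    private final Map<String, Set<String>> isolatedFnCause = new TreeMap<>();
    // reverse index: test case name -> snippet isolated as its likely root cause
    private final Map<String, String> rootCauseByTestCase = new TreeMap<>();

    // Record that 'snippet' is the single unsupported snippet in FN 'testCase'.
    void recordIsolatedFn(String snippet, String testCase) {
        isolatedFnCause.computeIfAbsent(snippet, k -> new LinkedHashSet<>()).add(testCase);
        rootCauseByTestCase.put(testCase, snippet);
    }

    // Snippet -> test cases: every FN attributed to this snippet (empty if none).
    Set<String> testCasesCausedBy(String snippet) {
        return isolatedFnCause.getOrDefault(snippet, Collections.emptySet());
    }

    // Test case -> snippet: the likely root cause, or null if none was isolated.
    String likelyRootCauseOf(String testCase) {
        return rootCauseByTestCase.get(testCase);
    }
}
```

Storing names rather than counts is what makes the reverse direction possible: a counter can say how often a snippet was implicated, but not in which test cases.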
2. CalculateToolCodeBlocksSupport.java (modified)
Three additions to Dave's existing implementation:
A. Single-unknown isolation (Issue #5 Pass 2)
After Pass 1 marks supported snippets from TPs, a new pass iterates over each FN test case and counts how many of its snippets are NOT supported. When exactly one snippet is unsupported, the test case is recorded in isolatedFnCause on the snippet. These are the most actionable findings.
B. FP isolation (Issue #5 Stage 2)
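For each FP test case (the tool flags a safe test case as vulnerable), Stage 2 identifies which snippet(s) make the test case safe, using the truePositive metadata from the .xml files. A test case is recorded in isolatedFpCause on a snippet when that snippet is the single safe snippet the tool fails to recognize. FPs in which every snippet is supported go through a separate sanity check.
C. Scorecard directory support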
The -r parameter now accepts either a single CSV file (original behavior) or a directory path. When given a directory, it discovers all *Scorecard_for_*.csv files and processes each tool sequentially. Each tool gets its own analysis section in the output.
3. DetectExtraCWEs.java (new)
Implements Issue #6. A standalone Java tool that detects CWE findings outside expected test cases.
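How it works:
- Expected results map: {test_number -> (name, category, cwe)}
- Tool results are parsed with the existing Reader.allReaders() parsers (57 parsers covering FindBugs, PMD, ZAP, Semgrep, Checkmarx, Fortify, etc.)
Two modes:
- Normal mode: known CWEs in wrong test cases (e.g. CWE-89 in a hash test)
- Hard mode: any CWE not in the benchmark's expected set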
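The difference between normal mode and hard mode can be sketched as a small classification function. This is an illustration under assumptions, not the code in DetectExtraCWEs.java: the class ExtraCweClassifier, the enum Kind, and the simplified map from test number to expected CWE are all hypothetical, and the sketch ignores whether a test case is expected to be vulnerable or safe.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; names and structure are not taken from the PR.
class ExtraCweClassifier {
    enum Kind { EXPECTED, EXTRA_NORMAL, EXTRA_HARD }

    // expectedCweByTest: test number -> the CWE the benchmark expects for that test case.
    static Kind classify(Map<Integer, Integer> expectedCweByTest, int testNumber, int reportedCwe) {
        Integer expected = expectedCweByTest.get(testNumber);
        if (expected != null && expected == reportedCwe) {
            return Kind.EXPECTED; // finding matches the test case's own category
        }
        // All CWEs the benchmark knows about, across every test case.
        Set<Integer> benchmarkCwes = new HashSet<>(expectedCweByTest.values());
        // Normal mode: a known benchmark CWE reported in the wrong test case.
        // Hard mode: a CWE that is not in the benchmark's expected set at all.
        return benchmarkCwes.contains(reportedCwe) ? Kind.EXTRA_NORMAL : Kind.EXTRA_HARD;
    }
}
```

For example, with test 1 expecting CWE-89 and test 2 expecting CWE-328, a CWE-89 finding in test 2 would fall under normal mode, while a finding for a CWE that appears nowhere in the map would fall under hard mode.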
Known limitation: Existing Reader parsers filter findings at the parser level -- most parsers only retain findings in BenchmarkTest* files and discard findings in helper classes (e.g., DatabaseHelper.java). Detecting extra CWEs in non-test-case infrastructure files would require extending individual parsers, which is a separate effort.
Usage
Issue #5: Code Block Analysis
Issue #6: Extra CWE Detector
    java -cp target/classes:... org.owasp.benchmarkutils.tools.DetectExtraCWEs \
        -e expectedresults-1.2.csv \
        -r results/ \
        -m both   # Modes: normal, hard, both (default: both)
Output Examples
Issue #5 Output (per tool)
Issue #6 Output (per tool)
Design Decisions
Kept Dave's existing analysis passes intact. The new Pass 2 and Stage 2 are additive -- they run after the original analysis and produce additional output sections.
Reused existing Reader parsers for Issue #6 rather than writing new XML parsers. This means all 57 tool formats are supported automatically. The tradeoff is that findings in non-test-case files are not captured (parsers filter them).
Single-unknown isolation produces the most actionable results. When a tool misses a vulnerability and exactly one snippet in that test case is unsupported, that snippet is almost certainly the cause. This is the core insight from Dave's Issue #5 spec.
Combination failures are reported separately. When all snippets are individually supported but the tool still fails, the specific combination is the problem -- not any individual snippet. These need manual investigation.
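The reasoning behind single-unknown isolation and combination failures can be sketched as follows. This is a minimal illustration, not the PR's implementation: the class FnIsolation and its types are hypothetical, and the AMBIGUOUS label for test cases with two or more unsupported snippets is an assumption, since the description only names the zero and one cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the code in CalculateToolCodeBlocksSupport.java.
class FnIsolation {
    enum Outcome { COMBINATION_FAILURE, ISOLATED, AMBIGUOUS }

    static final class Result {
        final Outcome outcome;
        final List<String> unsupported; // snippets in the FN not marked supported by Pass 1

        Result(Outcome outcome, List<String> unsupported) {
            this.outcome = outcome;
            this.unsupported = unsupported;
        }
    }

    // snippetsInFn: snippets used by one FN test case.
    // supported: snippets that appeared in at least one TP (the Pass 1 result).
    static Result analyze(List<String> snippetsInFn, Set<String> supported) {
        List<String> unsupported = new ArrayList<>();
        for (String snippet : snippetsInFn) {
            if (!supported.contains(snippet)) {
                unsupported.add(snippet);
            }
        }
        Outcome outcome;
        if (unsupported.isEmpty()) {
            outcome = Outcome.COMBINATION_FAILURE; // every snippet works alone, the mix fails
        } else if (unsupported.size() == 1) {
            outcome = Outcome.ISOLATED;            // the single unknown is the likely root cause
        } else {
            outcome = Outcome.AMBIGUOUS;           // assumption: several candidates, no isolation
        }
        return new Result(outcome, unsupported);
    }
}
```

Only the ISOLATED outcome would feed a set such as isolatedFnCause; the other two outcomes point to test cases that need manual investigation.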