-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Java: Add RegexExecution concept and recognise @Pattern annotation as sanitizer
#21310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces a shared “RegexExecution” concept for Java CodeQL libraries and migrates existing regex-related models/queries to use it, aiming to unify regex execution modeling across languages and reduce false positives.
Changes:
- Added
semmle.code.java.ConceptswithRegexExecution/RegexExecutionExprconcepts and wired it intojava.qll. - Extended existing Java regex API models (
String.matches,Pattern.compile,Pattern.matches,Matcher.matches) to implement the new concept. - Updated experimental and security libraries to use the new concept types instead of bespoke regex helpers.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| java/ql/src/experimental/Security/CWE/CWE-625/Regex.qll | Removes the (deprecated) local regex helper module. |
| java/ql/src/experimental/Security/CWE/CWE-625/PermissiveDotRegexQuery.qll | Switches to framework regex models/concepts for sinks and matching logic. |
| java/ql/lib/semmle/code/java/security/regexp/RegexInjection.qll | Updates sanitizer modeling to use the new PatternCompileCall abstraction. |
| java/ql/lib/semmle/code/java/security/Sanitizers.qll | Reworks regexp-guard matching to use RegexExecutionExpr::Range. |
| java/ql/lib/semmle/code/java/security/PathSanitizer.qll | Updates directory-character matching guard logic to use the new regex execution concept. |
| java/ql/lib/semmle/code/java/frameworks/Regex.qll | Adds call/model classes (e.g., PatternCompileCall, PatternMatchesCall, MatcherMatchesCall) implementing RegexExecutionExpr::Range. |
| java/ql/lib/semmle/code/java/JDK.qll | Updates StringMatchesCall to implement RegexExecutionExpr::Range. |
| java/ql/lib/semmle/code/java/Concepts.qll | New Java Concepts library including RegexExecution and related modeling modules. |
| java/ql/lib/java.qll | Exports the new Concepts library by importing it. |
Comments suppressed due to low confidence (1)
java/ql/lib/semmle/code/java/frameworks/Regex.qll:96
- Doc comment punctuation: this comment is missing a trailing period, unlike most other doc comments in this file.
/** A call to the `matches` method of `java.util.regex.Matcher` */
class MatcherMatchesCall extends MethodCall, RegexExecutionExpr::Range {
@Pattern annotation as sanitizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.
e571aef to
205a058
Compare
205a058 to
5bdf550
Compare
I expect some reduction in FPs due to the final commit expanded the scope of some sanitizers to include more regex execution methods. I plan to run QA to quantify that.
I welcome discussion about how to best deal with the fact that I want the concept to be the same as other languages, which use data flow nodes, but actually java mostly works at the levels of expressions. I've kind of done both for now. I think as we spread the use of shared libraries this difference in approach will eventually go away, or at least become easier to fix.