-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Prepare dataflow for local annotations #21138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
45e4b92 to
1242a7b
Compare
897bbbc to
b8c0ed3
Compare
e3579c9 to
734702a
Compare
Removes the dependence on the (global) `ModuleVariableNode.getARead()`, by adding a local version (that doesn't include `import *` reads) instead.
This may result in more nodes, but it should still be bounded by the number of global variables in the source code.
With `ModuleVariableNode`s now appearing for _all_ global variables (not just the ones that actually seem to be used), some of the tests changed a bit. Mostly this was in the form of new flow (because of new nodes that popped into existence). For some inline expectation tests, I opted to instead exclude these results, as there was no suitable location to annotate. For the normal tests, I just accepted the output (after having vetted it carefully, of course).
Explicitly adds a bunch of nodes that were previously (using a global analysis) identified as `ExtractedArgumentNode`s. These are then subsequently filtered out in `argumentOf` (which is global) by putting the call to `getCallArg` there instead of in the charpred.
Fixes the test failures that arose from making `ExtractedArgumentNode` local. For the consistency checks, we now explicitly exclude the `ExtractedArgumentNode`s (now much more plentiful due to the overapproximation) that don't have a corresponding `getCallArg` tuple. For various queries/tests using `instanceof ArgumentNode`, we instead us `isArgumentNode`, which explicitly filters out the ones for which `isArgumentOf` doesn't hold (which, again, is the case for most of the nodes in the overapproximation).
Uses the same trick as for `ExtractedArgumentNode`, wherein we postpone the global restriction on the charpred to instead be in the `argumentOf` predicate (which is global anyway). In addition to this, we also converted `CapturedVariablesArgumentNode` into a proper synthetic node, and added an explicit post-update node for it. These nodes just act as wrappers for the function part of call nodes. Thus, to make them work with the variable capture machinery, we simply map them to the closure node for the corresponding control-flow or post-update node.
As we now have many more capturing closure arguments, we must once again exclude the ones that don't actually have `argumentOf` defined.
New nodes means new results. Luckily we rarely have a test that selects _all_ dataflow nodes.
3b632d9 to
958c798
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the Python dataflow library to prepare for overlay[local] annotations. The changes are primarily internal and should not have any visible effects on query results, except for test outputs which now include additional intermediate nodes.
Changes:
- Expanded
ModuleVariableNodecreation to all global variables (previously only escaping variables) - Refactored captured variable argument nodes into separate synthetic nodes and ArgumentNode wrappers
- Distinguished local reads from import star reads via new
getALocalRead()method
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPublic.qll |
Removed restrictions on ModuleVariableNode creation and added getALocalRead() method; refactored ExtractedArgumentNode with approximation logic |
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowDispatch.qll |
Split CapturedVariablesArgumentNode into SynthCapturedVariablesArgumentNode and CapturedVariablesArgumentNodeAsArgumentNode to avoid non-monotonic recursion |
python/ql/lib/semmle/python/dataflow/new/internal/DataFlowPrivate.qll |
Added new synthetic node types to nodeIsHidden predicate |
python/ql/lib/semmle/python/dataflow/new/internal/LocalSources.qll |
Changed from getARead() to getALocalRead() to exclude import star reads |
python/ql/lib/semmle/python/dataflow/new/internal/VariableCapture.qll |
Added asClosureNode mappings for new synthetic argument nodes |
python/ql/src/Classes/InitCallsSubclass/InitCallsSubclassMethod.ql |
Replaced instanceof ArgumentNode check with isArgumentNode predicate |
python/ql/lib/utils/test/dataflow/callGraphConfig.qll |
Replaced instanceof ArgumentNode check with isArgumentNode predicate |
python/ql/lib/utils/test/dataflow/MaximalFlowTest.qll |
Added ModuleVariableNode filtering and replaced instanceof ArgumentNode with isArgumentNode |
python/ql/consistency-queries/DataFlowConsistency.ql |
Added missingArgumentCallExclude for overapproximated argument nodes |
python/ql/test/library-tests/frameworks/internal-ql-helpers/PoorMansFunctionResolutionTest.ql |
Excluded ModuleVariableNodes from test results |
python/ql/test/library-tests/dataflow/global-flow/test.py |
Updated comments and added $writes annotations for newly tracked global variables |
| Test expected outputs | Updated to reflect new ModuleVariableNode entries and additional flow paths |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This PR contains most of the changes that are needed in order to get all of the
overlay[local]annotations to compile.Reviewing on a commit-by-commit basis is highly recommended.
No change note required as these changes should not have any visible effect.