This PR revises the Split Miner implementation to better match the reference miner behavior across filtering, split discovery, join discovery, OR handling, lifecycle-aware SM2 logs, and BPMN reduction.
Key changes include:
Replace the previous split/join handling with an Oracle-style split discovery and RPST/SESE-based join discovery.
Add lifecycle-aware SM2 parsing with overlap-based concurrency detection for complex logs.
Pin SM2 filtering to the reference eta = 1.0 behavior and make SM2 OR handling mandatory.
Add gateway-map based inclusive-join handling, OR-split promotion, compact self-loop export, and optional BPMN gateway collapsing.
Update Split Miner smoke/regression tests to reflect the new reference-aligned behavior.
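As a rough illustration of what overlap-based concurrency detection can look like on lifecycle logs, the sketch below treats each activity instance as a start/complete interval and reports pairs whose intervals overlap within one trace. The tuple layout and the function name `overlapping_pairs` are assumptions made for this example; they are not taken from the PR's code, and the actual SM2 parsing may apply further rules.

```python
from itertools import combinations


def overlapping_pairs(trace):
    """Return unordered activity pairs whose [start, complete] intervals overlap.

    `trace` is a list of (activity, start, complete) tuples. This shape is an
    assumption for the sketch, not the PR's internal representation.
    """
    pairs = set()
    for (a, a_start, a_end), (b, b_start, b_end) in combinations(trace, 2):
        # Two intervals overlap when each starts strictly before the other ends.
        if a != b and a_start < b_end and b_start < a_end:
            pairs.add(frozenset((a, b)))
    return pairs


trace = [
    ("A", 0, 2),
    ("B", 3, 7),
    ("C", 5, 9),    # starts while B is still running
    ("D", 10, 12),
]
concurrent = overlapping_pairs(trace)
```

Here only B and C are reported, because C starts before B completes; A and D are strictly sequential with respect to every other instance.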
fit-alessandro-berti changed the title from "BPMN Reduction + Verification of results applied" to "Align Split Miner gateway handling and SM2 lifecycle behavior with the reference implementation" on Jun 15, 2026
Restore or replace coverage for the removed SM2 improper-completion regression. The previous test asserted that A can repeat through a D-free loop in the paper-style example; this branch no longer satisfies that property. If the behavior change is intentional, document it explicitly in the PR and changelog.
Consider compatibility shims for removed modules such as pm4py.algo.discovery.split_miner.heuristics, concurrency.refined, dfg_discovery.refined, and joins.classic, or call out the breaking import changes clearly.
Add focused unit tests for gateway_map.py and rpst_tree.py, especially loop joins, fake entry/exit flows, token generator placement, and non-biconnected graphs.
Add an explicit regression test showing that SM2 ignores user-provided eta and minimize_or_joins by design, so future reviewers do not mistake this for an accidental parameter bug.
Include a short rationale or reference comparison for the large changes to classic filtering, split discovery, and join discovery, because the expected gateway counts changed from the prior tests.
Split Miner: faithful reimplementation of classic & SM 2.0
Classic Split Miner and Split Miner 2.0 are now ports of the reference Java tools (splitminer.jar and sm2.jar/MineWithSMTC), built to reproduce their output exactly rather than the idealized figures in the papers. Validated on the SM-Experiment corpus: classic is byte-identical (isomorphic, same gateway labels) to splitminer.jar on the deterministic real-life logs; SM 2.0 is byte-identical to sm2.jar on 9 of 10 reference logs (the 10th, 2015 BPIC, is non-deterministic in the Java tool itself).
Changed
DFG filtering, concurrency oracle, Oracle split discovery, RPST/SESE join discovery, OR-join replacement and gateway collapse were reworked for byte-faithfulness. Because the target is the tool, not the papers, some hand-computed gateway counts in older tests changed — e.g. on the Augusto et al. (2019) running example the tool yields 2 AND-splits / 4 XOR-splits, not the 1/3 of Fig. 3c. Updated counts were each confirmed against the Java tool.
SM 2.0 now pins eta = 1.0 and always runs its OR handling (replaceIORs = false + OR-split heuristic), matching the reference. The eta and minimize_or_joins arguments are accepted for API symmetry but ignored by variant='sm2' (covered by a dedicated regression test).
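The shape of such a regression test might look like the sketch below. To keep it runnable without the library, it uses a hypothetical stand-in, `discover_stub`, that only returns the settings it would act on; the stand-in, its placeholder default for `eta`, and the test name are assumptions and do not reproduce the contents of split_miner_2_params_test.py.

```python
def discover_stub(log, variant="classic", eta=0.4, minimize_or_joins=False):
    """Hypothetical stand-in for a discovery call.

    Returns the settings that would actually be used, so the test can check
    that user-supplied values are discarded for variant='sm2'.
    The default eta here is a placeholder chosen for the sketch.
    """
    if variant == "sm2":
        eta = 1.0                 # pinned, regardless of what the caller passed
        minimize_or_joins = None  # not consulted; OR handling always runs
    return {"variant": variant, "eta": eta, "minimize_or_joins": minimize_or_joins}


def test_sm2_ignores_eta_and_minimize_or_joins():
    baseline = discover_stub([], variant="sm2")
    for eta in (0.0, 0.4, 1.0):
        for flag in (False, True):
            result = discover_stub([], variant="sm2", eta=eta, minimize_or_joins=flag)
            # Every combination must collapse to the same effective settings.
            assert result == baseline


test_sm2_ignores_eta_and_minimize_or_joins()
```

In a real test the comparison would presumably be made on the discovered models rather than on a settings dictionary, but the intent is the same: varying the ignored arguments must not change the result.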
Removed (behavior change)
The SM 2.0 improper-completion heuristic (which restructured an A→{B||C}→D→A loop so A could repeat through a D-free branch) was an approximation not present in the reference tool and has been dropped. Verified: Java SM 2.0 closes that loop after D, identical to this port. Its old regression test is replaced by tests/split_miner_2_loop_test.py, which pins the faithful behavior.
Internal modules from the earlier approximate implementation were removed: …split_miner.heuristics, …concurrency.refined, …dfg_discovery.refined, …joins.classic. These were never part of the public API (pm4py.discover_bpmn_split_miner), but since they shipped in 2.7.22.4 each path now raises a descriptive ImportError pointing to the faithful replacement.
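One common way to get this behavior is a "tombstone" module whose body does nothing but raise. The sketch below demonstrates the pattern on a throwaway package; `legacy_pkg`, `legacy_pkg.heuristics`, and `legacy_pkg.replacement` are hypothetical names used only for illustration, and the PR may implement its shims differently.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Body of the tombstone module: importing it fails with a pointer to the successor.
TOMBSTONE = '''\
raise ImportError(
    "legacy_pkg.heuristics was removed; use legacy_pkg.replacement instead."
)
'''


def try_import_removed_module():
    """Build a throwaway package with a tombstone module and try to import it."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "legacy_pkg"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("")
        (pkg / "heuristics.py").write_text(TOMBSTONE)
        sys.path.insert(0, tmp)
        try:
            importlib.import_module("legacy_pkg.heuristics")
        except ImportError as exc:
            return str(exc)
        finally:
            # Leave the interpreter state as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("legacy_pkg.heuristics", None)
            sys.modules.pop("legacy_pkg", None)
    return None


message = try_import_removed_module()
```

Compared with simply deleting the file, this turns a bare ModuleNotFoundError into a message that tells the caller where to go next.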
Added (tests)
split_miner_2_loop_test.py — SM 2.0 loop handling (faithful, no D-free loop).
split_miner_2_params_test.py — SM 2.0 ignores eta/minimize_or_joins by design.
split_miner_internals_test.py — focused units for rpst_tree (polygon/bond/rigid/rejection) and gateway_map (OR-resolution, loop regions, token-generator placement, inclusive-retention).