Align Split Miner gateway handling and SM2 lifecycle behavior with the reference implementation #557

Open
cRennert wants to merge 4 commits into process-intelligence-solutions:release from cRennert:release


cRennert commented Jun 11, 2026

Contributor

This PR revises the Split Miner implementation to better match the reference miner behavior across filtering, split discovery, join discovery, OR handling, lifecycle-aware SM2 logs, and BPMN reduction.

Key changes include:

  • Replace the previous split/join handling with an Oracle-style split discovery and RPST/SESE-based join discovery.
  • Add lifecycle-aware SM2 parsing with overlap-based concurrency detection for complex logs.
  • Pin SM2 filtering to the reference eta = 1.0 behavior and make SM2 OR handling mandatory.
  • Add gateway-map based inclusive-join handling, OR-split promotion, compact self-loop export, and optional BPMN gateway collapsing.
  • Update Split Miner smoke/regression tests to reflect the new reference-aligned behavior.
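To illustrate what overlap-based concurrency detection on a lifecycle-aware log can mean, here is a minimal sketch. It assumes each activity instance carries a start and a completion timestamp and treats two activities as concurrent when their intervals overlap in a trace. The function name `overlapping_pairs` and the tuple layout are hypothetical and are not taken from this PR's code.

```python
from itertools import combinations


def overlapping_pairs(trace):
    """Return the unordered activity pairs whose intervals overlap.

    `trace` is a list of (activity, start, end) tuples for one case.
    This layout is an assumption made for illustration only.
    """
    pairs = set()
    for (a, s1, e1), (b, s2, e2) in combinations(trace, 2):
        # Two half-open intervals overlap when each starts before the other ends.
        if a != b and s1 < e2 and s2 < e1:
            pairs.add(frozenset((a, b)))
    return pairs


# A and B overlap in time; C starts exactly when B completes, so it does not.
example_trace = [("A", 0, 2), ("B", 1, 3), ("C", 3, 4)]
concurrent = overlapping_pairs(example_trace)
```

In this toy trace only A and B are reported as concurrent. A real oracle would aggregate such observations over all traces before deciding on concurrency.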

fit-alessandro-berti changed the title from "BPMN Reduction + Verification of results applied" to "Align Split Miner gateway handling and SM2 lifecycle behavior with the reference implementation" on Jun 15, 2026
@fit-alessandro-berti

Contributor

Dear @cRennert

Thanks for the contribution.

Could you double-check the following comments?

Possible Revisions Before Merge

  • Restore or replace coverage for the removed SM2 improper-completion regression. The previous test asserted that A can repeat through a D-free loop in the paper-style example; this branch no longer satisfies that property. If the behavior change is intentional, document it explicitly in the PR and changelog.
  • Consider compatibility shims for removed modules such as pm4py.algo.discovery.split_miner.heuristics, concurrency.refined, dfg_discovery.refined, and joins.classic, or call out the breaking import changes clearly.
  • Add focused unit tests for gateway_map.py and rpst_tree.py, especially loop joins, fake entry/exit flows, token generator placement, and non-biconnected graphs.
  • Add an explicit regression test showing that SM2 ignores user-provided eta and minimize_or_joins by design, so future reviewers do not mistake this for an accidental parameter bug.
  • Include a short rationale or reference comparison for the large changes to classic filtering, split discovery, and join discovery, because the expected gateway counts changed from the prior tests.

cRennert commented Jun 26, 2026

Contributor Author

Possible revisions were addressed in d59dc67, 76eba37, and 40b3cf4 as follows:

Split Miner: faithful reimplementation of classic & SM 2.0
Classic Split Miner and Split Miner 2.0 are now ports of the reference Java tools (splitminer.jar and sm2.jar/MineWithSMTC), built to reproduce the tools' output exactly rather than the idealized figures in the papers. Both were validated on the SM-Experiment corpus. Classic is byte-identical (isomorphic, same gateway labels) to splitminer.jar on the deterministic real-life logs. SM 2.0 is byte-identical to sm2.jar on 9 of 10 reference logs; the 10th, 2015 BPIC, is non-deterministic in the Java tool itself.

Changed

  • DFG filtering, concurrency oracle, Oracle split discovery, RPST/SESE join discovery, OR-join replacement and gateway collapse were reworked for byte-faithfulness. Because the target is the tool, not the papers, some hand-computed gateway counts in older tests changed — e.g. on the Augusto et al. (2019) running example the tool yields 2 AND-splits / 4 XOR-splits, not the 1/3 of Fig. 3c. Updated counts were each confirmed against the Java tool.
  • SM 2.0 now pins eta = 1.0 and always runs its OR handling (replaceIORs = false + OR-split heuristic), matching the reference. The eta and minimize_or_joins arguments are accepted for API symmetry but ignored by variant='sm2' (covered by a dedicated regression test).
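The parameter pinning described above can be pictured with a small sketch. Everything here is hypothetical: `MinerSettings`, `resolve_settings`, and the mapping of `minimize_or_joins` onto a `replace_iors` flag are illustrative assumptions, not the port's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinerSettings:
    """Hypothetical container for the settings a miner run ends up using."""
    eta: float
    replace_iors: bool
    or_split_heuristic: bool


def resolve_settings(variant, eta, minimize_or_joins):
    """Sketch: accept the same arguments for both variants, ignore them for sm2."""
    if variant == "sm2":
        # Pinned to the reference behavior; user-supplied values are not consulted.
        return MinerSettings(eta=1.0, replace_iors=False, or_split_heuristic=True)
    # Assumed pass-through for the classic variant.
    return MinerSettings(eta=eta, replace_iors=minimize_or_joins,
                         or_split_heuristic=False)


# Different user inputs lead to identical sm2 settings.
first = resolve_settings("sm2", eta=0.2, minimize_or_joins=True)
second = resolve_settings("sm2", eta=0.9, minimize_or_joins=False)
```

A regression test in this spirit would assert that `first == second`, which makes the "ignored by design" contract explicit for future reviewers.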

Removed (behavior change)

  • The SM 2.0 improper-completion heuristic (which restructured an A→{B||C}→D→A loop so A could repeat through a D-free branch) was an approximation not present in the reference tool and has been dropped. Verified: Java SM 2.0 closes that loop after D, identical to this port. Its old regression test is replaced by tests/split_miner_2_loop_test.py, which pins the faithful behavior.
  • Internal modules from the earlier approximate implementation were removed: …split_miner.heuristics, …concurrency.refined, …dfg_discovery.refined, …joins.classic. These were never part of the public API (pm4py.discover_bpmn_split_miner), but since they shipped in 2.7.22.4 each path now raises a descriptive ImportError pointing to the faithful replacement.
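For readers wondering how a removed path can fail with a descriptive message, the sketch below shows one possible mechanism, an import hook. The PR does not say which mechanism it uses (a stub module that raises on import would work equally well), and the module names are hypothetical stand-ins because the real paths are elided above.

```python
import importlib
import sys
from importlib.abc import MetaPathFinder

# Hypothetical old-name -> replacement mapping, for illustration only.
REMOVED_MODULES = {
    "legacy_split_miner_heuristics": "new_split_miner_oracle",
}


class RemovedModuleFinder(MetaPathFinder):
    """Intercept imports of removed modules and explain where to go instead."""

    def find_spec(self, fullname, path=None, target=None):
        replacement = REMOVED_MODULES.get(fullname)
        if replacement is None:
            return None  # not ours: let the normal import machinery continue
        raise ImportError(
            f"{fullname!r} was removed; use {replacement!r} instead.",
            name=fullname,
        )


sys.meta_path.insert(0, RemovedModuleFinder())


def try_import(name):
    """Return the ImportError message for `name`, or None if the import works."""
    try:
        importlib.import_module(name)
    except ImportError as exc:
        return str(exc)
    return None


message = try_import("legacy_split_miner_heuristics")
```

Unrelated imports are unaffected because the finder returns `None` for any name it does not recognize.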

Added (tests)

  • split_miner_2_loop_test.py — SM 2.0 loop handling (faithful, no D-free loop).
  • split_miner_2_params_test.py — SM 2.0 ignores eta/minimize_or_joins by design.
  • split_miner_internals_test.py — focused units for rpst_tree (polygon/bond/rigid/rejection) and gateway_map (OR-resolution, loop regions, token-generator placement, inclusive-retention).
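The "no D-free loop" property pinned by the loop test can be stated as a simple graph check. The sketch below works on a flattened activity graph rather than a real BPMN model, and the helper `has_cycle_avoiding` is hypothetical; it only illustrates the property, not the contents of the actual test file.

```python
from collections import deque


def has_cycle_avoiding(edges, start, avoid):
    """True if `start` can reach itself again without passing through `avoid`."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, set()).add(target)

    # Breadth-first search from the successors of `start`, never entering `avoid`.
    queue = deque(n for n in successors.get(start, ()) if n != avoid)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == start:
            return True
        for nxt in successors.get(node, ()):
            if nxt != avoid and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Faithful shape: A -> {B || C} -> D -> A, so the loop closes only after D.
faithful = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
# Shape of the dropped heuristic: an extra branch lets A repeat without D.
with_d_free_branch = faithful + [("B", "A")]

faithful_has_d_free_loop = has_cycle_avoiding(faithful, "A", "D")
old_has_d_free_loop = has_cycle_avoiding(with_d_free_branch, "A", "D")
```

On the faithful shape the check is false, while the extra B-to-A edge makes it true, which is the distinction the old and new regression tests disagree on.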
