Skip to content

Commit a2cc794

Browse files
sawenzelclaude
andcommitted
AODBcRewriter: restore BC sort in Stage 0, keep track regrouping
Commit 28b44ef replaced the Stage 0 BC sort with an order-preserving dedup that std::abort()s unless the input BC table is already globalBC-sorted. That contradicts the tool's purpose (PURPOSE (a): repairing non-monotonic fGlobalBC in merged AO2Ds), so it aborted on exactly the files it exists to fix -- the "doesn't run to completion" regression. Revert only that Stage 0 change (restore stage0_sortBCs, drop the abort). Keep 28b44ef's Stage 1b track regrouping and the fIndexTracks* remaps: that mechanism is the correct fix for the -1-group split from b11cd3d (track fIndexCollisions values were remapped but rows left in input order, so per-collision groups -- including the -1 ambiguous group -- were no longer contiguous, tripping ArrowTableSlicingCache::validateOrder downstream). It only ever failed because the aborting Stage 0 stopped it running. Sorting BCs implies a BC -> collisions (Stage 1) -> tracks (Stage 1b) reorder cascade, which those stages now handle completely. Add a validator check (checkCollisionGroupContiguity) mirroring O2's slicing invariant: every fIndexCollisions group must be one contiguous run. Added a history note in Stage 0 against re-introducing the order-preserving abort. Not yet validated against a real merged AO2D / analysis task. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1 parent cdb0b6e commit a2cc794

2 files changed

Lines changed: 178 additions & 40 deletions

File tree

MC/utils/AODBcRewriter.C

Lines changed: 86 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,8 @@
1414
//
1515
// This tool fixes all three problems in one pass per DF_ directory:
1616
//
17-
// Stage 0 — Deduplicate the BC table in place (order-preserving; the input
18-
// must already be globalBC-sorted). Build BC permutation map:
19-
// bcPerm[oldBCrow] = newBCrow (monotonic non-decreasing).
17+
// Stage 0 — Sort & deduplicate the BC table. Build BC permutation map:
18+
// bcPerm[oldBCrow] = newBCrow.
2019
//
2120
// Stage 1 — Process every table that carries fIndexBCs / fIndexBC.
2221
// Remap the index via bcPerm, sort rows by the new index, and
@@ -502,31 +501,33 @@ static PermMap rewriteTable(TTree *src, TDirectory *dirOut,
502501
}
503502

504503
// ============================================================================
505-
// SECTION 4 — Stage 0: BC table deduplication (order-preserving)
504+
// SECTION 4 — Stage 0: BC table sort + deduplication
506505
// ============================================================================
507506
//
508-
// Reads fGlobalBC from the BC tree, drops exact-duplicate BC values IN PLACE
509-
// (preserving input row order), and writes the compacted table. Returns
510-
// bcPerm[oldRow] = newRow.
507+
// Reads fGlobalBC from the BC tree, sorts rows, drops exact-duplicate BC
508+
// values, and writes the compacted table. Returns bcPerm[oldRow] = newRow.
511509
//
512-
// The dedup is deliberately order-preserving so that bcPerm is monotonic
513-
// non-decreasing. This matters because every BC-indexed table (collisions,
514-
// FT0/FV0/FDD/Zdc, ...) is sliced per BC and must stay sorted by its fIndexBCs,
515-
// and collisions are in turn the grouping anchor for tracks (sorted by
516-
// fIndexCollisions). A non-order-preserving BC remap would force a full reorder
517-
// cascade BC -> collisions -> tracks to keep all those groupings valid; keeping
518-
// bcPerm monotonic means none of those tables need to be reordered at all.
510+
// IMPORTANT (history — do not "optimize" this back):
511+
// A previous revision replaced this sort with an order-preserving in-place
512+
// dedup that std::abort()ed unless the input BC table was already globalBC-
513+
// sorted. That directly contradicts this tool's PURPOSE (a) at the top of the
514+
// file — repairing *non-monotonic* fGlobalBC in MERGED AO2Ds — so it aborted on
515+
// exactly the files it exists to fix ("doesn't run to completion"). We sort
516+
// unconditionally instead.
519517
//
520-
// This REQUIRES the input BC table to already be sorted by fGlobalBC (the
521-
// standard AO2D invariant; also asserted on the output by validateDF check #1).
522-
// We assert it loudly rather than silently emit a non-monotonic BC table.
518+
// Sorting BCs does imply a reorder cascade: collisions are sorted by their
519+
// remapped fIndexBCs in Stage 1, and the collision-grouped track tables must
520+
// then be re-grouped to follow the new collision order in Stage 1b. That
521+
// cascade is real and unavoidable for non-monotonic input; it is handled
522+
// explicitly and completely by those stages. This is the correct fix — keep
523+
// it. An "assert already sorted" shortcut here is a known dead end.
523524

524525
struct BCStage0Result {
525-
PermMap bcPerm; // bcPerm[oldRow] = newRow in the deduped BC table
526+
PermMap bcPerm; // bcPerm[oldRow] = newRow in sorted/deduped BC table
526527
Long64_t nUnique = 0;
527528
};
528529

529-
static BCStage0Result stage0_dedupBCs(TTree *treeBCs, TDirectory *dirOut) {
530+
static BCStage0Result stage0_sortBCs(TTree *treeBCs, TDirectory *dirOut) {
530531
BCStage0Result res;
531532
Long64_t n = treeBCs->GetEntries();
532533
if (n == 0) return res;
@@ -539,28 +540,20 @@ static BCStage0Result stage0_dedupBCs(TTree *treeBCs, TDirectory *dirOut) {
539540
std::vector<ULong64_t> gbcs(n);
540541
for (Long64_t i = 0; i < n; ++i) { treeBCs->GetEntry(i); gbcs[i] = gbc; }
541542

542-
// The BC table must already be sorted by fGlobalBC: the dedup below is
543-
// order-preserving (merges only adjacent equal-globalBC rows), which keeps
544-
// bcPerm monotonic and avoids a reorder cascade through collisions/tracks.
545-
// A non-monotonic input would silently break that guarantee, so abort loudly.
546-
for (Long64_t i = 1; i < n; ++i) {
547-
if (gbcs[i] < gbcs[i - 1]) {
548-
std::cerr << "FATAL: O2bc_* table is not sorted by fGlobalBC (row " << i
549-
<< " globalBC=" << gbcs[i] << " < row " << (i - 1)
550-
<< " globalBC=" << gbcs[i - 1] << ").\n"
551-
<< " AODBcRewriter requires a globalBC-sorted BC table so that\n"
552-
<< " BC deduplication is order-preserving; aborting.\n";
553-
std::abort();
554-
}
555-
}
543+
// Sort row indices by fGlobalBC (stable, so equal-globalBC rows keep their
544+
// input order before being collapsed below).
545+
std::vector<Long64_t> order(n);
546+
std::iota(order.begin(), order.end(), 0);
547+
std::stable_sort(order.begin(), order.end(),
548+
[&](Long64_t a, Long64_t b){ return gbcs[a] < gbcs[b]; });
556549

557-
// Build the deduplicated row list and the (monotonic) permutation in source
558-
// row order: adjacent rows sharing a globalBC collapse onto one output row.
550+
// Build the deduplicated row list and the permutation: rows sharing a
551+
// globalBC collapse onto one output row.
559552
res.bcPerm.assign(n, -1);
560553
std::vector<Long64_t> rowOrder; // source rows to keep, in output order
561554
ULong64_t prev = 0;
562555
Int_t newRow = -1;
563-
for (Long64_t srcRow = 0; srcRow < n; ++srcRow) {
556+
for (Long64_t srcRow : order) {
564557
if (newRow < 0 || gbcs[srcRow] != prev) {
565558
++newRow;
566559
prev = gbcs[srcRow];
@@ -571,7 +564,7 @@ static BCStage0Result stage0_dedupBCs(TTree *treeBCs, TDirectory *dirOut) {
571564
}
572565
res.nUnique = rowOrder.size();
573566

574-
std::cout << " BC stage: " << n << " rows -> " << res.nUnique << " unique (in-place dedup)\n";
567+
std::cout << " BC stage: " << n << " rows -> " << res.nUnique << " unique (sorted)\n";
575568

576569
// Write the BC table (no index remapping needed for the table itself)
577570
rewriteTable(treeBCs, dirOut, rowOrder, /*indexBranch=*/"", /*parentPerm=*/{});
@@ -1159,10 +1152,10 @@ static void processDF(TDirectory *dirIn, TDirectory *dirOut) {
11591152
return;
11601153
}
11611154

1162-
// ---- Stage 0: deduplicate BCs (order-preserving) ----
1155+
// ---- Stage 0: sort & deduplicate BCs ----
11631156
std::cout << "-- Stage 0: BCs --\n";
11641157
dirOut->cd();
1165-
BCStage0Result s0 = stage0_dedupBCs(treeBCs, dirOut);
1158+
BCStage0Result s0 = stage0_sortBCs(treeBCs, dirOut);
11661159
if (treeFlags) stage0_copyBCFlags(treeFlags, dirOut, s0.bcPerm);
11671160

11681161
// Track which tree names have been written so we don't double-write
@@ -1319,6 +1312,37 @@ static Long64_t checkIndexRange(TTree *t, const char *branchName,
13191312
return bad;
13201313
}
13211314

1315+
// Verify the collision-grouping invariant that O2's ArrowTableSlicingCache
1316+
// (validateOrder) enforces on the consumer side: in a table grouped by
1317+
// fIndexCollisions, every distinct index value — including the -1 "ambiguous"
1318+
// group — must occupy a single contiguous run of rows. A split group is
1319+
// exactly the failure that crashed event-selection downstream
1320+
// ("Table ... index fIndexCollisions has a group with index -1 that is split
1321+
// by N"). This is the post-write counterpart to Stage 1b's regrouping.
1322+
// Returns the number of groups found split into >1 run (0 = OK).
1323+
static Long64_t checkCollisionGroupContiguity(TTree *t) {
1324+
TBranch *br = t->GetBranch("fIndexCollisions");
1325+
if (!br) return 0;
1326+
TLeaf *leaf = (TLeaf*)br->GetListOfLeaves()->At(0);
1327+
if (!leaf || TString(leaf->GetTypeName()) != "Int_t") return 0;
1328+
if (leaf->GetLen() != 1 || leaf->GetLeafCount()) return 0; // scalar only
1329+
1330+
Int_t idx = 0;
1331+
br->SetAddress(&idx);
1332+
std::unordered_set<Int_t> closed; // groups whose run has already ended
1333+
Int_t cur = 0; bool have = false;
1334+
Long64_t split = 0;
1335+
for (Long64_t i = 0; i < t->GetEntries(); ++i) {
1336+
br->GetEntry(i);
1337+
if (have && idx == cur) continue; // current run continues
1338+
if (have) closed.insert(cur); // previous run just ended
1339+
if (closed.count(idx)) ++split; // re-opening a group seen earlier
1340+
cur = idx; have = true;
1341+
}
1342+
br->ResetAddress();
1343+
return split;
1344+
}
1345+
13221346
static bool validateDF(TDirectory *d) {
13231347
bool ok = true;
13241348

@@ -1438,6 +1462,29 @@ static bool validateDF(TDirectory *d) {
14381462
}
14391463
}
14401464

1465+
// ---- Collision-group contiguity (slicing invariant) ----
1466+
// O2's slicing cache requires each fIndexCollisions group (incl. -1) to be a
1467+
// single contiguous run. This is the exact invariant whose violation crashed
1468+
// event-selection; it is what Stage 1b re-establishes. Check every
1469+
// collision-grouped track table (paste-join children follow their parent's
1470+
// row order, so checking the parent suffices).
1471+
TIter it3(d->GetListOfKeys());
1472+
TKey *k3;
1473+
while ((k3 = (TKey*)it3())) {
1474+
TObject *obj = d->Get(k3->GetName());
1475+
if (!obj || !obj->InheritsFrom(TTree::Class())) continue;
1476+
TTree *t = (TTree*)obj;
1477+
if (!isCollGroupedTrackTable(t->GetName())) continue;
1478+
if (isPasteJoinChild(t->GetName())) continue;
1479+
Long64_t split = checkCollisionGroupContiguity(t);
1480+
if (split > 0) {
1481+
std::cerr << " [FAIL] " << t->GetName()
1482+
<< ": fIndexCollisions has " << split
1483+
<< " group(s) split into non-contiguous runs (slicing will abort)\n";
1484+
ok = false;
1485+
}
1486+
}
1487+
14411488
return ok;
14421489
}
14431490

MC/utils/CLAUDE.md

Lines changed: 92 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,77 @@
11
# CLAUDE.md — AODBcRewriter Development Handoff
22

3+
## Current work state (handoff — 2026-06-09)
4+
5+
**Branch:** `fix/aodbcrewriter-track-regroup` · **Last commit:** `8bb9d30b3c`
6+
(committed, not yet pushed/validated).
7+
8+
### The bug being fixed: tracks' `-1` collision group split
9+
10+
Downstream O2 analysis (`o2-analysis-event-selection`) was crashing with:
11+
12+
```
13+
[FATAL] Table Tracks_IU index fIndexCollisions has a group with index -1
14+
that is split by 776
15+
```
16+
17+
O2's `ArrowTableSlicingCache::validateOrder` requires every `fIndexCollisions`
18+
group in a track table — **including the `-1` "ambiguous" group** — to be one
19+
contiguous run of rows. Two commits broke this:
20+
21+
- **`b11cd3de`** added a value-wise remap of tracks' `fIndexCollisions` via
22+
`collPerm` but left the track **rows in input order**. Since Stage 1 reorders
23+
`O2collision` (sort by remapped `fIndexBCs`), the remapped values no longer
24+
formed contiguous groups → the split. (b11cd3de made the *values* correct at
25+
the cost of the *grouping*; the complete fix needs **both** — reorder rows
26+
*and* remap values.)
27+
- **`28b44ef`** (a colleague's attempt) then replaced the Stage 0 BC **sort**
28+
with an order-preserving dedup that `std::abort()`s unless the input BC table
29+
is already `globalBC`-sorted. That contradicts PURPOSE (a) — repairing
30+
*non-monotonic* BCs in merged files — so it aborted on exactly the files the
31+
tool exists to fix ("doesn't run to completion").
32+
33+
### What the current fix does
34+
35+
1. **Reverted only the Stage 0 change** of `28b44ef`: restored `stage0_sortBCs`
36+
(sort + dedup) and removed the abort. Non-monotonic merged BCs are repaired
37+
again, as intended.
38+
2. **Kept `28b44ef`'s Stage 1b** (`stage1b_reorderTrackTables`, Section 9b) and
39+
the `fIndexTracks*` / `fIndexMFTTracks` / `fIndexFwdTracks` remaps in
40+
`processPasteJoinTables`. This regroup-tracks-by-remapped-`fIndexCollisions`
41+
mechanism is the *correct* fix for the split; it only ever failed because the
42+
aborting Stage 0 stopped it from running. Rewriting it from scratch was
43+
judged higher-risk than keeping the reviewed logic.
44+
3. **Added validator check** `checkCollisionGroupContiguity` (Section 11):
45+
mirrors O2's slicing invariant — flags any `fIndexCollisions` group split
46+
into >1 run. Runs over every collision-grouped track table.
47+
48+
**Design note / cascade:** sorting BCs is unavoidable for non-monotonic input
49+
and forces a reorder cascade **BC (Stage 0) → collisions (Stage 1) → tracks
50+
(Stage 1b)**, propagated to paste-join children and all track references. Do
51+
**not** re-introduce an "assert already sorted / order-preserving" Stage 0 — it
52+
is a known dead end (see the history note in the Section 4 code comment).
53+
54+
### Not done yet / next steps for whoever picks this up
55+
56+
- **UNVALIDATED on real data.** Parses cleanly in cling (`.L AODBcRewriter.C`),
57+
but has *not* been run on a real merged AO2D, nor through the analysis task
58+
that crashed. The macro is interpreter-only; `.L AODBcRewriter.C+` (ACLiC)
59+
fails on missing std includes — pre-existing, not a regression.
60+
Test sequence: `AODBcRewriter("AO2D.root","out.root")`
61+
`AODBcRewriterValidate("out.root")` (expect no `[FAIL]`) → then the **real**
62+
`o2-analysis-event-selection` on `out.root` (ground truth).
63+
- **Known fragility (whack-a-mole):** the `fIndexTracks*` reference remap is a
64+
hardcoded enumeration in `processPasteJoinTables` and the validator's
65+
`kIndexBranchToTable`. A missed reference into a reordered track table = silent
66+
corruption. Longer term, *derive* index→referent relationships from the AO2D
67+
column-name conventions instead of enumerating.
68+
- **Biggest gap:** there is no executable analysis-level CI for this tool, so
69+
regressions are only found in production with delay. Building a reproducer
70+
(merged AO2D that triggers the split/abort) + a CI check that runs the real
71+
task is the agreed top priority after this fix lands.
72+
73+
---
74+
375
## What this tool does
476

577
`AODBcRewriter.C` is a ROOT macro that fixes structural integrity problems in
@@ -87,7 +159,8 @@ order strictly follow its paste-join parent.
87159
| 5 | `stage0_copyBCFlags` | Copy BC flags table following BC row selection |
88160
| 6 | `MCCollKey`, `MCCollKeyHash`, `stage1_BCindexedTables` | Process all BC-indexed tables; deduplicate MCCollisions |
89161
| 7 | `stage2_MCCollIndexedTables` | Process all MCCollision-indexed tables; drop rows whose parent was deduped |
90-
| 8 | `rowOrderFromPerm`, `findPermByPrefix`, `processPasteJoinTables` | Reorder paste-joined tables to follow their parent (1:1 row count guaranteed); remap any of their own index columns value-wise; copy unrelated tables verbatim |
162+
| 9b | `isCollGroupedTrackTable`, `stage1b_reorderTrackTables` | **Stage 1b**: regroup collision-grouped track tables (`O2track_iu`, `O2mfttrack`, `O2fwdtrack`) by remapped `fIndexCollisions` (`-1` sinks to a contiguous tail); publish track perms so children/references follow. Restores the O2 slicing invariant after the BC→collision reorder cascade |
163+
| 8 | `rowOrderFromPerm`, `findPermByPrefix`, `processPasteJoinTables` | Reorder paste-joined tables to follow their parent (1:1 row count guaranteed); remap any of their own index columns value-wise (incl. `fIndexTracks*` via the Stage 1b track perms); copy unrelated tables verbatim |
91164
| 9 | `copyNonTreeObjects` | Copy TMap metadata and other non-TTree objects |
92165
| 10 | `processDF` | Orchestrates all stages for one `DF_*` directory |
93166
| 11 | `AODBcRewriter` | Top-level entry: opens files, iterates `DF_*` dirs, preserves compression |
@@ -227,6 +300,24 @@ The new validator catches the regression class as
227300
`[FAIL] paste-join size mismatch: O2mccollisionlabel* has N rows but parent
228301
O2collision* has M`.
229302
303+
### ~~9. Tracks' `-1` collision group split after BC/collision reorder~~ (RESOLVED — pending validation)
304+
305+
See the **Current work state** section at the top for the full story. In short:
306+
after Stage 1 reorders `O2collision`, the collision-grouped track tables must be
307+
**reordered** (not just have their `fIndexCollisions` values remapped) so each
308+
group — including the `-1` ambiguous group — stays one contiguous run, as O2's
309+
`ArrowTableSlicingCache::validateOrder` requires.
310+
311+
**Fix:** Stage 1b (`stage1b_reorderTrackTables`, Section 9b) stable-sorts each
312+
track table by remapped `fIndexCollisions` (`-1` to a contiguous tail), publishes
313+
the track perm, and `processPasteJoinTables` follows it for paste-join children
314+
and remaps every `fIndexTracks*` reference. New validator check
315+
`checkCollisionGroupContiguity` flags split groups as
316+
`[FAIL] ... fIndexCollisions has N group(s) split into non-contiguous runs`.
317+
318+
**Status:** committed on `fix/aodbcrewriter-track-regroup` (`8bb9d30b3c`), parses
319+
in cling, **not yet run on a real merged AO2D or the failing analysis task.**
320+
230321
---
231322
232323
## Testing checklist

0 commit comments

Comments
 (0)