HEPTv2: End-to-End Efficient Point Transformer for Charged-Particle Reconstruction

Siqi Miao¹†, Shitij Govil¹†, Jack P. Rodgers², Mia Liu², Javier Duarte³, Shih-Chieh Hsu⁴, Yuan-Tang Chou⁴*, Pan Li¹*

¹Georgia Institute of Technology · ²Purdue University · ³UC San Diego · ⁴University of Washington
† Equal contribution · * Corresponding authors

Figure 1. Overview of charged-particle tracking and the HEPTv2 pipeline. A collider detector records sparse measurements; a high-pileup event produces a large unordered hit cloud, and reconstruction associates hits into particle trajectories. HEPTv2 couples a locality-aware point encoder with a sectorized track decoder under joint end-to-end supervision.

Introduction

Charged-particle tracking is a core reconstruction task in high-energy physics (HEP), in which sparse detector hits must be associated into particle trajectories under severe combinatorial ambiguity. At the High-Luminosity LHC (HL-LHC), this must be done under much higher pile-up while preserving both tracking quality and computational efficiency.

Existing graph-based approaches achieve strong performance, but their end-to-end runtime is often dominated by costly graph construction and processing. Prior transformer-based approaches avoid explicit graph processing, yet still rely on auxiliary stages such as hit filtering or clustering and are therefore not optimized end-to-end.

HEPTv2 is a single-stage, end-to-end efficient point transformer for charged-particle tracking. It couples a locality-aware point encoder with a sectorized track decoder, predicting final tracks within an end-to-end trainable pipeline:

The encoder applies locality-sensitive hashing (LSH) directly in detector coordinate space (η, φ), serializing the unordered hit cloud into 1D sequences in which spatially nearby hits stay close. This preserves tracking-relevant geometric neighborhoods while enabling block-wise local attention with cost linear in the number of hits N.
The decoder maintains a fixed bank of M learnable track slots and predicts the hit-to-track assignment matrix directly. To tame ambiguity at full-event scale, it partitions the event into k broad azimuthal φ-sectors and solves k smaller assignment problems before merging them into the final event-level reconstruction.
Encoder and decoder are trained jointly under a unified objective, so the encoder learns trajectory-informed, discriminative per-hit representations and the decoder can directly predict hit-to-track assignments end-to-end.

On the TrackML benchmark, HEPTv2 reaches 98.6% double-majority (DM) tracking efficiency with a fake rate of about 0.8%, at ~15 ms inference latency and 0.4 GB peak memory per event on a single NVIDIA A100 GPU; latency and memory both scale near-linearly up to 5 × 10⁵ hits. HEPTv2 attains the best accuracy–latency trade-off among the compared methods, improving DM efficiency by more than 4.5 percentage points over the strongest prior transformer baseline and by 1.1–2.2 percentage points over highly optimized graph-based pipelines, while reducing latency by about 7× and 38–52×, respectively.

Results on TrackML

Model	DM efficiency (ε_pT>0.9)	Fake rate (f_pT>0.9)	Latency (ms)	Memory (GB)
OC-GNN	96.4%	0.9%	571.5	5.4
ACORN-GNN	97.5%	0.9%	783.7	16.6
HEPT + DBSCAN	89.6%	3.3%	105.5	7.6
Two-stage MF	94.1%	0.7%	99	–
HEPTv2	98.6%	0.8%	15.1	0.4

Evaluated on the TrackML pixel-detector benchmark under pT > 0.9 GeV, |η| < 4 (Two-stage MF reported under the slightly easier pT > 1.0 GeV, |η| < 4). All methods measured on a single NVIDIA A100 GPU. See the paper for full kinematic breakdowns, scalability curves, and ablations.
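The headline gains quoted in the introduction can be re-derived from the table above. The short script below is only a convenience check: the `results` dictionary and the `compare` helper are illustrative names, and the numbers are copied by hand from the table rather than read from any repository output.

```python
# Values copied from the results table: (DM efficiency in %, latency in ms).
results = {
    "OC-GNN": (96.4, 571.5),
    "ACORN-GNN": (97.5, 783.7),
    "HEPT + DBSCAN": (89.6, 105.5),
    "Two-stage MF": (94.1, 99.0),
    "HEPTv2": (98.6, 15.1),
}


def compare(baseline, reference="HEPTv2"):
    """Return (DM gain in percentage points, latency speedup factor)."""
    base_dm, base_ms = results[baseline]
    ref_dm, ref_ms = results[reference]
    return round(ref_dm - base_dm, 1), round(base_ms / ref_ms, 1)


for name in results:
    if name != "HEPTv2":
        gain, speedup = compare(name)
        print(f"{name:14s} +{gain:.1f} pp DM, {speedup:.1f}x lower latency")
```

With these table values, the graph-based pipelines come out at 37.8× and 51.9×, and the two transformer baselines at 6.6× (Two-stage MF) and 7.0× (HEPT + DBSCAN). The Two-stage MF row is reported under a slightly different pT cut, so that comparison is approximate.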

Method at a glance

Locality-aware point encoder. For each hit, an OR–AND E²LSH ordering value o_i = LSH(η_i, φ_i) is computed directly in detector space. Sorting by o_i yields a 1D sequence that probabilistically preserves (η, φ) locality; the sequence is partitioned into fixed-size blocks and self-attention is restricted to within each block, giving a block-diagonal attention pattern whose cost is linear in N. Independent LSH projections are used across heads, so each event is effectively viewed through multiple randomized orderings. Defaults: 4 layers, 8 heads, head dim 128, block size 1024, m_OR = 3, m_AND = 2.
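To make the serialization step concrete, here is a minimal pure-Python sketch of sorting hits by E²LSH bucket ids and cutting the sorted sequence into attention blocks. It is not the repository's implementation: `e2lsh_order`, `to_blocks`, the bucket `width`, and the use of a tuple of m_AND bucket ids as the sort key are all assumptions made for illustration. The OR tables are omitted and φ periodicity is ignored.

```python
import math
import random


def e2lsh_order(hits, m_and=2, width=0.5, seed=0):
    """Return hit indices sorted by a tuple of m_and E2LSH bucket ids.

    hits: list of (eta, phi) pairs. Each bucket id is
    floor((a . x + b) / width) with Gaussian a and a uniform offset b.
    """
    rng = random.Random(seed)
    params = [
        ((rng.gauss(0, 1), rng.gauss(0, 1)), rng.uniform(0, width))
        for _ in range(m_and)
    ]

    def key(hit):
        eta, phi = hit
        return tuple(
            math.floor((a[0] * eta + a[1] * phi + b) / width) for a, b in params
        )

    return sorted(range(len(hits)), key=lambda i: key(hits[i]))


def to_blocks(order, block_size):
    """Split a serialized index sequence into fixed-size attention blocks."""
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]


# Toy event: 5000 hits drawn uniformly in |eta| < 4 and phi in [-pi, pi).
rng = random.Random(1)
hits = [(rng.uniform(-4, 4), rng.uniform(-math.pi, math.pi)) for _ in range(5000)]

order = e2lsh_order(hits, seed=0)            # one randomized ordering
blocks = to_blocks(order, block_size=1024)   # attention stays inside each block

# Query-key pairs evaluated with block-local vs. full attention.
pairs_blocked = sum(len(b) ** 2 for b in blocks)
pairs_full = len(hits) ** 2
print(len(blocks), pairs_blocked, pairs_full)
```

Because each block holds at most `block_size` hits, the number of attended pairs is bounded by N × block_size rather than N². Calling `e2lsh_order` with a different `seed` per head would mimic the independent projections described above.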

Sectorized track decoder. A fixed bank of M = 3000 learnable track slots is refined against the encoded hits via interleaved cross-attention (slot → hits) and self-attention (slot ↔ slot) over L = 2 decoder layers. The event is split into k = 3 azimuthal sectors (Δφ = 2π/3); per sector the decoder predicts a slot-activity vector and a hit-assignment matrix, which are merged into the event-level reconstruction. This exploits the approximately block-sparse structure of the ground-truth assignment matrix.
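The sector split itself is simple bookkeeping. The sketch below assigns each hit to one of k = 3 equal azimuthal sectors of width 2π/3; `sector_index` and `split_by_sector` are hypothetical helpers, and the choice of sector origin (φ = 0) is an assumption. How the repository treats tracks that cross a sector boundary is not covered here.

```python
import math


def sector_index(phi, k=3):
    """Map an azimuthal angle in radians (any range) to a sector id in [0, k)."""
    wrapped = phi % (2 * math.pi)  # fold into [0, 2*pi)
    # min() guards against floating-point round-up at the upper edge.
    return min(int(wrapped / (2 * math.pi / k)), k - 1)


def split_by_sector(phis, k=3):
    """Group hit indices by azimuthal sector."""
    sectors = [[] for _ in range(k)]
    for i, phi in enumerate(phis):
        sectors[sector_index(phi, k)].append(i)
    return sectors


phis = [0.1, 2.2, 4.3, -0.1]
groups = split_by_sector(phis)
print(groups)
```

Each group then defines one of the k smaller assignment problems: only the hits in that sector are candidates for the tracks decoded there, which is what makes the event-level assignment matrix approximately block-sparse.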

Joint end-to-end objective. The model is trained with a weighted sum of encoder-side and decoder-side losses:

Encoder: InfoNCE contrastive loss over per-hit embeddings + BCE target/background classification.
Decoder: Hungarian matching of slots to ground-truth tracks, followed by sigmoid focal loss + Dice overlap loss on the matched hit-assignment matrix, plus slot-activity BCE.

Default weights: λ_cls = 0.1, λ_assign = 200, λ_dice = 2, λ_bg = 1.8, λ_InfoNCE = 12. Training runs for 250 epochs with batch size 1 and the Muon optimizer, using a learning rate of 2.5 × 10⁻⁴ with step decay.
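The decoder-side terms can be illustrated on a toy example with three slots, two true tracks, and four hits. This is a sketch under several assumptions: the matching cost uses the Dice term only, the one-to-one matching is found by brute force over permutations as a stand-in for the Hungarian algorithm (same optimum, feasible only for tiny inputs), the focal-loss defaults `alpha=0.25` and `gamma=2.0` are the common ones rather than values from this project, and the loss weights listed above are left out. All function names are illustrative.

```python
import math
from itertools import permutations


def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """Mean sigmoid focal loss over per-hit probabilities for one slot."""
    total = 0.0
    for p, t in zip(probs, targets):
        p_t = p if t == 1 else 1.0 - p
        a_t = alpha if t == 1 else 1.0 - alpha
        total += -a_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(probs)


def dice_loss(probs, targets, eps=1.0):
    """1 - soft Dice overlap between predicted and true hit masks."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1.0 - (2.0 * inter + eps) / (sum(probs) + sum(targets) + eps)


def match_slots(pred, truth):
    """Return, for each true track j, the index of its matched slot.

    Brute-force minimum-cost one-to-one matching (stand-in for Hungarian).
    """
    cost = [[dice_loss(p, t) for t in truth] for p in pred]
    best = min(
        permutations(range(len(pred)), len(truth)),
        key=lambda perm: sum(cost[s][j] for j, s in enumerate(perm)),
    )
    return list(best)


# Rows are slots, columns are hits: predicted assignment probabilities.
pred = [
    [0.9, 0.8, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.1],
    [0.2, 0.1, 0.9, 0.7],
]
# Rows are true tracks, columns are hits: binary membership.
truth = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
]

matched = match_slots(pred, truth)
assign_loss = sum(
    focal_loss(pred[s], truth[j]) + dice_loss(pred[s], truth[j])
    for j, s in enumerate(matched)
)
# Matched slots are supervised as active, the rest as inactive.
activity_target = [1 if s in matched else 0 for s in range(len(pred))]
print(matched, round(assign_loss, 4), activity_target)
```

For realistic slot counts the brute-force search is infeasible; a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would take its place.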

Code

This repository ships a self-contained, minimal implementation of inference and single-GPU training in heptv2/. The inference path loads a trained checkpoint and runs encoder → per-sector decoder → post-processing → tracking metrics. The training path supports finetuning or training from scratch on a single GPU with the same loss composition described above. See heptv2/README.md for full details on supported options and layout.

heptv2/
├── run_inference.py      # inference CLI
├── run_train.py          # single-GPU training CLI
├── model/                # Transformer encoder+decoder, HEPT attention, positional emb
├── data/                 # TrackML loader + preprocessing (eta filter, padding, sectors)
├── training/             # train/eval loops, set criterion (Hungarian matcher, dice/focal)
├── eval/                 # post-processing + tracking metrics (DM, fake/dup rate)
├── utils/                # block-size math, E2LSH hashing, serialization
├── configs/              # inference / training YAML configs
└── scripts/              # sbatch launchers + plotting/benchmark scripts

Installation

conda env create -f environment.yml
conda activate cuda121

Usage

Inference

python -m heptv2.run_inference --config heptv2/configs/infer.yaml

Set eval.limit_events: 3 in the config for a quick smoke test, or run sbatch heptv2/scripts/infer.sh to submit the full run.

Training (single GPU)

python -m heptv2.run_train --config heptv2/configs/train.yaml

Per-epoch validation runs the full post-processing + tracking-metrics path, so dm / technical_efficiency / fake_rate / dup_rate are logged alongside the losses. Checkpoint selection is controlled by best_metric_key / best_metric_mode (e.g. select by dm with mode max). See heptv2/README.md for the smoke test and full-finetune launchers.
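For readers unfamiliar with the logged metrics, the sketch below shows the commonly used double-majority criterion: a reconstructed track matches a particle when more than half of the track's hits come from that particle and more than half of the particle's hits are in the track. This is an illustration, not the code in eval/: the function names and input layout are assumptions, kinematic cuts such as pT > 0.9 GeV are not applied, and technical_efficiency and dup_rate are not covered.

```python
from collections import Counter


def double_majority(track_hits, hit_particle, particle_size):
    """Return the particle id matched by a track, or None.

    track_hits: hit ids assigned to one reconstructed track.
    hit_particle: dict hit id -> true particle id (noise hits are absent).
    particle_size: dict particle id -> number of true hits of that particle.
    """
    counts = Counter(hit_particle[h] for h in track_hits if h in hit_particle)
    if not counts:
        return None
    pid, shared = counts.most_common(1)[0]
    purity = shared / len(track_hits)           # share of the track from pid
    completeness = shared / particle_size[pid]  # share of pid captured
    return pid if purity > 0.5 and completeness > 0.5 else None


def tracking_metrics(tracks, hit_particle):
    """Compute DM efficiency and fake rate for a non-empty list of tracks."""
    particle_size = Counter(hit_particle.values())
    matches = [double_majority(t, hit_particle, particle_size) for t in tracks]
    found = {m for m in matches if m is not None}
    return {
        "dm": len(found) / len(particle_size),
        "fake_rate": matches.count(None) / len(tracks),
    }


# Particle A owns hits 0-2, particle B owns hits 3-6, hit 7 is noise.
hit_particle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
tracks = [[0, 1, 3], [4, 5, 6], [2, 7]]

metrics = tracking_metrics(tracks, hit_particle)
print(metrics)
```

In this toy event both particles are found, while the third track fails the purity requirement (exactly half of its hits come from A) and counts as a fake.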

Relation to HEPT (v1)

HEPTv2 builds on the LSH-based serialization idea of HEPT (ICML 2024), but applies it directly in detector coordinate space rather than in latent space, and replaces the post-hoc DBSCAN clustering stage with a directly-trained sectorized track decoder, yielding a single end-to-end pipeline.

Citation

If you find this work useful, please cite:

@article{miao2026heptv2,
  title   = {HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction},
  author  = {Miao, Siqi and Govil, Shitij and Rodgers, Jack P. and Liu, Mia and
             Duarte, Javier and Hsu, Shih-Chieh and Chou, Yuan-Tang and Li, Pan},
  year    = {2026}
}

License

This project is released under the MIT License.
