fastaggregation: avoid runContainer16.lazyIOR slow path in FastOr #527

Merged: lemire merged 1 commit into RoaringBitmap:master from tamirms:master on May 25, 2026
tamirms commented May 25, 2026

Problem

(*Bitmap).lazyOR's same-key branch calls c1.lazyIOR(c2). When c1 is a runContainer16, lazyIOR falls back to ior -> inplaceUnion -> Add -> searchRange, which is O(N log R) per element merged. This makes FastOr catastrophically slow over inputs containing run-encoded blocks (e.g. the output of AddRange over many overlapping ranges, or anything after RunOptimize).

A synthetic benchmark with 15 bitmaps of ~6000-bit runs takes ~335 ms/op before this change.
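The shape of that slow path can be illustrated with a toy model. The sketch below is not the library's code: `interval`, `runSet`, `add` and `unionByElement` are hypothetical names, and the only assumption carried over from the description above is that the union inserts one value at a time, with a binary search over the runs for each value.

```go
package main

import (
	"fmt"
	"sort"
)

// interval is a hypothetical inclusive run [start, last] of 16-bit values.
type interval struct{ start, last uint16 }

// runSet is a toy stand-in for a run container: sorted, non-touching runs.
type runSet struct{ runs []interval }

// add inserts one value. The binary search is the O(log R) step; the slice
// insert or delete that may follow costs up to O(R) on top of that.
func (s *runSet) add(v uint16) {
	// i is the index of the first run that starts after v.
	i := sort.Search(len(s.runs), func(k int) bool { return s.runs[k].start > v })
	if i > 0 && v <= s.runs[i-1].last {
		return // already covered by the run on the left
	}
	joinLeft := i > 0 && s.runs[i-1].last+1 == v
	joinRight := i < len(s.runs) && v+1 == s.runs[i].start
	switch {
	case joinLeft && joinRight: // v bridges two runs: fuse them
		s.runs[i-1].last = s.runs[i].last
		s.runs = append(s.runs[:i], s.runs[i+1:]...)
	case joinLeft:
		s.runs[i-1].last = v
	case joinRight:
		s.runs[i].start = v
	default: // isolated value: open a new single-element run at position i
		s.runs = append(s.runs, interval{})
		copy(s.runs[i+1:], s.runs[i:])
		s.runs[i] = interval{v, v}
	}
}

// unionByElement merges src into dst one value at a time and returns the
// number of add calls, i.e. the N in the per-element cost above.
func unionByElement(dst *runSet, src runSet) int {
	calls := 0
	for _, r := range src.runs {
		for v := int(r.start); v <= int(r.last); v++ {
			dst.add(uint16(v))
			calls++
		}
	}
	return calls
}

func main() {
	dst := &runSet{runs: []interval{{0, 9}}}
	n := unionByElement(dst, runSet{runs: []interval{{5, 20}, {30, 40}}})
	fmt.Println(n, dst.runs) // prints: 27 [{0 20} {30 40}]
}
```

Two source runs are enough to trigger 27 searches here; with runs of roughly 6000 values, every merged value pays for its own search.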

Fix

Pre-promote runContainer16 slots to bitmapContainer before the inner lazyIOR call at the single FastOr call site. A bitmapContainer's lazy union is O(1024) regardless of cardinality, so the K-way fan-in stays linear in K.

This mirrors the explicit toBitmapContainer pre-promotion already used in parallel.go's ParHeapOr, and the Java BitmapContainer.lazyor(RunContainer) path. Issue #81 has tracked the deeper fix (proper runContainer16.lazyIOR/lazyOR implementations) since 2016; this is the surgical workaround at the one hot call site.

BenchmarkFastOrRunContainers:
  before:  335 ms/op    12 KB/op   447 allocs/op
  after:   637 µs/op   335 KB/op   257 allocs/op    (~526x)

Notes

The output container type for slots that started as runContainer16 inputs is now bitmapContainer (or arrayContainer after repairAfterLazy if sparse). repairAfterLazy does not re-encode to runs; callers that want a run-optimised result should call RunOptimize() on the FastOr return value. ParHeapOr already behaves the same way.
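To make the trade-off concrete, here is a toy sketch of what re-encoding to runs involves; real callers only need the RunOptimize() call mentioned above. The names `interval`, `toRuns` and `cheapest` are hypothetical, and the byte counts (8192 for a bitmap, 2 per value for an array of at most 4096 values, 2 + 4 per run) are assumptions taken from the Roaring serialization layout, not necessarily the exact rule the library applies.

```go
package main

import "fmt"

// interval is a hypothetical inclusive run [start, last] of 16-bit values.
type interval struct{ start, last uint16 }

// toRuns scans a 65536-bit bitmap and returns its maximal runs of set bits.
func toRuns(words *[1024]uint64) []interval {
	var runs []interval
	inRun := false
	for v := 0; v < 65536; v++ {
		set := words[v>>6]&(uint64(1)<<(uint(v)&63)) != 0
		switch {
		case set && !inRun: // a new run starts here
			runs = append(runs, interval{uint16(v), uint16(v)})
			inRun = true
		case set: // extend the current run
			runs[len(runs)-1].last = uint16(v)
		default:
			inRun = false
		}
	}
	return runs
}

// cheapest names the smallest encoding under the assumed byte counts.
func cheapest(card, nruns int) string {
	best, size := "bitmap", 8192
	if card <= 4096 && 2*card < size {
		best, size = "array", 2*card
	}
	if 2+4*nruns < size {
		best = "run"
	}
	return best
}

func main() {
	var dense [1024]uint64
	for v := 0; v < 6000; v++ { // one run of 6000 values
		dense[v>>6] |= uint64(1) << (uint(v) & 63)
	}
	runs := toRuns(&dense)
	fmt.Println(len(runs), cheapest(6000, len(runs))) // prints: 1 run
}
```

A single 6000-value run is far smaller as a run than as a bitmap, which is why a caller that cares about the size of the result may want the extra RunOptimize() pass.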

The full v2 test suite passes.

Refs: #81

(*Bitmap).lazyOR's same-key branch calls c1.lazyIOR(c2). When c1 is
a runContainer16, lazyIOR falls back to ior -> inplaceUnion -> Add ->
searchRange, which is O(N · log R) per element merged. This makes FastOr
catastrophically slow over inputs containing run-encoded blocks: a
synthetic benchmark with 15 bitmaps of ~6000-bit runs takes ~335 ms/op
before this change.

Pre-promote runContainer16 slots to bitmapContainer before the inner
lazyIOR call. The bitmapContainer's lazy union is O(1024) regardless
of cardinality, so the K-way fan-in stays linear in K. Mirrors the
explicit toBitmapContainer pre-promotion in parallel.go ParHeapOr and
the Java BitmapContainer.lazyor(RunContainer) path. Issue RoaringBitmap#81 has been
tracking the deeper fix (proper runContainer16.lazyIOR/lazyOR
implementations) since 2016; this is the surgical workaround at the
single FastOr call site.

BenchmarkFastOrRunContainers (added):
  before:  335 ms/op    12 KB/op   447 allocs/op
  after:   637 µs/op   335 KB/op   257 allocs/op    (~526x)

The output container type for slots that started as runContainer16
inputs is now bitmapContainer (or arrayContainer after repairAfterLazy
if sparse). repairAfterLazy does not re-encode to runs; callers that
want a run-optimised result should call RunOptimize() on the FastOr
return value. ParHeapOr already behaves the same way.

The full v2 test suite passes.

Refs: RoaringBitmap#81
lemire merged commit f3ceb59 into RoaringBitmap:master on May 25, 2026
8 checks passed
tamirms added a commit to stellar/stellar-rpc that referenced this pull request on May 27, 2026:
The FastOr/runContainer16 fix (RoaringBitmap/roaring#527) we previously
carried as a tamirms/roaring fork has merged upstream and shipped in the
official v2.18.2 release (the v2.18.2 tag is the #527 merge commit). Drop
the replace directive and require v2.18.2 directly.

v2.18.2 is the minimum: it's the first upstream release with the #527
FastOr fix, so dropping below it returns to the runContainer16.lazyIOR
slow path the fork existed to avoid. The borrow-model invariants the
query path relies on (FastAnd/FastOr don't mutate inputs; singleton
FastAnd/FastOr Clone) were re-verified against the v2.18.2 source.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>