VarBinBuilder: eagerly set IsSorted stat on offsets#6452
VarBinBuilder: eagerly set IsSorted stat on offsets#6452dimitarvdimitrov wants to merge 15 commits intodevelopfrom
Conversation
|
Can also do this for runend varbinview if not |
i couldn't find where in the VarBinViewArrayBuilder we create runend offsets for VarBin(View) arrays. Can you point me pls? |
|
varbinview: also maybe sequence array: |
Merging this PR will not alter performance
Comparing Footnotes
|
joseph-isaacs
left a comment
There was a problem hiding this comment.
we need to be careful on validation cost on the release path
Polar Signals Profiling ResultsLatest Run
Previous Runs (4)
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: Random AccessSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: CompressionSummary
Detailed Results Table
|
Benchmarks: Clickbench on NVMESummary
Detailed Results Table
|
26a0461 to
94ed345
Compare
Deploying vortex-bench with
|
| Latest commit: |
92faa3f
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://3950788a.vortex-93b.pages.dev |
| Branch Preview URL: | https://mitko-varbin-set-is-sorted-o.vortex-93b.pages.dev |
encodings/runend/src/array.rs
Outdated
| ends.statistics() | ||
| .set(Stat::IsStrictSorted, StatPrecision::Exact(true.into())); | ||
| ends.statistics() | ||
| .set(Stat::IsSorted, StatPrecision::Exact(true.into())); |
There was a problem hiding this comment.
why is this now true?
There was a problem hiding this comment.
mb, i didn't go through the rest of the code after we spoke about it. deleted now, along with the tests
joseph-isaacs
left a comment
There was a problem hiding this comment.
Did you need to add sortedness to the compressor
it already does it - vortex/encodings/runend/src/compress.rs Lines 76 to 77 in 1cd554b |
VarBinBuilder guarantees monotonically increasing offsets, but finish() wasn't setting the IsSorted stat. This caused an O(n) recomputation every time a deserialized VarBinArray was validated, which showed up as a hot path in production profiles. Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
RunEndArray::validate didn't check that ends are strictly increasing, which is required for binary search to work correctly. Also caches IsSorted and IsStrictSorted on the ends after construction so downstream consumers don't recompute it. Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
SequenceArray (A[i] = base + i * multiplier) can determine sortedness from the sign of the multiplier at construction time. Set both stats eagerly so we skip the kernel entirely. Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 1445406 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: b2b7817 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 5c932f3 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 2f5b11b I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 8ae944c Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
The reviewer removed the offset monotonicity validation from VarBinArray::try_new, so the test now expects construction to succeed. Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>
a6e603c to
6cd097d
Compare
I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 6cbf9b0 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: c5fbeb7 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 1cb2660 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: e50a078 I, Dimitar Dimitrov <dimitar@spiraldb.com>, hereby add my Signed-off-by to this commit: 8cd19f4 Signed-off-by: Dimitar Dimitrov <dimitar@spiraldb.com>

Does this PR closes an open issue or discussion?
no
What this PR does?
VarBinBuilder guarantees monotonically increasing offsets, but finish() wasn't setting the IsSorted stat. This caused an O(n) recomputation every time a deserialized VarBinArray was validated.
How is this change tested?
unit tests
Are there any user-facing changes?
VarBin arrays now persist stats for their offsets child array. Existing persisted arrays will still be verified as sorted upon reading from disk.