feat: canonical H2O coverage — q6/q8/q9 adapters + engine-only timing + DataFusion memtable fix#4
Open
ser-vasilich wants to merge 6 commits into
…karounds Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
register_csv produces a listing table that re-parses CSV on every timed query. register_record_batches with the collected batches caches the columnar layout in memory. q4 154→17ms, q6 312→148ms, q8 367→262ms — DataFusion now apples-to-apples with adapters that hold native columnar storage.
q8's natural rayforce shape is 100k rows with LIST<F64>[2] cells — duckdb's ROW_NUMBER() <= 2 SQL emits 200k exploded rows. Timed bench was unfair: rayforce skipped the row-materialisation cost SQL adapters pay for. Move the explode into the timed engine query via raze + indexed gather (vectorised, no per-element lambda) so both sides materialise 200k. q8 163ms (100k rows) → 215ms (200k rows) vs duckdb 198ms — ~apples-to-apples now. Bundles the q9 two-stage adapter form already in the working tree.
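The explode described in this commit can be sketched in plain Python. This is an illustration only, not the rayforce query: `explode_fixed_k` and its arguments are hypothetical names, and list operations stand in for the engine's vectorised raze and gather primitives.

```python
from itertools import chain


def explode_fixed_k(ids, cells, k=2):
    """Explode n cells of exactly k values each into n*k long-form rows.

    Hypothetical helper: assumes every cell holds exactly k values.
    """
    # "Raze": flatten the nested cells into one flat value list.
    values = list(chain.from_iterable(cells))
    # Gather index 0,0,1,1,...: output position p belongs to group p // k.
    gather = [pos // k for pos in range(len(cells) * k)]
    # Indexed gather: look up the group id for every output row.
    out_ids = [ids[i] for i in gather]
    return out_ids, values


ids = [10, 20, 30]
cells = [[9.5, 8.0], [7.0, 6.5], [5.0, 4.0]]
out_ids, out_values = explode_fixed_k(ids, cells)
```

Three groups with two values each become six rows, the same 2x row growth as the 100k to 200k materialisation above.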
run_groupby_q8's fast vectorised explode assumes K=2 everywhere (true for canonical 10m k100, where every id6 group has ≥2 non-null v3). Small check sizes (10..1m) hit groups with K=1 cells; the K=2-uniform formula produces row-count mismatch. Split: timed path keeps the fast formula; materialize() reverts to a per-cell Python explode for correctness across all check sizes.
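A small sketch of why the K=2-uniform formula breaks on ragged cells, and what a per-cell fallback looks like. The function names are hypothetical and this is not the adapter's actual `materialize()` code; it only reproduces the row-count mismatch described above.

```python
from itertools import chain


def uniform_k2_gather_index(n_groups):
    """Gather index that assumes every group contributes exactly 2 rows."""
    return [pos // 2 for pos in range(2 * n_groups)]


def explode_per_cell(ids, cells):
    """Per-cell explode: correct for any cell length, but loops in Python."""
    rows = []
    for group_id, cell in zip(ids, cells):
        for value in cell:
            rows.append((group_id, value))
    return rows


ids = [1, 2, 3]
cells = [[9.0, 8.0], [7.0], [5.0, 4.0]]  # the middle group has only K=1

razed = list(chain.from_iterable(cells))       # 5 values
gather = uniform_k2_gather_index(len(cells))   # 6 indices: lengths disagree
rows = explode_per_cell(ids, cells)            # 5 rows, ids stay aligned
```

The fast path emits one index per assumed slot, so a single short cell shifts every later id. The per-cell loop pays interpreter overhead but stays aligned, which is acceptable outside the timed path.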
Summary
61 commits accumulating the canonical H2O (h2oai/db-benchmark) coverage on rayforce-bench: engine-only timing for SQL adapters, rayforce wrappers for q6/q8/q9, fairness fixes across adapters, and dashboard polish.
Headline changes
Engine-only timing across SQL adapters (`20f915a`)
Replace `fetchall()` / IPC-materialization with server-side draining or `CREATE TEMPORARY TABLE` patterns so each adapter is timed on engine work only, not Arrow IPC / Python conversion. Affects DuckDB, chDB, DataFusion, QuestDB, TimescaleDB.
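A minimal sketch of the two timing styles, using the standard-library `sqlite3` module as a stand-in for the SQL adapters (the real adapters target DuckDB, chDB and others, and their code is not shown here). The table `x`, the result table `ans`, and the helper names are assumptions made for illustration.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id1 TEXT, v1 INTEGER)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?)",
    [(f"id{i % 100}", i % 5) for i in range(10_000)],
)

QUERY = "SELECT id1, SUM(v1) AS v1 FROM x GROUP BY id1"


def time_with_fetchall(conn):
    """Timed region includes converting every result row to Python objects."""
    start = time.perf_counter()
    rows = conn.execute(QUERY).fetchall()
    return time.perf_counter() - start, len(rows)


def time_engine_only(conn):
    """Timed region keeps the result inside the engine as a temporary table."""
    conn.execute("DROP TABLE IF EXISTS temp.ans")
    start = time.perf_counter()
    conn.execute(f"CREATE TEMPORARY TABLE ans AS {QUERY}")
    elapsed = time.perf_counter() - start
    # Row count is read outside the timed region, only for validation.
    n_rows = conn.execute("SELECT COUNT(*) FROM ans").fetchone()[0]
    return elapsed, n_rows


fetch_seconds, fetch_rows = time_with_fetchall(conn)
engine_seconds, engine_rows = time_engine_only(conn)
```

Both paths must produce the same number of rows; only where the result ends up, and therefore what the timer sees, differs.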
Rayforce q6 / q8 / q9 adapters (`a50ab48`, `611bcb3`, `626cd34`, `99ae025`)
Engine-side explode for q8 (raze + indexed gather) keeps the timed query in row form (200k rows), matching both DuckDB's `ROW_NUMBER OVER PARTITION` shape and the SQL adapters' default materialization.
DataFusion memtable fix (`eae3261`)
`register_csv` produced a listing table that re-parsed the CSV on every timed query (the page cache avoided disk reads, but the parse cost remained). It is replaced with `register_record_batches` over the batches from a one-shot `collect()`, which puts DataFusion on equal footing with duckdb/chdb/polars/pandas/rayforce, all of which hold native columnar storage. q4 154→17 ms, q6 312→148 ms, q8 367→262 ms.
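The cost difference can be illustrated with a toy model in standard-library Python. This is not the DataFusion API: `ListingTable` and `MemTable` are hypothetical classes that only mimic "parse on every scan" versus "parse once, keep columns in memory".

```python
import csv
import io

CSV_TEXT = "id1,v1\na,1\nb,2\na,3\n"


class ListingTable:
    """Stand-in for a CSV-backed table: every scan re-parses the text."""

    def __init__(self, text):
        self.text = text
        self.parse_count = 0

    def scan(self):
        self.parse_count += 1
        rows = list(csv.DictReader(io.StringIO(self.text)))
        return {
            "id1": [r["id1"] for r in rows],
            "v1": [int(r["v1"]) for r in rows],
        }


class MemTable:
    """Stand-in for an in-memory table: parse once, then serve cached columns."""

    def __init__(self, source):
        self.columns = source.scan()  # the one-shot collect

    def scan(self):
        return self.columns


def sum_v1(table):
    return sum(table.scan()["v1"])


listing = ListingTable(CSV_TEXT)
listing_results = [sum_v1(listing) for _ in range(3)]  # parses 3 times

cached_source = ListingTable(CSV_TEXT)
mem = MemTable(cached_source)
mem_results = [sum_v1(mem) for _ in range(3)]  # parses once, up front
```

The query results are identical; only the number of parses inside the repeated (timed) queries changes.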
Dashboard / framework polish (multiple)
Perf snapshot (10M rows, k=100 cardinality, engine-only timing)
Rayforce wins 9/10 (q4 within 1ms of duckdb).
Related
Test plan