Skip to content

feat(qwp): scope the SYMBOL dictionary to a single query#58

Open
kafka1991 wants to merge 3 commits into
mainfrom
qwp_query_reset_dict_flag
Open

feat(qwp): scope the SYMBOL dictionary to a single query#58
kafka1991 wants to merge 3 commits into
mainfrom
qwp_query_reset_dict_flag

Conversation

@kafka1991

@kafka1991 kafka1991 commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

Tandem PR (server): questdb/questdb#7321 — it carries the protocol change and, because most of this client's real-server coverage runs against a live ServerMain in the questdb repo, the end-to-end and protocol tests for this change. Review the two together.

Why

QWP carries a SYMBOL column as a dictionary (the distinct string values) plus per-row integer codes. The fastest egress consumers — the Python client's to_polars() / to_pandas() dataframe paths — reuse that wire dictionary directly as the pandas/polars Categorical dictionary: the per-row codes are adopted zero-copy, with no re-interning and no per-row remap. That direct reuse is what makes SYMBOL → dataframe maximally efficient.

The catch is that the QWP dictionary is connection-scoped: it accumulates every symbol seen across all queries on a connection and only resets at a soft cap. Because the consumer reuses the dictionary as-is, a result that references few symbols but lands on a connection an earlier high-cardinality query already grew inherits a bloated Categorical — full of categories it never uses. The very zero-copy reuse that makes the path fast is what drags the cross-query bloat in.

The protocol fix (server PR #7321) adds a per-query query_flags field to QUERY_REQUEST whose first bit, QUERY_FLAG_RESET_DICT, asks the server to reset the connection-scoped dictionary before the query streams — keeping the zero-copy speed without the bloat. This PR is the Java client half: it lets a Java caller set that flag per query.

What

QwpQueryClient gains execute(..., boolean resetSymbolDict) overloads. The existing two- and three-argument forms delegate with false, so current callers are unchanged. When the flag is set, the client asks the server to reset the connection-scoped SYMBOL dictionary before the query streams, scoping the dictionary to that query's own symbols. The flag is opt-in and defaults off.

  • resolveQueryFlags maps the request to the QUERY_FLAG_RESET_DICT bit, but only when the server advertised the CAP_QUERY_FLAGS capability in SERVER_INFO; against a server without it the flag is silently dropped.
  • QwpEgressIoThread appends the query_flags varint trailer to QUERY_REQUEST only when the resolved flags are non-zero, so a baseline frame stays byte-identical on the wire.

Compatibility

  • Old server + new client: no capability advertised, so no trailer is sent — unchanged.
  • New server + old client: no trailer, server defaults flags to 0 — unchanged.
  • New server + new client: the flag is honoured.

Tradeoff

Setting the flag gives up cross-query dictionary reuse for that query: the dictionary and its per-batch delta are rebuilt from scratch, so the query's symbols are re-sent rather than amortized across the connection. That is the intended exchange — a lean, query-scoped Categorical in return for re-sending the query's own symbols. Callers that benefit from connection-scoped accumulation should leave the flag off (the default).

Testing

Most of this client's real-server coverage lives in the questdb repo (it runs against a live ServerMain), so the protocol and end-to-end tests for this change ship in the tandem PR #7321:

  • QwpEgressServerInfoTest — the OSS server advertises CAP_QUERY_FLAGS, which is what gates the client appending the trailer.
  • QwpEgressRequestDecoderTest — decoding of the optional query_flags trailer: the reset bit, a trailer after a non-empty bind section, multi-byte varint values, and high/sign bits all decode; a baseline frame leaves it 0.
  • QwpEgressQueryFlagsResetTest — the decode → dict-reset effect: flag absent keeps the conn dict; flag set clears it and stages the CACHE_RESET; an empty dict suppresses the redundant frame; an unknown flag bit is ignored.
  • QwpEgressProcessorStateCacheResetTest — reset-mask boundaries.
  • QwpEgressCacheResetWireTest — the end-to-end client↔server CACHE_RESET wire path this builds on, driving this client against a live server.

In this repo, the existing QWP client/egress unit tests still pass and a flag-off frame stays byte-identical on the wire.

Add client support for the QWP per-query query_flags trailer so a caller
can scope the connection-scoped SYMBOL dict to a single query.

QwpQueryClient gains execute(..., boolean resetSymbolDict) overloads. When
set, resolveQueryFlags maps the request to QUERY_FLAG_RESET_DICT, but only
when the server advertised CAP_QUERY_FLAGS; otherwise the flag is dropped.
QwpEgressIoThread carries the resolved flags on QueryRequest and appends
the query_flags varint trailer to QUERY_REQUEST only when non-zero, so a
baseline frame stays byte-identical and the server defaults the flags to 0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kafka1991 kafka1991 added the enhancement New feature or request label Jun 25, 2026
@kafka1991 kafka1991 changed the title feat(qwp): send per-query SYMBOL dict reset flag feat(qwp): scope the SYMBOL dictionary to a single query Jun 25, 2026
kafka1991 and others added 2 commits June 25, 2026 09:19
QwpQueryClientQueryFlagsTest drives execute() against the mock WebSocket
server, captures the emitted QUERY_REQUEST, and asserts the optional
query_flags trailer:
- reset flag + server CAP_QUERY_FLAGS -> QUERY_FLAG_RESET_DICT trailer
- the trailer is the only difference from the byte-identical flag-off baseline
- no capability -> flag dropped; the flag-less overloads send no trailer

TestWebSocketServer gains setCapabilities(int) so its SERVER_INFO frame can
advertise CAP_QUERY_FLAGS; the default of 0 keeps the existing byte layout
for every other test.

Covers the previously-untested client paths: QwpQueryClient.resolveQueryFlags
and the execute(..., resetSymbolDict) overloads, plus the QwpEgressIoThread
query_flags trailer append.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mtopolnik

Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 18 / 18 (100.00%)

file detail

path covered line new line coverage
🔵 io/questdb/client/cutlass/qwp/client/QwpQueryClient.java 15 15 100.00%
🔵 io/questdb/client/cutlass/qwp/client/QwpEgressIoThread.java 3 3 100.00%

@kafka1991

Copy link
Copy Markdown
Collaborator Author

Code review (level 3 — full pass)

Change-surface map + 10-agent fan-out (incl. fresh-context adversarial & cross-context) with per-finding source verification.

Verdict: Approve — no blocking issues

The change is small, additive, and correctly wired. Existing callers are untouched (the new boolean overloads sit at non-ambiguous positions), the reused QueryRequest holder gets queryFlags written on every submit (no stale leak), failover re-resolves capabilities per attempt against the freshly-published serverInfo, and the wire trailer is correctly gated and byte-identical-when-off. The new test is meaningful and not flaky. No java.util.* on a data path, no zero-GC violations, no Java-8-floor violations. No Critical findings — everything below is advisory.


Moderate

M1 — Feature correctness depends entirely on the tandem server PR; confirm two contracts there (cross-repo, fails safe).
This client half resets only the server's dict (via the wire flag). It never resets its own local dict mirror — the diff doesn't touch the decoder. The per-query path only calls decoder.resetQuerySchema() (QwpEgressIoThread.java:323), which clears the schema, not the connection-scoped SYMBOL dict (connDictSize). The client's local dict is reset only by a server CACHE_RESET frame → applyCacheReset (QwpResultBatchDecoder.java:184).

So the feature is coherent only if the server, when honoring QUERY_FLAG_RESET_DICT, also emits a CACHE_RESET before the query's first RESULT_BATCH. If the server reset its dict without CACHE_RESET, the next batch's deltaStart=0 would mismatch the client's non-zero connDictSize and trip the guard at QwpResultBatchDecoder.java:749-752 ("delta symbol dict out of sync") → terminal transport error → failover → onError. That's a clean, fail-safe error (no corruption / UAF), so it's not a client bug. Before merge, confirm in #7321 that:

  1. The server defines CAP_QUERY_FLAGS = 0x00000002 (matches this client's bit; does not collide with the server's existing CAP_ZONE = 0x01), and
  2. Honoring the flag emits CACHE_RESET so the client's dict mirror stays in lockstep.

The cited QwpEgressQueryFlagsResetTest / QwpEgressCacheResetWireTest should cover exactly this — verify they land and pass.


Minor

m1 — QUERY_FLAG_* constants are byte; sign-extension trap for any future bit-7 flag.
QwpEgressMsgKind.java:69 declares QUERY_FLAG_RESET_DICT = 0x01 as a byte. In resolveQueryFlags (QwpQueryClient.java:1775-1777) the ternary mixes that byte with 0L, so it's widened to long by sign extension. Fine today (0x011L). But the first future flag with bit 7 set — e.g. (byte)0x80 — widens to 0xFFFF_FFFF_FFFF_FF80L; putVarint's signed while (value > 0x7F) guard (NativeBufferWriter.java:310) then emits a single truncated 0x80 byte (continuation bit set, no follow-on) — a malformed varint. Cheap hardening: declare the QUERY_FLAG_* constants as int, or mask in resolveQueryFlags (& 0xFFL). Not a current bug.

m2 — No test for the reset flag together with bind parameters.
QwpQueryClientQueryFlagsTest only exercises the no-bind overloads, and its queryFlagsTrailer parser explicitly assumes zero binds. The client emits the trailer identically regardless of binds, so risk is low — but the highest-risk wire interaction (trailer after a non-empty bind section) is unpinned in this repo. Server-side decode of that case is covered by QwpEgressRequestDecoderTest in the tandem repo, which mitigates it. Consider one local case: execute(sql, binds, handler, true) with a bind, asserting the trailer lands after the bind payload.

m3 — TestWebSocketServer footgun: capabilities written without a matching zone_id trailer.
buildServerInfoFrame now writes the capabilities int verbatim but never emits a zone_id trailer. Since the client decoder reads a mandatory zone_id whenever the CAP_ZONE (0x01) bit is set, a future test that calls setCapabilities with bit 0 would make connect() fail with a decode exception. The new setCapabilities Javadoc warns about this and no current test sets CAP_ZONE, so it's contained — but a trap for the next caller.

m4 — Silent flag-drop gives the caller no direct signal (intended).
When the server doesn't advertise CAP_QUERY_FLAGS, resetSymbolDict=true is silently dropped and the caller gets a bloated Categorical with no indication. This is documented as intended (throwing would break rolling upgrades / old servers, so the choice is right). Callers needing certainty can inspect getServerInfo().getCapabilities() & CAP_QUERY_FLAGS — worth a one-line mention in the execute(..., boolean) Javadoc so users know the check exists.


PR metadata

Clean. Valid Conventional Commits title, correct qwp scope, end-user-impact-first description, enhancement label fits. No Fixes #NNN is correct (feature with a tandem PR, not an issue fix).


Dropped during verification (false positives)

  • Stale queryFlags leaks across queries via the reused holder — overwritten on every submitQuery (QwpEgressIoThread.java:396).
  • Failover replays the flag against a stale capability — re-resolved per attempt after reconnectViaTracker republishes serverInfo.
  • New boolean overload changes resolution of existing callers — distinct parameter types; all callsites (incl. QueryImpl.java:177) bind unambiguously to their original overloads.
  • Trailer could be misread as bind datacount()/bufferLen() move together; count>0 && bufferLen==0 is unreachable.
  • Capture race / flaky test — server sets captured (AtomicReference) before sending EXEC_DONE; server.close() joins the read threads.

Findings: 10 draft → 5 verified (1 Moderate, 4 Minor), 5 dropped. 0 out-of-diff caller bugs (all callsites + submitQuery's sole caller walked and confirmed safe). Only tradeoff is the documented, opt-in loss of cross-query dict reuse when the flag is set; baseline frames stay byte-identical on the wire.

🤖 Review generated with Claude Code (Opus 4.8, 1M context)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants