Skip to content

feat(core): OIDC sign-in via device flow (RFC 8628)#52

Open
glasstiger wants to merge 61 commits into
mainfrom
ia_oidc_device_flow
Open

feat(core): OIDC sign-in via device flow (RFC 8628)#52
glasstiger wants to merge 61 commits into
mainfrom
ia_oidc_device_flow

Conversation

@glasstiger

@glasstiger glasstiger commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Adds interactive OIDC sign-in to the Java client using the OAuth 2.0 Device Authorization Grant (RFC 8628). A process with no local browser — a remote notebook kernel, a container, a headless job — can sign a human in against QuestDB Enterprise: the user authorizes on any device (laptop or phone) while the process only makes outbound calls to the identity provider.

On first use it prints a verification URL and a short code (and, by default, also tries to open the URL in a local browser); once the user authorizes, the token is cached in memory and refreshed silently on later calls.

import io.questdb.client.Sender;
import io.questdb.client.cutlass.auth.OidcDeviceAuth;

// Discover the client id, scope and endpoints from the QuestDB server's /settings:
try (OidcDeviceAuth auth = OidcDeviceAuth.fromQuestDB("https://questdb.example.com:9000")) {
    auth.signIn(); // sign in once: prompts on first use, then caches and refreshes

    // Pass a token provider, not a fixed string: the sender pulls a freshly refreshed token on each
    // request, so a long-lived sender keeps working as the token rotates. getToken() refreshes
    // silently and never prompts on the flush path.
    try (Sender sender = Sender.builder(Sender.Transport.HTTP)
            .address("questdb.example.com:9000")
            .enableTls()
            .httpTokenProvider(auth::getToken)
            .build()) {
        sender.table("trades")
                .symbol("symbol", "ETH-USD")
                .doubleColumn("price", 2615.54)
                .atNow();
    }
}

What's new

OidcDeviceAuth (io.questdb.client.cutlass.auth) — runs the flow and owns the token:

  • OidcDeviceAuth.fromQuestDB(url) discovers the client id, scope, audience and IdP endpoints from the server's unauthenticated /settings; OidcDeviceAuth.fromQuestDB(url, DiscoveryOptions) adds an identity-provider pin (.issuer(...)), a TLS config, an allowInsecureTransport opt-in, and the prompt (see Discovery and trust below); OidcDeviceAuth.builder() configures the identity provider explicitly.
  • signIn() signs in interactively on first use, then serves a cached token and refreshes it silently; getToken() never prompts and never blocks (safe on a request/flush path); getAuthorizationHeaderValue() returns the full Bearer … value; clearCache() drops the cached token so the next signIn() re-signs-in; close() cancels an in-flight sign-in (observed between polls, so it can take up to one HTTP request timeout to return). Calls are serialized by a ReentrantLock; getToken() uses tryLock and fails fast rather than wait behind an interactive sign-in. Token state lives in memory only and does not survive a process restart.

Sender integration — new HttpTokenProvider interface and Sender.builder(...).httpTokenProvider(auth::getToken). The sender pulls a freshly refreshed token on every request, so a long-lived sender keeps working as the token rotates — unlike a fixed httpToken(...), which is captured once and eventually starts returning 401s. Mutually exclusive with httpToken/httpUsernamePassword. Supported over HTTP and WebSocket transport (a WebSocket sender re-queries the provider on every (re)connect/upgrade); rejected for TCP and UDP. The two transports diverge on a sustained token outage: over HTTP a failed pull leaves the request token-pending and is retried on the next row, whereas over WebSocket a pull that keeps failing past the sender's reconnect budget terminates the sender for good, like any persistent reconnect failure. The first pull is deferred off the build path to the first row, so the documented construct → signIn() → send ordering works and a provider that throws leaves the request retriable instead of corrupting the sender.

QWP egress query clientQwpQueryClient.withBearerTokenProvider(HttpTokenProvider) accepts the same on-demand provider, so OidcDeviceAuth::getToken plugs into the egress query path as well as ingress. The provider is queried at every WebSocket upgrade — the initial connect() and each failover reconnect — so a long-lived query client follows token rotation; each returned token is validated before it reaches the header, and a provider that throws fails that connection attempt (matching the ingress sender). Mutually exclusive with withBearerToken/withBasicAuth.

DeviceCodePrompt / DeviceAuthorizationChallenge — how the verification URL and user code are shown. The default, DeviceCodePrompt.openBrowser(), prints the instructions to System.out and also tries to open the verification URL in the local default browser; the browser open is best-effort (skipped on a headless JVM, without the java.desktop module, or for a non-http(s) URL, and disabled by -Dquestdb.client.oidc.open.browser=false) and never blocks or fails sign-in. Use DeviceCodePrompt.SYSTEM_OUT to print only, or supply your own to render a clickable link or a QR code, e.g. in a notebook.

audiencebuilder().audience(...) / discovered from acl.oidc.audience. When set, the audience parameter is sent on the device-authorization and refresh requests, for providers that require it to stamp the aud claim QuestDB expects.

The token can be presented to QuestDB over any auth path the server already validates:

  • REST / ingestionAuthorization: Bearer <token>.
  • PG-wire — connect as _sso with the token as the password (requires acl.oidc.pg.token.as.password.enabled=true on the server).

Discovery and trust

fromQuestDB(...) takes the IdP endpoints from the server's unauthenticated /settings, so by default it trusts that server to designate where the user signs in: a spoofed, compromised, or man-in-the-middled server could otherwise redirect the sign-in — and the long-lived refresh token — to an attacker-controlled identity provider. An optional DiscoveryOptions.issuer(...) pin addresses this, and also covers servers that do not advertise a device-authorization endpoint. The pin separates two sources of endpoints and trusts them differently — an endpoint the untrusted /settings advertised is constrained to the issuer, while an endpoint read from the identity provider's own .well-known is trusted wherever the provider hosts it:

  • .well-known discovery fallback. Current servers do not advertise the device-authorization endpoint. When it (and/or the token endpoint) is missing, a pinned issuer reads it from {issuer}/.well-known/openid-configuration. The discovery origin comes only from the caller-supplied issuer, never from a /settings-supplied value, so a tampered /settings cannot choose where discovery — and the credential POSTs it resolves — are aimed. Without a pin, discovery is refused rather than guessed.
  • Co-location pin. validateEndpointOrigins, enforced on every construction path (discovery and the explicit builder()), requires the token and device-authorization endpoints to share one origin (RFC 8628 co-locates them on a single authorization server), so a tampered /settings or discovery document cannot siphon one of the two credential POSTs off to a different origin.
  • /settings-advertised endpoints are pinned to the issuer. An endpoint the untrusted /settings response supplied must sit on the pinned issuer's origin, and — when the issuer has a path — under that path (compared segment by segment, rejecting ./.., percent-encoded traversal, and a percent-encoded path separator such as %2f or %5c, at every decode level). The path check matters for a path-based provider that shares one origin per tenant (e.g. a Keycloak realm path /realms/<realm>), where the origin check alone cannot stop a tampered /settings from steering credentials to a sibling tenant. The issuer is supplied out of band and cannot be forged.
  • IdP-discovered endpoints are trusted where the issuer hosts them. An endpoint read from the issuer's own .well-known is neither origin-pinned nor path-scoped: that document is fetched from the pinned issuer origin and is authoritative for wherever the provider hosts its endpoints. This is deliberate — some providers (e.g. Google, Azure AD) serve their token and device endpoints from a different origin or path than the issuer, and discovery against them signs in normally. The co-location pin above still applies.
  • Plaintext-channel pin. A /settings response fetched over plaintext http to a non-loopback host (only reachable with allowInsecureTransport) is MITM-able, so its advertised endpoints are not trusted to route credentials without an issuer pin.

Without a pin, the behaviour against an https server that advertises its endpoints is unchanged: that server is trusted, as before.

Security

  • https is required by default for both the QuestDB server and the IdP endpoints; http is rejected unless the caller opts in with allowInsecureTransport(true). That opt-in relaxes only the QuestDB /settings link — the IdP device-authorization and token endpoints always require https (loopback excepted), so the device code and refresh token never cross the network in cleartext (matching the Python client).
  • Tokens never leak into logs or exceptions. Only the HTTP status of a token/device response is captured; the body — which carries access, id and refresh tokens — is never retained or surfaced in a message. The captured status is validated to be exactly three digits, so a malformed or hostile status line cannot splice ANSI/control bytes into a later [httpStatus=…] echo, nor can a short all-digit status (2, 5) be misread as a 2xx/5xx class — a malformed-length status falls through to the terminal reject path.
  • Untrusted IdP text is sanitized before it is shown in a prompt or an exception message: per-code-point stripping of control characters, ANSI escapes, CR/LF, and bidirectional / zero-width / Unicode-format characters (including supplementary-plane "tag" characters that arrive as surrogate pairs, and unpaired surrogates), so an attacker-influenced field cannot reorder, hide, or forge what a human reads — e.g. a right-to-left override that makes the displayed verification URL differ from the one the browser opens. A user code or verification URL that is non-empty on the wire but sanitizes to nothing is rejected (or, for verification_uri_complete, treated as absent) rather than shown as a blank line or handed to the browser launcher.
  • The token itself is validated before use. The token QuestDB will actually receive — the id token when the server encodes groups in the token, the access token otherwise — is rejected if it carries a control or non-ASCII character (outside 0x200x7e) before it is cached, placed in the Authorization: Bearer header, or used as the PG-wire password — so a tampered or hostile identity provider cannot smuggle a CR/LF into the request the client then sends to the trusted QuestDB server. Only the served kind is checked; a stray character in the unused token kind, which never reaches the wire, no longer aborts an otherwise usable grant. (The JsonLexer change below decodes JSON escapes, which is what turns a \r/\n in a token into a real byte rather than two literal characters.)
  • URLs are validated up front. Endpoint.parse rejects control characters, whitespace and display-unsafe code points anywhere in the url (so a tampered endpoint cannot inject a CR/LF into the request line or a bidi char into a log line), rejects bracketed IPv6 literals rather than mis-parsing them, rejects userinfo (user@host) — which the HTTP layer would otherwise try to connect to literally — terminates the authority at the first /, ? or # so a query or fragment is never folded into the host, and range-checks the port to 1..65535.
  • Bounded against a hostile or stalled server. Response reads are capped by a 4 MiB byte limit and a wall-clock deadline that bounds the whole read — covering both a chunked response whose chunk-size line is dribbled a byte at a time and a Content-Length body dribbled through stalled TLS records, either of which previously could keep a single read running well past the deadline. After such a bounded-read abort the half-read poll connection is dropped so the next poll reconnects on a clean socket, rather than the loop spinning on the stalled response's leftover bytes until the device code expires. The device-code lifetime, the poll interval, and the token TTL are all clamped (defaults applied for absent/zero values, hard caps for absurd ones). A 429 with no OAuth error is treated as a transient back-off (a 429 that also carries a terminal error such as access_denied still aborts on the error); a transient transport failure or 5xx during polling keeps polling until the device-code deadline rather than failing the sign-in (RFC 8628; matches the Python client), while a definitive OAuth error or a terminal 4xx aborts immediately. The trade-off is that a persistently flaky network is no longer cut short by a separate error budget — it polls to the device-code deadline.

Supporting changes

  • JsonLexer now resolves JSON string escape sequences (\", \\, \/, \b \f \n \r \t, \uXXXX; lenient on malformed input), so string values arrive fully decoded. This also reaches the existing ILP error-response parser, which now sees decoded message/code/line/errorId fields.
  • Response.recv(int timeout) (both the Content-Length and chunked implementations) bounds the whole read to the timeout in total, not per socket read, so a server that dribbles the body — the chunk-size line of a chunked response, or Content-Length bytes behind stalled TLS records — cannot keep a single read running past the caller's deadline. A non-positive timeout keeps the legacy unbounded behaviour. The existing ILP flush reads go through the no-arg recv(), which passes the configured client timeout (positive by default), so they now bound the whole body read to that timeout instead of re-arming it per socket read: a response that completes within the timeout is unaffected, while one that legitimately dribbles a single fragment for longer than the timeout — previously tolerated as long as each socket read made progress — now aborts.
  • AbstractLineHttpSender plumbs the token provider through with a deferred, retriable per-request pull (a throwing or blank-returning provider leaves the request token-pending for the next row instead of corrupting the half-built request), so the very first send already carries a provider-sourced token. Its error rendering now routes every untrusted server-supplied string through putAsPrintable — the decoded JSON error body, the [http-status=…] field, and the line-protocol-version detection probe body — escaping control and Unicode format characters (bidi overrides, zero-width joiners, the BOM), so a hostile or proxied endpoint cannot reorder, hide, or forge the text shown in a LineSenderException (or spliced into a log line or terminal).
  • QwpWebSocketSender sources its auth header from the token provider too, re-querying it on every connect/reconnect so a rotating token keeps a long-lived WebSocket sender authenticated.

Tradeoffs and limitations

  • The origin pin behaves differently on the two construction paths. fromQuestDB(...) discovery trusts an endpoint read from the issuer's .well-known wherever the provider hosts it, so an off-origin provider (e.g. Google) signs in normally through a pinned issuer; only an endpoint the /settings response itself advertised is held to the issuer's origin and path. The explicit builder().issuer(...) pin is stricter — a plain sanity check that both supplied endpoints sit on the issuer origin — so an off-origin provider configured that way must have its issuer omitted, or its endpoints supplied to match. This matches the Python client.
  • A failed token pull is handled differently per transport: over HTTP it leaves the request token-pending and retries on the next row, but over WebSocket a pull that keeps failing past the sender's reconnect budget terminates the sender, like any persistent reconnect failure. A getToken() provider that fails only transiently recovers on HTTP; a long outage ends a WebSocket sender.
  • The co-location check requires the token and device-authorization endpoints to share an origin. A test that previously pointed the token endpoint at a dead second port to simulate an unreachable endpoint was reworked to drop a co-located connection instead (MockOidcServer.dropConnection()).
  • The plaintext-channel pin's firing path is exercised end to end by reaching the loopback mock through a short-form 127.x address that the loopback classifier deliberately rejects as non-loopback. That trick relies on the OS resolver expanding the short form (BSD inet_aton, on Linux/macOS), which Windows getaddrinfo does not do, so that one end-to-end test is skipped on Windows; the loopback classifier itself is covered cross-platform.

Tests & docs

OidcDeviceAuthTest (~107 cases) + MockOidcServer, BrowserLauncherTest, LineHttpSenderTokenProviderTest, WebSocketTokenProviderTest + TestWebSocketServer, SenderBuilderErrorApiTest, JsonLexerTest, LineHttpSenderErrorResponseTest, DisplaySafeTest, ChunkedResponseTest/ResponseTest, QwpQueryClientTokenProviderTest; runnable OidcDeviceFlowExample / OIDCAuthExample; and a README "OIDC Sign-In (Device Flow)" section. Coverage includes:

  • .well-known discovery via a pinned issuer; a discovery document that omits the device-authorization endpoint
  • the co-location check rejecting split-origin endpoints; the builder().issuer(...) origin pin rejecting off-origin endpoints; a /settings-advertised endpoint rejected when off the issuer origin; an endpoint discovered from the issuer's .well-known accepted even when off the issuer origin (the Google case)
  • issuer path scoping: endpoints under the issuer path accepted; a sibling realm, percent-encoded traversal, an encoded path separator (%2f), and a percent-encoded backslash (%5c) rejected
  • the plaintext-channel pin requiring an issuer pin for advertised endpoints over http (firing path skipped on Windows)
  • the audience parameter discovered from /settings and sent on the device and refresh requests
  • token-provider support over HTTP and WebSocket (initial upgrade and re-queried per reconnect; static token / username-password still work over WebSocket); rejected for TCP/UDP; mutual exclusion with other auth; null/empty provider token rejected; deferred build-time pull; no sender corruption when the provider throws after a flush; the QWP egress query client's token provider — header synthesis, per-resolve re-query, token validation, and mutual exclusion
  • a token with a control or non-ASCII character rejected rather than sent (the served kind validated); tokens never echoed in messages
  • bounded reads: a stalled body and an oversized body aborting on the deadline / 4 MiB cap; a chunked body read aborting when the chunk-size line is dribbled; a bounded-read abort dropping the dirty poll connection so the next poll reconnects and signs in
  • the poll model: a 429 and a transient 5xx/transport failure keep polling to the deadline; a terminal 4xx and an OAuth error fail fast (including a 429 that also carries a terminal error); slow_down growth and the 60 s interval clamp; device-code-lifetime and clock-skew clamps
  • Endpoint.parse rejecting a malformed url: userinfo (user@host), a bracketed IPv6 literal, an out-of-range port, and control/whitespace/display-unsafe characters
  • a malformed HTTP status code rejected on both the device-authorization and the token-poll path: a non-numeric status, and a short all-digit status (2, 5) that must not be read as a 2xx/5xx class
  • display sanitizing: bidi/zero-width, lone-surrogate and supplementary-plane format characters stripped from the challenge and OAuth error; a server's JSON error body, its HTTP status field, and the protocol-version probe body with such characters escaped, not rendered raw; a user code or verification URL that sanitizes to empty rejected, and a verification_uri_complete that sanitizes to empty treated as absent
  • JsonLexer escape decoding, including a \uXXXX escape split across two parse fragments, and the lenient/exotic escape arms
  • getToken() failing fast while another thread holds the lock in an interactive sign-in or a silent refresh; native-memory cleanup on the error/rejection construction paths

Tandem

OSS: questdb/questdb#7331
Ent: https://github.com/questdb/questdb-enterprise/pull/1090

@glasstiger glasstiger changed the title feat(core): OIDC device flow feat(core): OIDC sign-in via device flow (RFC 8628) Jun 17, 2026
glasstiger and others added 19 commits June 17, 2026 19:31
Two fixes for the OIDC device flow in the Java client.

M2 - Bidi / zero-width Unicode bypassed the display sanitizer.
sanitizeForDisplay and OidcAuthException.putSanitized filtered only on
Character.isISOControl, which covers C0/C1 and DEL but not the
bidirectional overrides (U+202A-202E), isolates (U+2066-2069), marks
(U+200E/200F), zero-width characters or the BOM (U+FEFF). Those fields
- user_code, verification_uri(_complete), error and error_description -
all come from the IdP/settings boundary and reach System.out and the
exception messages, so a hostile or MITM'd IdP could embed a
right-to-left override and spoof the verification URL a human reads and
then opens. The JSON lexer's \uXXXX decoding widens the vector, since an
escaped override decodes to the real character before display.

Both sanitizers now share OidcAuthException.isUnsafeForDisplay, which
also strips the Unicode format category (Cf) plus the explicit bidi/BOM
set. The predicate uses hex int literals rather than char escapes,
keeping the source strictly ASCII so the file carries none of the
characters it guards against.

M3 - httpTokenProvider forced a successful sign-in before build().
createLineSender eagerly rebuilt the pending request when a provider was
set, calling getToken() at build time. With the documented
.httpTokenProvider(auth::getTokenSilently), that threw unless the caller
had already signed in, so the natural "construct the sender, sign in,
then send" ordering was impossible.

The first token pull is now deferred off the build path to the first row
(table()). The provider is wired at build but not queried; the initial
request is stamped with a token when the first row starts, and the
pending flag is cleared only after the pull succeeds, so a
not-yet-signed-in provider that throws leaves the stamp pending for a
retry. The Sender.httpTokenProvider Javadoc now states the provider is
not called at build time.

Tests: new bidi/zero-width cases for the challenge fields and the oauth
error message (fed as JSON \uXXXX escapes so they exercise the
decode-then-display path), and a new LineHttpSenderTokenProviderTest
covering the deferred pull and a lazily signing-in provider. Each test
was confirmed to fail without its fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three robustness fixes in the OIDC device-flow parser.

m3 - A JSON null arrives from the lexer as the literal "null". The
token and device parsers used putValue, which stored it verbatim, so
"access_token": null became the 4-char token "null" and "error": null
was read as an OAuth error code "null". Merged putValue with
SettingsDiscoveryParser's null-guarding putNonNull into one shared
helper used by all three parsers, so a JSON null is treated as absent
everywhere.

m4 - Endpoint.parse did not range-check the port, so host:0, host:-1
and host:99999 parsed and flowed to the transport. Added a 1..65535
guard that rejects them with a clear message.

m5 - The token-response expires_in was not clamped, unlike the
device-auth value, so a TTL near Integer.MAX_VALUE cached the token for
~68 years. storeTokens now applies the same boundedSeconds clamp (the
default for a non-positive value, capped at MAX_EXPIRES_IN_SECONDS). The
server still enforces the real expiry; this only bounds how long the
client trusts its cached copy.

Tests: null access_token and null error are rejected/ignored,
out-of-range ports are rejected at build, and a clamped token expiry
forces a fresh sign-in (observed via a clock-skew margin set above the
clamp). Each test was confirmed to fail without its fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The post-flush reset() eagerly rebuilt the next request and pulled the
provider token via httpTokenProvider.getToken() after the current batch
had already been sent and accepted. If that pull threw (e.g.
OidcDeviceAuth::getTokenSilently when a silent refresh fails) it turned
an already-successful flush into a thrown exception and left the shared
Request half-built (contentStart == -1, no withContent()), so the next
row's data went into the header region - a malformed request, lost rows
and a permanently corrupted sender.

Route every request's token pull through the same deferred, retriable
path the initial request already used: newRequest() no longer pulls the
provider token (it marks the request token-pending and builds a valid
token-less request), and stampTokenIfPending() pulls it lazily when the
first row of a request starts. A failed pull leaves the flag set and the
sender untouched, so the next row re-runs the stamp and fully rebuilds
the request. Per-request token rotation is unchanged.

Rename isInitialTokenPending/stampInitialTokenIfPending to
isTokenPending/stampTokenIfPending since the deferral now covers every
request, and stamp the token in putRawMessage() too. Add a regression
test that fails at the first, successful flush without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
getToken() and getTokenSilently() were both synchronized on the instance
monitor, and getToken() holds it for the entire interactive device flow -
up to the device-code lifetime, clamped to one hour. A long-lived Sender
wired with httpTokenProvider(auth::getTokenSilently) therefore stalled on
the flush path for up to an hour whenever another thread ran an
interactive sign-in (e.g. a re-auth after the refresh token died). The
javadoc claimed the opposite ("safe on a request/flush path").

Replace the synchronized methods with a ReentrantLock. getToken() and
clearCache() still acquire it blocking, but getTokenSilently() now uses
tryLock() and fails fast with an OidcAuthException instead of waiting:
while a sign-in is in progress there is no token to serve anyway, so the
caller gets a prompt, retriable exception rather than a wedged flush. The
interactive flow still holds the lock for its whole duration and close()
still sets the volatile cancellation flag before acquiring the lock, so
the no-use-after-free guarantee is unchanged.

Correct the class and getTokenSilently() javadocs, and add a regression
test that fails (getTokenSilently blocks ~10s behind an in-flight
sign-in) without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isUnsafeForDisplay() inspected one UTF-16 code unit at a time, so a
supplementary-plane (>= U+10000) format or control character - an
invisible U+E00xx "tag" char, for instance - arrived as a surrogate
pair whose halves are each neither a control nor category Cf and so
passed the filter unstripped. Because the JSON lexer reassembles such
😀-style escapes, a hostile or man-in-the-middled identity
provider could smuggle invisible/spoofing characters into a user_code,
a verification_uri, or an error_description and on into the terminal
prompt and exception messages.

Judge a Unicode code point instead: isUnsafeForDisplay() takes an int,
and both sanitizers (putSanitized for exception messages,
sanitizeForDisplay for the prompt) walk the text by code point with
Character.codePointAt/charCount, so Character.getType classifies a
supplementary char as one character. A legitimate astral character
(an emoji) is still preserved.

Make the assertNoUnsafeDisplayChars test helper code-point-aware too -
it shared the blind spot - and add a regression test that fails (the
U+E0001 tag char survives) without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pollOnce() checked for a token before the HTTP status and the OAuth
error field, so a response that carried a token alongside an error, or
under a non-2xx status, was cached as a valid grant. tryRefresh() had
the same flaw: it accepted the refreshed token on token presence alone.
Both contradict RFC 6749 - 5.1 makes a grant a 2xx response carrying a
token, and 5.2 says an error response must not be treated as a grant.

Handle the OAuth error first in pollOnce(), so a token smuggled
alongside an error never counts, and accept a token only when the
status is 2xx; a token under a non-2xx status goes to the transport-
error budget instead of being trusted. Guard tryRefresh() the same way:
cache the refreshed token only from a clean 2xx response with no error,
otherwise fall back to the interactive flow.

The happy path and the existing pending/slow_down/access_denied/empty-
body outcomes are unchanged. Add regression tests for a token alongside
an error, a token under a non-2xx status, and a refresh that smuggles a
token with an error - each fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
newRequest() passed the token from httpTokenProvider.getToken() straight
to authToken(), which does not null- or empty-check it. A provider that
returned null, "", or whitespace therefore produced a malformed
"Authorization: Bearer " header that the server only answered with a 401
far from the cause - no client-side error at all. The HttpTokenProvider
contract forbids such a return but nothing enforced it, and httpToken()
already rejects a blank token, so the provider path was the weaker spot.

Validate the pulled token with Chars.isBlank (as httpToken does) and
throw a clear LineSenderException instead. The check sits inside the
deferred pull, so a rejected token leaves the stamp pending and the next
row retries cleanly, just like a throwing provider does. OidcDeviceAuth
never returns a blank token, so this guards custom providers.

Add tests that a null, an empty, and a whitespace-only provider token is
rejected at first use - each fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer.getCharSequence rescanned every decoded value and name from
the start to look for a backslash, even though the parse loop already
detects one when it sets ignoreNext. Record that in a sawEscape flag
(carried across parse() fragments) and resolve escapes only when it is
set, so the common no-escape value returns the assembled sink without a
second pass.

OidcDeviceAuth.Endpoint.parse now rejects a host that contains control
characters or whitespace - a smuggled CR/LF would otherwise flow into
the outbound Host header.

Add the tests these paths lacked: a cross-fragment escape; the lexer's
lenient and exotic escape arms (surrogate pairs, \b/\f, unknown and
malformed escapes, lone surrogates); the version-probe settings parser
reading an escaped key through unescape; HTTP-token-provider rejection
for UDP and WebSocket (not just TCP); and the control-character host
cases above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Port the issuer feature from py-questdb-client (PR #133) onto
OidcDeviceAuth, so the device flow keeps working against servers that
do not advertise their device-authorization endpoint, and so the
device code and refresh token are only sent where the caller pins.

The issuer plays three roles:

- Discovery fallback: when /settings omits the device (and/or token)
  endpoint, fromQuestDB(url, issuer) reads it from the issuer's
  .well-known/openid-configuration document. The discovery origin comes
  only from the out-of-band issuer (or an explicit discoveryUrl), never
  from a /settings-supplied value, so a tampered /settings cannot
  redirect discovery. Without a pin, discovery is refused.

- Plaintext-channel pin: a /settings response fetched over plaintext
  http to a non-loopback host (only reachable with
  allowInsecureTransport) cannot route credentials to its advertised
  endpoints without a pin.

- Endpoint-origin pin: validateEndpointOrigins, enforced in
  Builder.build() on every construction path, requires the token and
  device endpoints to share one origin (RFC 8628 co-location) and, when
  an issuer is set, to belong to it.

Config surface: Builder.issuer(...); new fromQuestDB overloads
(url, issuer), (url, issuer, allowInsecure), and a 5-arg master taking
issuer, discoveryUrl and a TLS config.

Tradeoffs:

- The co-location check makes the token and device endpoints share an
  origin. testPersistentTransportFailureDuringPollingAborts simulated
  an unreachable token endpoint with a dead second port; it now uses a
  new MockOidcServer.dropConnection() against a co-located path.

- The origin pin compares scheme/host/port and ignores the path, so an
  identity provider that hosts its endpoints on a different origin than
  its issuer must be configured without an issuer. This matches the
  Python client.

- allowInsecureTransport still relaxes the identity provider endpoints
  too (unchanged); the Python client always forces https/loopback for
  the IdP. Left as-is to avoid changing settled transport behavior.

Adds 7 tests and updates the README OIDC section.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Endpoint.parse now rejects control characters and whitespace anywhere
in the url before splitting it. The host was already checked, but the
path was not, so a tampered /settings or discovery document could carry
a CR/LF in an endpoint path that the JSON lexer decodes and postForm
writes verbatim onto the request line via .url(endpoint.path) - a
header-injection / request-smuggling vector that the origin pin (which
compares scheme/host/port only) does not catch. Validating the whole
url up front also keeps it safe to echo in the parse error messages.

fromQuestDB now derives the pin origin from a caller-supplied
discoveryUrl when no issuer was resolved. Previously a discoveryUrl pin
only took effect when discovery actually ran (an endpoint missing from
/settings); when /settings advertised both endpoints the discovery
branch was skipped and validateEndpointOrigins ran with a null issuer,
so a compromised server could advertise both endpoints at an attacker
origin and slip past the pin. The discoveryUrl pin now behaves like the
issuer pin on every construction path.

Adds regression tests for both: a CR/LF-injected advertised endpoint,
path and query cases in Endpoint.parse, and discoveryUrl-pin accept and
reject against on- and off-origin endpoints.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Endpoint.parse already rejected control characters and whitespace in
the url, which kept it safe to echo into the exception messages once it
passed validation. That scan did not catch bidi, zero-width or other
format characters (U+202E, U+200B, U+FEFF, the Cf category, and the
supplementary-plane tag characters), so a tampered /settings or
discovery endpoint url could still smuggle one into an OidcAuthException
message and reorder, hide or forge the log line it lands in.

The url scan now runs per code point and also rejects anything
isUnsafeForDisplay flags, so an OIDC url may carry no control,
whitespace or display-unsafe character. Every raw url echo in
Endpoint.parse, requireSecureTransport and fromQuestDB is therefore
safe on screen as well as on the wire, and the rejection message
sanitizes the url it reports.

Adds a regression test covering a right-to-left override, a zero-width
space, the BOM and a supplementary-plane tag character.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
isUnsafeForDisplay now treats an unpaired UTF-16 surrogate as unsafe,
so a lone surrogate half - which JsonLexer emits verbatim for a single
backslash-u-XXXX escape and which codePointAt surfaces as a SURROGATE
code point - is stripped from a user_code, verification_uri or error
string before it reaches a terminal or a log line. A valid high+low
pair is still reassembled by codePointAt and judged on its real
category, so a legitimate emoji survives. The method comment is
corrected too: codePointAt in the callers reassembles pairs, not the
lexer.

close() and the class Javadoc no longer claim an in-flight sign-in is
cancelled "promptly". The cancel flag is observed between polls (within
about 100ms) but a poll request already in flight is not interrupted,
so close() can take up to one HTTP request timeout to return - still
far short of the device-code lifetime. The docs now say so.

Adds tests: lone high and low surrogates are stripped from the device
challenge while an emoji survives; and the private isLoopbackHost
classifier (which gates the plaintext-channel MITM pin) is pinned for
localhost and the 127.0.0.0/8 block, and against non-loopback and
spoofing hosts such as 127.evil.com, localhost.evil.com, 127.1 and
127.0.0.256.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The poll loop now clamps the slow_down-inflated interval to the same
MAX_POLL_INTERVAL_SECONDS cap the initial interval already respects, so
repeated slow_down responses from the identity provider cannot grow the
wait without bound.

The device-authorization, token and well-known parsers now reset their
current field to FIELD_NONE after each value, matching
SettingsDiscoveryParser. The parsers are not currently confusable - in
well-formed JSON a name event always sets the field before the next
value, array elements arrive as EVT_ARRAY_VALUE, and nested values are
filtered by the depth check - so this is a defensive consistency fix
that removes a latent field-confusion foot-gun rather than a behavior
change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer.unescape no longer re-scans the value from the start to
re-find the backslash the lexer already flagged via hasEscape; it walks
the value once, copying plain characters and resolving escapes in place.
That drops the now-dead "no escapes" early return and the separate
prefix copy, so an escaped value is traversed about twice (decode then
unescape) instead of three times. parseHex4 looks the hex digit up in
the shared Numbers.hexNumbers table instead of Character.digit, keeping
the same -1-on-non-hex contract. All of this is on the cold
error/discovery/auth parse path, never on ingestion.

Reorders pollForToken ahead of pollOnce so the private methods stay in
alphabetical order; no behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@glasstiger glasstiger added enhancement New feature or request security labels Jun 19, 2026
glasstiger and others added 7 commits June 19, 2026 15:42
The 4 MiB response-body cap (MAX_RESPONSE_BODY_BYTES) that bounds the
OIDC device flow against a hostile or MITM'd server streaming an
endless body had no test coverage on the parseBody path.

Add an oversizedJson() mode to MockOidcServer that streams a chunked,
mostly-whitespace body past the cap, and a test that drives discovery
against it and asserts the bounded read aborts with the size-limit
error - which also confirms the token-bearing body never reaches the
message. The body is whitespace so the lexer keeps consuming until the
byte cap trips, instead of hitting its per-value length limit first.

Verified both ways: the test passes with the 4 MiB cap and fails when
the cap is disabled, where the full body is read and parsing fails with
"Unterminated object" instead.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three small fixes to the OIDC device authorization flow, all in
OidcDeviceAuth:

- runDeviceFlow now rejects a non-2xx device authorization response.
  Previously it trusted any body that carried device_code/user_code/
  verification_uri and no OAuth error, so a non-2xx response would
  prompt the user and start polling. It now applies the same 2xx gate
  pollOnce and tryRefresh already use before trusting a body.

- pollForToken checks the device-code deadline at the top of the loop
  and never sleeps past it, so an expiry that elapses during a sleep
  times out promptly instead of after one more wasted poll and up to a
  full extra poll interval.

- tryRefresh drops an unreachable branch that rethrew on an OAuth
  error. postForm only throws on a parse failure here, and a real
  OAuth error arrives in tokenParser.error (handled by the
  hasRequiredToken check), so the branch was dead. No behaviour change.

Add testNonSuccessDeviceAuthorizationResponseRejected covering the new
2xx gate; it fails without the check (the 403 is accepted, the user is
prompted, and polling fails later instead).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A discoveryUrl pins the identity provider, yet fromQuestDB adopted the
issuer the discovery document declared about itself and validated the
token and device endpoints against that, never against the pinned
discoveryUrl origin. A document served at the pinned url could therefore
name an attacker issuer, co-locate both endpoints under it, and route the
device code and the long-lived refresh token there while the co-location
and issuer checks passed trivially - so the discoveryUrl pin did not in
fact pin the provider, contradicting its documented guarantee.

Reject a document whose own issuer sits on a different origin than the
pinned discoveryUrl (RFC 8414 section 3.3), and derive the endpoint pin
from the discoveryUrl origin rather than the document's self-declared
issuer. An identity provider that serves its discovery document on a
different origin than its endpoints must instead be configured with
explicit endpoints via OidcDeviceAuth.builder().

The issuer-pinned path is unchanged: it already binds the endpoints to
the caller-supplied issuer. testFromQuestDbDiscoveryUrlPinRejectsForeign
IssuerInDocument covers the new rejection and fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
readResponse copied the response status code into a sink that later
appears in OidcAuthException messages. A well-formed status code is bare
digits, but the HTTP header parser keeps the status-line token verbatim
apart from SP/CR/LF, so a hostile or MITM'd identity provider could
splice ESC or other control bytes into it - smuggling ANSI sequences
into a log or terminal, or fabricating a leading digit that passes the
2xx success gate.

Validate the status code as it is captured: on any non-digit byte, drain
the body so the keep-alive connection stays usable, then reject the
response with a message that echoes none of its bytes. A clean status is
copied digit by digit, so every later [httpStatus=...] echo is bare digits.

testNonNumericStatusCodeRejected drives a status code with a spliced ANSI
reset and asserts the rejection; it fails without the fix. The new
MockOidcServer.raw() helper writes a verbatim response so a test can craft
a malformed status line.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
JsonLexer now resolves JSON string escapes, so the message and errorId
fields a QuestDB endpoint returns in a JSON error body arrive at the
sender fully decoded. The JSON error parser put them into the
LineSenderException verbatim, so a hostile or proxied endpoint could
inject real control characters or ANSI escapes that forge a log line or
rewrite a terminal when the exception text is printed.

Render the server-supplied message, id, code and line through
putAsPrintable - the same escaping the column-name errors in this class
already use - so a decoded control byte arrives escaped.

LineHttpSenderErrorResponseTest flushes against a server returning a
chunked JSON error whose message and errorId carry an ESC and a newline,
and asserts they reach the exception escaped; it fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The plaintext-channel pin refuses /settings-supplied OIDC endpoints
fetched over a non-loopback http channel unless the identity provider is
pinned out of band, so a tampered response cannot route the device code
and refresh token to an attacker. Only its loopback exemption was
exercised end to end, because the test mock binds to 127.0.0.1; the
firing branch had no integration coverage.

Reach the loopback mock through "127.1": the OS resolver expands the
short form to 127.0.0.1 so the mock answers, but the loopback classifier
deliberately rejects the short form, so the server host is non-loopback
and the pin fires. Assert that a plaintext /settings advertising both
endpoints without a pin is refused, and that pinning the issuer over the
same channel is accepted - proving the pin, not an unrelated rejection,
is the gate. The test fails if the firing check is removed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Skip testPlaintextSettingsWithAdvertisedEndpointsRequiresPin on
Windows: it reaches the loopback mock through the "127.1" short-form
address, which Linux/macOS getaddrinfo expands to 127.0.0.1 but Windows
getaddrinfo rejects, so discovery cannot connect there. No host string
is both reachable at the loopback mock and classified non-loopback on
Windows, so the end-to-end firing path cannot run there; the classifier
stays covered cross-platform by
testLoopbackHostClassifierRejectsNonLoopbackAndSpoofing.

Wrap every OidcDeviceAuth construction in try-with-resources so the
native JSON lexer and HTTP clients are always released, including the
rejection paths where build()/fromQuestDB() throws.

Also replace manual StringBuilder fills with String.repeat, switch
index loops to enhanced-for, and collapse the split-value test helper
to a single lexer cache-limit parameter.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
glasstiger and others added 20 commits June 23, 2026 16:51
pollOnce now maps an HTTP 429 to POLL_SLOW_DOWN before the error and
transport-error classification, so a rate-limited identity provider
grows the poll interval (capped at 60s) and keeps polling, like
slow_down, instead of charging the MAX_CONSECUTIVE_POLL_ERRORS budget
and failing fast. Matches the Python client.

Also documents MAX_POLL_INTERVAL_SECONDS (reduced to 60s in the
preceding commit), which now also bounds the slow_down/429 growth.

New tests: a 429 keeps polling to the device-code deadline instead of
aborting with "repeated unexpected responses", and the IdP-reported
interval is clamped to 60s. OidcDeviceAuthTest (98) and
BrowserLauncherTest (4) pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Token polling no longer aborts after a fixed number of consecutive
transport errors; MAX_CONSECUTIVE_POLL_ERRORS and its counter are gone.
Matching the Python client, the poll loop now keeps polling to the
device-code deadline on any transient failure - a network blip, a
non-JSON or garbled body, a 5xx, or a 429 - and fails fast only on a
terminal response: a 4xx other than 429, a well-formed OAuth error, a
malformed status line, or a 2xx without a usable token.

pollOnce classifies a non-2xx with no OAuth error via the new
isHttpStatusTransient() / isHttpStatusTerminal4xx() helpers (5xx keeps
polling, 4xx is terminal). pollForToken retries transient responses and
rethrows terminal ones; a garbled body is transient unless its status
is a terminal 4xx.

Tradeoff: a persistently unreachable or broken token endpoint now polls
until the device code expires (clamped) instead of failing fast.

Tests: the abort test now asserts polling continues to the deadline;
new 5xx-keeps-polling and terminal-4xx-fails-fast tests; the
malformed-body tests use a terminal 4xx so the parse rejection
surfaces. OidcDeviceAuthTest (100) and BrowserLauncherTest (4) pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
httpTokenProvider was an HTTP-only feature: the WebSocket build path
rejected it, and only a fixed httpToken or username/password worked,
captured once as a static Authorization header at connect time. A
long-lived WebSocket sender could therefore not follow a rotating
token (e.g. an OIDC device-flow token).

The WebSocket sender now holds a Supplier<String> for the upgrade
Authorization header and evaluates it on every handshake - the initial
connect and each reconnect - so a refreshing provider
(auth::getTokenSilently) presents a freshly pulled token each time the
link is (re)established. An already-established socket is not
re-authenticated mid-stream; the provider is queried at handshake
time, not per data frame. The fixed-token and username/password paths
become constant suppliers, unchanged in behavior.

QwpWebSocketSender evaluates the supplier inside buildAndConnect's
per-endpoint try, so a provider that throws (a failed silent refresh)
is handled as a connect failure for that attempt and retried within
the reconnect budget rather than escaping the I/O thread. Initial
connect still fails loudly.

Tests: TestWebSocketServer captures the upgrade Authorization header
(pollAuthorizationHeader); new WebSocketTokenProviderTest covers the
token on the initial upgrade, re-query on reconnect, and the fixed
token / username-password regression paths; SenderBuilderErrorApiTest
now asserts TCP/UDP reject the provider while WebSocket accepts it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The OIDC device-flow client's clock skew - the margin by which a
cached token is treated as expired early, to absorb clock drift and
request latency - was a configurable builder option defaulting to 30s
with a flat subtraction. This matches it to the Python client
(questdb.auth): a fixed 30s, no longer configurable, and capped at
half the token lifetime.

The cap (effectiveSkewMillis = min(30s, tokenTtlMillis / 2)) stops a
short-lived token from being reported expired the instant it is
issued - a flat 30s skew marks any sub-60s token born-expired. The
builder's clockSkewSeconds(int) option and field are removed; the
per-instance clockSkewMillis becomes the CLOCK_SKEW_MILLIS constant,
and a new tokenTtlMillis tracks the cached token's clamped lifetime.

Removing the configurable skew also removes the lever several tests
used to force a cached token to look expired (a short TTL was
born-expired under the old flat 30s skew). Those tests now force
expiry deterministically through a reflection helper
(expireCachedToken) that zeroes expiresAtMillis without dropping the
refresh token - no flaky sleeps. testClockSkewSecondsForcesEarlyRefresh
is rewritten as testClockSkewCappedAtHalfTokenLifetime (proven to fail
without the cap), and the expires_in clamp test now asserts the
clamped expiry directly via reflection instead of abusing a large skew.

This is an API change: the public clockSkewSeconds(...) builder method
is removed. Nothing in the client itself calls it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two hardening fixes from a review of the OIDC device-flow change.

Bound the content-length HTTP response read.
AbstractResponse.recv(int) re-armed the full timeout on every
recvOrDie call, so a server returning repeated zero-byte reads - an
incomplete or empty TLS record over a hostile or MITM'd link, where
JavaTlsClientSocket.recv returns 0 on BUFFER_UNDERFLOW without a
disconnect - kept a single recv() running without bound. That defeated
the wall-clock deadline OidcDeviceAuth.parseBody relies on and could
hang the flush or sign-in thread, and block close(), indefinitely.
recv(int) now shares one deadline across reads, matching the chunked
reader; a non-positive timeout keeps the legacy unbounded behaviour.

Escape display-unsafe characters beyond the BMP.
Utf16Sink.putAsPrintable scanned per UTF-16 unit, so a
supplementary-plane format char (a U+E00xx tag char, which arrives as a
surrogate pair) and a lone surrogate passed through raw - able to
reorder, hide, or forge text in a terminal or log. It now scans whole
code points and escapes control, format, and surrogate code points,
matching OidcAuthException.isUnsafeForDisplay; a normal emoji still
prints verbatim, and BMP output is unchanged.

Both fixes add a regression test that fails without the fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
fetchJson allocated the discovery HttpClient and then the JsonLexer
outside the try, so when the lexer's native malloc threw (native
OOM), control had not yet entered the try, the finally never ran, and
the already-allocated client's native buffers leaked.

Allocate the lexer inside the try, null-initialized, so the finally
frees the client on that path too. The client-factory call stays
outside the try, so its exception behaviour is unchanged and nothing
leaks there (the lexer is not allocated yet). Mirrors the
constructor's own "allocate the native lexer last" reasoning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two cleanups in OidcDeviceAuth from the review, no behaviour change.

Sort the private static helper methods alphabetically, per the
member-ordering convention. Pure reorder - no content changed.

Pre-encode the grant_type constants. pollOnce url-encoded the
device-code grant type on every poll, and tryRefresh the refresh
grant type on every refresh; both are constants, so encode them once
at class load instead. The wire output is byte-identical.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The token-poll loop reused a cached keep-alive connection after a
bounded-read abort. parseBody throws HttpClientException on its
wall-clock deadline or the 4 MiB response-body cap, leaving the response
half-read with unconsumed bytes in the socket; readResponse drained the
body only on a JsonException, so that HttpClientException escaped
undrained and pollForToken swallowed it and kept polling the same dirty
connection until the device code expired - defeating the response cap's
own "cannot wedge the thread" guarantee.

postForm now disconnects the client on any HttpClientException (the
parseBody abort, a header-read timeout, or a send failure) before
rethrowing, so the next poll or refresh reconnects with a clean socket.
This mirrors the disconnect-on-failure handling in
AbstractLineHttpSender.flush0.

A second path reached the same dirty connection: a malformed body larger
than 4 MiB throws JsonException (not the cap's HttpClientException), and
discardBody bailed at the cap leaving unconsumed bytes, yet pollForToken
treats that path as transient too. discardBody now reports whether it
fully drained, and readResponse disconnects when it could not. The
common case - a small garbled body that drains fully - keeps the
connection, so the transient-retry behavior is unchanged.

testPollAbortDropsDirtyConnectionAndReconnects stalls the first poll and
succeeds on the second over a 10s device-code lifetime: it fails without
the fix (getToken throws "device code expired" at ~10s) and passes with
it (~2s). The full OidcDeviceAuthTest suite stays green (101 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address the minor findings from the device-flow review, all in the OIDC
client.

Endpoint.parse now terminates the authority at the first '/', '?' or '#'
(so a query or fragment on a path-less url is no longer folded into the
host) and rejects userinfo (user@host), which the HTTP layer would
otherwise try to connect to literally.

isEndpointUnderIssuerPath rejects a percent-encoded path separator (%2f,
%5c, or a double-encoded %25 form). decodePathSegments resolves it before
the segment comparison, so it would split one segment in two and could
let .../realms/acme%2fevil/token slip the issuer-path scope. A real OIDC
endpoint path never encodes a separator. This is defense in depth: the
origin and prefix checks already confine credentials to the pinned
subtree.

pollOnce handles a terminal OAuth error before the 429 rate-limit
backoff, so a 429 that also carries access_denied (or another terminal
error) aborts immediately instead of polling to the device-code deadline.

runDeviceFlow sanitizes the user code and verification URL before the
completeness check and requires them non-empty after sanitizing, and
treats a verification_uri_complete that sanitizes to empty as absent, so
an all-control field is not shown as a blank code/URL or handed to the
browser launcher as "".

Also: correct the class javadoc (the device-code lifetime caps the lock
hold at 30 minutes, not an hour - an hour is the token-cache cap), rename
the JsonLexer sawEscape flag to hasEscape, order Utf16Sink's static
helper before its instance helper and restore the file's trailing
newline, and document on Sender.httpTokenProvider that a sustained token
outage terminates a WebSocket sender past its reconnect budget while the
HTTP path retries on the next row.

Adds four tests (encoded-slash rejection, 429-with-terminal-error
fail-fast, and the two empty-after-sanitize cases) plus userinfo cases in
testEndpointParseRejectsMalformedUrls; each fails without its fix. The
full OidcDeviceAuthTest suite stays green (105 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Sender's HTTP and WebSocket auth paths pulled an HttpTokenProvider
token and checked only that it was non-blank before splicing it into an
Authorization: Bearer header. A token carrying a CR/LF could inject into
the request line, and a non-ASCII byte was silently truncated to one byte
by the ASCII header writer, yielding a corrupt credential the server only
answers with 401. OidcDeviceAuth already guards its own tokens this way,
so a discovered OIDC token was safe, but a custom provider was not.

Add HttpTokenProvider.validateToken(), which both transports now call:
it rejects a null/empty/blank token and any character outside 0x20-0x7e,
the same range OidcDeviceAuth.validateTokenChars enforces. The token is
never placed in the exception message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The httpTokenProvider builder Javadoc claimed flatly "The provider is
not called at build time." That holds over HTTP, where the first pull is
deferred to the first row, but not over WebSocket: build() runs the
initial connection handshake, which queries the provider once. A user
wiring a lazily-signing-in provider (auth::getTokenSilently) over
WebSocket before the interactive sign-in completes would hit a build()
failure, contradicting the doc.

Scope the build-time statement to HTTP and spell out that over WebSocket
a token must already be obtainable when build() runs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the OIDC device-flow review, four minor items:

- isEndpointUnderIssuerPath now scans for an encoded path separator at
  every decode level, not just the raw string, and rejects a literal
  backslash. A split encoding such as %2%66 (which resolves to %2f then
  '/') and a backslash folded to '/' by decodePathSegments previously
  passed the single-pass pre-scan and could make a deeper endpoint
  masquerade as being under the pinned issuer path while a different raw
  path travelled on the wire. The origin pin already kept credentials on
  the trusted host, so this closes a defense-in-depth gap, not an exploit.

- close()'s Javadoc no longer claims a hard one-HTTP-timeout bound: a
  DeviceCodePrompt that blocks in promptUser (the default browser launch)
  holds the lock while it runs, so a racing close() waits it out too.

- testOutOfRangePollIntervalAndExpiryAreClamped now asserts the exact
  clamped values (60s interval, 1800s device-code lifetime) instead of
  bounds 5x and 2x looser than the real maxima.

- Move the JsonLexer hasEscape field to its alphabetical position.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
throwOnHttpErrorResponse echoed the raw HTTP error body into the
LineSenderException message on three paths - the 401/403 auth body, a
non-JSON body, and the toException JSON-parse-failure fallback - while
only the parsed-JSON path went through putAsPrintable. A hostile,
MITM'd or proxied endpoint could splice ANSI escapes, CR/LF or bidi
overrides into a logged or printed exception (terminal hijack, log
forging, visual spoofing).

Route all three paths through LineSenderException.putAsPrintable so the
server body is escaped just like the parsed-JSON fields already are.

Add LineHttpSenderErrorResponseTest cases for the auth, non-JSON and
malformed-JSON paths, each asserting a smuggled ESC or bidi override
surfaces escaped, never raw.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the two divergent display-safety predicates
(Utf16Sink.isDisplaySafe and OidcAuthException.isUnsafeForDisplay)
with one shared DisplaySafe classifier so both judge identically.
DisplaySafe adds a printable-ASCII fast path that skips the
Character.getType table lookup for the common case, and keeps the
explicit bidi/BOM set as defense on a non-conformant JDK.

Fill review-flagged test gaps:
- ResponseTest/ChunkedResponseTest: the no-arg recv() bounded branch
  under a positive default timeout (the ILP flush read path).
- OidcDeviceAuthTest: a non-ASCII (> 0x7e) token rejected, and a
  token response with expires_in <= 0 falling back to the default TTL.
- BrowserLauncherTest: assert the no-op is the intended path (URL
  rejection or the kill-switch), not an incidental headless no-op.
- LineHttpSenderTokenProviderTest and WebSocketTokenProviderTest now
  run under assertMemoryLeak, proving the senders free native buffers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Scope the issuer-origin pin to only the endpoints the untrusted
/settings response advertised. Endpoints discovered from the issuer's
own .well-known are fetched out-of-band from the pinned origin and are
authoritative for wherever the issuer hosts them, so they are no longer
forced onto the issuer's origin. This lets an identity provider that
serves its endpoints off the issuer origin - Google, Azure AD - sign in
through discovery with an issuer pin. The co-location check (token and
device share one origin) still applies to every endpoint, and a
/settings-advertised endpoint is still origin- and path-pinned.

Remove the DiscoveryOptions.discoveryUrl pin: an issuer already drives
.well-known discovery, so it was redundant. With it go its self-issuer
check, its origin-derivation, the discoverFromIdp parameter, and the
now-unused issuer field in the discovery-document parser.

Add a test for the off-origin discovered-endpoint case and drop the
four discoveryUrl tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The JDK 8 CI job failed to compile two Java 9+ constructs that a local
JDK 25 build silently accepts (newer rt.jar has the APIs, and -source
without --release does not enforce the Java 8 platform):

- Utf16Sink declared a private interface method, putUnicodeEscape, a
  Java 9 feature. Move it to DisplaySafe as a package-private static
  helper, putUnicodeEscape(Utf16Sink, int); the two putAsPrintable
  overloads now call it. DisplaySafe already owns the display-escaping
  policy and shares the package, so the helper stays out of the public
  API. The escape logic is moved verbatim.

- OidcDeviceAuth.urlEncode called URLEncoder.encode(String, Charset),
  a Java 10 overload. Switch to the Java 8 encode(String, String) form
  and catch the (unreachable for UTF-8) UnsupportedEncodingException.

Verified: the three changed files compile under --release 8, the full
core module compiles, and LineSenderExceptionTest,
LineHttpSenderErrorResponseTest, JsonLexerTest and OidcDeviceAuthTest
stay green (156 tests).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The PR sanitized the JSON error body a failed flush renders into a
LineSenderException, but two adjacent untrusted-text paths in the same
file still reached the message raw, so a hostile or proxied endpoint
could splice ANSI/control/bidi bytes into a log line or terminal:

- The "[http-status=...]" field was rendered with put(), not
  putAsPrintable(). The HTTP header parser copies the status-line token
  verbatim between the two spaces, and a non-3-char token bypasses the
  numeric status checks to reach the generic error path, so a crafted
  status line could carry an ESC. Route all four status renders through
  putAsPrintable(); the generic path is the exploitable one, the others
  are defensive. A normal 3-digit status renders unchanged.

- Protocol-version detection concatenated the raw probe response body
  via String +. Build it through putAsPrintable() instead.

Tests:
- LineHttpSenderErrorResponseTest: a malformed status line carrying an
  ESC, and a protocol-probe error body with ESC/bidi/newline; both fail
  without the fix and pass with it.
- DisplaySafeTest: direct coverage for the shared classifier (controls,
  format/bidi/BOM, supplementary tag chars and lone surrogates unsafe;
  printable ASCII and emoji safe) - it previously had none.
- SenderBuilderErrorApiTest: the null-provider guard and the
  provider-then-username/password exclusion, neither tested before.
- OidcDeviceAuthTest: a percent-encoded backslash (%5c/%5C) in the
  issuer-path scope check.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Three minor follow-ups to the device-flow review:

- storeTokens validated both the access_token and the id_token for
  control/non-ASCII chars, but getToken() only ever serves (and sends)
  one of them. A stray char in the unused kind - which never reaches a
  header or a PG-wire password - aborted an otherwise-usable grant.
  Validate only the served kind (groupsInToken ? idToken : accessToken);
  the served kind is still strictly checked.

- The HTTP status classifiers keyed off the first digit with only a
  length > 0 guard, so a malformed short all-digit status like "2" or
  "5" could be read as a 2xx/5xx. readResponse already guarantees bare
  digits; require exactly 3 of them, so a malformed-length status falls
  through to the terminal reject path instead of being trusted.

- getTokenSilently's javadoc claimed it "never blocks". It does not wait
  on interactive input or behind another thread's sign-in, but it can
  make one synchronous refresh round-trip bounded by httpTimeoutMillis.
  Reword to say so, matching the HttpTokenProvider contract.

Tests (both proven to fail without their fix):
- a control char in the unused id_token (groupsInToken=false) no longer
  aborts the grant;
- a 1-digit "2" status from the token endpoint is rejected, not accepted
  as success, and the smuggled token never surfaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
OidcDeviceAuthTest called String.repeat(int) at two sites, a Java 11
API. The client has a Java 8 floor (the JDK 8 CI job compiles, tests,
and releases the artifact), so the JDK 8 build failed to compile while
the JDK 25 smoke job passed and hid the break.

Replace both calls with the existing TestUtils.repeat(CharSequence,
int) Java 8 stand-in, which is already imported in the test and used by
other client tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@glasstiger

Copy link
Copy Markdown
Contributor Author

Code review — OIDC device flow

Reviewed at e1f3d53 (base 84da57d1), 32 files, +8129/−121. This is a deep pass over the security-sensitive surface (token/refresh-token/device-code handling, untrusted IdP + /settings input, issuer origin/path pinning, display sanitizing, bounded reads, the lock/cancellation protocol) plus every existing out-of-diff caller of the shared infrastructure this PR changes. Findings were verified against source, several both-ways on the JVM.

Verdict: approve. No Critical or Moderate issues.

The three highest-blast-radius changes all touch existing ILP code paths and were each verified clean:

  • Utf16Sink.putAsPrintableDisplaySafe (column-name errors, ConfStringParser, ILP error rendering): no existing test pinned the old \u00XX low-byte format or a raw bidi char; legitimate non-ASCII letters (é, ā, CJK, emoji) stay unescaped. C1 controls 0x80–0x9F were emitted raw before and are now correctly escaped — a strict improvement.
  • JsonLexer escape decoding reaching the ILP JsonErrorParser: the one pre-existing assertion that would have broken (a\"bc) was correctly updated, the now-redundant interop-harness unescape was removed, and the version-probe key path is covered. No parent-repo ILP test asserts escape-containing text.
  • Response.recv(int) whole-call bound on the flush: the bound is per fragment (re-armed each recv() call, default 600s) and a stall becomes a retryable HttpClientException caught by flush0 — intended hardening, not a regression.

Test coverage is genuinely strong (~107 cases, a realistic socket-based MockOidcServer, both-ways regression proofs, direct native-tag leak assertions).

Minor (polish, non-blocking)

  1. Per-poll re-encoding of invariant fieldspollOnce (OidcDeviceAuth.java:~1047) and tryRefresh re-run URLEncoder.encode on clientId/device_code/scope on every poll (~5s for the whole sign-in) and every silent refresh, though all are invariant for the flow. The class already precomputes GRANT_TYPE_*_ENCODED at class load — extend that pattern (precompute clientIdEncoded/scopeEncoded, encode device_code once in pollForToken). The two most-worth-doing items are this and chore(conf): update JDK version from 17 to 11 and change tag to 1.0.0 #2.

  2. Native-leak regression guard is blind to the real risktestMalformedEndpointDoesNotLeakNativeMemory feeds not-a-url, which throws at Endpoint.parse before the instance is constructed, so it passes even if build() were reordered to construct-then-validate (the actual leak path). Production ordering is correct — verified both ways on the JVM: a deliberate reorder leaks exactly 1024 bytes, the current order leaks nothing — but the guard doesn't protect it. Use a parseable but rejected endpoint instead (e.g. http://idp.example/device with allowInsecureTransport(false)).

  3. Status-validation coverage asymmetry — the short all-digit status trap ("2"/"5" must not read as 2xx/5xx) is tested on the token-poll path but not the device-authorization path, though both share the classifier. A raw("HTTP/1.1 2 OK") device-step case closes it.

  4. OidcAuthException static-method ordering (:59 vs :76) — oauthError precedes isUnsafeForDisplay; i should sort before o.

  5. Builder.httpTimeoutMillis(int) is unvalidated — every other timing input is clamped, but a non-positive value yields an already-expired read deadline and passes ≤0 to recv(int) (treated as unbounded). Reachable only by explicit misconfiguration; a > 0 guard would be consistent.

  6. Javadoc nuanceSender.httpTokenProvider's "over WebSocket the initial handshake pulls [the token] at build()" is inaccurate for the opt-in ASYNC initial-connect mode, where the first pull is deferred to the I/O thread and a provider failure surfaces via the error inbox. (Default is OFF.)

  7. groupsInToken lacks the is/has prefix — public builder API mapping to acl.oidc.groups.encoded.in.token; renaming has API cost, so low priority.

Checked and cleared (so they don't get re-raised)

  • WebSocket: a throwing token provider during reconnect is correctly caught — buildAndConnect has a broad catch (Exception e) right after catch (HttpClientException e) that catches the supplier's OidcAuthException/LineSenderException, closes the client, records a transport error, and continues to the next endpoint within the reconnect budget. The code comment is accurate.
  • Concurrency: single ReentrantLock; getTokenSilently uses tryLock and fails fast; close() sets the closed volatile before acquiring the lock and never frees the native lexer while a flow holds it — no use-after-free. The documented shared-instance case (a getToken() thread + a Sender flush/reconnect thread) is safe.
  • Resource management: no native leak on any error path; jsonLexer is allocated last in the ctor and build() validates before constructing; fetchJson frees the transient lexer + client in finally.
  • ILP flush slow-body abort is the intended documented per-fragment bound, not a regression for healthy responses.

Summary

7 confirmed Minor findings, all in-diff; 0 out-of-diff defects (the cross-context callers were walked and verified, not skipped). None blocks merge — #1 and #2 are the two worth addressing before merge. The only change since the prior review pass is the String.repeatTestUtils.repeat Java-8 test fix in e1f3d53, confirmed correct with no other String.repeat remaining.

🤖 Generated with Claude Code

glasstiger and others added 4 commits June 25, 2026 18:34
Add QwpQueryClient.withBearerTokenProvider(HttpTokenProvider): the
egress query client now accepts an on-demand token provider, the same
HttpTokenProvider the ingress Sender uses, so OidcDeviceAuth::getToken
plugs into both. runUpgradeWithTimeout resolves the Authorization
header at every WebSocket upgrade, so the initial connect and each
failover reconnect present a freshly refreshed token; a throwing
provider fails that connection attempt, matching the ingress sender.
The setter is mutually exclusive with withBearerToken/withBasicAuth
and validates each token before it reaches the header.

Rename the OidcDeviceAuth methods so the safe, non-blocking accessor
takes the plain name and the interactive step is explicit:
  getToken() is now signIn() (interactive sign-in, may block);
  getTokenSilently() is now getToken() (cached/refresh, never prompts).
getAuthorizationHeaderValue() keeps its blocking behavior by calling
signIn(). Update the tests, examples, and doc references across the
client. The Python client is left unchanged for now.

Print the egress timestamp column as a formatted Instant in
OIDCAuthExample instead of the raw microsecond long.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-ups from the device-flow review:

- README: the OIDC quick-start still called the pre-rename API (getToken()
  as the interactive sign-in, auth::getTokenSilently). Use signIn() for the
  one-time sign-in and auth::getToken for the token provider, matching the
  renamed methods.

- OidcDeviceAuth: pre-encode the invariant form params (client_id, scope,
  audience) once in the constructor and the device_code once in pollForToken,
  instead of re-running URLEncoder on every poll (~once/5s through a sign-in)
  and every silent refresh. Mirrors the existing GRANT_TYPE_*_ENCODED
  constants; the wire output is unchanged. Add appendEncodedParam for the
  pre-encoded values and keep appendParam for the dynamic refresh_token.

- Reorder getToken() before signIn() and rename the lagging
  testGetTokenSilently* / testConcurrentGetToken* methods to match the
  renamed API.

- QwpQueryClientTokenProviderTest: cover the real connect path, not just the
  test hook - a throwing provider fails the connection attempt, the pulled
  token reaches the actual upgrade Authorization header, and a null, empty or
  blank provider return is rejected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The follow-up that pre-encodes the OIDC form params changed
urlEncode() to URLEncoder.encode(value, StandardCharsets.UTF_8).
That Charset overload is @SInCE 10, so the source-of-truth JDK 8
build fails to compile: "incompatible types: Charset cannot be
converted to String" at OidcDeviceAuth.java:896.

Revert urlEncode() to the Java 8 String-charset form,
URLEncoder.encode(value, "UTF-8") with the
UnsupportedEncodingException catch, and drop the now-unused
StandardCharsets import. The constructor-time pre-encoding of
clientId/scope/audience/device_code is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Address three review follow-ups in the OIDC device-flow client.

QwpQueryClient resolved the bearer token inside the per-endpoint
upgrade, so a token-provider failure (not signed in, a failed silent
refresh, a rejected token) was caught as a per-endpoint transport
error and reported as "all QWP endpoints unreachable", and the
provider was queried once per endpoint. Resolve the header once
before the endpoint walk in connect() and reconnectViaTracker() and
thread it through connectToEndpoint/runUpgradeWithTimeout, so a
provider failure - which is cluster-wide - propagates directly with
the provider's own message and the provider is queried once per
connect/reconnect. Strengthen testThrowingProviderFailsConnect to
assert the provider's own exception surfaces, not a wrapped
"unreachable" error.

Validate Builder.httpTimeoutMillis: a non-positive value gave an
already-expired read deadline and an unbounded recv(int), so reject
it like Sender.Builder already does. Add a test.

Drop the always-true scopeEncoded null check in tryRefresh: build()
defaults scope to DEFAULT_SCOPE, so append it unconditionally like
runDeviceFlow().

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mtopolnik

Copy link
Copy Markdown
Contributor

[PR Coverage check]

😍 pass : 1111 / 1170 (94.96%)

file detail

path covered line new line coverage
🔵 io/questdb/client/cutlass/auth/BrowserLauncher.java 14 21 66.67%
🔵 io/questdb/client/cutlass/auth/DeviceCodePrompt.java 22 24 91.67%
🔵 io/questdb/client/cutlass/qwp/client/QwpQueryClient.java 21 23 91.30%
🔵 io/questdb/client/cutlass/auth/OidcAuthException.java 31 33 93.94%
🔵 io/questdb/client/cutlass/auth/OidcDeviceAuth.java 823 867 94.93%
🔵 io/questdb/client/Sender.java 29 30 96.67%
🔵 io/questdb/client/cutlass/line/http/AbstractLineHttpSender.java 35 36 97.22%
🔵 io/questdb/client/std/str/Utf16Sink.java 10 10 100.00%
🔵 io/questdb/client/cutlass/qwp/client/QwpWebSocketSender.java 6 6 100.00%
🔵 io/questdb/client/cutlass/auth/DeviceAuthorizationChallenge.java 12 12 100.00%
🔵 io/questdb/client/cutlass/http/client/AbstractChunkedResponse.java 8 8 100.00%
🔵 io/questdb/client/cutlass/json/JsonLexer.java 65 65 100.00%
🔵 io/questdb/client/cutlass/http/client/AbstractResponse.java 8 8 100.00%
🔵 io/questdb/client/std/str/DisplaySafe.java 20 20 100.00%
🔵 io/questdb/client/HttpTokenProvider.java 7 7 100.00%

@glasstiger

Copy link
Copy Markdown
Contributor Author

Code review — OIDC device flow (level-3 re-review)

Re-reviewed at 1b7fecd (base 84da57d), 36 files, +8546/−129. This pass focuses on the 4 commits since the prior review at e1f3d53 — the QWP egress token provider, the getToken/getTokenSilentlysignIn/getToken method swap, pre-encoding of the OIDC form params, and the Java-8 build fixes — and re-verifies the full security surface (token/refresh-token/device-code handling, untrusted IdP + /settings input, issuer origin/path pinning, display sanitizing, bounded reads, lock/cancellation). Findings were verified against source; the build (JDK 25) and the relevant suites (~120 tests) were run green.

Verdict: approve. No Critical or Moderate issues.

The four highest-blast-radius changes were each verified:

  • The method swap is complete and correct. Zero lingering getTokenSilently references. Every request/flush/reconnect caller (AbstractLineHttpSender:772, Sender:2936, QwpQueryClient:1812) uses the non-blocking getToken(), never the blocking signIn(); the README and examples call signIn() once first. Bodies/locking swapped correctly: signIn() = lock.lock() + runDeviceFlow, getToken() = tryLock fail-fast, never runs the flow.
  • Pre-encoding is wire-byte-identical. clientIdEncoded/scopeEncoded/audienceEncoded (ctor) and deviceCodeEncoded (once in pollForToken) produce exactly the old per-call bytes. The dropped if (scope != null) guard in tryRefresh is safe — build() forces scope = DEFAULT_SCOPE, so scope is never null; audience stays null-guarded.
  • The QWP egress provider is correct. resolveAuthorizationHeader() resolves once before the endpoint walk in both connect() and reconnectViaTracker(), validates the token, and propagates a provider throw with its own error (not folded into "all endpoints unreachable"). Mutual exclusion is bidirectional; withBearerTokenProvider is in the post-connect guard test.
  • Java 8 floor is clean. Exhaustive scan of all 36 changed files (main + test) found no Java 9+ API/syntax; the URLEncoder String-charset revert held.

Security invariants all still hold after the follow-ups: served-kind token validation on every path, full percent-encoding of attacker-influenced form values, display sanitizing (bidi/zero-width/surrogate/tag), the 3-digit status guard on both device-auth and poll paths, endpoint/issuer origin+path pinning (no traversal bypass found under adversarial probing), bounded reads, and no token leakage into messages.

Minor (polish, non-blocking)

  1. Native-leak regression guard is still blind to the real risk. OidcDeviceAuthTest.testMalformedEndpointDoesNotLeakNativeMemory feeds "not-a-url", which throws in Endpoint.parse inside build() before the constructor allocates the native lexer, so the strong NATIVE_TEXT_PARSER_RSS measurement never exercises a parseable-but-rejected endpoint — the actual construct-then-validate reorder it guards. Production ordering is correct (lexer last; build() validates first), verified both ways. Raised at e1f3d53 and still open. Fix: add a measured block using deviceAuthorizationEndpoint("http://idp.example/device") + allowInsecureTransport(false) (parses, then rejected by the https check) and assert the tag is unchanged. This is the one item most worth doing before merge.

  2. Egress/ingress provider-failure asymmetry. Egress QwpQueryClient fail-fasts on a token-provider throw (resolves once before the walk, propagates the real error), but ingress QwpWebSocketSender.buildAndConnect pulls inside the per-endpoint loop and treats a throw as a transport error, retrying within the reconnect budget (catch (Exception e) — correct, and the comment is accurate). Both are correct and documented, but the two WebSocket clients diverge for an identical failure. Consider aligning QwpWebSocketSender to resolve-once-before-walk for consistent errors. (Optional.)

  3. Minor test-coverage gaps (all low): no direct short-all-digit status case ("HTTP/1.1 2 OK") on the device-authorization path (the poll path has one; covered indirectly via the shared classifier); the QWP per-reconnect re-query is asserted only via the @TestOnly hook rather than a real multi-endpoint failover on the wire; withBasicAuth(...).withBearerTokenProvider(...) ordering is untested; QWP bad-token rejection is asserted via the hook, not a real connect().

  4. refresh_token re-encoded per refresh. tryRefresh re-runs urlEncode(refreshToken) on each silent refresh (~once per token lifetime), while every other param is pre-encoded. Cold cadence — pre-encoding it in storeTokens would complete the pattern. Nit.

  5. Comment nit. OidcDeviceAuth.java:103 describes the clock-skew/expiry logic as getToken()'s, but after the rename both getToken() and signIn() share it.

Checked and cleared (so they don't get re-raised)

  • OidcAuthException static-method ordering (oauthError before isUnsafeForDisplay): correct — members are grouped by kind and visibility then alphabetical, so public-static precedes package-private-static. The earlier flag was over-strict.
  • QwpWebSocketSender "caught below" comment: accurate — the provider throw is caught by the broad catch (Exception e) (it doesn't claim a specific handler), which records a transport error and continues. Not a defect.
  • tryRefresh silent-refresh token validation: covered structurally — storeTokens (the single choke point) validates the served kind unconditionally on both the device-flow and refresh paths.
  • groupsInToken naming: no public boolean getter exists; it's a private final field plus a fluent builder setter, matching scope(...)/audience(...).
  • Concurrency / resources: single ReentrantLock; getToken() uses tryLock and fails fast; close() sets the closed volatile before acquiring the lock and never frees the native lexer while a flow holds it — no use-after-free; the jsonLexer is allocated last in the ctor and freed once.

Summary

~12 draft findings verified, ~4 dropped as false positives. 5 in-diff minor items, 0 out-of-diff defects — the cross-context pass walked every out-of-diff caller of the changed shared symbols (Sender, AbstractLineHttpSender, README, examples) and confirmed all safe. None blocks merge; #1 (strengthen the native-leak guard) is the only item worth addressing before merge.

🤖 Generated with Claude Code

@glasstiger

Copy link
Copy Markdown
Contributor Author

Code review — OIDC device flow (independent level-3 pass)

Reviewed at 1b7fecd (base 84da57d), 36 files, +8546/−129. Independent 9-dimension adversarial pass (correctness, concurrency, resources, cross-context callers, performance ×2, tests, code quality, metadata, plus a fresh-context "find what's wrong" agent), every finding verified against working-tree source — not a re-confirmation of the prior passes.

Process note for anyone re-reviewing: gh pr diff 52 currently serves a stale cached diff for this PR — it shows a removed auth::getTokenSilently reference and older Javadoc wording that do not exist at the real head 1b7fecd. The authoritative diff is git diff $(git merge-base HEAD origin/main)...HEAD; read source from the checked-out head. Findings below are against the real head.

Verdict: approve. No Critical or Moderate issues.

The three highest-blast-radius changes all touch the existing ILP path and were each verified safe at their out-of-diff callers (incl. a cross-module compile of OSS questdb/core against the changed client + 522 passing tests across the affected suites):

  • Utf16Sink.putAsPrintable — for 0x00–0x1F/0x7F the new \uXXXX output is byte-identical to the old \u00XX; only C1/FORMAT/surrogate chars change from raw to escaped (strictly safer). Legitimate non-ASCII letters/emoji still pass raw. No existing caller (ConfStringParser, column-name errors, ILP error rendering) depended on the old format.
  • JsonLexer escape decoding — the new hasEscape fast-path returns the raw sink unchanged for escape-free values (the common case for the ILP JsonErrorParser); decoded values are re-escaped via putAsPrintable before display; numeric fields carry no backslash and are unaffected.
  • Response.recv(int) whole-call bound — active on the ILP flush (30s default, re-armed per recv() call, so a large body across many recv()s is unaffected), bites only a single fragment read that stalls ~30s, and surfaces as a retryable HttpClientException caught in flush0.

This is an unusually well-defended PR: CRLF/header injection, bidi/ANSI terminal spoofing, MITM endpoint redirection (origin + path pinning with multi-decode traversal defense), token-kind confusion, use-after-free on close(), and unbounded reads were each anticipated and hold up under adversarial tracing.

Minor (none blocking)

  1. Tautological native-leak guardOidcDeviceAuthTest.testMalformedEndpointDoesNotLeakNativeMemory (~:1993). "not-a-url" throws in Endpoint.parse before the native JsonLexer is allocated (it's the last ctor statement), so the RSS-equality assertion can never fail — it doesn't guard the construct-then-validate ordering it claims to. Production ordering is safe. Fix: use a parseable-but-rejected endpoint, e.g. deviceAuthorizationEndpoint("http://idp.example/device") + allowInsecureTransport(false). (Still open from the prior passes — the one item most worth doing before merge.)

  2. Test gap: short all-digit status on the device-authorization path. The 3-digit isHttpStatusSuccess() guard is tested on the token-poll path but not at the device-auth call site (runDeviceFlow, :1195); the device-auth status tests use a non-numeric status, a different code path. Add a raw("HTTP/1.1 2 OK") device-step case.

  3. Test gap: throwing httpTokenProvider on a WebSocket ingest sender reconnect. Only the QWP QueryClient initial-connect throw and the WS happy-reconnect re-query are covered. Production handling is correct — the throw (OidcAuthException/LineSenderException, both RuntimeException) is caught by catch (Exception e) at QwpWebSocketSender.java:2500, the client is closed, and it retries within the reconnect budget — but a regression narrowing that catch would ship green. (The :2456 "caught below" comment is accurate, just non-specific — not a defect.)

  4. Test gap: ILP flush recv() whole-call timeout end-to-end. ResponseTest/ChunkedResponseTest cover the unit-level bound well, but no integration test drives a real Sender.flush() against a dribbling/stalled server to confirm the abort fires and is classified retryable.

  5. OIDC JSON parsers track object depth only, ignoring array nesting. All four inner parsers bump depth on EVT_OBJ_START/END but ignore EVT_ARRAY_START/END, so array-wrapping ({"config":[{…}]} / [{…}]) extracts fields from RFC-malformed shapes, subverting the depth-based "only top-level config trusted" intent. No practical security/correctness impact: the /settings body is already fully attacker-controlled in the fromQuestDB threat model (array-wrapping grants no new capability), endpoints stay origin/path-pinned regardless of parse shape, a preferences value still can't be read as config, and a real QuestDB server never returns arrays there. Robustness/defense-in-depth nit — optionally track array depth or reject a non-object top level.

  6. Cold-path perf nits (optional). tryRefresh re-encodes refresh_token per silent refresh (invariant per token lifetime — could pre-encode in storeTokens); the issuer-path check runs two independent multi-pass percent-decodes over the same path (endpointPathHasEncodedSeparator then decodePathSegments — fusible); percentDecodeOnce allocates a fresh StringSink per call. All run once per construction/refresh, off the data path. The per-row/per-flush path is allocation-clean and the Authorization: Bearer write is allocation-free.

  7. PR description: disclose the ILP-flush timeout change under "Tradeoffs", not "Supporting changes". The Response.recv(int) whole-call bound affects existing non-OIDC senders; the repo convention is "regressions as prominently as improvements." Worth one line in Tradeoffs noting a flush aborted by the tighter deadline is subject to normal retry (the pre-existing ILP-over-HTTP at-least-once window, marginally widened). Labels enhancement+security are correct (the ILP/REST API/Core labels don't exist in this submodule repo).

  8. Naming/comment nits (non-actionable). groupsInToken/allowInsecureTransport lack the is/has prefix but follow the fluent-builder idiom (mirroring scope()/audience()); the :103 clock-skew comment names only getToken() though signIn() shares the logic (accurate but incomplete).

Checked and dismissed (so they don't get re-raised)

  • Utf16Sink.putAsPrintable "misses supplementary-plane tag chars" — false: the CharSequence overload scans by code point via DisplaySafe and does strip them.
  • Use-after-free of the native jsonLexer on close() — refuted: close() sets the closed volatile then lock.lock(), blocking until any in-flight parse releases the lock; the lexer is never freed mid-parse.
  • OOB read on a trailing backslash in JsonLexer.unescape — refuted: the lexer's ignoreNext guarantees a char follows every \ before the closing quote; the i+1>=n guard is dead-but-correct.
  • OidcAuthException static-method ordering — correct (public-static before package-private-static, then alphabetical).

Summary

~8 Minor items, none blocking: 4 test-coverage/quality gaps (#1#4), 1 robustness nit (#5), 1 optional cold-path perf cleanup (#6), 1 PR-description prominence fix (#7), naming/comment nits (#8). ~6 draft findings dropped as false positives after source verification. Every confirmed finding is in-diff (tests / new parser code / PR body); the high-blast-radius out-of-diff callers (ILP error rendering, JsonLexer consumers, recv()/createLineSender callers, getAuthorizationHeaderForTest) were walked and verified safe — 0 out-of-diff defects. The single item most worth doing before merge is #1.

🤖 Generated with Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request security tandem

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants