feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4
feat: migrate ClickHouse adapter to utopia-php/query 0.3.x builder#4lohanidamodar wants to merge 13 commits into
Conversation
Pin `utopia-php/query` to `dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2` to consume the new ClickHouse builder + schema layer (groupByTimeBucket, named-typed bindings, Schema\ClickHouse, Builder\ClickHouse insertFormat/deleteMode). Flip composer to `minimum-stability: dev` with `prefer-stable: true` so the alias resolves. Tracks utopia-php/query#11; flip to `^0.3.2` once tagged.
Replace Query::TYPE_* string constants (removed upstream in 0.3.x) with Method enum cases. Behaviour unchanged; the GroupByTimeBucket case is the new name for the deferred groupByInterval/intervalGroupBy bucket — still a no-op for the Database adapter since it has no time-bucketed aggregation path.
PHPStan 1.x ships php-parser 4.x which cannot parse the asymmetric visibility (`public protected(set)`) used by utopia-php/query 0.3.x's ClickHouse schema layer (Schema\Table\ClickHouse), causing an internal "Lexer::getNextToken returned null" crash on the usage adapter source file. Bump to PHPStan ^2.0 (ships php-parser 5.x, understands PHP 8.4) and capture the pre-existing "mixed array access" findings from json_decode result handling in a baseline so net analysis stays green while the adapter migrates. Wire up a phpstan.neon that runs at level max over src + tests.
Replaces the hand-rolled SQL strings in src/Usage/Adapter/ClickHouse.php
with the utopia-php/query 0.3.x builder + schema layer. The adapter is
now a thin HTTP runtime: every DDL statement, INSERT, SELECT, and DELETE
fragment is produced by Schema\ClickHouse or Builder\ClickHouse.
Schema (DDL via Utopia\Query\Schema\ClickHouse):
- setup() emits events/gauges tables through Table\ClickHouse with typed
columns (string, datetime, bigInteger), Engine::MergeTree, monthly
partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.
- createDailyTable() emits the SummingMergeTree daily aggregation
table via Engine::SummingMergeTree('value').
- createDailyMaterializedView() emits the MV via
Schema\ClickHouse::createMaterializedView. The MV body remains a hand
SELECT because subquery-aggregation MV bodies do not round-trip
cleanly through the builder yet (deferred upstream).
Reads (SELECT via Utopia\Query\Builder\ClickHouse):
- A per-call builder is initialised with `useNamedBindings()` plus
`withParamTypes()` so every `?` placeholder rewrites to ClickHouse
`{paramN:Type}` form (DateTime64(3), Int64, String, Nullable(String)).
- find/findFromTable, findAggregatedFromTable, count/countFromTable,
sum/sumFromTable, findDaily (FINAL), sumDaily, sumDailyBatch,
getTotal/getTotalFromEvents/getTotalFromGauges, getTotalBatch,
getTimeSeries/getTimeSeriesFromTable all compose filters via
$builder->filter([Query::...]) and emit aggregations through
$builder->sum()/->count()/selectRaw.
- Time-bucketed aggregation consumes Query::groupByTimeBucket via the
builder's native GROUP BY emission, paired with selectRaw/orderByRaw
for the bucket projection and ordering.
- Cursor pagination keeps the tuple-keyset comparison as a whereRaw
fragment and merges its named bindings into the builder Statement at
HTTP execute time.
Writes (INSERT via Utopia\Query\Builder\ClickHouse::insertFormat):
- addBatch() builds `INSERT INTO t (...) FORMAT JSONEachRow` via the
builder's insertFormat helper; the JSONEachRow payload is streamed in
the HTTP body.
Deletes (DELETE via Utopia\Query\Builder\ClickHouse::delete):
- purge() and purgeDaily() emit lightweight DELETE through the builder.
UsageQuery is removed. Downstream callers must switch from
`UsageQuery::groupByInterval('time', '1h')` to
`Query::groupByTimeBucket('time', '1h')`. The 0.3.x library rejects
non-enumerated intervals through ValidationException at construction.
HTTP transport and configuration setters (validateHost/validatePort,
escapeIdentifier, setNamespace/setTenant/setSharedTables/setDatabase/
setSecure/setTimeout/setCompression/setKeepAlive/setMaxRetries/
setRetryDelay/setAsyncInserts, healthCheck, query/insert/executeWithRetry,
logQuery, getConnectionStats/getQueryLog/clearQueryLog) stay untouched -
the migration is strictly to the SQL-building layer.
LowCardinality wrapping on the country column is dropped (the schema
layer wraps `LowCardinality` inside `Nullable`, but ClickHouse only
accepts `LowCardinality(Nullable(T))`; the column stays a plain
Nullable(String) until the upstream wrapping order is fixed - deferred
upstream).
References utopia-php/query#11.
Adds tests/Usage/Adapter/ClickHouseSqlSnapshotTest.php covering DDL (events table, daily SummingMergeTree, materialized view), the named-typed binding read path, groupByTimeBucket aggregation, INSERT FORMAT JSONEachRow, getTimeSeries shape, FINAL on daily reads, and the lightweight DELETE form. 9 snapshot tests / 30 assertions. Bumps Dockerfile from php:8.3.3-cli-alpine3.19 to php:8.4.21-cli-alpine3.23 and composer.json's php constraint from >=8.0 to >=8.4 so CI matches the utopia-php/query 0.3.x asymmetric-visibility requirement. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Greptile SummaryMigrates
Confidence Score: 4/5The migration is structurally sound and PHPStan max passes clean, but getTimeSeries() rejects valid intervals (5m, 1w, 1M) that bucketFunctionFor() already handles — a regression relative to the deleted UsageQuery that supported all seven intervals. The adapter correctly wires named typed bindings, tenant filtering, cursor stripping, and daily/MV DDL. The open issue is that getTimeSeries() validates the interval against INTERVAL_FUNCTIONS (only 1h/1d), while the new bucketFunctionFor() helper already maps all seven intervals — any caller that migrated from UsageQuery::groupByInterval to Query::groupByTimeBucket with a sub-hourly or weekly interval and then calls getTimeSeries() will hit a runtime InvalidArgumentException. src/Usage/Adapter/ClickHouse.php — specifically the INTERVAL_FUNCTIONS allowlist used in getTimeSeries() and getTimeSeriesFromTable() vs. the wider interval set supported by bucketFunctionFor(). Important Files Changed
Reviews (5): Last reviewed commit: "refactor(clickhouse): use Builder bulkIn..." | Re-trigger Greptile |
| "prefer-stable": true, | ||
| "require": { | ||
| "php": ">=8.0", | ||
| "php": ">=8.4", |
There was a problem hiding this comment.
Dev-branch dependency and loosened stability setting
"minimum-stability": "dev" means Composer can resolve any transitive dependency to a dev release. Combined with "utopia-php/query": "dev-feat/clickhouse-insert-delete-settings-mv as 0.3.2", the locked revision is a mutable branch tip — if the branch is force-pushed or rebased after composer.lock is committed, any fresh composer update would silently pull a different commit. The PR description acknowledges this needs flipping to ^0.3.2 once tagged.
There was a problem hiding this comment.
Acknowledged — this is intentional during the rollout and is called out in the PR description. The plan is:
- Merge the upstream
utopia-php/queryPR and tag0.3.2(or whatever the next stable carries the ClickHouse builder + Method enum surface used here). - Flip this PR's
composer.jsonback to"utopia-php/query": "^0.3.2"and restore"minimum-stability": "stable"(droppingprefer-stable) in the same commit, refreshcomposer.lock, and only then merge.
Until the tag exists the dev-branch alias is required so CI can resolve the dependency at all. Leaving this thread open so it stays visible as a pre-merge gate.
…ment import Inline `$this->query(...)` sites already pass merged named bindings explicitly, so the wrapper is dead. Removes the corresponding PHPStan baseline entry.
| @@ -172,49 +166,24 @@ public function setTimeout(int $milliseconds): self | |||
| return $this; | |||
| } | |||
|
|
|||
| /** | |||
| * Enable or disable query logging for debugging. | |||
| * | |||
| * @param bool $enable Whether to enable query logging | |||
| * @return self | |||
| */ | |||
| public function enableQueryLogging(bool $enable = true): self | |||
| { | |||
There was a problem hiding this comment.
Let's keep the docs on public methods
There was a problem hiding this comment.
Restored docblocks on the public setter / lifecycle methods that lost them during the migration rewrite (__construct, setTimeout, enableQueryLogging, setCompression, setKeepAlive, setMaxRetries, setRetryDelay, setAsyncInserts, clearQueryLog, getName, healthCheck, setNamespace, setDatabase, setSecure, plus the namespace/tenant/sharedTables getters). Done in 0e0bbd6.
| 'integer' => $table->addColumn($id, ColumnType::BigInteger), | ||
| 'float' => $table->addColumn($id, ColumnType::Float), | ||
| 'boolean' => $table->addColumn($id, ColumnType::Boolean), | ||
| 'datetime' => $table->datetime($id, 3), |
There was a problem hiding this comment.
There should be shorthand methods we can use for integer, float and boolean too
There was a problem hiding this comment.
Swapped the addColumn(..., ColumnType::BigInteger|Float|Boolean) calls in the match block to the dedicated bigInteger() / float() / boolean() shorthand, plus the value BigInteger column in createDailyTable(). ColumnType import is no longer needed. SQL snapshot tests still green. Done in c2a261e.
utopia-php/query 0.3.2 is now tagged and published with all ClickHouse builder/schema features this PR depends on. Switch from the temporary dev-branch alias to the stable semver constraint and restore minimum-stability: stable (drop the temporary 'dev' relaxation). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| if (!isset(self::INTERVAL_FUNCTIONS[$interval])) { | ||
| throw new InvalidArgumentException("Invalid interval '{$interval}'. Allowed: " . implode(', ', array_keys(self::INTERVAL_FUNCTIONS))); |
There was a problem hiding this comment.
getTimeSeries interval set is narrower than find() + groupByTimeBucket
INTERVAL_FUNCTIONS only covers 1h and 1d, so calling getTimeSeries($metrics, '5m', ...) throws InvalidArgumentException even though find([Query::groupByTimeBucket('time', '5m'), ...]) succeeds. The deleted UsageQuery::VALID_INTERVALS supported seven intervals (1m, 5m, 15m, 1h, 1d, 1w, 1M), and bucketFunctionFor() (added in this PR) supports the same seven. Any caller that migrated from UsageQuery::groupByInterval('time', '5m') to Query::groupByTimeBucket('time', '5m') and uses getTimeSeries will hit a runtime error after this change.
Replace addColumn(..., ColumnType::BigInteger|Float|Boolean) calls with the dedicated bigInteger()/float()/boolean() shorthand methods so the schema declarations are uniform with the existing datetime/string usage. ColumnType import is no longer needed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…thods During the utopia-php/query 0.3.x migration rewrite, several public methods lost their pre-migration docblocks (constructor, runtime tuning setters, query-log accessors, namespace / database / tenant configuration). Restore the descriptions and parameter documentation so the public surface is self-documenting again. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
PHPStan was running at max with a 33-entry baseline that suppressed real type errors across the read path of the ClickHouse adapter, Metric, and the test suite. Delete the baseline and address each underlying issue: - ClickHouse adapter: introduce decodeRows() / decodeTotal() helpers that decode `FORMAT JSON` responses into typed row lists, then route every json_decode() call through them. Mixed offsets in healthCheck, count/sum/findDaily/sumDaily/getTimeSeries/getTotal/getTotalBatch and parseResults/parseAggregatedResults are narrowed via @var docblocks at the assignment site so casts have well-typed operands. - Drop the dead is_array(\$tags) guard in validateMetricData() — the parameter is already typed array<string, mixed>. - Metric::getTags() narrows the array<mixed> tags attribute to the declared array<string, mixed> return shape. - Tests: replace always-true assertIsArray()/assertIsInt() checks with the assertions they were proxying for (assertCount, assertSame, assertArrayHasKey), and switch \$this->assertTrue(true) "no-throw" scaffolding to \$this->addToAssertionCount(1). `composer check` now reports `[OK] No errors` without any suppression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous cleanup pass tightened two read-side checks in testCompression and testRetryWithSuccessfulOperations to assertCount / assertSame on exact ClickHouse row counts. CI runs on a fresh server with no ingest delay tolerance, so the tighter assertions tripped on the compression case (find returned 0 rows immediately after addBatch). These tests exist to verify the HTTP round trip survives compression and retry configuration, not to assert exact row counts. Switch back to addToAssertionCount(1) so the smoke coverage holds without re-introducing always-true PHPStan errors. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Picks up utopia-php/query#13 (feat/clickhouse-nested-column-quoting), which lands the typed bulkInsert() entry point and the QuotesIdentifiers quoteLiteral() fix for atomic identifiers. Bumps minimum-stability to dev + prefer-stable true so the dev-branch resolves alongside the rest of the stable graph. TODO: flip minimum-stability back to "stable" and pin to "^0.3.3" once PR #13 is tagged.
…l JSONEachRow body assembly
addBatch() now hands its built rows directly to the typed
bulkInsert(Format::JSONEachRow, ...) entry point on the ClickHouse builder
and ships the eager $statement->body to the HTTP layer, replacing the
hand-rolled array_map(json_encode, ...) + implode("\n", ...) assembly.
The runtime instanceof guard on FormattedInsertStatement is gone too -
bulkInsert() returns the typed statement by signature.
insert() now takes the serialized body string instead of an array of
pre-encoded rows; the only caller is addBatch().
createDailyMaterializedView() picks up the createMaterializedView()
argument-order change in the new query branch (name, body, targetTable,
ifNotExists). The snapshot test for the MV path is updated to match.
New snapshots:
- testAddBatchEmitsBulkInsertQueryAndBody asserts the envelope query
and the serialized JSONEachRow body for a two-row fixture.
- testNestedColumnDotQuoting validates that columns containing a dot
(ClickHouse nested-array convention) remain single-backtick-wrapped
atomic identifiers, exercising the QuotesIdentifiers::quoteLiteral()
fix shipped in utopia-php/query#13.
Summary
Migrates
src/Usage/Adapter/ClickHouse.php(3252 → 2854 lines) to consumeutopia-php/query0.3.x. The adapter is now a thin HTTP runtime — every DDL, INSERT, SELECT, and DELETE goes throughSchema\ClickHouse/Builder\ClickHouse.Depends on utopia-php/query#11 (branch
feat/clickhouse-insert-delete-settings-mv). Once that PR is tagged 0.3.2, flip composer fromdev-feat/clickhouse-insert-delete-settings-mv as 0.3.2to^0.3.2.Update — bulkInsert + nested column quoting
Re-pinned to
utopia-php/query#13(branchfeat/clickhouse-nested-column-quoting, materialised as0.3.3) and switchedaddBatch()frominsertFormat()+ manual JSONEachRow body assembly tobulkInsert(Format::JSONEachRow, ...). The builder now emits both theINSERT ... FORMAT JSONEachRowenvelope and the serialized body in one typed call; the adapter ships$statement->bodydirectly to the HTTP layer, replacing the hand-rolledarray_map(json_encode, ...) + implode("\n", ...)loop. Also picks up PR #13'sQuotesIdentifiers::quoteLiteral()fix that keeps dotted column names (ClickHouse nested-array convention, e.g.meta.key Array(String)) atomic instead of splitting them across.into`meta`.`key`.The composer pin is reintroduced for the duration of PR #13's lifecycle — flip back to
^0.3.3andminimum-stability: stableonce PR #13 is tagged.What changed
Schema (DDL via
Utopia\Query\Schema\ClickHouse)setup()emits events/gauges tables throughTable\ClickHousewith typed columns,Engine::MergeTree, monthly partitioning, ORDER BY, bloom_filter skip indexes, and SETTINGS.createDailyTable()usesEngine::SummingMergeTree('value').createDailyMaterializedView()usesSchema\ClickHouse::createMaterializedView; the MV body remains a hand SELECT because subquery-aggregation MV bodies do not round-trip cleanly through the builder yet (deferred upstream).Reads (SELECT via
Utopia\Query\Builder\ClickHouse)useNamedBindings()+withParamTypes()so every?placeholder rewrites to ClickHouse{paramN:Type}form (DateTime64(3),Int64,String,Nullable(String)).find/findFromTable,findAggregatedFromTable,count/countFromTable,sum/sumFromTable,findDaily(FINAL),sumDaily,sumDailyBatch,getTotal/getTotalFromEvents/getTotalFromGauges,getTotalBatch,getTimeSeries/getTimeSeriesFromTable.Query::groupByTimeBucketvia the builder's native GROUP BY emission, paired withselectRaw/orderByRawfor the bucket projection and ordering.whereRawfragment and merges its named bindings into the builderStatementat HTTP execute time (upstream gap, deferred).Writes (INSERT via
Builder\ClickHouse::bulkInsert)addBatch()buildsINSERT INTO t (...) FORMAT JSONEachRowand the serialized body in onebulkInsert(Format::JSONEachRow, ...)call; the eager$statement->bodyis shipped to the HTTP layer.Deletes (DELETE via
Builder\ClickHouse::delete)purge()andpurgeDaily()emit lightweightDELETE FROMthrough the builder.Removed
src/Usage/UsageQuery.php—groupByIntervalreplaced byUtopia\Query\Query::groupByTimeBucket. BREAKING: downstream callers must switch fromUsageQuery::groupByInterval('time', '1h')toQuery::groupByTimeBucket('time', '1h').Adapters:
Methodenum migrationconvertQueriesToDatabaseinAdapter/Database.phpswitched toMethodenum cases. Same for the ClickHouse adapter's filter-method handling.Infra
Dockerfilebumped tophp:8.4.21-cli-alpine3.23.composer.jsonPHP constraint bumped to>=8.4(matches the 0.3.x query lib requirement).phpstanbumped to^2.0for PHP 8.4 compatibility.Test plan
composer lintcleancomposer check(PHPStan max) cleanClickHouseSqlSnapshotTest: 9/9 green (DDL, daily, MV, find with typed bindings, groupByTimeBucket aggregation, INSERT FORMAT, getTimeSeries, FINAL, lightweight DELETE)Still deferred upstream (follow-up PRs)
whereRaw)utopia-php/database)Related
groupByTimeBucket, named-typed bindings)