
Named Scalars#1720

Open
filimonov wants to merge 6 commits into antalya-26.3 from
named_scalars-antalya-26.3

Conversation

@filimonov
Member

Changelog category (leave one):

  • Experimental Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Adds named scalars - server-side cached values that you define once and reuse across queries instead of recomputing or storing them in a one-row table. Use CREATE [LOCAL|SHARED] NAMED SCALAR <name> [REFRESH EVERY N {SECOND|MINUTE|HOUR|DAY}] AS SELECT ... to define a scalar, getNamedScalar('<name>') (or getNamedScalarOrDefault('<name>', default)) to read it, and SYSTEM REFRESH NAMED SCALAR <name> to force an out-of-schedule refresh. LOCAL scalars are cached per server; SHARED scalars are coordinated cluster-wide via Keeper so every replica sees the same value and exactly one replica refreshes per tick. Refresh bodies run under SQL SECURITY DEFINER, are visible in system.processes and system.query_log (is_internal = 1), and can be interrupted with KILL QUERY. Inspect state via system.named_scalars. Gated behind the experimental setting allow_experimental_named_scalars.
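The lifecycle described above can be sketched end to end; the scalar name, source table, and cadence here are illustrative, not taken from the tests:

```sql
SET allow_experimental_named_scalars = 1;

-- Define a LOCAL scalar refreshed hourly; the body runs under SQL SECURITY DEFINER.
CREATE LOCAL NAMED SCALAR max_user_id
REFRESH EVERY 1 HOUR
AS SELECT max(id) FROM users;

-- Reuse the cached value across queries instead of recomputing it
-- or keeping it in a one-row table.
SELECT count() FROM events WHERE user_id <= getNamedScalar('max_user_id');

-- Tolerate a scalar that is missing or has no value yet.
SELECT getNamedScalarOrDefault('max_user_id', 0);

-- Force an out-of-schedule refresh, then clean up.
SYSTEM REFRESH NAMED SCALAR max_user_id;
DROP NAMED SCALAR IF EXISTS max_user_id;
```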

Documentation entry for user-facing changes

Add server-side named, refreshable scalar values backed by either a local on-disk cache or a shared Keeper-backed cache, accessed via getNamedScalar / getNamedScalarOrDefault and surfaced in system.named_scalars.

Surface:

  • DDL: CREATE [OR REPLACE] [LOCAL|SHARED] NAMED SCALAR [IF NOT EXISTS] <name> [ON CLUSTER ...] [DEFINER = ...] [SQL SECURITY DEFINER] [REFRESH EVERY <N> <unit>] AS <SELECT ...>; DROP NAMED SCALAR [IF EXISTS] <name> [ON CLUSTER ...]
  • Functions: getNamedScalar(name), getNamedScalarOrDefault(name, default).
  • SYSTEM commands: REFRESH NAMED SCALAR <name>; {STOP|START} NAMED SCALAR REFRESHES [<name>].
  • system.named_scalars table (two-tier: value tier via getNamedScalar grant, operator tier via SHOW_NAMED_SCALARS).
  • Profile-events / metrics: NamedScalarRefresh{Attempts,Successes,Failures,SkippedByPeer,DurationMicroseconds}; BackgroundNamedScalarRefreshPool{Task,Size}.
  • Server settings: background_named_scalar_refresh_pool_size, named_scalar_definitions_path, named_scalar_definitions_zookeeper_path, named_scalar_local_cache_path, default_named_scalar_cache, max_named_scalars, named_scalar_max_value_size.
  • User setting: allow_experimental_named_scalars (experimental gate).
  • Access: CREATE_NAMED_SCALAR, DROP_NAMED_SCALAR, SHOW_NAMED_SCALARS, SYSTEM_REFRESH_NAMED_SCALAR, SYSTEM_NAMED_SCALAR_REFRESHES, getNamedScalar (function-execute, with getNamedScalarOrDefault alias).
  • Error codes 766–771 (NAMED_SCALAR_NOT_FOUND, NAMED_SCALAR_ALREADY_EXISTS, SHARED_NAMED_SCALARS_NOT_CONFIGURED, NAMED_SCALAR_NOT_REFRESHABLE, NAMED_SCALAR_VALUE_TOO_LARGE, NAMED_SCALAR_HAS_NO_VALUE).
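Assuming the grant spellings follow the access-type names above (as with other ClickHouse access entities), the two tiers might be provisioned like this; `reader` and `operator` are hypothetical users:

```sql
-- Value tier: may call the functions and see the always-populated value columns.
GRANT getNamedScalar ON *.* TO reader;

-- Operator tier: full refresh-state visibility plus lifecycle control.
GRANT SHOW NAMED SCALARS,
      CREATE NAMED SCALAR,
      DROP NAMED SCALAR,
      SYSTEM REFRESH NAMED SCALAR ON *.* TO operator;
```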

Architecture:

  • Definitions are immutable (UUID-identified parsed records). Local scalars persist their definition on disk; shared scalars publish to Keeper. The manager dispatches reads through tryGetScalar; refreshes run on a dedicated BackgroundSchedulePool thread per server.
  • Refresh bodies execute via executeQuery({.internal=true}) so they appear in system.processes (killable via KILL QUERY) and system.query_log; DROP / OR REPLACE / shutdown cancel the in-flight body via QueryStatus::cancelQuery.
  • DEFINER privileges are used during refresh; setUser(definer_id) applies the definer's profile (max_execution_time, max_memory_usage, etc.) so resource limits inherit the standard policy without a bespoke cap.
  • Shared scalars are coordinated by SharedNamedScalarsWatcher: a Keeper child-watch on the definitions root drives reconcile; ephemeral leases serialise refresh evaluation across replicas.
  • system.named_scalars column names align with system.view_refreshes (last_refresh_time, last_success_time, next_refresh_time, exception).
  • Two-tier disclosure: value-tier columns are non-Nullable / always populated for getNamedScalar grantees; operator-tier columns are NULL unless the caller holds SHOW_NAMED_SCALARS.
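Given the column alignment with system.view_refreshes, an operator-tier health check and the cancellation path might look like the following sketch; how refresh bodies are picked out of system.processes is left as a placeholder:

```sql
-- Requires SHOW_NAMED_SCALARS; these columns read as NULL for value-tier callers.
SELECT name, last_refresh_time, last_success_time, next_refresh_time, exception
FROM system.named_scalars
WHERE exception != '';

-- Refresh bodies are ordinary entries in system.processes, so a stuck one
-- can be cancelled like any query:
KILL QUERY WHERE query_id = '<query_id of the refresh body>';
```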

Tests:

  • 17 stateless tests (03800–03816) covering CRUD, refresh, persistent cadence, OR REPLACE under refresh, definer database resolution, reload from disk, no-Keeper fallback, query_log/processes visibility, KILL QUERY interruption, two-tier access matrix, and orphan cleanup.
  • A 712-line integration suite (test_shared_named_scalars_cluster) covering cross-node discovery, shared refresh failover, ZK session loss, restart-during-refresh, OR REPLACE racing the watcher, and drop-while-discovery-in-flight.

Documentation:

  • docs/en/sql-reference/statements/create/named-scalar.md (full DDL, cache kinds, OR REPLACE, examples, when-to-use patterns, access).
  • docs/en/sql-reference/functions/named-scalar-functions.md (getNamedScalar, getNamedScalarOrDefault).
  • docs/en/operations/system-tables/named_scalars.md (column reference, operational signals query, refresh visibility & cancellation).
  • docs/en/sql-reference/statements/system.md (SYSTEM REFRESH/STOP/START NAMED SCALAR REFRESHES).

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions Bot commented May 1, 2026

Workflow [PR], commit [670d475]

@filimonov filimonov force-pushed the named_scalars-antalya-26.3 branch 6 times, most recently from 9af4eb0 to e4c7e02 on May 4, 2026 13:46
filimonov and others added 2 commits May 4, 2026 23:51
(commit message: verbatim copy of the documentation entry and Surface / Architecture / Tests / Documentation sections above)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Antalya enforces a per-thread component tag on every Keeper request:
ZooKeeper::pushRequest aborts the server with LOGICAL_ERROR
('Current component is empty, please set it for your scope using
Coordination::setCurrentComponent') if a Keeper RPC is issued without
an active scope guard. The check is a telemetry / observability
contract — every other Keeper-touching subsystem (UDF storage,
Workload entities, NamedCollections, MergeTree, ACME, etc.) already
uses these guards.

Add an RAII guard at the top of every public method in our three
shared-state classes that ends up making a Keeper RPC:

- NamedScalarDefinitionStoreShared: createRootNodesIfNeeded,
  definitionExists, definitionCount, removeDefinition,
  publishDefinition, loadAll, listDefinitionsWithChildrenWatch,
  readDefinition, readDefinitionWithDataWatch.
- NamedScalarValueBackendShared: createRootNodesIfNeeded,
  readValueBlob, readValueWithDataWatch, removeValue,
  tryReserveRefresh, publishRefreshValue. (readValueBlobAndWatch /
  tryAcquireRefreshLease delegate to guarded methods.)
- SharedNamedScalarsWatcher: initialLoad, resyncAll, reconcileScalar.
  (watchLoop drives those; readDefinitionsAndInstallChildrenWatch /
  readDefinitionData are thin pass-throughs to the already-guarded
  store methods.)

Without these guards Context::initializeNamedScalars() crashed the
server on startup (createAncestors → pushRequest → LOGICAL_ERROR),
which in turn failed every integration test in the
amd_asan_db_disk_old_analyzer and amd_tsan suites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
@filimonov filimonov force-pushed the named_scalars-antalya-26.3 branch from 7b0c4b5 to 36f1486 on May 4, 2026 21:51
filimonov and others added 4 commits May 5, 2026 09:52
EphemeralNodeHolder's destructor calls tryRemove on Keeper. When a
RefreshReservation is dropped without ever publishing — synchronous
CREATE-time eval throws between acquire and publish, or the lease
goes out of scope unused — the holder release runs on whatever
thread destroyed the reservation, with no current component set,
and aborts with LOGICAL_ERROR (the same check we addressed for the
direct-call paths in commit 36f1486).

Add a custom RefreshReservation destructor that wraps the holder
release in a Coordination::setCurrentComponent scope guard. The
existing publishRefreshValue path still calls setAlreadyRemoved()
on success, so the holder is a no-op there; this guard only matters
on the cleanup paths.

Surfaced by CI: 7 integration tests in test_shared_named_scalars_cluster
crashed with 'Current component is empty' originating from
~RefreshReservation -> ~EphemeralNodeHolder -> tryRemove.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
CI flake on slow runners: after the 4 s settle window plus the
metric-gauge poll, system.named_scalars.refresh_in_flight could still
read 1 because a 1 s-cadence tick fired between the gauge read and
this single-shot SELECT. Poll for refresh_in_flight = 0 the same way
we poll the gauge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Three groups of small .sql tests covering related properties were folded
into broader host files, so each test now exercises one coherent slice of
the feature instead of one assertion per file:

  03800_named_scalars.sql          basic CRUD + IF NOT EXISTS + OR REPLACE
                                   + getNamedScalarOrDefault paths +
                                   CREATE rollback + duplicate-CREATE
                                   (absorbed 03803, 03811, 03813)
  03802_named_scalars_grammar.sql  parser/grammar + SHARED-without-ZK
                                   + ON CLUSTER rejection + TEMPORARY
                                   rejection + dotted-name rejection
                                   + OR REPLACE kind-change rejection
                                   + SYSTEM-command name validation
                                   (absorbed 03804, 03806, 03812)

Three runtime-state-on-restart tests used clickhouse-local with
fragile timing assertions. They moved to the integration suite where
real stop_clickhouse / start_clickhouse + assert_eq_with_retry is the
proper idiom:

  test_local_restart_reloads_state            (replaced 03802 stateless)
  test_persistent_cadence_resumes_schedule    (replaced 03807)
  test_creator_database_normalization         (replaced 03810 runtime
                                              half; 03810 stateless
                                              kept the persistence-grep
                                              assertion only)

03801 was trimmed (dropping the redundant cv_refresh_const /
cv_refresh_table_const blocks, already covered by cv_refresh / cv_flap);
03808 dropped a decorative SELECT sleep(2) line whose
`getNamedScalar > 0` assertion always held post-CREATE.
Refresh-fires-after-tick coverage is owned by 03801.

Net: 7 stateless files (03800/01/02/05/08/09/10/16), 25 integration
tests (was 22). All green locally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>
Files merged into 03800 / 03802_grammar in the previous commit, or
moved to the integration suite (03802 reload / 03807 cadence / 03810
runtime half), are now removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: Mikhail Filimonov <mfilimonov@altinity.com>