Skip to content

Antalya 26.3: apassos-3: combined port of 12 PRs#1718

Open
zvonand wants to merge 21 commits intoantalya-26.3from
feature/antalya-26.3/apassos-3
Open

Antalya 26.3: apassos-3: combined port of 12 PRs#1718
zvonand wants to merge 21 commits intoantalya-26.3from
feature/antalya-26.3/apassos-3

Conversation

@zvonand
Copy link
Copy Markdown
Collaborator

@zvonand zvonand commented May 1, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add support for exporting MergeTree parts and partitions to object storage, including experimental exports to Apache Iceberg, along with a cache for S3 ListObjects calls.
Add support for exporting MergeTree parts and partitions to object storage, including experimental exports to Apache Iceberg, along with a cache for S3 ListObjects calls (#1388 by @arthurpassos, #1405 by @arthurpassos, #1478 by @arthurpassos, #1402 by @arthurpassos, #1484 by @arthurpassos, #1490 by @arthurpassos, #1500 by @arthurpassos, #1517 by @arthurpassos, #1499 by @arthurpassos, #1593 by @arthurpassos, #1618 by @arthurpassos, #1692 by @arthurpassos).

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

Combined port of 12 PR(s) (group apassos-3). Cherry-picked from #1388, #1405, #1478, #1402, #1484, #1490, #1500, #1517, #1499, #1593, #1618, #1692.

zvonand added 11 commits May 1, 2026 14:17
…rtition

Antalya 26.1 - Forward port of export part and partition

Source-PR: #1388 (#1388)
…ache

Antalya 26.1 - Forward port of list objects cache #1040

Source-PR: #1405 (#1405)
…_export_tests_that_block_minio

Skip remote suite on export tests that block minio

Source-PR: #1478 (#1478)
Improvements to partition export

Source-PR: #1402 (#1402)
…_size

Use serialized metadata size to calculate the cache entry cell
add setting to define filename pattern for part exports

Source-PR: #1490 (#1490)
…le_possibly_missing_not_pending_entries

Fix local replicated_partition_exports table might miss entries

Source-PR: #1500 (#1500)
…trategy

Fix IPartitionStrategy race condition

Source-PR: #1517 (#1517)
…eduled_after_scheduling

Bump scheduled exports count only in case it has been scheduled

Source-PR: #1499 (#1499)
…ing_the_lock

Export Partition - release the part lock when the query is cancelled

Source-PR: #1593 (#1593)
Export partition to apache iceberg

Source-PR: #1618 (#1618)
@zvonand zvonand added releasy Created/managed by RelEasy ai-resolved Port conflict auto-resolved by Claude labels May 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 1, 2026

Workflow [PR], commit [8eed524]

zvonand and others added 3 commits May 1, 2026 15:39
Background tasks `exportMergeTreePartitionUpdatingTask`,
`selectPartsToExport`, `exportMergeTreePartitionStatusHandlingTask` and
the query handler `exportPartitionToTable` issue ZooKeeper requests but
never call `Coordination::setCurrentComponent`. When the server setting
`enforce_keeper_component_tracking` is enabled (as it is in fast tests),
every Keeper request from these tasks throws `LOGICAL_ERROR: Current
component is empty`. In a fast-test run this produced ~480k logged
exceptions with stack traces from `exportMergeTreePartitionUpdatingTask`
alone, contributing to log noise and load on the embedded Keeper.

Wrap the four entry points with `Coordination::setCurrentComponent` so
they match the convention used by every other `StorageReplicatedMergeTree`
task that talks to ZooKeeper.

Observed on fast-test report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1718&sha=9502b3ed726d1c5605c5b315dc9182713d813fb5&name_0=PR&name_1=Fast%20test
The previous commit covered the four `BackgroundSchedulePool` /
query-handler entry points that talk to ZooKeeper from the partition-
export feature, but missed the completion path. The part-export
completion callback is invoked from a `MergeTreeBackgroundExecutor`
thread, which does not inherit any thread-local component set by the
scheduling task. As a result, when
`enforce_keeper_component_tracking` is on, every Keeper request issued
from `ExportPartitionTaskScheduler::handlePartExportCompletion` (and
its callees `handlePartExportSuccess`,
`handlePartExportFailure`, `tryToMovePartToProcessed`,
`areAllPartsProcessed`) raises `LOGICAL_ERROR: Current component is
empty`, aborting the server in debug builds.

Observed on PR #1718 (commit `e239d42`):
- Stateless tests (amd_debug, sequential)
- Stateless tests (amd_asan, db disk, distributed plan, sequential)
- Stateless tests (amd_debug, distributed plan, s3 storage, sequential)
- Stateless tests (arm_binary, sequential)
- Stateless tests (arm_asan, azure, sequential)
- Many `test_export_replicated_mt_partition_to_*` integration tests

All show the same fatal stack ending in `tryToMovePartToProcessed` ->
`zkutil::ZooKeeper::tryGet`.

Wrap `handlePartExportCompletion` with `Coordination::setCurrentComponent`
to cover all four sub-paths in one place. Also wrap
`ExportPartFromPartitionExportTask::executeStep`, which is the same
threading model (a `MergeTreeBackgroundExecutor` task issuing
`zk->tryCreate`) and would fail identically when
`export_merge_tree_partition_lock_inside_the_task` is enabled.

CI: #1718
@zvonand zvonand changed the title Antalya 26.3: apassos-3: combined port of 11 PRs Antalya 26.3: apassos-3: combined port of 12 PRs May 4, 2026
@zvonand zvonand added ai-needs-attention Releasy stopped on a conflict it could not resolve — needs human review and removed ai-needs-attention Releasy stopped on a conflict it could not resolve — needs human review labels May 4, 2026
zvonand and others added 7 commits May 4, 2026 19:56
Change export_merge_tree_partition_manifest_ttl to 86400

Source-PR: #1692 (#1692)
The two preceding commits covered the export partition background tasks,
the query handler `exportPartitionToTable`, and the part-export
completion path. They missed one more entry point: the system table
`system.replicated_partition_exports`.

`StorageSystemReplicatedPartitionExports::fillData` runs on a query
thread and calls `StorageReplicatedMergeTree::getPartitionExportsInfo`,
which then calls `ExportPartitionManifestUpdatingTask::getPartitionExportsInfo`.
The latter issues `tryGetChildren` and `tryMulti` requests against
ZooKeeper. Query threads do not inherit any component set by the
scheduling tasks, so when `enforce_keeper_component_tracking` is on
(the default in CI), every read of this system table raises
`Logical error: 'Current component is empty, please set it for your scope using Coordination::setCurrentComponent'`.

The integration tests under `test_export_replicated_mt_partition_to_iceberg`
poll `system.replicated_partition_exports` to wait for export status, so
all 143 parameterized cases failed with that exception in the
amd_asan, targeted job:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1718&sha=ec56ec639adf487c51133ea85fb204abedffcd5d&name_0=PR&name_1=Integration%20tests%20%28amd_asan%2C%20targeted%29

Wrap `getPartitionExportsInfo` with `Coordination::setCurrentComponent`
so the system-table read path matches the convention used for the
other ZooKeeper-touching entry points. The local fallback
`getPartitionExportsInfoLocal` does not talk to ZooKeeper, so the
guard placed at the top covers both branches at no extra cost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`StorageObjectStorageConfiguration::initialize` builds `file_path_generator`
from `getRawPath().path` while the path still contains the literal
`{_schema_hash}` placeholder. `setSchemaHash` later replaces the
placeholder in `read_path` and the configuration's raw path, but the
generator kept a stale copy and continued returning paths with the
unreplaced `{_schema_hash}`.

For partitioned writes that path is fed back into
`PartitionedSink::validatePartitionKey` via
`PartitionedStorageObjectStorageSink::createSinkForPartition`, which
then throws `Illegal character '\x7b' in partition id starting with
'…'` because of the literal `{`. Recreate the generator with the
substituted path so `getPathForWrite` returns a path without the
placeholder.

Addresses 1 failing test(s) in Stateless tests (amd_asan, db disk,
distributed plan, sequential) on
#1718. After this fix the
still-failing set shrank from 1 -> 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`test_sharded_export_partition_with_filename_pattern` used fixed names
`sharded_mt_table` / `sharded_s3_table`, but the targeted check repeats
each test 10 times via `pytest-repeat`. The autouse drop-tables fixture
clears local table metadata between iterations but cannot clean the S3
bucket, so all iterations after the first observed leftover Parquet files
under the same hive prefix and the row count assertion failed (got 9
rather than the expected 6). Suffix the table names with a UUID like
every other test in this file does, so each repeat runs against a fresh
S3 prefix.

Addresses 9 failing test(s) in Integration tests (amd_asan, targeted) on
#1718. After this fix the
still-failing set shrank from 9 -> 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commits covered the export partition background tasks, the
query handler `exportPartitionToTable`, the part-export completion path,
and the system table `system.replicated_partition_exports`. They missed
one more entry point: `KILL EXPORT PARTITION`.

`StorageReplicatedMergeTree::killExportPartition` is invoked from a query
thread via `InterpreterKillQueryQuery` and issues `tryGet`, `trySet`,
`getChildren` requests against ZooKeeper to find and mark export tasks
as `KILLED`. Query threads do not inherit any component set by the
scheduling tasks, so when `enforce_keeper_component_tracking` is on (the
default in CI), every `KILL EXPORT PARTITION` raises
`Logical error: 'Current component is empty, please set it for your scope using Coordination::setCurrentComponent'`.

Wrap `killExportPartition` with `Coordination::setCurrentComponent` so
the kill path matches the convention used for the other ZooKeeper-touching
entry points.

Addresses 4 failing tests in Integration tests (amd_binary, 1/5) on
#1718:
- test_export_replicated_mt_partition_to_object_storage::test_kill_export[0]
- test_export_replicated_mt_partition_to_object_storage::test_kill_export[1]
- test_export_replicated_mt_partition_to_object_storage::test_kill_export_resilient_to_status_handling_failure
- test_export_replicated_mt_partition_to_object_storage::test_export_partition_resumes_after_stop_moves_during_export

The fourth test fails at `cluster.shutdown()` because the cluster
fixture is module-scoped and the shutdown check greps the cumulative
log for `LOGICAL_ERROR`, picking up the exception logged earlier by the
`test_kill_export*` cases.

CI report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1718&sha=89687ba0636e8d0a07ae150e86dc9e665d8a0022&name_0=PR&name_1=Integration%20tests%20%28amd_binary%2C%201%2F5%29

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…port paths

`supportsImport`, `import`, and `commitExportPartitionTransaction` access
the external metadata for data-lake destinations without first ensuring
the configuration has been lazily initialized. Call `lazyInitializeIfNeeded`
on the configuration before reading the external metadata so the
metadata pointer is populated when these entry points are reached.

This change was already present in the working tree at the start of the
shard investigation; committing it so it does not get lost.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvonand

This comment was marked as outdated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-resolved Port conflict auto-resolved by Claude releasy Created/managed by RelEasy

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant