Cherry-pick: Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP#1896
Cherry-pick: Native CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP#1896zvonand wants to merge 9 commits into
CREATE TABLE / DROP TABLE for DataLakeCatalog; optionally prune on DROP#1896Conversation
- Port \`IcebergPathResolver::reverseResolve\` from upstream (predates the PR upstream but was missing on this branch). - Handle the Altinity-only \`S3_TABLES\` catalog type in \`getLocationSchemeForTableCreation\` (it is always S3-backed). - Drop the \`unique_key\` check in \`DatabaseDataLake::createTable\`: \`ASTStorage\` has no \`UNIQUE KEY\` clause on this branch. - \`getInMemoryMetadataPtr\` does not take a context argument on this branch. ClickHouse#98670 Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
3983fab to
3ac9ae4
Compare
|
AI audit note: This review comment was generated by AI (claude-4.6-opus). Audit update for PR #1896 (Native CREATE TABLE / DROP TABLE for DataLakeCatalog)Confirmed defectsMedium: GlueCatalog orphans metadata file on CreateTable API failure
Medium:
|
| Category | Status |
|---|---|
| Scope reviewed | InterpreterCreateQuery engine-less path, InterpreterDropQuery ON CLUSTER check, InterpreterAlterQuery ON CLUSTER check, DatabaseDataLake::createTable, DatabaseDataLake::dropTable, GlueCatalog::createTable/dropTable, RestCatalog::createTable/dropTable/namespaceToJSONArray/encodeNamespaceForURI, S3TablesCatalog::dropTable, ICatalog::storageTypeToScheme, constructTableLocation, IcebergMetadata::createInitial metadata filename, MetadataGenerator::generateNextMetadata parent_snapshot/metadata-log changes, IcebergWrites previous_metadata_file_path tracking, Mutations.cpp reverseResolve + writeMetadataFiles, Utils.cpp cleanup-on-failure, IcebergPath::reverseResolve, StorageObjectStorage::alter/checkAlterIsPossible/drop |
| Categories failed | Partial-update rollback (GlueCatalog metadata write), path resolution edge case (reverseResolve) |
| Categories passed | Concurrency/thread-safety, memory lifetime, exception safety (DDL guard RAII), integer overflow (N/A), ON CLUSTER rejection, namespace encoding, setting rename/alias, DROP purge semantics, metadata-log correctness, metadata filename UUID, test coverage |
| Assumptions/limits | Static analysis only; GlueCatalog AWS interactions not runtime-tested; Mutations.cpp retry logic assumed correct based on pre-existing patterns |
There's an export partition regression fail that looks like it's caused by this PR.Under ice catalog, exporting multiple partitions in one ALTER does some concurrent writes. Exports are marked as COMPLETED but DataLakeCatalog read-back is short on rows. The same scenarios pass on baseline Antalya and under no_catalog (direct S3). The regression lines up with the PR’s Iceberg commit changes in IcebergWrites.cpp / MetadataGenerator.cpp: when one ALTER fires several parallel exports to the same REST-catalog table, later commits can land without chaining onto earlier snapshots’ manifests, so only the latest partition’s data is visible even though every export succeeded. |
|
@DimensionWieldr how it could've happened? this PR is not yet merged, and the latest run has not yet finished build jobs (there are no regression results yet) |
Sorry I should've specified. This is a fail from CI on this PR, from the previous commit. I was asked to take a look at export partition fails earlier today. Maybe the new CI run for the most recent commit will look better. |
|
AI audit note: This review comment was generated by AI (claude-4.6-opus). Audit update for PR #1896 — commit 7a5ddbf (Native CREATE TABLE / DROP TABLE for DataLakeCatalog)Confirmed defectsMedium: Missing null-safety for
|
| Category | Status |
|---|---|
| Scope reviewed | All 36 changed files; focused re-analysis on IcebergWrites.cpp retry/concurrency path, MetadataGenerator.cpp parent-snapshot logic, generateManifestList chaining, GlueCatalog cleanup, IcebergMetadata::createInitial cleanup, reverseResolve fix |
| Categories failed | Null-safety (IcebergWrites.cpp current-snapshot-id), UX (S3Tables CREATE TABLE error message) |
| Categories passed | Partial-update rollback (fixed), path resolution (fixed), concurrency/manifest-chaining (correct), memory lifetime, exception safety, ON CLUSTER rejection, namespace encoding, setting rename/alias, DROP purge semantics, metadata-log correctness, test coverage |
| Assumptions/limits | Static analysis only; export partition regression analyzed by code tracing only, not runtime-reproduced; Poco JSON isNull vs has semantics assumed standard |
Hmm, are we sure? Regression test Fix directions
|
Cherry-picked from ClickHouse#98670.
Native
CREATE TABLE/DROP TABLEfor DataLakeCatalog; optionally prune onDROPChangelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Supports
CREATE TABLE,CREATE TABLE … AS source, andDROP TABLEfor DataLakeCatalog; optional catalog-side data purge can be triggered ( newdatabase_iceberg_purge_on_dropsetting).Documentation entry for user-facing changes