Skip to content

Support for 'time' type in Iceberg, read and write#1731

Open
ianton-ru wants to merge 7 commits intoantalya-26.1from
bugfix/antalya-26.1/1535_time_type_write_support
Open

Support for 'time' type in Iceberg, read and write#1731
ianton-ru wants to merge 7 commits intoantalya-26.1from
bugfix/antalya-26.1/1535_time_type_write_support

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Support for 'time' type in Iceberg, read and write.

Documentation entry for user-facing changes

Solved #1535

This changes time format,.
Was - seconds from midnight:

SELECT * FROM datalake.`namespace.table`

43200

Now - time with microseconds

SELECT * FROM datalake.`namespace.table`

12:00:00.000000

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Workflow [PR], commit [f35b4e9]

@ianton-ru ianton-ru force-pushed the bugfix/antalya-26.1/1535_time_type_write_support branch from 53090c4 to f262c29 Compare May 5, 2026 14:03
@ianton-ru ianton-ru force-pushed the bugfix/antalya-26.1/1535_time_type_write_support branch from f262c29 to f35b4e9 Compare May 5, 2026 14:30
@ianton-ru
Copy link
Copy Markdown
Author

Audit: PR #1731 — Iceberg time / Time64 (read & write)

AI audit note: This document was generated by AI (gpt‑5 codex).

Scope: Diff antalya-26.1...HEAD — Iceberg manifest Avro typing, SchemaProcessor, Utils / IcebergWrites, AvroRowInputFormat, Parquet Time / Time64 write paths, integration tests.


Confirmed defects

No confirmed defects in reviewed scope (static review of current branch state; no full binary or integration run).


Coverage summary

Item Detail
Scope reviewed Iceberg/Utils.{cpp,h} (getAvroType, getAvroLogicalType, getIcebergType), IcebergWrites.cpp (stats serialization, extendSchemaForPartitions, manifest partition values), SchemaProcessor.cpp (timeDataTypeTime64(6)), AvroRowInputFormat.cpp, Parquet/PrepareForWrite.cpp, Parquet/Write.cpp, Iceberg/Constant.h, test_database_iceberg/test.py, test_storage_iceberg_no_spark/test_write_time.py
Categories failed None with code-path proof
Categories passed Nullable recursion in getAvroLogicalType; Time64-only Avro logical types with scale 0/3/6; manifest partition branch gated on which.isTime64(); Time and Time64 both mapped to Avro "long", consistent with GenericDatum(int64_t); explicit rejection of unsupported Time64 scales for Avro logical typing; Parquet Time / Time64 preparation and write
Assumptions / limits No local INSERT/CI execution in this audit. Cross-engine semantics (Iceberg time as microseconds vs ClickHouse plain Time as seconds) remain an interop/design concern unless proven with a failing consumer or incorrect CH pruning—not counted as a confirmed defect without that evidence

Regression / prior notes (resolved on current branch)

  • Earlier static finding Avro int vs long mismatch for plain Time partitions is addressed: getAvroType places Time under the "long" branch together with Time64.
  • getDecimalScale on DataTypeTime avoided: getAvroLogicalType only treats which.isTime64().
  • Nullable handled by recursion in getAvroLogicalType.
  • Misleading which.isTime() under Field::Types::Decimal64 removed; only Time64 uses the specialized Avro logical write path.

Suggested follow-up (optional, not defects)

  1. Integration / CI run: test_storage_iceberg_no_spark/test_write_time.py (Time and Time64) and test_database_iceberg time-partition scenarios.
  2. If strict Iceberg interoperability for Time partitions with non–ClickHouse engines is required, validate manifest partition numeric units (seconds vs microseconds) against the official spec with an external reader.

@ianton-ru
Copy link
Copy Markdown
Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f35b4e91c1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

case TypeIndex::Decimal128: decimal(16, getDecimalPrecision(*type), getDecimalScale(*type)); break;
case TypeIndex::Decimal256: decimal(32, getDecimalPrecision(*type), getDecimalScale(*type)); break;

case TypeIndex::Time: types(T::INT32, C::UINT_32, int_type(32, false)); break;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Encode Iceberg time columns as TIME_MICROS int64

TypeIndex::Time is emitted as INT32 with UINT_32 semantics here, but Iceberg requires Parquet time to be stored as INT64 TIME_MICROS (microseconds from midnight). Writing seconds as 32-bit integers makes produced Iceberg data files non-compliant and can be misread or rejected by other Iceberg engines; it also breaks negative Time values because they are annotated as unsigned.

Useful? React with 👍 / 👎.

Comment on lines +522 to +523
case 3:
return "time-millis";
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use valid Avro logical type pairing for time-millis

Returning time-millis for Time64(3) produces a schema object combined with Avro long ({"type":"long","logicalType":"time-millis"}), but Avro defines time-millis only for int. For partitioned Time64(3) tables this creates an invalid logical-type combination in manifest schemas, so readers may ignore the logical type or fail schema validation, leading to incorrect partition interpretation/pruning.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant