Support for 'time' type in Iceberg, read and write#1731
Support for 'time' type in Iceberg, read and write#1731ianton-ru wants to merge 7 commits intoantalya-26.1from
Conversation
53090c4 to
f262c29
Compare
f262c29 to
f35b4e9
Compare
Audit: PR #1731 — Iceberg
|
| Item | Detail |
|---|---|
| Scope reviewed | Iceberg/Utils.{cpp,h} (getAvroType, getAvroLogicalType, getIcebergType), IcebergWrites.cpp (stats serialization, extendSchemaForPartitions, manifest partition values), SchemaProcessor.cpp (time → DataTypeTime64(6)), AvroRowInputFormat.cpp, Parquet/PrepareForWrite.cpp, Parquet/Write.cpp, Iceberg/Constant.h, test_database_iceberg/test.py, test_storage_iceberg_no_spark/test_write_time.py |
| Categories failed | None with code-path proof |
| Categories passed | Nullable recursion in getAvroLogicalType; Time64-only Avro logical types with scale 0/3/6; manifest partition branch gated on which.isTime64(); Time and Time64 both mapped to Avro "long", consistent with GenericDatum(int64_t); explicit rejection of unsupported Time64 scales for Avro logical typing; Parquet Time / Time64 preparation and write |
| Assumptions / limits | No local INSERT/CI execution in this audit. Cross-engine semantics (Iceberg time as microseconds vs ClickHouse plain Time as seconds) remain an interop/design concern unless proven with a failing consumer or incorrect CH pruning—not counted as a confirmed defect without that evidence |
Regression / prior notes (resolved on current branch)
- Earlier static finding Avro
intvslongmismatch for plainTimepartitions is addressed:getAvroTypeplacesTimeunder the"long"branch together withTime64. getDecimalScaleonDataTypeTimeavoided:getAvroLogicalTypeonly treatswhich.isTime64().Nullablehandled by recursion ingetAvroLogicalType.- Misleading
which.isTime()underField::Types::Decimal64removed; onlyTime64uses the specialized Avro logical write path.
Suggested follow-up (optional, not defects)
- Integration / CI run:
test_storage_iceberg_no_spark/test_write_time.py(TimeandTime64) andtest_database_icebergtime-partition scenarios. - If strict Iceberg interoperability for
Timepartitions with non–ClickHouse engines is required, validate manifest partition numeric units (seconds vs microseconds) against the official spec with an external reader.
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f35b4e91c1
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case TypeIndex::Decimal128: decimal(16, getDecimalPrecision(*type), getDecimalScale(*type)); break; | ||
| case TypeIndex::Decimal256: decimal(32, getDecimalPrecision(*type), getDecimalScale(*type)); break; | ||
|
|
||
| case TypeIndex::Time: types(T::INT32, C::UINT_32, int_type(32, false)); break; |
There was a problem hiding this comment.
Encode Iceberg
time columns as TIME_MICROS int64
TypeIndex::Time is emitted as INT32 with UINT_32 semantics here, but Iceberg requires Parquet time to be stored as INT64 TIME_MICROS (microseconds from midnight). Writing seconds as 32-bit integers makes produced Iceberg data files non-compliant and can be misread or rejected by other Iceberg engines; it also breaks negative Time values because they are annotated as unsigned.
Useful? React with 👍 / 👎.
| case 3: | ||
| return "time-millis"; |
There was a problem hiding this comment.
Use valid Avro logical type pairing for time-millis
Returning time-millis for Time64(3) produces a schema object combined with Avro long ({"type":"long","logicalType":"time-millis"}), but Avro defines time-millis only for int. For partitioned Time64(3) tables this creates an invalid logical-type combination in manifest schemas, so readers may ignore the logical type or fail schema validation, leading to incorrect partition interpretation/pruning.
Useful? React with 👍 / 👎.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Support for 'time' type in Iceberg, read and write.
Documentation entry for user-facing changes
Solved #1535
This changes time format,.
Was - seconds from midnight:
Now - time with microseconds
CI/CD Options
Exclude tests:
Regression jobs to run: