Partition statistics metadata reading#2146
Conversation
8f720b7 to
591b954
Compare
| file_size_in_bytes: int = Field(alias="file-size-in-bytes") | ||
|
|
||
|
|
||
| class StatisticsFile(StatisticsCommonFields): |
There was a problem hiding this comment.
In https://github.com/apache/iceberg-python/pull/2033/files I did class StatisticsFile(StatisticsCommonFields, IcebergBaseModel): (subclassing both) for consistency with table metadata where we have e.g.
iceberg-python/pyiceberg/table/metadata.py
Line 469 in 6b19f0c
Despite
iceberg-python/pyiceberg/table/metadata.py
Line 127 in 6b19f0c
There was a problem hiding this comment.
Actually not sure what's right here but could perhaps do the above for StatisticsFile and PartitionStatisticsFile for consistency?
| from pyiceberg.table.statistics import PartitionStatisticsFile | ||
|
|
||
|
|
||
| def test_partition_statistics_file() -> None: |
There was a problem hiding this comment.
WDYT about adding a small , similar test for StatisticsFile in this file too with its additional fields?
There was a problem hiding this comment.
Sounds good, just added one. LMKWYT
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Thanks for the team work @smaheshwar-pltr @Fokko
Would be interesting to see this use on the read path in the future :)
| table_metadata = TableMetadataV1(**example_table_metadata_v1) | ||
| table_metadata_json = table_metadata.model_dump_json() | ||
| expected = """{"location":"s3://bucket/test/location","table-uuid":"d20125c8-7284-442c-9aea-15fee620737c","last-updated-ms":1602638573874,"last-column-id":3,"schemas":[{"type":"struct","fields":[{"id":1,"name":"x","type":"long","required":true},{"id":2,"name":"y","type":"long","required":true,"doc":"comment"},{"id":3,"name":"z","type":"long","required":true}],"schema-id":0,"identifier-field-ids":[]}],"current-schema-id":0,"partition-specs":[{"spec-id":0,"fields":[{"source-id":1,"field-id":1000,"transform":"identity","name":"x"}]}],"default-spec-id":0,"last-partition-id":1000,"properties":{},"snapshots":[{"snapshot-id":1925,"timestamp-ms":1602638573822,"manifest-list":"s3://bucket/test/manifest-list"}],"snapshot-log":[],"metadata-log":[],"sort-orders":[{"order-id":0,"fields":[]}],"default-sort-order-id":0,"refs":{},"statistics":[],"format-version":1,"schema":{"type":"struct","fields":[{"id":1,"name":"x","type":"long","required":true},{"id":2,"name":"y","type":"long","required":true,"doc":"comment"},{"id":3,"name":"z","type":"long","required":true}],"schema-id":0,"identifier-field-ids":[]},"partition-spec":[{"name":"x","transform":"identity","source-id":1,"field-id":1000}]}""" | ||
| expected = """{"location":"s3://bucket/test/location","table-uuid":"d20125c8-7284-442c-9aea-15fee620737c","last-updated-ms":1602638573874,"last-column-id":3,"schemas":[{"type":"struct","fields":[{"id":1,"name":"x","type":"long","required":true},{"id":2,"name":"y","type":"long","required":true,"doc":"comment"},{"id":3,"name":"z","type":"long","required":true}],"schema-id":0,"identifier-field-ids":[]}],"current-schema-id":0,"partition-specs":[{"spec-id":0,"fields":[{"source-id":1,"field-id":1000,"transform":"identity","name":"x"}]}],"default-spec-id":0,"last-partition-id":1000,"properties":{},"snapshots":[{"snapshot-id":1925,"timestamp-ms":1602638573822,"manifest-list":"s3://bucket/test/manifest-list"}],"snapshot-log":[],"metadata-log":[],"sort-orders":[{"order-id":0,"fields":[]}],"default-sort-order-id":0,"refs":{},"statistics":[],"partition-statistics":[],"format-version":1,"schema":{"type":"struct","fields":[{"id":1,"name":"x","type":"long","required":true},{"id":2,"name":"y","type":"long","required":true,"doc":"comment"},{"id":3,"name":"z","type":"long","required":true}],"schema-id":0,"identifier-field-ids":[]},"partition-spec":[{"name":"x","transform":"identity","source-id":1,"field-id":1000}]}""" |
There was a problem hiding this comment.
diff here is just the
"partition-statistics":[],
There was a problem hiding this comment.
@kevinjqliu That's a great question. According to the spec, there is no difference:
There are some more examples: "snapshot-log":[],"metadata-log":[],, maybe defer that question to a separate PR?
# Rationale for this change Took @smaheshwar-pltr's draft PR and added a test: apache#2033 Cherry-picked his original work to ensure it gets attributed to the original author. # Are these changes tested? # Are there any user-facing changes? <!-- In the case of user-facing changes, please add the changelog label. --> --------- Co-authored-by: Sreesh Maheshwar <smaheshwar@palantir.com>



Rationale for this change
Took @smaheshwar-pltr's draft PR and added a test: #2033
Cherry-picked his original work to ensure it gets attributed to the original author.
Are these changes tested?
Are there any user-facing changes?