[python] Support value stats with truncate mode by default #7701

Open
XiaoHongbo-Hope wants to merge 1 commit into apache:master from XiaoHongbo-Hope:support_value_stats

@XiaoHongbo-Hope commented Apr 26, 2026

Purpose

Data files of append tables written by Python carry no value stats, so predicate pushdown cannot skip irrelevant files during upsert-by-key reads. This PR writes value stats by default so that append tables can be pruned. A follow-up PR will use these stats in the upsert_by_key lookup path.
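To illustrate why missing value stats block file skipping, here is a minimal sketch of min/max pruning. `FileStats`, `may_contain`, and the file names are hypothetical names made up for this example; they are not taken from the PR or from the Paimon code base.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileStats:
    """Hypothetical per-file stats for one column; None means no stats were written."""
    min_value: Optional[int]
    max_value: Optional[int]


def may_contain(stats: FileStats, key: int) -> bool:
    """Return False only when the stats prove the file cannot hold `key`."""
    if stats.min_value is None or stats.max_value is None:
        # Without stats the reader cannot prove anything, so it must read the file.
        return True
    return stats.min_value <= key <= stats.max_value


files = {
    "data-0.parquet": FileStats(1, 100),
    "data-1.parquet": FileStats(101, 200),
    "data-2.parquet": FileStats(None, None),  # written without value stats
}

# Files that still have to be read for an equality predicate `id = 150`.
to_read = [name for name, stats in files.items() if may_contain(stats, 150)]
print(to_read)
```

In this sketch only `data-0.parquet` is skipped. The file without stats is always read, which is the situation the PR describes for Python-written append tables.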

Stats are skipped for timestamps with microsecond or nanosecond precision and for timestamps with a time zone. _serialize_timestamp only supports millisecond precision (8-byte millis), whereas Java's TIMESTAMP(4-9) uses a compound millis+nanos format that requires a different serialization path. Time zones are also not yet supported in serialization.
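The timestamp rule above could be sketched roughly as follows. This is an illustration under assumptions, not the PR's `_serialize_timestamp`: the helper names `supports_stats` and `serialize_millis` are hypothetical, and the little-endian byte order is assumed for the example only.

```python
import struct
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def supports_stats(precision: int, has_timezone: bool) -> bool:
    """Hypothetical gate: only timezone-free timestamps up to ms precision get stats."""
    return precision <= 3 and not has_timezone


def serialize_millis(value: datetime) -> bytes:
    """Encode a naive datetime as an 8-byte signed count of milliseconds since the epoch."""
    # Integer division of timedeltas avoids float rounding; sub-millisecond digits are dropped.
    millis = (value - EPOCH) // timedelta(milliseconds=1)
    # Byte order is an assumption of this sketch.
    return struct.pack("<q", millis)


encoded = serialize_millis(datetime(1970, 1, 1, 0, 0, 1, 500_000))
```

Dropping the sub-millisecond digits is why a single 8-byte value is not enough for higher precisions: the nanosecond part would need its own field, which matches the compound format the description attributes to Java.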

Tests

@XiaoHongbo-Hope marked this pull request as draft April 26, 2026 10:25
@XiaoHongbo-Hope marked this pull request as ready for review May 7, 2026 06:49
@XiaoHongbo-Hope requested a review from JingsongLi May 7, 2026 11:49
@XiaoHongbo-Hope marked this pull request as draft May 7, 2026 12:52
@XiaoHongbo-Hope marked this pull request as ready for review May 8, 2026 10:26

@JingsongLi left a comment

The binary column in Java does not generate min/max stats. If Java reads the manifest written in Python and pushes predicates down to the binary column, it may result in incorrect file skipping.

@XiaoHongbo-Hope

> The binary column in Java does not generate min/max stats. If Java reads the manifest written in Python and pushes predicates down to the binary column, it may result in incorrect file skipping.

Thanks, fixed
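One way to address the reviewer's concern is to leave min/max empty for binary columns, so that a reader has nothing to prune on. The sketch below only illustrates that idea; the thread does not show how the PR actually fixed it, and `NO_MIN_MAX_TYPES` and `column_min_max` are hypothetical names.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical type names for which this sketch writes no min/max.
NO_MIN_MAX_TYPES = {"BINARY", "VARBINARY", "BYTES"}


def column_min_max(type_name: str, values: List[Any]) -> Tuple[Optional[Any], Optional[Any]]:
    """Return (min, max) for a column, or (None, None) when stats are not written."""
    non_null = [v for v in values if v is not None]
    # Strip a length suffix such as "(16)" before looking up the type.
    base_type = type_name.split("(")[0].strip().upper()
    if base_type in NO_MIN_MAX_TYPES or not non_null:
        return None, None
    return min(non_null), max(non_null)


rows: Dict[str, Tuple[str, List[Any]]] = {
    "id": ("INT", [3, 1, 2]),
    "payload": ("VARBINARY(16)", [b"\x01", b"\xff"]),
}
stats = {name: column_min_max(t, vals) for name, (t, vals) in rows.items()}
```

With empty bounds, a predicate pushed down to `payload` cannot exclude the file, which keeps the read correct at the cost of not pruning on that column.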
