[python] Generate input changelogs from Python writer #7739
junmuz wants to merge 5 commits into apache:master
Force-pushed from 932d471 to d93459e
CHANGELOG_FILE_FORMAT: ConfigOption[str] = (
    ConfigOptions.key("changelog-file.format")
    .string_type()
    .no_default_value()
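The option in the diff excerpt above is declared without a default value, so whoever reads it has to decide what happens when it is unset. A minimal sketch of one plausible resolution, assuming the changelog file format falls back to the data file format; the `file.format` key, the `resolve_changelog_file_format` helper, and the `"parquet"` default are illustrative assumptions, not code from this PR:

```python
from typing import Mapping, Optional

# "changelog-file.format" appears in the diff excerpt; "file.format" is
# assumed here to be the key of the data file format option.
CHANGELOG_FILE_FORMAT_KEY = "changelog-file.format"
FILE_FORMAT_KEY = "file.format"


def resolve_changelog_file_format(
    options: Mapping[str, str],
    default_file_format: str = "parquet",  # assumed default, illustration only
) -> str:
    """Return the changelog file format, falling back to the data file format."""
    explicit: Optional[str] = options.get(CHANGELOG_FILE_FORMAT_KEY)
    if explicit is not None:
        # An explicitly configured changelog format always wins.
        return explicit
    # Otherwise reuse the data file format, or the assumed default.
    return options.get(FILE_FORMAT_KEY, default_file_format)
```

With this shape, a table that sets only the data file format writes its changelog files in that same format, and the changelog option acts as an override.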
Force-pushed from 1b1441e to 01bc7fb
@JingsongLi @XiaoHongbo-Hope This PR adds support for generating changelogs for tables that use the input changelog producer (primarily for inserts). Could you review it?
Sure.
Could you add coverage for input changelog with row tracking enabled?
@XiaoHongbo-Hope Row tracking requires tables without primary keys (row-tracking.enabled is only valid on append-only tables with bucket=-1), while changelog-producer requires primary keys to be defined. The two features look mutually exclusive by design, and I don't see a valid table configuration where both can be active at the same time. The existing test test_reject_changelog_producer_on_append_only_table already verifies that we reject changelog-producer on tables without primary keys, which should cover all row-tracking tables.
This validation can still be bypassed because it only runs in Schema.from_pyarrow_schema(). Directly constructing Schema(fields=..., primary_keys=[], options={'changelog-producer': 'input'}) and passing it to catalog.create_table() still creates an append-only table.
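One way to close this kind of gap is to run the check at the point where a table is actually created, so every construction path goes through it. A minimal sketch using stand-in names: `SchemaSketch`, `validate_changelog_producer`, and `create_table_sketch` are hypothetical and are not pypaimon's classes or the fix adopted in this PR. It also assumes that `none` is the value that disables changelog production:

```python
from dataclasses import dataclass, field
from typing import Dict, List

CHANGELOG_PRODUCER_KEY = "changelog-producer"


@dataclass
class SchemaSketch:
    """Hypothetical stand-in for a table schema; not the real pypaimon Schema."""
    fields: List[str]
    primary_keys: List[str] = field(default_factory=list)
    options: Dict[str, str] = field(default_factory=dict)


def validate_changelog_producer(schema: SchemaSketch) -> None:
    """Reject a changelog producer on a table without primary keys."""
    # Assumption: "none" means changelog production is disabled.
    producer = schema.options.get(CHANGELOG_PRODUCER_KEY, "none")
    if producer != "none" and not schema.primary_keys:
        raise ValueError(
            f"{CHANGELOG_PRODUCER_KEY}={producer!r} requires primary keys, "
            "but the table is append-only"
        )


def create_table_sketch(schema: SchemaSketch) -> SchemaSketch:
    """Hypothetical creation entry point that validates every schema it receives."""
    validate_changelog_producer(schema)
    return schema
```

Because the check lives in the creation path, a schema built directly through the constructor is rejected the same way as one built through a factory method.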
…e format from data file format
Force-pushed from 8adc86c to 9365d7a
Purpose
Tests
New tests were added. The Python scripts were also run manually, and the generated changelogs were verified to be readable from a Flink SQL job.
Limitation
Changelogs are currently only generated for inserts.
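To make the limitation concrete: with the input producer, the changelog mirrors the records handed to the writer, so an insert-only implementation emits each row with row kind +I. A rough sketch under stated assumptions: `RowKind` and `to_input_changelog` are hypothetical names, and raising on non-insert rows is a choice made for this example, since the conversation does not show how the PR treats them:

```python
from enum import Enum
from typing import Any, Dict, Iterable, List, Tuple


class RowKind(Enum):
    """The four changelog row kinds, by their short string form."""
    INSERT = "+I"
    UPDATE_BEFORE = "-U"
    UPDATE_AFTER = "+U"
    DELETE = "-D"


def to_input_changelog(
    rows: Iterable[Tuple[RowKind, Dict[str, Any]]],
) -> List[Tuple[str, Dict[str, Any]]]:
    """Mirror written rows into changelog entries, supporting inserts only."""
    changelog: List[Tuple[str, Dict[str, Any]]] = []
    for kind, row in rows:
        if kind is not RowKind.INSERT:
            # Illustrative choice: fail loudly instead of dropping the row.
            raise NotImplementedError(
                f"row kind {kind.value} is not supported by this sketch"
            )
        # Copy the row so later mutation of the input does not change the log.
        changelog.append((kind.value, dict(row)))
    return changelog
```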