branch-4.1: [Improve](Streamingjob) support exclude_columns for Postgres streaming job #61267 #61537

Merged
yiguolei merged 1 commit into branch-4.1 from auto-pick-61267-branch-4.1
Mar 20, 2026

Conversation

@github-actions

Cherry-picked from #61267

…g job (#61267)

### What problem does this PR solve?

Add column-level filtering support for PostgreSQL CDC streaming jobs via the
`table.<tableName>.exclude_columns` property. Users can specify a
comma-separated list of columns to exclude from synchronization.

**Syntax example:**
```sql
CREATE JOB my_job
  ON STREAMING
  FROM POSTGRES (
    ...
    "include_tables" = "my_table",
    "table.my_table.exclude_columns" = "secret,internal_col"
  )
  TO DATABASE my_db (...)
```

#### Changes

  FE (validation & table creation)

  - DataSourceConfigKeys: add TABLE and TABLE_EXCLUDE_COLUMNS_SUFFIX constants
  - DataSourceConfigValidator: recognize table.<name>.exclude_columns as a valid
  per-table config key (using suffix allowlist)
  - StreamingJobUtils.generateCreateTableCmds(): parse excluded columns, validate
  they exist in the upstream PG table and are not PK columns, then exclude them
  from the Doris CREATE TABLE statement
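
The FE-side steps above (recognizing the per-table key, then rejecting unknown or primary-key columns) can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name, method names, error messages, and the case-sensitive matching are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only; it does not reproduce DataSourceConfigValidator
// or StreamingJobUtils from the Doris code base.
class ExcludeColumnsValidationSketch {
    static final String TABLE_PREFIX = "table.";
    static final String EXCLUDE_COLUMNS_SUFFIX = ".exclude_columns";

    /** Returns the table name for keys shaped like table.<name>.exclude_columns, else null. */
    static String tableNameOf(String key) {
        if (!key.startsWith(TABLE_PREFIX) || !key.endsWith(EXCLUDE_COLUMNS_SUFFIX)) {
            return null;
        }
        int start = TABLE_PREFIX.length();
        int end = key.length() - EXCLUDE_COLUMNS_SUFFIX.length();
        // Reject keys with an empty table name, e.g. "table.exclude_columns".
        return end > start ? key.substring(start, end) : null;
    }

    /**
     * Validates the excluded columns against the upstream table and returns the
     * columns that would remain in the target table, in upstream order.
     * Name matching is case-sensitive here, which is an assumption.
     */
    static List<String> keepColumns(List<String> upstreamColumns,
                                    Set<String> pkColumns,
                                    String excludeValue) {
        Set<String> excluded = new LinkedHashSet<>();
        for (String raw : excludeValue.split(",")) {
            String name = raw.trim();
            if (!name.isEmpty()) {
                excluded.add(name);
            }
        }
        for (String name : excluded) {
            if (!upstreamColumns.contains(name)) {
                throw new IllegalArgumentException("excluded column does not exist: " + name);
            }
            if (pkColumns.contains(name)) {
                throw new IllegalArgumentException("cannot exclude primary key column: " + name);
            }
        }
        List<String> kept = new ArrayList<>();
        for (String column : upstreamColumns) {
            if (!excluded.contains(column)) {
                kept.add(column);
            }
        }
        return kept;
    }
}
```

Failing fast at job creation, as in this sketch, matches the behavior described for non-existent and PK columns: the error surfaces before any table is created.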

  cdc_client (DML filtering & schema change handling)

  - ConfigUtil: add parseExcludeColumns(config, tableName) utility
  - DebeziumJsonDeserializer: skip excluded fields when building INSERT/UPDATE/DELETE rows
  - PostgresDebeziumJsonDeserializer: skip DROP/ADD DDL for excluded columns during
  schema change detection, so the Doris table is never modified for columns it
  was never meant to have
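
As a rough illustration of the DML side, the sketch below parses the property for one table and drops excluded fields from a row. The PR only gives the signature `parseExcludeColumns(config, tableName)`; the `Map<String, String>` config type, the `filterRow` helper, and the map-based row representation are assumptions for the example, not the actual cdc_client code.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the real ConfigUtil and DebeziumJsonDeserializer may differ.
class ExcludeColumnsRowFilterSketch {

    /** Reads table.<tableName>.exclude_columns and splits it on commas. */
    static Set<String> parseExcludeColumns(Map<String, String> config, String tableName) {
        String value = config.get("table." + tableName + ".exclude_columns");
        if (value == null || value.trim().isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> result = new LinkedHashSet<>();
        for (String part : value.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                result.add(name);
            }
        }
        return result;
    }

    /** Copies a row, omitting excluded fields and preserving field order. */
    static Map<String, Object> filterRow(Map<String, Object> row, Set<String> excluded) {
        Map<String, Object> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : row.entrySet()) {
            if (!excluded.contains(field.getKey())) {
                filtered.put(field.getKey(), field.getValue());
            }
        }
        return filtered;
    }
}
```

The same filter would apply to INSERT, UPDATE, and DELETE rows alike, since each carries the full set of upstream fields.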

#### Behavior

| Scenario                       | Behavior                                                   |
|--------------------------------|------------------------------------------------------------|
| Snapshot / incremental DML     | Excluded column values are not written to Doris            |
| PG DROP excluded column        | DDL skipped; stored schema updated; sync continues         |
| PG ADD excluded column back    | DDL skipped; sync continues; Doris never gains the column  |
| Exclude non-existent column    | CREATE JOB fails with clear error                          |
| Exclude PK column              | CREATE JOB fails with clear error                          |
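
The DROP and re-ADD rows of the table can be read as a filter on the schema diff. The sketch below is one way to picture it, assuming the deserializer compares a stored column list with the incoming one; the class name, the list-based schema, and the plain-string change labels are hypothetical and are not Doris DDL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of schema change detection that ignores excluded columns.
class SchemaChangeSkipSketch {

    /**
     * Compares the stored column list with the incoming one and returns the
     * changes to forward downstream. Changes that only touch excluded columns
     * produce nothing, so the target table is left untouched.
     */
    static List<String> detectChanges(List<String> stored,
                                      List<String> incoming,
                                      Set<String> excluded) {
        List<String> changes = new ArrayList<>();
        for (String column : stored) {
            if (!incoming.contains(column) && !excluded.contains(column)) {
                changes.add("DROP COLUMN " + column);
            }
        }
        for (String column : incoming) {
            if (!stored.contains(column) && !excluded.contains(column)) {
                changes.add("ADD COLUMN " + column);
            }
        }
        return changes;
    }
}
```

In this picture the caller would still replace the stored column list with the incoming one after each comparison, which corresponds to "stored schema updated; sync continues" in the table.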

#### Tests

  - test_streaming_postgres_job_col_filter.groovy: covers validation errors,
  snapshot filtering, incremental DML filtering, DROP excluded column, re-ADD
  excluded column; uses Awaitility polling instead of fixed sleeps
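
For readers unfamiliar with the polling approach, the sketch below shows the idea in plain Java: re-check a condition at a short interval until it holds or a timeout expires, rather than sleeping for a fixed duration. It is a dependency-free stand-in for what Awaitility provides, not the test code from this PR; the class and method names are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for Awaitility-style polling; not taken from the PR's test.
class PollingSketch {

    /**
     * Polls the condition every pollMillis until it returns true or
     * timeoutMillis has elapsed. Returns whether the condition was met.
     */
    static boolean awaitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

Compared with a fixed sleep, polling returns as soon as the synced rows appear and only waits the full timeout when something is actually wrong.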

@github-actions bot requested a review from yiguolei as a code owner on March 20, 2026 02:08
@Thearas

Thearas commented Mar 20, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 20, 2026
@Thearas

Thearas commented Mar 20, 2026

run buildall

@hello-stephen

FE UT Coverage Report

Increment line coverage 3.33% (1/30) 🎉
Increment coverage report
Complete coverage report

@yiguolei yiguolei merged commit 23b290d into branch-4.1 Mar 20, 2026
25 of 29 checks passed
5 participants