Skip to content

[python] Add streaming reads to paimon CLI (table stream command)#7456

Draft
tub wants to merge 3 commits intoapache:masterfrom
tub:python-streaming-2e-cli-stream
Draft

[python] Add streaming reads to paimon CLI (table stream command)#7456
tub wants to merge 3 commits intoapache:masterfrom
tub:python-streaming-2e-cli-stream

Conversation

@tub
Copy link
Contributor

@tub tub commented Mar 17, 2026

Summary

  • Adds paimon table stream <db.table> CLI subcommand that continuously polls a Paimon table and prints new rows as they arrive until Ctrl+C
  • Adds StreamReadBuilder.with_scan_from() to the library so programmatic users can also control starting position ("latest", "earliest", or a snapshot ID integer)
  • Timestamp support in --from is resolved to a snapshot ID at the CLI layer (no timestamps in the library API)
    • I can push this down into the library if we think it'll be useful

Flags

Flag Default Description
--from latest Starting position: latest, earliest, snapshot ID, or timestamp (YYYY-MM-DD, ISO 8601)
--select Column projection (comma-separated)
--where SQL-subset filter predicate
--format table Output format: table or json
--poll-interval-ms 1000 Milliseconds between snapshot polls
--include-row-kind off Prepend _row_kind column (+I, -U, +U, -D)
--consumer-id Persist scan progress for at-least-once resume

Changes

  • paimon-python/pypaimon/cli/cli_table_stream.py — new command handler + parse_from_position() timestamp resolver
  • paimon-python/pypaimon/cli/cli_table.py — register stream subparser
  • paimon-python/pypaimon/read/stream_read_builder.pywith_scan_from()
  • paimon-python/pypaimon/read/streaming_table_scan.pyscan_from startup resolution (consumer restore always wins)

Test plan

  • pypaimon/tests/stream_read_builder_test.py — 7 new unit tests for with_scan_from()
  • pypaimon/tests/streaming_table_scan_test.py — 4 new integration tests (earliest, numeric ID, latest, consumer-overrides-scan_from)
  • pypaimon/tests/cli_table_stream_test.py — 20 new tests (CLI integration + parse_from_position unit tests)

🤖 Generated with Claude Code

tub and others added 3 commits March 17, 2026 18:54
…reamReadBuilder

- Add `paimon table stream <db.table>` CLI subcommand that continuously
  polls a table and prints new rows as they arrive until Ctrl+C
- Flags: --select, --where, --format (table|json), --from, --poll-interval-ms,
  --include-row-kind, --consumer-id
- --from accepts 'latest' (default), 'earliest', a numeric snapshot ID,
  or a timestamp string (YYYY-MM-DD, ISO 8601 with/without timezone)
- Add StreamReadBuilder.with_scan_from() accepting 'latest', 'earliest',
  or an integer snapshot ID; passed through to AsyncStreamingTableScan
  with consumer restore taking highest priority over scan_from
- Tests: 7 new unit tests in stream_read_builder_test.py, 4 integration
  tests in streaming_table_scan_test.py, 20 tests in cli_table_stream_test.py

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…check in table stream CLI

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pport named timezones

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant