[core] Optimize overwrite retry with incremental scan #7858

Open
leaves12138 wants to merge 1 commit into apache:master from leaves12138:optimize-overwrite-retry-delta

@leaves12138
Contributor

Purpose

Overwrite commits rebuild the target partition file set before every atomic commit retry. For large partitions this repeatedly performs a full manifest scan after commit conflicts, which can make retries very expensive.

Changes

  • Add a stateful overwrite CommitChangesProvider in CommitScanner.
  • Keep the first overwrite attempt behavior as a full target scan.
  • On later retries, update the cached target file set from snapshot DELTA manifests between the previous checked snapshot and the latest snapshot.
  • Use the provider from the common overwrite path, so both normal insert overwrite and drop/truncate partition paths are covered.
  • Keep index overwrite changes based on the latest index manifest on each retry.
  • Add a retry test for partial-partition insert overwrite with a concurrent commit between attempts.
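The retry flow listed above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `ToyTable`, `Delta`, `CachedTargetFiles`, and `filesToDelete` are hypothetical names, files are plain strings, and the "full scan" is simulated by replaying every delta. It only shows the idea of doing one full scan on the first attempt and applying per-snapshot deltas on later retries.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch; not Paimon's CommitScanner or CommitChangesProvider. */
class OverwriteRetrySketch {

    /** Files added to and deleted from the target partition by one snapshot. */
    static final class Delta {
        final List<String> added;
        final List<String> deleted;

        Delta(List<String> added, List<String> deleted) {
            this.added = added;
            this.deleted = deleted;
        }
    }

    /** Toy stand-in for a table: snapshot id -> delta written by that snapshot. */
    static final class ToyTable {
        final TreeMap<Long, Delta> deltas = new TreeMap<>();
        int fullScans = 0;

        void commit(List<String> added, List<String> deleted) {
            long id = deltas.isEmpty() ? 1L : deltas.lastKey() + 1;
            deltas.put(id, new Delta(added, deleted));
        }

        long latestSnapshotId() {
            return deltas.isEmpty() ? 0L : deltas.lastKey();
        }

        /** Expensive path (simulated): rebuild the file set from scratch. */
        Set<String> fullScan(long snapshotId) {
            fullScans++;
            Set<String> files = new LinkedHashSet<>();
            for (Delta d : deltas.headMap(snapshotId, true).values()) {
                apply(files, d);
            }
            return files;
        }
    }

    static void apply(Set<String> files, Delta d) {
        files.removeAll(d.deleted);
        files.addAll(d.added);
    }

    /** Stateful provider: full scan once, then delta-only updates on retry. */
    static final class CachedTargetFiles {
        private final ToyTable table;
        private Set<String> cached;          // null until the first attempt
        private long checkedSnapshotId = -1; // snapshot the cache reflects

        CachedTargetFiles(ToyTable table) {
            this.table = table;
        }

        /** Returns the files an overwrite would have to delete right now. */
        Set<String> filesToDelete() {
            long latest = table.latestSnapshotId();
            if (cached == null) {
                // First attempt: same behavior as before, a full target scan.
                cached = table.fullScan(latest);
            } else {
                // Retry: apply only deltas in (checkedSnapshotId, latest].
                for (Delta d : table.deltas
                        .subMap(checkedSnapshotId, false, latest, true)
                        .values()) {
                    apply(cached, d);
                }
            }
            checkedSnapshotId = latest;
            return new LinkedHashSet<>(cached);
        }
    }

    public static void main(String[] args) {
        ToyTable table = new ToyTable();
        table.commit(List.of("a", "b"), List.of());
        CachedTargetFiles provider = new CachedTargetFiles(table);
        System.out.println(provider.filesToDelete()); // first attempt
        table.commit(List.of("c"), List.of("a"));     // concurrent commit
        System.out.println(provider.filesToDelete()); // retry uses the delta
        System.out.println("full scans: " + table.fullScans);
    }
}
```

In this sketch the cache stays correct only if every snapshot between the last checked one and the latest one is visited, which is why the provider records the snapshot id it last reconciled against.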

Tests

  • mvn -pl paimon-api,paimon-test-utils,paimon-common,paimon-codegen,paimon-codegen-loader,paimon-arrow,paimon-format -DskipTests install
  • mvn -pl paimon-core -DskipITs -Dtest=FileStoreCommitTest#testOverwriteRetryUpdatesCurrentFilesWithDelta test
  • mvn -pl paimon-core -DskipITs -Dtest=FileStoreCommitTest#testOverwritePartialCommit+testDropPartitions+testIndexFiles+testDropStatsForOverwrite+testOverwriteRetryUpdatesCurrentFilesWithDelta test
  • mvn -pl paimon-core -DskipITs -Dtest=FileStoreCommitTest test
