
feat(iceberg): Enable delete files processing in snapshot producer#2367

Open
CTTY wants to merge 6 commits into
apache:mainfrom
CTTY:ctty/rewrite-process-delete

Conversation

@CTTY
Collaborator

@CTTY commented Apr 24, 2026

Which issue does this PR close?

What changes are included in this PR?

  • Support writing delete manifests in SnapshotProducer
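In the Iceberg v2 table spec, a single manifest tracks either data files or delete files, never both. Writing delete manifests therefore requires a routing step that sends each added file to the right kind of manifest. The sketch below is minimal and assumes simplified, hypothetical types (`ContentType`, `FileEntry`, `split_by_manifest_content`). These are not the iceberg-rust API.

```rust
/// Content type of a file tracked by a manifest. Mirrors the spec's
/// data / position-delete / equality-delete split (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContentType {
    Data,
    PositionDeletes,
    EqualityDeletes,
}

/// Hypothetical, simplified stand-in for a data or delete file entry.
#[derive(Debug, Clone, PartialEq)]
pub struct FileEntry {
    pub path: String,
    pub content: ContentType,
}

/// Splits added files into those destined for a data manifest and those
/// destined for a delete manifest, because one manifest cannot mix the two.
pub fn split_by_manifest_content(files: Vec<FileEntry>) -> (Vec<FileEntry>, Vec<FileEntry>) {
    files
        .into_iter()
        .partition(|f| f.content == ContentType::Data)
}

fn main() {
    let files = vec![
        FileEntry { path: "data-0.parquet".to_string(), content: ContentType::Data },
        FileEntry { path: "pos-del-0.parquet".to_string(), content: ContentType::PositionDeletes },
        FileEntry { path: "eq-del-0.parquet".to_string(), content: ContentType::EqualityDeletes },
    ];
    let (data, deletes) = split_by_manifest_content(files);
    for f in &data {
        println!("data manifest <- {}", f.path);
    }
    for f in &deletes {
        println!("delete manifest <- {}", f.path);
    }
}
```

Both kinds of delete files (position and equality) go to the delete manifest. Only plain data files go to the data manifest.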

Are these changes tested?

Added unit tests in snapshot.rs. We should add end-to-end tests once actual rewrite/delete actions are supported.

Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
@CTTY marked this pull request as ready for review April 24, 2026 22:24
Contributor

@xanderbailey left a comment

Nice PR! Took a pass at reviewing, let me know if it makes sense!

Comment on lines -87 to 94
- let snapshot_producer = SnapshotProducer::new(
-     table,
-     self.commit_uuid.unwrap_or_else(Uuid::now_v7),
-     self.key_metadata.clone(),
-     self.snapshot_properties.clone(),
-     self.added_data_files.clone(),
- );
+ let snapshot_producer = SnapshotProducer::builder()
+     .with_table(table)
+     .with_commit_uuid(self.commit_uuid.unwrap_or_else(Uuid::now_v7))
+     .with_key_metadata(self.key_metadata.clone())
+     .with_snapshot_properties(self.snapshot_properties.clone())
+     .with_added_data_files(self.added_data_files.clone())
+     .build();

Contributor

Nice change
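The change above swaps a positional constructor for a builder. This keeps call sites readable as more inputs, such as delete files, are added. The sketch below is a minimal illustration of the pattern. It assumes a hypothetical, simplified `Producer`/`ProducerBuilder` whose field names loosely follow the diff. These are placeholders, not iceberg-rust's types, and `with_added_delete_files` is an illustrative name only.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified producer; not iceberg-rust's SnapshotProducer.
#[derive(Debug)]
pub struct Producer {
    pub commit_uuid: String,
    pub key_metadata: Option<Vec<u8>>,
    pub snapshot_properties: HashMap<String, String>,
    pub added_data_files: Vec<String>,
    pub added_delete_files: Vec<String>,
}

/// Builder whose optional inputs default to empty, so append-only callers
/// never have to mention delete files.
#[derive(Debug, Default)]
pub struct ProducerBuilder {
    commit_uuid: Option<String>,
    key_metadata: Option<Vec<u8>>,
    snapshot_properties: HashMap<String, String>,
    added_data_files: Vec<String>,
    added_delete_files: Vec<String>,
}

impl Producer {
    pub fn builder() -> ProducerBuilder {
        ProducerBuilder::default()
    }
}

impl ProducerBuilder {
    pub fn with_commit_uuid(mut self, uuid: impl Into<String>) -> Self {
        self.commit_uuid = Some(uuid.into());
        self
    }

    pub fn with_key_metadata(mut self, key_metadata: Option<Vec<u8>>) -> Self {
        self.key_metadata = key_metadata;
        self
    }

    pub fn with_snapshot_properties(mut self, props: HashMap<String, String>) -> Self {
        self.snapshot_properties = props;
        self
    }

    pub fn with_added_data_files(mut self, files: Vec<String>) -> Self {
        self.added_data_files = files;
        self
    }

    pub fn with_added_delete_files(mut self, files: Vec<String>) -> Self {
        self.added_delete_files = files;
        self
    }

    /// Panics if the required commit UUID was never set; a production
    /// builder might return a Result instead.
    pub fn build(self) -> Producer {
        Producer {
            commit_uuid: self.commit_uuid.expect("commit_uuid is required"),
            key_metadata: self.key_metadata,
            snapshot_properties: self.snapshot_properties,
            added_data_files: self.added_data_files,
            added_delete_files: self.added_delete_files,
        }
    }
}

fn main() {
    // An append-only caller: delete files are simply never set.
    let producer = Producer::builder()
        .with_commit_uuid("example-uuid")
        .with_added_data_files(vec!["data-0.parquet".to_string()])
        .build();
    println!("{producer:?}");
}
```

With a builder, adding a new input is a new `with_*` method. Existing call sites do not change.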

Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/spec/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/append.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs
Contributor

@dannycjones left a comment

nice work!

Comment thread crates/iceberg/src/spec/snapshot.rs Outdated
Comment thread crates/iceberg/src/transaction/snapshot.rs
Comment on lines 391 to 395
async fn manifest_file<OP: SnapshotProduceOperation, MP: ManifestProcess>(
    &mut self,
    snapshot_produce_operation: &OP,
    manifest_process: &MP,
) -> Result<Vec<ManifestFile>> {
Contributor

Should we rename this trait function?

It was unclear to me what it does, and the name may be left over from when we only supported simple appends.

Ultimately, it is building a list of manifest files that will be part of the new snapshot / manifest list. Perhaps build_manifest_file_list?
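As this comment describes, the method's job is to collect the manifests that the new snapshot's manifest list will reference. The sketch below shows that shape under hypothetical names (`build_manifest_file_list`, `Manifest`, `ManifestContent`) and a made-up path scheme. It assumes existing manifests are carried forward unchanged, which holds for a plain append but not for a delete or rewrite. It also assumes at most one new data manifest and one new delete manifest. The real method additionally runs manifest processing and writes the files.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ManifestContent {
    Data,
    Deletes,
}

/// Hypothetical stand-in for a manifest-list entry.
#[derive(Debug, Clone, PartialEq)]
pub struct Manifest {
    pub path: String,
    pub content: ManifestContent,
}

/// Hypothetical: collects the manifests the new snapshot's manifest list will
/// reference. Existing manifests are carried over as-is (true for a plain
/// append; a delete or rewrite would filter or rewrite some of them), then at
/// most one new data manifest and one new delete manifest are appended.
/// Writing the manifest files themselves is out of scope here.
pub fn build_manifest_file_list(
    existing: &[Manifest],
    added_data_files: &[String],
    added_delete_files: &[String],
    snapshot_id: i64,
) -> Vec<Manifest> {
    let mut list = existing.to_vec();
    if !added_data_files.is_empty() {
        list.push(Manifest {
            path: format!("snap-{snapshot_id}-data.avro"),
            content: ManifestContent::Data,
        });
    }
    if !added_delete_files.is_empty() {
        list.push(Manifest {
            path: format!("snap-{snapshot_id}-deletes.avro"),
            content: ManifestContent::Deletes,
        });
    }
    list
}

fn main() {
    let existing = vec![Manifest {
        path: "snap-1-data.avro".to_string(),
        content: ManifestContent::Data,
    }];
    let data = vec!["data-1.parquet".to_string()];
    let deletes = vec!["pos-del-1.parquet".to_string()];
    for m in build_manifest_file_list(&existing, &data, &deletes, 2) {
        println!("{:?} {}", m.content, m.path);
    }
}
```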

Collaborator Author

I think this makes sense, but maybe we should tackle this in a separate PR

Comment thread crates/iceberg/src/transaction/snapshot.rs
Comment on lines +597 to +598
#[cfg(test)]
mod tests {
Contributor

Would it be useful to have a follow-up that provides simple coverage of other manifest changes, e.g. added data files?

I'm happy to own this.

Collaborator Author

I think added data files are tested via integration tests. But I'm not opposed to adding more unit tests :D

@CTTY force-pushed the ctty/rewrite-process-delete branch from a82836d to d1ef3c2 May 26, 2026 23:39

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Process delete files when writing snapshots

3 participants