Fix strip spec-mandated DV blob framing before deserializing #3576

Open
ebyhr wants to merge 1 commit into apache:main from ebyhr:ebi/puffin-offset

@ebyhr ebyhr commented Jun 28, 2026

Member

Rationale for this change

Fix _payload storing puffin[8:] instead of puffin. Blob offsets in the
footer are file-relative (counted from byte 0), not relative to byte 8.
The two bugs cancelled out for single-blob files whose blob sits at
offset 4, so the fix was only testable once PuffinFile supported
multi-blob/compressed files.
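
To illustrate why file-relative offsets must be applied to the whole buffer, here is a minimal sketch. It is not the PyIceberg implementation: build_toy_file and read_blob are hypothetical helpers, the blob contents are made up, and the toy file omits the footer entirely. It only assumes that the file starts with the 4-byte magic and that each recorded offset counts from byte 0.

```python
MAGIC = b"PFA1"  # 4-byte magic at the start of a Puffin file


def build_toy_file(blobs):
    """Hypothetical helper: place blobs after the magic and record
    (offset, length) pairs with offsets counted from byte 0."""
    buf = bytearray(MAGIC)
    entries = []
    for blob in blobs:
        entries.append((len(buf), len(blob)))
        buf += blob
    return bytes(buf), entries


def read_blob(payload, offset, length):
    """Hypothetical helper: slice one blob out of a payload buffer."""
    return payload[offset:offset + length]


blobs = [b"first-blob", b"second-blob!"]
puffin, entries = build_toy_file(blobs)

# Offsets applied to the full file recover every blob.
correct = [read_blob(puffin, off, size) for off, size in entries]

# The same offsets applied to puffin[8:] land 8 bytes too far into the file.
shifted = [read_blob(puffin[8:], off, size) for off, size in entries]

print(correct == blobs)
print(shifted == blobs)
```

With more than one blob the shifted read returns the wrong bytes for every entry, which is consistent with the mismatch only becoming testable on multi-blob files.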

Also widen PuffinBlobMetadata.type from Literal["deletion-vector-v1"]
to str so PuffinFile can parse files containing non-DV blobs without
a Pydantic validation error.

Are these changes tested?

Yes. The test files were copied from https://github.com/apache/iceberg/tree/main/core/src/test/resources/org/apache/iceberg/puffin/v1
sample-metric-data-compressed-zstd.bin will be added separately in #3575.

Are there any user-facing changes?

No

@ebyhr ebyhr force-pushed the ebi/puffin-offset branch from 98ddd63 to 7b1cee9 Compare June 28, 2026 11:20