Decode PBF blobs by data field#895
Open
Symmetricity wants to merge 1 commit into
Open
Conversation
Blob.raw_size is optional metadata for compressed payloads. The reader was using the absence of raw_size to decide that a blob was raw, so a zlib_data blob without raw_size could be returned to protozero still compressed and surface as invalid_tag_exception while parsing a PrimitiveBlock. Track the actual Blob data field instead. Return raw payloads only when the raw field is present, inflate zlib_data even when raw_size is omitted, and report missing or unsupported blob payloads explicitly.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is AI generated.
Summary
This fixes PBF Blob decoding so tilemaker uses the actual Blob data field to
decide whether payload bytes are raw or zlib-compressed.
The current reader uses
raw_size == -1to decide that a Blob is uncompressed:That is not the right discriminator. In the OSM-binary storage format,
raw_sizeis optional metadata for the uncompressed size, while the payloadtype is carried by the Blob
oneof datafield:raw: uncompressed payloadzlib_data: zlib-compressed payloadlzma_data,lz4_data,zstd_data: other compressed payloadsReference:
https://github.com/openstreetmap/OSM-binary/blob/master/osmpbf/fileformat.proto
If a Blob contains
zlib_databut omitsraw_size, the old code returns thecompressed bytes directly to the protozero parser. Protozero then attempts to
parse compressed data as a
HeaderBlockorPrimitiveBlock, which can surfaceas errors such as
protozero::invalid_tag_exceptionorprotozero::end_of_buffer_exception.Implementation
The patch keeps the change local to
PbfReader::readBlob():rawfield.zlib_dataeven whenraw_sizeis omitted.raw_sizeis available.No block scheduling, offset handling, storage, or protozero object parsing logic
is changed.
Expected Behavior
Before this change, a zlib-compressed Blob without
raw_sizecould be silentlymisclassified as raw data. The failure would happen later, when protozero tried
to parse compressed bytes as protobuf message content.
After this change, tilemaker determines the payload type from the Blob data
field. Raw payloads still avoid a copy/decompression step, while zlib payloads
are decompressed before parsing.
Possible Regressions
The normal path for common PBFs with
zlib_dataandraw_sizeshould behavethe same, except the compression decision is now based on the data field rather
than the metadata field.
For zlib Blobs that omit
raw_size, the decompression buffer may need to growdynamically because tilemaker does not know the uncompressed size up front.
That is a small allocation cost on this uncommon path and avoids parsing the
wrong bytes.
For unusual raw Blobs that include
raw_size, this changes behavior fromattempting to decompress raw bytes to returning the raw payload. That follows
the Blob data field and should be more correct.
Unsupported compression fields remain unsupported, but the error message is now
explicit.
Related Issues And PRs
I did not find an existing upstream PR for the same
raw_size/zlib_datachange.
I also did not find an upstream issue that directly reports a zlib Blob without
raw_size. This may be relevant to broader PBF parsing failures, but this PRdoes not claim to fix those reports without a matching reproducer.
Testing
Code checks:
Local build:
cmake -S . -B local-builds/pbf-zlib-without-raw-size -DCMAKE_BUILD_TYPE=RelWithDebInfo cmake --build local-builds/pbf-zlib-without-raw-size --target tilemaker -j1Focused parser test:
Result: