Fix PBF block offsets on Windows #894

Open
Symmetricity wants to merge 1 commit into systemed:master from Symmetricity:fix/windows-large-pbf-offset

@Symmetricity

This PR is AI generated.

Summary

This fixes a Windows-specific large-PBF crash by storing PBF block offsets as
std::streamoff instead of long int.

The current code records each PBF data block offset like this:

blocks[blocks.size()] = { (long int)infile->tellg(), bh.datasize, true, true, true, 0, 1 };

Windows uses the LLP64 data model, so long remains 32-bit even in 64-bit
builds. Once an .osm.pbf is large enough that a block payload offset exceeds
the signed 32-bit range (2,147,483,647), that cast truncates the offset. Later
reads then seek to the wrong byte position and parse unrelated data as a PBF
blob/primitive block.
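
As a rough illustration of the narrowing described above, the sketch below uses std::int32_t as a stand-in for a 32-bit Windows long so that it behaves the same on every platform. The helper name narrowLikeWindowsLong is hypothetical and not part of tilemaker. The sample values are the file sizes quoted from #776, used here only as example offsets. The wrapped result assumes two's-complement conversion, which mainstream compilers implement and C++20 requires.

```cpp
#include <cstdint>

// Assumption: int32_t models a 32-bit Windows `long` (LLP64).
using Long32 = std::int32_t;

// Hypothetical helper: narrows a 64-bit stream position the way
// `(long int)infile->tellg()` would on a platform with a 32-bit long.
inline Long32 narrowLikeWindowsLong(std::int64_t offset) {
    return static_cast<Long32>(offset);
}
```

With these assumptions, narrowLikeWindowsLong(2129162240) keeps its value because it is below 2,147,483,647, while narrowLikeWindowsLong(2185531392) wraps to -2109435904, which is not a valid position to seek to.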

This changes the stored offset type to std::streamoff and casts tellg() to
that type at the point the offset is recorded.

Background

This matches #776, which reports Windows crashes around the 2GB boundary:

  • us-midwest at 2,129,162,240 bytes worked.
  • britain-and-ireland at 2,185,531,392 bytes crashed.
  • The reporter noted it looked like a signed 32-bit limit.
  • A later comment reported the same class of crash with files over 2GB.

Issue: #776

Microsoft documents long as 4 bytes for both x86 and x64 MSVC targets:
https://learn.microsoft.com/en-us/cpp/cpp/data-type-ranges

That makes this failure plausible on 64-bit Windows even though the same source
code can work on LP64 Unix-like systems where long is commonly 64-bit.
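
One way to see the platform difference is to ask whether a given offset fits in the platform's own long. The helper below is a minimal sketch; fitsInLong is a hypothetical name and is not part of this patch or of tilemaker.

```cpp
#include <ios>
#include <limits>

// Hypothetical helper: true when `offset` can be stored in this platform's
// `long` without loss. On LP64 systems long is typically 64-bit, so large
// offsets fit; on LLP64 Windows long is 32-bit, so anything above
// 2,147,483,647 does not.
inline bool fitsInLong(std::streamoff offset) {
    using Wide = long long;
    const Wide value = static_cast<Wide>(offset);
    return value >= static_cast<Wide>(std::numeric_limits<long>::min())
        && value <= static_cast<Wide>(std::numeric_limits<long>::max());
}
```

The result for offsets above the signed 32-bit range depends on the build target, which is the point: the same source gives different answers on LP64 and LLP64.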

Implementation

The patch is intentionally small:

  • Include <ios> in include/pbf_processor.h.
  • Change BlockMetadata::offset from long int to std::streamoff.
  • Store block offsets with static_cast<std::streamoff>(infile->tellg()).

No PBF parsing logic, scheduling logic, or block ordering behavior is changed.

Expected Behavior

Before this change, a 64-bit Windows build can truncate offsets once block
payload positions pass the signed 32-bit boundary. Depending on the truncated
position, the later read can fail as a silent crash, a protozero parse
exception, or a generic tilemaker failure.

After this change, tilemaker keeps the stream offset width instead of narrowing
to Windows long, so later seekg() calls can return to the intended block
positions in large files.

Possible Regressions

The intended behavior change is limited to platforms where long int is too
small for large file offsets.

On platforms where long int was already large enough, this should preserve the
same values while using a more appropriate stream offset type.

On 32-bit builds, this does not make the rest of the process capable of handling
arbitrarily large PBFs. It avoids the explicit long int narrowing in
BlockMetadata, but the practical file-size limit still depends on the C++
library, OS large-file support, address space, and tilemaker's memory/storage
requirements.

Related Issues And PRs

Testing

Code checks:

git diff --check origin/master..fix/windows-large-pbf-offset

Earlier local verification on this branch:

make tilemaker -j2
make test_pbf_reader -j2
cmake --build build --target tilemaker -j2

The branch was also built as a MinGW Windows binary during local investigation
for Windows large-PBF testing.

PBF block offsets were stored as long int after casting from tellg(). On Windows, long remains 32-bit even in 64-bit builds, so offsets past 2GB can be truncated before later seekg() calls reread the block.

Store the offset as std::streamoff instead, matching the stream offset type used by tellg()/seekg(). This keeps the existing block metadata flow but avoids narrowing large PBF positions.

This matches the failure pattern reported in systemed#776, where Windows builds crash when reading extracts just over the signed 32-bit boundary.

Checked with make tilemaker -j2, make test_pbf_reader -j2, cmake --build build --target tilemaker -j2, and a MinGW Windows build smoke-tested with wine --help.
Development

Successfully merging this pull request may close these issues.

Crash on larger (~2 GB) pbf's on Windows