Skip to content

fix(dif): allow PE DWARF debug companion upload#3237

Open
supervacuus wants to merge 5 commits intogetsentry:masterfrom
supervacuus:fix/dif/allow_pe_dwarf_companion_upload
Open

fix(dif): allow PE DWARF debug companion upload#3237
supervacuus wants to merge 5 commits intogetsentry:masterfrom
supervacuus:fix/dif/allow_pe_dwarf_companion_upload

Conversation

@supervacuus
Copy link

@supervacuus supervacuus commented Mar 24, 2026

Description

I am sorry right away because this kinda of a rabbit hole PR.

So this started with a bump of symbolic to 12.17.3 to include getsentry/symbolic#960.

But that immediately raised a couple of issues

  1. symbolic updated its zip dependency (2.4.2 to 7.2.0) since the last bump. This changed the encoding and invalidated a bunch of test assertions.
  2. The new permissive PE parser fixes led to two performance issues:
    1. A considerable amount of logs (which were never printed) in goblin degraded performance only due to formatting.
    2. The new goblin leniency in parsing exposed performance edge-cases for some of the debug companions, during PE import-table parsing, the result of which neither symbolic nor sentry-cli really needs.

The PR addresses the issues in the following way:

  1. I rewrote the affected tests so that they actually decode the incoming chunks and compare the contents with the fixtures, rather than snapshotting, which could break with every change in the encoder. If you actually wanted to test whether the encoding was stable, I am happy to revert my changes. Specifically, build_deterministic() in src/utils/source_bundle.rs was not entirely clear, whether you want two runs of the same version to have predictably the same output (which I changed it to) or whether it should actually be stable across future versions (which a zip bump would break).
  2. With these changes, uploading multiple directories of PE DWARF debug companions became blazing fast:
    1. I moved the should_skip_log() from log() to enabled() so that we don't pay the formatting cost of log lines that we skip after formatting anyway. This required only a change in parameter from the log record to its metadata. Since that filter only existed for goblin to begin with, I think it is fine.
    2. I introduced a new goblin option that excludes the PE import table from parsing. Of course, this requires further upstream PRs (cargo points goblin and symbolic to dev branches), but I wanted to check whether you are generally in agreement with these changes before going upstream.

Upstream changes required:

Issues

At least partially fixes getsentry/sentry#104738

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

@supervacuus supervacuus requested review from a team and szokeasaurusrex as code owners March 24, 2026 11:20
@supervacuus
Copy link
Author

@loewenheim, I am going to ping you too here, since you might have an opinion on the upstream approach I took.

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Copy link
Contributor

@loewenheim loewenheim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @supervacuus, no need to be sorry; on the contrary, thank you very much for working on this!

The logging change is simple and obviously sensible on its own. It could easily be split out into its own PR. Similarly, the goblin/symbolic update doesn't have to happen in the same PR as the test changes—if we want to go through with the test changes, they're correct without the dependency updates too.

That leaves the elephant in the room, which is the test changes and the attendant question of what kind of guarantees we want to make. I'm not aware that we ever intended to guarantee that zip files produced by sentry-cli would be byte-reproducible across versions, and I don't think it causes us any trouble if they aren't. Certainly, in #2109 I updated the zip dependency, which also made it necessary to change a hash digest in a test. This was totally uncontroversial at the time.

As such, I believe the testing changes you're making make sense. Still, I would like @szokeasaurusrex's opinion when he comes back next week.

@supervacuus
Copy link
Author

supervacuus commented Mar 24, 2026

The logging change is simple and obviously sensible on its own. It could easily be split out into its own PR. Similarly, the goblin/symbolic update doesn't have to happen in the same PR as the test changes—if we want to go through with the test changes, they're correct without the dependency updates too.

I am fine with splitting this. The reason I didn't was that the logging change by itself seemed too small and was particularly evident with the upstream changes, and integrating goblin required a solution for the tests anyway.

So, while everything can be reviewed in isolation, you can only verify the upside of the log-fix when you introduce the upstream goblin changes with trace logging enabled (I understand that you don't actually need to execute the program with test-data to see the win, but it might lack the motivation, I was honestly surprised to see the profiler pointing to log formatting).

And to get a green output for the goblin bump would mean we have to wait with merging until the test changes are accepted, or we introduce updated snapshots + magic digest only to overwrite them later.

All of these options are fine, but I wanted to offer the full picture first. Let me know which route you prefer. Especially, should I go forward with the goblin stuff as sketched? Because then I would continue upstream.

That leaves the elephant in the room, which is the test changes and the attendant question of what kind of guarantees we want to make. I'm not aware that we ever intended to guarantee that zip files produced by sentry-cli would be byte-reproducible across versions, and I don't think it causes us any trouble if they aren't. Certainly, in #2109 I updated the zip dependency, which also made it necessary to change a hash digest in a test. This was totally uncontroversial at the time.

As such, I believe the testing changes you're making make sense. Still, I would like @szokeasaurusrex's opinion when he comes back next week.

Maybe then splitting is the right choice. This way, we can decouple the topics somewhat.

Co-authored-by: Sebastian Zivota <loewenheim@users.noreply.github.com>
@supervacuus
Copy link
Author

As such, I believe the testing changes you're making make sense. Still, I would like @szokeasaurusrex's opinion when he comes back next week.

Maybe then splitting is the right choice. This way, we can decouple the topics somewhat.

@loewenheim, this is the current split proposal:

If this makes more sense to you, then we can close this PR.

@loewenheim
Copy link
Contributor

Sorry, I just realized I never specifically weighed in on the goblin upstream changes. I definitely think those make sense! And your split proposal looks good too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for Windows applications compiled with MSYS2+Mingw-w64 GCC

2 participants