
[SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA #55878

Closed

cloud-fan wants to merge 2 commits into apache:master from cloud-fan:cloud-fan/merge-script-fix-versions

[SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA#55878
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:cloud-fan/merge-script-fix-versions

Conversation

@cloud-fan (Contributor) commented May 14, 2026

What changes were proposed in this pull request?

When dev/merge_spark_pr.py resolves a JIRA after merging a PR to multiple branches, it computes a default Fix Versions list from the merge targets. This PR rewrites that computation to produce the minimal set of Fix Versions that tells users which Spark releases will contain the commit, leveraging the Apache Spark Upstream-First backporting policy (cherry-picks flow master -> branch-M.x -> branch-M.N).

Each merge target contributes a candidate version:

  • master -> the greatest unreleased N.0.0
  • branch-M.x -> that major's greatest unreleased M.minor.0
  • branch-M.N -> its greatest unreleased M.N.patch
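
A minimal sketch of these per-branch candidate rules, assuming a hypothetical helper name and an explicit list of unreleased versions as input (the real code in dev/merge_spark_pr.py may differ):

```python
import re

# Illustrative sketch of the per-branch candidate rules above; the helper
# name and input shape are assumptions, not the script's real API.
def candidate_version(branch, unreleased):
    """Map a merge branch to its candidate Fix Version, given the list of
    currently unreleased version strings (e.g. ["5.0.0", "4.3.0", ...])."""
    def vtuple(v):
        return tuple(int(p) for p in v.split("."))

    if branch == "master":
        # master -> the greatest unreleased N.0.0
        pool = [v for v in unreleased if vtuple(v)[1:] == (0, 0)]
    elif (m := re.fullmatch(r"branch-(\d+)\.x", branch)):
        # branch-M.x -> that major's greatest unreleased M.minor.0
        major = int(m.group(1))
        pool = [v for v in unreleased
                if vtuple(v)[0] == major and vtuple(v)[1] > 0 and vtuple(v)[2] == 0]
    elif (m := re.fullmatch(r"branch-(\d+)\.(\d+)", branch)):
        # branch-M.N -> its greatest unreleased M.N.patch
        major, minor = int(m.group(1)), int(m.group(2))
        pool = [v for v in unreleased if vtuple(v)[:2] == (major, minor)]
    else:
        raise ValueError(f"unrecognized merge branch: {branch}")
    return max(pool, key=vtuple)
```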

Two suppression rules then trim redundancy:

  1. Drop master's N.0.0 when any branch-M.x is in the merge set: a cherry-pick to branch-M.x has already landed on master, so the next master release is implied.
  2. Drop branch-M.x's minor.0 when a sibling branch-M.N contributes M.N.0: an unreleased M.N.0 means branch-M.N has not diverged from branch-M.x yet, so M.N.0 alone covers both lineages.
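
The two suppression rules could be sketched like this, under the assumption of a hypothetical helper that takes a mapping of merge branch to candidate version (again, not the script's actual API):

```python
import re

# Illustrative sketch of the two suppression rules above, applied to a
# mapping of merge branch -> candidate Fix Version. The helper name and
# input shape are assumptions, not the script's real API.
def minimal_fix_versions(candidates):
    versions = dict(candidates)
    branch_mx = [b for b in versions if re.fullmatch(r"branch-\d+\.x", b)]

    # Rule 1: drop master's N.0.0 when any branch-M.x is in the merge set;
    # the Upstream-First flow implies the next master release already has it.
    if branch_mx and "master" in versions:
        del versions["master"]

    # Rule 2: drop branch-M.x's M.minor.0 when a sibling branch-M.N
    # contributes an unreleased M.N.0 (branch-M.N has not diverged yet).
    for mx in branch_mx:
        major = mx.split("-")[1].split(".")[0]
        if any(re.fullmatch(rf"branch-{major}\.\d+", b)
               and versions[b].split(".")[2] == "0"
               for b in versions if b != mx):
            del versions[mx]

    # dict.fromkeys drops exact duplicates while preserving order.
    return list(dict.fromkeys(versions.values()))
```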

Sanity check against the current unreleased SPARK versions (5.0.0, 4.3.0, 4.2.0, 4.1.2, 4.0.3, 3.5.9) plus the hypothetical "4.2.0 has released" scenarios raised in the review discussion:

| Merge branches | Before | After |
|---|---|---|
| master | [5.0.0] | [5.0.0] |
| master, branch-4.x | [5.0.0, 4.3.0] | [4.3.0] |
| master, branch-4.x, branch-4.2 (4.2.0 unreleased) | [5.0.0, 4.2.0] | [4.2.0] |
| master, branch-4.x, branch-4.2 (4.2.0 released) | [5.0.0, 4.2.1] | [4.3.0, 4.2.1] |
| master, branch-4.2 (4.2.0 unreleased) | [5.0.0, 4.2.0] | [5.0.0, 4.2.0] |
| master, branch-4.2 (4.2.0 released) | [5.0.0, 4.2.1] | [5.0.0, 4.2.1] |

The rules are parameterized on M and N, so they generalize to any major/minor pair as the project moves forward (e.g. post-5.0.0 master + branch-5.x + branch-5.1 with 5.1.0 unreleased -> [5.1.0]; double-digit minors like branch-4.11 work the same way).

Why are the changes needed?

The previous defaults forced committers to manually edit Fix Versions in multiple common cases:

  • master + branch-M.x: strip the extra N.0.0.
  • master + branch-M.x + branch-M.N (just-cut branch-M.N): the default becomes [M.N.0] rather than [N.0.0, M.N.0] or [M.minor.0, M.N.0].

Both are now produced as defaults that match the convention Spark committers apply by hand.

Does this PR introduce any user-facing change?

No. The committer prompt still lets the user override the defaults, so any committer who prefers the old behavior can type it in. Only the suggested default changes.

How was this patch tested?

  • `python3 -m doctest dev/merge_spark_pr.py -v` -- 34 doctests pass, including 16 cases in compute_merge_default_fix_versions covering all six rows in the table plus the existing scenarios.
  • Manually traced future scenarios (post-4.3.0, post-5.0.0, mixed cross-major, double-digit branch-4.11) to confirm the rules generalize across M and N.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

…ranch-M.x is also being merged

When the merge script targets master plus a branch-M.x integration dev branch,
master's N.0.0 entry is redundant because branch-M.x -> master is a merge-forward
chain, so master's next release inherits anything that lands on branch-M.x. The
prior `_dedupe_fix_versions_adjacent_major_zero` rule also wrongly dropped the
branch-M.x entry whenever the corresponding branch-M.N release branch was in the
merge set, because both versions ended up in the defaults list. Replace it with
a rule that skips master's contribution only when a branch-M.x is present, and
let dict.fromkeys handle exact duplicates.
@cloud-fan (Contributor, Author):

cc @HyukjinKwon @dongjoon-hyun

@dongjoon-hyun (Member) left a review comment:

To @cloud-fan , I agree with the purpose and functionality of this PR.

However, let's not confuse the community via this PR, because the Apache Spark community's backporting policy has never been a merge-forward chain. Of course, we allow some exceptions, but that's not the general policy.

> branch-M.x -> master is a merge-forward chain

If we want to choose a terminology, Apache Spark's backporting policy has always been an Upstream-First policy, to prevent regressions.

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun do you have a concrete proposal? Looking at the table, when we merge a PR to master, branch-4.x, and branch-4.2, it's wrong to set the released versions to [5.0.0, 4.2.0], which implies the feature appears in the 5.0.0 release notes rather than 4.3.0's.

Another idea is to be verbose: merging to branch-4.x means 4.3.0, merging to master means 5.0.0. The case above would become [5.0.0, 4.3.0, 4.2.0].

@dongjoon-hyun (Member):

@cloud-fan , what I'm suggesting is simply revising the wording of the phrase branch-M.x -> master is a merge-forward chain. As I mentioned, I already agree with your PR's purpose and functionality.

> @dongjoon-hyun do you have a concrete proposal?

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun Your suggestion makes me think about what the rule for deciding the released versions should be. In the past, when we merged a PR to master only, the released version was the next feature release version (e.g. 4.1.0), which implies that all releases after 4.1.0 will contain the commit. If the PR was also backported to a release branch, like branch-4.0, we additionally included the next maintenance release version (e.g. 4.0.2), which indicates that releases before 4.0.2 do not have the commit.

So the principle is to use a minimal set of released versions to tell users which Spark versions contain the commit. That being said:

| Merge branches | Released versions |
|---|---|
| master | [5.0.0] |
| master, branch-4.x | [4.3.0] |
| master, branch-4.x, branch-4.2 | [4.2.0] |
| master, branch-4.x, branch-4.2 (if 4.2.0 has released) | [4.3.0, 4.2.1] |
| master, branch-4.2 | [5.0.0, 4.2.0] |
| master, branch-4.2 (if 4.2.0 has released) | [5.0.0, 4.2.1] |

The last two rows are mostly for completeness; they should not happen in practice.

What do you think of this principle? It uses no backport or forward-merge wording; it just explains released versions from a user-facing view.

@dongjoon-hyun (Member):

To @cloud-fan , sorry but it sounds like a generated answer? Are you reading my comments?

@dongjoon-hyun (Member):

To be clear,

  • I asked you not to use the terminology, merge-forward chain.
  • I gave you a clear alternative, Upstream-First policy.
  • For the other suggestions, I already mentioned twice that I agree with you and your PR (except the above terminology)

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun yes, and the previous comment was 100% hand-crafted. I got your point about changing the wording of the PR description, but I want to go deeper and clarify what "released versions" means. My last comment actually proposes a different way to pick the released versions than this PR currently implements.

@dongjoon-hyun (Member):

Specifically, regarding the following: if you want to suggest anything new to the community about the released versions, we need to discuss it on dev@spark first. Otherwise, we should stick to the existing policy. It's not something two PMC members can decide on their own.

> I want to go deeper to clarify what "released versions" mean

@dongjoon-hyun (Member) commented May 14, 2026

As you know, this PR's code simply mimics existing PMC member behavior. These days I also drop 5.0.0 from the JIRA issue when I merge a PR to master/4.x/4.2; I prefer to set only 4.2.0 for both the Affected and Fixed versions in that case. In that sense, I agree with this PR.

…am-First framing

Extends the Fix Version default to also drop branch-M.x's minor.0 when a
sibling branch-M.N contributes M.N.0 (an unreleased M.N.0 means branch-M.N
has not diverged from branch-M.x yet, so M.N.0 alone covers both lineages).
Reframes the comments from "merge-forward chain" to the Apache Spark
"Upstream-First" backporting policy.

Co-authored-by: Isaac
@cloud-fan (Contributor, Author):

OK, the new proposal is implemented. The principle is actually the same as in the old days, just adapted to the new .x branch. The only difference is when the current release is between the branch cut and the RC cut. In the old days, e.g. when merging a PR to master and branch-4.1 before the 4.1 RC was cut, we set the released versions to [4.1.0, 4.2.0]. The new proposal sets [4.1.0]. This corresponds to the row master, branch-4.x, branch-4.2 -> [4.2.0] in the table above.

Let me know if this needs dev list discussion, or it's better to drop it to be safe.

@cloud-fan cloud-fan changed the title [SPARK-56865][INFRA] Drop master's N.0.0 fix-version default when a branch-M.x is also being merged [SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA May 14, 2026
@cloud-fan (Contributor, Author):

The PR description is updated as well; hopefully it explains the philosophy of this proposed change.

@dongjoon-hyun (Member) left a review comment:

Thank you so much.

@dongjoon-hyun (Member):

cc @peter-toth , too.

@peter-toth (Contributor) left a review comment:

Makes sense.

@HyukjinKwon (Member) left a review comment:

ok LGTM

@cloud-fan (Contributor, Author) commented May 15, 2026

The ParquetIOSuite failure is unrelated. Thanks for the review; merging to master (using this PR's merge script).

@cloud-fan cloud-fan closed this in a9e0749 May 15, 2026