
[SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA #55878

Closed

cloud-fan wants to merge 2 commits into apache:master from cloud-fan:cloud-fan/merge-script-fix-versions

[SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA#55878
cloud-fan wants to merge 2 commits into
apache:masterfrom
cloud-fan:cloud-fan/merge-script-fix-versions

Conversation

@cloud-fan (Contributor) commented May 14, 2026

What changes were proposed in this pull request?

When dev/merge_spark_pr.py resolves a JIRA after merging a PR to multiple branches, it computes a default Fix Versions list from the merge targets. This PR rewrites that computation to produce the minimal set of Fix Versions that tells users which Spark releases will contain the commit, leveraging the Apache Spark Upstream-First backporting policy (cherry-picks flow master -> branch-M.x -> branch-M.N).

Each merge target contributes a candidate version:

  • master -> the greatest unreleased N.0.0
  • branch-M.x -> that major's greatest unreleased M.minor.0
  • branch-M.N -> its greatest unreleased M.N.patch
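
A minimal sketch of these per-branch candidate rules, assuming a hypothetical helper name and an explicit list of unreleased versions as input (the real code in dev/merge_spark_pr.py may differ):

```python
import re

# Illustrative sketch of the per-branch candidate rules above; the helper
# name and input shape are assumptions, not the script's real API.
def candidate_version(branch, unreleased):
    """Map a merge branch to its candidate Fix Version, given the list of
    currently unreleased version strings (e.g. ["5.0.0", "4.3.0", ...])."""
    def vtuple(v):
        return tuple(int(p) for p in v.split("."))

    if branch == "master":
        # master -> the greatest unreleased N.0.0
        pool = [v for v in unreleased if vtuple(v)[1:] == (0, 0)]
    elif (m := re.fullmatch(r"branch-(\d+)\.x", branch)):
        # branch-M.x -> that major's greatest unreleased M.minor.0
        major = int(m.group(1))
        pool = [v for v in unreleased
                if vtuple(v)[0] == major and vtuple(v)[1] > 0 and vtuple(v)[2] == 0]
    elif (m := re.fullmatch(r"branch-(\d+)\.(\d+)", branch)):
        # branch-M.N -> its greatest unreleased M.N.patch
        major, minor = int(m.group(1)), int(m.group(2))
        pool = [v for v in unreleased if vtuple(v)[:2] == (major, minor)]
    else:
        raise ValueError(f"unrecognized merge branch: {branch}")
    return max(pool, key=vtuple)
```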

Two suppression rules then trim redundancy:

  1. Drop master's N.0.0 when any branch-M.x is in the merge set: a cherry-pick to branch-M.x has already landed on master, so the next master release is implied.
  2. Drop branch-M.x's minor.0 when a sibling branch-M.N contributes M.N.0: an unreleased M.N.0 means branch-M.N has not diverged from branch-M.x yet, so M.N.0 alone covers both lineages.
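
The two suppression rules could be sketched like this, under the assumption of a hypothetical helper that takes a mapping of merge branch to candidate version (again, not the script's actual API):

```python
import re

# Illustrative sketch of the two suppression rules above, applied to a
# mapping of merge branch -> candidate Fix Version. The helper name and
# input shape are assumptions, not the script's real API.
def minimal_fix_versions(candidates):
    versions = dict(candidates)
    branch_mx = [b for b in versions if re.fullmatch(r"branch-\d+\.x", b)]

    # Rule 1: drop master's N.0.0 when any branch-M.x is in the merge set;
    # the Upstream-First flow implies the next master release already has it.
    if branch_mx and "master" in versions:
        del versions["master"]

    # Rule 2: drop branch-M.x's M.minor.0 when a sibling branch-M.N
    # contributes an unreleased M.N.0 (branch-M.N has not diverged yet).
    for mx in branch_mx:
        major = mx.split("-")[1].split(".")[0]
        if any(re.fullmatch(rf"branch-{major}\.\d+", b)
               and versions[b].split(".")[2] == "0"
               for b in versions if b != mx):
            del versions[mx]

    # dict.fromkeys drops exact duplicates while preserving order.
    return list(dict.fromkeys(versions.values()))
```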

Sanity check against the current unreleased SPARK versions (5.0.0, 4.3.0, 4.2.0, 4.1.2, 4.0.3, 3.5.9) plus the hypothetical "4.2.0 has released" scenarios raised in the review discussion:

| Merge branches | Before | After |
|---|---|---|
| master | [5.0.0] | [5.0.0] |
| master, branch-4.x | [5.0.0, 4.3.0] | [4.3.0] |
| master, branch-4.x, branch-4.2 (4.2.0 unreleased) | [5.0.0, 4.2.0] | [4.2.0] |
| master, branch-4.x, branch-4.2 (4.2.0 released) | [5.0.0, 4.2.1] | [4.3.0, 4.2.1] |
| master, branch-4.2 (4.2.0 unreleased) | [5.0.0, 4.2.0] | [5.0.0, 4.2.0] |
| master, branch-4.2 (4.2.0 released) | [5.0.0, 4.2.1] | [5.0.0, 4.2.1] |

The rules are parameterized on M and N, so they generalize to any major/minor pair as the project moves forward (e.g. post-5.0.0 master + branch-5.x + branch-5.1 with 5.1.0 unreleased -> [5.1.0]; double-digit minors like branch-4.11 work the same way).

Why are the changes needed?

The previous defaults forced committers to manually edit Fix Versions in multiple common cases:

  • master + branch-M.x: strip the extra N.0.0.
  • master + branch-M.x + branch-M.N (just-cut branch-M.N): the default becomes [M.N.0] rather than [N.0.0, M.N.0] or [M.minor.0, M.N.0].

Both are now produced as defaults that match the convention Spark committers apply by hand.

Does this PR introduce any user-facing change?

No. The committer prompt still lets the user override the defaults, so any committer who prefers the old behavior can type it in. Only the suggested default changes.

How was this patch tested?

  • `python3 -m doctest dev/merge_spark_pr.py -v` -- 34 doctests pass, including 16 cases in compute_merge_default_fix_versions covering all six rows in the table plus the existing scenarios.
  • Manually traced future scenarios (post-4.3.0, post-5.0.0, mixed cross-major, double-digit branch-4.11) to confirm the rules generalize across M and N.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code (Claude Opus 4.7)

…ranch-M.x is also being merged

When the merge script targets master plus a branch-M.x integration dev branch,
master's N.0.0 entry is redundant because branch-M.x -> master is a merge-forward
chain, so master's next release inherits anything that lands on branch-M.x. The
prior `_dedupe_fix_versions_adjacent_major_zero` rule also wrongly dropped the
branch-M.x entry whenever the corresponding branch-M.N release branch was in the
merge set, because both versions ended up in the defaults list. Replace it with
a rule that skips master's contribution only when a branch-M.x is present, and
let dict.fromkeys handle exact duplicates.
@cloud-fan (Contributor, Author):

cc @HyukjinKwon @dongjoon-hyun

@dongjoon-hyun (Member) left a review comment:

To @cloud-fan , I agree with the purpose and functionality of this PR.

However, let's not confuse the community via this PR, because the Apache Spark community's backporting policy has never been a merge-forward chain. Of course, we allow some exceptions, but that's not the general policy.

> branch-M.x -> master is a merge-forward chain

If we want to choose a terminology, Apache Spark's backporting policy has always been an Upstream-First policy, to prevent regressions.

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun do you have a concrete proposal? Looking at the table, when we merge a PR to master, branch-4.x, and branch-4.2, it's wrong to set the released versions to [5.0.0, 4.2.0], which implies the feature appears in the 5.0.0 release notes rather than 4.3.0's.

Another idea is to be verbose: merging to branch-4.x means 4.3.0, merging to master means 5.0.0. The case above would become [5.0.0, 4.3.0, 4.2.0].

@dongjoon-hyun (Member):

@cloud-fan , what I'm suggesting is simply revising the wording of the phrase branch-M.x -> master is a merge-forward chain. As I mentioned, I already agree with your PR's purpose and functionality.

> @dongjoon-hyun do you have a concrete proposal?

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun Your suggestion makes me think about what the rule for deciding the released versions should be. In the past, when we merged a PR to master only, the released version was the next feature release version (e.g. 4.1.0), which implies that all releases after 4.1.0 will contain the commit. If the PR was also backported to a release branch, like branch-4.0, we additionally included the next maintenance release version (e.g. 4.0.2), which indicates that releases before 4.0.2 do not have the commit.

So the principle is to use a minimal set of released versions to tell users which Spark versions contain the commit. That being said:

| Merge branches | Released versions |
|---|---|
| master | [5.0.0] |
| master, branch-4.x | [4.3.0] |
| master, branch-4.x, branch-4.2 | [4.2.0] |
| master, branch-4.x, branch-4.2 (if 4.2.0 has released) | [4.3.0, 4.2.1] |
| master, branch-4.2 | [5.0.0, 4.2.0] |
| master, branch-4.2 (if 4.2.0 has released) | [5.0.0, 4.2.1] |

The last two rows are mostly for completeness; they should not happen in practice.

What do you think of this principle? It uses no backport or forward-merge wording; it just explains released versions from a user-facing view.

@dongjoon-hyun (Member):

To @cloud-fan , sorry but it sounds like a generated answer? Are you reading my comments?

@dongjoon-hyun (Member):

To be clear,

  • I asked you not to use the terminology, merge-forward chain.
  • I gave you a clear alternative, Upstream-First policy.
  • For the other suggestions, I already mentioned twice that I agree with you and your PR (except the above terminology)

@cloud-fan (Contributor, Author) commented May 14, 2026

@dongjoon-hyun yes, and the previous comment was 100% hand-crafted. I got your point about changing the wording of the PR description, but I want to go deeper and clarify what "released versions" means. My last comment actually proposes a different way to pick the released versions than this PR currently implements.

@dongjoon-hyun (Member):

Specifically, regarding the following: if you want to suggest anything new to the community about the released versions, we need to discuss it on dev@spark first. Otherwise, we should stick to the existing policy. It's not something two PMC members can decide on their own.

> I want to go deeper to clarify what "released versions" mean

@dongjoon-hyun (Member) commented May 14, 2026

As you know, this PR's code simply mimics existing PMC member behavior. These days I also drop 5.0.0 from the JIRA issue when I merge a PR to master/4.x/4.2; I prefer to set only 4.2.0 for both the Affected and Fixed versions in that case. In that sense, I agree with this PR.

…am-First framing

Extends the Fix Version default to also drop branch-M.x's minor.0 when a
sibling branch-M.N contributes M.N.0 (an unreleased M.N.0 means branch-M.N
has not diverged from branch-M.x yet, so M.N.0 alone covers both lineages).
Reframes the comments from "merge-forward chain" to the Apache Spark
"Upstream-First" backporting policy.

Co-authored-by: Isaac
@cloud-fan (Contributor, Author):

OK, the new proposal is implemented. The principle is actually the same as in the old days, just adapted to the new .x branch. The only difference is when the current release is between the branch cut and the RC cut. In the old days, e.g. when merging a PR to master and branch-4.1 before the 4.1 RC was cut, we set the released versions to [4.1.0, 4.2.0]. The new proposal sets [4.1.0]. This corresponds to the row master, branch-4.x, branch-4.2 -> [4.2.0] in the table above.

Let me know if this needs dev list discussion, or it's better to drop it to be safe.

@cloud-fan cloud-fan changed the title [SPARK-56865][INFRA] Drop master's N.0.0 fix-version default when a branch-M.x is also being merged [SPARK-56865][INFRA] Suggest a minimal Fix Versions set when resolving JIRA May 14, 2026
@cloud-fan (Contributor, Author):

The PR description is updated as well; hopefully it explains the philosophy of this proposed change.

@dongjoon-hyun (Member) left a review comment:

Thank you so much.

@dongjoon-hyun (Member):

cc @peter-toth , too.

@peter-toth (Contributor) left a review comment:

Makes sense.

@HyukjinKwon (Member) left a review comment:

ok LGTM

@cloud-fan (Contributor, Author) commented May 15, 2026

The ParquetIOSuite failure is unrelated. Thanks for the review; merging to master (using this PR's merge script).

@cloud-fan cloud-fan closed this in a9e0749 May 15, 2026