1 change: 1 addition & 0 deletions README.md
@@ -183,6 +183,7 @@ Each feature is documented separately — click a name below to learn configurat
| [Service Chapters](docs/features/service_chapters.md) | Quality & Warnings | Surfaces gaps: issues without PRs, unlabeled items, PRs without notes, etc. |
| [Duplicity Handling](docs/features/duplicity_handling.md) | Quality & Warnings | Marks duplicate lines when the same issue appears in multiple chapters. |
| [Tag Range Selection](docs/features/tag_range.md) | Time Range | Chooses scope via `tag-name`/`from-tag-name`. |
| [Compare Mode](docs/features/compare_mode.md) | Time Range | Graph-based commit selection via `repo.compare()` — correct for branching release histories (maintenance + develop in parallel). |
| [Date Selection](docs/features/date_selection.md) | Time Range | Chooses scope via timestamps (`published-at` vs `created-at`). |
| [Custom Row Formats](docs/features/custom_row_formats.md) | Formatting & Presentation | Controls row templates and placeholders (`{number}`, `{title}`, `{developers}`, …). |
| [Custom Chapters](docs/features/custom_chapters.md) | Formatting & Presentation | Maps labels to chapter headings; aggregates multiple labels under one title. |
152 changes: 152 additions & 0 deletions docs/features/compare_mode.md
@@ -0,0 +1,152 @@
# Feature: Compare Mode

## Purpose
Ensure release notes for a maintenance branch contain **only** the changes that belong
to that branch, even when a parallel development branch (`v2.7.x`) is active and produces
commits in the same timestamp window.

## The Problem Compare Mode Solves

The default (timestamp) approach asks GitHub: *"give me all commits/PRs since time T".*
That works perfectly when every release is on a single linear history. It breaks the
moment two release streams run in parallel.

**Concrete example — two active streams:**

```text
develop:
* (tag: v2.7.1) 2026-05-20 Improve Kafka consumer throughput (#1401)
* (tag: v2.7.0) 2026-05-14 Fix new service access role (#1363)
* 2026-05-07 Fix/1346 custom hive table (#1349)
|
| maintenance/v2.6.x:
| * (tag: v2.6.5) 2026-05-20 Backport: handle empty schema in Hive (#1402)
| * (tag: v2.6.4) 2026-05-14 Fix new service access role (#1363) ← cherry-pick
| * (tag: v2.6.3) 2026-05-07 Fix/1346 custom hive table (#1349) ← cherry-pick
|/
* (tag: v2.6.0) 2026-04-21 Fixes for update-ca-certificates (#1318)
```

Generating release notes for **`v2.6.5`** (previous: `v2.6.4`):

| Mode | What is fetched | Correct? |
|---|---|---|
| **Timestamp** | Everything between 2026-05-14 and 2026-05-20 on *any* branch → `#1363` (v2.7.0) + `#1401` (v2.7.1) + `#1402` | ❌ two develop PRs contaminate the patch notes |
| **Compare** | Only commits reachable from `v2.6.5` but **not** `v2.6.4` → `#1402` only | ✅ |

---

## How Compare Mode Works

### Activation

Compare mode is active **when `from-tag-name` is explicitly provided**. When it is absent,
the existing timestamp path runs unchanged.

### Step 1 — Graph-based commit selection

Instead of asking "what happened after time T?", the action asks GitHub: *"what commits
exist in `tag-name` that do not exist in `from-tag-name`?"*

This is a pure graph operation: it follows the commit ancestry graph, not the clock.
The result is exactly the set of commits unique to the current release, regardless of
when they were authored or which branch they live on.
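
The difference between the two selection strategies can be sketched on a toy model of the example history above. This is a minimal illustration, not the action's implementation: the short commit ids (`c0`, `d1`, `m5`, ...) and the helper names `reachable`, `compare`, and `by_timestamp` are made up for the sketch, and the real work is done by GitHub's compare endpoint.

```python
from datetime import date

# Toy commit graph mirroring the example history: id -> (parents, date, PR number).
# The ids are hypothetical; only dates, tags, and PR numbers come from the example.
COMMITS = {
    "c0": ([], date(2026, 4, 21), 1318),      # v2.6.0, common ancestor
    "d1": (["c0"], date(2026, 5, 7), 1349),   # develop
    "d2": (["d1"], date(2026, 5, 14), 1363),  # develop, v2.7.0
    "d3": (["d2"], date(2026, 5, 20), 1401),  # develop, v2.7.1
    "m3": (["c0"], date(2026, 5, 7), 1349),   # maintenance, v2.6.3 (cherry-pick)
    "m4": (["m3"], date(2026, 5, 14), 1363),  # maintenance, v2.6.4 (cherry-pick)
    "m5": (["m4"], date(2026, 5, 20), 1402),  # maintenance, v2.6.5
}
TAGS = {"v2.6.4": "m4", "v2.6.5": "m5"}


def reachable(tip: str) -> set[str]:
    """Return every commit reachable from tip by following parent links."""
    seen: set[str] = set()
    stack = [tip]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(COMMITS[sha][0])
    return seen


def compare(base_tag: str, head_tag: str) -> set[str]:
    """Commits reachable from head_tag but not from base_tag."""
    return reachable(TAGS[head_tag]) - reachable(TAGS[base_tag])


def by_timestamp(since: date, until: date) -> set[str]:
    """Commits dated within [since, until] on any branch."""
    return {sha for sha, (_, day, _) in COMMITS.items() if since <= day <= until}


graph_prs = sorted(COMMITS[sha][2] for sha in compare("v2.6.4", "v2.6.5"))
clock_prs = sorted({COMMITS[sha][2] for sha in by_timestamp(date(2026, 5, 14), date(2026, 5, 20))})
print(graph_prs)  # [1402]
print(clock_prs)  # [1363, 1401, 1402]
```

The graph difference yields only `#1402`, while the date window also picks up the two develop PRs, matching the table in the problem description.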

### Step 2 — PRs derived from commit messages, not from a time filter

Rather than fetching all closed PRs and filtering by timestamp, compare mode reads the
PR numbers directly from the commit messages returned in Step 1. Both common merge
styles are recognised:

- **Squash-merge:** `Fix new service access role (#1363)`
- **Merge-commit:** `Merge pull request #1363 from org/branch`

Each unique PR number is then fetched individually by number. This means only the PRs
that actually belong to the release are ever loaded.

Cherry-picks are handled automatically: the commit message on the maintenance branch
preserves the original PR number, so the right PR is always found even though the
commit SHA differs from the one on develop.
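
As a rough sketch of the extraction step, the snippet below applies the same pattern this change introduces as `_PR_NUMBER_RE` to a few sample commit messages. The function name `extract_pr_numbers` and the sample messages (including the `#9999` body reference) are assumptions for illustration; only the subject line is scanned, as in the PR.

```python
import re

# Same pattern as _PR_NUMBER_RE in this change: "(#123)" or "Merge pull request #123".
PR_NUMBER_RE = re.compile(r"\(#(\d+)\)|Merge pull request #(\d+)")


def extract_pr_numbers(messages: list[str]) -> set[int]:
    """Collect PR numbers from the subject (first line) of each commit message."""
    numbers: set[int] = set()
    for message in messages:
        subject = message.splitlines()[0] if message else ""
        for match in PR_NUMBER_RE.finditer(subject):
            # Exactly one of the two groups is set, depending on the merge style.
            numbers.add(int(match.group(1) or match.group(2)))
    return numbers


messages = [
    "Fix new service access role (#1363)",            # squash-merge
    "Merge pull request #1363 from org/branch",       # merge-commit, same PR
    "Backport: handle empty schema in Hive (#1402)\n\nRelated to (#9999) in the body",
    "Bump version, no PR reference",
]
found = sorted(extract_pr_numbers(messages))
print(found)  # [1363, 1402]
```

Using a set means a PR referenced by several commits is fetched only once, and the body-only reference is ignored because scanning stops at the subject line.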

### Step 3 — Why a PR can have a date before `data.since`

When a commit is cherry-picked, the PR object that gets fetched is the *original* PR —
the one that was merged onto develop weeks or months earlier. Its `merged_at` date is
that old develop date, which is before the previous maintenance tag's timestamp
(`data.since`).

Timestamp mode would silently drop it. Compare mode keeps it, because the commit graph
(not the clock) is the authority on what belongs in the release.
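
A small numeric illustration of this effect, assuming made-up dates and a hypothetical cherry-picked PR `#1290` that is not part of the example history:

```python
from datetime import datetime, timezone


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


since = utc(2026, 5, 14)  # assumed timestamp of the previous maintenance release

# Hypothetical PRs referenced by commits in the compared range: number -> merged_at.
prs_in_range = {
    1402: utc(2026, 5, 20),  # merged directly for this release
    1290: utc(2026, 3, 3),   # cherry-picked; the original PR was merged on develop long before `since`
}

timestamp_mode = sorted(n for n, merged in prs_in_range.items() if merged >= since)
compare_mode = sorted(prs_in_range)  # membership was already decided by the commit graph

print(timestamp_mode)  # [1402]
print(compare_mode)    # [1290, 1402]
```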

### Step 4 — `data.since` is still set, but only used for issues

`data.since` is always derived from the previous release's timestamp, in both modes.
In compare mode it is **not used to filter PRs or commits** — that job is already done
by the graph in Step 1. It is only used for:

- Fetching recently-updated **issues** (issue filtering is timestamp-based in both modes)
- Date-gating **release notes extraction** from PR/issue body text

### Step 5 — The filter stage passes PRs and commits through unchanged

`FilterByRelease`, the stage that normally drops PRs and commits older than
`data.since`, detects that compare mode is active (a non-empty `data.compare_commit_shas`)
and skips that timestamp check entirely. The PR and commit sets arriving from `mine_data`
are already exact; further trimming is unnecessary and would wrongly drop cherry-picked PRs.

Issues are always filtered by timestamp regardless of mode.
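
The branching in the filter stage can be approximated as follows. This is a simplified sketch, not the real `FilterByRelease`: `Item`, `Mined`, and `filter_by_release` are stand-ins invented here, and filtering issues on a single `closed_at` field is an assumption about how the issue filter behaves.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Item:
    """Stand-in for an issue or PR: a number plus the timestamp used for filtering."""
    number: int
    closed_at: datetime


@dataclass
class Mined:
    """Simplified stand-in for MinedData."""
    since: datetime
    issues: list[Item] = field(default_factory=list)
    pulls: list[Item] = field(default_factory=list)
    compare_commit_shas: set[str] = field(default_factory=set)


def filter_by_release(data: Mined) -> tuple[list[int], list[int]]:
    """Return (issue numbers, PR numbers) kept for the release."""
    # Issues are filtered by timestamp in both modes.
    issues = [i.number for i in data.issues if i.closed_at >= data.since]
    if data.compare_commit_shas:
        # Compare mode: the PR set is already exact, so pass it through.
        pulls = [p.number for p in data.pulls]
    else:
        pulls = [p.number for p in data.pulls if p.closed_at >= data.since]
    return issues, pulls


def utc(year: int, month: int, day: int) -> datetime:
    return datetime(year, month, day, tzinfo=timezone.utc)


since = utc(2026, 5, 14)
issues = [Item(10, utc(2026, 4, 1)), Item(11, utc(2026, 5, 18))]
pulls = [Item(1290, utc(2026, 3, 3)), Item(1402, utc(2026, 5, 20))]

timestamp_result = filter_by_release(Mined(since, issues, pulls))
compare_result = filter_by_release(Mined(since, issues, pulls, {"abc123"}))
print(timestamp_result)  # ([11], [1402])
print(compare_result)    # ([11], [1290, 1402])
```

In both runs the old issue `10` is dropped, while the old PR `1290` survives only when the set of compared commit SHAs is non-empty.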

---

## Data Flow

```
                from-tag-name provided?
             ┌─────────────┴─────────────┐
    YES (compare mode)          NO (timestamp mode)
             │                           │
    GitHub Compare API:         get_commits(since=data.since)
    commits unique to tag-name  get_pulls(state=closed)
             │                           │
    extract PR numbers          FilterByRelease drops
    from commit messages        PRs/commits before since
    fetch each PR by number
    FilterByRelease: skip timestamp check; pass everything through
```

---

## Configuration

```yaml
- name: Generate Release Notes
  uses: AbsaOSS/generate-release-notes@v1
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
  with:
    tag-name: "v2.6.5"        # the release being generated
    from-tag-name: "v2.6.4"   # providing this activates compare mode
    chapters: |
      - {"title": "Bugfixes 🛠", "label": "bug"}
      - {"title": "Features 🎉", "label": "feature"}
```

> **When to use:** always supply `from-tag-name` when releasing from a maintenance branch
> that runs in parallel with a development branch. Omitting it is fine for purely
> linear release histories.

---

## Related Features

- [Tag Range Selection](./tag_range.md) – explains the user-facing `from-tag-name` input and
its interaction with compare mode.
- [Date Selection](./date_selection.md) – controls whether `created_at` or `published_at`
is used as `data.since` (applies in both modes).
- [Release Notes Extraction](./release_notes_extraction.md) – uses `data.since` for body
scanning; unaffected by compare mode.

← [Back to Feature Tutorials](../../README.md#feature-tutorials)
1 change: 1 addition & 0 deletions docs/features/tag_range.md
@@ -38,6 +38,7 @@ https://github.com/org/repo/compare/v1.5.0...v1.6.0
(The compare URL reflects both `from-tag-name` and `tag-name`.)

## Related Features
- [Compare Mode](./compare_mode.md) – activated automatically when `from-tag-name` is set; explains why graph-based selection is needed for branching histories and how it works internally.
- [Date Selection](./date_selection.md) – defines which timestamp of the previous release becomes the cutoff.
- [Service Chapters](./service_chapters.md) – uses the same time window to assess gaps.
- [Release Notes Extraction](./release_notes_extraction.md) – only processes PRs/issues within the computed window.
55 changes: 34 additions & 21 deletions release_notes_generator/data/filter.py
@@ -68,34 +68,47 @@ def filter(self, data: MinedData) -> MinedData:
        md = MinedData(data.home_repository)
        md.release = data.release
        md.since = data.since
        md.compare_commit_shas = data.compare_commit_shas

        if data.release is not None:
            logger.info("Starting issue, prs and commit reduction by the latest release since time.")

            issues_dict = self._filter_issues(data)
            logger.debug("Count of issues reduced from %d to %d", len(data.issues), len(issues_dict))

            # filter out merged PRs and commits before the date
            pulls_seen: set[int] = set()
            pulls_dict: dict[PullRequest, Repository] = {}
            for pull, repo in data.pull_requests.items():
                if data.since and (
                    (pull.merged_at and pull.merged_at >= data.since)
                    or (pull.closed_at and pull.closed_at >= data.since)
                ):
                    if pull.number not in pulls_seen:
                        pulls_seen.add(pull.number)
                        pulls_dict[pull] = repo
            logger.debug(
                "Count of pulls reduced from %d to %d", len(data.pull_requests.items()), len(pulls_dict.items())
            )

            commits_dict = {
                commit: repo
                for commit, repo in data.commits.items()
                if data.since and commit.commit.author.date > data.since
            }
            logger.debug("Count of commits reduced from %d to %d", len(data.commits.items()), len(commits_dict.items()))
            if data.compare_commit_shas:
                # compare mode: PR and commit sets are already exact — pass through unchanged
                pulls_dict = dict(data.pull_requests)
                commits_dict = dict(data.commits)
                logger.debug("Compare mode: skipping PR/commit timestamp filter.")
            else:
                # filter out merged PRs and commits before the date
                pulls_seen: set[int] = set()
                pulls_dict = {}
                for pull, repo in data.pull_requests.items():
                    if data.since and (
                        (pull.merged_at and pull.merged_at >= data.since)
                        or (pull.closed_at and pull.closed_at >= data.since)
                    ):
                        if pull.number not in pulls_seen:
                            pulls_seen.add(pull.number)
                            pulls_dict[pull] = repo
                logger.debug(
                    "Count of pulls reduced from %d to %d",
                    len(data.pull_requests.items()),
                    len(pulls_dict.items()),
                )

                commits_dict = {
                    commit: repo
                    for commit, repo in data.commits.items()
                    if data.since and commit.commit.author.date > data.since
                }
                logger.debug(
                    "Count of commits reduced from %d to %d",
                    len(data.commits.items()),
                    len(commits_dict.items()),
                )

            md.issues = issues_dict
            md.pull_requests = pulls_dict
84 changes: 75 additions & 9 deletions release_notes_generator/data/miner.py
@@ -19,6 +19,7 @@
"""

import logging
import re
import sys
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed, CancelledError
@@ -30,16 +31,21 @@
from github.Issue import Issue
from github.PullRequest import PullRequest
from github.Repository import Repository
from github.Commit import Commit as GithubCommit

from release_notes_generator.action_inputs import ActionInputs
from release_notes_generator.data.utils.bulk_sub_issue_collector import BulkSubIssueCollector

from release_notes_generator.model.record.issue_record import IssueRecord
from release_notes_generator.model.mined_data import MinedData
from release_notes_generator.model.record.pull_request_record import PullRequestRecord
from release_notes_generator.utils.decorators import safe_call_decorator
from release_notes_generator.utils.github_rate_limiter import GithubRateLimiter
from release_notes_generator.utils.record_utils import get_id, parse_issue_id

_PR_NUMBER_RE = re.compile(r"\(#(\d+)\)|Merge pull request #(\d+)")
_COMPARE_COMMITS_MAX_RESULTS = 10_000

logger = logging.getLogger(__name__)


@@ -66,16 +72,55 @@ def mine_data(self) -> MinedData:

        self._get_issues(data)

        # pulls and commits, and then reduce them by the latest release since time
        pull_requests = list(
            self._safe_call(repo.get_pulls)(state=PullRequestRecord.PR_STATE_CLOSED, base=repo.default_branch)
        )
        data.pull_requests = {pr: data.home_repository for pr in pull_requests}
        if data.since:
            commits = list(self._safe_call(repo.get_commits)(since=data.since))
        if ActionInputs.is_from_tag_name_defined():
            logger.info(
                "Compare mode: using repo.compare('%s', '%s').",
                ActionInputs.get_from_tag_name(),
                ActionInputs.get_tag_name(),
            )
            comparison = self._safe_call(repo.compare)(ActionInputs.get_from_tag_name(), ActionInputs.get_tag_name())
            if comparison is None:
                logger.error(
                    "Compare API returned no result for '%s'...'%s'. Ending!",
                    ActionInputs.get_from_tag_name(),
                    ActionInputs.get_tag_name(),
                )
                sys.exit(1)
            compare_commits: list[GithubCommit] = list(comparison.commits)
            total_commits = getattr(comparison, "total_commits", None)
            if isinstance(total_commits, int) and total_commits > len(compare_commits):
                logger.warning(
                    "Compare mode: retrieved %d commit(s) but comparison reports %d total; results may be truncated.",
                    len(compare_commits),
                    total_commits,
                )
            elif len(compare_commits) >= _COMPARE_COMMITS_MAX_RESULTS:
                logger.warning(
                    "Compare mode: retrieved %d commit(s); comparison ranges over %d commits may be truncated.",
                    len(compare_commits),
                    _COMPARE_COMMITS_MAX_RESULTS,
                )
            data.compare_commit_shas = {c.sha for c in compare_commits}
            data.commits = {c: data.home_repository for c in compare_commits}
            pr_numbers = self._extract_pr_numbers_from_commits(compare_commits)
            pulls: dict[PullRequest, Repository] = {}
            for number in sorted(pr_numbers):
                pr = self._safe_call(repo.get_pull)(number)
                if pr is not None:
                    pulls[pr] = data.home_repository
            data.pull_requests = pulls
            logger.info("Compare mode: found %d commit(s), %d PR(s).", len(compare_commits), len(data.pull_requests))
        else:
            commits = list(self._safe_call(repo.get_commits)())
        data.commits = {c: data.home_repository for c in commits}
            # pulls and commits, then reduce them by the latest release since time
            pull_requests = list(
                self._safe_call(repo.get_pulls)(state=PullRequestRecord.PR_STATE_CLOSED, base=repo.default_branch)
            )
            data.pull_requests = {pr: data.home_repository for pr in pull_requests}
            if data.since:
                commits = list(self._safe_call(repo.get_commits)(since=data.since))
            else:
                commits = list(self._safe_call(repo.get_commits)())
            data.commits = {c: data.home_repository for c in commits}

        logger.info("Initial data mining from GitHub completed.")

@@ -423,6 +468,27 @@ def __get_latest_semantic_release(releases) -> Optional[GitRelease]:

        return rls

    @staticmethod
    def _extract_pr_numbers_from_commits(commits: list[GithubCommit]) -> set[int]:
        """
        Extract unique PR numbers from commit messages.

        Note: Only the first line (subject) of each commit message is scanned to avoid matching
        references in the commit body.

        Parameters:
            commits: Commit objects whose messages are scanned.
        Returns:
            set[int]: Unique PR numbers found across all messages.
        """
        pr_numbers: set[int] = set()
        for commit in commits:
            subject = commit.commit.message.splitlines()[0] if commit.commit.message else ""
            for match in _PR_NUMBER_RE.finditer(subject):
                number_str = match.group(1) or match.group(2)
                pr_numbers.add(int(number_str))
        return pr_numbers

    @staticmethod
    def __filter_duplicated_issues(data: MinedData) -> "MinedData":
        """
1 change: 1 addition & 0 deletions release_notes_generator/model/mined_data.py
@@ -47,6 +47,7 @@ def __init__(self, repository: Repository):
        self.issues: dict[Issue, Repository] = {}
        self.pull_requests: dict[PullRequest, Repository] = {}
        self.commits: dict[Commit, Repository] = {}
        self.compare_commit_shas: set[str] = set()

        self.parents_sub_issues: dict[str, list[str]] = {}  # parent issue id -> list of its sub-issues ids
        # dictionary of fetched cross issues and their pull requests