gpstate: fix false "Acting as Primary" report for hot standby mirrors #1743
Open
jangjang0401 wants to merge 1 commit into
With hot_standby=on, mirrors accept SQL connections and return PQPING_OK from pg_isready. The previous code unconditionally mapped PQPING_OK + role=mirror to "Acting as Primary", causing gpstate -s to show a spurious warning on every mirror.

Fix this in clsSystemState.__buildGpStateData() by cross-checking pg_stat_replication on the primary: if the mirror has an active WAL receiver connection in streaming or catchup state, it is a legitimate hot standby and the status is corrected to "Up". Only fall through to "Acting as Primary" when no such replication connection exists, meaning the segment truly promoted itself to primary.

This approach reuses the existing primary connection already established by _add_replication_info(), so no additional database connections are required. _add_replication_info() is updated to return the raw replication state string to make this information available to the caller.
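A minimal sketch of the corrected decision, assuming the caller already has the mirror's ping result and the raw pg_stat_replication state string returned by _add_replication_info() (None when the primary reports no connection for this mirror). The function name derive_mirror_status and the "Down" fallback are hypothetical and not taken from the patch; only the status strings "Up" and "Acting as Primary" and the streaming/catchup rule come from the description above.

```python
from typing import Optional

# Ping results as described in this PR: PQPING_OK is 0, and the
# Greenplum-specific PQPING_MIRROR_READY is 64.
PQPING_OK = 0
PQPING_MIRROR_READY = 64

# pg_stat_replication.state values treated as a live hot standby.
ACTIVE_REPLICATION_STATES = frozenset({"streaming", "catchup"})


def derive_mirror_status(ping_result: int,
                         replication_state: Optional[str]) -> str:
    """Hypothetical helper: status string for a segment whose role is mirror.

    replication_state is the raw state the primary reports for this mirror
    in pg_stat_replication, or None if the primary has no such row.
    """
    if ping_result == PQPING_MIRROR_READY:
        return "Up"
    if ping_result == PQPING_OK:
        # With hot_standby=on a healthy mirror also answers PQPING_OK, so
        # only report a promotion when the primary sees no active WAL stream.
        if replication_state in ACTIVE_REPLICATION_STATES:
            return "Up"
        return "Acting as Primary"
    # Other ping results are outside the scope of this sketch.
    return "Down"
```

Under hot standby, derive_mirror_status(PQPING_OK, "streaming") yields "Up", while a mirror that answers PQPING_OK but has no replication row on its primary still yields "Acting as Primary".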
What does this PR do?
Fixes a false positive in gpstate -s where every mirror segment is reported as "Acting as Primary" when hot_standby=on is enabled.

Root cause:
- With hot_standby=on, mirrors accept read-only SQL connections and pg_isready returns PQPING_OK (0) instead of PQPING_MIRROR_READY (64).
- gpgetstatususingtransition.py assumed PQPING_OK + role=mirror could only mean "the mirror was promoted to primary", which is no longer true under hot standby.
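The pre-fix behavior can be sketched as follows. This is a minimal illustration, not the actual code in gpgetstatususingtransition.py: the function name old_mirror_status and the "Down" fallback are hypothetical, while the numeric values 0 and 64 are the ones given above.

```python
PQPING_OK = 0             # value stated in this PR
PQPING_MIRROR_READY = 64  # value stated in this PR


def old_mirror_status(ping_result: int) -> str:
    """Hypothetical pre-fix mapping for a segment whose role is mirror."""
    if ping_result == PQPING_OK:
        # Old assumption: a mirror that accepts connections must have
        # been promoted to primary.
        return "Acting as Primary"
    if ping_result == PQPING_MIRROR_READY:
        return "Up"
    return "Down"  # placeholder for every other ping result in this sketch


# Under hot_standby=on a healthy mirror answers PQPING_OK, so it is misreported.
print(old_mirror_status(PQPING_OK))  # Acting as Primary
```

Because the mapping looks only at the ping result, nothing distinguishes a healthy hot standby from a promoted segment.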
Fix:
- In clsSystemState.__buildGpStateData(), cross-check the mirror's status against pg_stat_replication (already queried on the primary by _add_replication_info()).
- If the mirror has an active replication connection (streaming or catchup), it is a legitimate hot standby → corrected to "Up".
- If no such connection exists, the segment has truly promoted itself → "Acting as Primary" is preserved.

Type of Change
Breaking Changes
None. The fix only changes how the status string is derived in
gpstate; cluster configuration, replication behavior, and segment behavior are unchanged.
Test Plan
Tested on a 5-VM cluster (1 coordinator + 1 standby coordinator + 3
segment hosts, 6 primary / 6 mirror segments) with hot_standby=on.

Before the fix:
- Segment status = Acting as Primary
- gpstate -s produced spurious warnings on every mirror
- gp_segment_configuration, replication state, and gpstate -m all showed the cluster as healthy; only the gpstate -s output was wrong

After the fix:
- Segment status = Up

Impact
Performance:
No additional database connections. The fix consumes data that
_add_replication_info() already collects from the primary; only the return value of that function is changed.
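A sketch of that return-value change, under stated assumptions: pg_stat_replication rows are modeled as dicts already fetched over the existing primary connection, the mirror's row is identified by application_name, and both the default name gp_walreceiver and the replication_state output key are assumptions rather than details confirmed by this PR. The function add_replication_info is a hypothetical stand-in for _add_replication_info().

```python
from typing import Dict, Iterable, Optional

# Assumption: the primary's WAL sender row for its mirror carries this
# application_name. Adjust if mirrors are identified differently.
MIRROR_APPLICATION_NAME = "gp_walreceiver"


def add_replication_info(rows: Iterable[Dict[str, str]],
                         status: Dict[str, str],
                         application_name: str = MIRROR_APPLICATION_NAME
                         ) -> Optional[str]:
    """Hypothetical stand-in for _add_replication_info().

    rows are pg_stat_replication rows already fetched from the primary, so
    no new connection is opened here. Display data is still recorded in
    status; the raw state string is now also returned to the caller.
    """
    for row in rows:
        if row.get("application_name") == application_name:
            state = row.get("state")
            status["replication_state"] = state or "Unknown"
            return state
    # No WAL sender row for the mirror on this primary.
    status["replication_state"] = "No replication connection"
    return None
```

For example, passing a single gp_walreceiver row with state "streaming" returns "streaming", which the caller in __buildGpStateData() can compare against the streaming/catchup set without querying the mirror itself.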
User-facing changes:
gpstate -s no longer reports false "Acting as Primary" warnings for healthy hot-standby mirrors. Genuine promotion is still detected and
reported as before.
Dependencies:
None.
Checklist
Additional Context
An alternative approach considered was to call
pg_is_in_recovery() directly on each mirror inside
_get_segment_status() in gpgetstatususingtransition.py. That approach was rejected because:
- It would require an additional connection to every mirror on each gpstate invocation, which is undesirable in unhealthy-cluster scenarios where gpstate is run most frequently.
- … primary's view of its replication connections.

The pg_stat_replication approach reuses an existing connection and is authoritative from the primary's perspective.