Skip to content

feat: GDB Failover#625

Open
karenc-bq wants to merge 8 commits intodev/v3from
feat/gdb-failover
Open

feat: GDB Failover#625
karenc-bq wants to merge 8 commits intodev/v3from
feat/gdb-failover

Conversation

@karenc-bq
Copy link
Copy Markdown
Contributor

Summary

Description

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@karenc-bq karenc-bq requested a review from a team as a code owner March 13, 2026 18:09
@karenc-bq karenc-bq changed the base branch from main to refactor/monitor-service March 13, 2026 18:10
@karenc-bq karenc-bq force-pushed the feat/gdb-failover branch 3 times, most recently from cc4c6aa to bcc243b Compare March 15, 2026 08:39
@karenc-bq karenc-bq force-pushed the refactor/monitor-service branch from 81930fe to 00d8176 Compare March 15, 2026 09:18
@karenc-bq karenc-bq force-pushed the feat/gdb-failover branch 2 times, most recently from 5b7069a to 14c1bb3 Compare March 15, 2026 12:52
}

forceMonitoringRefresh(shouldVerifyWriter: boolean, timeoutMs: number): Promise<HostInfo[]> {
throw new AwsWrapperError("ConnectionStringHostListProvider does not support forceMonitoringRefresh.");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we place this in messages?

expect(returnConn).toBe(mockInitialClientWrapper);
});

it("test_get_verified_connection_cluster_inet_address_none", async () => {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be good to add some tests for the new functionality of stale dns helper?

1320000
);
});
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about adding some tests for when the failover would not be successful? ie. out of region or no allowed hosts?

}

// Check writer region to determine failover mode
const writerRegion = this.rdsHelper.getRdsRegion(writerCandidate.host);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this return null? Would we have issues if it does?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

equalsIgnoreCase has null checks and would return false if writerRegion is null

@karenc-bq karenc-bq changed the base branch from refactor/monitor-service to dev/v3 March 26, 2026 17:32

const value = cache.get(key);
if (value === null || value === undefined) {
if (!value) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering about this change? Example: if the values in the cache were booleans then False would become null


try {
const updatedHostList: HostInfo[] = await hostListProvider.forceMonitoringRefresh(shouldVerifyWriter, timeoutMs);
const updatedHostList: HostInfo[] = await (hostListProvider as any).forceMonitoringRefresh(shouldVerifyWriter, timeoutMs);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const updatedHostList: HostInfo[] = await (hostListProvider as any).forceMonitoringRefresh(shouldVerifyWriter, timeoutMs);
const updatedHostList: HostInfo[] = await (hostListProvider as DynamicHostListProvider).forceMonitoringRefresh(shouldVerifyWriter, timeoutMs);

Why are we casting to any instead of DynamicHostListProvider?

"GlobalDbFailoverPlugin.isHomeRegion": "Is home region: %s",
"GlobalDbFailoverPlugin.currentFailoverMode": "Current Global DB failover mode: %s",
"GlobalDbFailoverPlugin.failoverElapsed": "Global DB failover elapsed time: %s ms",
"GlobalDbFailoverPlugin.candidateNull": "Candidate host is null for role: %s",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we could find a better description and name for this error. It's not really capturing the cause which is that we couldn't find a host given the strategy and allowed host list.

Comment on lines +244 to +245
if (!(ex instanceof FailoverFailedError)) {
this.failoverWriterFailedCounter.inc();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing could we maybe add a comment explaining that we only want to increment the counter if we haven't already marked it and then thrown a failover failed error? Also what is the benefit to marking it above instead of just always incrementing it here? Is it about timing?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants