HDDS-14989. Delay follower SCM DN server start until Ratis log catch-up#10059
HDDS-14989. Delay follower SCM DN server start until Ratis log catch-up#10059xichen01 wants to merge 1 commit intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Thanks @xichen01 for the optimization. This optimization would allow to reduce the possibility to NO_REPLICA_FOUND when doing startup (e.g. we are doing rolling SCM restarts).
However, I think this is a fundamental design issue of SCM where some information need to be replicated, usually user-facing ones (e.g. creating Container Info and updating Container State), but some information is only updated locally from the heartbeat and updated in best-effort (e.g. container replica, the DN -> List mappings between node and containers). This hybrid combinations of eventually consistent heartbeat mechanism and strongly consistent Ratis potentially can introduce similar subtle bugs. For example
- The container replica might be recorded under the NodeStateManager, but not in ContainerStateManager (e.g. due NotLeaderException).
- If container 1000 has been created in the leader but follower has not seen it, when the datanode sends a report on the replica for container 1000, the follower will reject it with ContainerNotFoundException. If the follower becomes a leader immediately after, container 1000 will not record this replica.
In the future, we need to revisit the SCM heartbeat and Ratis design for container management.
This patch should improve the situation. Nevertheless, after the initial assertion before starting the DN RPC server, there is no guarantee that this issue will not happen under network partition between SCMs (e.g. SCM leader and SCM follower is partitioned, but the SCM follower can still have heartbeats with the DNs.
We might need to discuss for a more long-term solution.
| for (RaftProtos.CommitInfoProto info : division.getCommitInfos()) { | ||
| if (info.getServer().getId().equals(leaderId.toByteString())) { | ||
| long leaderCommit = info.getCommitIndex(); | ||
| boolean caughtUp = lastAppliedIndex >= leaderCommit; |
There was a problem hiding this comment.
Let's apply the fix where we need to take into account the metadata entry.
What changes were proposed in this pull request?
Context
Fixed a bug where reading the key could result in a NO_REPLICA_FOUND error during SCM restart and leader transfer.
For details:
https://issues.apache.org/jira/browse/HDDS-14989
Or can see
testFollowerCatchupAfterContainerClosefor detailed reproduction wayFix
updateContainerState.What is the link to the Apache JIRA'
https://issues.apache.org/jira/browse/HDDS-14989
How was this patch tested?
new test