
HDDS-15051. Incorrect DN replica reporting for unhealthy and QUASI CLOSED stuck containers in Recon. #10101

Draft
devmadhuu wants to merge 2 commits into apache:master from devmadhuu:HDDS-15051


devmadhuu commented Apr 21, 2026

What changes were proposed in this pull request?

This PR fixes incorrect datanode replica details returned by Recon for unhealthy containers. The replicas[] field of the unhealthy-container response is now built from SCM’s current replica set instead of Recon’s truncated replica history. SCM is the source of truth for replica membership; Recon history metadata, such as first-seen and last-seen timestamps, is still attached to each SCM replica when available.
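The membership rule can be illustrated with a minimal sketch: SCM decides which datanodes appear in replicas[], and Recon history only decorates those entries. All types here (`ScmReplica`, `ReconHistory`, `ReplicaResponse`) and the method `buildReplicas` are hypothetical simplifications, not the actual Recon or SCM classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified types; not the actual Recon or SCM classes.
public class ReplicaMerge {

  /** A replica as currently reported by SCM (source of truth for membership). */
  record ScmReplica(String datanodeUuid, String hostname, String state) { }

  /** History Recon kept for a datanode; may be missing or truncated. */
  record ReconHistory(long firstSeenTime, long lastSeenTime) { }

  /** One entry of the unhealthy-container replicas[] response. */
  record ReplicaResponse(String datanodeUuid, String hostname, String state,
                         Long firstSeenTime, Long lastSeenTime) { }

  /**
   * Builds replicas[] from SCM's current replica set, enriching each entry
   * with Recon history when present. Datanodes that appear only in Recon's
   * history are dropped; SCM replicas without history get null timestamps.
   */
  static List<ReplicaResponse> buildReplicas(List<ScmReplica> scmReplicas,
      Map<String, ReconHistory> reconHistoryByUuid) {
    List<ReplicaResponse> result = new ArrayList<>(scmReplicas.size());
    for (ScmReplica r : scmReplicas) {
      ReconHistory h = reconHistoryByUuid.get(r.datanodeUuid());
      result.add(new ReplicaResponse(r.datanodeUuid(), r.hostname(), r.state(),
          h == null ? null : h.firstSeenTime(),
          h == null ? null : h.lastSeenTime()));
    }
    return result;
  }

  public static void main(String[] args) {
    List<ScmReplica> scm = List.of(
        new ScmReplica("dn-4", "ozone-datanode-4", "QUASI_CLOSED"),
        new ScmReplica("dn-3", "ozone-datanode-3", "QUASI_CLOSED"));
    // Recon remembers dn-4, plus a stale dn-9 that SCM no longer reports.
    Map<String, ReconHistory> history = Map.of(
        "dn-4", new ReconHistory(1000L, 2000L),
        "dn-9", new ReconHistory(500L, 600L));
    buildReplicas(scm, history).forEach(System.out::println);
  }
}
```

Iterating over the SCM list (rather than over Recon's history) is what guarantees that stale or truncated history can neither add nor hide replicas.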

The change adds a new StorageContainerServiceProvider.getContainerReplicas(...) API, updates ContainerEndpoint to use it for unhealthy containers, and rewrites the affected Recon endpoint tests to validate SCM-backed behavior.
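The endpoint-side flow can be sketched as follows. `ScmReplicaSource` is a hypothetical stand-in for the new `StorageContainerServiceProvider.getContainerReplicas(...)` call, whose exact signature is not given in this description, and replicas are reduced to hostname strings for brevity.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class UnhealthyReplicaLookup {

  /** Hypothetical stand-in for the SCM-facing replica lookup. */
  interface ScmReplicaSource {
    List<String> getContainerReplicas(long containerId) throws IOException;
  }

  /** Hypothetical endpoint helper: SCM alone decides which replicas are listed. */
  static List<String> replicasForUnhealthyContainer(ScmReplicaSource scm,
      long containerId) throws IOException {
    List<String> current = scm.getContainerReplicas(containerId);
    // An empty (or null) SCM result yields an empty replicas[] instead of
    // falling back to Recon's replica history.
    return current == null ? Collections.emptyList() : current;
  }

  public static void main(String[] args) throws IOException {
    Map<Long, List<String>> scmState = Map.of(
        1L, List.of("ozone-datanode-1", "ozone-datanode-2",
            "ozone-datanode-3", "ozone-datanode-4"));
    // In-memory fake: unknown container IDs map to an empty replica set.
    ScmReplicaSource fake = id -> scmState.getOrDefault(id, List.of());
    System.out.println(replicasForUnhealthyContainer(fake, 1L)); // four hostnames
    System.out.println(replicasForUnhealthyContainer(fake, 2L)); // prints []
  }
}
```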

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-15051

How was this patch tested?

  • Updated TestContainerEndpoint to validate the new unhealthy-container behavior against SCM-backed replica responses instead of Recon-local replica history.
  • Added/adjusted assertions to cover:
    • over-replicated containers returning all current SCM replicas
    • under-replicated and mis-replicated containers returning SCM replica state/details
    • replica-mismatch containers preserving checksum validation with SCM-backed replicas
    • missing containers in /containers/unhealthy returning empty replicas[] when SCM has no current replicas
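For the replica-mismatch case, checksum validation has to keep working when the replicas come from SCM. A minimal sketch of such a check, assuming for illustration only that a mismatch means the replicas report more than one distinct data checksum; `Replica` and `hasChecksumMismatch` are hypothetical names, not Recon classes.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ReplicaMismatchCheck {

  /** Hypothetical replica view: datanode plus the data checksum it reports. */
  record Replica(String datanode, long dataChecksum) { }

  /** Assumed rule: replicas disagree if they report more than one distinct checksum. */
  static boolean hasChecksumMismatch(List<Replica> scmReplicas) {
    Set<Long> distinct = scmReplicas.stream()
        .map(Replica::dataChecksum)
        .collect(Collectors.toSet());
    return distinct.size() > 1;
  }

  public static void main(String[] args) {
    List<Replica> agreeing = List.of(
        new Replica("dn-1", 42L), new Replica("dn-2", 42L), new Replica("dn-3", 42L));
    List<Replica> mismatched = List.of(
        new Replica("dn-1", 42L), new Replica("dn-2", 42L), new Replica("dn-3", 7L));
    System.out.println(hasChecksumMismatch(agreeing));   // false
    System.out.println(hasChecksumMismatch(mismatched)); // true
  }
}
```

Because the check only looks at whichever replica list it is given, switching the input from Recon history to SCM's current set does not change the rule itself.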
```
bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: d09949f4-d6f1-43c6-8fed-c0f028c3f689
Write PipelineId: 8b888300-6957-4616-ac9b-22d00da202e6
Write Pipeline State: CLOSED
Container State: QUASI_CLOSED
SequenceId: 236
Datanodes: [bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default]
Replicas: [State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: 734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default]
```
```
bash-5.1$ ozone admin container info 1
Container id: 1
Pipeline id: fb776384-9508-4a2c-90c4-a2b16b85ec5e
Write PipelineId: 8b888300-6957-4616-ac9b-22d00da202e6
Write Pipeline State: CLOSED
Container State: QUASI_CLOSED
SequenceId: 236
Datanodes: [bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default,
915cc957-eb1e-498d-982e-298cba67a332/ozone-datanode-1.ozone_default,
d0230185-6167-4e6a-97a6-092603a335d9/ozone-datanode-2.ozone_default]
Replicas: [State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: 734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: 915cc957-eb1e-498d-982e-298cba67a332/ozone-datanode-1.ozone_default,
State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: d0230185-6167-4e6a-97a6-092603a335d9/ozone-datanode-2.ozone_default]
```
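In the second `ozone admin container info 1` output, SCM lists four QUASI_CLOSED replicas whose `Origin` fields name two different datanodes. When comparing that output with Recon's replicas[] by hand, a small parser can group replica hosts by origin. This is a hypothetical helper (`ReplicaInfoParser`, `hostsByOrigin`) that assumes the `Replicas:` text format exactly as printed above; CLI text output should not be treated as a stable interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplicaInfoParser {

  // Matches "Origin: <uuid>; Location: <uuid>/<hostname>" in one replica entry;
  // the hostname ends at the ',' or ']' that separates or closes the entries.
  private static final Pattern ENTRY = Pattern.compile(
      "Origin: ([0-9a-f-]+); Location: [0-9a-f-]+/([^,\\]\\s]+)");

  // The Replicas: section of the second output above, copied verbatim.
  static final String SAMPLE_REPLICAS = String.join(",\n",
      "Replicas: [State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: bc882bae-79f8-4aa4-bd53-8b4e4c082128/ozone-datanode-4.ozone_default",
      "State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: bc882bae-79f8-4aa4-bd53-8b4e4c082128; Location: 734a2fae-ade6-4be1-a82a-dcf2741f62af/ozone-datanode-3.ozone_default",
      "State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: 915cc957-eb1e-498d-982e-298cba67a332/ozone-datanode-1.ozone_default",
      "State: QUASI_CLOSED; ReplicaIndex: 0; SequenceId: 236; Origin: d0230185-6167-4e6a-97a6-092603a335d9; Location: d0230185-6167-4e6a-97a6-092603a335d9/ozone-datanode-2.ozone_default]");

  /** Groups replica hostnames by origin datanode UUID, preserving output order. */
  static Map<String, List<String>> hostsByOrigin(String cliOutput) {
    Map<String, List<String>> byOrigin = new LinkedHashMap<>();
    Matcher m = ENTRY.matcher(cliOutput);
    while (m.find()) {
      byOrigin.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
    }
    return byOrigin;
  }

  public static void main(String[] args) {
    hostsByOrigin(SAMPLE_REPLICAS).forEach((origin, hosts) ->
        System.out.println(origin + " -> " + hosts));
  }
}
```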

Devesh Kumar Singh added 2 commits April 18, 2026 17:48