
HDDS-14580. Remove OM Prepare for Upgrade Code for ZDU #9723

Open
errose28 wants to merge 14 commits into apache:HDDS-14496-zdu from errose28:HDDS-14580-remove-prepare


Contributor

@errose28 errose28 commented Feb 6, 2026

What changes were proposed in this pull request?

The prepare for upgrade step will not work in the context of ZDU. The design doc outlines a plan to upgrade the OMs without this command, so we can remove it from the ZDU feature branch. Since this is going in a branch, it is ok that we have not yet implemented the OM versioning required for guaranteed consistency without prepare for upgrade.

In this change, prepare for upgrade handling is removed from the server in a backwards compatible way. If the server receives a prepare request, it will return success but indicate that it is not supported in this server version. The CLI has been hidden on the client side, but left intact in case the new client is used to upgrade an old server.
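A minimal sketch of the "succeed but report unsupported" behavior described above. All class, field, and method names here are hypothetical and are not taken from the Ozone code base; the real request and response types are protobuf messages that this sketch does not attempt to reproduce.

```java
/** Hypothetical illustration only; not the OM request handler from this PR. */
final class PrepareCompatSketch {

  /** Stand-in for the response an OM sends back for a prepare request. */
  static final class PrepareResponse {
    final boolean success;    // the RPC itself did not fail
    final boolean supported;  // whether the server actually performed a prepare
    final String message;

    PrepareResponse(boolean success, boolean supported, String message) {
      this.success = success;
      this.supported = supported;
      this.message = message;
    }
  }

  /** Server side: never fail the request, but flag that nothing was done. */
  static PrepareResponse handlePrepare() {
    return new PrepareResponse(true, false,
        "Prepare for upgrade is not supported in this server version.");
  }

  /** Client side: treat "success but unsupported" as a skip, not an error. */
  static String describe(PrepareResponse response) {
    if (!response.success) {
      return "FAILED: " + response.message;
    }
    return response.supported ? "PREPARED" : "SKIPPED: " + response.message;
  }

  public static void main(String[] args) {
    System.out.println(describe(handlePrepare()));
  }
}
```

Returning success keeps an older client or upgrade script from aborting, while the separate flag lets a newer client tell the user that the step was skipped.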

After v100 for ZDU is added, we can add an upgrade action which clears the DB key and prepare marker file that may be left behind in the old version.

In the next major Ozone release following this change (3.0?), we can remove support for the CLI and OM API entirely.

What is the link to the Apache JIRA?

HDDS-14580

How was this patch tested?

  • Passing run on my fork

  • New tests for the expected backwards compatible prepare functionality added in TestOzoneShellHA and TestOzoneManagerRequestHandler.java

    • Some hsync/open file tests had existing issues causing them to fail when the new prepare test was added. These have been marked unhealthy and the resolution is tracked in HDDS-14686.
  • Existing tests should pass, indicating that nothing else depends on prepare

  • There was an existing bug in PrepareSubCommand where the client object used to check each OM for status had no port explicitly configured and fell back to the default. This usually worked, but had never been tested with MiniOzoneCluster, where the ports are randomly assigned. The command had to be updated to use the configured OM port because of the newly added MiniOzoneCluster test.

  • TestOMUpgradeFinalization was removed because it depended on prepare to force a follower to install a snapshot. We can add a new test that covers this functionality in HDDS-14687.
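The PrepareSubCommand port issue mentioned in the list above can be illustrated with a small sketch. It assumes the OM address is available as a `host` or `host:port` string; the class name, method name, and the default port constant are assumptions for illustration and do not reflect the actual fix in this PR. IPv6 literals are not handled.

```java
import java.net.InetSocketAddress;

/** Hypothetical illustration only; not the code changed in PrepareSubCommand. */
final class OmAddressSketch {

  /** Assumed default OM RPC port, used only when none is configured. */
  static final int ASSUMED_DEFAULT_PORT = 9862;

  /**
   * Builds the address used to query a single OM. An explicitly configured
   * port wins; the default is only a fallback. Always using the default is
   * what breaks on clusters with randomly assigned ports.
   */
  static InetSocketAddress resolve(String configured) {
    String trimmed = configured.trim();
    int colon = trimmed.lastIndexOf(':');
    if (colon < 0) {
      return InetSocketAddress.createUnresolved(trimmed, ASSUMED_DEFAULT_PORT);
    }
    String host = trimmed.substring(0, colon);
    int port = Integer.parseInt(trimmed.substring(colon + 1));
    return InetSocketAddress.createUnresolved(host, port);
  }

  public static void main(String[] args) {
    System.out.println(resolve("localhost:15001"));
    System.out.println(resolve("om1.example.com"));
  }
}
```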

@errose28 errose28 added the zdu Pull requests for Zero Downtime Upgrade (ZDU) https://issues.apache.org/jira/browse/HDDS-14496 label Feb 6, 2026
CMD_AUDIT_ACTION_MAP.put(Type.SetVolumeProperty, OMAction.SET_OWNER);
CMD_AUDIT_ACTION_MAP.put(Type.SetBucketProperty, OMAction.UPDATE_BUCKET);
CMD_AUDIT_ACTION_MAP.put(Type.Prepare, OMAction.UPGRADE_PREPARE);
CMD_AUDIT_ACTION_MAP.put(Type.CancelPrepare, OMAction.UPGRADE_CANCEL);
Contributor Author


@octachoron I did actually remove the audit logging for prepare here since it became a no-op write request. Although, as you suggested, it would probably still be good to leave the admin check on the server side for completeness.

@errose28 errose28 marked this pull request as ready for review February 19, 2026 16:42
@errose28 errose28 marked this pull request as draft February 19, 2026 16:44
@errose28 errose28 marked this pull request as ready for review February 23, 2026 02:37
Contributor

@sodonnel sodonnel left a comment


LGTM - minimal new code is added and mostly code is removed to remove the prepare feature.

@errose28
Contributor Author

Even though the operations are no-ops, I restored the admin checks for consistency with the older version. I did this in a manner similar to what is used in OMAdminProtocolServerSideImpl, where we do not use acl.enabled as a prerequisite for the admin checks, since ACLs are for the namespace and are independent of admin checks.
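As a rough sketch of that idea (hypothetical names throughout, not the code in the latest commit): the admin check runs unconditionally, and the ACL-enabled flag is deliberately not consulted before it.

```java
import java.util.Set;

/** Hypothetical illustration only; not OMAdminProtocolServerSideImpl. */
final class AdminCheckSketch {

  /** Rejects any caller that is not in the configured admin set. */
  static void checkAdmin(String user, Set<String> admins) {
    if (!admins.contains(user)) {
      throw new SecurityException("User " + user + " is not an admin.");
    }
  }

  /**
   * No-op prepare handler. The aclEnabled flag is accepted only to show that
   * it is intentionally ignored: namespace ACLs and admin checks are
   * independent, so the admin check runs either way.
   */
  static String handleNoOpPrepare(String user, Set<String> admins,
      boolean aclEnabled) {
    checkAdmin(user, admins);
    return "Prepare is not supported in this server version.";
  }

  public static void main(String[] args) {
    Set<String> admins = Set.of("alice");
    System.out.println(handleNoOpPrepare("alice", admins, false));
    try {
      handleNoOpPrepare("bob", admins, false);
    } catch (SecurityException e) {
      System.out.println("Rejected: " + e.getMessage());
    }
  }
}
```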

@sodonnel can you check the latest commit?


Labels

zdu Pull requests for Zero Downtime Upgrade (ZDU) https://issues.apache.org/jira/browse/HDDS-14496


3 participants