Skip to content

<feature>[snapshot]: idempotent commit-delete via status=Deleting latch#4060

Open
zstack-robot-2 wants to merge 1 commit into
zsv_5.1.0from
sync/tao.gan/fix/ZSV-10538-idempotent-delete
Open

<feature>[snapshot]: idempotent commit-delete via status=Deleting latch#4060
zstack-robot-2 wants to merge 1 commit into
zsv_5.1.0from
sync/tao.gan/fix/ZSV-10538-idempotent-delete

Conversation

@zstack-robot-2
Copy link
Copy Markdown
Collaborator

Resolves: ZSV-10538

Change-Id: Ib3589ce5551984fa25b1562c745474437f7d1669

sync from gitlab !9958

Resolves: ZSV-10538

Change-Id: Ib3589ce5551984fa25b1562c745474437f7d1669
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 22, 2026

Review Change Stack

总览

此 PR 实现 ZSV-10538,将快照删除改造为幂等可中断恢复模式。通过添加 deletingSince 时间戳,扩展删除状态操作白名单,在流程前置锁定状态,并在中断点补充恢复逻辑和告警机制。

变更

快照删除幂等性与中断恢复

层级 / 文件 总结
数据模型与字段扩展
conf/db/upgrade/V4.8.0.8__schema.sql, header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotAO.java, header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotAO_.java
添加 deletingSince 时间戳列、JPA 字段和 getter/setter 方法,为幂等判断和恢复识别提供持久化依据。
错误码与操作白名单
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotErrors.java, storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java
新增 DELETING_IN_PROGRESS、DELETING_RETRY_HINT、OPERATION_REJECTED_DURING_DELETING 错误码;扩展 VolumeSnapshotStatus.Deleting 状态下的允许操作白名单以支持删除、恢复和只读查询,其他写操作返回专用拒绝错误码。
单快照删除幂等实现
storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java
deleteSingleFlows 前置 NoRollbackFlow 将状态锁定为 Deleting 并记录时间;stepDelete 每轮重新读取状态并检查根节点是否已删除;handle(APIDeleteVolumeSnapshotMsg) 基于持久化状态判断是首次还是恢复调用,支持从中断点继续。
快照组删除与告警
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java
在删除快照组前检查成员中是否存在删除中的快照,若有则输出告警提示此次删除为恢复操作。
运维对账与启动告警
storage/src/main/java/org/zstack/storage/snapshot/SnapshotReconcileOnStart.java
新增 ManagementNodeReadyExtensionPoint 实现,在管理节点启动时扫描所有 Deleting 状态快照并输出告警,包含 uuid、name、volumeUuid、deletingSince 等定位信息。
国际化文案
conf/i18n/messages_en_US.properties, conf/i18n/messages_zh_CN.properties
添加删除状态下拒绝操作的英文和中文提示消息,包含快照标识、操作名称和允许操作列表占位符。

🎯 3 (中等复杂度) | ⏱️ ~25 分钟

🐰 快照再也不怕中断卡住,
前置锁定态与时间戳
幂等恢复讲故事,
告警运维心不迟!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 5.88% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed PR标题清晰地概括了主要变更:通过status=Deleting闭锁实现幂等提交删除,与文件摘要中的核心实现一致。
Description check ✅ Passed PR描述关联到ZSV-10538问题和gitlab同步信息,与变更内容相关,虽然细节不多但足够说明来源和意图。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/tao.gan/fix/ZSV-10538-idempotent-delete

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/snapshot/SnapshotReconcileOnStart.java (1)

33-49: 建议补充可监控指标而不仅是日志告警

当前实现可用,但建议同步上报一个启动对账指标(如 deleting 快照数量、最早 deletingSince),便于监控系统做阈值告警,避免仅靠日志被动发现。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@storage/src/main/java/org/zstack/storage/snapshot/SnapshotReconcileOnStart.java`
around lines 33 - 49, The code in SnapshotReconcileOnStart currently logs
snapshots stuck in Deleting (variable stuck) but doesn't emit metrics; modify
SnapshotReconcileOnStart to report at least two monitoring metrics when stuck is
non-empty: a gauge for the count of deleting snapshots (stuck.size()) and a
timestamp or age metric representing the earliest deletingSince among the
VolumeSnapshotVOs, in addition to the existing logger.warn calls; use the
project metric/reporting API (e.g., your Metrics or MetricReporter helper) to
send a numeric gauge named like snapshot.deleting.count and
snapshot.deleting.oldest_timestamp (or snapshot.deleting.max_age) so alerting
systems can pick it up, ensuring the metric emission occurs in the same
conditional block that handles stuck and references stuck,
VolumeSnapshotVO.getDeletingSince(), and SnapshotReconcileOnStart.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@conf/i18n/messages_zh_CN.properties`:
- Line 4834: Update the translation value for the key starting with
"snapshot[uuid\:%s,\ name\:%s]\ is\ in\ Deleting\ status,\ operation[%s]\ is\
rejected;\ allowed\ messages%s" so the Chinese text inserts a separator before
the placeholder {3}; change "当前状态允许的消息列表{3}" to "当前状态允许的消息列表:{3}" to avoid the
placeholder being concatenated with the preceding text.

---

Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/snapshot/SnapshotReconcileOnStart.java`:
- Around line 33-49: The code in SnapshotReconcileOnStart currently logs
snapshots stuck in Deleting (variable stuck) but doesn't emit metrics; modify
SnapshotReconcileOnStart to report at least two monitoring metrics when stuck is
non-empty: a gauge for the count of deleting snapshots (stuck.size()) and a
timestamp or age metric representing the earliest deletingSince among the
VolumeSnapshotVOs, in addition to the existing logger.warn calls; use the
project metric/reporting API (e.g., your Metrics or MetricReporter helper) to
send a numeric gauge named like snapshot.deleting.count and
snapshot.deleting.oldest_timestamp (or snapshot.deleting.max_age) so alerting
systems can pick it up, ensuring the metric emission occurs in the same
conditional block that handles stuck and references stuck,
VolumeSnapshotVO.getDeletingSince(), and SnapshotReconcileOnStart.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: f7e70bd7-0052-446f-9751-45f357391b21

📥 Commits

Reviewing files that changed from the base of the PR and between fe6e5e1 and b5d460b.

⛔ Files ignored due to path filters (1)
  • conf/springConfigXml/volumeSnapshot.xml is excluded by !**/*.xml
📒 Files selected for processing (9)
  • conf/db/upgrade/V4.8.0.8__schema.sql
  • conf/i18n/messages_en_US.properties
  • conf/i18n/messages_zh_CN.properties
  • header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotAO.java
  • header/src/main/java/org/zstack/header/storage/snapshot/VolumeSnapshotAO_.java
  • storage/src/main/java/org/zstack/storage/snapshot/SnapshotReconcileOnStart.java
  • storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotErrors.java
  • storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java
  • storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.java

LessThan = 小于
LessThanOrEqualTo = 小于等于
resource[%s]\ doesn't\ support\ zwatch\ return\ with\ clause = 资源[{0}]不支持ZWatch Return WITH子句
snapshot[uuid\:%s,\ name\:%s]\ is\ in\ Deleting\ status,\ operation[%s]\ is\ rejected;\ allowed\ messages%s = 快照[uuid:{0}, name:{1}]处于Deleting状态,拒绝操作[{2}];当前状态允许的消息列表{3}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

建议在“消息列表”前补充分隔符,避免占位符直连文本。

当前中文文案为“当前状态允许的消息列表{3}”,建议改为“当前状态允许的消息列表:{3}”,提升用户阅读体验。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@conf/i18n/messages_zh_CN.properties` at line 4834, Update the translation
value for the key starting with "snapshot[uuid\:%s,\ name\:%s]\ is\ in\
Deleting\ status,\ operation[%s]\ is\ rejected;\ allowed\ messages%s" so the
Chinese text inserts a separator before the placeholder {3}; change
"当前状态允许的消息列表{3}" to "当前状态允许的消息列表:{3}" to avoid the placeholder being
concatenated with the preceding text.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants