<fix>[storage]: symmetric snapshot group disband on chain delete#4043
<fix>[storage]: symmetric snapshot group disband on chain delete#4043ZStack-Robot wants to merge 2 commits into
Conversation
ungroupAfterDeleted now mirrors ungroupAfterDeleteSingleSnapshot: regardless of root/data volume type, a snapshot group VO is only disbanded after ALL of its refs reach snapshotDeleted=true. Previously the root-volume path immediately removed the group VO once the root chain finished, leaving data-volume refs as orphans pointing to a non-existent group. After this change the group lives until every ref is marked deleted, which is the symmetric behavior with the single-snapshot path and the precondition the integrity check (landed in a separate commit) relies on. Resolves: ZSV-10538 Change-Id: I77707a78706271697463676a756c617a796f6c61
Once symmetric disband is in place, an incomplete snapshot group (some
refs deleted, some still alive on the VM) is sticky and must not be
silently ignored by subsequent operations on the same VM.
Block on the following entries when the VM has any incomplete group
(the incomplete group itself is excluded from the check, so users
always have a path to clear the debt):
- APIDeleteVolumeSnapshotGroupMsg (other groups): blocked unless
a new boolean force=true is passed. force is added to the API
as the operator bypass; other entries deliberately do NOT expose
force.
- APICreateVolumeSnapshotGroupMsg: blocked.
- APIAttachDataVolumeToVmMsg / APIDetachDataVolumeFromVmMsg: blocked.
APIDestroyVmInstanceMsg is intentionally NOT blocked: the new
VolumeSnapshotGroupCascadeExtension keys on VmInstanceVO and
force-cleans every group VO (refs -> archived metadata -> host
backup files -> group VO) when the VM is destroyed, so VM teardown
is never wedged by leftover incomplete groups.
Single-snapshot APIs are also exempt by design.
Resolves: ZSV-10538
Change-Id: I717363667067726a796467787568756d6f7a706e
WalkthroughPR 在卷快照组删除流程中增加了强制删除参数和完整性保护。APIDeleteVolumeSnapshotGroupMsg 定义了 force 参数,VolumeSnapshotGroupChecker 提供快照组完整性检查,删除前置校验和 API 拦截阻止不安全删除,级联扩展处理 VM 删除时的清理,快照删除后重写清理逻辑实现对称处理。 Changes卷快照组删除保护与清理
🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java (1)
46-68: ⚡ Quick win建议把逐组两次
count()的查询改为批量聚合,降低热点路径开销当前实现对每个快照组都发起两次计数查询,在 VM 组数量较多时会放大 DB 往返次数。建议一次性拉取 refs 后聚合(或使用分组统计),保持语义不变并减少拦截链路延迟。
可参考的重构方向
- List<String> incomplete = new ArrayList<>(); - for (Object o : groupUuids) { - String guuid = o.toString(); - if (guuid.equals(excludeGroupUuid)) { - continue; - } - long deletedRefs = Q.New(VolumeSnapshotGroupRefVO.class) - .eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, guuid) - .eq(VolumeSnapshotGroupRefVO_.snapshotDeleted, true).count(); - if (deletedRefs == 0) { - continue; - } - long totalRefs = Q.New(VolumeSnapshotGroupRefVO.class) - .eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, guuid).count(); - if (deletedRefs < totalRefs) { - incomplete.add(guuid); - } - } - return incomplete; + List<Tuple> refs = Q.New(VolumeSnapshotGroupRefVO.class) + .in(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, groupUuids) + .select(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, VolumeSnapshotGroupRefVO_.snapshotDeleted) + .listTuple(); + + Map<String, long[]> stats = new HashMap<>(); + for (Tuple t : refs) { + String guuid = t.get(0, String.class); + boolean deleted = t.get(1, Boolean.class); + long[] s = stats.computeIfAbsent(guuid, k -> new long[2]); // [deleted, total] + if (deleted) { + s[0]++; + } + s[1]++; + } + + return stats.entrySet().stream() + .filter(e -> !Objects.equals(e.getKey(), excludeGroupUuid)) + .filter(e -> e.getValue()[0] > 0 && e.getValue()[0] < e.getValue()[1]) + .map(Map.Entry::getKey) + .collect(Collectors.toList());🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java` around lines 46 - 68, The loop in VolumeSnapshotGroupChecker that calls two per-group Q.count() queries on VolumeSnapshotGroupRefVO (one for snapshotDeleted=true and one for total) is causing DB hot spots; replace it with a single batched/grouped query that fetches counts per volumeSnapshotGroupUuid (e.g., SELECT volumeSnapshotGroupUuid, SUM(snapshotDeleted), COUNT(*) GROUP BY volumeSnapshotGroupUuid) for all groupUuids at once, then iterate the returned aggregated results to populate the existing incomplete list (keeping the excludeGroupUuid check and semantics of comparing deletedRefs < totalRefs) instead of issuing per-group queries in the for-loop over groupUuids.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java`:
- Around line 2155-2175: The code that computes remaining refs for a group uses
VolumeSnapshotGroupRefVO.snapshotDeleted to decide whether a group can be
disbanded, but memory snapshot VOs deleted by cleanUpMemorySnapshots() are
removed without setting their VolumeSnapshotGroupRefVO.snapshotDeleted=true, so
remaining will remain >0 and groups never disband; fix by ensuring memory
snapshots removed in cleanUpMemorySnapshots() also mark their corresponding
VolumeSnapshotGroupRefVO.snapshotDeleted=true (or add their group ref UUIDs to
the same snapshots-to-delete set before the remaining-count check), i.e. update
cleanUpMemorySnapshots() or the deletion flow to reuse the group-ref clearing
logic used here so VolumeSnapshotGroupRefVO entries reflect the deletion and
allow groupsToDelete to be computed correctly.
---
Nitpick comments:
In
`@storage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.java`:
- Around line 46-68: The loop in VolumeSnapshotGroupChecker that calls two
per-group Q.count() queries on VolumeSnapshotGroupRefVO (one for
snapshotDeleted=true and one for total) is causing DB hot spots; replace it with
a single batched/grouped query that fetches counts per volumeSnapshotGroupUuid
(e.g., SELECT volumeSnapshotGroupUuid, SUM(snapshotDeleted), COUNT(*) GROUP BY
volumeSnapshotGroupUuid) for all groupUuids at once, then iterate the returned
aggregated results to populate the existing incomplete list (keeping the
excludeGroupUuid check and semantics of comparing deletedRefs < totalRefs)
instead of issuing per-group queries in the for-loop over groupUuids.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: fd11e4ef-283b-4cdb-90e4-a702f059e26d
⛔ Files ignored due to path filters (1)
conf/springConfigXml/volumeSnapshot.xmlis excluded by!**/*.xml
📒 Files selected for processing (6)
header/src/main/java/org/zstack/header/storage/snapshot/group/APIDeleteVolumeSnapshotGroupMsg.javastorage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.javastorage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupBase.javastorage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupCascadeExtension.javastorage/src/main/java/org/zstack/storage/snapshot/group/VolumeSnapshotGroupChecker.javastorage/src/main/java/org/zstack/storage/volume/VolumeApiInterceptor.java
| Set<String> touchedGroupUuids = snapshots.stream() | ||
| .map(VolumeSnapshotInventory::getGroupUuid) | ||
| .filter(Objects::nonNull) | ||
| .collect(Collectors.toSet()); | ||
|
|
||
| List<String> groupsToDelete = new ArrayList<>(); | ||
| for (String groupUuid : touchedGroupUuids) { | ||
| long remaining = Q.New(VolumeSnapshotGroupRefVO.class) | ||
| .eq(VolumeSnapshotGroupRefVO_.volumeSnapshotGroupUuid, groupUuid) | ||
| .eq(VolumeSnapshotGroupRefVO_.snapshotDeleted, false).count(); | ||
| if (remaining == 0) { | ||
| logger.debug(String.format("snapshot group[uuid:%s] all volume snapshots have been deleted, " + | ||
| "disbanding group", groupUuid)); | ||
| groupsToDelete.add(groupUuid); | ||
| } | ||
| } | ||
|
|
||
| groupUuids.forEach(groupUuid -> vidm.deleteArchiveVmInstanceResourceMetadataGroup(groupUuid)); | ||
| cleanVmHostBackupFilesForGroup(groupUuids); | ||
| dbf.removeByPrimaryKeys(groupUuids, VolumeSnapshotGroupVO.class); | ||
| if (!groupsToDelete.isEmpty()) { | ||
| groupsToDelete.forEach(groupUuid -> vidm.deleteArchiveVmInstanceResourceMetadataGroup(groupUuid)); | ||
| cleanVmHostBackupFilesForGroup(groupsToDelete); | ||
| dbf.removeByPrimaryKeys(groupsToDelete, VolumeSnapshotGroupVO.class); |
There was a problem hiding this comment.
这里会把同组的 Memory ref 永久算成“未删”,导致快照组无法解散。
Line 2162 开始的 remaining 统计会把整组里所有 snapshotDeleted=false 的 ref 都算进去,但这里前面只标记了 snapshots 本身的 ref。链路删除 root/data 快照时,同组的 memory snapshot 会在后面的 cleanUpMemorySnapshots() 里被直接硬删 VO/tree,却不会把对应的 VolumeSnapshotGroupRefVO.snapshotDeleted 置为 true。这样只要组里带有 memory ref,这里就会一直得到 remaining > 0,最终把该 VM 长期留在“不完整快照组”状态。建议把本次一并删除的 memory snapshot ref 也纳入标记,或者让 cleanUpMemorySnapshots() 复用同样的 group-ref 清理逻辑。
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@storage/src/main/java/org/zstack/storage/snapshot/VolumeSnapshotTreeBase.java`
around lines 2155 - 2175, The code that computes remaining refs for a group uses
VolumeSnapshotGroupRefVO.snapshotDeleted to decide whether a group can be
disbanded, but memory snapshot VOs deleted by cleanUpMemorySnapshots() are
removed without setting their VolumeSnapshotGroupRefVO.snapshotDeleted=true, so
remaining will remain >0 and groups never disband; fix by ensuring memory
snapshots removed in cleanUpMemorySnapshots() also mark their corresponding
VolumeSnapshotGroupRefVO.snapshotDeleted=true (or add their group ref UUIDs to
the same snapshots-to-delete set before the remaining-count check), i.e. update
cleanUpMemorySnapshots() or the deletion flow to reuse the group-ref clearing
logic used here so VolumeSnapshotGroupRefVO entries reflect the deletion and
allow groupsToDelete to be computed correctly.
ungroupAfterDeleted now mirrors ungroupAfterDeleteSingleSnapshot:
regardless of root/data volume type, a snapshot group VO is only
disbanded after ALL of its refs reach snapshotDeleted=true.
Previously the root-volume path immediately removed the group VO once
the root chain finished, leaving data-volume refs as orphans pointing
to a non-existent group. After this change the group lives until every
ref is marked deleted, which is the symmetric behavior with the
single-snapshot path and the precondition the integrity check (landed
in a separate commit) relies on.
Resolves: ZSV-10538
Change-Id: I77707a78706271697463676a756c617a796f6c61
sync from gitlab !9939