Skip to content

<fix>[kvm]: define secret on migrate destination#3727

Open
ZStack-Robot wants to merge 1 commit intofeature-zsv-5.0.0-vm-support-vtpm-and-secucebootfrom
sync/zstackio/define-secret-mig@@2
Open

<fix>[kvm]: define secret on migrate destination#3727
ZStack-Robot wants to merge 1 commit intofeature-zsv-5.0.0-vm-support-vtpm-and-secucebootfrom
sync/zstackio/define-secret-mig@@2

Conversation

@ZStack-Robot
Copy link
Copy Markdown
Collaborator

Resolves: ZSV-11729

Change-Id: I616a776e78666179637976616c776162626c6533

sync from gitlab !9596

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

走查概览

该PR引入了新的资源密钥获取API、主机密钥消息字段扩展,以及vTPM迁移前处理机制。主要包括移除冗余状态跟踪、添加用法实例标记,以及重构vTPM密钥和密钥管理流程。

更改内容

聚合/文件(s) 摘要
加密资源密钥管理
header/src/main/java/org/zstack/header/keyprovider/EncryptedResourceKeyManager.java, compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
添加新的 getKey() 方法用于加载已存在的DEK明文,移除了 refExistedBeforeCreate 字段,更新回滚逻辑以仅在新建密钥时删除行。
主机密钥消息扩展
header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java, header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java, header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
为三个消息类添加 usageInstance 字段及其getter/setter方法,用于跟踪密钥使用场景。
KVM命令和主机处理
plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java, plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
SecretHostDefineCmd 中添加 secretUuid 字段;更新 KVMHost 的消息处理器,现在校验 usageInstance,使用消息中的值而非固定常量。
vTPM迁移上下文
plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmMigratePreAgentContext.java
新增最终类用于保存vTPM迁移前的状态,包含虚拟机、源/目标主机标识以及TPM/提供程序/密钥版本解析字段。
vTPM扩展和管理
plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java, plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java
重构vTPM预实例化流程以支持配置门控的主机密钥优先获取和密钥创建;新增迁移前处理方法用于准备源/目标主机密钥;所有密钥消息现在均设置 usageInstance

序列图

sequenceDiagram
    participant VM as vTPM VM
    participant Src as 源KVM主机
    participant KMS as 密钥管理器
    participant Secret as 密钥提供程序
    participant Dst as 目标KVM主机

    VM->>Src: 触发迁移前准备
    Src->>Src: 解析TPM/提供程序/密钥版本
    Src->>KMS: getKey(resourceKeyResult)
    KMS-->>Src: 返回DEK base64 & 密钥版本
    Src->>Src: 保存resourceKeyResult
    Src->>Secret: SecretHostGetMsg(usageInstance=VTPM)
    Secret-->>Src: 获取源主机密钥
    Src->>Dst: SecretHostDefineMsg(secretUuid, usageInstance=VTPM)
    Dst-->>Src: 在目标主机定义密钥
    Src->>Src: 迁移前准备完成
Loading

代码审查工作量评估

🎯 3 (中等) | ⏱️ ~25 分钟

诗歌

🐰 新API来临展宏图,
密钥获取无需创造舞,
迁移路上留下秘密脚,
用法标记伴随每处站,
vTPM羽毛随风飘过天! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 4.55% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed 标题清晰具体地概括了主要变更:在迁移目标上定义vTPM secret,这与PR的核心目标一致。
Description check ✅ Passed 描述通过解决的issue编号ZSV-11729和GitLab MR !9596同步来关联变更,与changeset有关联。

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/zstackio/define-secret-mig@@2

Comment @coderabbitai help to get the list of available commands and usage tips.

@zstack-robot-2
Copy link
Copy Markdown
Collaborator

Comment from yaohua.wu:

Review: MR !9596 — ZSV-11729

关联 MR: premium!13487getKey 实现)
Jira: ZSV-11729 — sblk 存储下带 vTPM 虚机热迁移报 "Secret not found"

问题理解

带 vTPM 的虚机在热迁移(或带存储迁移)时,目标主机上没有预先定义相同 UUID 的 libvirt secret,导致 swtpm 在目标侧启动时报 Secret not found: no secret with matching uuid。根本原因是迁移流程中缺少将源主机上的 secret 在目标主机上预定义的步骤。

方案评价

修复方案在 KVMHost.java 的迁移 FlowChain 中,在 migrate-vm 步骤之前插入 4 个新步骤:

  1. vtpm-prepare-context-before-migrate — 判断 VM 是否有 vTPM 及密钥提供者绑定
  2. vtpm-get-resource-key-before-migrate — 通过新增的 getKey()(只读)获取已有 DEK
  3. vtpm-get-source-secret-uuid-before-migrate — 从源主机获取当前 secret UUID
  4. vtpm-define-secret-on-dst-before-migrate — 使用相同 UUID 在目标主机定义 secret

思路正确,将 secret 的"物化"操作移到迁移前完成,保证 libvirt 在目标侧能找到匹配的 secret。同时新增 getKey() 接口做到只读获取、不会创建新密钥,语义清晰。


具体发现

🟡 Warning — Agent 侧是否已支持 secretUuid 字段

文件: plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.javaSecretHostDefineCmd

本次新增了 secretUuid 字段,并在迁移时通过 cmd.setSecretUuid(msg.getSecretUuid()) 传递源主机的 secret UUID 到 KVM Agent。这是本修复的核心——目标主机必须用相同的 UUID 定义 libvirt secret。

但本 MR 中未包含 KVM Agent(Python 侧)的变更。请确认:

  • Agent 的 define_secret 处理函数是否已经支持接收并使用 secretUuid 字段?
  • 如果 secretUuid 为空(如普通开机定义 secret 场景),Agent 是否会自动生成新 UUID?

如果 Agent 侧尚未适配此字段,即使 Java 层传了 UUID,目标主机上定义的 secret 仍然是随机 UUID,迁移仍会失败。

🟡 Warning — 迁移流程中新增步骤均为 NoRollbackFlow,缺少目标侧 secret 清理

文件: plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.javavtpm-define-secret-on-dst-before-migrate 步骤

如果在目标主机上成功定义了 secret,但后续 migrate-vm 步骤失败,已定义的 secret 不会被清理。虽然孤立 secret 不会造成功能性问题,但在频繁迁移失败的场景下,目标主机上可能累积无用 secret。建议对 vtpm-define-secret-on-dst-before-migrate 加上 rollback 逻辑,在迁移失败时调用 SecretHostDeleteMsg 清理目标侧 secret。

🟡 Warning — usageInstance 必填校验可能影响已有调用方

文件: plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java,handle(SecretHostGetMsg) / handle(SecretHostDefineMsg) / handle(SecretHostDeleteMsg)

三个 handler 的参数校验中新增了 usageInstance 必填检查:

if (StringUtils.isBlank(msg.getUsageInstance())) {
    reply.setError(operr("... usageInstance are required ..."));
}

这是一个向后不兼容的变更。请确认所有已有的 SecretHostGetMsgSecretHostDefineMsgSecretHostDeleteMsg 调用方都已设置 usageInstance。如果 feature 分支中存在其他模块发送这些消息但未设置该字段,会导致运行时失败。

🟡 Warning — KvmTpmExtensions 中 skip 条件的语义变化

文件: plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java

原来的 ensure-resource-key-ref 步骤在 providerUuid 为空时跳过,语义是"没有绑定密钥提供者则跳过"。新代码将 get-secret-on-host-firstget-or-create-key-and-dekdefine-secret-on-host 三个步骤的 skip 条件统一改为 VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class)

这意味着:

  • 如果 ALLOWED_TPM_VM_WITHOUT_KMS = false(即必须有 KMS),即使 TPM 实际上没有 key provider 绑定,这些步骤也会尝试执行并可能因为拿不到 provider 而报错。
  • 如果 ALLOWED_TPM_VM_WITHOUT_KMS = true,所有密钥相关步骤都跳过——这在快照克隆场景中是否正确?(克隆出的 VM 也应该继承密钥保护)

请确认这种语义切换在所有场景下都是预期行为。

🟢 Suggestion — doGetKeydoGetOrCreateKey 存在大量重复代码

文件: crypto/src/main/java/org/zstack/crypto/keyprovider/EncryptedResourceKeyManagerImpl.java(premium 仓库)

doGetKey() 中的 provider 解析、NKP/KMS 配置构建、ref 查找逻辑与 doGetOrCreateKey() 高度重复。建议抽取公共辅助方法(如 resolveProviderConfig() 返回 nkpConfig/kmipConfig),减少维护成本,避免将来改一处漏另一处。

🟢 Suggestion — VtpmPreMigrateSecretCtx.from() 使用原始类型 Map

文件: plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java

public static VtpmPreMigrateSecretCtx from(Map data) {

虽然 FlowChain 的 data 是 Map<String, Object>,但方法签名用的是原始 Map,建议改为 Map<String, Object> 或至少加 @SuppressWarnings("rawtypes") 以保持代码风格一致。

🟢 Suggestion — rollbackCreatedKey 行为变更

文件: crypto/src/main/java/org/zstack/crypto/keyprovider/EncryptedResourceKeyManagerImpl.java(premium 仓库)

旧代码在 refExistedBeforeCreate == true 时会将 ref 行"清空"(保留行但清除 kekRef/wrappedDek/keyVersion),而非删除。新代码统一改为删除 ref 行,同时移除了 refExistedBeforeCreate 字段。

这意味着回滚后不再保留空 placeholder ref。如果上层工作流依赖 ref 行的存在来判断"已绑定但尚未创建密钥"的状态,会受到影响。请确认 addTpmToVm 相关流程在回滚后重新执行时不会因为 ref 不存在而出现异常路径。


总结

级别 数量
🔴 Critical 0
🟡 Warning 4
🟢 Suggestion 3

整体方案思路正确,通过在迁移前预定义 secret 解决了目标主机找不到 secret 的问题。最需要关注的是:Agent 侧是否已支持 secretUuid 字段,这是修复能否生效的关键。其次是 usageInstance 必填校验的向后兼容性,以及 skip 条件从精确判断改为全局配置开关带来的语义变化。


🤖 Robot Reviewer

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java (1)

11-21: 可选改进:考虑添加 toString() 方法便于调试。

在迁移流程出现问题时,toString() 方法可以帮助快速了解上下文状态。

💡 可选的 toString() 实现
     private String sourceSecretUuid;
+
+    `@Override`
+    public String toString() {
+        return "VtpmPreMigrateSecretCtx{" +
+                "skipAll=" + skipAll +
+                ", tpmUuid='" + tpmUuid + '\'' +
+                ", providerUuid='" + providerUuid + '\'' +
+                ", keyVersion=" + keyVersion +
+                ", sourceSecretUuid='" + sourceSecretUuid + '\'' +
+                '}';
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java`
around lines 11 - 21, Add a concise toString() implementation to
VtpmPreMigrateSecretCtx that returns a string containing the class name and the
key fields (skipAll, tpmUuid, providerUuid, providerName, keyVersion,
resourceKeyResult, sourceSecretUuid) so debugging logs show context state;
implement it as an override of toString() inside the VtpmPreMigrateSecretCtx
class and format the output clearly (e.g.,
"VtpmPreMigrateSecretCtx{field=value,...}") without adding new dependencies.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 3294-3331: The NoRollbackFlow
"vtpm-define-secret-on-dst-before-migrate" (inside KVMHost flow) currently
defines a secret on the destination via SecretHostDefineMsg but does not record
that it succeeded nor provide cleanup on failure; update the flow to set a flag
in the shared data map (e.g., data.put("vtpm.dstSecretDefined", true) or store
the dst secret uuid in VtpmPreMigrateSecretCtx) immediately after a successful
SecretHostDefineMsg reply, and add a compensating rollback/cleanup flow (or a
finally-style NoRollbackFlow executed on failure) that checks that flag and
sends a best-effort SecretHostDeleteMsg targeting dstHostUuid (using
ctx.getSourceSecretUuid() or stored secret id) to remove the materialized
secret; ensure the cleanup only logs/aggregates errors and does not replace the
original migration error returned to the caller.
- Around line 3197-3201: The current logic in KVMHost
(VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS + ctx.setSkipAll(true) +
trigger.next()) unconditionally skips vTPM secret provisioning even for VMs that
already have a KMS/provider bound; change it so the skip only happens when the
VM's TPM has no provider binding (i.e., check the parsed provider/secret info
first) or move this ALLOWED_TPM_VM_WITHOUT_KMS check to after provider
resolution/parse failure; specifically, update the code around the provider
parsing logic in KVMHost to evaluate VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS
only when provider == null/unbound, and only then call ctx.setSkipAll(true) and
trigger.next().
- Around line 5484-5486: Three places that construct SecretHostDeleteMsg in
KvmTpmExtensions and KvmTpmManager omit the now-required usageInstance, causing
runtime validation failures; update each construction to call
setUsageInstance(...) with the appropriate instance identifier (use the same VM
instance identifier already assigned to vmUuid where applicable), and mirror the
same fix for any SecretHostDefineMsg constructions; locate the builders that
currently set hostUuid, vmUuid, purpose, keyVersion and add setUsageInstance(<vm
instance id>) so the message passes the KVMHost validation.

---

Nitpick comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java`:
- Around line 11-21: Add a concise toString() implementation to
VtpmPreMigrateSecretCtx that returns a string containing the class name and the
key fields (skipAll, tpmUuid, providerUuid, providerName, keyVersion,
resourceKeyResult, sourceSecretUuid) so debugging logs show context state;
implement it as an override of toString() inside the VtpmPreMigrateSecretCtx
class and format the output clearly (e.g.,
"VtpmPreMigrateSecretCtx{field=value,...}") without adding new dependencies.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 0f3d65b3-5350-4838-967c-b2b2323b381b

📥 Commits

Reviewing files that changed from the base of the PR and between 449ef01 and a4f9b7a.

📒 Files selected for processing (9)
  • compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/keyprovider/EncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java

Comment on lines +3197 to +3201
if (VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class)) {
ctx.setSkipAll(true);
trigger.next();
return;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

这里会把“允许无 KMS”误当成“永远不需要预置 secret”。

ALLOWED_TPM_VM_WITHOUT_KMStrue 时直接 skipAll,会把已经绑定 KMS/provider 的 vTPM VM 也一起跳过,导致目的端仍然不会定义 secret,混合场景下迁移会回到本 PR 修复前的失败路径。建议只在当前 TPM 确认没有 provider 绑定时才跳过,或者把这段判断放到 provider 解析失败之后。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3197 -
3201, The current logic in KVMHost (VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS +
ctx.setSkipAll(true) + trigger.next()) unconditionally skips vTPM secret
provisioning even for VMs that already have a KMS/provider bound; change it so
the skip only happens when the VM's TPM has no provider binding (i.e., check the
parsed provider/secret info first) or move this ALLOWED_TPM_VM_WITHOUT_KMS check
to after provider resolution/parse failure; specifically, update the code around
the provider parsing logic in KVMHost to evaluate
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS only when provider == null/unbound,
and only then call ctx.setSkipAll(true) and trigger.next().

Comment on lines +3294 to +3331
flow(new NoRollbackFlow() {
String __name__ = "vtpm-define-secret-on-dst-before-migrate";

@Override
public boolean skip(Map data) {
VtpmPreMigrateSecretCtx ctx = VtpmPreMigrateSecretCtx.from(data);
return ctx == null || ctx.isSkipAll();
}

@Override
public void run(FlowTrigger trigger, Map data) {
VtpmPreMigrateSecretCtx ctx = VtpmPreMigrateSecretCtx.from(data);
ResourceKeyResult result = ctx.getResourceKeyResult();
if (result == null || StringUtils.isBlank(result.getDekBase64())) {
trigger.fail(operr("missing DEK for tpm[uuid:%s] before ensure secret on destination", ctx.getTpmUuid()));
return;
}
SecretHostDefineMsg innerMsg = new SecretHostDefineMsg();
innerMsg.setHostUuid(dstHostUuid);
innerMsg.setVmUuid(vmUuid);
innerMsg.setDekBase64(result.getDekBase64());
innerMsg.setPurpose("vtpm");
innerMsg.setKeyVersion(ctx.getKeyVersion());
innerMsg.setUsageInstance(KVMConstant.HOST_SECRET_USAGE_INSTANCE_VTPM);
innerMsg.setSecretUuid(ctx.getSourceSecretUuid());
innerMsg.setDescription(String.format("Define secret for VM %s before live migration", vmUuid));
bus.makeTargetServiceIdByResourceUuid(innerMsg, HostConstant.SERVICE_ID, innerMsg.getHostUuid());
bus.send(innerMsg, new CloudBusCallBack(trigger) {
@Override
public void run(MessageReply reply) {
if (reply.isSuccess()) {
trigger.next();
} else {
trigger.fail(reply.getError());
}
}
});
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

目的端 secret 定义成功后缺少失败补偿。

这一步成功后,后续任一 flow 失败都会直接结束整个迁移链路,但这里是 NoRollbackFlow,没有对应的 SecretHostDeleteMsg 清理。结果是迁移失败或重试后,目的端会残留 materialized secret 和脏状态。建议在 data 里记录 define 成功标记,并在链路失败时对 dstHostUuid 做 best-effort 删除,同时不要覆盖原始迁移错误。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3294 -
3331, The NoRollbackFlow "vtpm-define-secret-on-dst-before-migrate" (inside
KVMHost flow) currently defines a secret on the destination via
SecretHostDefineMsg but does not record that it succeeded nor provide cleanup on
failure; update the flow to set a flag in the shared data map (e.g.,
data.put("vtpm.dstSecretDefined", true) or store the dst secret uuid in
VtpmPreMigrateSecretCtx) immediately after a successful SecretHostDefineMsg
reply, and add a compensating rollback/cleanup flow (or a finally-style
NoRollbackFlow executed on failure) that checks that flag and sends a
best-effort SecretHostDeleteMsg targeting dstHostUuid (using
ctx.getSourceSecretUuid() or stored secret id) to remove the materialized
secret; ensure the cleanup only logs/aggregates errors and does not replace the
original migration error returned to the caller.

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/define-secret-mig@@2 branch from a4f9b7a to 93acfa0 Compare April 12, 2026 08:30
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java (1)

415-472: ⚠️ Potential issue | 🟠 Major

确认 secretUuid 在 agent 端是否被实际应用到 libvirt secret 定义。

Java 管理侧已正确扩展和传递 secretUuid(KVMHost.java:5618),SecretHostDefineCmdSecretHostDefineResponse DTO 也已完整定义此字段。然而,本仓库中未找到 agent 端的实现代码。agent 侧的 secret 定义逻辑(通常在单独部署的 KVM agent 模块中)需要确认是否真正读取并应用了 secretUuid 参数到 libvirt secret XML 定义中,否则迁移目标端生成的 secret UUID 仍会与源端不符。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java` around lines
415 - 472, 确认 agent 端是否把传入的 secretUuid 应用到 libvirt secret 定义:检查 agent 中接收
SecretHostDefineCmd/SecretHostDefineResponse 的请求处理代码(agent 的 secret
定义/注册函数,例如用于生成 libvirt secret XML 的方法),确保它从 DTO 读取 secretUuid 字段并在生成的 libvirt
secret XML 中设置 <uuid> 元素(而非让 libvirt 生成新 UUID);如果当前未使用 secretUuid,则修改该处理函数以使用传入的
secretUuid,调用 libvirt API 时传入该 UUID,并加上单元/集成测试或运行时校验以确认创建后的 libvirt secret UUID
与传入的 secretUuid 一致。
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java`:
- Around line 266-267: The global flag check using
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class) currently lets
code paths skip secret lookup/DEK/secret definition even when a vTPM already has
KMS binding; change those conditions in KvmTpmExtensions so they only skip when
both the global flag is true AND the TPM binding is truly "no KMS" (i.e. the
vTPM binding object has no providerUuid, no keyVersion and no secretUuid).
Update the three places that currently only check ALLOWED_TPM_VM_WITHOUT_KMS
(the checks around the secret query, DEK retrieval, and host secret definition)
to require both the global flag and an absent providerUuid/keyVersion/secretUuid
on the tpm binding before bypassing secret logic.

---

Outside diff comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java`:
- Around line 415-472: 确认 agent 端是否把传入的 secretUuid 应用到 libvirt secret 定义:检查
agent 中接收 SecretHostDefineCmd/SecretHostDefineResponse 的请求处理代码(agent 的 secret
定义/注册函数,例如用于生成 libvirt secret XML 的方法),确保它从 DTO 读取 secretUuid 字段并在生成的 libvirt
secret XML 中设置 <uuid> 元素(而非让 libvirt 生成新 UUID);如果当前未使用 secretUuid,则修改该处理函数以使用传入的
secretUuid,调用 libvirt API 时传入该 UUID,并加上单元/集成测试或运行时校验以确认创建后的 libvirt secret UUID
与传入的 secretUuid 一致。
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: d80e533e-b2c8-4543-945d-b4b947f64aa2

📥 Commits

Reviewing files that changed from the base of the PR and between a4f9b7a and 93acfa0.

📒 Files selected for processing (10)
  • compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/keyprovider/EncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java
✅ Files skipped from review due to trivial changes (4)
  • header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/keyprovider/EncryptedResourceKeyManager.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java

Comment on lines 266 to 267
return VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

不要让全局开关把已有 KMS 绑定的 vTPM 路径也一起跳过。

Line 266、Line 307 和 Line 343 现在只看 ALLOWED_TPM_VM_WITHOUT_KMS。这样一来,只要环境打开了“允许无 KMS”,即使某个 TPM 已经有 providerUuid/keyVersion/secretUuid,这里也会直接跳过 secret 查询、DEK 获取和目标 host 上的 secret 定义。后面的 beforeStartVmOnKvm() 仍然会把 secretUuid 传给 agent;如果该 host 上并没有对应 secret,这条启动/恢复链路就会直接退化成 Secret not found。更稳妥的条件应该是“仅在真正的无 KMS 场景下放宽”,而不是让全局开关覆盖掉已有 KMS 绑定的 VM。

Also applies to: 306-308, 342-344

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java` around
lines 266 - 267, The global flag check using
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class) currently lets
code paths skip secret lookup/DEK/secret definition even when a vTPM already has
KMS binding; change those conditions in KvmTpmExtensions so they only skip when
both the global flag is true AND the TPM binding is truly "no KMS" (i.e. the
vTPM binding object has no providerUuid, no keyVersion and no secretUuid).
Update the three places that currently only check ALLOWED_TPM_VM_WITHOUT_KMS
(the checks around the secret query, DEK retrieval, and host secret definition)
to require both the global flag and an absent providerUuid/keyVersion/secretUuid
on the tpm binding before bypassing secret logic.

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/define-secret-mig@@2 branch from 93acfa0 to aca3c65 Compare April 12, 2026 09:59
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)

5633-5639: ⚠️ Potential issue | 🟠 Major

校验响应中的 secretUuid 与请求值是否一致。

在迁移场景中,源端获取的 secret UUID 被显式地传入 SecretHostDefineMsg.setSecretUuid(ctx.getSourceSecretUuid())(KVMHost.java:3318),但响应处理器(5635-5639)仅检查 rsp.getSecretUuid() != null,不验证与请求值的一致性。

Agent 模拟器实现(KVMSimulator.groovy:356-359)显示,缓存键仅由 (hostUuid, vmUuid, purpose, keyVersion, usageInstance) 构成,不包含 secretUuid。这意味着即使请求中指定了 secretUuid,agent 也会忽略该字段并返回基于缓存的 UUID。若缓存已存在(例如来自不同的请求),会返回不匹配的 UUID。在迁移场景中,这会导致源端和目的端的 secret 不一致,后续在 libvirt/swtpm 阶段出现失败。

建议:

  • 若请求中指定了 msg.getSecretUuid(),响应成功时应验证 rsp.getSecretUuid() 与其一致;若不一致则返回错误。
  • 同时确认 agent 端 define-secret 确实消费了 secretUuid 字段(基于现有证据,agent 模拟器未使用该字段)。
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 5633 -
5639, The success handler for SecretHostDefine responses must validate that when
the request specified a secret UUID the agent returned the same one; update the
success(KVMAgentCommands.SecretHostDefineResponse rsp) logic to, if
msg.getSecretUuid() is non-null, compare rsp.getSecretUuid() to
msg.getSecretUuid() and treat a mismatch as an error by calling
buildSecretAgentError (instead of accepting any non-null rsp.getSecretUuid());
locate this in KVMHost.success handling for SecretHostDefineResponse and also
add a note to verify the agent-side define-secret consumer in KVMSimulator/KVM
agent to ensure it honors msg.getSecretUuid().
♻️ Duplicate comments (2)
plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (2)

3197-3206: ⚠️ Potential issue | 🟠 Major

不要在全局开关为 true 时无条件跳过 vTPM secret 预置。

这里会把“允许无 KMS 的 TPM VM”扩大成“所有 TPM VM 都跳过”,已绑定 provider 的场景也会被短路,目的端仍然不会定义 secret,迁移会退回到本次修复前的失败路径。建议先解析 provider,再只对“确实未绑定 provider”的 TPM VM 做 skip。

💡 建议修改
-                        if (VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class)) {
-                            ctx.setSkipAll(true);
-                            trigger.next();
-                            return;
-                        }
                         ctx.setTpmUuid(tpmUuid);
                         ctx.setProviderUuid(tpmKeyBackend.findKeyProviderUuidByTpm(tpmUuid));
                         ctx.setProviderName(tpmKeyBackend.findKeyProviderNameByTpm(tpmUuid));
                         if (StringUtils.isBlank(ctx.getProviderUuid()) && StringUtils.isBlank(ctx.getProviderName())) {
+                            if (VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class)) {
+                                ctx.setSkipAll(true);
+                                trigger.next();
+                                return;
+                            }
                             trigger.fail(operr("missing TPM resource key binding for tpm[uuid:%s] before migrate", tpmUuid));
                             return;
                         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3197 -
3206, The current code checks VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS and
calls ctx.setSkipAll/trigger.next before resolving the TPM's provider, which
causes vTPM secret pre-provisioning to be skipped for all TPM VMs; instead,
first set ctx.setTpmUuid and resolve provider via
tpmKeyBackend.findKeyProviderUuidByTpm and findKeyProviderNameByTpm (and set
ctx.setProviderUuid and ctx.setProviderName), then only if both providerUuid and
providerName are blank AND
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class) is true call
ctx.setSkipAll and trigger.next; otherwise proceed to the existing
trigger.fail(operr(...)) path when provider is missing. Ensure you update the
branch ordering around VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS,
ctx.setTpmUuid, ctx.setProviderUuid, ctx.setProviderName, ctx.setSkipAll,
trigger.next, and trigger.fail accordingly.

3294-3331: ⚠️ Potential issue | 🟠 Major

定义目的端 secret 后缺少失败补偿。

这一步成功后,后续任一 flow 失败都会直接结束链路,但这里仍是 NoRollbackFlow,迁移失败或重试会在目的端残留 secret。请记录 define 成功状态,并在链路失败时对 dstHostUuid 做 best-effort SecretHostDeleteMsg 清理,且不要覆盖原始迁移错误。

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 3294 -
3331, Record that the secret was successfully defined on destination by setting
a flag on VtpmPreMigrateSecretCtx (e.g. ctx.setDstDefined(true)) immediately
after a successful SecretHostDefineMsg reply inside the NoRollbackFlow; then add
a best-effort cleanup in the flow-chain failure path that checks
ctx.isDstDefined() and, without replacing the original migration error, sends a
SecretHostDeleteMsg to dstHostUuid (usageInstance
KVMConstant.HOST_SECRET_USAGE_INSTANCE_VTPM, secretUuid
ctx.getSourceSecretUuid()) and ignores/logs any delete failures. Ensure the
cleanup logic uses the same context (VtpmPreMigrateSecretCtx) and does not
swallow or overwrite the original FlowTrigger error.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 5633-5639: The success handler for SecretHostDefine responses must
validate that when the request specified a secret UUID the agent returned the
same one; update the success(KVMAgentCommands.SecretHostDefineResponse rsp)
logic to, if msg.getSecretUuid() is non-null, compare rsp.getSecretUuid() to
msg.getSecretUuid() and treat a mismatch as an error by calling
buildSecretAgentError (instead of accepting any non-null rsp.getSecretUuid());
locate this in KVMHost.success handling for SecretHostDefineResponse and also
add a note to verify the agent-side define-secret consumer in KVMSimulator/KVM
agent to ensure it honors msg.getSecretUuid().

---

Duplicate comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 3197-3206: The current code checks
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS and calls ctx.setSkipAll/trigger.next
before resolving the TPM's provider, which causes vTPM secret pre-provisioning
to be skipped for all TPM VMs; instead, first set ctx.setTpmUuid and resolve
provider via tpmKeyBackend.findKeyProviderUuidByTpm and findKeyProviderNameByTpm
(and set ctx.setProviderUuid and ctx.setProviderName), then only if both
providerUuid and providerName are blank AND
VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS.value(Boolean.class) is true call
ctx.setSkipAll and trigger.next; otherwise proceed to the existing
trigger.fail(operr(...)) path when provider is missing. Ensure you update the
branch ordering around VmGlobalConfig.ALLOWED_TPM_VM_WITHOUT_KMS,
ctx.setTpmUuid, ctx.setProviderUuid, ctx.setProviderName, ctx.setSkipAll,
trigger.next, and trigger.fail accordingly.
- Around line 3294-3331: Record that the secret was successfully defined on
destination by setting a flag on VtpmPreMigrateSecretCtx (e.g.
ctx.setDstDefined(true)) immediately after a successful SecretHostDefineMsg
reply inside the NoRollbackFlow; then add a best-effort cleanup in the
flow-chain failure path that checks ctx.isDstDefined() and, without replacing
the original migration error, sends a SecretHostDeleteMsg to dstHostUuid
(usageInstance KVMConstant.HOST_SECRET_USAGE_INSTANCE_VTPM, secretUuid
ctx.getSourceSecretUuid()) and ignores/logs any delete failures. Ensure the
cleanup logic uses the same context (VtpmPreMigrateSecretCtx) and does not
swallow or overwrite the original FlowTrigger error.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: edb2405d-2eb0-44b1-a665-9ee8ff2dfe72

📥 Commits

Reviewing files that changed from the base of the PR and between 93acfa0 and aca3c65.

📒 Files selected for processing (10)
  • compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/keyprovider/EncryptedResourceKeyManager.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmManager.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java
✅ Files skipped from review due to trivial changes (3)
  • header/src/main/java/org/zstack/header/secret/SecretHostDeleteMsg.java
  • header/src/main/java/org/zstack/header/secret/SecretHostDefineMsg.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/VtpmPreMigrateSecretCtx.java
🚧 Files skipped from review as they are similar to previous changes (3)
  • header/src/main/java/org/zstack/header/secret/SecretHostGetMsg.java
  • compute/src/main/java/org/zstack/compute/vm/devices/DummyEncryptedResourceKeyManager.java
  • plugin/kvm/src/main/java/org/zstack/kvm/tpm/KvmTpmExtensions.java

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/define-secret-mig@@2 branch from aca3c65 to 9b984bf Compare April 12, 2026 10:54
}

@Override
public void getKey(GetOrCreateResourceKeyContext ctx,
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from wenhao.zhang:

确认一下这个行为。

DummyXXX 的存在本意是,让跑 case 时就算没有对应的非 dummy 的实现类,也能正常跑完整个 case。

你直接报错是否会和这个本意冲突?如果没有 crypto 模块的 case 加了 TPM 会在这里报错吗?


if (context.keyVersion != null) {
// If VM is first created, skip this flow.
if (context.keyVersion == null || StringUtils.isBlank(spec.getDevicesSpec().getTpm().getSecretUuid())) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from wenhao.zhang:

和以前一样,建议就可以看情况不修改。

一般推荐是,content 传递是否是 first created,然后在前面检查,first created 和 tpm == null/keyVersion == null 同时成立才跳过。

}
});

flow(new NoRollbackFlow() {
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from wenhao.zhang:

不能放 migrateVm 的 extension?放这里是没有对应 extension 才这样写的

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/define-secret-mig@@2 branch 8 times, most recently from 9cf8637 to 345be8c Compare April 13, 2026 08:39
public ResourceKeyResult getKey(GetOrCreateResourceKeyContext ctx) {
logger.warn(String.format("crypto module not installed, cannot get resource key for %s[uuid:%s]",
ctx.getResourceType(), ctx.getResourceUuid()));
completion.fail(operr("crypto module is not installed, cannot manage resource encryption keys"));
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment from wenhao.zhang:

completion 不存在

also:
- cleanup EncryptedResourceKeyRefVO when VM creation failed

Resolves: ZSV-11729
Resolves: ZSV-11845

Change-Id: I616a776e78666179637976616c776162626c6533
@MatheMatrix MatheMatrix force-pushed the sync/zstackio/define-secret-mig@@2 branch from 345be8c to 8d32ee7 Compare April 13, 2026 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants