<doc>[sdnController]: integration zns into cloud #3711
MatheMatrix wants to merge 13 commits into 5.5.12 from
Conversation
Resolves: ZCF-1365 Change-Id: I73687962636e7871626e687761626d6661716668
Resolves: ZCF-1365 Change-Id: I7262787a6474667a766d77766165796f73717775
Resolves: ZCF-1365 Change-Id: I647469616679716d7366686c77617073746f776c
Resolves: ZCF-1365 Change-Id: I7a61747778757574656967626c6a736366716b6a
<doc>[sdnController]: integration zns into cloud See merge request zstackio/zstack!9447
Resolves: ZCF-1365 Change-Id: I65767164636167726d6369726f63666b68666477
<fix>[zns]: add ZnsUuidHelper utility for UUID format conversion See merge request zstackio/zstack!9467
Resolves: ZCF-1365 Change-Id: I696f6a6a6e6d7867746e70736d6d646861686166
<fix>[rest]: fix bug<description> See merge request zstackio/zstack!9500
Resolves: ZCF-1365 Change-Id: I73637569786c6d6e6d6479646961726365737074
<fix>[rest]: improve markdown validation error reporting See merge request zstackio/zstack!9529
Release SDN NICs before removing VmNicVO. Keep NIC deletion on release failure. Resolves: ZCF-2047 Change-Id: I83f534ea19849467a728e3b6fb9ee2f6bb43bb7e Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Walkthrough: Introduces the ZNS SDN controller integration, a refactored NIC allocation control flow with new extension points, an asynchronous webhook callback client and protocol, a Geneve L2 network type with database table support, and related KVM/NIC configuration and documentation updates. Changes
Sequence Diagram(s)
sequenceDiagram
participant VM as VM Creation Flow
participant NIC as VmAllocateNicFlow
participant EXT1 as BeforeAllocateVmNic<br/>ExtensionPoint
participant SDN as VmAllocateSdnNicFlow
participant EXT2 as AfterAllocateSdnNic<br/>ExtensionPoint
participant DB as Database
VM->>NIC: run(trigger, data)
NIC->>NIC: allocateMac()
NIC->>NIC: buildNicInventory()
NIC->>DB: vnicFactory.createVmNic()
DB-->>NIC: VmNicVO created
NIC->>EXT1: callBeforeAllocateVmNicExtensions()
EXT1-->>NIC: extensions complete
NIC->>DB: persistStaticIpIfNeeded()
NIC->>DB: updateNicVO + applyNicConfig()
NIC-->>VM: flow complete
VM->>SDN: run(trigger, data)
SDN->>SDN: query AfterAllocateSdnNicExtensionPoint
loop for each extension
SDN->>EXT2: afterAllocateSdnNic(spec, nics)
EXT2->>EXT2: create SDN ports
EXT2-->>SDN: success/failure
end
SDN-->>VM: proceed or fail
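The allocate-path loop in the diagram above (query the extension points, run each in order, proceed or fail) can be sketched in plain Java. The types and method signatures below are illustrative stand-ins, not the actual ZStack API, and the sketch assumes the extensions complete synchronously:

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified stand-in for the ZStack extension point; the real
// interface takes a VmInstanceSpec, a VmNicInventory list, and a Completion.
interface AfterAllocateSdnNicExtensionPoint {
    // Invokes onDone with null on success, or with an error message on failure.
    void afterAllocateSdnNic(String vmUuid, List<String> nicUuids, Consumer<String> onDone);
}

class VmAllocateSdnNicFlowSketch {
    private final List<AfterAllocateSdnNicExtensionPoint> extensions;

    VmAllocateSdnNicFlowSketch(List<AfterAllocateSdnNicExtensionPoint> extensions) {
        this.extensions = extensions;
    }

    /** Runs each extension in order, stopping at the first failure (proceed-or-fail). */
    String run(String vmUuid, List<String> nicUuids) {
        final String[] firstError = {null};
        for (AfterAllocateSdnNicExtensionPoint ext : extensions) {
            ext.afterAllocateSdnNic(vmUuid, nicUuids, err -> firstError[0] = err);
            if (firstError[0] != null) {
                return firstError[0]; // flow fails; VM creation rolls back
            }
        }
        return null; // all extensions succeeded; flow proceeds
    }
}
```

The fail-fast loop mirrors the "proceed or fail" edge of the diagram: a single failed SDN port creation aborts the whole NIC allocation flow.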
sequenceDiagram
participant NIC as VM NIC Detach
participant RELEASE as VmDetachNicFlow
participant SDN as releaseSdnNics()
participant EXT as AfterAllocateSdnNic<br/>ExtensionPoint
participant DB as Database
NIC->>RELEASE: run(trigger)
RELEASE->>RELEASE: returnIpsAndDeleteNic()
RELEASE->>DB: return allocated IPs
RELEASE->>SDN: callReleaseSdnNics(nics)
SDN->>SDN: query AfterAllocateSdnNicExtensionPoint
loop for each extension
SDN->>EXT: releaseSdnNics(nics)
EXT->>EXT: cleanup SDN resources
EXT-->>SDN: complete (ignore errors)
end
SDN->>DB: dbf.removeByPrimaryKey(nic)
DB-->>SDN: NIC deleted
SDN-->>RELEASE: completion
RELEASE->>NIC: trigger.next()
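The detach-path behavior above (best-effort SDN cleanup with errors ignored, then unconditional NIC deletion) can be sketched as follows; class and method names are illustrative, not the real ZStack types:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of releaseSdnNics(): every extension gets a chance to clean up SDN
// resources, errors are swallowed, and only then is the NIC row removed.
class SdnNicReleaserSketch {
    interface ReleaseExtension {
        void releaseSdnNics(List<String> nicUuids) throws Exception;
    }

    final Set<String> nicTable = new HashSet<>(); // stand-in for VmNicVO rows

    void releaseAndDelete(List<String> nicUuids, List<ReleaseExtension> exts) {
        for (ReleaseExtension ext : exts) {
            try {
                ext.releaseSdnNics(nicUuids);
            } catch (Exception ignored) {
                // cleanup is best-effort: a failed release must not block deletion
            }
        }
        // dbf.removeByPrimaryKey(nic) analogue: deletion happens regardless
        nicUuids.forEach(nicTable::remove);
    }
}
```

This matches the stated policy of the change ("Keep NIC deletion on release failure"): a dead SDN controller cannot leave an undeletable NIC behind.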
sequenceDiagram
participant Client as External System
participant WEBHOOK as WebhookCallbackClient
participant HANDLER as HTTP Handler
participant MAP as Task Registry<br/>ConcurrentHashMap
participant TIMEOUT as Timeout Scheduler
Client->>WEBHOOK: submit(completion, timeout)
WEBHOOK->>MAP: store taskId->completion
WEBHOOK->>TIMEOUT: schedule timeout
WEBHOOK-->>Client: return taskId
Client->>Client: execute async operation
Client->>HANDLER: callback(taskId, result)
HANDLER->>HANDLER: onCallback()
HANDLER->>MAP: extract & remove taskId
MAP-->>HANDLER: completion found
HANDLER->>TIMEOUT: cancel timeout
HANDLER->>MAP: complete(success/failure)
alt Timeout Occurs
TIMEOUT->>MAP: remove taskId
TIMEOUT->>MAP: complete(error)
end
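The webhook diagram above boils down to a task registry keyed by taskId with a competing timeout. A minimal sketch (assumed names, not the actual WebhookCallbackClient API) in which the pending entry is registered before the timeout is scheduled, so that even a very short timeout cannot fire against an unregistered task:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Minimal callback registry: one winner (callback or timeout) per taskId.
class CallbackRegistry<T> {
    private final Map<String, Consumer<T>> pending = new ConcurrentHashMap<>();
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();

    String submit(Consumer<T> completion, long timeout, TimeUnit unit, T timeoutResult) {
        String taskId = UUID.randomUUID().toString();
        pending.put(taskId, completion);               // register first
        timer.schedule(() -> complete(taskId, timeoutResult), timeout, unit);
        return taskId;
    }

    /** Returns true if this call delivered the result (callback and timeout race for it). */
    boolean complete(String taskId, T result) {
        Consumer<T> c = pending.remove(taskId);        // atomic: only one winner
        if (c == null) {
            return false;                              // already completed
        }
        c.accept(result);
        return true;
    }

    void shutdown() { timer.shutdownNow(); }
}
```

The atomic `ConcurrentHashMap.remove` is what makes "extract & remove taskId" in the diagram race-free: whichever of the callback handler and the timeout task removes the entry first delivers the result, and the loser sees `null` and does nothing.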
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
Important: Pre-merge checks failed. Please resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (1 passed)
Actionable comments posted: 11
🧹 Nitpick comments (4)
header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java (1)
6-11: Unify the "Release" vs. "Deleted" semantics to avoid lifecycle-stage ambiguity. The interface/method names emphasize
Release, but the comment says "deleted from database"; unify the terminology, or adjust the naming/comment to match the actual trigger point, so the extension point is not hooked at the wrong stage. Based on learnings, extension points in ZStack are usually distinguished by execution stage, so naming and trigger semantics must stay strictly aligned.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java` around lines 6 - 11, The interface AfterReleaseVmNicExtensionPoint and its method afterReleaseVmNic currently mix "Release" in the name with "deleted from database" in the Javadoc; update either the name or the comment so lifecycle semantics match: either rename the interface/method to DeletedVmNicExtensionPoint/afterDeletedVmNic if it is invoked after DB deletion, or change the Javadoc to state the exact "release" stage (e.g., resource release/cleanup before/after DB deletion) if the name is correct; make the change where AfterReleaseVmNicExtensionPoint and afterReleaseVmNic are declared and ensure VmNicInventory references and any callers/registrations are updated consistently to avoid mismatched lifecycle semantics.

testlib/src/main/java/org/zstack/testlib/SdnControllerSpec.groovy (1)
62-64: Validate the query result before taking the first element. Lines 62-64 index
[0] directly; an empty query result throws during postCreate. Check for an empty list first and raise a clear error message. Suggested change
postCreate {
-    inventory = JSONObjectUtil.rehashObject(querySdnController {
-        conditions=["uuid=${inventory.uuid}".toString()]
-    }[0], SdnControllerInventory.class)
+    def controllers = querySdnController {
+        conditions = ["uuid=${inventory.uuid}".toString()]
+    }
+    if (!controllers) {
+        throw new IllegalStateException("Failed to query created SDN controller by uuid: ${inventory.uuid}")
+    }
+    inventory = JSONObjectUtil.rehashObject(controllers[0], SdnControllerInventory.class)
}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@testlib/src/main/java/org/zstack/testlib/SdnControllerSpec.groovy` around lines 62 - 64, do not index [0] directly in JSONObjectUtil.rehashObject(querySdnController { conditions=["uuid=${inventory.uuid}".toString()] }[0], SdnControllerInventory.class); in SdnControllerSpec's postCreate flow, first check whether the querySdnController result is null or an empty list (e.g., via its size/empty state), and if it is empty, throw or log a clear error (including inventory.uuid and context); otherwise take the first item and pass it to JSONObjectUtil.rehashObject to convert it to SdnControllerInventory.

header/src/main/java/org/zstack/header/network/l3/L3NetworkVO.java (1)
118-120: Extract the shared predicate to keep the VO and Inventory from drifting apart. This branch is nearly identical to
L3NetworkInventory.enableIpAddressAllocation(); consider sinking it into a static helper on L3NetworkType and reusing it in both places. ♻️ Suggested change
*** header/src/main/java/org/zstack/header/network/l3/L3NetworkType.java
@@
+    public static boolean isIpAddressAllocationEnabled(String typeName) {
+        if (!hasType(typeName)) {
+            return true;
+        }
+        return valueOf(typeName).isIpAddressAllocationEnabled();
+    }
*** header/src/main/java/org/zstack/header/network/l3/L3NetworkInventory.java
@@
-    if (L3NetworkType.hasType(getType())) {
-        return L3NetworkType.valueOf(getType()).isIpAddressAllocationEnabled();
-    }
-    return true;
+    return L3NetworkType.isIpAddressAllocationEnabled(getType());
*** header/src/main/java/org/zstack/header/network/l3/L3NetworkVO.java
@@
-    if (L3NetworkType.hasType(getType())) {
-        return L3NetworkType.valueOf(getType()).isIpAddressAllocationEnabled();
-    }
-    return true;
+    return L3NetworkType.isIpAddressAllocationEnabled(getType());

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/network/l3/L3NetworkVO.java` around lines 118 - 120, Extract the repeated type-checking logic into a single static helper on L3NetworkType and call it from both L3NetworkVO and L3NetworkInventory.enableIpAddressAllocation(): move the if (L3NetworkType.hasType(getType())) { return L3NetworkType.valueOf(getType()).isIpAddressAllocationEnabled(); } logic into a new static method on L3NetworkType (e.g., isIpAddressAllocationEnabled(String type) or isIpAddressAllocationEnabledFor(String type)) that performs the hasType check and returns the flag, then replace the existing branches in L3NetworkVO and L3NetworkInventory.enableIpAddressAllocation() to delegate to that new static method to avoid duplication and future drift.

plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java (1)
4183-4184: Extract the ZNS port constants instead of scattering magic values.
"br-int" and "openvswitch" should be extracted into class constants (or a shared constants class) to reduce the risk of typo drift and improve maintainability. ♻️ Suggested change
public class KVMHost extends HostBase implements Host {
    private static final CLogger logger = Utils.getLogger(KVMHost.class);
    private static final ZTester tester = Utils.getTester();
+    private static final String ZNS_BRIDGE_NAME = "br-int";
+    private static final String ZNS_BRIDGE_PORT_TYPE = "openvswitch";

-    to.setBridgeName("br-int");
-    to.setBridgePortType("openvswitch");
+    to.setBridgeName(ZNS_BRIDGE_NAME);
+    to.setBridgePortType(ZNS_BRIDGE_PORT_TYPE);

As per coding guidelines: avoid magic values; replace them with constants or enums.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java` around lines 4183 - 4184, Replace the magic strings used when configuring ZNS ports by extracting them into constants: replace the literal "br-int" passed to to.setBridgeName and "openvswitch" passed to to.setBridgePortType with class-level constants (e.g., BRIDGE_NAME_INT and BRIDGE_PORT_TYPE_OPENVSWITCH) declared in KVMHost (or a shared constants class) and use those constant names in the calls to to.setBridgeName(...) and to.setBridgePortType(...); ensure the constants are public/static/final and documented so all usages reference the single source of truth.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java`:
- Around line 121-145: A VmNicVO is persisted by vnicFactory.createVmNic(nic,
spec) before callBeforeAllocateVmNicExtensions(...) but on extension failure the
code only records the error and doesn't remove that persisted nic, leaving an
orphaned VmNicVO/ResourceVO; in the Completion.fail(ErrorCode) branch delete the
persisted nicVO (and associated ResourceVO/any created system tags) inside a DB
transaction/SQLBatch (use dbf.remove(...) or equivalent) before adding the error
and calling wcomp.allDone(), ensuring the cleanup references the nicVO created
earlier so the rollback won't leave orphaned records.
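A minimal sketch of the rollback pattern this comment asks for, with hypothetical names standing in for the real flow's vnicFactory, dbf, and SQLBatch: a row persisted before extensions run must be removed again in the failure branch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a record persisted before extensions run is deleted again
// on failure, so the failed allocation leaves no orphaned row behind.
class NicAllocator {
    final Set<String> db = new HashSet<>(); // stand-in for VmNicVO/ResourceVO rows

    interface Extension { void run(String nicUuid) throws Exception; }

    /** Returns the nic uuid on success, or null after rolling back on failure. */
    String allocate(String nicUuid, Extension ext) {
        db.add(nicUuid);            // persisted before extensions, as in the flow
        try {
            ext.run(nicUuid);
            return nicUuid;
        } catch (Exception e) {
            db.remove(nicUuid);     // the fix: delete the persisted row on failure
            return null;
        }
    }
}
```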
In `@core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java`:
- Around line 101-109: The current submit method starts the timeout task via
thdf.submitTimeoutTask before putting the PendingEntry into pendingCalls, which
can race (timeout may run before put) and drop completion; fix by creating and
registering the PendingEntry in pendingCalls first (e.g., new
PendingEntry<>(completion, null) and put it), then call thdf.submitTimeoutTask
and store the returned TimeoutTaskReceipt into that PendingEntry (or replace the
entry with one containing the receipt). Update references: method submit, map
pendingCalls, class PendingEntry, thdf.submitTimeoutTask, fail/operr and
protocol.getCallbackPath() so the timeout handler will always find and remove
the registered entry.
In `@docs/modules/network/pages/networkResource/ZnsIntegration.adoc`:
- Around line 15-29: The example marked [source,go] is not valid Go: replace the
inline `###` comments with proper Go comments or remove them, add the missing
`struct` keyword for `type Segment` and ensure the slice field is declared
correctly (e.g., `CmsMetaDatas []Cms` with a proper struct tag like
`json:"cms"`), and make field names/types conform to Go syntax in the `Cms`
struct (e.g., ensure fields like `CmsUuid`, `Type`, `IP`, `Role`,
`CmsResourceUuid` use valid types and optional `json` tags); alternatively, if
you intend pseudocode, change the block language from `go` to something like
`text` and add a note that it’s pseudocode so readers aren’t misled.
In
`@header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java`:
- Around line 10-11: Add method-level Javadoc to AfterReleaseVmNicExtensionPoint's afterReleaseVmNic
method describing the callback contract: state exactly when the framework invokes afterReleaseVmNic, require implementations to invoke the
Completion exactly once after processing completes (completion.success() for success, completion.fail(Throwable)
for failure), forbid blocking the calling thread or throwing uncaught exceptions, and document whether asynchronous completion is allowed plus any thread/transaction constraints; also remove the redundant public
modifier from the method declaration to follow interface method conventions (locator: AfterReleaseVmNicExtensionPoint.afterReleaseVmNic(VmNicInventory
nic, Completion completion)).
In `@network/src/main/java/org/zstack/network/l3/L3NetworkManagerImpl.java`:
- Line 197: In L3NetworkManagerImpl, avoid calling L3NetworkInventory.valueOf(...) directly on the return value of
dbf.findByUuid(...), which can cause an NPE: first assign the result of dbf.findByUuid(msg.getL3NetworkUuid(),
L3NetworkVO.class) to a local variable (e.g., vo or l3Vo) and null-check it; if it is null, call
trigger.fail(...) (with a clear error message including msg.getL3NetworkUuid()) and return; otherwise call
L3NetworkInventory.valueOf(...) and continue with the subsequent logic.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMRealizeL2VlanNetworkBackend.java`:
- Around line 252-254: The current check in KVMRealizeL2VlanNetworkBackend sets srcPath only for dpdk
vhost-user (L2NetworkConstant.ACCEL_TYPE_VHOST_USER_SPACE) and misses
vDPA (L2NetworkConstant.ACCEL_TYPE_VDPA), so those NICs are not configured correctly; in the branch containing to.setSrcPath(...),
extend the condition to match either ACCEL_TYPE_VHOST_USER_SPACE or ACCEL_TYPE_VDPA (checking nic.getType()), so that
to.setSrcPath uses L2NetworkConstant.OVN_DPDK_VNIC_SRC_PATH +
nic.getInternalName() for both kinds of accelerated NICs; make the symmetric change in KVMRealizeL2NoVlanNetworkBackend
to keep behavior consistent.
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java`:
- Around line 273-279: The current try/catch in SdnControllerManagerImpl around
tagMgr.createTagsFromAPICreateMessage(...) and dbf.findByUuid(...) silently
swallows initialization failures and still calls publish(event); remove the
catch so failures propagate (or rethrow the exception) instead of continuing on
the success path, ensuring event.setInventory(...) and tag creation errors
prevent publish(event) from returning success; locate the block that calls
tagMgr.createTagsFromAPICreateMessage(...), dbf.findByUuid(...), and
event.setInventory(...) and delete the surrounding try/catch (or replace with a
rethrow) so errors are not suppressed.
- Around line 661-680: The loop building nicMaps in SdnControllerManagerImpl
(currently iterating nics and calling dbf.findByUuid for L3NetworkVO and
L2NetworkVO and reading L2_NETWORK_SDN_CONTROLLER_UUID tag per NIC) causes N+1
queries; refactor by adding a batch helper (e.g., buildNicToControllerMap or
resolveNicsToControllers) that: collects all l3 uuids from the nics, does a
single dbf.listByIds/findByUuids for L3NetworkVO, collects all referenced l2
uuids and batch-fetches L2NetworkVOs, then resolves controllerUuid tokens for
those L2s in one pass and returns a Map<controllerUuid, List<VmNicInventory>>;
replace the three similar loops (around the shown block and at ranges 698-717,
734-753) to call this helper and handle null/missing controllerUuid by failing
the completion once with the same operr message referencing l2 uuid.
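The batching described above can be sketched with plain maps standing in for the batch query results (the helper and field names below are assumptions): fetch each table once, then resolve every NIC in memory instead of issuing per-NIC queries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the suggested buildNicToControllerMap helper. The two maps stand
// in for the results of one batched L3NetworkVO query and one batched
// L2NetworkVO/tag query, replacing the per-NIC dbf.findByUuid calls.
class NicControllerResolver {
    final Map<String, String> l3ToL2 = new HashMap<>();          // l3 uuid -> l2 uuid
    final Map<String, String> l2ToController = new HashMap<>();  // l2 uuid -> controller uuid

    /** nicToL3 maps nic uuid -> l3 uuid; returns controller uuid -> nic uuids. */
    Map<String, List<String>> buildNicToControllerMap(Map<String, String> nicToL3) {
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, String> e : nicToL3.entrySet()) {
            String l2 = l3ToL2.get(e.getValue());
            String controller = (l2 == null) ? null : l2ToController.get(l2);
            if (controller == null) {
                // fail fast once, as the review suggests, instead of skipping silently
                throw new IllegalStateException("no SDN controller resolved for nic " + e.getKey());
            }
            result.computeIfAbsent(controller, k -> new ArrayList<>()).add(e.getKey());
        }
        return result;
    }
}
```

With the lookups prefetched, resolving N NICs is three queries total rather than up to 3N, and a missing L3/L2/controller surfaces as a single failure instead of an empty result map.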
- Around line 663-670: The loop that looks up L3NetworkVO and L2NetworkVO should
fail-fast instead of silently continuing: when
dbf.findByUuid(nic.getL3NetworkUuid(), L3NetworkVO.class) or
dbf.findByUuid(l3Vo.getL2NetworkUuid(), L2NetworkVO.class) returns null, stop
processing and fail the allocation (e.g., call completion.fail or throw an
exception) with a clear error message including the missing UUID and NIC info;
do not let nicMaps remain empty and return completion.success(). Update the
logic around shouldSkipSdnForNic to still allow skipping when intentional, but
treat null VO results as fatal data-inconsistency errors and surface them
immediately.
In `@rest/src/main/resources/scripts/RestDocumentationGenerator.groovy`:
- Around line 2846-2849: The loop currently only catches CloudRuntimeException
when calling checkMD(mdPath, it.value), so other runtime exceptions (e.g.,
IndexOutOfBoundsException, NullPointerException thrown from checkMD or
getExistGlobalConfigMarkDown) will break the loop and prevent collecting all
errors; update the try/catch to catch Exception (or RuntimeException) in
addition to CloudRuntimeException around the call sites (checkMD and any calls
to getExistGlobalConfigMarkDown) and add their messages to allErrors (preserving
existing CloudRuntimeException behavior) so the loop continues and aggregates
every markdown validation failure.
- Around line 851-861: The current validation uses checks like markDown.desc_CN.isEmpty(), but
GlobalConfigMarkDown is parsed starting from new GlobalConfigMarkDown(), whose field defaults are "PLACEHOLDER...", so
.isEmpty() misses sections that never appeared at all. Update the validation logic (in RestDocumentationGenerator.groovy, where the GlobalConfigMarkDown returned by
getExistGlobalConfigMarkDown is processed) to use a helper emptiness check (e.g.,
isMissingField(String s)) that treats null, trim().isEmpty(), and values starting with
"PLACEHOLDER" (or the project's actual placeholder prefix) as missing, replacing all .isEmpty() calls on
desc_CN, name_CN, valueRangeRemark, defaultValueRemark, resourcesGranularitiesRemark, additionalRemark, backgroundInformation, isUIExposed, and isCLIExposed;
alternatively, change getExistGlobalConfigMarkDown so fields absent from the markdown return
null, and apply the same check accordingly.
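The helper suggested above is small enough to show in full. A sketch in Java (the production file is Groovy, and "PLACEHOLDER" is an assumption about the actual default prefix):

```java
// Treats null, blank, and unchanged placeholder defaults as "missing", so
// markdown sections that never appeared are no longer silently passed.
class MarkdownFieldCheck {
    static final String PLACEHOLDER_PREFIX = "PLACEHOLDER";

    static boolean isMissingField(String s) {
        if (s == null) {
            return true;
        }
        String trimmed = s.trim();
        return trimmed.isEmpty() || trimmed.startsWith(PLACEHOLDER_PREFIX);
    }
}
```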
---
Nitpick comments:
In `@header/src/main/java/org/zstack/header/network/l3/L3NetworkVO.java`:
- Around line 118-120: Extract the repeated type-checking logic into a single
static helper on L3NetworkType and call it from both L3NetworkVO and
L3NetworkInventory.enableIpAddressAllocation(): move the if
(L3NetworkType.hasType(getType())) { return
L3NetworkType.valueOf(getType()).isIpAddressAllocationEnabled(); } logic into a
new L3NetworkType.static method (e.g., isIpAddressAllocationEnabled(String type)
or isIpAddressAllocationEnabledFor(String type)) that performs the hasType check
and returns the flag, then replace the existing branches in L3NetworkVO and
L3NetworkInventory.enableIpAddressAllocation() to delegate to that new static
method to avoid duplication and future drift.
In
`@header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java`:
- Around line 6-11: The interface AfterReleaseVmNicExtensionPoint and its method
afterReleaseVmNic currently mix "Release" in the name with "deleted from
database" in the Javadoc; update either the name or the comment so lifecycle
semantics match: either rename the interface/method to
DeletedVmNicExtensionPoint/afterDeletedVmNic if it is invoked after DB deletion,
or change the Javadoc to state the exact "release" stage (e.g., resource
release/cleanup before/after DB deletion) if the name is correct; make the
change where AfterReleaseVmNicExtensionPoint and afterReleaseVmNic are declared
and ensure VmNicInventory references and any callers/registrations are updated
consistently to avoid mismatched lifecycle semantics.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java`:
- Around line 4183-4184: Replace the magic strings used when configuring ZNS
ports by extracting them into constants: replace the literal "br-int" passed to
to.setBridgeName and "openvswitch" passed to to.setBridgePortType with
class-level constants (e.g., BRIDGE_NAME_INT and BRIDGE_PORT_TYPE_OPENVSWITCH)
declared in KVMHost (or a shared constants class) and use those constant names
in the calls to to.setBridgeName(...) and to.setBridgePortType(...); ensure the
constants are public/static/final and documented so all usages reference the
single source of truth.
In `@testlib/src/main/java/org/zstack/testlib/SdnControllerSpec.groovy`:
- Around line 62-64: Before calling JSONObjectUtil.rehashObject(querySdnController {
conditions=["uuid=${inventory.uuid}".toString()] }[0],
SdnControllerInventory.class), do not index [0] directly; in SdnControllerSpec's postCreate
flow, first check whether the querySdnController result is null or an empty list (e.g., via its
size/empty state), and if it is empty, throw or log a clear error (including inventory.uuid and context); otherwise take the first item and pass it to
JSONObjectUtil.rehashObject to convert it to SdnControllerInventory.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 190b27bf-f5cc-42a9-9b21-26c5d0efaf60
⛔ Files ignored due to path filters (9)
- build/pom.xml is excluded by !**/*.xml
- conf/springConfigXml/VmInstanceManager.xml is excluded by !**/*.xml
- conf/springConfigXml/sdnController.xml is excluded by !**/*.xml
- sdk/src/main/java/SourceClassMap.java is excluded by !sdk/**
- sdk/src/main/java/org/zstack/sdk/CreateL2GeneveNetworkAction.java is excluded by !sdk/**
- sdk/src/main/java/org/zstack/sdk/L2GeneveNetworkInventory.java is excluded by !sdk/**
- sdk/src/main/java/org/zstack/sdk/NicTO.java is excluded by !sdk/**
- sdk/src/main/java/org/zstack/sdk/ZnsControllerInventory.java is excluded by !sdk/**
- sdk/src/main/java/org/zstack/sdk/ZnsTransportZoneInventory.java is excluded by !sdk/**
📒 Files selected for processing (40)
- compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java
- compute/src/main/java/org/zstack/compute/vm/VmAllocateSdnNicFlow.java
- compute/src/main/java/org/zstack/compute/vm/VmDetachNicFlow.java
- compute/src/main/java/org/zstack/compute/vm/VmInstanceBase.java
- compute/src/main/java/org/zstack/compute/vm/VmNicManagerImpl.java
- compute/src/main/java/org/zstack/compute/vm/VmReturnReleaseNicFlow.java
- compute/src/main/java/org/zstack/compute/vm/VmSystemTags.java
- conf/db/upgrade/V5.5.18__schema.sql
- core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java
- core/src/main/java/org/zstack/core/rest/webhook/WebhookProtocol.java
- docs/modules/network/nav.adoc
- docs/modules/network/pages/networkResource/ZStackL2NetworkType.adoc
- docs/modules/network/pages/networkResource/ZnsIntegration.adoc
- docs/modules/network/pages/networkResource/networkResource.adoc
- header/src/main/java/org/zstack/header/network/l2/L2NetworkConstant.java
- header/src/main/java/org/zstack/header/network/l3/AfterSetL3NetworkMtuExtensionPoint.java
- header/src/main/java/org/zstack/header/network/l3/L3NetworkInventory.java
- header/src/main/java/org/zstack/header/network/l3/L3NetworkType.java
- header/src/main/java/org/zstack/header/network/l3/L3NetworkVO.java
- header/src/main/java/org/zstack/header/vm/AfterAllocateSdnNicExtensionPoint.java
- header/src/main/java/org/zstack/header/vm/AfterAllocateVmNicIpExtensionPoint.java
- header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java
- header/src/main/java/org/zstack/header/vm/BeforeAllocateVmNicExtensionPoint.java
- header/src/main/java/org/zstack/header/vm/VmInstanceNicFactory.java
- header/src/main/java/org/zstack/header/vm/VmOvsNicConstant.java
- network/src/main/java/org/zstack/network/l3/L3NetworkManagerImpl.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMAgentCommands.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMHost.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMRealizeL2NoVlanNetworkBackend.java
- plugin/kvm/src/main/java/org/zstack/kvm/KVMRealizeL2VlanNetworkBackend.java
- plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerBase.java
- plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
- plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java
- plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnControllerFactory.java
- plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnControllerFactory.java
- rest/src/main/resources/scripts/RestDocumentationGenerator.groovy
- testlib/src/main/java/org/zstack/testlib/ApiHelper.groovy
- testlib/src/main/java/org/zstack/testlib/SdnControllerSpec.groovy
- utils/src/main/java/org/zstack/utils/clouderrorcode/CloudOperationsErrorCode.java
💤 Files with no reviewable changes (4)
- plugin/sdnController/src/main/java/org/zstack/sdnController/h3cVcfc/H3cVcfcSdnControllerFactory.java
- plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerFactory.java
- header/src/main/java/org/zstack/header/vm/VmOvsNicConstant.java
- plugin/sugonSdnController/src/main/java/org/zstack/sugonSdnController/controller/SugonSdnControllerFactory.java
// Persist VmNicVO first so that ResourceVO entry exists before extensions
// (e.g. SDN controllers) attempt to create SystemTags referencing the NIC UUID.
VmNicVO nicVO = vnicFactory.createVmNic(nic, spec);

nic.setDeviceId(deviceId);
nic.setInternalName(VmNicVO.generateNicInternalName(spec.getVmInventory().getInternalId(), nic.getDeviceId()));
nic.setState(disableL3Networks.contains(nic.getL3NetworkUuid()) ? VmNicState.disable.toString() : VmNicState.enable.toString());
new SQLBatch() {
callBeforeAllocateVmNicExtensions(nic, spec, new Completion(wcomp) {
    @Override
    protected void scripts() {
        VmNicVO nicVO = vnicFactory.createVmNic(nic, spec);
        if (!nw.enableIpAddressAllocation() && nicNetworkInfoMap != null
                && nicNetworkInfoMap.containsKey(nw.getUuid())
                && spec.getVmInventory().getType().equals(VmInstanceConstant.USER_VM_TYPE)) {
            NicIpAddressInfo nicNicIpAddressInfo = nicNetworkInfoMap.get(nic.getL3NetworkUuid());
            if (!nicNicIpAddressInfo.ipv6Address.isEmpty()) {
                UsedIpVO vo = new UsedIpVO();
                vo.setUuid(Platform.getUuid());
                vo.setIp(IPv6NetworkUtils.getIpv6AddressCanonicalString(nicNicIpAddressInfo.ipv6Address));
                vo.setNetmask(IPv6NetworkUtils.getFormalNetmaskOfNetworkCidr(nicNicIpAddressInfo.ipv6Address + "/" + nicNicIpAddressInfo.ipv6Prefix));
                vo.setGateway(nicNicIpAddressInfo.ipv6Gateway.isEmpty() ? "" : IPv6NetworkUtils.getIpv6AddressCanonicalString(nicNicIpAddressInfo.ipv6Gateway));
                vo.setIpVersion(IPv6Constants.IPv6);
                vo.setVmNicUuid(nic.getUuid());
                vo.setL3NetworkUuid(nic.getL3NetworkUuid());
                vo.setIpInBinary(NetworkUtils.ipStringToBytes(vo.getIp()));
                vo.setIpRangeUuid(new StaticIpOperator().getIpRangeUuid(nic.getL3NetworkUuid(), vo.getIp()));
                nic.setUsedIpUuid(vo.getUuid());
                nicVO.setUsedIpUuid(vo.getUuid());
                nicVO.setIp(vo.getIp());
                nicVO.setNetmask(vo.getNetmask());
                nicVO.setGateway(vo.getGateway());
                dbf.persist(vo);
    public void success() {
        new SQLBatch() {
            @Override
            protected void scripts() {
                persistStaticIpIfNeeded(nic, nicVO, nw, nicNetworkInfoMap, spec);
                nics.add(nic);
                VmNicVO updated = dbf.updateAndRefresh(nicVO);
                addVmNicConfig(updated, spec, nicSpec);
            }
            if (!nicNicIpAddressInfo.ipv4Address.isEmpty()) {
                UsedIpVO vo = new UsedIpVO();
                vo.setUuid(Platform.getUuid());
                vo.setIp(nicNicIpAddressInfo.ipv4Address);
                vo.setGateway(nicNicIpAddressInfo.ipv4Gateway);
                vo.setNetmask(nicNicIpAddressInfo.ipv4Netmask);
                vo.setIpVersion(IPv6Constants.IPv4);
                vo.setVmNicUuid(nic.getUuid());
                vo.setL3NetworkUuid(nic.getL3NetworkUuid());
                vo.setIpInLong(NetworkUtils.ipv4StringToLong(vo.getIp()));
                vo.setIpInBinary(NetworkUtils.ipStringToBytes(vo.getIp()));
                vo.setIpRangeUuid(new StaticIpOperator().getIpRangeUuid(nic.getL3NetworkUuid(), vo.getIp()));
                nic.setUsedIpUuid(vo.getUuid());
                nicVO.setUsedIpUuid(vo.getUuid());
                nicVO.setIp(vo.getIp());
                nicVO.setNetmask(vo.getNetmask());
                nicVO.setGateway(vo.getGateway());
                dbf.persist(vo);
            }
        }
        nics.add(nic);
        nicVO = dbf.updateAndRefresh(nicVO);
        addVmNicConfig(nicVO, spec, nicSpec);
    }.execute();
    wcomp.done();
    }
}.execute();
wcomp.done();

    @Override
    public void fail(ErrorCode errorCode) {
        errs.add(errorCode);
        wcomp.allDone();
    }
});
Clean up the already-persisted VmNicVO when an extension fails.
Line 123 already persists the VmNicVO, but only the success branch at Line 132 adds this NIC to the downstream collections; on failure, Lines 141-144 merely record the error and finish. The current rollback logic iterates spec.getDestNics(), which cannot see this not-yet-added NIC, so an orphaned VmNicVO/ResourceVO is left behind.
🛠️ One direct fix direction
callBeforeAllocateVmNicExtensions(nic, spec, new Completion(wcomp) {
    @Override
    public void success() {
        new SQLBatch() {
@@
    @Override
    public void fail(ErrorCode errorCode) {
+       vnicFactory.releaseVmNic(nic);
        errs.add(errorCode);
        wcomp.allDone();
    }
});

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmAllocateNicFlow.java` around
lines 121 - 145, A VmNicVO is persisted by vnicFactory.createVmNic(nic, spec)
before callBeforeAllocateVmNicExtensions(...) but on extension failure the code
only records the error and doesn't remove that persisted nic, leaving an
orphaned VmNicVO/ResourceVO; in the Completion.fail(ErrorCode) branch delete the
persisted nicVO (and associated ResourceVO/any created system tags) inside a DB
transaction/SQLBatch (use dbf.remove(...) or equivalent) before adding the error
and calling wcomp.allDone(), ensuring the cleanup references the nicVO created
earlier so the rollback won't leave orphaned records.
public String submit(ReturnValueCompletion<T> completion, TimeUnit unit, long timeout) {
    String taskId = Platform.getUuid();

    ThreadFacadeImpl.TimeoutTaskReceipt timeoutReceipt = thdf.submitTimeoutTask(() -> {
        fail(taskId, operr("[Webhook Timeout] callback timed out for taskId[%s], path[%s]",
                taskId, protocol.getCallbackPath()));
    }, unit, timeout);

    pendingCalls.put(taskId, new PendingEntry<>(completion, timeoutReceipt));
Register the pending entry first, then start the timeout task.
Line 104 starts the timeout before Line 109; if timeout == 0, or the scheduler thread runs immediately, fail(taskId, ...) reaches pendingCalls.remove(taskId) first and simply returns. The subsequent put on Line 109 then turns this call into a dangling request that never times out and is never completed.
🛠️ One workable fix pattern
🛠️ 一个可行的修复模式
private static class PendingEntry<T> {
final ReturnValueCompletion<T> completion;
- final ThreadFacadeImpl.TimeoutTaskReceipt timeoutReceipt;
+ volatile ThreadFacadeImpl.TimeoutTaskReceipt timeoutReceipt;
- PendingEntry(ReturnValueCompletion<T> completion,
- ThreadFacadeImpl.TimeoutTaskReceipt timeoutReceipt) {
+ PendingEntry(ReturnValueCompletion<T> completion) {
this.completion = completion;
- this.timeoutReceipt = timeoutReceipt;
}
}
@@
public String submit(ReturnValueCompletion<T> completion, TimeUnit unit, long timeout) {
String taskId = Platform.getUuid();
-
- ThreadFacadeImpl.TimeoutTaskReceipt timeoutReceipt = thdf.submitTimeoutTask(() -> {
- fail(taskId, operr("[Webhook Timeout] callback timed out for taskId[%s], path[%s]",
- taskId, protocol.getCallbackPath()));
- }, unit, timeout);
-
- pendingCalls.put(taskId, new PendingEntry<>(completion, timeoutReceipt));
+ PendingEntry<T> entry = new PendingEntry<>(completion);
+ pendingCalls.put(taskId, entry);
+ entry.timeoutReceipt = thdf.submitTimeoutTask(() -> {
+ fail(taskId, operr("[Webhook Timeout] callback timed out for taskId[%s], path[%s]",
+ taskId, protocol.getCallbackPath()));
+ }, unit, timeout);
return taskId;
}
@@
PendingEntry<T> entry = pendingCalls.remove(taskId);
if (entry != null) {
- entry.timeoutReceipt.cancel();
+ if (entry.timeoutReceipt != null) {
+ entry.timeoutReceipt.cancel();
+ }
entry.completion.fail(error);
}
@@
PendingEntry<T> entry = pendingCalls.remove(taskId);
if (entry == null) {
@@
- entry.timeoutReceipt.cancel();
+ if (entry.timeoutReceipt != null) {
+ entry.timeoutReceipt.cancel();
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@core/src/main/java/org/zstack/core/rest/webhook/WebhookCallbackClient.java`
around lines 101 - 109, The current submit method starts the timeout task via
thdf.submitTimeoutTask before putting the PendingEntry into pendingCalls, which
can race (timeout may run before put) and drop completion; fix by creating and
registering the PendingEntry in pendingCalls first (e.g., new
PendingEntry<>(completion, null) and put it), then call thdf.submitTimeoutTask
and store the returned TimeoutTaskReceipt into that PendingEntry (or replace the
entry with one containing the receipt). Update references: method submit, map
pendingCalls, class PendingEntry, thdf.submitTimeoutTask, fail/operr and
protocol.getCallbackPath() so the timeout handler will always find and remove
the registered entry.
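The ordering fix above (register the pending entry before arming the timeout) can be sketched in isolation. This is a hypothetical standalone model: `PendingCallSketch`, the `ScheduledExecutorService` timer, and the `Entry` class are stand-ins for the real `WebhookCallbackClient`, `thdf.submitTimeoutTask`, and `PendingEntry`.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.*;

// Sketch of the race-free ordering: the entry is in the map BEFORE the
// timeout task is scheduled, so a timeout firing immediately still finds
// and removes the entry instead of silently dropping the completion.
public class PendingCallSketch {
    // Minimal stand-in for PendingEntry: the timeout receipt is filled in
    // after registration, so it may still be null when fail() runs.
    public static final class Entry {
        final String payload;
        volatile ScheduledFuture<?> timeout;
        Entry(String payload) { this.payload = payload; }
    }

    public static final Map<String, Entry> pending = new ConcurrentHashMap<>();
    public static final ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();

    public static String submit(String payload, long timeoutMs) {
        String taskId = UUID.randomUUID().toString();
        Entry entry = new Entry(payload);
        pending.put(taskId, entry);                       // 1. register first
        entry.timeout = timer.schedule(                   // 2. then arm timeout
                () -> fail(taskId), timeoutMs, TimeUnit.MILLISECONDS);
        return taskId;
    }

    public static void fail(String taskId) {
        Entry e = pending.remove(taskId);
        if (e != null && e.timeout != null) {
            e.timeout.cancel(false);                      // receipt may be null mid-submit
        }
    }

    public static boolean complete(String taskId) {
        Entry e = pending.remove(taskId);
        if (e == null) {
            return false;                                 // already timed out
        }
        if (e.timeout != null) {
            e.timeout.cancel(false);
        }
        return true;
    }
}
```

Because the entry is registered first, the null check on `timeout` in `fail`/`complete` is the only extra cost of the reordering; the window where the receipt is unset is harmless since removal still succeeds.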
[source,go]
----
type Cms struct {
    CmsUuid string
    Type string ### cloud/zsv/zaku/zns
    IP string ### cloud mn vip
    Role string ###owner, user
    CmsResourceUuid string ###owner, user
}

type Segment {
    ...
    CmsMetaDatas []Cms `json:"cms"`
}
----
Make this example valid Go, or explicitly mark it as pseudocode.
The block is tagged [source,go], but the snippet itself is not valid Go: ### is not a Go comment, and type Segment { is missing the struct keyword. This misleads readers into copying the example verbatim and makes the syntax highlighting give wrong cues.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/modules/network/pages/networkResource/ZnsIntegration.adoc` around lines
15 - 29, The example marked [source,go] is not valid Go: replace the inline
`###` comments with proper Go comments or remove them, add the missing `struct`
keyword for `type Segment` and ensure the slice field is declared correctly
(e.g., `CmsMetaDatas []Cms` with a proper struct tag like `json:"cms"`), and
make field names/types conform to Go syntax in the `Cms` struct (e.g., ensure
fields like `CmsUuid`, `Type`, `IP`, `Role`, `CmsResourceUuid` use valid types
and optional `json` tags); alternatively, if you intend pseudocode, change the
block language from `go` to something like `text` and add a note that it’s
pseudocode so readers aren’t misled.
public interface AfterReleaseVmNicExtensionPoint {
    void afterReleaseVmNic(VmNicInventory nic, Completion completion);
Add method-level Javadoc to the interface method, spelling out the callback contract.
There is currently only an interface-level description; the calling constraints on Completion (for example the success/failure branches and when it is invoked) are not explicit, so extension implementers can easily misuse it.
Suggested patch
 public interface AfterReleaseVmNicExtensionPoint {
+    /**
+     * Called after VmNic reaches this extension point stage.
+     * Implementations must complete via {@code completion.success()} or {@code completion.fail()}.
+     */
     void afterReleaseVmNic(VmNicInventory nic, Completion completion);
 }
As per coding guidelines, "interface methods must not carry redundant modifiers (e.g. public) and must be accompanied by valid Javadoc comments."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public interface AfterReleaseVmNicExtensionPoint {
    /**
     * Called after VmNic reaches this extension point stage.
     * Implementations must complete via {@code completion.success()} or {@code completion.fail()}.
     */
    void afterReleaseVmNic(VmNicInventory nic, Completion completion);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@header/src/main/java/org/zstack/header/vm/AfterReleaseVmNicExtensionPoint.java`
around lines 10 - 11, add method-level Javadoc to the afterReleaseVmNic method
of the AfterReleaseVmNicExtensionPoint interface stating the callback contract:
clarify when the framework invokes afterReleaseVmNic, require implementations to
invoke the Completion exactly once after processing (completion.success() for
success, completion.fail(Throwable) for failure), forbid blocking the calling
thread or throwing uncaught exceptions, and state whether asynchronous completion
is allowed along with any thread/transaction constraints; also remove the
redundant public modifier from the method declaration to conform to interface
method conventions (locator: AfterReleaseVmNicExtensionPoint.afterReleaseVmNic(VmNicInventory
nic, Completion completion)).
    return;
}

L3NetworkInventory l3Inv = L3NetworkInventory.valueOf(dbf.findByUuid(msg.getL3NetworkUuid(), L3NetworkVO.class));
Add a null guard on the L3 lookup result to avoid an NPE.
Line 197 calls L3NetworkInventory.valueOf(...) directly on the result of dbf.findByUuid(...); under concurrent deletion this can trigger a NullPointerException and surface an unclear error. Check for null first and call trigger.fail(...) explicitly.
💡 Suggested change
- L3NetworkInventory l3Inv = L3NetworkInventory.valueOf(dbf.findByUuid(msg.getL3NetworkUuid(), L3NetworkVO.class));
+ L3NetworkVO l3Vo = dbf.findByUuid(msg.getL3NetworkUuid(), L3NetworkVO.class);
+ if (l3Vo == null) {
+ trigger.fail(operr("cannot find l3Network[uuid:%s]", msg.getL3NetworkUuid()));
+ return;
+ }
+ L3NetworkInventory l3Inv = L3NetworkInventory.valueOf(l3Vo);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@network/src/main/java/org/zstack/network/l3/L3NetworkManagerImpl.java` at
line 197, avoid calling L3NetworkInventory.valueOf(...) directly on the return
value of dbf.findByUuid(...) in L3NetworkManagerImpl, which can cause an NPE:
first assign the result of dbf.findByUuid(msg.getL3NetworkUuid(),
L3NetworkVO.class) to a local variable (e.g. vo or l3Vo) and null-check it; if
it is null, call trigger.fail(...) (with a clear error message including
msg.getL3NetworkUuid()) and return; otherwise call
L3NetworkInventory.valueOf(...) and continue with the subsequent logic.
try {
    tagMgr.createTagsFromAPICreateMessage(msg, vo.getUuid(), SdnControllerVO.class.getSimpleName());
    event.setInventory(SdnControllerInventory.valueOf(dbf.findByUuid(vo.getUuid(), SdnControllerVO.class)));
} catch (Exception e) {
    logger.warn(String.format("failed to load SdnControllerVO[uuid:%s] after init: %s",
            vo.getUuid(), e.getMessage()), e);
}
Do not swallow initialization failures on the success path.
If either line 274 or line 275 fails, this branch still proceeds to publish(event), so the caller receives success even though the tag may not have been created and the inventory may be empty. That turns "created successfully" into a silent partial success.
Based on learnings: in ZStack, tagMgr.createTagsFromAPICreateMessage(...) and the subsequent dbf.findByUuid(...) on the resource-creation success path do not need a try/catch to defend against "resource not found" exceptions; consistency should be guaranteed by lifecycle management and database constraints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java`
around lines 273 - 279, The current try/catch in SdnControllerManagerImpl around
tagMgr.createTagsFromAPICreateMessage(...) and dbf.findByUuid(...) silently
swallows initialization failures and still calls publish(event); remove the
catch so failures propagate (or rethrow the exception) instead of continuing on
the success path, ensuring event.setInventory(...) and tag creation errors
prevent publish(event) from returning success; locate the block that calls
tagMgr.createTagsFromAPICreateMessage(...), dbf.findByUuid(...), and
event.setInventory(...) and delete the surrounding try/catch (or replace with a
rethrow) so errors are not suppressed.
Map<String, List<VmNicInventory>> nicMaps = new HashMap<>();
for (VmNicInventory nic : nics) {
    L3NetworkVO l3Vo = dbf.findByUuid(nic.getL3NetworkUuid(), L3NetworkVO.class);
    if (l3Vo == null) {
        continue;
    }

    L2NetworkVO l2VO = dbf.findByUuid(l3Vo.getL2NetworkUuid(), L2NetworkVO.class);
    if (l2VO == null || shouldSkipSdnForNic(l2VO)) {
        continue;
    }

    String controllerUuid = L2NetworkSystemTags.L2_NETWORK_SDN_CONTROLLER_UUID.getTokenByResourceUuid(
            l2VO.getUuid(), L2NetworkSystemTags.L2_NETWORK_SDN_CONTROLLER_UUID_TOKEN);
    if (controllerUuid == null) {
        completion.fail(operr(ORG_ZSTACK_SDNCONTROLLER_10006, "sdn l2 network[uuid:%s] has not attached controller", l2VO.getUuid()));
        return;
    }

    nicMaps.computeIfAbsent(controllerUuid, k -> new ArrayList<>()).add(nic);
🛠️ Refactor suggestion | 🟠 Major
Extract the NIC→controller resolution into a batch helper.
These three passages run two findByUuid() calls per NIC and are copied almost verbatim three times; with many NICs this degrades into an obvious N+1 query pattern. Batch-fetch the L3NetworkVO / L2NetworkVO / controller-tag mappings first, then reuse a shared helper to build nicMaps.
As per coding guidelines, "queries must not be nested inside loops, to avoid the performance cost of nested queries."
Also applies to: 698-717, 734-753
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java`
around lines 661 - 680, The loop building nicMaps in SdnControllerManagerImpl
(currently iterating nics and calling dbf.findByUuid for L3NetworkVO and
L2NetworkVO and reading L2_NETWORK_SDN_CONTROLLER_UUID tag per NIC) causes N+1
queries; refactor by adding a batch helper (e.g., buildNicToControllerMap or
resolveNicsToControllers) that: collects all l3 uuids from the nics, does a
single dbf.listByIds/findByUuids for L3NetworkVO, collects all referenced l2
uuids and batch-fetches L2NetworkVOs, then resolves controllerUuid tokens for
those L2s in one pass and returns a Map<controllerUuid, List<VmNicInventory>>;
replace the three similar loops (around the shown block and at ranges 698-717,
734-753) to call this helper and handle null/missing controllerUuid by failing
the completion once with the same operr message referencing l2 uuid.
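The batch helper described in the prompt can be sketched roughly as follows. This is a hypothetical, self-contained model: `NicControllerResolver` and its lookup maps stand in for the two batched `findByUuids`-style queries and the system-tag query that the real code in `SdnControllerManagerImpl` would issue; they are not existing APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a buildNicToControllerMap-style helper: all database lookups are
// done up front in bulk (modeled here as maps), so the per-NIC loop touches
// only in-memory data instead of issuing two findByUuid calls per NIC.
public class NicControllerResolver {
    public static class Nic {
        public final String uuid;
        public final String l3Uuid;
        public Nic(String uuid, String l3Uuid) {
            this.uuid = uuid;
            this.l3Uuid = l3Uuid;
        }
    }

    public static Map<String, List<Nic>> resolve(List<Nic> nics,
                                                 Map<String, String> l3ToL2,         // one batched L3 query
                                                 Map<String, String> l2ToController) { // one batched tag query
        Map<String, List<Nic>> byController = new LinkedHashMap<>();
        for (Nic nic : nics) {
            String l2 = l3ToL2.get(nic.l3Uuid);
            if (l2 == null) {
                continue; // or fail fast, as the adjacent review comment argues
            }
            String controller = l2ToController.get(l2);
            if (controller == null) {
                continue;
            }
            byController.computeIfAbsent(controller, k -> new ArrayList<>()).add(nic);
        }
        return byController;
    }
}
```

The three near-identical loops could then each call this helper and only differ in how they handle a missing controller mapping.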
L3NetworkVO l3Vo = dbf.findByUuid(nic.getL3NetworkUuid(), L3NetworkVO.class);
if (l3Vo == null) {
    continue;
}

L2NetworkVO l2VO = dbf.findByUuid(l3Vo.getL2NetworkUuid(), L2NetworkVO.class);
if (l2VO == null || shouldSkipSdnForNic(l2VO)) {
    continue;
The allocation path must not silently skip missing network objects.
Here a missing L3NetworkVO / L2NetworkVO simply hits continue, so nicMaps can end up empty and completion.success() is returned. The VM flow then completes successfully even though no SDN port was ever created. The allocation path should fail fast rather than treat this data inconsistency as ignorable.
🛠️ A safer approach
for (VmNicInventory nic : nics) {
L3NetworkVO l3Vo = dbf.findByUuid(nic.getL3NetworkUuid(), L3NetworkVO.class);
if (l3Vo == null) {
- continue;
+ completion.fail(operr("cannot find l3Network[uuid:%s] for vmNic[uuid:%s]",
+ nic.getL3NetworkUuid(), nic.getUuid()));
+ return;
}
L2NetworkVO l2VO = dbf.findByUuid(l3Vo.getL2NetworkUuid(), L2NetworkVO.class);
- if (l2VO == null || shouldSkipSdnForNic(l2VO)) {
- continue;
+ if (l2VO == null) {
+ completion.fail(operr("cannot find l2Network[uuid:%s] for vmNic[uuid:%s]",
+ l3Vo.getL2NetworkUuid(), nic.getUuid()));
+ return;
+ }
+ if (shouldSkipSdnForNic(l2VO)) {
+ continue;
 }
Based on learnings: backend realization code can safely assume the existence of the VO for the resource being operated on, due to lifecycle management and strong foreign key constraints in the database schema.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@plugin/sdnController/src/main/java/org/zstack/sdnController/SdnControllerManagerImpl.java`
around lines 663 - 670, The loop that looks up L3NetworkVO and L2NetworkVO
should fail-fast instead of silently continuing: when
dbf.findByUuid(nic.getL3NetworkUuid(), L3NetworkVO.class) or
dbf.findByUuid(l3Vo.getL2NetworkUuid(), L2NetworkVO.class) returns null, stop
processing and fail the allocation (e.g., call completion.fail or throw an
exception) with a clear error message including the missing UUID and NIC info;
do not let nicMaps remain empty and return completion.success(). Update the
logic around shouldSkipSdnForNic to still allow skipping when intentional, but
treat null VO results as fatal data-inconsistency errors and surface them
immediately.
GlobalConfigMarkDown markDown = getExistGlobalConfigMarkDown(mdPath)
List<String> missingFields = []
if (markDown.desc_CN.isEmpty()) missingFields.add("desc_CN")
if (markDown.name_CN.isEmpty()) missingFields.add("name_CN")
if (markDown.valueRangeRemark.isEmpty()) missingFields.add("valueRangeRemark")
if (markDown.defaultValueRemark.isEmpty()) missingFields.add("defaultValueRemark")
if (markDown.resourcesGranularitiesRemark.isEmpty()) missingFields.add("resourcesGranularitiesRemark")
if (markDown.additionalRemark.isEmpty()) missingFields.add("additionalRemark")
if (markDown.backgroundInformation.isEmpty()) missingFields.add("backgroundInformation")
if (markDown.isUIExposed.isEmpty()) missingFields.add("isUIExposed")
if (markDown.isCLIExposed.isEmpty()) missingFields.add("isCLIExposed")
Missing sections are misreported as filled in.
Using .isEmpty() on the required fields misses the case where an entire section is absent. getExistGlobalConfigMarkDown() parses starting from new GlobalConfigMarkDown(), whose fields all default to non-empty PLACEHOLDER... values; if the markdown contains no matching heading at all, the parsed field is still not an empty string, so the current validation passes.
Suggested fix
GlobalConfigMarkDown markDown = getExistGlobalConfigMarkDown(mdPath)
List<String> missingFields = []
- if (markDown.desc_CN.isEmpty()) missingFields.add("desc_CN")
- if (markDown.name_CN.isEmpty()) missingFields.add("name_CN")
- if (markDown.valueRangeRemark.isEmpty()) missingFields.add("valueRangeRemark")
- if (markDown.defaultValueRemark.isEmpty()) missingFields.add("defaultValueRemark")
- if (markDown.resourcesGranularitiesRemark.isEmpty()) missingFields.add("resourcesGranularitiesRemark")
- if (markDown.additionalRemark.isEmpty()) missingFields.add("additionalRemark")
- if (markDown.backgroundInformation.isEmpty()) missingFields.add("backgroundInformation")
- if (markDown.isUIExposed.isEmpty()) missingFields.add("isUIExposed")
- if (markDown.isCLIExposed.isEmpty()) missingFields.add("isCLIExposed")
+ def missing = { String value ->
+ StringUtils.isBlank(value) || value.startsWith(PLACEHOLDER)
+ }
+ if (missing(markDown.desc_CN)) missingFields.add("desc_CN")
+ if (missing(markDown.name_CN)) missingFields.add("name_CN")
+ if (missing(markDown.valueRangeRemark)) missingFields.add("valueRangeRemark")
+ if (missing(markDown.defaultValueRemark)) missingFields.add("defaultValueRemark")
+ if (missing(markDown.resourcesGranularitiesRemark)) missingFields.add("resourcesGranularitiesRemark")
+ if (missing(markDown.additionalRemark)) missingFields.add("additionalRemark")
+ if (missing(markDown.backgroundInformation)) missingFields.add("backgroundInformation")
+ if (missing(markDown.isUIExposed)) missingFields.add("isUIExposed")
+ if (missing(markDown.isCLIExposed)) missingFields.add("isCLIExposed")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rest/src/main/resources/scripts/RestDocumentationGenerator.groovy` around
lines 851 - 861, the validation currently uses checks like
markDown.desc_CN.isEmpty(), but GlobalConfigMarkDown is parsed starting from new
GlobalConfigMarkDown() with "PLACEHOLDER..." defaults, so .isEmpty() misses
sections that never appear. Update the validation logic (where
RestDocumentationGenerator.groovy handles the GlobalConfigMarkDown returned by
getExistGlobalConfigMarkDown) to use a helper emptiness check (e.g.
isMissingField(String s)) that treats null, trim().isEmpty(), and values
starting with "PLACEHOLDER" (or the project's actual placeholder prefix) as
missing, and replace all .isEmpty() calls for
desc_CN, name_CN, valueRangeRemark, defaultValueRemark, resourcesGranularitiesRemark, additionalRemark, backgroundInformation, isUIExposed, isCLIExposed;
alternatively, change getExistGlobalConfigMarkDown so fields absent from the
markdown return null, and apply the same check.
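The helper check described above is small enough to state precisely. The real file is Groovy, so this is a hypothetical Java rendering; the "PLACEHOLDER" prefix is an assumption about the parser's default values.

```java
// A field counts as missing when it is null, blank, or still carries the
// parser's PLACEHOLDER default — i.e. the markdown section never appeared.
public class MarkdownFieldCheck {
    static final String PLACEHOLDER = "PLACEHOLDER"; // assumed prefix of default values

    public static boolean missing(String value) {
        return value == null
                || value.trim().isEmpty()
                || value.startsWith(PLACEHOLDER);
    }
}
```

Each `if (markDown.X.isEmpty())` line would then become `if (missing(markDown.X))`, catching both empty strings and untouched placeholder defaults.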
try {
    checkMD(mdPath, it.value)
} catch (CloudRuntimeException e) {
    allErrors.add(e.message)
Only CloudRuntimeException is aggregated here, so the loop still exits early on the first parsing exception.
The goal of the new logic is to collect every markdown validation failure, but checkMD() / getExistGlobalConfigMarkDown() can still throw plain RuntimeExceptions (e.g. an IndexOutOfBoundsException or NullPointerException triggered by malformed markdown structure). Since only CloudRuntimeException is caught, one such file still aborts the loop, and the complete error list is never produced.
Suggested fix
try {
checkMD(mdPath, it.value)
- } catch (CloudRuntimeException e) {
- allErrors.add(e.message)
+ } catch (Exception e) {
+ allErrors.add(e.message ?: "Validation failed for ${mdPath}: ${e.class.name}")
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rest/src/main/resources/scripts/RestDocumentationGenerator.groovy` around
lines 2846 - 2849, The loop currently only catches CloudRuntimeException when
calling checkMD(mdPath, it.value), so other runtime exceptions (e.g.,
IndexOutOfBoundsException, NullPointerException thrown from checkMD or
getExistGlobalConfigMarkDown) will break the loop and prevent collecting all
errors; update the try/catch to catch Exception (or RuntimeException) in
addition to CloudRuntimeException around the call sites (checkMD and any calls
to getExistGlobalConfigMarkDown) and add their messages to allErrors (preserving
existing CloudRuntimeException behavior) so the loop continues and aggregates
every markdown validation failure.
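The aggregate-and-continue behavior the comment asks for can be sketched as follows. This is a hypothetical, self-contained model: `ErrorAggregator` and the `Consumer`-based validators stand in for the Groovy loop over markdown files and its `checkMD` calls.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Catch Exception (not only one checked type) so a single malformed file
// cannot abort the loop, and synthesize a message when getMessage() is null
// (as it is for a bare NullPointerException).
public class ErrorAggregator {
    public static List<String> validateAll(Map<String, Consumer<String>> validators) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, Consumer<String>> e : validators.entrySet()) {
            try {
                e.getValue().accept(e.getKey()); // stand-in for checkMD(mdPath, ...)
            } catch (Exception ex) {
                String msg = ex.getMessage();
                errors.add(msg != null
                        ? msg
                        : "validation failed for " + e.getKey() + ": " + ex.getClass().getName());
            }
        }
        return errors; // every failure collected; the loop never aborts early
    }
}
```

The null-message fallback matters because exceptions like `new NullPointerException()` would otherwise add a useless `null` entry to the aggregated list.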
Resolves: ZCF-2047 Change-Id: I8ca7e84446f10cedbc2afb6281638f50ea6cb764 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Actionable comments posted: 1
🧹 Nitpick comments (1)
compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java (1)
722-739: Avoid adding yet another TypedQuery branch here. This block is almost structurally identical to the L3 branch above it, but introducing more JPQL strings makes field changes more fragile and copies the same filter conditions yet again. A better approach is to extract a helper that queries VMs by network scope and converge on the existing Q.New(...) / SQL.New(...) style.
As per coding guidelines, avoid TypedQuery in new code.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java` around lines 722 - 739, The new anonymous Callable in VmCascadeExtension introduces a raw TypedQuery JPQL branch to fetch VMs by L2 network UUIDs; extract this into a shared helper (e.g., queryVmsByNetworkRange or findVmsByL2NetworkUuids) and reimplement it using the project-standard Q.New(...) / SQL.New(...) API used by the existing L3 branch, reusing the same filters (vm.type == VmInstanceConstant.USER_VM_TYPE, vm.state in {Stopped, Paused, Running, Destroyed}, join through VmNicVO -> L3NetworkVO -> L2Network UUIDs) instead of adding another TypedQuery; update the Callable to call that helper and remove the JPQL string so the code follows the coding guideline of avoiding TypedQuery.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java`:
- Around line 339-343: The current L2 delete flow first derives l3Uuids via
Q.New(L3NetworkVO).in(L3NetworkVO_.l2NetworkUuid, l2Uuids) and later re-queries
NIC/VMs using nic.l3NetworkUuid, but VmNicVO.l3NetworkUuid is SET_NULL on L3
delete so NICs can be missed if L3s were processed earlier; instead, compute the
affected VmNic/VM set once up-front for the L2 cascade (e.g. query VmNicVO by
the L2 identifier(s) or by joined criteria that do not rely on L3NetworkVO
existence), store that NIC/VM list (used where msgs is built) and reuse it in
subsequent steps instead of re-querying via L3NetworkVO/l3.uuid so the detach
logic always runs for those NICs/VMs.
---
Nitpick comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java`:
- Around line 722-739: The new anonymous Callable in VmCascadeExtension
introduces a raw TypedQuery JPQL branch to fetch VMs by L2 network UUIDs;
extract this into a shared helper (e.g., queryVmsByNetworkRange or
findVmsByL2NetworkUuids) and reimplement it using the project-standard
Q.New(...) / SQL.New(...) API used by the existing L3 branch, reusing the same
filters (vm.type == VmInstanceConstant.USER_VM_TYPE, vm.state in {Stopped,
Paused, Running, Destroyed}, join through VmNicVO -> L3NetworkVO -> L2Network
UUIDs) instead of adding another TypedQuery; update the Callable to call that
helper and remove the JPQL string so the code follows the coding guideline of
avoiding TypedQuery.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: 4f4af80d-cd65-4c77-b7ad-ec1b4e60ef76
📒 Files selected for processing (1)
compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java
List<String> l3Uuids = Q.New(L3NetworkVO.class).in(L3NetworkVO_.l2NetworkUuid, l2Uuids)
        .select(L3NetworkVO_.uuid).listValues();
if (l3Uuids.isEmpty()) {
    return msgs;
}
The L2 delete path must not rely on L3NetworkVO records still existing.
One spot resolves UUIDs live via L2 -> L3, and another looks up VMs back through nic.l3NetworkUuid = l3.uuid. Per header/src/main/java/org/zstack/header/vm/VmNicVO.java:35-39, VmNicVO.l3NetworkUuid is SET_NULL when the L3 is deleted; if the L3s were already processed earlier in the same cascade, both queries miss the target NICs/VMs, and the newly added detach logic never fires. Compute the NIC/VM set hit by the L2 deletion once up front and carry it through, instead of later steps depending on the L3 rows still being in the database.
Also applies to: 726-731
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@compute/src/main/java/org/zstack/compute/vm/VmCascadeExtension.java` around
lines 339 - 343, The current L2 delete flow first derives l3Uuids via
Q.New(L3NetworkVO).in(L3NetworkVO_.l2NetworkUuid, l2Uuids) and later re-queries
NIC/VMs using nic.l3NetworkUuid, but VmNicVO.l3NetworkUuid is SET_NULL on L3
delete so NICs can be missed if L3s were processed earlier; instead, compute the
affected VmNic/VM set once up-front for the L2 cascade (e.g. query VmNicVO by
the L2 identifier(s) or by joined criteria that do not rely on L3NetworkVO
existence), store that NIC/VM list (used where msgs is built) and reuse it in
subsequent steps instead of re-querying via L3NetworkVO/l3.uuid so the detach
logic always runs for those NICs/VMs.
Resolves: ZCF-1365
Change-Id: I73687962636e7871626e687761626d6661716668
sync from gitlab !9574