Skip to content

<fix>[agent]: add timeout to deploy agent syncJsonPost to prevent queue blocking#4129

Open
zstack-robot-1 wants to merge 1 commit into
5.5.22from
sync/songtao.liu/fix/ZSTAC-84187-clean
Open

<fix>[agent]: add timeout to deploy agent syncJsonPost to prevent queue blocking#4129
zstack-robot-1 wants to merge 1 commit into
5.5.22from
sync/songtao.liu/fix/ZSTAC-84187-clean

Conversation

@zstack-robot-1
Copy link
Copy Markdown
Collaborator

Root Cause

AgentManagerImpl.java:155 — syncJsonPost(url, cmd, Void.class) 三参数无超时。Agent端口不可达时阻塞约300s(全局readTimeout),per-IP deploy-agent队列槽在此期间全部积压。

Fix

加 AGENT_INIT_TIMEOUT=60 + TimeUnit.SECONDS 参数,60s后timeout异常自动触发FlowChain error handler → completion.fail() → chain.next() 释放队列槽。

Files Changed

  1. core/.../agent/AgentManagerImpl.java (+2/-1: 加常量+超时参数)
  2. test/.../TestAgentInitTimeout.java (+48: 超时行为验证+常量值检查)

Risk: Low

  • 超时参数只影响网络不可达场景
  • 正常可达 Agent 部署不受影响
  • 模式固定:参考 NB-001 ZSTAC-67814 KVMHost 同模式修复

Resolves: ZSTAC-84187

sync from gitlab !10032

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 28, 2026

Review Change Stack

Warning

Review limit reached

@MatheMatrix, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 39 minutes and 32 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 7f65d120-f208-408d-a5a3-e74bb133d75a

📥 Commits

Reviewing files that changed from the base of the PR and between cc9d1e7 and d46ce9c.

📒 Files selected for processing (2)
  • core/src/main/java/org/zstack/core/agent/AgentManagerImpl.java
  • test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java

概述

本 PR 为 ZStack Agent 初始化流程引入显式超时控制。在 AgentManagerImpl 中新增 60 秒超时常量,并将 connect() 方法中的 syncJsonPost 调用改为显式传入该超时参数;同时新增单元测试验证超时机制的正确实现。

变更

Agent 初始化超时控制

层 / 文件 摘要
初始化超时常量与 API 调用
core/src/main/java/org/zstack/core/agent/AgentManagerImpl.java
新增 public static final int AGENT_INIT_TIMEOUT = 60 常量;将 connect() 方法中 INIT_PATHrestf.syncJsonPost 调用改为显式传入 TimeUnit.SECONDSAGENT_INIT_TIMEOUT 参数。
超时功能验证测试
test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java
新增 JUnit 测试类;setUp() 通过 WebBeanConstructor 加载 XML 并获取 RESTFacade 组件;testSyncJsonPostTimeoutEnforced() 验证对不可达 URL 的调用会正确超时并抛异常;testTimeoutConstantIsSet() 验证常量值为 60 秒。

估算代码审查工作量

🎯 2 (Simple) | ⏱️ ~12 分钟

🐰 兔兔来庆祝,超时有规控,
六十秒一刻,初始化稳妥,
测试把把稳,网络莫再拖!


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error PR标题超过72字符限制(80字符),且缺少必需的JIRA key前缀。 修改标题为不超过72字符,并确保遵循 [scope]: 格式,例如:'[agent]: Add timeout to agent initialization'
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description check ✅ Passed PR描述详细说明了根本原因、修复方案、涉及文件和风险评估,与changeset相关且提供了充分的上下文。
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/songtao.liu/fix/ZSTAC-84187-clean

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java (1)

13-13: ⚡ Quick win

测试类命名不符合仓库规范(应以 Test/Case 结尾)

建议将类名改为以 TestCase 结尾(例如 AgentInitTimeoutTest),并同步调整文件名,避免后续测试发现规则不一致。

♻️ 建议修改
-public class TestAgentInitTimeout {
+public class AgentInitTimeoutTest {

As per coding guidelines “测试类需要以 Test 或 Case 结尾。”

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java` at
line 13, Rename the test class TestAgentInitTimeout to follow the repository
convention (e.g., AgentInitTimeoutTest) and update the corresponding file name
accordingly; specifically change the class declaration "public class
TestAgentInitTimeout" to "public class AgentInitTimeoutTest" and ensure the test
file is renamed to match the new class name so test discovery and naming rules
remain consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java`:
- Line 13: Rename the test class TestAgentInitTimeout to follow the repository
convention (e.g., AgentInitTimeoutTest) and update the corresponding file name
accordingly; specifically change the class declaration "public class
TestAgentInitTimeout" to "public class AgentInitTimeoutTest" and ensure the test
file is renamed to match the new class name so test discovery and naming rules
remain consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: cfab1fbf-f61e-4eed-9a1c-c281cc42eaed

📥 Commits

Reviewing files that changed from the base of the PR and between 113a77a and cc9d1e7.

📒 Files selected for processing (2)
  • core/src/main/java/org/zstack/core/agent/AgentManagerImpl.java
  • test/src/test/java/org/zstack/test/core/rest/TestAgentInitTimeout.java

Comment on lines +31 to +41
long start = System.currentTimeMillis();
try {
restf.syncJsonPost(url, "{}", Void.class, TimeUnit.SECONDS, 1);
Assert.fail("should throw on timeout");
} catch (Exception e) {
long elapsed = System.currentTimeMillis() - start;
errorDetails = e.getMessage();
Assert.assertNotNull(errorDetails);
Assert.assertTrue("timeout should take at least ~1s, got " + elapsed + "ms",
elapsed >= 500 || errorDetails.contains("timed out") || errorDetails.contains("Timeout"));
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

该用例可能无法验证“超时参数已生效”的核心行为

当前断言允许“非超时错误”通过(elapsed >= 500 || ...),且 core/src/main/java/org/zstack/core/rest/RESTFacadeImpl.javaUNIT_TEST_ON 分支里调用 template.exchange(url, method, req, String.class),并未使用 unit/timeout 参数。这样即使超时参数未生效,此测试也可能通过,存在误报风险。

…ue blocking

Replace hardcoded timeout with AGENT_INIT_TIMEOUT constant. Delete
self-proving constant test. Fix test to use RFC 5737 TEST-NET-1
unreachable address. Strengthen timing assertion to 900ms threshold.

Resolves: ZSTAC-84187
Change-Id: Ie81f20c5d4a7b3f9e2d6c8a1b4f7e3d9c2a5f801
@MatheMatrix MatheMatrix force-pushed the sync/songtao.liu/fix/ZSTAC-84187-clean branch from cc9d1e7 to d46ce9c Compare May 28, 2026 05:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant