
[DataProcessor] Refactor and unify text/multimodal processor pipeline#7853

Open
luukunn wants to merge 2 commits into PaddlePaddle:develop from luukunn:merge_3


Conversation

luukunn (Collaborator) commented May 19, 2026

Motivation

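This PR consolidates and refactors FastDeploy's Processor system. It unifies request preprocessing for text models and multimodal models, which improves maintainability and extensibility.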

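The change splits the multimodal input handling logic into modules and plugs them into the unified Processor framework. This gives a cleaner foundation for supporting and maintaining models such as Qwen2.5-VL, Qwen3-VL, ERNIE 4.5 VL, and PaddleOCR-VL.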

Modifications

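  • Added a unified Processor implementation (fastdeploy/input/processor.py) that consolidates the existing text processing flow.
  • Changed the processor creation logic in fastdeploy/input/preprocess.py:
    • Text models all use Processor
    • Multimodal models are handled by attaching the corresponding multimodal processor
  • Added the multimodal processing framework fastdeploy/input/multimodal/, including:
    • MMProcessor abstract base class
    • QwenVLProcessor
    • Qwen3VLProcessor
    • Ernie4_5VLProcessor
    • PaddleOCRVLProcessor
  • Adjusted the chat request flow so that messages can be passed more naturally in multimodal scenarios
    • Modified fastdeploy/entrypoints/llm.py
    • Modified fastdeploy/entrypoints/chat_utils.py
  • Unified the messages -> prompt / multimodal_data processing flow, reducing divergent logic between the text and multimodal paths.
  • Added and updated related tests, including:
    • processor initialization and preprocess flow tests
    • chat / generation tests
    • multimodal common utility tests
    • Qwen / Qwen3 / ERNIE / PaddleOCR multimodal processor tests
    • image processor and cache-related logic tests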
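Roughly, the split described above is a single Processor type on the request path that delegates to an attached multimodal processor when one is present. A minimal Python sketch under that reading follows. Only the names Processor and MMProcessor come from this PR; the process_messages signature, the returned keys, and ToyVLProcessor are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class MMProcessor(ABC):
    """Stand-in for the PR's multimodal base class (interface is hypothetical)."""

    @abstractmethod
    def process_messages(self, messages: list[dict]) -> dict[str, Any]:
        """Turn chat messages into a prompt plus multimodal_data."""


class ToyVLProcessor(MMProcessor):
    """Hypothetical subclass: joins text parts and collects image parts."""

    def process_messages(self, messages: list[dict]) -> dict[str, Any]:
        texts, images = [], []
        for msg in messages:
            content = msg["content"]
            # Treat plain-string content as a single text part.
            parts = [{"type": "text", "text": content}] if isinstance(content, str) else content
            for part in parts:
                if part["type"] == "text":
                    texts.append(part["text"])
                elif part["type"] == "image_url":
                    images.append(part["image_url"])
        return {"prompt": "\n".join(texts), "multimodal_data": {"image": images}}


class Processor:
    """Unified entry point: text path by default, delegates when an MMProcessor is attached."""

    def __init__(self, mm_processor: Optional[MMProcessor] = None):
        self.mm_processor = mm_processor

    def process_messages(self, messages: list[dict]) -> dict[str, Any]:
        if self.mm_processor is not None:
            return self.mm_processor.process_messages(messages)
        # Text-only path. A real implementation would render the model's chat
        # template here; joining contents keeps the sketch self-contained.
        prompt = "\n".join(m["content"] for m in messages)
        return {"prompt": prompt, "multimodal_data": None}
```

Keeping one Processor type on the entry path pushes the text/multimodal branching down into the attached delegate. That is the kind of reduction in divergent logic the last bullets describe.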

Usage or Command

The related tests can be run with the following commands:

python -m pytest tests/input/multimodal
python -m pytest tests/input/test_preprocess.py
python -m pytest tests/entrypoints/test_chat.py
python -m pytest tests/entrypoints/test_generation.py

Accuracy Tests

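This PR mainly refactors the Processor architecture and reorganizes the multimodal input processing flow. It does not directly modify model forward computation or operator implementations.

Therefore, no accuracy test results are provided for this PR.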

Checklist

  • Add at least one tag to the PR title.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are no unit tests, explain why in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

Copilot AI review requested due to automatic review settings May 19, 2026 08:48
paddle-bot (bot) commented May 19, 2026

Thanks for your contribution!

Copilot AI (Contributor) left a comment


Pull request overview

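This PR consolidates and refactors FastDeploy's input preprocessing pipeline. A unified Processor handles text model preprocessing, and a pluggable MMProcessor hierarchy integrates multimodal models (Qwen2.5-VL/Qwen3-VL/ERNIE4.5-VL/PaddleOCR-VL). The PR also adds fairly comprehensive unit test coverage.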

Changes:

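  • Added a unified fastdeploy/input/processor.py that consolidates text request preprocessing and response decoding logic and supports attaching a multimodal processor.
  • Added the fastdeploy/input/multimodal/ multimodal processing framework and per-model implementations (including image processors, a cache client, common resize utilities, etc.).
  • Adjusted how chat requests are assembled at the entrypoint, and updated or added multimodal and processor test cases.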

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| tests/input/test_processor.py | New unit tests for the unified Processor (including thinking/tool/reasoning/mm delegation) |
| tests/input/test_preprocess.py | Adapted to the new preprocess creation logic: text models now use Processor |
| tests/input/multimodal/test_qwen3_vl.py | New Qwen3VLProcessor unit tests |
| tests/input/multimodal/test_qwen_vl.py | New QwenVLProcessor unit tests |
| tests/input/multimodal/test_paddleocr_vl.py | New PaddleOCRVLProcessor unit tests |
| tests/input/multimodal/test_mm_processor.py | New unit tests for the MMProcessor base class, CacheClient, and processing flow |
| tests/input/multimodal/test_image_processors.py | New unit tests for several image processors |
| tests/input/multimodal/test_ernie4_5_vl.py | New Ernie4_5VLProcessor unit tests |
| tests/input/multimodal/test_common.py | New unit tests for the multimodal/common.py resize utilities |
| tests/entrypoints/test_generation.py | Adjusted generation/chat behavior tests for the new entrypoint/processor |
| tests/entrypoints/test_chat.py | Adjusted chat utility tests to hook process_messages and capture prompt_tokens |
| fastdeploy/input/processor.py | New unified Processor implementation (text + optional multimodal) |
| fastdeploy/input/preprocess.py | Refactored processor creation: unified Processor + attached mm_processor |
| fastdeploy/input/multimodal/qwen3_vl.py | New Qwen3VLProcessor (based on QwenVLProcessor) |
| fastdeploy/input/multimodal/qwen_vl.py | New/migrated QwenVLProcessor multimodal implementation |
| fastdeploy/input/multimodal/paddleocr_vl.py | New PaddleOCRVLProcessor (overrides differences in token/vit/video sampling, etc.) |
| fastdeploy/input/multimodal/mm_processor.py | New MMProcessor abstract base class with caching/packing/orchestration flow |
| fastdeploy/input/multimodal/image_processors/qwen3.py | New Qwen3ImageProcessor with default parameters |
| fastdeploy/input/multimodal/image_processors/qwen.py | Migrated/implemented QwenImageProcessor |
| fastdeploy/input/multimodal/image_processors/paddleocr.py | New PaddleOCRImageProcessor |
| fastdeploy/input/multimodal/image_processors/ernie.py | Migrated/implemented ERNIE AdaptiveImageProcessor (including video/image processing) |
| fastdeploy/input/multimodal/image_processors/__init__.py | Exports the image processor symbols |
| fastdeploy/input/multimodal/ernie4_5_vl.py | New Ernie4_5VLProcessor implementation |
| fastdeploy/input/multimodal/common.py | New common multimodal resize/scale utility functions |
| fastdeploy/input/multimodal/__init__.py | Exports the multimodal processors API |
| fastdeploy/entrypoints/llm.py | Adjusted how chat arguments are passed to _add_request and added branch support for prompts types |

Comment on lines +324 to +355
if self.reasoning_parser:
    reasoning_delta_message = self.reasoning_parser.extract_reasoning_content_streaming(
        previous_texts,
        previous_texts + delta_text,
        delta_text,
        previous_token_ids,
        previous_token_ids + token_ids,
        token_ids,
        self.model_status_dict[req_id],
    )
    if reasoning_delta_message:
        reasoning_content = reasoning_delta_message.reasoning_content
        reasoning_tokens = self.tokenizer.tokenize(reasoning_content) if reasoning_content else []
        response_dict["outputs"]["reasoning_token_num"] = len(reasoning_tokens)
        response_dict["outputs"]["reasoning_content"] = reasoning_content or ""
        response_dict["outputs"]["text"] = reasoning_delta_message.content or ""
    else:
        if not is_end:
            response_dict["outputs"]["skipped"] = True

if self.tool_parser_obj:
    if req_id not in self.tool_parser_dict:
        self.tool_parser_dict[req_id] = self.tool_parser_obj(self.tokenizer)
    tool_parser = self.tool_parser_dict[req_id]
    tool_call_delta_message = tool_parser.extract_tool_calls_streaming(
        previous_texts,
        previous_texts + delta_text,
        delta_text,
        previous_token_ids,
        previous_token_ids + token_ids,
        token_ids,
        request,
Comment on lines +754 to +764
if prompt_token_ids[0] > self.tokenizer.vocab_size:
    if not add_prefix_space:
        log_request(
            level=1,
            message="bad_words: '{prompt}' token id {token_id} > vocab_size, skipping",
            prompt=prompt,
            token_id=prompt_token_ids[0],
        )
    continue
if prompt_token_ids not in token_ids:
    token_ids.extend(prompt_token_ids)
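A Python detail matters when reading the snippet above. Because token_ids is built with extend, it holds ints. The check `prompt_token_ids not in token_ids` therefore asks whether the whole list appears as a single element, and it is True whenever token_ids holds only ints, so it never deduplicates. The following is a minimal sketch of id-level deduplication, assuming that is the intent. collect_bad_word_ids is a hypothetical name, and the logging branch is omitted.

```python
def collect_bad_word_ids(candidates: list[list[int]], vocab_size: int) -> list[int]:
    """Hypothetical helper: gather bad-word token ids without duplicates."""
    token_ids: list[int] = []
    for prompt_token_ids in candidates:
        # Mirrors the snippet's check, which only inspects the first id.
        if prompt_token_ids[0] > vocab_size:
            continue
        for tid in prompt_token_ids:
            # Element-level membership: compares ints with ints.
            if tid not in token_ids:
                token_ids.append(tid)
    return token_ids
```

By contrast, `[5, 7] not in [5, 7]` is True, because neither element of the right-hand list equals the list `[5, 7]`.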
Comment on lines +769 to +770
if isinstance(self.tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not self.tokenizer.pad_token_id:
    return self.tokenizer.eos_token
PaddlePaddle-bot left a comment


🤖 Paddle-CI-Agent | pr_review | 2026-05-19 18:29:29

📋 Review summary

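PR overview: Refactors and unifies FastDeploy's text and multimodal Processor preprocessing pipeline. It introduces a unified Processor/MMProcessor framework and adds four multimodal processor implementations: Qwen-VL / Qwen3-VL / ERNIE 4.5 VL / PaddleOCR-VL.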

Scope of changes: fastdeploy/entrypoints/, fastdeploy/input/, tests/entrypoints/, tests/input/

Affected tags: DataProcessor, APIServer

Issues

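| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | fastdeploy/entrypoints/chat_utils.py:206 | str content is returned unchanged, so its type no longer matches the list branch; some chat templates may iterate over it character by character |
| ❓ Question | fastdeploy/input/multimodal/ernie4_5_vl.py:208 | load_video does not check for the 0-frame edge case; frames[-1] may raise IndexError |
| ❓ Question | fastdeploy/entrypoints/llm.py:339 | The new list type checks in _add_request lack a fallback branch |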

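⚠️ This PR is large (28 files). It is recommended to split it by functional cohesion into the following sub-PRs to reduce review effort and merge risk: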

Suggested split (🟡 Suggestion)

  • PR 1 [Framework core]: fastdeploy/input/processor.py, fastdeploy/input/preprocess.py, fastdeploy/input/multimodal/__init__.py, fastdeploy/input/multimodal/mm_processor.py, fastdeploy/input/multimodal/common.py, fastdeploy/entrypoints/llm.py, fastdeploy/entrypoints/chat_utils.py, fastdeploy/input/multimodal_processor.py
  • PR 2 [Multimodal processor implementations]: fastdeploy/input/multimodal/qwen_vl.py, fastdeploy/input/multimodal/qwen3_vl.py, fastdeploy/input/multimodal/ernie4_5_vl.py, fastdeploy/input/multimodal/paddleocr_vl.py
  • PR 3 [Image Processors]: all files under fastdeploy/input/multimodal/image_processors/
  • PR 4 [Tests]: all test files under tests/input/ and tests/entrypoints/

📝 PR conventions check

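The title [DataProcessor] Refactor and unify text/multimodal processor pipeline is correctly formatted, and the tag is used correctly. The description is complete and contains all required sections: Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist. The Checklist state is reasonable (no cherry-pick is needed, and the reason for omitting accuracy tests is given). ✓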

Overall assessment

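Overall code quality is good. The template-method design of the MMProcessor abstract base class is sound, and the processor framework is clear and extensible. The main concerns are whether the content type change in chat_utils.py remains compatible with chat templates, and the missing zero-frame guard in load_video. Before merging, it is recommended to split the PR and add chat template compatibility checks.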

    parsed_content = []
elif isinstance(content, str):
-    parsed_content = [{"type": "text", "text": content}]
+    parsed_content = content

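🟡 Suggestion: inconsistent parsed_content type

The original logic wrapped str content as [{"type": "text", "text": content}], which kept the same list[dict] type as the list branch. After the change, str is returned directly, so the content field of messages in a conversation can have either type (str or list[dict]).

If a model's Jinja2 chat template iterates over content without a type check (for example {% for part in message.content %}), a str content is iterated character by character, which causes a silent logic error.

Suggested fix: normalize the str/list type difference in process_messages (processor.py) before rendering the chat template, or confirm that the chat templates of all supported models already type-check str content.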
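The failure mode is easy to reproduce in plain Python, because iterating a str yields its characters. The following is a minimal sketch of the first suggested fix, which normalizes content to list[dict] before rendering. normalize_content and normalize_messages are hypothetical names, not functions in this PR.

```python
from typing import Any, Optional, Union

Part = dict[str, Any]


def normalize_content(content: Optional[Union[str, list[Part]]]) -> list[Part]:
    """Hypothetical helper: always return list[dict], so a template that
    loops over message.content sees parts, never characters."""
    if content is None:
        return []
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def normalize_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    # Build new dicts so the caller's messages are not mutated.
    return [{**m, "content": normalize_content(m.get("content"))} for m in messages]
```

The trade-off is that templates which render `{{ message.content }}` directly expect a string. Whichever form is chosen has to match what every supported template handles, which is why the review also offers the alternative of auditing the templates.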

        "prompt_token_ids": prompts[i],
        "request_id": request_id,
    }
elif isinstance(prompts[i], list) and isinstance(prompts[i][0], dict):

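❓ Question: no fallback branch for the list type checks in _add_request

Three list branches were added: list[int], list[dict], and the empty list. However, when prompts[i] is a non-empty list whose first element prompts[i][0] is neither int nor dict (for example, a list[str] passed by mistake), none of the three elif isinstance(..., list) branches match. The value then silently falls through to the later isinstance(prompts[i], dict) check and eventually triggers an unclear error.

Suggested addition, after all list branches and before the dict branch: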

elif isinstance(prompts[i], list):
    raise ValueError(
        f"prompts[{i}] is a list but first element type "
        f"{type(prompts[i][0])} is not supported. "
        "Expected list[int] (prompt_token_ids) or list[dict] (messages)."
    )
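The branch order matters: the emptiness check has to run before prompts[i][0] is read, or an empty list raises IndexError. The following is a minimal sketch of the full dispatch with the suggested fallback. classify_prompt and its return labels are hypothetical, and the str case is an assumption. The review does not show what the PR's empty-list branch does, so the sketch only labels that case.

```python
from typing import Any


def classify_prompt(prompt: Any, i: int = 0) -> str:
    """Hypothetical helper mirroring the branch order discussed above."""
    if isinstance(prompt, str):
        return "text"  # assumed: plain-text prompts
    if isinstance(prompt, list):
        if len(prompt) == 0:
            return "empty"  # must precede any prompt[0] access
        if isinstance(prompt[0], int):
            return "prompt_token_ids"
        if isinstance(prompt[0], dict):
            return "messages"
        # The suggested fallback: fail loudly instead of falling through.
        raise ValueError(
            f"prompts[{i}] is a list but first element type "
            f"{type(prompt[0])} is not supported. "
            "Expected list[int] (prompt_token_ids) or list[dict] (messages)."
        )
    if isinstance(prompt, dict):
        return "dict"
    raise TypeError(f"prompts[{i}] has unsupported type {type(prompt)}.")
```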

outputs["image_type_ids"].extend([1] * t)

pos_ids = self._compute_3d_positions(t, h, w, outputs["cur_position"])
outputs["position_ids"].extend(pos_ids)

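❓ Question: load_video has no guard for 0 frames

When read_frames_paddlecodec returns an empty list, because video decoding failed or because of parameter limits, frames[-1] raises IndexError.

Suggested addition, before the odd/even frame-padding logic: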

if len(frames) == 0:
    raise ValueError(f"load_video: no frames extracted from {url}")
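The review does not show the padding code itself. Assuming it duplicates the last frame to reach an even count (which would explain why frames[-1] is accessed), the guard fits in as shown below. pad_frames_to_even is a hypothetical name, and the frames can be any objects.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def pad_frames_to_even(frames: Sequence[T], url: str = "<video>") -> list[T]:
    """Hypothetical helper: reject empty decodes, then pad to an even count."""
    frames = list(frames)
    if len(frames) == 0:
        # Without this check, frames[-1] below raises IndexError.
        raise ValueError(f"load_video: no frames extracted from {url}")
    if len(frames) % 2 == 1:
        frames.append(frames[-1])  # safe: frames is non-empty here
    return frames
```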

codecov-commenter commented May 19, 2026

Codecov Report

❌ Patch coverage is 86.08838% with 255 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@bda1756).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/input/processor.py | 72.22% | 70 Missing and 65 partials ⚠️ |
| fastdeploy/input/multimodal/qwen_vl.py | 85.40% | 25 Missing and 9 partials ⚠️ |
| fastdeploy/input/multimodal/ernie4_5_vl.py | 92.01% | 18 Missing and 3 partials ⚠️ |
| fastdeploy/input/multimodal/mm_processor.py | 93.38% | 9 Missing and 8 partials ⚠️ |
| fastdeploy/input/multimodal/paddleocr_vl.py | 86.40% | 13 Missing and 4 partials ⚠️ |
| ...tdeploy/input/multimodal/image_processors/ernie.py | 91.97% | 8 Missing and 5 partials ⚠️ |
| fastdeploy/input/preprocess.py | 54.54% | 10 Missing ⚠️ |
| ...stdeploy/input/multimodal/image_processors/qwen.py | 93.02% | 3 Missing and 3 partials ⚠️ |
| fastdeploy/entrypoints/llm.py | 50.00% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7853   +/-   ##
==========================================
  Coverage           ?   63.86%           
==========================================
  Files              ?      475           
  Lines              ?    66190           
  Branches           ?    10177           
==========================================
  Hits               ?    42273           
  Misses             ?    21040           
  Partials           ?     2877           
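The figures in the report are internally consistent. The check below assumes Codecov's usual definition: hits divided by (hits + misses + partials), with partially covered lines not counted as covered.

```python
# Numbers copied from the Codecov report above.
hits, misses, partials = 42273, 21040, 2877
lines = hits + misses + partials  # should equal the reported 66190

coverage_pct = 100 * hits / lines
print(lines)                  # 66190
print(f"{coverage_pct:.3f}")  # 63.866, shown as 63.86% when rounded down

# Per-file (missing, partials) pairs from the patch table; their sum
# should equal the reported 255 uncovered patch lines.
per_file = [(70, 65), (25, 9), (18, 3), (9, 8), (13, 4), (8, 5), (10, 0), (3, 3), (1, 1)]
uncovered = sum(m + p for m, p in per_file)
print(uncovered)  # 255
```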
| Flag | Coverage Δ |
|---|---|
| GPU | 72.76% <86.08%> (?) |
| XPU | 6.93% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


PaddlePaddle-bot commented May 19, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-19 21:50:45

CI report generated from the following code (updated every 30 minutes):


1 Task overview

❌ 1 Required task failed (Approval); it must be resolved before the PR can be merged.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 42(0) | 42 | 38 | 3 | 0 | 1 | 0 |

2 Task status summary

2.1 Required tasks: 9/10 passed

Required tasks block merging; failures must be handled first.

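| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Approval | 11s | PR issue: the PR adds multiple log outputs that require approval from an RD member | Ask xyxinyang or zyyzghb to approve the PR | Job | - |
| ✅ | The remaining 9 required tasks passed | - | - | - | - | - |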

2.2 Optional tasks: 29/32 passed

Optional tasks do not block merging; their failures are for reference only.

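| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| ❌ | Run iluvatar Tests / run_iluvatar_cases | 2m25s | Job | - |
| ❌ | Trigger Jenkins for PR | 11m57s | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| ✅ | The remaining 29 optional tasks passed | - | - | - |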

3 Failure details (required tasks only)

Approval: code conventions (confidence: high)

Approval

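  • Status: ❌ Failed
  • Error type: code conventions
  • Confidence: high
  • Root cause summary: the PR adds multiple log outputs, which require approval from designated RD members before merging
  • Analyzer: generic analysis (fallback)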

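Root cause details:
This PR changes logging behavior: the diff adds several data_processor_logger.info, data_processor_logger.debug, and log_request calls. Under FastDeploy's code conventions, any change to logging behavior such as .info/.debug/.error/log_request must be approved by at least one FastDeploy core RD (xyxinyang (zhouchong) or zyyzghb (zhangyongyue)). The check_approval.sh script detected 1 approval error and exited with exit code 6.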

Key log:

Detected log modification in diff:
+        data_processor_logger.info(
+        data_processor_logger.debug(...)
+        log_request(...)
0. You must have one FastDeploy RD (xyxinyang(zhouchong), zyyzghb(zhangyongyue)) approval for modifying logging behavior (.info/.debug/.error/log_request).
There are 1 approved errors.
##[error]Process completed with exit code 6.
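The contents of check_approval.sh are not shown. Judging only from the log above, it flags added diff lines that call .info/.debug/.error or log_request. The following rough Python sketch illustrates that kind of check. The function name and pattern are hypothetical, and the real shell script may match differently.

```python
import re

# Hypothetical pattern modeled on the calls listed in the log excerpt.
LOG_CALL = re.compile(r"\.(info|debug|error)\(|\blog_request\(")


def find_log_changes(diff_text: str) -> list[str]:
    """Return added diff lines ('+' prefix, excluding '+++' file headers)
    that contain a logging call."""
    hits = []
    for line in diff_text.splitlines():
        if line.startswith("+") and not line.startswith("+++"):
            if LOG_CALL.search(line):
                hits.append(line)
    return hits
```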

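Suggested fix:

  1. Have xyxinyang (zhouchong) or zyyzghb (zhangyongyue) submit an Approve review on this PR to clear the approval gate.
  2. If the log changes were added by mistake, remove the extra log statements and push again; the Approval check will rerun automatically.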

Fix summary: ask xyxinyang or zyyzghb to approve the PR to clear the approval gate.



Labels

None yet

Projects

None yet


4 participants