[DataProcessor] Refactor and unify text/multimodal processor pipeline#7853
[DataProcessor] Refactor and unify text/multimodal processor pipeline#7853luukunn wants to merge 2 commits into
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Pull request overview
本 PR 对 FastDeploy 的输入预处理链路进行合并重构:用统一的 Processor 承载文本模型预处理,并通过可插拔的 MMProcessor 体系接入多模态(Qwen2.5-VL/Qwen3-VL/ERNIE4.5-VL/PaddleOCR-VL),同时配套补充了较完整的单测覆盖。
Changes:
- 新增统一的
fastdeploy/input/processor.py,整合文本请求预处理与响应解码逻辑,并支持挂载多模态处理器。 - 新增
fastdeploy/input/multimodal/多模态处理框架与各模型实现(含 image processor、cache client、公共 resize 工具等)。 - 调整入口侧 chat 请求组装方式,并更新/新增多模态与 processor 相关测试用例。
Reviewed changes
Copilot reviewed 26 out of 26 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/input/test_processor.py | 新增统一 Processor 的单测(含 thinking/tool/reasoning/mm 委托等) |
| tests/input/test_preprocess.py | 适配 preprocess 创建逻辑:文本改用 Processor |
| tests/input/multimodal/test_qwen3_vl.py | 新增 Qwen3VLProcessor 相关单测 |
| tests/input/multimodal/test_qwen_vl.py | 新增 QwenVLProcessor 相关单测 |
| tests/input/multimodal/test_paddleocr_vl.py | 新增 PaddleOCRVLProcessor 相关单测 |
| tests/input/multimodal/test_mm_processor.py | 新增 MMProcessor 基类与 CacheClient/流程的单测 |
| tests/input/multimodal/test_image_processors.py | 新增多种 image processor 的单测 |
| tests/input/multimodal/test_ernie4_5_vl.py | 新增 Ernie4_5VLProcessor 相关单测 |
| tests/input/multimodal/test_common.py | 新增 multimodal/common.py resize 工具单测 |
| tests/entrypoints/test_generation.py | 调整 generation/chat 行为相关测试以适配新入口/processor |
| tests/entrypoints/test_chat.py | 调整 chat 工具测试,改为 hook process_messages 捕获 prompt_tokens |
| fastdeploy/input/processor.py | 新增统一 Processor 实现(文本+可选多模态) |
| fastdeploy/input/preprocess.py | 重构 processor 创建流程:统一 Processor + 挂载 mm_processor |
| fastdeploy/input/multimodal/qwen3_vl.py | 新增 Qwen3VLProcessor(基于 QwenVLProcessor) |
| fastdeploy/input/multimodal/qwen_vl.py | 新增/迁移 QwenVLProcessor 多模态实现 |
| fastdeploy/input/multimodal/paddleocr_vl.py | 新增 PaddleOCRVLProcessor(覆盖 token/vit/video sampling 等差异) |
| fastdeploy/input/multimodal/mm_processor.py | 新增 MMProcessor 抽象基类与缓存/打包/编排流程 |
| fastdeploy/input/multimodal/image_processors/qwen3.py | 新增 Qwen3ImageProcessor 默认参数实现 |
| fastdeploy/input/multimodal/image_processors/qwen.py | 迁移/实现 QwenImageProcessor |
| fastdeploy/input/multimodal/image_processors/paddleocr.py | 新增 PaddleOCRImageProcessor |
| fastdeploy/input/multimodal/image_processors/ernie.py | 迁移/实现 ERNIE AdaptiveImageProcessor(含视频/图片处理) |
| fastdeploy/input/multimodal/image_processors/init.py | 导出各 image processor 符号 |
| fastdeploy/input/multimodal/ernie4_5_vl.py | 新增 Ernie4_5VLProcessor 实现 |
| fastdeploy/input/multimodal/common.py | 新增多模态通用 resize/scale 工具函数 |
| fastdeploy/input/multimodal/init.py | 导出 multimodal processors API |
| fastdeploy/entrypoints/llm.py | 调整 chat 入参到 _add_request 的传递与 prompts 类型分支支持 |
| if self.reasoning_parser: | ||
| reasoning_delta_message = self.reasoning_parser.extract_reasoning_content_streaming( | ||
| previous_texts, | ||
| previous_texts + delta_text, | ||
| delta_text, | ||
| previous_token_ids, | ||
| previous_token_ids + token_ids, | ||
| token_ids, | ||
| self.model_status_dict[req_id], | ||
| ) | ||
| if reasoning_delta_message: | ||
| reasoning_content = reasoning_delta_message.reasoning_content | ||
| reasoning_tokens = self.tokenizer.tokenize(reasoning_content) if reasoning_content else [] | ||
| response_dict["outputs"]["reasoning_token_num"] = len(reasoning_tokens) | ||
| response_dict["outputs"]["reasoning_content"] = reasoning_content or "" | ||
| response_dict["outputs"]["text"] = reasoning_delta_message.content or "" | ||
| else: | ||
| if not is_end: | ||
| response_dict["outputs"]["skipped"] = True | ||
|
|
||
| if self.tool_parser_obj: | ||
| if req_id not in self.tool_parser_dict: | ||
| self.tool_parser_dict[req_id] = self.tool_parser_obj(self.tokenizer) | ||
| tool_parser = self.tool_parser_dict[req_id] | ||
| tool_call_delta_message = tool_parser.extract_tool_calls_streaming( | ||
| previous_texts, | ||
| previous_texts + delta_text, | ||
| delta_text, | ||
| previous_token_ids, | ||
| previous_token_ids + token_ids, | ||
| token_ids, | ||
| request, |
| if prompt_token_ids[0] > self.tokenizer.vocab_size: | ||
| if not add_prefix_space: | ||
| log_request( | ||
| level=1, | ||
| message="bad_words: '{prompt}' token id {token_id} > vocab_size, skipping", | ||
| prompt=prompt, | ||
| token_id=prompt_token_ids[0], | ||
| ) | ||
| continue | ||
| if prompt_token_ids not in token_ids: | ||
| token_ids.extend(prompt_token_ids) |
| if isinstance(self.tokenizer, (LlamaTokenizer, Llama3Tokenizer)) and not self.tokenizer.pad_token_id: | ||
| return self.tokenizer.eos_token |
PaddlePaddle-bot
left a comment
There was a problem hiding this comment.
🤖 Paddle-CI-Agent | pr_review |
2026-05-19 18:29:29
📋 Review 摘要
PR 概述:重构并统一 FastDeploy 文本与多模态 Processor 预处理流水线,引入 Processor/MMProcessor 统一框架,新增 Qwen-VL / Qwen3-VL / ERNIE 4.5 VL / PaddleOCR-VL 四种多模态处理器实现。
变更范围:fastdeploy/entrypoints/、fastdeploy/input/、tests/entrypoints/、tests/input/
影响面 Tag:DataProcessor APIServer
问题
| 级别 | 文件 | 概述 |
|---|---|---|
| 🟡 建议 | fastdeploy/entrypoints/chat_utils.py:206 |
str content 直接返回原值,与 list 分支类型不一致,部分 chat template 可能逐字符遍历 |
| ❓ 疑问 | fastdeploy/input/multimodal/ernie4_5_vl.py:208 |
load_video 未检查 0 帧边界,frames[-1] 可能 IndexError |
| ❓ 疑问 | fastdeploy/entrypoints/llm.py:339 |
_add_request 新增 list 类型判断缺少兜底分支 |
⚠️ 本 PR 变更量较大(28 文件),建议按功能内聚性拆分为以下子 PR,降低审查难度和合入风险:建议拆分方案(🟡 建议):
- PR 1: [框架核心] —
fastdeploy/input/processor.py,fastdeploy/input/preprocess.py,fastdeploy/input/multimodal/__init__.py,fastdeploy/input/multimodal/mm_processor.py,fastdeploy/input/multimodal/common.py,fastdeploy/entrypoints/llm.py,fastdeploy/entrypoints/chat_utils.py,fastdeploy/input/multimodal_processor.py- PR 2: [多模态处理器实现] —
fastdeploy/input/multimodal/qwen_vl.py,fastdeploy/input/multimodal/qwen3_vl.py,fastdeploy/input/multimodal/ernie4_5_vl.py,fastdeploy/input/multimodal/paddleocr_vl.py- PR 3: [Image Processors] —
fastdeploy/input/multimodal/image_processors/目录下所有文件- PR 4: [测试] —
tests/input/,tests/entrypoints/下所有测试文件
📝 PR 规范检查
标题 [DataProcessor] Refactor and unify text/multimodal processor pipeline 格式合规,Tag 使用正确。描述结构完整,包含 Motivation / Modifications / Usage or Command / Accuracy Tests / Checklist 所有必填段落,Checklist 勾选状态合理(无需 cherry-pick,精度测试已说明理由)。✓
总体评价
代码整体质量较好,MMProcessor 抽象基类的模板方法模式设计合理,处理器框架清晰可扩展。主要需关注 chat_utils.py 的 content 类型变更对 chat template 的兼容性,以及 load_video 的零帧边界防护。建议合入前拆分 PR 并补充 chat template 兼容性验证。
| parsed_content = [] | ||
| elif isinstance(content, str): | ||
| parsed_content = [{"type": "text", "text": content}] | ||
| parsed_content = content |
There was a problem hiding this comment.
🟡 建议 parsed_content 类型不一致
原逻辑将 str 类型的 content 包装为 [{"type": "text", "text": content}],与 list 分支保持统一的 list[dict] 类型。修改后直接返回 str,导致 conversation 中各消息的 content 字段类型不一致(str / list[dict])。
若某个模型的 Jinja2 chat template 对 content 做无类型判断的 for part in content 遍历(如 {% for part in message.content %}),当 content 为 str 时会逐字符遍历,产生静默逻辑错误。
建议修复策略:在 process_messages(processor.py)中对 chat template 渲染前统一处理 str/list 类型差异,或确认所有已支持模型的 chat template 均已针对 str content 做了类型判断。
| "prompt_token_ids": prompts[i], | ||
| "request_id": request_id, | ||
| } | ||
| elif isinstance(prompts[i], list) and isinstance(prompts[i][0], dict): |
There was a problem hiding this comment.
❓ 疑问 _add_request list 类型判断无兜底分支
新增了 list[int]、list[dict] 和空列表三种 list 分支判断,但当 prompts[i] 是非空列表且 prompts[i][0] 既非 int 也非 dict 时(如误传了 list[str]),三个 elif isinstance(..., list) 分支均不匹配,会静默落入后续 isinstance(prompts[i], dict) 判断,最终触发不明确的报错。
建议在所有 list 分支之后、dict 分支之前添加:
elif isinstance(prompts[i], list):
raise ValueError(
f"prompts[{i}] is a list but first element type "
f"{type(prompts[i][0])} is not supported. "
"Expected list[int] (prompt_token_ids) or list[dict] (messages)."
)| outputs["image_type_ids"].extend([1] * t) | ||
|
|
||
| pos_ids = self._compute_3d_positions(t, h, w, outputs["cur_position"]) | ||
| outputs["position_ids"].extend(pos_ids) |
There was a problem hiding this comment.
❓ 疑问 load_video 未对 0 帧做防护
当 read_frames_paddlecodec 因视频解码失败或参数限制返回空列表时,frames[-1] 会抛出 IndexError。
建议在奇偶补帧逻辑前添加:
if len(frames) == 0:
raise ValueError(f"load_video: no frames extracted from {url}")
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #7853 +/- ##
==========================================
Coverage ? 63.86%
==========================================
Files ? 475
Lines ? 66190
Branches ? 10177
==========================================
Hits ? 42273
Misses ? 21040
Partials ? 2877
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
CI报告基于以下代码生成(30分钟更新一次): 1 任务总览❌ 有 1 个 Required 任务失败(
2 任务状态汇总2.1 Required任务 : 9/10 通过
2.2 可选任务 — 29/32 通过
3 失败详情(仅 required)Approval — 代码规范(置信度: 高)Approval
根因详情: 关键日志: 修复建议:
修复建议摘要: 请 xyxinyang 或 zyyzghb Approve PR 即可解除审批门限 链接: 查看日志 |
Motivation
本 PR 主要对 FastDeploy 的 Processor 体系进行了合并与重构,统一了文本模型与多模态模型的请求预处理流程,提升了代码的可维护性和扩展性。
通过本次改动,将多模态输入处理逻辑进行模块化拆分,并接入统一的
Processor框架,为后续支持和维护 Qwen2.5-VL、Qwen3-VL、ERNIE 4.5 VL、PaddleOCR-VL 等模型提供更清晰的实现基础。Modifications
Processor实现(fastdeploy/input/processor.py),整合原有文本处理流程。fastdeploy/input/preprocess.py的 processor 创建逻辑:Processorfastdeploy/input/multimodal/,包括:MMProcessor抽象基类QwenVLProcessorQwen3VLProcessorErnie4_5VLProcessorPaddleOCRVLProcessormessages:fastdeploy/entrypoints/llm.pyfastdeploy/entrypoints/chat_utils.pymessages -> prompt / multimodal_data的处理流程,减少文本和多模态路径之间的分叉逻辑。Usage or Command
可通过以下命令运行相关测试:
Accuracy Tests
本 PR 主要涉及 Processor 架构重构与多模态输入处理流程整理,不直接修改模型前向计算逻辑或算子实现。
因此本次未提供 accuracy 测试结果。
Checklist
pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.