
[Feature] Add Deterministic Inference Support #6476

Open
gongweibao wants to merge 41 commits into PaddlePaddle:develop from gongweibao:deter

Conversation

@gongweibao commented Feb 13, 2026

[Feature] Add Deterministic Inference Support

Motivation

Implement deterministic inference support for FastDeploy to ensure reproducible results across multiple runs. Deterministic inference is critical for:

  • Debugging and testing models
  • Reproducing results in production
  • Ensuring consistency in distributed inference scenarios

The implementation addresses non-determinism sources in:

  1. All-Reduce operations in Tensor Parallelism (NCCL floating-point accumulation order)
  2. Batch-invariant operations (matrix multiplication, log_softmax, mean)
  3. Chunked Prefill alignment
  4. FlashAttention backend
  5. Sampling parameters seed management
  6. Scheduler request stealing
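
Source 1 above stems from floating-point addition being non-associative: reducing the same values in a different order can change the low bits of the result, which is why NCCL's dynamically chosen accumulation order is non-deterministic. A minimal Python illustration of the numeric effect (not FastDeploy code):

```python
# Floating-point addition is not associative, so the order in which an
# all-reduce accumulates partial sums changes the low bits of the result.
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c   # accumulate rank 0, then 1, then 2
right_to_left = a + (b + c)   # a different (but equally valid) order

print(left_to_right == right_to_left)  # False
```

Either order is a correct sum to within rounding, but only fixing one order makes the result reproducible across runs.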

Modifications

Core Implementation

| File | Description |
| --- | --- |
| fastdeploy/envs.py | Added FD_DETERMINISTIC_MODE environment variable |
| fastdeploy/__init__.py | Auto-initialize custom all-reduce in deterministic mode |
| fastdeploy/distributed/communication.py | Add deterministic mode checks and custom all-reduce integration |
| fastdeploy/engine/common_engine.py | Add deterministic mode support |
| fastdeploy/engine/sampling_params.py | Add deterministic parameter for sampling |
| fastdeploy/engine/sched/resource_manager_v1.py | Add deterministic alignment logic |
| fastdeploy/model_executor/layers/attention/flash_attn_backend.py | Add deterministic mode support for FlashAttention |
| fastdeploy/model_executor/layers/batch_invariant_ops/batch_invariant_ops.py | Enhance batch-invariant operations |
| fastdeploy/model_executor/models/qwen2.py | Add deterministic support for Qwen2 model |
| fastdeploy/scheduler/splitwise_scheduler.py | Remove random request stealing in deterministic mode |
| fastdeploy/worker/gpu_model_runner.py | Add deterministic mode handling |

Key Features

  1. Custom All-Reduce for Deterministic TP: Forces custom all-reduce in deterministic mode with fixed accumulation order (unlike NCCL's dynamic algorithm)
  2. Batch-Invariant Operations: Triton-based implementations for matmul, log_softmax, and mean
  3. Chunked Prefill Alignment: Ensures truncation points align with split_kv_size integer multiples
  4. Deterministic Sampling: Seed-based sampling for reproducible results
  5. Error Handling: Explicit RuntimeErrors when deterministic requirements cannot be met
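
Feature 3 (chunked prefill alignment) amounts to rounding each truncation point down so it lands on an integer multiple of split_kv_size. A minimal sketch of that arithmetic; the helper name is hypothetical, not FastDeploy's actual API:

```python
def align_chunk(truncation_point: int, split_kv_size: int) -> int:
    """Hypothetical helper: round a chunked-prefill truncation point down
    to the nearest integer multiple of split_kv_size, so every run splits
    the KV cache at identical boundaries."""
    if split_kv_size <= 0:
        raise ValueError("split_kv_size must be positive")
    return (truncation_point // split_kv_size) * split_kv_size

print(align_chunk(1000, 256))  # 768
print(align_chunk(512, 256))   # 512
```

Aligning all chunk boundaries this way keeps the attention kernel's reduction order identical across runs regardless of how requests are batched.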

Usage or Command

Enable Deterministic Mode

```shell
export FD_DETERMINISTIC_MODE=1
```

Run Inference with Determinism

```python
from fastdeploy import LLM

llm = LLM(...)
result = llm.generate(...)  # Automatically uses deterministic all-reduce
```
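
Feature 4 (seed-based sampling) can be illustrated in plain Python, independent of FastDeploy: seeding a private RNG per request makes the sampled token sequence reproducible without relying on global RNG state. The helper below is a hypothetical sketch, not the PR's actual sampler:

```python
import random

def sample_tokens(seed: int, vocab_size: int = 100, n: int = 5) -> list:
    # A per-request RNG seeded from SamplingParams makes sampling
    # reproducible and independent of other concurrent requests.
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(n)]

run1 = sample_tokens(seed=42)
run2 = sample_tokens(seed=42)
print(run1 == run2)  # True: same seed, identical samples
```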

Run Tests

```shell
# All-reduce determinism test (requires 2+ GPUs)
python -m paddle.distributed.launch --gpus=0,1 tests/distributed/test_deterministic_all_reduce.py

# Batch-invariant operations test
python tests/batch_invariant_ops/test_batch_invariant_ops.py

# Sampling parameters determinism test
python tests/engine/test_sampling_params_determinism.py
```

Accuracy Tests

All unit tests pass:

  • Batch-invariant operations: 8 tests, 100% pass rate
  • Cache manager: 90 tests, 100% pass rate
  • Sampling parameters: 50 tests, 100% pass rate
  • Scheduler (local/dp): 42 tests, 100% pass rate
  • All-reduce determinism: Verified deterministic for float32/float16/bfloat16

Determinism Verification Results

```text
======================================================================
Summary
======================================================================
Data Type | Custom AR Deterministic | NCCL Deterministic
----------------------------------------------------------------------
float32   | YES                     | NO
float16   | YES                     | NO
bfloat16  | YES                     | NO
======================================================================
Custom All-Reduce is deterministic for all supported types!
======================================================================
```

Checklist

  • Add at least a tag in the PR title.
  • Format your code, run pre-commit before commit.
  • Add unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch first.

@CLAassistant commented Feb 13, 2026

CLA assistant check
All committers have signed the CLA.

@paddle-bot bot commented Feb 13, 2026

Thanks for your contribution!

…manager

Add comprehensive determinism tests for Paddle attention layer and refactor
resource manager for deterministic mode support.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter commented Feb 24, 2026

Codecov Report

❌ Patch coverage is 33.67876% with 128 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@1405d7d). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| fastdeploy/worker/gpu_model_runner.py | 10.61% | 95 Missing and 6 partials ⚠️ |
| fastdeploy/distributed/communication.py | 50.00% | 10 Missing and 2 partials ⚠️ |
| fastdeploy/engine/sched/resource_manager_v1.py | 86.66% | 0 Missing and 4 partials ⚠️ |
| fastdeploy/worker/worker_process.py | 0.00% | 3 Missing and 1 partial ⚠️ |
| fastdeploy/worker/input_batch.py | 25.00% | 3 Missing ⚠️ |
| fastdeploy/envs.py | 50.00% | 1 Missing and 1 partial ⚠️ |
| .../layers/batch_invariant_ops/batch_invariant_ops.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
```text
@@            Coverage Diff             @@
##             develop    #6476   +/-   ##
==========================================
  Coverage           ?   69.61%
==========================================
  Files              ?      392
  Lines              ?    53572
  Branches           ?     8410
==========================================
  Hits               ?    37293
  Misses             ?    13552
  Partials           ?     2727
```

| Flag | Coverage Δ |
| --- | --- |
| GPU | 69.61% <33.67%> (?) |


Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 9 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>