fix: fix rank argument in ddp multi-node training #111

Merged
kilinchange merged 1 commit into master from fix/ddp_multinode on Mar 9, 2026

Conversation

@Chamberlain0w0
Contributor

Change DDP to accept a Rank argument at construction time.
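The diff itself is not shown in this thread, so the following is only a general illustration of why a DDP wrapper may need an explicit rank in multi-node runs, not the code from this PR. It assumes the conventional mapping `global_rank = node_rank * nproc_per_node + local_rank`. The `Rank` class and `all_ranks` helper are hypothetical names, and the 2 x 8 layout is an assumed even split of the 16 GPUs used in the tests below.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rank:
    """Hypothetical rank descriptor; field names are illustrative only."""
    node_rank: int       # index of the machine
    local_rank: int      # index of the process (GPU) within that machine
    nproc_per_node: int  # number of processes (GPUs) per machine

    @property
    def global_rank(self) -> int:
        # Conventional mapping (an assumption, not taken from the PR).
        return self.node_rank * self.nproc_per_node + self.local_rank


def all_ranks(nnodes: int, nproc_per_node: int) -> List[Rank]:
    """Enumerate every process in the job, node by node."""
    return [
        Rank(node, local, nproc_per_node)
        for node in range(nnodes)
        for local in range(nproc_per_node)
    ]


if __name__ == "__main__":
    # Assumed layout: 2 nodes with 8 GPUs each.
    for r in all_ranks(nnodes=2, nproc_per_node=8):
        print(f"node={r.node_rank} local={r.local_rank} global={r.global_rank}")
```

On a single node the local and global ranks coincide, so code that derives its rank from the device index alone can appear correct there and only fail once a second node is added. Passing the rank in explicitly avoids relying on that coincidence.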

@Chamberlain0w0
Contributor Author

Llama, FP32 precision
Two nodes, 16 GPUs, parallel training with ddp+tp+sp+pp enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

Llama, BF16 precision
Two nodes, 16 GPUs, parallel training with ddp+tp+sp+pp enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

GPT2, FP32 precision
Two nodes, 16 GPUs, parallel training with ddp+tp+sp+pp enabled:
image
Single node, same scale:
image

@Chamberlain0w0
Contributor Author

GPT2, BF16 precision
Two nodes, 16 GPUs, parallel training with ddp+tp+sp+pp enabled:
image
Single node, same scale:
image

@kilinchange merged commit 791c75e into master on Mar 9, 2026
2 checks passed
@kilinchange deleted the fix/ddp_multinode branch on March 9, 2026, 01:33