add mha_kvcache #261

Open
Susskind115 wants to merge 2 commits into InfiniTensor:main from Susskind115:feature/mha_kvcache

@Susskind115
Benchmark Command

python bench_e2e_backend_compare.py \
  --nvidia \
  --model /data-aisoft/zhujianian/Uneed/Uneed/huggingface_download/9G7B_MHA \
  --batch-size 1

Result

Configuration

  • Model: 9G7B MHA
  • Batch size: 1
  • Mode: paged-attn
  • Paired with the corresponding InfiniCore PR

| Input Len | TTFT FA2 (ms) | TTFT InfiniOp (ms) | TTFT Ratio | ITL FA2 (ms) | ITL InfiniOp (ms) | ITL Ratio | Decode FA2 (tok/s) | Decode InfiniOp (tok/s) |
|---|---|---|---|---|---|---|---|---|
| 64 | 15.3 | 15.0 | 1.023x ← INF | 11.43 | 12.29 | 0.93x ← FA2 | 87.5 | 81.4 |
| 256 | 24.9 | 28.5 | 0.87x ← FA2 | 11.63 | 12.54 | 0.92x ← FA2 | 86.0 | 79.7 |
| 512 | 45.5 | 60.1 | 0.76x ← FA2 | 11.61 | 12.76 | 0.91x ← FA2 | 86.1 | 78.3 |
| 1024 | 87.3 | 141.3 | 0.62x ← FA2 | 11.84 | 13.28 | 0.89x ← FA2 | 84.5 | 75.3 |
| 2048 | 167.0 | 376.0 | 0.44x ← FA2 | 12.07 | 14.37 | 0.84x ← FA2 | 82.8 | 69.6 |
| 4096 | 337.5 | 1168.6 | 0.29x ← FA2 | 12.63 | 16.56 | 0.76x ← FA2 | 79.2 | 60.4 |

Ratio < 1.0 → FA2 faster
Ratio > 1.0 → InfiniOp faster
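
The ratio and decode columns can be reproduced from the latency columns. The sketch below assumes the ratio is FA2 latency divided by InfiniOp latency and that the decode figure is tokens per second computed as 1000 / ITL (ms); both assumptions match the numbers reported above but are not stated in the PR. The function names are hypothetical and are not part of `bench_e2e_backend_compare.py`.

```python
def speed_ratio(fa2_ms: float, infiniop_ms: float) -> float:
    """Assumed definition: FA2 latency / InfiniOp latency (> 1.0 means InfiniOp is faster)."""
    return fa2_ms / infiniop_ms


def faster_backend(ratio: float) -> str:
    """Apply the legend: ratio < 1.0 -> FA2 faster, ratio > 1.0 -> InfiniOp faster."""
    if ratio > 1.0:
        return "InfiniOp"
    if ratio < 1.0:
        return "FA2"
    return "tie"


def decode_tokens_per_s(itl_ms: float) -> float:
    """Assumed definition: decode throughput at batch size 1 is 1000 / inter-token latency."""
    return 1000.0 / itl_ms


# Two rows copied from the results table:
# (input_len, ttft_fa2_ms, ttft_infiniop_ms, itl_fa2_ms, itl_infiniop_ms)
ROWS = [
    (256, 24.9, 28.5, 11.63, 12.54),
    (4096, 337.5, 1168.6, 12.63, 16.56),
]

for input_len, ttft_fa2, ttft_inf, itl_fa2, itl_inf in ROWS:
    ttft_ratio = speed_ratio(ttft_fa2, ttft_inf)
    itl_ratio = speed_ratio(itl_fa2, itl_inf)
    print(
        f"len={input_len}: "
        f"TTFT {ttft_ratio:.2f}x ({faster_backend(ttft_ratio)}), "
        f"ITL {itl_ratio:.2f}x ({faster_backend(itl_ratio)}), "
        f"decode FA2 {decode_tokens_per_s(itl_fa2):.1f} tok/s, "
        f"InfiniOp {decode_tokens_per_s(itl_inf):.1f} tok/s"
    )
```

Read this way, the TTFT gap grows with input length (0.87x at 256 tokens, 0.29x at 4096), while the ITL gap widens more slowly (0.92x to 0.76x).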

Susskind115 requested review from a team and wooway777 on March 9, 2026, 04:36
wooway777 requested a review from pengcheng888 on March 9, 2026, 12:10