
add mha_kvcache #1062

Open
Susskind115 wants to merge 1 commit into InfiniTensor:main from Susskind115:feature/mha_kvcache




Susskind115 commented Mar 9, 2026

Benchmark Command

python bench_e2e_backend_compare.py \
  --nvidia \
  --model /data-aisoft/zhujianian/Uneed/Uneed/huggingface_download/9G7B_MHA \
  --batch-size 1

Result

Configuration

  • Model: 9G7B MHA
  • Batch size: 1
  • Mode: paged-attn
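For reference, the TTFT and ITL columns in the result table are conventionally derived from per-token arrival times, roughly as sketched below. This is a hypothetical helper (`latency_metrics` is not taken from `bench_e2e_backend_compare.py`), assuming TTFT is measured from request submission to the first generated token, ITL is the mean gap between consecutive generated tokens, and decode throughput is 1000 / ITL in tokens per second.

```python
from __future__ import annotations

from statistics import mean


def latency_metrics(request_start: float, token_times: list[float]) -> dict[str, float]:
    """Hypothetical helper: TTFT, mean ITL and decode throughput from timestamps in seconds."""
    if not token_times:
        raise ValueError("need at least one generated token")
    # TTFT: wall time from submitting the request until the first token arrives.
    ttft_ms = (token_times[0] - request_start) * 1000.0
    # ITL: gaps between consecutive tokens during the decode phase.
    gaps = [later - earlier for earlier, later in zip(token_times, token_times[1:])]
    if not gaps:
        # A single token gives no inter-token gap, so ITL and throughput are undefined.
        return {"ttft_ms": ttft_ms, "itl_ms": float("nan"), "decode_tok_s": float("nan")}
    itl_ms = mean(gaps) * 1000.0
    # Assumed definition: decode throughput is the reciprocal of the mean ITL.
    decode_tok_s = 1000.0 / itl_ms
    return {"ttft_ms": ttft_ms, "itl_ms": itl_ms, "decode_tok_s": decode_tok_s}
```

For example, `latency_metrics(0.0, [0.015, 0.0265, 0.038])` yields a TTFT of 15 ms, a mean ITL of 11.5 ms and about 87 tok/s of decode throughput.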
| Input Len (tokens) | TTFT FA2 (ms) | TTFT InfiniOp (ms) | TTFT Ratio | ITL FA2 (ms) | ITL InfiniOp (ms) | ITL Ratio | Decode FA2 (tok/s) | Decode InfiniOp (tok/s) |
|---|---|---|---|---|---|---|---|---|
| 64 | 15.3 | 15.0 | 1.023x ← INF | 11.43 | 12.29 | 0.93x ← FA2 | 87.5 | 81.4 |
| 256 | 24.9 | 28.5 | 0.87x ← FA2 | 11.63 | 12.54 | 0.92x ← FA2 | 86.0 | 79.7 |
| 512 | 45.5 | 60.1 | 0.76x ← FA2 | 11.61 | 12.76 | 0.91x ← FA2 | 86.1 | 78.3 |
| 1024 | 87.3 | 141.3 | 0.62x ← FA2 | 11.84 | 13.28 | 0.89x ← FA2 | 84.5 | 75.3 |
| 2048 | 167.0 | 376.0 | 0.44x ← FA2 | 12.07 | 14.37 | 0.84x ← FA2 | 82.8 | 69.6 |
| 4096 | 337.5 | 1168.6 | 0.29x ← FA2 | 12.63 | 16.56 | 0.76x ← FA2 | 79.2 | 60.4 |

Ratio = FA2 latency / InfiniOp latency
Ratio < 1.0 → FA2 faster
Ratio > 1.0 → InfiniOp faster
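The Ratio and Decode columns appear to follow directly from the latency columns: each ratio is the FA2 latency divided by the InfiniOp latency, and decode throughput matches 1000 / ITL. That relationship is inferred from the numbers rather than stated in the PR, so the sketch below recomputes both from the rounded table values. Because the inputs are rounded, results agree with the reported columns only to within rounding (for example, the 256-token ITL ratio comes out as about 0.927 rather than 0.92).

```python
# (input_len, ttft_fa2, ttft_inf, itl_fa2, itl_inf), all latencies in ms, copied from the table.
ROWS = [
    (64, 15.3, 15.0, 11.43, 12.29),
    (256, 24.9, 28.5, 11.63, 12.54),
    (512, 45.5, 60.1, 11.61, 12.76),
    (1024, 87.3, 141.3, 11.84, 13.28),
    (2048, 167.0, 376.0, 12.07, 14.37),
    (4096, 337.5, 1168.6, 12.63, 16.56),
]


def ratio(fa2_ms: float, inf_ms: float) -> float:
    """FA2 latency / InfiniOp latency; a value above 1.0 means InfiniOp is faster."""
    return fa2_ms / inf_ms


def faster(r: float) -> str:
    """Label matching the table's arrow markers."""
    return "INF" if r > 1.0 else "FA2"


def summarize(rows):
    """Return (input_len, ttft_ratio, itl_ratio, decode_fa2_tok_s, decode_inf_tok_s) per row."""
    return [
        (n, ratio(tf, ti), ratio(lf, li), 1000.0 / lf, 1000.0 / li)
        for n, tf, ti, lf, li in rows
    ]


if __name__ == "__main__":
    for n, r_ttft, r_itl, d_fa2, d_inf in summarize(ROWS):
        print(f"{n:5d}  TTFT {r_ttft:.2f}x <- {faster(r_ttft)}  "
              f"ITL {r_itl:.2f}x <- {faster(r_itl)}  decode {d_fa2:.1f} vs {d_inf:.1f} tok/s")
```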

Susskind115 requested review from a team and wooway777 on March 9, 2026, 04:32
