add mha_kvcache by Susskind115 · Pull Request #261 · InfiniTensor/InfiniLM

Susskind115 · 2026-03-09T04:36:45Z

Benchmark Command

python bench_e2e_backend_compare.py \
  --nvidia \
  --model /data-aisoft/zhujianian/Uneed/Uneed/huggingface_download/9G7B_MHA \
  --batch-size 1

Benchmark Results

Input Len	TTFT FA2 (ms)	TTFT InfiniOp (ms)	TTFT Ratio	ITL FA2 (ms)	ITL InfiniOp (ms)	ITL Ratio	Decode FA2 (tok/s)	Decode InfiniOp (tok/s)
64	15.3	15.0	1.023x ← INF	11.43	12.29	0.93x ← FA2	87.5	81.4
256	24.9	28.5	0.87x ← FA2	11.63	12.54	0.92x ← FA2	86.0	79.7
512	45.5	60.1	0.76x ← FA2	11.61	12.76	0.91x ← FA2	86.1	78.3
1024	87.3	141.3	0.62x ← FA2	11.84	13.28	0.89x ← FA2	84.5	75.3
2048	167.0	376.0	0.44x ← FA2	12.07	14.37	0.84x ← FA2	82.8	69.6
4096	337.5	1168.6	0.29x ← FA2	12.63	16.56	0.76x ← FA2	79.2	60.4

Ratio = FA2 latency / InfiniOp latency
Ratio < 1.0 → FA2 faster
Ratio > 1.0 → InfiniOp faster
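
For readers who want to check the derived columns, here is a minimal sketch of how they appear to follow from the measured latencies. It assumes the ratio is FA2 latency divided by InfiniOp latency and that the Decode columns are tokens per second computed as 1000 / ITL; both assumptions match the reported numbers to within rounding but are not confirmed by the benchmark script. The helper names `speed_ratio` and `decode_tokens_per_s` are hypothetical and are not part of `bench_e2e_backend_compare.py`.

```python
# Hypothetical helpers; not taken from bench_e2e_backend_compare.py.

def speed_ratio(fa2_ms: float, infiniop_ms: float) -> float:
    """FA2 latency / InfiniOp latency. Below 1.0 means FA2 is faster."""
    return fa2_ms / infiniop_ms


def decode_tokens_per_s(itl_ms: float) -> float:
    """Decode throughput implied by an inter-token latency in milliseconds."""
    return 1000.0 / itl_ms


# (input_len, ttft_fa2, ttft_infiniop, itl_fa2, itl_infiniop), copied from the table.
rows = [
    (64, 15.3, 15.0, 11.43, 12.29),
    (256, 24.9, 28.5, 11.63, 12.54),
    (512, 45.5, 60.1, 11.61, 12.76),
    (1024, 87.3, 141.3, 11.84, 13.28),
    (2048, 167.0, 376.0, 12.07, 14.37),
    (4096, 337.5, 1168.6, 12.63, 16.56),
]

for input_len, ttft_fa2, ttft_inf, itl_fa2, itl_inf in rows:
    print(
        f"{input_len:>5}  "
        f"TTFT ratio {speed_ratio(ttft_fa2, ttft_inf):.2f}x  "
        f"ITL ratio {speed_ratio(itl_fa2, itl_inf):.2f}x  "
        f"decode {decode_tokens_per_s(itl_fa2):.1f} vs "
        f"{decode_tokens_per_s(itl_inf):.1f} tok/s"
    )
```

Because the table lists rounded latencies, values recomputed this way can differ from the reported ones in the last digit.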

add mha_kvcache

d0a3a09

Susskind115 requested review from a team and wooway777 March 9, 2026 04:36

repair gqa-api bug

b98aab1

wooway777 requested a review from pengcheng888 March 9, 2026 12:10