[WIP] Chore/agentx v0.3 #1571
Open
cquil11 wants to merge 148 commits into main from chore/agentx-v0.3
Commits
846af91 (cquil11) mi355x kimi-fp4 agentic: switch from SimpleCPUOffloadConnector to Off…
9996180 (cquil11) dsv4-fp4-b200-vllm-agentic: bump image to cquil v0.21.0 custom build
aae82c0 (cquil11) Add dsv4-fp4-mi355x-sglang-agentic config + launcher
8feadd4 (cquil11) dsv4-fp4-b200-vllm-agentic: drop docker.io/ prefix from image
5e1ca4e (cquil11) Add dsv4-fp4-gb300-dynamo-vllm-agentic with local-recipe overlay
3eb9cbf (cquil11) gb300 agentic recipes: quote PORT as string for fork srtctl schema
7b3756e (cquil11) launch_gb300-cw.sh: mirror IS_AGENTIC branch from launch_gb300-nv.sh
2ae4bf9 (cquil11) gb300 agentic launchers: use upstream NVIDIA/srt-slurm + fix venv pip
b858480 (cquil11) gb300 launchers: use real upstream srt-slurm SHA (was fabricated)
893f5b8 (cquil11) gb300 agentic: strip chat parser flags from worker config + harden cw…
43b3a05 (cquil11) gb300-nv launcher: point dsv4 MODEL_PATH at the real shared NFS path
4195071 (cquil11) gb300-nv launcher: switch dsv4 MODEL_PATH to /data/ mount to dodge NF…
948eaa5 (cquil11) gb300 agentic launchers: pin to fork branch with --mem=0 patch
a3512cb (cquil11) gb300-nv launcher: move squash files to /data/ mount (same NFS ELOOP)
52af9d4 (cquil11) gb300 agentic: set --mem=0 via recipe srun_options (canonical mechanism)
3274dea (cquil11) gb300 agentic: add sbatch_directives.mem=0 (the missing layer)
92d2738 (cquil11) gb300 agentic: add sbatch_directives.cpus-per-task=72 (fix etcd starv…
1614e7f (cquil11) gb300 agentic: pin to nv-only + try /scratch model path
4ff2e50 (cquil11) gb300-nv agentic: clone cquil11 fork + pass --no-preflight
a3d946c (cquil11) gb300 agentic: wire aiperf mmap dataset cache
7530760 (cquil11) bump aiperf submodule: sync with ai-dynamo/aiperf PR #875
0678059 (cquil11) agentic: install git on-demand for aiperf editable install
62ef027 (cquil11) agentic: switch to no-subagents loader + sudo git install for non-roo…
18bc0bc (cquil11) agentic: drop -e from aiperf install (sidesteps git + userns-remap)
ea13e41 (cquil11) agentic: simplify git install to bare apt-get update && install; keep -e
3f4b095 (cquil11) gb300-nv agentic: add srun_options.container-remap-root
482348c (cquil11) gb300-nv launcher: bump srt-slurm SHA to include benchmark_stage fix
dac50f7 (cquil11) bump aiperf submodule: hang fix on cancel path
e575981 (cquil11) runners(gb300): snapshot server-log tarball on script EXIT (handle ca…
609b74d (cquil11) agentic: bump --failed-request-threshold 0.05 -> 0.20
afacd5b (cquil11) bump aiperf submodule: quieter warnings + tqdm in non-tty
4c9a4b5 (cquil11) benchmark_lib: disable failed-request threshold (1.0) for capacity-bo…
b2ffd9b (cquil11) launch_gb300-nv: snapshot server logs BEFORE rm -rf outputs
48f151e (cquil11) bump aiperf to a6812b03: fix UIType.TQDM crash
a4ee9a7 (cquil11) bump aiperf to 2f30ea86: revert TQDM + warning-downgrade changes
329d168 (cquil11) agentic recipes: raise NATS max_payload from 1MiB default to 32MiB
f8b85c9 (cquil11) bump aiperf to 61a9ed80: per-lane start-token counts in TrajectorySou…
fa28004 (cquil11) add dsv4-fp4-gb300-cw-dynamo-vllm-agentic — CoreWeave sibling config
4a46881 (cquil11) bump aiperf to a2b9d6b5: cc-traces dataset 051226 -> 051826 (98 traces)
20d4dd8 (cquil11) bump aiperf to 90c93aba: revert per-lane start-token logging
21f71b6 (cquil11) bump aiperf to a61553fd: drop preemptions from realtime log
6d10eaf (cquil11) b200/b300 vllm-agentic: no-offload curves vs new cc-traces 051826
2ce6131 (cquil11) launch_b300-nv: drop nonexistent b300-020 from salloc nodelist
c2c04df (cquil11) launch_b300-nv: add --container-remap-root to enable apt-get inside c…
a70f1ba (cquil11) remove utils/trace-replay submodule
6558228 (cquil11) remove trace-replay references; standardize on aiperf_artifacts
3af753b (cquil11) update aiperf submodule branch tracking to cjq/agentx-v0.3
aa7348f (cquil11) track aiperf submodule on cjq/agentx-v0.3-subagents
3273663 (cquil11) chore: update aiperf tiered subagent joins
1722c11 (cquil11) chore: update aiperf tiered join docs
9d94969 (cquil11) chore: update aiperf idle gap cap
dc35e35 (cquil11) chore: update aiperf idle gap cap precedence
da16c0b (cquil11) chore: update aiperf idle gap semantics
bd290a0 (cquil11) chore: update aiperf join examples
9ea7370 (cquil11) benchmarks(agentic): switch to with-subagents corpus + idle-gap cap
a2707d4 (cquil11) benchmarks(agentic): trim workload distribution analyzer to ISL/OSL only
8a267b7 (cquil11) benchmarks(agentic): restore generate_aiperf_plots.py for server-metr…
a258f90 (cquil11) benchmarks(agentic): drop conc=96,128 from b200 dsv4 vllm agentic sweep
d79bc5f (cquil11) benchmarks(agentic): fix generate_aiperf_plots.py artifact dir lookup
5c15fa9 (cquil11) chore: bump aiperf submodule to de702eaf (mmap cache hardlink)
c149b9d (cquil11) feat: add lmcache mp agentic offload
ed79577 (cquil11) fix: run lmcache on dsv4 tep agentic
01ed357 (cquil11) fix: clean lmcache agentic startup logs
21ed1eb (cquil11) fix: disable lmcache dsv4 offload
907ad2e (cquil11) switch to native offloading
4abc590 (cquil11) switch to native offloading
3d7bfe2 (cquil11) fix: size native dsv4 offload to 2.8tb
ad505ff (cquil11) switch to native offloading
1cede80 (cquil11) switch to native offloading
99cd035 (cquil11) benchmarks(agentic): drop dsv4 b200 native offload from 2.8TB to 1.2TB
b07bd58 (cquil11) feat(agentic): add Kimi LMCache offload coverage
327c4d9 (cquil11) feat(agentic): add Qwen SGLang HiCache starts
6b87f49 (cquil11) fix(agentic): size SGLang HiCache per rank
9e6e81a (cquil11) fix(agentic): tune B300 HiCache defaults
1a300d3 (cquil11) fix(agentic): tune MI355X HiCache defaults
859aec5 (cquil11) fix(agentic): cap Kimi LMCache CPU pool per rank
5398ba9 (cquil11) fix(agentic): cap MI355X HiCache per-rank memory
9a8f89c (cquil11) fix(agentic): skip MI355X HiCache server warmup
9f3fb05 (cquil11) fix(agentic): skip non-finite SGLang metrics
5ebf81f (cquil11) fix(agentic): size Qwen HiCache host pools
dbfbd56 (cquil11) fix(agentic): cap replay contexts to server window
8fa3c96 (cquil11) fix(agentic): cap MI355X HiCache graph capture
83fa8ec (cquil11) fix(matrix): apply runner filter to agentic configs
f999fef (cquil11) fix(config): use registered MI355X runner labels
afaec72 (cquil11) fix(agentic): use direct HiCache copies for Qwen MI355X
1e730d7 (cquil11) mi355x qwen sgl offload
e29fb3b (cquil11) fix(agentic): use LMCache MP for Kimi B200
bb64d3e (cquil11) feat(agentic): add LMCache MP for Kimi MI355X
91b24b5 (cquil11) mi355x qwen sgl offload
5a3cd6a (cquil11) fix(agentic): avoid CUDA NIXL import on MI355X LMCache
4fec279 (cquil11) chore: bump aiperf submodule to 5b3db5a2 (merge PR #2)
4a51237 (cquil11) fix(agentic): fail replay above 10 percent request errors
4aeb164 (cquil11) benchmarks(agentic): retarget HF dataset constants to with-subagents-…
36cb524 (cquil11) fix(agentic): propagate replay failures
10222f4 (cquil11) fix(agentic): remove CUDA LMCache deps on ROCm
8f01cb4 (cquil11) fix(agentic): keep LMCache cupy deps on ROCm
265fc75 (cquil11) fix(agentic): use ROCm CuPy for Kimi LMCache MP
f34e024 (cquil11) fix(agentic): add ROCm LMCache MP block fallback
20d6508 (cquil11) fix(agentic): defer ROCm LMCache pinned expansion
0103241 (cquil11) fix(agentic): lazily patch ROCm LMCache allocator
5db2668 (cquil11) fix(agentic): avoid partial LMCache import patching
5819b31 (cquil11) fix(agentic): filter Kimi MI355X replay context
165d41c (cquil11) fix(agentic): normalize Kimi MI355X max context
229d541 (cquil11) fix(agentic): update AIPerf replay metadata
81fd6bf (cquil11) fix(agentic): refresh AIPerf mmap cache schema
e80a843 (cquil11) fix(agentic): carry AIPerf prefix metadata
69cdbc2 (cquil11) fix(agentic): use final LMCache capacity on ROCm
03a85ab (cquil11) fix(agentic): extend Kimi MI355X LMCache read lease
4941697 (andyluo7) feat: Kimi-K2.5-MXFP4 LMCache MP offloading for MI355X agentic benchm…
9e41c1a (cquil11) chore(agentx): update aiperf prefix cache metric
380dcd7 (cquil11) fix(agentx): refresh aiperf mmap cache schema
bc41a72 (cquil11) fix(agentx): carry prefix counts into mmap metadata
60fcd42 (cquil11) fix(agentx): default to pre-canned assistant replay
81d381d (cquil11) dsv4
8724609 (cquil11) dsv4
06c606a (cquil11) dsv4
7ad7dd4 (cquil11) fix(agentx): update aiperf realtime cache metrics
5403a6b (cquil11) testing kimi
1c6d297 (cquil11) testing kimi
acc2c73 (cquil11) chore(aiperf): bump submodule for unique_in_srv realtime metric
967c50c (cquil11) runners(h200-dgxc-slurm): remap container UID to root to match b200-dgxc
4be3ef0 (cquil11) fix(agentx): re-enable weka live assistant replay
8eec0d4 (cquil11) benchmarks(single_node): move fixed-seq-len scripts into fixed_seq_le…
f89cdfe (cquil11) Merge origin/main into chore/agentx-v0.3 with fixed_seq_len/ reorg fi…
049a873 (cquil11) chore: update agentx v0.3 aiperf
711cb85 (cquil11) chore: update agentx weka dataset
284cfa5 (cquil11) chore: update agentx snapshot logging
1b41cd0 (cquil11) benchmarks: drop redundant ${VAR:-default} defaults from recipe scripts
a98fcaa (cquil11) runners(h200-{nb,cw}): wire AIPERF mmap cache mount + env
e1e4d44 (cquil11) benchmarks(agentic): add WEKA_LOADER_OVERRIDE; switch minimax to 256k…
4e62c59 (cquil11) benchmarks: retarget WEKA_LOADER_OVERRIDE 256k variant to 052726-256k
eab58e9 (cquil11) utils(proxy_to_weka): drop exact-duplicate rows in load_session_rows
88a1153 (cquil11) nvidia-master(kimik2.5-fp4-b200-vllm-agentic): bump vLLM v0.20.2 -> v…
72cf856 (cquil11) feat(agentic): add qwen3.5-fp8-h100-sglang-agentic recipe
3406355 (cquil11) runners(h100-dgxc-slurm): wire AIPERF mmap cache mount + env
4933cf3 (cquil11) chore(aiperf): bump submodule for SGLang realtime srv-row fallbacks
6d884b9 (cquil11) chore(aiperf): bump submodule for _total counter-lookup fix
77e648d (cquil11) agentic(sglang): drop --disable-radix-cache from every recipe
b27295c (cquil11) chore(aiperf): bump submodule for SGLang counter-pair cache hit rate
842a0cf (cquil11) testing qwen
5d10625 (cquil11) testing qwen
717385a (cquil11) testing qwen
6a77acb (cquil11) testing qwen
c00454e (cquil11) chore(aiperf): bump submodule for weka_trace id()-keyed dict fix
ae8ba76 (cquil11) chore(aiperf): bump submodule for parallel reconstruction dropped-sub…
bcf338c (cquil11) chore(aiperf): bump submodule for mmap-cache stale-lock bypass
0e8ac92 (cquil11) testing qwen
57fdef7 (cquil11) chore(aiperf): bump submodule for snapshot warmup fix