
Add kernel syscall-prune build cycle with cached profile reuse#16

Merged
jserv merged 1 commit into main from syscall
May 1, 2026
Conversation


@jserv jserv commented May 1, 2026

The existing PGO cycle bundles config-only, syscall-prune, and layout-ordering candidates into a single rebuild loop. When all we want is the syscall-prune verdict (e.g. tightening the prune table after a userspace change), paying for the layout-ordering detour is wasteful, and the QEMU trace + analyze step gets re-run on every invocation.

build.sh grows kernel_syscall_prune_cycle alongside the existing kernel_pgo_cycle: same primitives (build_candidate_kernel, boot_not_regressed, restore_kernel_artifacts), but only baseline / config-only / syscall-prune candidates, defaulting QEMU_LOG to exec,cpu,in_asm so the R7 dumps needed for syscall extraction are present. The stage selects the smallest linux.axf that does not regress shell_ready_ms, mirroring the existing cycle's selection rule.
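The selection rule can be sketched as follows (a minimal illustration with hypothetical names and a hypothetical tolerance parameter; the actual logic lives in build.sh as shell code):

```python
def select_candidate(candidates, baseline_ms, tolerance=1.0):
    """Pick the smallest linux.axf that does not regress shell_ready_ms.

    candidates: list of (name, size_bytes, shell_ready_ms) tuples.
    A candidate regresses when its boot time exceeds baseline_ms * tolerance.
    """
    viable = [c for c in candidates if c[2] <= baseline_ms * tolerance]
    if not viable:
        return None  # keep the baseline artifacts
    return min(viable, key=lambda c: c[1])
```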

prepare_kernel_profile_analysis extracts the trace-collect+analyze pair into a shared helper keyed on (IMAGE_FP, sha256(workload), trace selector). Both cycles share the namespace but resolve to distinct cache directories via the trace tag, so PGO's exec,in_asm trace and the new cycle's exec,cpu,in_asm trace do not collide. IMAGE_FP already covers scripts/, configs/, patches/, and build.sh, so any change that affects the baseline binary or the analysis tooling invalidates the cache.
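The keying scheme described above looks roughly like this (a hedged sketch; the helper name, key layout, and cache root are assumptions, and the real helper is shell in build.sh):

```python
import hashlib

def profile_cache_dir(image_fp, workload_bytes, trace_tag,
                      root="cache/kernel-profile"):
    """Resolve the cache directory for one (image, workload, trace) triple.

    image_fp already fingerprints scripts/, configs/, patches/, and
    build.sh, so changing any of those rolls the key automatically.
    The trace tag keeps the two cycles' traces from colliding.
    """
    wl = hashlib.sha256(workload_bytes).hexdigest()[:16]
    return f"{root}/{image_fp}-{wl}-{trace_tag.replace(',', '_')}"
```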

materialize_cache_tree uses cp -f rather than hardlinks: link_cached_tree exposes cache files via symlinks in working dirs, and a hardlink would let any future writer that follows that symlink mutate the cache through a shared inode. cp -f makes the cache the canonical copy.
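The hazard is easy to demonstrate directly (illustration only, not the build.sh code): writing through a hardlink mutates the cached inode, while writing through a real copy leaves the cache untouched.

```python
import os, shutil, tempfile

def demo_hardlink_vs_copy():
    d = tempfile.mkdtemp()
    try:
        cache = os.path.join(d, "cache.bin")
        with open(cache, "w") as f:
            f.write("cached")
        # Hardlink: both names share one inode, so a writer that follows
        # the link clobbers the cached bytes.
        link = os.path.join(d, "link.bin")
        os.link(cache, link)
        with open(link, "w") as f:
            f.write("mutated")
        via_link = open(cache).read()
        # Reset, then use a true copy (what cp -f produces): the cache
        # stays canonical no matter what happens to the working file.
        with open(cache, "w") as f:
            f.write("cached")
        copy = os.path.join(d, "copy.bin")
        shutil.copyfile(cache, copy)
        with open(copy, "w") as f:
            f.write("mutated")
        via_copy = open(cache).read()
        return via_link, via_copy
    finally:
        shutil.rmtree(d)
```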

scripts/qemu-trace-to-orderfile.py rewrites the syscall pairing state machine to match QEMU's actual -d cpu ordering: register dumps precede the TB whose entry state they capture. The previous code kept a backward-binding fallback that, when a TB without a preceding regdump showed up (orphan TB), bound the next R07 to the orphan instead of holding it for the upcoming TB. In a real trace this silently misattributes or drops syscalls and feeds a wrong prune table. pending_tb is dropped entirely; an R07 always binds forward to the next TB or stays pending until one arrives.
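The forward-binding rule can be sketched over pre-parsed trace events (raw -d exec,cpu line parsing is elided; the event-tuple shape and function name are assumptions, not the script's actual interface):

```python
def pair_syscalls(events):
    """Bind each R07 register dump forward to the next TB entry.

    events: iterable of ("regdump", r7_value) or ("tb", pc) tuples.
    A TB with no preceding dump (orphan TB) binds nothing; a dump with
    no following TB stays pending and is ultimately dropped.
    """
    pairs = []
    pending_r7 = None
    for kind, value in events:
        if kind == "regdump":
            pending_r7 = value            # hold R07 for the upcoming TB
        else:                             # "tb"
            if pending_r7 is not None:
                pairs.append((value, pending_r7))
                pending_r7 = None         # never rebinds backward
    return pairs
```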

scripts/test_qemu_trace_to_orderfile.py covers the happy path, the orphan-TB regression (which fails under the old backward-binding), and the MAX_ARM_SYSCALL boundary at 511/512.

Three Python script invocations gain an explicit python3 prefix. generate-syscall-prune-table.py is committed with mode 100644 (non-executable), so invoking it via the bare path raised Permission denied; the other two prefixes are defensive consistency.

@jserv jserv merged commit 68c0004 into main May 1, 2026
4 checks passed
@jserv jserv deleted the syscall branch May 1, 2026 20:36
