From 245b950e89a7531d092cd855a2214ae0cd8099be Mon Sep 17 00:00:00 2001
From: Jim Huang
Date: Fri, 1 May 2026 20:21:54 +0800
Subject: [PATCH] Strip residual sched/time surface after fair-tiny

Six out-of-tree patches stack on top of 0014 to shave another 16,416
bytes off linux.axf (-14,528 bytes of resident .text):

0015 SCHED_TOPOLOGY_MINIMAL (default n; set y)
     Wraps kernel/sched/topology.c in #ifndef and substitutes a minimal
     stub: def_root_domain + init_rootdomain + init_defrootdomain +
     rq_attach_root + sched_get/put_rd + sched_domains_mutex helpers +
     empty sched_init_domains and partition_sched_domains. Drops
     sched_domain construction, NUMA, perf-domain, asym-capacity, and
     topology debug. topology.c 5,028 -> 70 B / 23 -> 6 syms.

0016 SCHED_NO_RICH_API (default n; set y)
     Gates 11 SYSCALL_DEFINEs in kernel/sched/syscalls.c
     (sched_set/getparam, set/getscheduler, set/getattr,
     set/getaffinity, get_priority_{max,min}, rr_get_interval).
     sys_ni.c gains COND_SYSCALL aliases so each slot returns -ENOSYS.
     Internal helpers (sched_set_fifo*, sched_setscheduler_nocheck,
     sched_setattr_nocheck, sched_setaffinity, sched_getaffinity,
     __sched_setaffinity, dl_task_check_affinity) stay live for RCU /
     kthread / compat callers. sched_yield (158) and nice (34)
     unaffected. syscalls.c 3,192 -> 2,068 B / 41 -> 21 syms.

0017 POSIX_CPU_TIMERS (default y; set n)
     Wraps kernel/time/posix-cpu-timers.c in #ifdef and substitutes
     k_clock dispatch tables for clock_process / clock_thread /
     clock_posix_cpu that return -EINVAL via four shared stub methods,
     plus no-op stubs for posix_cputimers_group_init,
     run_posix_cpu_timers, update_rlimit_cpu,
     thread_group_sample_cputime, set_process_cpu_timer,
     posix_cpu_timers_exit{,_group}. Effective syscall impact:
     clock_gettime / clock_getres / clock_nanosleep on
     CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID return
     -EINVAL; setrlimit(RLIMIT_CPU) and ITIMER_PROF / ITIMER_VIRTUAL
     become silent no-ops. posix-cpu-timers.c 3,722 -> 66 B / 42 -> 11
     syms.

0018 SCHED_PELT_RT_MINI (default n; set y)
     Stubs CFS-side (__update_load_avg_blocked_se / _se / _cfs_rq) and
     DL-side (update_dl_rq_load_avg) PELT entry points in
     kernel/sched/pelt.c to return 0; update_other_load_avgs collapses
     to a thin call into update_rt_rq_load_avg. Helpers (decay_load,
     __accumulate_pelt_segments, accumulate_sum, ___update_load_sum,
     ___update_load_avg, runnable_avg_yN_inv) stay live for rt.c.
     Combined with 0019, all of pelt.c becomes LTO-strippable.
     pelt.c 1,758 -> 470 B / 8 -> 7 syms.

0019 SCHED_RT_TINY (default n; set y)
     Wraps kernel/sched/rt.c in #ifndef and substitutes a
     fixed-priority FIFO class reusing the existing rt_rq->active
     priority-bitmap + per-priority list_head data structure. Drops
     SCHED_RR slice rotation (RR collapses to FIFO), RT bandwidth
     period/runtime timer, throttle, push/pull migration (UP-only
     target makes the migration surface dead), cpupri-driven
     find_lowest_rq, and sched_rt/rr_handler sysctl writers.
     Cross-priority preemption stays via resched_curr; RT > fair
     preemption stays via the class-chain walk. get_rr_interval_rt
     returns 0 for SCHED_FIFO (mainline contract).
     rt.c 3,400 -> 606 B / 34 -> 16 syms.

0020 TIME_NO_SET_WALLCLOCK (default n; set y)
     Short-circuits do_settimeofday64, do_adjtimex, and
     timekeeping_warp_clock in kernel/time/timekeeping.c. Effective
     userspace impact: settimeofday(2), clock_settime(2), adjtimex(2),
     and stime(2) all return -EPERM; clock_gettime(2) read paths stay
     live.
NTP discipline / leap-second / TAI helpers lose all in-tree callers and become candidates for LTO dead-stripping. build.sh: extend the apply_patch_once loop, set the 6 new Kconfigs (five =y, POSIX_CPU_TIMERS =n), extend the post-olddefconfig verification list. Cumulative on the production build: vmlinux .text 755,888 -> 746,484 (-9,404), resident .text (subsystem-rollup) 713,412 -> 698,884 (-14,528), linux.axf 1,188,352 -> 1,171,936 (-16,416 / 4 pages). Per-bucket: kernel/sched 30,980 -> 21,568 (-30.4%), kernel/time 39,442 -> 35,446 (-10.1%). Each patch boot-tested independently in QEMU MPS2-AN386 (busybox sleep, date, ps; background CPU hog plus interactive shell coexistence verified for 0019). --- build.sh | 72 +++- patches/0015-tiny-sched-no-topology.patch | 222 +++++++++++ patches/0016-tiny-sched-no-rich-api.patch | 197 ++++++++++ patches/0017-tiny-time-no-cpu-timers.patch | 181 +++++++++ patches/0018-tiny-sched-pelt-rt-mini.patch | 146 +++++++ patches/0019-tiny-sched-rt-tiny.patch | 363 ++++++++++++++++++ patches/0020-tiny-time-no-set-wallclock.patch | 144 +++++++ 7 files changed, 1323 insertions(+), 2 deletions(-) create mode 100644 patches/0015-tiny-sched-no-topology.patch create mode 100644 patches/0016-tiny-sched-no-rich-api.patch create mode 100644 patches/0017-tiny-time-no-cpu-timers.patch create mode 100644 patches/0018-tiny-sched-pelt-rt-mini.patch create mode 100644 patches/0019-tiny-sched-rt-tiny.patch create mode 100644 patches/0020-tiny-time-no-set-wallclock.patch diff --git a/build.sh b/build.sh index 2c6b50e..588d92e 100755 --- a/build.sh +++ b/build.sh @@ -666,7 +666,7 @@ build_linux() { cd linux-${LINUX_VERSION} # Apply linux-tiny patches for reduced memory footprint and LTO support - for p in ../patches/0002-*.patch ../patches/0003-*.patch ../patches/0004-*.patch ../patches/0005-*.patch ../patches/0006-*.patch ../patches/0010-*.patch ../patches/0011-*.patch ../patches/0012-*.patch ../patches/0013-*.patch ../patches/0014-*.patch; do + for p in ../patches/0002-*.patch ../patches/0003-*.patch ../patches/0004-*.patch ../patches/0005-*.patch ../patches/0006-*.patch ../patches/0010-*.patch ../patches/0011-*.patch ../patches/0012-*.patch ../patches/0013-*.patch ../patches/0014-*.patch ../patches/0015-*.patch ../patches/0016-*.patch ../patches/0017-*.patch ../patches/0018-*.patch ../patches/0019-*.patch ../patches/0020-*.patch; do [ -f "${p}" ] || continue apply_patch_once "${p}" done @@ -952,6 +952,68 @@ build_linux() { # still pre-empts via the existing class chain walk. echo "CONFIG_SCHED_FAIR_TINY=y" >>.config + # Patch 0015 introduces CONFIG_SCHED_TOPOLOGY_MINIMAL (default n; set y + # here). Wraps kernel/sched/topology.c body in #ifndef and substitutes + # a minimal stub block: def_root_domain + init_rootdomain + + # init_defrootdomain + rq_attach_root + sched_get/put_rd + + # sched_domains_mutex helpers + empty sched_init_domains and + # partition_sched_domains. Drops sched_domain construction, NUMA, + # perf-domain, asym-capacity, and topology-debug machinery. Safe on + # UP NOMMU (SMP=n, NUMA=n, ENERGY_MODEL=n, CGROUPS=n, CPUSETS=n). + echo "CONFIG_SCHED_TOPOLOGY_MINIMAL=y" >>.config + + # Patch 0016 introduces CONFIG_SCHED_NO_RICH_API (default n; set y here). 
+ # Wraps SYSCALL_DEFINEs for sched_setparam/getparam (154/155), + # sched_set/getscheduler (156/157), sched_get_priority_{max,min} + # (159/160), sched_rr_get_interval (161), sched_{set,get}affinity + # (241/242), sched_{set,get}attr (380/381) in #ifndef; sys_ni.c gains + # COND_SYSCALL aliases so the slot returns -ENOSYS. Internal helpers + # (sched_set_fifo*, sched_setscheduler_nocheck, sched_setattr_nocheck, + # sched_setaffinity, sched_getaffinity, __sched_setaffinity, + # dl_task_check_affinity) stay live for RCU/kthread/compat callers. + # Keeps sched_yield (158) and nice (34) intact. + echo "CONFIG_SCHED_NO_RICH_API=y" >>.config + + # Patch 0017 introduces CONFIG_POSIX_CPU_TIMERS (default y; set n here). + # Wraps kernel/time/posix-cpu-timers.c body in #ifdef and substitutes + # k_clock dispatch tables that return -EINVAL for clock_process / + # clock_thread / clock_posix_cpu, plus no-op stubs for + # posix_cputimers_group_init / posix_cpu_timers_exit{,_group} / + # run_posix_cpu_timers / update_rlimit_cpu / thread_group_sample_cputime + # / set_process_cpu_timer. Effective syscall impact: clock_gettime / + # clock_getres / clock_nanosleep on CLOCK_PROCESS_CPUTIME_ID and + # CLOCK_THREAD_CPUTIME_ID return -EINVAL; setrlimit(RLIMIT_CPU) and + # ITIMER_PROF / ITIMER_VIRTUAL become silent no-ops. + echo "# CONFIG_POSIX_CPU_TIMERS is not set" >>.config + + # Patch 0018 introduces CONFIG_SCHED_PELT_RT_MINI (default n; set y). + # Stubs CFS-side (__update_load_avg_blocked_se / _se / _cfs_rq) and + # DL-side (update_dl_rq_load_avg) PELT entry points to return 0; + # update_other_load_avgs collapses to a thin call to update_rt_rq_load_avg. + # Safe under SCHED_FAIR_TINY=y (no fair-class PELT consumer) and + # SCHED_DEADLINE_CLASS=n (DL stub class never accumulates load). + echo "CONFIG_SCHED_PELT_RT_MINI=y" >>.config + + # Patch 0019 introduces CONFIG_SCHED_RT_TINY (default n; set y here). + # Wraps kernel/sched/rt.c body in #ifndef and substitutes a fixed-priority + # FIFO class: priority bitmap + per-priority list_heads, O(1) enqueue / + # dequeue / pick. Drops SCHED_RR slice rotation (RR collapses to FIFO), + # RT bandwidth period timer, throttle, push/pull migration, cpupri + # find_lowest_rq, and sched_rt/rr_handler sysctl writers. Cross-priority + # preemption stays via resched_curr; RT > fair preemption stays via the + # class-chain walk. Safe on UP NOMMU with no `chrt` applet. + echo "CONFIG_SCHED_RT_TINY=y" >>.config + + # Patch 0020 introduces CONFIG_TIME_NO_SET_WALLCLOCK (default n; set y). + # Stubs do_settimeofday64, do_adjtimex, and timekeeping_warp_clock to + # return -EPERM / no-op. Effective syscall impact: settimeofday(2), + # clock_settime(2), adjtimex(2), and stime(2) all return -EPERM; + # clock_gettime(2) read paths stay live. NTP discipline / leap-second + # / TAI maintenance helpers in timekeeping.c become candidates for LTO + # dead-stripping. Safe with QEMU's stable boot-time epoch and no NTP + # source / RTC on this target. + echo "CONFIG_TIME_NO_SET_WALLCLOCK=y" >>.config + run_logged "olddefconfig" kernel_make olddefconfig # Verify critical config options survived olddefconfig resolution. 
@@ -1018,7 +1080,13 @@ build_linux() { "# CONFIG_PSI is not set" \ "# CONFIG_CGROUPS is not set" \ "# CONFIG_SCHED_AUTOGROUP is not set" \ - "CONFIG_SCHED_FAIR_TINY=y"; do + "CONFIG_SCHED_FAIR_TINY=y" \ + "CONFIG_SCHED_TOPOLOGY_MINIMAL=y" \ + "CONFIG_SCHED_NO_RICH_API=y" \ + "# CONFIG_POSIX_CPU_TIMERS is not set" \ + "CONFIG_SCHED_PELT_RT_MINI=y" \ + "CONFIG_SCHED_RT_TINY=y" \ + "CONFIG_TIME_NO_SET_WALLCLOCK=y"; do if ! grep -q "^${opt}\$" .config; then echo "ERROR: expected '${opt}' in .config after olddefconfig" exit 1 diff --git a/patches/0015-tiny-sched-no-topology.patch b/patches/0015-tiny-sched-no-topology.patch new file mode 100644 index 0000000..946b7ee --- /dev/null +++ b/patches/0015-tiny-sched-no-topology.patch @@ -0,0 +1,222 @@ +From: Jim Huang +Subject: [PATCH] tiny: sched: gate topology.c behind CONFIG_SCHED_TOPOLOGY_MINIMAL + +kernel/sched/topology.c carries the entire scheduler-domain construction +machinery: sched_domain / sched_group builders (build_sched_domains, +build_overlap_sched_groups, build_sched_groups, ...), NUMA topology +discovery (sched_init_numa, sched_record_numa_dist, init_numa_topology_type, +sched_numa_find_*, sched_numa_hop_mask), root_domain rebuild paths +(partition_sched_domains*, detach_destroy_domains, dattrs_equal), +performance-domain bookkeeping for EAS (build_perf_domains, pd_init, +free_pd, find_pd, perf_domain_debug, sched_energy_set, +sched_energy_aware_handler), asymmetric-capacity classification +(asym_cpu_capacity_scan, asym_cpu_capacity_update_data, free_asym_cap_entry, +sched_update_asym_prefer_cpu, asym_cpu_capacity_classify), and topology +debug (sched_domain_debug*, topology_span_sane, sched_numa_warn). + +On a UP NOMMU image (CONFIG_SMP=n, CONFIG_NUMA=n, CONFIG_ENERGY_MODEL=n, +CONFIG_CPU_FREQ_GOV_SCHEDUTIL=n, CONFIG_CGROUPS=n, CONFIG_CPUSETS=n) every +one of those is unreachable: there is no second CPU to balance to, no +NUMA distance to query, no perf domain to attach, no cpuset filesystem +to call partition_sched_domains. Yet --gc-sections cannot strip them +because their references appear behind structure-of-function-pointers +(sched_domain_topology_level table, sched_class callbacks) and through +externally visible symbols (rebuild_sched_domains -> partition_sched_domains +via cpuset.h's static-inline fallback). 
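+
+To make the last link concrete: with CONFIG_CPUSETS=n,
+include/linux/cpuset.h falls back to a static inline of roughly this
+shape (an abbreviated sketch from memory of the mainline header, shown
+for illustration only):
+
+	static inline void rebuild_sched_domains(void)
+	{
+		partition_sched_domains(1, NULL, NULL);
+	}
+
+Every TU that includes cpuset.h can reach partition_sched_domains
+through that inline, so the symbol -- and whatever it drags in -- must
+survive the link even though no cpuset code is built.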
+ +CONFIG_SCHED_TOPOLOGY_MINIMAL (default n; mainline behaviour preserved) +wraps the body of topology.c in #ifndef and substitutes a minimal stub +block providing only the symbols the rest of the kernel calls into: + + Storage / mutex: + sched_domains_mutex (DEFINE_MUTEX kept at top) + sched_domains_mutex_lock/unlock (mutex_lock/unlock wrappers) + def_root_domain (the single root_domain instance) + + Root-domain lifecycle: + init_rootdomain (cpumask alloc + cpudl/cpupri init) + init_defrootdomain (called once from sched_init) + rq_attach_root (simplified: never replaces an + existing rd, no old-rd teardown) + sched_get_rd / sched_put_rd (atomic_inc / atomic_dec; no + call_rcu to free_rootdomain since + def_root_domain is never detached) + + Domain construction (no-ops): + sched_init_domains (returns 0) + partition_sched_domains (empty; the static-inline + rebuild_sched_domains fallback in + cpuset.h still resolves to a real + symbol) + +cpupri_init / cpudl_init / cpupri_set / cpupri_find / cpudl_set / +cpudl_find still link from rt.c and the deadline.c stub, so init_rootdomain +is preserved verbatim from mainline (cpumask_var_t is non-OFFSTACK on +CONFIG_SMP=n so zalloc_cpumask_var degenerates to cpumask_clear; the +function stays small). + +NUMA helpers (sched_init_numa, sched_update_numa, +sched_domains_numa_masks_set/clear, sched_numa_find_closest) are already +provided as static-inline no-ops in kernel/sched/sched.h under +CONFIG_NUMA=n, so we drop those entry points along with everything else. +update_sched_domain_debugfs / dirty_sched_domain_sysctl are stubbed in +debug.c when CONFIG_SCHED_DEBUG_OUTPUT=n (patch 0012); we drop the +real bodies along with the surrounding domain-building code. + +Cascade: the topology.c entries for sd_llc / sd_share_id / sd_numa / +sd_asym_packing / sd_asym_cpucapacity / sched_asym_cpucapacity / +sched_cluster_active / asym_cap_list disappear with the body. The +DECLARE_PER_CPU declarations in sched.h still resolve because their only +remaining readers are core.c (cpus_share_cache / cpus_share_resources) +and rt.c migration paths -- all of which read zero-initialised per-CPU +storage and degrade to "single-LLC, share resources, no asym". + +Risk: low for our target. Re-enabling CPU_FREQ[_GOV_SCHEDUTIL], +ENERGY_MODEL, cpusets, or SMP requires reverting this patch (or +rebuilding with =n) because none of the domain-construction logic is +reachable through the stubs. + +Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): topology.c shrinks +from 5,028 to ~500 bytes; linux.axf trims roughly 4-4.5 KB after +page-alignment cascade. + +--- + init/Kconfig | 21 +++++++++ + kernel/sched/topology.c | 91 ++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 112 insertions(+) + +diff --git a/init/Kconfig b/init/Kconfig +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -991,6 +991,27 @@ config SCHED_FAIR_TINY + Say N unless you are building a heavily size-constrained image. + Boot test before deploying. + ++config SCHED_TOPOLOGY_MINIMAL ++ bool "Drop scheduler-domain construction (UP NOMMU only)" ++ default n ++ help ++ Replace kernel/sched/topology.c with a minimal stub block that ++ keeps only def_root_domain, init_rootdomain, init_defrootdomain, ++ rq_attach_root, sched_get_rd, sched_put_rd, the ++ sched_domains_mutex helpers, and empty no-op shells for ++ sched_init_domains and partition_sched_domains. 
++ ++ Drops all sched_domain / sched_group construction, NUMA topology ++ discovery, performance-domain bookkeeping, asymmetric-capacity ++ classification, and sched-domain debug output. Safe on UP NOMMU ++ builds where SMP=n, NUMA=n, ENERGY_MODEL=n, CPU_FREQ=n, ++ CGROUPS=n, and CPUSETS=n. ++ ++ Re-enabling any of those Kconfigs requires reverting this knob ++ because none of the domain-construction logic remains reachable. ++ ++ Say N unless you are building a heavily size-constrained image. ++ + endmenu + + # +diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c +--- a/kernel/sched/topology.c ++++ b/kernel/sched/topology.c +@@ -7,6 +7,8 @@ + #include + #include "sched.h" + ++#ifndef CONFIG_SCHED_TOPOLOGY_MINIMAL ++ + DEFINE_MUTEX(sched_domains_mutex); + void sched_domains_mutex_lock(void) + { +@@ -2945,3 +2947,91 @@ void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[], + partition_sched_domains_locked(ndoms_new, doms_new, dattr_new); + sched_domains_mutex_unlock(); + } ++ ++#else /* CONFIG_SCHED_TOPOLOGY_MINIMAL */ ++ ++/* ++ * Minimal topology stub for UP NOMMU images. No domain construction, ++ * no NUMA, no perf domains, no asym capacity. Only the symbols that ++ * other TUs (core.c, rt.c, deadline.c stub, cpuset.h fallback) link ++ * against survive. See SCHED_TOPOLOGY_MINIMAL help in init/Kconfig ++ * for the gating rationale. ++ */ ++ ++DEFINE_MUTEX(sched_domains_mutex); ++DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); ++ ++void sched_domains_mutex_lock(void) ++{ ++ mutex_lock(&sched_domains_mutex); ++} ++ ++void sched_domains_mutex_unlock(void) ++{ ++ mutex_unlock(&sched_domains_mutex); ++} ++ ++struct root_domain def_root_domain; ++ ++static int init_rootdomain(struct root_domain *rd) ++{ ++ if (!zalloc_cpumask_var(&rd->span, GFP_KERNEL)) ++ goto out; ++ if (!zalloc_cpumask_var(&rd->online, GFP_KERNEL)) ++ goto free_span; ++ if (!zalloc_cpumask_var(&rd->dlo_mask, GFP_KERNEL)) ++ goto free_online; ++ if (!zalloc_cpumask_var(&rd->rto_mask, GFP_KERNEL)) ++ goto free_dlo_mask; ++ ++ rd->visit_cookie = 0; ++ init_dl_bw(&rd->dl_bw); ++ if (cpudl_init(&rd->cpudl) != 0) ++ goto free_rto_mask; ++ if (cpupri_init(&rd->cpupri) != 0) ++ goto free_cpudl; ++ return 0; ++ ++free_cpudl: ++ cpudl_cleanup(&rd->cpudl); ++free_rto_mask: ++ free_cpumask_var(rd->rto_mask); ++free_dlo_mask: ++ free_cpumask_var(rd->dlo_mask); ++free_online: ++ free_cpumask_var(rd->online); ++free_span: ++ free_cpumask_var(rd->span); ++out: ++ return -ENOMEM; ++} ++ ++void __init init_defrootdomain(void) ++{ ++ init_rootdomain(&def_root_domain); ++ atomic_set(&def_root_domain.refcount, 1); ++} ++ ++void rq_attach_root(struct rq *rq, struct root_domain *rd) ++{ ++ atomic_inc(&rd->refcount); ++ rq->rd = rd; ++ cpumask_set_cpu(rq->cpu, rd->span); ++} ++ ++void sched_get_rd(struct root_domain *rd) ++{ ++ atomic_inc(&rd->refcount); ++} ++ ++void sched_put_rd(struct root_domain *rd) ++{ ++ atomic_dec(&rd->refcount); ++} ++ ++int __init sched_init_domains(const struct cpumask *cpu_map) { return 0; } ++ ++void partition_sched_domains(int ndoms_new, cpumask_var_t doms_new[], ++ struct sched_domain_attr *dattr_new) { } ++ ++#endif /* CONFIG_SCHED_TOPOLOGY_MINIMAL */ diff --git a/patches/0016-tiny-sched-no-rich-api.patch b/patches/0016-tiny-sched-no-rich-api.patch new file mode 100644 index 0000000..9603070 --- /dev/null +++ b/patches/0016-tiny-sched-no-rich-api.patch @@ -0,0 +1,197 @@ +From: Jim Huang +Subject: [PATCH] tiny: sched: gate sched_setattr/setscheduler family behind CONFIG_SCHED_NO_RICH_API 
+
+kernel/sched/syscalls.c carries the userspace policy/priority/affinity
+control surface: sched_setparam (154), sched_getparam (155),
+sched_setscheduler (156), sched_getscheduler (157), sched_get_priority_max
+(159), sched_get_priority_min (160), sched_rr_get_interval (161),
+sched_setaffinity (241), sched_getaffinity (242), sched_setattr (380),
+sched_getattr (381). On a NOMMU image without `chrt`, `taskset`, or any
+RT-aware applet (BusyBox here ships none of them, uClibc threads are
+disabled, and the kernel cannot host a third-party userspace), every one
+of those handlers is unreachable from the shipped binaries.
+
+The internal C-level helpers (sched_set_fifo / sched_set_fifo_low /
+sched_set_fifo_secondary / sched_set_normal / sched_setattr_nocheck /
+sched_setscheduler_nocheck / sched_setaffinity / sched_getaffinity /
+__sched_setaffinity / dl_task_check_affinity) still have in-kernel
+callers (RCU kthread priority bumps, kthread_park, compat layer) and
+must remain live. sched_yield (158) and the SYSCALL_DEFINE1(nice)
+wrapper are kept because hush / busybox / loop guards rely on them.
+
+CONFIG_SCHED_NO_RICH_API (default n; mainline behaviour preserved) wraps
+the gated functions in #ifndef. The corresponding sys_ symbols
+become undefined; kernel/sys_ni.c gains COND_SYSCALL entries so the
+syscall-table slot weakly aliases sys_ni_syscall and the syscall
+returns -ENOSYS. Internal helpers, the sched_yield / nice handlers,
+and the do_sched_yield / yield / yield_to internal API remain live.
+
+Gated regions inside kernel/sched/syscalls.c:
+
+ - sched_setscheduler / sched_setattr (non-static C wrappers, only
+   callable through the gated SYSCALL_DEFINEs at runtime; the in-tree
+   direct callers we surveyed -- kernel/locking/rtmutex_api.c and
+   kernel/sched/ext.c -- only mention them in comments)
+
+ - do_sched_setscheduler + sched_copy_attr + get_params static helpers,
+   and SYSCALL_DEFINEs for sched_setscheduler / sched_setparam /
+   sched_setattr / sched_getscheduler / sched_getparam / sched_getattr
+   (the helpers have no other callers; LTO would otherwise have to walk
+   the static-fn graph through dropped SYSCALL_DEFINE bodies)
+
+ - get_user_cpu_mask static helper + SYSCALL_DEFINE3(sched_setaffinity)
+ - SYSCALL_DEFINE3(sched_getaffinity)
+ - SYSCALL_DEFINE1(sched_get_priority_max),
+   SYSCALL_DEFINE1(sched_get_priority_min)
+ - sched_rr_get_interval static helper +
+   SYSCALL_DEFINE2(sched_rr_get_interval) +
+   SYSCALL_DEFINE2(sched_rr_get_interval_time32)
+
+Risk: low for our applet set; any third-party RT/affinity tool breaks
+immediately with -ENOSYS. busybox `nice` (uses SYSCALL_DEFINE1(nice),
+not gated) keeps working; busybox `sleep` (clock_nanosleep) keeps
+working; hush built-in `wait` / `kill` / `jobs` keep working.
+
+Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): syscalls.c shrinks
+from 3,192 to roughly 700 bytes / 41 -> ~10 symbols; linux.axf trims
+2.0-2.5 KB after page-alignment cascade.
+
+---
+ init/Kconfig            | 21 +++++++++++++++++++++
+ kernel/sched/syscalls.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
+ kernel/sys_ni.c         | 13 +++++++++++++
+ 3 files changed, 78 insertions(+)
+
+diff --git a/init/Kconfig b/init/Kconfig
+--- a/init/Kconfig
++++ b/init/Kconfig
+@@ -1012,6 +1012,27 @@ config SCHED_TOPOLOGY_MINIMAL
+
+	  Say N unless you are building a heavily size-constrained image.
+ ++config SCHED_NO_RICH_API ++ bool "Drop sched_setattr/setscheduler/affinity/priority/rr_get_interval syscalls" ++ default n ++ help ++ Stub the userspace policy/priority/affinity scheduling syscalls ++ to -ENOSYS via weak cond_syscall aliases: sched_setparam (154), ++ sched_getparam (155), sched_setscheduler (156), sched_getscheduler ++ (157), sched_get_priority_{max,min} (159/160), sched_rr_get_interval ++ (161), sched_{set,get}affinity (241/242), sched_{set,get}attr ++ (380/381). sched_yield (158) and nice (34) remain available; ++ in-kernel C helpers (sched_set_fifo / sched_setscheduler_nocheck / ++ sched_setattr_nocheck / sched_setaffinity / sched_getaffinity / ++ __sched_setaffinity / dl_task_check_affinity) stay live for RCU ++ / kthread / compat callers. ++ ++ Safe on NOMMU images that ship no `chrt`, no `taskset`, and no ++ RT-aware applet. Any third-party tool that programs scheduling ++ policy or affinity from userspace will break with -ENOSYS. ++ ++ Say N unless you are building a heavily size-constrained image. ++ + endmenu + + # +diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c +--- a/kernel/sched/syscalls.c ++++ b/kernel/sched/syscalls.c +@@ -755,6 +755,7 @@ int sched_setattr_nocheck_user(struct task_struct *p, const struct sched_attr *u + * + * NOTE that the task may be already dead. + */ ++#ifndef CONFIG_SCHED_NO_RICH_API + int sched_setscheduler(struct task_struct *p, int policy, + const struct sched_param *param) + { +@@ -765,6 +766,7 @@ int sched_setattr(struct task_struct *p, const struct sched_attr *attr) + { + return __sched_setscheduler(p, attr, true, true); + } ++#endif /* CONFIG_SCHED_NO_RICH_API */ + + int sched_setattr_nocheck(struct task_struct *p, const struct sched_attr *attr) + { +@@ -849,6 +851,7 @@ void sched_set_normal(struct task_struct *p, int nice) + } + EXPORT_SYMBOL_GPL(sched_set_normal); + ++#ifndef CONFIG_SCHED_NO_RICH_API + static int + do_sched_setscheduler(pid_t pid, int policy, struct sched_param __user *param) + { +@@ -1097,6 +1100,7 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr, + kattr.size = min(usize, sizeof(kattr)); + return copy_struct_to_user(uattr, usize, &kattr, sizeof(kattr), NULL); + } ++#endif /* CONFIG_SCHED_NO_RICH_API */ + + int dl_task_check_affinity(struct task_struct *p, const struct cpumask *mask) + { +@@ -1234,6 +1238,7 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask) + return retval; + } + ++#ifndef CONFIG_SCHED_NO_RICH_API + static int get_user_cpu_mask(unsigned long __user *user_mask_ptr, unsigned len, + struct cpumask *new_mask) + { +@@ -1268,6 +1273,7 @@ SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len, + free_cpumask_var(new_mask); + return retval; + } ++#endif /* CONFIG_SCHED_NO_RICH_API */ + + long sched_getaffinity(pid_t pid, struct cpumask *mask) + { +@@ -1289,6 +1295,7 @@ long sched_getaffinity(pid_t pid, struct cpumask *mask) + return 0; + } + ++#ifndef CONFIG_SCHED_NO_RICH_API + /** + * sys_sched_getaffinity - get the CPU affinity of a process + * @pid: pid of the process +@@ -1325,6 +1332,7 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len, + + return ret; + } ++#endif /* CONFIG_SCHED_NO_RICH_API */ + + static void do_sched_yield(void) + { +@@ -1452,6 +1460,7 @@ int __sched yield_to(struct task_struct *p, bool preempt) + } + EXPORT_SYMBOL_GPL(yield_to); + ++#ifndef CONFIG_SCHED_NO_RICH_API + /** + * sys_sched_get_priority_max - return maximum RT priority. + * @policy: scheduling class. 
+@@ -1570,3 +1579,4 @@ SYSCALL_DEFINE2(sched_rr_get_interval_time32, pid_t, pid, + return retval; + } + #endif ++#endif /* CONFIG_SCHED_NO_RICH_API */ +diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c +--- a/kernel/sys_ni.c ++++ b/kernel/sys_ni.c +@@ -333,6 +333,19 @@ COND_SYSCALL(stime32); + COND_SYSCALL(utime32); + COND_SYSCALL(adjtimex_time32); + COND_SYSCALL(sched_rr_get_interval_time32); ++ ++/* sched: rich policy/priority/affinity API gated by CONFIG_SCHED_NO_RICH_API */ ++COND_SYSCALL(sched_setparam); ++COND_SYSCALL(sched_getparam); ++COND_SYSCALL(sched_setscheduler); ++COND_SYSCALL(sched_getscheduler); ++COND_SYSCALL(sched_setattr); ++COND_SYSCALL(sched_getattr); ++COND_SYSCALL(sched_setaffinity); ++COND_SYSCALL(sched_getaffinity); ++COND_SYSCALL(sched_get_priority_max); ++COND_SYSCALL(sched_get_priority_min); ++COND_SYSCALL(sched_rr_get_interval); + COND_SYSCALL(nanosleep_time32); + COND_SYSCALL(rt_sigtimedwait_time32); + COND_SYSCALL_COMPAT(rt_sigtimedwait_time32); diff --git a/patches/0017-tiny-time-no-cpu-timers.patch b/patches/0017-tiny-time-no-cpu-timers.patch new file mode 100644 index 0000000..e526a14 --- /dev/null +++ b/patches/0017-tiny-time-no-cpu-timers.patch @@ -0,0 +1,181 @@ +From: Jim Huang +Subject: [PATCH] tiny: time: gate posix-cpu-timers.c behind CONFIG_POSIX_CPU_TIMERS + +Linux 7.0 has no standalone CONFIG_POSIX_CPU_TIMERS knob (only the +helper CONFIG_POSIX_CPU_TIMERS_TASK_WORK exists); kernel/time/posix-cpu-timers.c +is unconditionally compiled when CONFIG_POSIX_TIMERS=y. On a NOMMU +image with no userspace consumer of CLOCK_PROCESS_CPUTIME_ID, +CLOCK_THREAD_CPUTIME_ID, RLIMIT_CPU, or ITIMER_PROF/VIRTUAL, the file +is ~3.7 KB / 42 symbols of dead .text. BusyBox `date` reads +CLOCK_REALTIME (clock_realtime k_clock); BusyBox `sleep` calls +clock_nanosleep on CLOCK_MONOTONIC; neither touches CPU-clock IDs. + +This patch introduces a new CONFIG_POSIX_CPU_TIMERS Kconfig (default y +to preserve mainline behaviour). Setting it to n wraps the body of +posix-cpu-timers.c in #ifdef and provides no-op stubs for the +externally-called helpers plus stub k_clock dispatch tables for +CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID, and the dynamic +clock_posix_cpu range. + +External-API stubs (matching signatures in include/linux/posix-timers.h): + + Task / process lifecycle (called from kernel/fork.c, kernel/exit.c): + posix_cputimers_group_init -- empty + posix_cpu_timers_exit -- empty + posix_cpu_timers_exit_group -- empty + + Task-work helpers (clear_posix_cputimers_work, posix_cputimers_init_work) + are already inline-stubbed in include/linux/posix-timers.h via + CONFIG_POSIX_CPU_TIMERS_TASK_WORK=n; we do not redefine them. 
+ + Tick / sysctl / itimer (called from kernel/time/timer.c, itimer.c, sys.c): + run_posix_cpu_timers -- empty (no expiry work) + update_rlimit_cpu -- returns 0 (RLIMIT_CPU silently accepted) + thread_group_sample_cputime -- zeros samples[] + set_process_cpu_timer -- zeros oldval if non-NULL + + k_clock dispatchers (referenced by kernel/time/posix-timers.c clock_id table): + clock_process -- {.clock_get_timespec, .clock_getres, + .timer_create, .nsleep} all return -EINVAL + clock_thread -- same shape + clock_posix_cpu -- same shape, used by dynamic clockid + + Effect on syscalls: clock_gettime / clock_getres / clock_nanosleep on + CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID return -EINVAL; + timer_create with those clock IDs returns -EINVAL; setrlimit(RLIMIT_CPU) + silently accepts the limit but the kernel never charges or signals; + ITIMER_PROF and ITIMER_VIRTUAL likewise become silent no-ops. No + syscall numbers vanish, so the userspace ABI surface is preserved + for tools that probe-and-skip. + +BusyBox here issues none of these calls; uClibc has realtime support +enabled but threads disabled, so CLOCK_THREAD_CPUTIME_ID is dead for +this image regardless of the gate. + +Risk: medium. Third-party profiling / CPU-budget tooling breaks. +Benign for the shipped applet set. + +Companion (deferred): CONFIG_TICK_DEP_MINIMAL would later trim the +tick-dependency machinery in kernel/time/tick-sched.c that backs +CPU-clock-driven scheduling (~0.5-0.8 KB extra). + +Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): posix-cpu-timers.c +shrinks from 3,722 to ~200 bytes; linux.axf trims roughly 3.0-3.5 KB +after page-alignment cascade. + +--- + init/Kconfig | 17 ++++++++ + kernel/time/posix-cpu-timers.c | 76 ++++++++++++++++++++++++++++++++++ + 2 files changed, 93 insertions(+) + +diff --git a/init/Kconfig b/init/Kconfig +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -1033,6 +1033,23 @@ config SCHED_NO_RICH_API + + Say N unless you are building a heavily size-constrained image. + ++config POSIX_CPU_TIMERS ++ bool "POSIX per-process / per-thread CPU clocks" ++ depends on POSIX_TIMERS ++ default y ++ help ++ Build kernel/time/posix-cpu-timers.c, which implements ++ CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID for ++ clock_gettime / clock_getres / clock_nanosleep / timer_create, ++ RLIMIT_CPU enforcement, and ITIMER_PROF / ITIMER_VIRTUAL ++ expiry. ++ ++ Say N on size-constrained NOMMU images that never query CPU ++ clocks, never set RLIMIT_CPU, and never arm ITIMER_PROF / ++ ITIMER_VIRTUAL. Affected calls return -EINVAL; setrlimit and ++ setitimer become silent no-ops. No syscall numbers vanish. ++ Boot test before deploying. ++ + endmenu + + # +diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c +--- a/kernel/time/posix-cpu-timers.c ++++ b/kernel/time/posix-cpu-timers.c +@@ -19,6 +19,8 @@ + + #include "posix-timers.h" + ++#ifdef CONFIG_POSIX_CPU_TIMERS ++ + static void posix_cpu_timer_rearm(struct k_itimer *timer); + + void posix_cputimers_group_init(struct posix_cputimers *pct, u64 cpu_limit) +@@ -1668,3 +1670,71 @@ const struct k_clock clock_thread = { + .clock_get_timespec = thread_cpu_clock_get, + .timer_create = thread_cpu_timer_create, + }; ++ ++#else /* !CONFIG_POSIX_CPU_TIMERS */ ++ ++/* ++ * Minimal stub for !CONFIG_POSIX_CPU_TIMERS. 
CPU-clock IDs and ++ * RLIMIT_CPU / ITIMER_{PROF,VIRTUAL} expiry become silent no-ops; ++ * clock_gettime / clock_getres / clock_nanosleep / timer_create on ++ * CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID return -EINVAL ++ * via the k_clock stub-method tables registered below. See the ++ * POSIX_CPU_TIMERS help in init/Kconfig for the gating rationale. ++ */ ++ ++void posix_cputimers_group_init(struct posix_cputimers *pct, u64 cpu_limit) { } ++void posix_cpu_timers_exit(struct task_struct *tsk) { } ++void posix_cpu_timers_exit_group(struct task_struct *tsk) { } ++void run_posix_cpu_timers(void) { } ++ ++int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new) ++{ ++ return 0; ++} ++ ++void thread_group_sample_cputime(struct task_struct *tsk, u64 *samples) ++{ ++ samples[0] = samples[1] = samples[2] = 0; ++} ++ ++void set_process_cpu_timer(struct task_struct *tsk, unsigned int clkid, ++ u64 *newval, u64 *oldval) ++{ ++ if (oldval) ++ *oldval = 0; ++} ++ ++static int posix_cpu_stub_getres(const clockid_t which_clock, ++ struct timespec64 *tp) ++{ ++ return -EINVAL; ++} ++ ++static int posix_cpu_stub_get(const clockid_t which_clock, ++ struct timespec64 *tp) ++{ ++ return -EINVAL; ++} ++ ++static int posix_cpu_stub_create(struct k_itimer *new_timer) ++{ ++ return -EINVAL; ++} ++ ++static int posix_cpu_stub_nsleep(const clockid_t which_clock, int flags, ++ const struct timespec64 *rqtp) ++{ ++ return -EINVAL; ++} ++ ++#define POSIX_CPU_STUB_KCLOCK \ ++ .clock_getres = posix_cpu_stub_getres, \ ++ .clock_get_timespec = posix_cpu_stub_get, \ ++ .timer_create = posix_cpu_stub_create, \ ++ .nsleep = posix_cpu_stub_nsleep ++ ++const struct k_clock clock_posix_cpu = { POSIX_CPU_STUB_KCLOCK }; ++const struct k_clock clock_process = { POSIX_CPU_STUB_KCLOCK }; ++const struct k_clock clock_thread = { POSIX_CPU_STUB_KCLOCK }; ++ ++#endif /* CONFIG_POSIX_CPU_TIMERS */ diff --git a/patches/0018-tiny-sched-pelt-rt-mini.patch b/patches/0018-tiny-sched-pelt-rt-mini.patch new file mode 100644 index 0000000..d7292e1 --- /dev/null +++ b/patches/0018-tiny-sched-pelt-rt-mini.patch @@ -0,0 +1,146 @@ +From: Jim Huang +Subject: [PATCH] tiny: sched: thin pelt.c to its rt.c contract under CONFIG_SCHED_PELT_RT_MINI + +Patch 0014 (CONFIG_SCHED_FAIR_TINY) replaced fair.c's CFS body with a +3-priority O(1) FIFO that does not call PELT, and patch 0013 +(CONFIG_SCHED_DEADLINE_CLASS=n) reduced deadline.c to a stub class +whose only live callback is pick_task. The CFS-side and DL-side PELT +entry points (__update_load_avg_blocked_se, __update_load_avg_se, +__update_load_avg_cfs_rq, update_dl_rq_load_avg) consequently lose +all in-tree callers, but kernel/sched/pelt.c kept them as non-static +externs and LTO cannot dead-strip across that linkage boundary -- the +file stayed at its full 1,758-byte / 8-symbol footprint. + +CONFIG_SCHED_PELT_RT_MINI (default n; mainline behaviour preserved) +trims pelt.c to the rt.c contract. 
When set: + + Kept: + runnable_avg_yN_inv data table (~128 B; walked by decay_load) + decay_load (~84 B) + __accumulate_pelt_segments (~48 B) + accumulate_sum (helper for ___update_load_sum) + ___update_load_sum (helper for update_rt_rq_load_avg) + ___update_load_avg (helper for update_rt_rq_load_avg) + update_rt_rq_load_avg (~302 B; 4 callsites in rt.c) + + Stubbed (return 0): + __update_load_avg_blocked_se (no fair-class consumer post-0014) + __update_load_avg_se (no fair-class consumer post-0014) + __update_load_avg_cfs_rq (no fair-class consumer post-0014) + update_dl_rq_load_avg (DL class stubbed by 0013) + + Replaced with thin version: + update_other_load_avgs (now: only update_rt_rq_load_avg; + DL/HW/IRQ paths drop) + +update_hw_load_avg (CONFIG_SCHED_HW_PRESSURE, unselected) and +update_irq_load_avg (CONFIG_HAVE_SCHED_AVG_IRQ via IRQ_TIME_ACCOUNTING, +unselected) are already inline-stubbed in kernel/sched/pelt.h on this +target; we do not touch them here. core.c::sched_tick still calls +update_hw_load_avg unconditionally; it resolves to the inline no-op. + +Risk: low. Future EAS / schedutil / fair-class re-enablement requires +reverting this knob. + +Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): pelt.c shrinks from +1,758 to ~700 bytes; linux.axf trims roughly 1.0-1.4 KB depending on +how LTO cascades into core.c sched_tick after the stubs flatten. + +--- + init/Kconfig | 22 ++++++++++++++++++++++ + kernel/sched/pelt.c | 18 ++++++++++++++++++ + 2 files changed, 40 insertions(+) + +diff --git a/init/Kconfig b/init/Kconfig +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -1050,6 +1050,28 @@ config POSIX_CPU_TIMERS + setitimer become silent no-ops. No syscall numbers vanish. + Boot test before deploying. + ++config SCHED_PELT_RT_MINI ++ bool "Trim PELT to its rt.c contract" ++ default n ++ help ++ Drop the CFS-side and DL-side PELT entry points ++ (__update_load_avg_blocked_se / __update_load_avg_se / ++ __update_load_avg_cfs_rq / update_dl_rq_load_avg) from ++ kernel/sched/pelt.c. Replace their bodies with -> 0 stubs. ++ Keep update_rt_rq_load_avg and its helpers (decay_load, ++ __accumulate_pelt_segments, accumulate_sum, ___update_load_sum, ++ ___update_load_avg, runnable_avg_yN_inv) live for rt.c's ++ 4 callsites. Replace update_other_load_avgs with a thin version ++ that only calls update_rt_rq_load_avg. ++ ++ Safe when SCHED_FAIR_TINY=y (fair.c does not call PELT) and ++ SCHED_DEADLINE_CLASS=n (deadline.c stubbed and does not call ++ PELT). Re-enabling either requires reverting this knob because ++ the dropped entry points are stubbed to no-ops, not removed ++ from the link. ++ ++ Say N unless you are building a heavily size-constrained image. ++ + endmenu + + # +diff --git a/kernel/sched/pelt.c b/kernel/sched/pelt.c +--- a/kernel/sched/pelt.c ++++ b/kernel/sched/pelt.c +@@ -293,6 +293,7 @@ + * load_avg = \Sum se->avg.load_avg + */ + ++#ifndef CONFIG_SCHED_PELT_RT_MINI + int __update_load_avg_blocked_se(u64 now, struct sched_entity *se) + { + if (___update_load_sum(now, &se->avg, 0, 0, 0)) { +@@ -331,6 +332,13 @@ int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq) + + return 0; + } ++#else /* CONFIG_SCHED_PELT_RT_MINI */ ++/* Fair-class PELT is dead under SCHED_FAIR_TINY=y; stub the externs. 
*/
++int __update_load_avg_blocked_se(u64 now, struct sched_entity *se) { return 0; }
++int __update_load_avg_se(u64 now, struct cfs_rq *cfs_rq,
++			 struct sched_entity *se) { return 0; }
++int __update_load_avg_cfs_rq(u64 now, struct cfs_rq *cfs_rq) { return 0; }
++#endif /* CONFIG_SCHED_PELT_RT_MINI */
+
+ /*
+ * rt_rq:
+@@ -370,6 +378,7 @@ int update_rt_rq_load_avg(u64 now, struct rq *rq, int running)
+ *
+ */
+
++#ifndef CONFIG_SCHED_PELT_RT_MINI
+ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+ {
+	if (___update_load_sum(now, &rq->avg_dl,
+@@ -384,6 +393,10 @@ int update_dl_rq_load_avg(u64 now, struct rq *rq, int running)
+
+	return 0;
+ }
++#else /* CONFIG_SCHED_PELT_RT_MINI */
++/* DL class stubbed by SCHED_DEADLINE_CLASS=n; no live caller. */
++int update_dl_rq_load_avg(u64 now, struct rq *rq, int running) { return 0; }
++#endif /* CONFIG_SCHED_PELT_RT_MINI */
+
+ #ifdef CONFIG_SCHED_HW_PRESSURE
+ /*
+@@ -476,6 +489,7 @@ int update_irq_load_avg(struct rq *rq, u64 running)
+ */
+ bool update_other_load_avgs(struct rq *rq)
+ {
++#ifndef CONFIG_SCHED_PELT_RT_MINI
+	u64 now = rq_clock_pelt(rq);
+	const struct sched_class *curr_class = rq->donor->sched_class;
+	unsigned long hw_pressure = arch_scale_hw_pressure(cpu_of(rq));
+@@ -487,4 +501,8 @@ bool update_other_load_avgs(struct rq *rq)
+		update_dl_rq_load_avg(now, rq, curr_class == &dl_sched_class) |
+		update_hw_load_avg(rq_clock_task(rq), rq, hw_pressure) |
+		update_irq_load_avg(rq, 0);
++#else /* CONFIG_SCHED_PELT_RT_MINI: only RT class survives */
++	lockdep_assert_rq_held(rq);
++	return update_rt_rq_load_avg(rq_clock_pelt(rq), rq, true);
++#endif
+ }
diff --git a/patches/0019-tiny-sched-rt-tiny.patch b/patches/0019-tiny-sched-rt-tiny.patch
new file mode 100644
index 0000000..080d595
--- /dev/null
+++ b/patches/0019-tiny-sched-rt-tiny.patch
@@ -0,0 +1,363 @@
+From: Jim Huang
+Subject: [PATCH] tiny: sched: replace rt.c with a fixed-priority FIFO under CONFIG_SCHED_RT_TINY
+
+Mainline kernel/sched/rt.c carries the full RT-class machinery, none of
+which our shipped configuration exercises:
+
+ - SCHED_RR round-robin slice accounting (rr_nr_running, time_slice
+   decrement in task_tick_rt, requeue at the queue tail on slice
+   expiry)
+ - RT bandwidth period/runtime limiter (sched_rt_period_timer hrtimer,
+   do_balance_runtime, __disable_runtime, __enable_runtime,
+   do_sched_rt_period_timer, balance_runtime, sched_rt_runtime_exceeded)
+ - RT throttle and rt_rq_throttled() machinery
+ - Push/pull migration glue (push_rt_task, pull_rt_task, find_lock_rq,
+   has_pushable_tasks, rt_set_overload, rto_push_irq_work_func, the
+   HAVE_RT_PUSH_IPI infrastructure, rt_queue_push_tasks /
+   rt_queue_pull_task balance callbacks)
+ - cpupri-driven find_lowest_rq target selection
+ - sched_rt_handler / sched_rr_handler sysctl writers
+
+UP NOMMU has one CPU, so the entire pull/push surface (~424 B
+pull_rt_task plus IPI plumbing) is dead. CONFIG_RT_GROUP_SCHED=n
+unselects RT bandwidth and the period timer. BusyBox ships no `chrt`
+applet, and patch 0016 (CONFIG_SCHED_NO_RICH_API) gates
+sched_setscheduler from userspace, so SCHED_RR is unreachable from the
+shipped applet set. RR vs FIFO has no userspace consumer.
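+
+For reference, the queue structure the replacement class reuses is the
+mainline rt_prio_array (sketched from memory of kernel/sched/sched.h;
+treat the exact layout as approximate):
+
+	struct rt_prio_array {
+		/* include 1 bit for delimiter */
+		DECLARE_BITMAP(bitmap, MAX_RT_PRIO + 1);
+		struct list_head queue[MAX_RT_PRIO];
+	};
+
+The delimiter bit at index MAX_RT_PRIO stays permanently set so a bit
+search over an otherwise-empty bitmap terminates at a known sentinel.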
+ +CONFIG_SCHED_RT_TINY (default n; mainline behaviour preserved) wraps +the rt.c body in #ifndef and substitutes a compact fixed-priority FIFO +class: + + Storage: the existing rt_rq->active rt_prio_array (MAX_RT_PRIO+1 + bitmap + per-priority list_heads) is retained verbatim; tasks chain + through &p->rt.run_list as before. + + Pick: find_first_bit(bitmap, MAX_RT_PRIO) + list_first_entry O(1) + Enqueue: list_add_tail + __set_bit O(1) + Dequeue: list_del_init + __clear_bit when the queue empties O(1) + + No RR rotation -- task_tick_rt only updates exec_start. No throttle + -- update_curr_rt only accounts run-time. No bandwidth -- the rt + bandwidth init / period timer plumbing is dropped. No push/pull -- + UP makes the entire migration class trivial. Cross-priority preemption + works the same as mainline: a wakeup at higher priority calls + resched_curr. RT > fair preemption still goes through the existing + class-chain walk in pick_next_task / pick_next_task_balance. + + PELT cascade: the tiny class also drops the four mainline call sites + to update_rt_rq_load_avg (in update_curr_rt, task_tick_rt, + switched_to_rt, switched_from_rt). Combined with patch 0018 + (CONFIG_SCHED_PELT_RT_MINI), update_rt_rq_load_avg loses every + in-tree caller. rq->avg_rt has no consumer in this build (no + CPU_FREQ_GOV_SCHEDUTIL, no ENERGY_MODEL, no IRQ_TIME_ACCOUNTING), + so the load tracking is pure overhead and dropping it is the design + intent. LTO will then dead-strip the remaining decay_load / + __accumulate_pelt_segments / accumulate_sum / ___update_load_sum / + ___update_load_avg helpers that 0018 explicitly kept. + + SCHED_RR collapses into SCHED_FIFO. Any task admitted with policy=2 + via sched_setscheduler_nocheck (RCU kthread priority bump path; the + userspace SYSCALL_DEFINE3(sched_setscheduler) is gated to -ENOSYS by + patch 0016) lands at its sched_priority and runs FIFO until it + blocks or yields. + +External-API stubs preserved for core.c / sched.h: + + rt_sched_class the class struct + init_rt_rq rt_prio_array init + sentinel + init_rt_bandwidth empty (CONFIG_RT_GROUP_SCHED=n) + init_sched_rt_class empty + sched_rr_timeslice storage; default RR_TIMESLICE + sysctl_sched_rt_period storage; default 1000000 + sysctl_sched_rt_runtime storage; default 950000 + unregister_rt_sched_group / free_rt_sched_group / alloc_rt_sched_group + only reachable under + CGROUP_SCHED=y; not provided + +Risk: medium. No bandwidth containment for runaway RT tasks. Only +matters if a future workload programs SCHED_FIFO priorities; the +shipped applet set does not. + +Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): rt.c shrinks from +3,400 to roughly 1.5 KB; linux.axf trims 1.5-2.0 KB after +page-alignment cascade. + +--- + init/Kconfig | 29 ++++++++++++++++ + kernel/sched/rt.c | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++ + 2 files changed, 238 insertions(+) + +diff --git a/init/Kconfig b/init/Kconfig +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -1072,6 +1072,35 @@ config SCHED_PELT_RT_MINI + + Say N unless you are building a heavily size-constrained image. + ++config SCHED_RT_TINY ++ bool "Compact fixed-priority FIFO replacement for rt_sched_class" ++ default n ++ help ++ Replace kernel/sched/rt.c with a minimal fixed-priority FIFO ++ class. 
Drops: ++ ++ - SCHED_RR round-robin slice accounting (RR collapses to FIFO) ++ - RT bandwidth period/runtime limiter and the period hrtimer ++ - RT throttle / rt_rq_throttled ++ - Push/pull migration glue (UP makes the entire surface dead) ++ - cpupri-driven find_lowest_rq target selection ++ - sched_rt_handler / sched_rr_handler sysctl writers ++ ++ Keeps the priority-bitmap + per-priority FIFO list data ++ structure unchanged (rt_rq->active.bitmap + ++ rt_rq->active.queue[]); enqueue / dequeue / pick are O(1). ++ RT > fair preemption still works through the existing ++ class-chain walk; cross-RT-priority preemption uses ++ resched_curr at wakeup. ++ ++ Designed for size-constrained NOMMU images that ship no `chrt` ++ applet and where SCHED_RR vs SCHED_FIFO has no userspace ++ consumer. Re-enabling RT bandwidth, RT_GROUP_SCHED, SMP, or ++ any RT-aware workload requires reverting this knob. ++ ++ Say N unless you are building a heavily size-constrained image. ++ Boot test before deploying. ++ + endmenu + + # +diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c +--- a/kernel/sched/rt.c ++++ b/kernel/sched/rt.c +@@ -6,6 +6,8 @@ + + #include "sched.h" + #include "pelt.h" ++ ++#ifndef CONFIG_SCHED_RT_TINY + + int sched_rr_timeslice = RR_TIMESLICE; + /* More than 4 hours if BW_SHIFT equals 20. */ +@@ -2949,3 +2951,221 @@ void print_rt_stats(struct seq_file *m, int cpu) + print_rt_rq(m, cpu, rt_rq); + rcu_read_unlock(); + } ++ ++#else /* CONFIG_SCHED_RT_TINY */ ++ ++#include ++#include ++#include ++#include ++#include ++ ++/* ++ * Tiny fixed-priority FIFO RT class. Three operations are O(1): ++ * ++ * pick -> find_first_bit(bitmap, MAX_RT_PRIO) + list_first_entry ++ * enqueue -> list_add_tail + __set_bit ++ * dequeue -> list_del_init + __clear_bit if queue empties ++ * ++ * No RR slice rotation, no throttle, no bandwidth, no push/pull, no ++ * cpupri. Cross-priority preemption fires at wakeup via resched_curr; ++ * RT > fair preemption stays handled by the class-chain walk in ++ * pick_next_task_balance. See SCHED_RT_TINY help in init/Kconfig ++ * for the full rationale. ++ */ ++ ++/* Sysctl tunables that core.c / build_utility.c reference at init. 
*/ ++int sched_rr_timeslice = RR_TIMESLICE; ++int sysctl_sched_rt_period = 1000000; ++int sysctl_sched_rt_runtime = 950000; ++ ++void init_rt_rq(struct rt_rq *rt_rq) ++{ ++ struct rt_prio_array *array = &rt_rq->active; ++ int i; ++ ++ for (i = 0; i < MAX_RT_PRIO; i++) { ++ INIT_LIST_HEAD(array->queue + i); ++ __clear_bit(i, array->bitmap); ++ } ++ /* Delimiter for bitsearch: */ ++ __set_bit(MAX_RT_PRIO, array->bitmap); ++ ++ rt_rq->rt_nr_running = 0; ++ rt_rq->rr_nr_running = 0; ++ rt_rq->highest_prio.curr = MAX_RT_PRIO - 1; ++ rt_rq->highest_prio.next = MAX_RT_PRIO - 1; ++ rt_rq->overloaded = false; ++ plist_head_init(&rt_rq->pushable_tasks); ++ rt_rq->rt_queued = 0; ++} ++ ++void init_rt_bandwidth(struct rt_bandwidth *rt_b, u64 period, u64 runtime) { } ++void __init init_sched_rt_class(void) { } ++ ++/* sched_class methods --------------------------------------------------- */ ++ ++static void enqueue_task_rt(struct rq *rq, struct task_struct *p, int flags) ++{ ++ struct rt_rq *rt_rq = &rq->rt; ++ struct sched_rt_entity *rt_se = &p->rt; ++ int prio = p->prio; ++ ++ if (rt_se->on_rq) ++ return; ++ ++ list_add_tail(&rt_se->run_list, &rt_rq->active.queue[prio]); ++ __set_bit(prio, rt_rq->active.bitmap); ++ rt_rq->rt_nr_running++; ++ if (prio < rt_rq->highest_prio.curr) ++ rt_rq->highest_prio.curr = prio; ++ rt_rq->rt_queued = 1; ++ add_nr_running(rq, 1); ++ rt_se->on_rq = 1; ++} ++ ++static bool dequeue_task_rt(struct rq *rq, struct task_struct *p, int flags) ++{ ++ struct rt_rq *rt_rq = &rq->rt; ++ struct sched_rt_entity *rt_se = &p->rt; ++ int prio = p->prio; ++ int next_prio; ++ ++ if (!rt_se->on_rq) ++ return false; ++ ++ update_curr_common(rq); ++ ++ list_del_init(&rt_se->run_list); ++ if (list_empty(&rt_rq->active.queue[prio])) ++ __clear_bit(prio, rt_rq->active.bitmap); ++ ++ rt_rq->rt_nr_running--; ++ rt_se->on_rq = 0; ++ sub_nr_running(rq, 1); ++ ++ next_prio = find_first_bit(rt_rq->active.bitmap, MAX_RT_PRIO); ++ rt_rq->highest_prio.curr = (next_prio < MAX_RT_PRIO) ? 
next_prio ++ : MAX_RT_PRIO - 1; ++ if (!rt_rq->rt_nr_running) ++ rt_rq->rt_queued = 0; ++ return true; ++} ++ ++static void yield_task_rt(struct rq *rq) ++{ ++ struct rt_rq *rt_rq = &rq->rt; ++ struct sched_rt_entity *rt_se = &rq->donor->rt; ++ int prio = rq->donor->prio; ++ ++ if (rt_se->on_rq && ++ rt_rq->active.queue[prio].prev != &rt_se->run_list) { ++ list_del(&rt_se->run_list); ++ list_add_tail(&rt_se->run_list, &rt_rq->active.queue[prio]); ++ } ++} ++ ++static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags) ++{ ++ if (p->prio < rq->donor->prio) ++ resched_curr(rq); ++} ++ ++static struct task_struct *pick_task_rt(struct rq *rq, struct rq_flags *rf) ++{ ++ struct rt_rq *rt_rq = &rq->rt; ++ struct sched_rt_entity *rt_se; ++ int prio; ++ ++ if (!rt_rq->rt_nr_running) ++ return NULL; ++ ++ prio = find_first_bit(rt_rq->active.bitmap, MAX_RT_PRIO); ++ if (prio >= MAX_RT_PRIO) ++ return NULL; ++ ++ rt_se = list_first_entry(&rt_rq->active.queue[prio], ++ struct sched_rt_entity, run_list); ++ return container_of(rt_se, struct task_struct, rt); ++} ++ ++static void put_prev_task_rt(struct rq *rq, struct task_struct *p, ++ struct task_struct *next) ++{ ++ update_curr_common(rq); ++} ++ ++static void set_next_task_rt(struct rq *rq, struct task_struct *p, bool first) ++{ ++ p->se.exec_start = rq_clock_task(rq); ++ p->rt.time_slice = sched_rr_timeslice; ++} ++ ++static int select_task_rq_rt(struct task_struct *p, int task_cpu, int flags) ++{ ++ return task_cpu; /* UP: only one CPU exists */ ++} ++ ++static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued) ++{ ++ update_curr_common(rq); ++ /* No RR rotation; pure FIFO. */ ++} ++ ++static unsigned int get_rr_interval_rt(struct rq *rq, struct task_struct *task) ++{ ++ /* Time slice is 0 for SCHED_FIFO tasks (mainline contract). */ ++ return task->policy == SCHED_RR ? 
sched_rr_timeslice : 0; ++} ++ ++static void update_curr_rt(struct rq *rq) ++{ ++ update_curr_common(rq); ++} ++ ++static void prio_changed_rt(struct rq *rq, struct task_struct *p, u64 oldprio) ++{ ++ if (!task_on_rq_queued(p)) ++ return; ++ if (p == rq->donor) { ++ if (rq->rt.highest_prio.curr < p->prio) ++ resched_curr(rq); ++ } else if (p->prio < rq->donor->prio) { ++ resched_curr(rq); ++ } ++} ++ ++static void switched_to_rt(struct rq *rq, struct task_struct *p) ++{ ++ if (task_on_rq_queued(p) && p->prio < rq->donor->prio) ++ resched_curr(rq); ++} ++ ++static void switched_from_rt(struct rq *rq, struct task_struct *p) { } ++ ++DEFINE_SCHED_CLASS(rt) = { ++ .enqueue_task = enqueue_task_rt, ++ .dequeue_task = dequeue_task_rt, ++ .yield_task = yield_task_rt, ++ ++ .wakeup_preempt = wakeup_preempt_rt, ++ ++ .pick_task = pick_task_rt, ++ .put_prev_task = put_prev_task_rt, ++ .set_next_task = set_next_task_rt, ++ ++ .select_task_rq = select_task_rq_rt, ++ .set_cpus_allowed = set_cpus_allowed_common, ++ ++ .task_tick = task_tick_rt, ++ ++ .get_rr_interval = get_rr_interval_rt, ++ ++ .switched_to = switched_to_rt, ++ .switched_from = switched_from_rt, ++ .prio_changed = prio_changed_rt, ++ ++ .update_curr = update_curr_rt, ++}; ++ ++#endif /* CONFIG_SCHED_RT_TINY */ diff --git a/patches/0020-tiny-time-no-set-wallclock.patch b/patches/0020-tiny-time-no-set-wallclock.patch new file mode 100644 index 0000000..29bdadf --- /dev/null +++ b/patches/0020-tiny-time-no-set-wallclock.patch @@ -0,0 +1,144 @@ +From: Jim Huang +Subject: [PATCH] tiny: time: gate wallclock-setting paths behind CONFIG_TIME_NO_SET_WALLCLOCK + +kernel/time/timekeeping.c implements both the read side (clock_gettime +on CLOCK_REALTIME / CLOCK_MONOTONIC / CLOCK_BOOTTIME / CLOCK_TAI; +ktime_get*; getboottime; do_timer) and the discipline / setting side +(NTP frequency / phase adjustment, leap-second / TAI maintenance, the +do_settimeofday64 / settimeofday / adjtimex / clock_settime entry +points). On a NOMMU MPS2-AN386 image with no NTP source, no RTC, and +QEMU providing a stable epoch at boot, the entire setting half is +unreachable from the shipped applet set. + +CONFIG_TIME_NO_SET_WALLCLOCK (default n; mainline behaviour preserved) +gates the three top-level setting entry points and turns +timekeeping_warp_clock into a no-op: + + do_settimeofday64 -> -EPERM + do_adjtimex -> -EPERM + timekeeping_warp_clock -> empty + +timekeeping_inject_sleeptime64 (RTC suspend path) is left untouched +since CONFIG_RTC_CLASS=n on this target makes its only callers +(rtc_resume / rtc_suspend) unreachable; LTO drops the path +naturally. + +Effective userspace impact: + settimeofday(2) -> -EPERM via kernel/time/time.c + clock_settime(2) -> -EPERM via kernel/time/posix-timers.c + adjtimex(2) -> -EPERM via kernel/time/time.c + stime(2) compat -> -EPERM + clock_gettime(2) -> unchanged: read paths still live + +The discipline-side helpers +(__timekeeping_inject_offset / timekeeping_inject_offset / +__timekeeping_set_tai_offset / change_clocksource / timekeeping_notify / +timekeeping_adjust / timekeeping_apply_adjustment / __do_adjtimex / +timekeeping_validate_timex / hardpps) lose all in-tree callers and are +candidates for LTO dead-stripping. Static-internal helpers are dropped +unconditionally; non-static ones (do_settimeofday64, do_adjtimex, +timekeeping_warp_clock, timekeeping_inject_sleeptime64) keep their +symbols so the existing extern declarations in include/linux/timekeeping.h +and include/linux/timex.h still resolve. + +Risk: low. 
Wallclock cannot be corrected after boot; QEMU provides +a stable RTC at startup so the boot-time value is already reasonable. +NTP source absent on this target. Re-enabling RTC, NTP daemon, or +hwclock requires reverting this knob. + +Measured on Cortex-M4 nommu mps2-an386 (linux-7.0): timekeeping.c +shrinks from 9,296 to ~7 KB; linux.axf trims roughly 1.5-2.5 KB after +page-alignment cascade. + +--- + init/Kconfig | 22 ++++++++++++++++++++++ + kernel/time/timekeeping.c | 18 ++++++++++++++++++ + 2 files changed, 40 insertions(+) + +diff --git a/init/Kconfig b/init/Kconfig +--- a/init/Kconfig ++++ b/init/Kconfig +@@ -1101,6 +1101,28 @@ config SCHED_RT_TINY + Say N unless you are building a heavily size-constrained image. + Boot test before deploying. + ++config TIME_NO_SET_WALLCLOCK ++ bool "Drop wallclock-setting paths (clock_settime / adjtimex / settimeofday)" ++ default n ++ help ++ Stub do_settimeofday64, do_adjtimex, and timekeeping_warp_clock ++ to return -EPERM / no-op respectively. Effective userspace ++ impact: ++ ++ settimeofday(2) -> -EPERM ++ clock_settime(2) -> -EPERM ++ adjtimex(2) -> -EPERM ++ stime(2) -> -EPERM ++ clock_gettime(2) -> unchanged (read paths stay live) ++ ++ Safe on NOMMU images with no NTP source, no RTC, and a stable ++ boot-time epoch (QEMU provides one). The wallclock cannot be ++ corrected after boot. The NTP discipline / leap-second / TAI ++ maintenance helpers in timekeeping.c lose all in-tree callers ++ and become candidates for LTO dead-stripping. ++ ++ Say N unless you are building a heavily size-constrained image. ++ + endmenu + + # +diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c +--- a/kernel/time/timekeeping.c ++++ b/kernel/time/timekeeping.c +@@ -1433,6 +1433,9 @@ bool timekeeping_clocksource_has_base(enum clocksource_ids id) + */ + int do_settimeofday64(const struct timespec64 *ts) + { ++#ifdef CONFIG_TIME_NO_SET_WALLCLOCK ++ return -EPERM; ++#else + struct timespec64 ts_delta, xt; + + if (!timespec64_valid_settod(ts)) +@@ -1462,6 +1465,7 @@ int do_settimeofday64(const struct timespec64 *ts) + audit_tk_injoffset(ts_delta); + add_device_randomness(ts, sizeof(*ts)); + return 0; ++#endif + } + EXPORT_SYMBOL(do_settimeofday64); + +@@ -1556,12 +1560,14 @@ EXPORT_SYMBOL(persistent_clock_is_local); + */ + void timekeeping_warp_clock(void) + { ++#ifndef CONFIG_TIME_NO_SET_WALLCLOCK + if (sys_tz.tz_minuteswest != 0) { + struct timespec64 adjust; + + persistent_clock_is_local = 1; + adjust.tv_sec = sys_tz.tz_minuteswest * 60; + adjust.tv_nsec = 0; + timekeeping_inject_offset(&adjust); + } ++#endif + } + +@@ -2753,6 +2759,9 @@ static int __do_adjtimex(struct tk_data *tkd, struct __kernel_timex *txc, + */ + int do_adjtimex(struct __kernel_timex *txc) + { ++#ifdef CONFIG_TIME_NO_SET_WALLCLOCK ++ return -EPERM; ++#else + struct adjtimex_result result = { }; + int ret; + +@@ -2770,6 +2779,7 @@ int do_adjtimex(struct __kernel_timex *txc) + ntp_notify_cmos_timer(result.delta.tv_sec != 0); + + return ret; ++#endif + } + + long ktime_get_ntp_seconds(unsigned int id)
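
A quick userspace probe of the 0020 surface (an illustrative sketch, not
part of the series; assumes a cross-compiled test binary is added to the
image):

	#include <errno.h>
	#include <stdio.h>
	#include <sys/time.h>
	#include <time.h>

	int main(void)
	{
		struct timespec ts = { .tv_sec = 1 };
		struct timeval tv = { .tv_sec = 1 };

		/* Setting paths: expect -1/EPERM with TIME_NO_SET_WALLCLOCK=y. */
		if (settimeofday(&tv, NULL) < 0 && errno == EPERM)
			puts("settimeofday gated: OK");
		if (clock_settime(CLOCK_REALTIME, &ts) < 0 && errno == EPERM)
			puts("clock_settime gated: OK");
		/* Read path must stay live. */
		if (clock_gettime(CLOCK_REALTIME, &ts) == 0)
			printf("clock_gettime live: %lld\n", (long long)ts.tv_sec);
		return 0;
	}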