Add built-in Resource Visibility Virtualization #77

Open
maazm7d wants to merge 5 commits into ravindu644:main from maazm7d:main

maazm7d commented Apr 20, 2026

Add built-in Resource Visibility Virtualization

Summary

Implements a zero-dependency virtualization layer that overrides /proc/meminfo, /proc/cpuinfo, /proc/stat, /proc/uptime, and /proc/loadavg inside containers. Applications now see the resource limits enforced by Cgroups instead of the host's totals, providing LXCFS-like functionality without the FUSE dependency.


Motivation

Runtimes like Java, Go, and Node.js read /proc directly to size thread pools, heap limits, and worker counts. Without virtualization, a container restricted to 512 MB and 1 CPU still sees the host's 32 GB and 16 cores, so the runtime sizes itself for the host rather than the container. The result is OOM kills, CPU throttling, and degraded performance.


Changes

New files

  • src/virtualize.c — Core virtualization logic for all five /proc files
  • src/virtualize.h — Public API declarations
  • tests/verify_virtualization.sh — Functional verification script

Modified files

  • src/boot.c — Calls ds_virtualize_init() after pivot_root, guarded by /proc mountpoint check
  • src/container.c — Wires up monitor loop with signalfd/poll for periodic updates; records start_time before fork; stores PID namespace inode for recycling protection
  • src/droidspace.h — Adds virtualization, start_time, and ns_inode fields to ds_config
  • src/main.c — Adds --virtualization CLI flag (OPT_VIRTUALIZATION = 268)
  • src/config.c — Persists virtualization field in container config
  • Makefile — Adds virtualize.c to the source list

Implementation Details

| File | Strategy |
|---|---|
| /proc/meminfo | Overrides MemTotal/Free/Available from cgroup limits; scales other components by ratio; zeroes swap; integrates cgroup v2 memory.stat for AnonPages, Cached, Slab |
| /proc/cpuinfo | Truncates processor entries to cpu_quota / cpu_period |
| /proc/stat | Two-pass recompute: sums per-CPU counters for visible CPUs, rewrites aggregate cpu line |
| /proc/uptime | Derives uptime from cfg->start_time (CLOCK_MONOTONIC); scales idle time by CPU ratio |
| /proc/loadavg | Scales load averages and runnable/total counts by container_cpus / host_cpus; masks host last-PID |
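
For reviewers, here is a minimal sketch of the arithmetic behind the /proc/cpuinfo and /proc/loadavg rows. The helper names `visible_cpus` and `scale_load` are hypothetical and not taken from src/virtualize.c. Rounding a fractional quota up, and treating a non-positive quota as "no limit", are assumptions; the cap at the host's online CPU count follows the later commit in this PR.

```c
#include <assert.h>

/* Hypothetical helper: how many CPUs the container should see.
 * quota_us / period_us come from the cgroup CPU controller; a quota <= 0
 * is treated here as "no limit" (assumption). */
static int visible_cpus(long long quota_us, long long period_us, int host_cpus)
{
    assert(host_cpus >= 1);

    if (quota_us <= 0 || period_us <= 0)
        return host_cpus;                      /* unlimited: show the host count */

    /* Round up so that 1.5 CPUs of quota shows 2 entries (assumption). */
    long long n = (quota_us + period_us - 1) / period_us;

    if (n < 1)
        n = 1;                                 /* always show at least one CPU */
    if (n > host_cpus)
        n = host_cpus;                         /* never exceed the host's online CPUs */
    return (int)n;
}

/* Hypothetical helper: scale a host load average by container_cpus / host_cpus. */
static double scale_load(double host_load, int container_cpus, int host_cpus)
{
    if (host_cpus <= 0)
        return host_load;                      /* avoid dividing by zero */
    return host_load * (double)container_cpus / (double)host_cpus;
}
```

With a quota of 150000 and a period of 100000 on a 16-core host, this sketch reports 2 CPUs, and a host load of 8.0 becomes 1.0.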

Reliability

  • Files are written in-place (O_WRONLY + ftruncate) to preserve bind-mount inodes — rename-based atomic updates would silently break the bind mounts
  • PID recycling race is prevented by comparing the container's PID namespace inode (/proc/<pid>/ns/pid) before every update
  • Cgroup path discovery tries three common layouts to support varied distributions
  • Monitor loop uses signalfd + poll(500ms) instead of sleep for clean signal handling
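
The in-place update from the first bullet can be sketched as follows. This is an illustration under stated assumptions, not the code in src/virtualize.c: `rewrite_in_place` and `inode_of` are hypothetical names, and writing before truncating is one possible ordering. Either ordering leaves a short window in which a concurrent reader can see a partially updated file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: inode number of path, or 0 if stat() fails. */
static ino_t inode_of(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_ino : 0;
}

/* Hypothetical helper: overwrite an existing file without replacing its inode,
 * so a bind mount that points at the file keeps seeing the new contents.
 * Returns 0 on success, -1 on error. */
static int rewrite_in_place(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    /* No O_CREAT and no rename(): the file must already exist. */
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    size_t off = 0;
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;                      /* retry interrupted writes */
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }

    /* Drop any tail left over from a longer previous version. */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Comparing `inode_of(path)` before and after a call is a quick way to confirm that the inode, and therefore the bind mount, survived the update.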

Opt-in

The feature is disabled by default. Enable per-container via CLI or config:

--virtualization
# or in container config:
virtualization=1

maazm7d added 3 commits April 19, 2026 03:04
…4#73)

* Implement Cgroup-based Resource Limits (Memory, CPU, PIDs) 

This patch introduces comprehensive resource management capabilities to
Droidspaces, allowing users to restrict container consumption of
Memory, CPU, and PIDs.

Key changes:
- Added --memory, --cpus, and --pids-limit CLI options with validation.
- Implemented Cgroup V1 and V2 support for resource governance.
- Automated Cgroup V2 controller activation via subtree_control.
- Added configuration persistence for resource limits in container.config.
- Enhanced 'info' command with real-time resource usage statistics.
- Improved failure handling and logging for cgroup operations.

Verified with integration and stress testing on Linux x86_64.

* cgroup: implement robust partial controller support

Enhance Cgroup handling to gracefully manage environments with partial
controller support (common on Android). Key changes:

- Implement `ds_cgroup_is_supported` to explicitly detect controller
  availability in both V1 and V2 hierarchies.
- Refine V2 bootstrap to only attempt enabling supported controllers in
  `cgroup.subtree_control`.
- Update `ds_cgroup_apply_limits` to skip unsupported controllers with a
  clear warning instead of failing or silently succeeding.
- Add `ds_cgroup_get_limits` to read actual enforced limits from the
  filesystem.
- Update `info` command to display both requested and enforced limits,
  marking unsupported ones as "(not enforced)".

This ensures predictable and transparent behavior across diverse kernel
configurations.

---------

Signed-off-by: maazm7d <maazm7d@gmail.com>
…ive)

Add built-in resource visibility virtualization

Implement a zero-dependency virtualization layer for /proc/meminfo,
/proc/cpuinfo, /proc/stat, /proc/uptime, and /proc/loadavg. This ensures
that containerized applications see the resource limits (RAM, CPUs) and
load metrics enforced by Cgroups instead of the host's total resources.

Key features:
- src/virtualize.c and src/virtualize.h for scaled resource data generation.
- Consistent /proc/meminfo via component scaling and Cgroup v2 integration.
- Accurate CPU virtualization with aggregate recomputation in /proc/stat.
- Scaled /proc/loadavg with host PID masking and scaled runnable/total.
- True per-container uptime and scaled idle time in /proc/uptime.
- In-place file updates preserving bind-mount inodes.
- Robust PID recycling protection via PID namespace inode verification.
- High-performance monitor loop using signalfd and poll (500ms heartbeat).
- Safe dynamic memory allocation for variable-sized system files.
- Resilient cgroup path discovery supporting various distributions.
- --virtualization flag integrated into CLI and configuration.
- Container boot sets up tmpfs and bind-mounts virtual proc files.
- Functional verification script in tests/verify_virtualization.sh.

This provides LXCFS-like functionality in a <260KB static binary,
significantly improving compatibility for Java, Go, and Node.js.
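
The PID namespace inode verification listed above can be sketched like this. The function names are hypothetical, and the sketch assumes the monitor recorded the inode of /proc/<pid>/ns/pid when the container started. If the PID has since been recycled by a process in another PID namespace, or has exited, the check fails and the update can be skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the inode of /proc/<pid>/ns/pid.
 * Returns 0 on success, -1 if the process is gone or the link is unreadable. */
static int pidns_inode(pid_t pid, ino_t *out)
{
    char path[64];
    struct stat st;

    assert(out != NULL);

    int n = snprintf(path, sizeof path, "/proc/%ld/ns/pid", (long)pid);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    if (stat(path, &st) < 0)                   /* follows the link to the namespace */
        return -1;

    *out = st.st_ino;
    return 0;
}

/* Hypothetical helper: 1 if pid still lives in the namespace recorded at start. */
static int pidns_matches(pid_t pid, ino_t expected)
{
    ino_t now;
    return pidns_inode(pid, &now) == 0 && now == expected;
}
```

A monitor loop would call `pidns_matches(container_pid, saved_inode)` before each refresh and stop updating once it returns 0.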
maazm7d added 2 commits April 20, 2026 23:26
- Enhanced container discovery logic to scan the 'Containers/' directory,
  allowing the runtime to detect and show stopped containers.
- Improved auto-resolution of container names to prioritize running systems
  but fall back to installed ones.
- Fixed a monitor warning by skipping the network handshake in host mode.
- Hardened memory virtualization by adding fallbacks for missing cgroup
  limits and preventing division by zero during ratio calculation.
- Capped virtualized CPU counts at the actual host online processor count.
- Standardized /proc/meminfo formatting for better compatibility with
  strict parsers.
- Modified ds_config_load_by_name to correctly return an error if the
  metadata file is missing.
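
The memory fallback and the division-by-zero guard mentioned above can be illustrated with a small sketch. The names are hypothetical, values are in kB, and treating a limit of 0 as "no usable cgroup limit" is an assumption. A limit larger than the host total (as cgroup v1 reports when unlimited) also falls back to the host value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pick the MemTotal to report, in kB.
 * limit_kb == 0 stands for "no usable cgroup limit" (assumption). */
static uint64_t effective_total_kb(uint64_t limit_kb, uint64_t host_total_kb)
{
    if (limit_kb == 0 || limit_kb > host_total_kb)
        return host_total_kb;                  /* fall back to the host total */
    return limit_kb;
}

/* Hypothetical helper: scale one host meminfo field by
 * container_total / host_total. */
static uint64_t scale_field_kb(uint64_t host_value_kb,
                               uint64_t container_total_kb,
                               uint64_t host_total_kb)
{
    if (host_total_kb == 0)
        return 0;                              /* guard against division by zero */

    assert(container_total_kb <= host_total_kb);

    /* Floating point avoids overflow in host_value * container_total. */
    double ratio = (double)container_total_kb / (double)host_total_kb;
    return (uint64_t)((double)host_value_kb * ratio);
}
```

For a 512 MB container on a 32 GB host, `effective_total_kb(524288, 33554432)` yields 524288, and every other field is scaled by 1/64.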
This commit addresses several critical issues:
- Fixed `nproc` core count reporting by virtualizing CPU sysfs entries:
  `/sys/devices/system/cpu/{online,possible,present}`.
- Refined `/proc/meminfo` virtualization by capping all reported memory
  fields at the container's MemTotal, fixing usage detection in fastfetch.
- Modified `sanitize_container_name` to allow the dot `.` character,
  preserving OS version numbers (e.g., "Ubuntu 24.04" -> "Ubuntu-24.04").
- Enforced consistent container name sanitization early in the lifecycle
  to fix "dead" internal logging and metadata path inconsistencies.
- Resolved a compiler error in `src/container.c` caused by ignoring the
  return value of `read` on the signalfd.
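
For context on the last item, a monitor heartbeat built on signalfd and poll might look like the sketch below, with the result of `read` checked rather than discarded. `open_signal_fd` and `wait_tick` are hypothetical names, and the choice of SIGTERM and SIGINT is an assumption; the loop in src/container.c may differ.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <sys/signalfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: block SIGTERM/SIGINT and return a signalfd for them,
 * or -1 on error. */
static int open_signal_fd(void)
{
    sigset_t mask;

    sigemptyset(&mask);
    sigaddset(&mask, SIGTERM);
    sigaddset(&mask, SIGINT);

    /* Signals must be blocked so they are delivered through the fd. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_CLOEXEC);
}

/* Hypothetical helper: wait for one heartbeat.
 * Returns 0 on timeout, the signal number if one arrived, -1 on error. */
static int wait_tick(int sfd, int timeout_ms)
{
    struct pollfd pfd = { .fd = sfd, .events = POLLIN };
    int rc;

    assert(sfd >= 0);

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0)
        return rc;                             /* 0 = timeout, -1 = poll error */

    struct signalfd_siginfo si;
    ssize_t n = read(sfd, &si, sizeof si);
    if (n != (ssize_t)sizeof si)
        return -1;                             /* short or failed read is an error */
    return (int)si.ssi_signo;
}
```

A caller would loop on `wait_tick(sfd, 500)`, refresh the virtual files on a 0 return, and shut down cleanly on a signal number.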
@ravindu644 ravindu644 force-pushed the main branch 4 times, most recently from a3a3800 to d70d165 Compare April 22, 2026 18:59