
Support madvise, msync, and mremap on high-VA mmap regions #83

Merged
jserv merged 1 commit into sysprog21:main from Max042004:fix-madvise-high-va
Jul 2, 2026

Max042004 (Collaborator) commented Jun 6, 2026

Problem

Three memory syscalls -- sys_madvise, sys_msync, and sys_mremap -- plus
the file-backed MAP_SHARED path in sys_mmap_high_va were all
primary-window-only. They computed off = addr - ipa_base and reached into
host_base + off, which only resolves for identity regions (gpa_base == start). High-VA mmap regions -- Rosetta's own slab/JIT and guest mmap(NULL)
placements -- back their pages at gpa_base (a named mapping or overflow
segment), so these paths rejected the range with -ENOMEM/-EFAULT or wrote to
the wrong host address.

Two concrete failures motivate this:

  • x86_64 Node.js / V8. V8's page allocator decommits guard/code pages with
    mprotect(PROT_NONE) + madvise(MADV_DONTNEED) and checks the madvise return
    value with CHECK_EQ(0, ret), so the spurious -ENOMEM aborted Node as soon
    as its JIT initialized:

    # Fatal error in , line 0
    # Check failed: 0 == ret.

    Running with --verbose pins the failure to madvise(advice=4 MADV_DONTNEED)
    -> -12 (ENOMEM) on a page in the high-VA window (under Rosetta, mmap(NULL)
    lands at addresses such as 0x7fffff7fd000).

  • apt (issue #108, "msync() returns ENOMEM (errno 12) for high-VA guest
    mappings created by Rosetta"). apt memory-maps its package-list cache with
    MAP_SHARED and calls msync on it; under an x86_64 guest that mapping is
    high-VA, so sys_msync returned -ENOMEM and apt failed with
    E: Unable to synchronize mmap - msync (12: Cannot allocate memory). The
    mapping could not even be created, because sys_mmap_high_va refused
    file-backed MAP_SHARED with -ENODEV.
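The V8 failure reduces to a short decommit/recommit syscall sequence. The sketch below is a hypothetical reproducer, not the vendored x86_64-rosetta-madvise fixture, and the function name decommit_then_recommit is illustrative. Run natively on Linux it simply exercises the calls; built as an x86_64 static binary and run through elfuse + Rosetta, its mmap(NULL) would be expected to land in the high-VA window, which is the case the pre-fix madvise rejected.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical reproducer of the V8 decommit pattern.
 * Returns 0 when madvise succeeds and the recommitted pages read back as
 * zero, -1 otherwise. V8 aborts if madvise returns anything but 0. */
int decommit_then_recommit(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t len = 4 * page;
    int rc = -1;

    /* mmap(NULL, ...): under elfuse + Rosetta this may land high-VA. */
    uint8_t *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    memset(p, 0xAB, len); /* commit and dirty every page */

    /* Decommit: revoke access first, then discard the contents. */
    if (mprotect(p, len, PROT_NONE) != 0)
        goto out;
    /* Pre-fix elfuse on a high-VA range: returns -1 with errno == ENOMEM. */
    if (madvise(p, len, MADV_DONTNEED) != 0)
        goto out;

    /* Recommit: dropped private anonymous pages come back zero-filled. */
    if (mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
        goto out;
    rc = 0;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            rc = -1;
            break;
        }
    }
out:
    munmap(p, len);
    return rc;
}
```

Note that at the libc level a failing madvise returns -1 and sets errno, whereas the raw syscall return seen in the --verbose trace is -12.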

Fix

Route every admission and data-movement path through the region tracker and
host_ptr_for_gpa(gpa_base + ...) so primary and high-VA regions act on their
real backing. Identity regions have gpa_base == start, so this collapses to
host_base + off and primary behaviour is byte-for-byte unchanged.
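A toy model makes the translation argument concrete. Everything here (region_t, gpa_window_t, toy_host_ptr_for_gpa, and the single flat host buffer) is a hypothetical simplification, not elfuse's real region tracker or host_ptr_for_gpa; it only shows why routing through gpa_base is a no-op for identity regions and a fix for high-VA ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region record: guest VA range plus its backing GPA. */
typedef struct {
    uint64_t start;    /* guest VA where the region begins */
    uint64_t len;      /* region length in bytes */
    uint64_t gpa_base; /* guest-physical address backing `start` */
} region_t;

/* Hypothetical guest-physical window mapped at one host address. */
typedef struct {
    uint8_t *host_base; /* host mapping of the window */
    uint64_t ipa_base;  /* first guest-physical address in the window */
    uint64_t size;      /* window size in bytes */
} gpa_window_t;

/* Toy stand-in for host_ptr_for_gpa(): NULL if gpa is outside the window. */
uint8_t *toy_host_ptr_for_gpa(const gpa_window_t *w, uint64_t gpa)
{
    if (gpa < w->ipa_base || gpa - w->ipa_base >= w->size)
        return NULL;
    return w->host_base + (gpa - w->ipa_base);
}

/* Pre-fix style: treats the guest VA as the GPA, i.e. host_base + off. */
uint8_t *old_translate(const gpa_window_t *w, uint64_t addr)
{
    return toy_host_ptr_for_gpa(w, addr);
}

/* Post-fix style: resolve through the region's gpa_base. */
uint8_t *region_translate(const gpa_window_t *w, const region_t *r,
                          uint64_t addr)
{
    assert(addr >= r->start && addr - r->start < r->len);
    return toy_host_ptr_for_gpa(w, r->gpa_base + (addr - r->start));
}
```

With ipa_base = 0x1000000 and a 64 KiB window, an identity region at 0x1002000 (gpa_base == start) translates to the same host pointer both ways. A high-VA region at 0x7fffff7fd000 backed at gpa_base 0x1008000 falls outside the window under the old formula (the NULL plays the role of the -ENOMEM/-EFAULT rejection) but resolves to host_base + 0x8000 + offset through gpa_base.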

  • madvise MADV_DONTNEED accepts high-VA ranges the tracker records as
    mapped and zeroes / restores them on their gpa_base backing (drops the
    previous zero-fill-only scope-out for high-VA file-backed pages).
  • msync admission drops the guest_size bound (the existing coverage loop
    validates both windows), and sync_shared_aliases_range /
    refresh_shared_region_range resolve the guest bytes through gpa_base.
  • sys_mmap_high_va accepts file-backed MAP_SHARED as a snapshot-style
    shared region (pread on map, msync writes dirty bytes back), so the high-VA
    shared mappings msync operates on can be created.
  • mremap admits high-VA sources and resolves the source read/zero through
    gpa_base in the shrink, MREMAP_FIXED, and MREMAP_MAYMOVE paths. The
    destination stays in the primary window (find_free_gap /
    mremap_extend_range), so an mremap(MAYMOVE) of a high-VA region relocates
    it there with its contents intact -- no high-VA destination backing or new
    Stage-2 machinery is needed.
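The snapshot-style MAP_SHARED semantics (pread on map, write-back on msync) can be modelled in a few lines. This is a hypothetical sketch: snap_region_t, snap_map, snap_msync, and snap_unmap are illustrative names, the heap buffer stands in for the region's gpa_base backing, and unlike the PR it writes back the whole requested range instead of tracking which bytes are dirty.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical snapshot-style shared region. */
typedef struct {
    int fd;           /* host file descriptor of the backing file */
    off_t file_off;   /* file offset the region starts at */
    uint8_t *backing; /* stands in for the gpa_base backing */
    size_t len;
} snap_region_t;

/* "mmap": copy the file contents into the backing once (a snapshot). */
int snap_map(snap_region_t *r, int fd, off_t off, size_t len)
{
    r->fd = fd;
    r->file_off = off;
    r->len = len;
    r->backing = calloc(1, len); /* bytes past EOF stay zero */
    if (!r->backing)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, r->backing + done, len - done,
                          off + (off_t) done);
        if (n < 0) {
            free(r->backing);
            r->backing = NULL;
            return -1;
        }
        if (n == 0)
            break; /* EOF: rest of the snapshot remains zero */
        done += (size_t) n;
    }
    return 0;
}

/* "msync": write [start, start + len) of the backing back to the file. */
int snap_msync(const snap_region_t *r, size_t start, size_t len)
{
    if (start > r->len || len > r->len - start)
        return -1;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(r->fd, r->backing + start + done, len - done,
                           r->file_off + (off_t) (start + done));
        if (n <= 0)
            return -1;
        done += (size_t) n;
    }
    return 0;
}

void snap_unmap(snap_region_t *r)
{
    free(r->backing);
    r->backing = NULL;
}
```

Because the backing is a snapshot, stores do not reach the file until snap_msync runs, which is why msync is the point where high-VA shared data gets written back.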

Tests

Vendored x86_64 static ELFs run through elfuse + Rosetta (rebuild recipe in
tests/fixtures/rosetta/README.md):

  • x86_64-rosetta-madvise -- MADV_DONTNEED on writable / PROT_NONE guard /
    multi-page high-VA ranges (the V8 decommit pattern).
  • x86_64-rosetta-msync -- high-VA MAP_SHARED write-back
    (single/multi-page/MS_ASYNC), verified against the backing file.
  • x86_64-rosetta-mremap -- MREMAP_MAYMOVE grow (contents preserved) and
    in-place shrink.

Wired as make test-rosetta-{madvise,msync,mremap} (and test-rosetta-all)
plus the matching suites in tests/test-matrix.sh.
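The mremap cases can be expressed as a small native check. This is a sketch, not the vendored x86_64-rosetta-mremap source; mremap_grow_shrink_ok and its helpers are hypothetical. It grows a mapping with MREMAP_MAYMOVE, verifies the old contents survive and the new tail is zero, then shrinks in place with flags set to 0, which forbids a move. Per the description above, a high-VA source under elfuse would be relocated into the primary window, so the contents check matters more than the resulting address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Byte i holds (page index + 1), so lost or shuffled pages are detectable. */
static void fill_pattern(uint8_t *p, size_t len, size_t page)
{
    for (size_t i = 0; i < len; i++)
        p[i] = (uint8_t) (i / page + 1);
}

static int has_pattern(const uint8_t *p, size_t len, size_t page)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != (uint8_t) (i / page + 1))
            return 0;
    return 1;
}

/* Returns 1 if a moving grow preserves contents and an in-place shrink works. */
int mremap_grow_shrink_ok(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t old_len = 2 * page, new_len = 8 * page;
    int ok = 1;

    uint8_t *p = mmap(NULL, old_len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 0;
    fill_pattern(p, old_len, page);

    /* Grow: MREMAP_MAYMOVE lets the kernel relocate the mapping. */
    uint8_t *q = mremap(p, old_len, new_len, MREMAP_MAYMOVE);
    if (q == MAP_FAILED) {
        munmap(p, old_len);
        return 0;
    }
    if (!has_pattern(q, old_len, page))
        ok = 0;
    for (size_t i = old_len; i < new_len; i++) /* new anonymous tail is zero */
        if (q[i] != 0)
            ok = 0;

    /* Shrink in place: flags == 0, so the mapping must not move. */
    uint8_t *r = mremap(q, new_len, page, 0);
    if (r == MAP_FAILED) {
        munmap(q, new_len);
        return 0;
    }
    if (r != q || !has_pattern(r, page, page))
        ok = 0;
    munmap(r, page);
    return ok;
}
```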

The aarch64 unit tests test-madvise / test-msync / test-mremap /
test-mremap-infra cover the unchanged primary-window path. An end-to-end
MAP_SHARED store scenario (write + msync + cross-mapping refresh + mremap
grow + madvise) passes with this change and fails (-ENODEV/-ENOMEM) on the
pre-fix baseline.
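The end-to-end store scenario can be approximated with native Linux calls. This is an illustrative sketch, not the scenario used for the PR; shared_store_scenario and the /tmp path template are assumptions. It runs the same steps in order: a store plus msync through one MAP_SHARED mapping, a read of those bytes through a second mapping, an mremap(MREMAP_MAYMOVE) grow, and madvise(MADV_DONTNEED), which for a shared file mapping drops only the page-table entries so the data is refetched from the file rather than zeroed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if every step behaves as expected on native Linux. */
int shared_store_scenario(void)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/shared-storeXXXXXX";
    char *a = MAP_FAILED, *b = MAP_FAILED, *g;
    size_t a_len = page;
    char buf[8];
    int ok = 0;

    int fd = mkstemp(path);
    if (fd < 0)
        return 0;
    unlink(path); /* the file lives on while fd and mappings exist */

    /* Size the file up front so the grown mapping stays inside EOF. */
    if (ftruncate(fd, (off_t) (4 * page)) != 0)
        goto out;
    a = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        goto out;

    /* 1. write + msync: the store must reach the file. */
    memcpy(a, "store-v1", 8);
    if (msync(a, page, MS_SYNC) != 0)
        goto out;
    if (pread(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf ||
        memcmp(buf, "store-v1", 8) != 0)
        goto out;

    /* 2. cross-mapping: the second mapping sees the same bytes. */
    if (memcmp(b, "store-v1", 8) != 0)
        goto out;

    /* 3. mremap grow; contents must survive a possible move. */
    g = mremap(a, a_len, 4 * page, MREMAP_MAYMOVE);
    if (g == MAP_FAILED)
        goto out;
    a = g;
    a_len = 4 * page;
    if (memcmp(a, "store-v1", 8) != 0)
        goto out;
    memcpy(a + 3 * page, "tail", 4);

    /* 4. MADV_DONTNEED on a shared file mapping: later accesses are
     *    repopulated from the file's up-to-date contents. */
    if (madvise(a, a_len, MADV_DONTNEED) != 0)
        goto out;
    ok = memcmp(a, "store-v1", 8) == 0 && memcmp(a + 3 * page, "tail", 4) == 0;

out:
    if (a != MAP_FAILED)
        munmap(a, a_len);
    if (b != MAP_FAILED)
        munmap(b, page);
    close(fd);
    return ok;
}
```

Natively this returns 1. Under a pre-fix elfuse build with the mappings placed high-VA, it would be expected to fail at the mmap step (ENODEV) or the msync step (ENOMEM), matching the baseline failures described above.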

Notes

  • macOS CI runners cannot run guest tests (no HVF entitlement), so the Rosetta
    suites were verified locally on Apple silicon; they self-skip (exit 77) where
    the translator or timeout(1) is unavailable.
  • The vendored x86_64 fixtures are not built in-tree (the Makefile targets
    aarch64 hosts); rebuild them out of tree per the README when the sources
    change.
  • For high-VA file-backed MAP_SHARED served through the FUSE sysroot layer,
    the backing is materialized read-only, so msync write-back to the original
    sysroot file is best-effort; real host-fd backings (e.g. /dev/shm) write
    back exactly. This matches how the primary window already treats such
    mappings.

Fixes #108.

cubic-dev-ai (bot) left a review

4 issues found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="tests/test-rosetta-madvise.sh">

<violation number="1" location="tests/test-rosetta-madvise.sh:66">
P2: Script exits 0 even when tests fail — missing `if [ "$fail" -gt 0 ]; then exit 1; fi` after `report_summary`. This masks test failures in standalone runs and weakens the matrix runner's belt-and-suspenders `|| rc=1` check.</violation>
</file>

Comment thread src/syscall/mem.c
printf '%s\n' "$madv_out" >&2
fi

report_summary "$total"

P2: Script exits 0 even when tests fail — missing if [ "$fail" -gt 0 ]; then exit 1; fi after report_summary. This masks test failures in standalone runs and weakens the matrix runner's belt-and-suspenders || rc=1 check.

jserv (Contributor) left a review

Rebase latest main branch and resolve conflicts and refine per review.

Max042004 force-pushed the fix-madvise-high-va branch 3 times, most recently from 6c44a18 to b348ca7, July 2, 2026 09:53
Max042004 changed the title from "Fix madvise(MADV_DONTNEED) on high-VA mmap regions" to "Support madvise, msync, and mremap on high-VA mmap regions", Jul 2, 2026

jserv (Contributor) left a review

Enforce rules described in https://cbea.ms/git-commit/ carefully.

sys_madvise, sys_msync, and sys_mremap were all primary-window-only:
they computed off = addr - ipa_base and reached into host_base + off,
which only resolves for identity regions (gpa_base == start). High-VA
mmap regions -- Rosetta's slab/JIT and guest mmap(NULL) placements --
back their pages at gpa_base (a named mapping or overflow segment), so
those paths rejected the range with -ENOMEM/-EFAULT or wrote to the
wrong host address. V8's page allocator decommits guard/code pages with
mprotect(PROT_NONE)+madvise(MADV_DONTNEED) and CHECK_EQ(0, ret)s the
result, so the spurious ENOMEM aborted x86_64 Node.js the moment its JIT
initialized; apt's high-VA MAP_SHARED package-list cache hit the same
wall in sys_msync (issue sysprog21#108).

Route every admission and data-movement path through the region tracker
and host_ptr_for_gpa(gpa_base + ...) so primary and high-VA regions act
on their real backing. Identity regions have gpa_base == start, so this
collapses to host_base + off and primary behaviour is byte-for-byte
unchanged:

- madvise MADV_DONTNEED accepts high-VA ranges the tracker records as
  mapped and zeroes / restores them on their gpa_base backing.
- msync's admission drops the guest_size bound (the coverage loop
  validates both windows) and sync_shared_aliases_range /
  refresh_shared_region_range resolve the guest bytes through gpa_base.
- sys_mmap_high_va accepts file-backed MAP_SHARED as a snapshot-style
  shared region (pread on map, msync writes dirty bytes back), so the
  high-VA shared mappings msync operates on can be created (sysprog21#108).
- mremap admits high-VA sources and resolves the source read/zero
  through gpa_base in the shrink, MREMAP_FIXED, and MREMAP_MAYMOVE
  paths; the destination stays in the primary window (find_free_gap /
  mremap_extend_range), so an mremap(MAYMOVE) of a high-VA region
  relocates it there with its contents intact -- no high-VA destination
  backing or new Stage-2 machinery is needed.

Regression tests (vendored x86_64 static ELFs run through elfuse +
Rosetta): x86_64-rosetta-madvise (MADV_DONTNEED on writable / PROT_NONE
guard / multi-page high-VA ranges -- the V8 decommit pattern),
x86_64-rosetta-msync (high-VA MAP_SHARED write-back,
single/multi-page/MS_ASYNC), and x86_64-rosetta-mremap (MREMAP_MAYMOVE
grow with contents preserved and in-place shrink). Wired as make
test-rosetta-{madvise,msync,mremap} (and test-rosetta-all) plus the
matching suites in tests/test-matrix.sh, with the rebuild recipe in
tests/fixtures/rosetta/README.md.

The aarch64 unit tests test-madvise / test-msync / test-mremap /
test-mremap-infra (primary-path coverage) pass; the new Rosetta suites
pass. An end-to-end MAP_SHARED store scenario (write + msync +
cross-mapping refresh + mremap grow + madvise) passes on this branch and
fails (ENODEV/ENOMEM) on the pre-fix baseline.

Max042004 force-pushed the fix-madvise-high-va branch from b348ca7 to 115c659, July 2, 2026 10:36
jserv merged commit e8fa8cc into sysprog21:main on Jul 2, 2026
4 checks passed

jserv (Contributor) commented Jul 2, 2026

Thank @Max042004 for contributing!

Development

Successfully merging this pull request may close these issues.

msync() returns ENOMEM (errno 12) for high-VA guest mappings created by Rosetta

2 participants