ci: fix aarch64 firmware crash with multiple PCI segments#10

Open
saravan2 wants to merge 1 commit into cloud-hypervisor:ch from saravan2:release
@saravan2 saravan2 commented Apr 7, 2026

  • The aarch64 UEFI firmware uses FdtPciHostBridgeLib to discover PCI host bridges from the device tree. This library asserts that only one pci-host-ecam-generic FDT node exists. Cloud-hypervisor generates one node per PCI segment, so launching a guest with num_pci_segments > 1 makes the DEBUG firmware hit the assert and terminate. The assert is aarch64-specific because amd64 does not use FDT for PCI discovery.

  • FdtPciHostBridgeLib enumerates only the first PCI host bridge (segment 0), regardless of build type.
    With num_pci_segments=96, segments 1-95 are not visible to UEFI. This is not a functional problem because:

    • Cloud-hypervisor provides ACPI tables (MCFG, DSDT) describing all segments directly to the guest via CloudHvAcpiPlatformDxe.
    • The Linux kernel re-enumerates PCI from the MCFG table and assigns BARs independently.
    • Boot devices (virtio-blk, virtio-net) reside on segment 0, which UEFI does enumerate.
  • Switch both the amd64 and aarch64 builds from DEBUG to RELEASE.
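
The difference between the two build types can be illustrated with a toy model. This is a Python sketch, not edk2 code: the function name `discover_host_bridge` and the flat list standing in for the device tree are hypothetical, and it assumes (as is standard in edk2) that ASSERT checks are compiled out of RELEASE builds.

```python
ECAM_COMPATIBLE = "pci-host-ecam-generic"


def discover_host_bridge(fdt_compatibles, debug_build):
    """Toy model of single-host-bridge discovery.

    fdt_compatibles: list of 'compatible' strings, one per FDT node
                     (a stand-in for a real device tree).
    Returns the index of the first ECAM host bridge node.
    """
    ecam_nodes = [
        index
        for index, compatible in enumerate(fdt_compatibles)
        if compatible == ECAM_COMPATIBLE
    ]
    if not ecam_nodes:
        raise LookupError("no pci-host-ecam-generic node found")

    if debug_build and len(ecam_nodes) != 1:
        # DEBUG builds keep the "only one host bridge" check and stop here.
        raise AssertionError("more than one pci-host-ecam-generic node")

    # RELEASE builds skip the check and use the first node (segment 0).
    return ecam_nodes[0]


# One node per PCI segment, as with num_pci_segments=96.
nodes = [ECAM_COMPATIBLE] * 96

print(discover_host_bridge(nodes, debug_build=False))

try:
    discover_host_bridge(nodes, debug_build=True)
except AssertionError as error:
    print("DEBUG build stops:", error)
```

In this model the RELEASE path returns segment 0 and ignores the remaining 95 nodes, which matches the behaviour described above; the guest OS still sees all segments through ACPI.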

Before

ARM64 integration test failure

# Test Result
118 test_generic_vhost_user_multi_segment_hotplug TRY 4 FAIL
119 test_net_multi_segment_hotplug TRY 4 FAIL
120 test_pci_multiple_segments_numa_node TRY 4 FAIL
121 test_pmem_multi_segment_hotplug TRY 4 FAIL
122 test_virtio_fs_multi_segment_hotplug TRY 4 FAIL

The CLOUDHV_EFI.fd from the ch-13b4963ec4 release hangs during boot when --platform num_pci_segments=96 is set.

Validation

Produced aarch64 firmware on my fork: https://github.com/saravan2/edk2/releases/tag/release-build

  • aarch64 integration test boot with 96 PCI segments
saravanand@vaeq-cu2a-r118-lab-staging-hv-04:~/test-edk2$ sha256sum CLOUDHV_EFI.fd
c0e9d6a905160e9a15b10852ce75462cf1e5554615f55c914099f546f165c338  CLOUDHV_EFI.fd
saravanand@vaeq-cu2a-r118-lab-staging-hv-04:~/test-edk2$ cat launch-integ-pci.sh
  sudo /home/saravanand/cloud-hypervisor/target/aarch64-unknown-linux-musl/release/cloud-hypervisor \
    --cpus boot=1 \
    --memory size=512M,hotplug_size=2048M,shared=on \
    --kernel /home/saravanand/test-edk2/CLOUDHV_EFI.fd \
    --cmdline "root=/dev/vda1 console-hvc0 rw systemd.journal.forward_to_console=1" \
    --disk path=/home/saravanand/workloads/jammy-server-cloudimg-arm64-custom-20220329-0.raw path=/home/saravanand/test-generic-initiator/seed.img \
    --net tap=,mac=,ip=192.168.249.1,mask=255.255.255.128 \
    --platform num_pci_segments=96 \
    --serial tty \
    --console off
    
saravanand@vaeq-cu2a-r118-lab-staging-hv-04:~/test-edk2$ bash launch-integ-pci.sh
cloud-hypervisor:   0.001702s: <vmm> WARN:vmm/src/device_manager.rs:2679 -- No image_type specified - detected as raw. Configuration updated to persist type across reboots and migrations
cloud-hypervisor:   0.001732s: <vmm> WARN:vmm/src/device_manager.rs:2685 -- Autodetected raw image type. Disabling sector 0 writes.
cloud-hypervisor:   0.001862s: <vmm> WARN:vmm/src/device_manager.rs:2679 -- No image_type specified - detected as raw. Configuration updated to persist type across reboots and migrations
cloud-hypervisor:   0.001884s: <vmm> WARN:vmm/src/device_manager.rs:2685 -- Autodetected raw image type. Disabling sector 0 writes.
UEFI firmware (version  built at 22:26:02 on Apr  7 2026)







Press ESCAPE for boot options ....EFI stub: Booting Linux Kernel...
EFI stub: Generating empty DTB
EFI stub: Loaded initrd from LINUX_EFI_INITRD_MEDIA_GUID device path
EFI stub: Exiting boot services...
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd4f0]
[    0.000000] Linux version 5.15.0-23-generic (buildd@bos02-arm64-028) (gcc (Ubuntu 11.2.0-18ubuntu1) 11.2.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #23-Ubuntu SMP Fri Mar 11 14:57:40 UTC 2022 (Ubuntu 5.15.0-23.23-generic 5.15.27)
[    0.000000] efi: EFI v2.70 by EDK II
[    0.000000] efi: ACPI 2.0=0x5f3f9018 MEMATTR=0x5ebf1018 MOKvar=0x5ec1b000 RNG=0x5fb4d018 MEMRESERVE=0x5ec1ff18
[    0.000000] efi: seeding entropy pool
[    0.000000] random: fast init done
[    0.000000] efi: memattr: Unexpected EFI Memory Attributes table version 2
[    0.000000] secureboot: Secure boot disabled
[    0.000000] ACPI: Early table checksum verification disabled
...
# Inside the guest, all 96 PCI segments were visible
ubuntu@ubuntu:~$ lspci -s '*:00:00.0' | wc -l
96
ubuntu@ubuntu:~$ lspci -D -s '*:00:00.0'
0000:00:00.0 Host bridge: Intel Corporation Device 0d57
0001:00:00.0 Host bridge: Intel Corporation Device 0d57
...
005e:00:00.0 Host bridge: Intel Corporation Device 0d57
005f:00:00.0 Host bridge: Intel Corporation Device 0d57
ubuntu@ubuntu:~$ sudo poweroff
ubuntu@ubuntu:~$ [   88.818776] reboot: Power down
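
The guest-side check above can also be scripted. The following is a minimal sketch, assuming each `lspci -D` line starts with a `domain:bus:device.function` address; the helper name `pci_segments` is hypothetical, and the sample input is synthesized from the pattern in the log rather than captured from a guest.

```python
def pci_segments(lspci_output):
    """Return the sorted PCI segment (domain) numbers found in `lspci -D` output."""
    segments = set()
    for line in lspci_output.splitlines():
        line = line.strip()
        if not line:
            continue
        address = line.split()[0]          # e.g. "005f:00:00.0"
        domain = address.split(":")[0]     # e.g. "005f"
        segments.add(int(domain, 16))      # domains are printed in hex
    return sorted(segments)


# Synthesized sample: one host bridge per segment, 96 segments in total.
sample = "\n".join(
    f"{segment:04x}:00:00.0 Host bridge: Intel Corporation Device 0d57"
    for segment in range(96)
)

found = pci_segments(sample)
print(len(found), f"{found[-1]:04x}")  # prints "96 005f"
```

Because segment numbers are zero-based and printed in hexadecimal, 96 segments run from 0000 to 005f (95 decimal), which is the last address shown in the listing.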

@saravan2 saravan2 marked this pull request as ready for review April 7, 2026 22:42
saravan2 commented Apr 7, 2026

@rbradford, @likebreath

I did not see any benefit in continuing with the DEBUG build for the amd64 firmware.

I switched the amd64 firmware to the RELEASE build so that it is consistent with our aarch64 firmware, where the DEBUG build has to be avoided.

Once this PR is merged, we will need to trigger a new release.

The aarch64 UEFI firmware uses FdtPciHostBridgeLib
to discover PCI host bridges from the device tree.
This library asserts that only one
pci-host-ecam-generic FDT node exists.
Cloud-hypervisor generates one node per PCI
segment, so when we launch a guest with
num_pci_segments > 1 the DEBUG firmware hits the
assert and terminates. This assert is aarch64
specific since amd64 does not use FDT for PCI
discovery.

FdtPciHostBridgeLib only enumerates the first PCI
host bridge (segment 0) regardless of build type.
Segments 1-95 are not visible to UEFI. This is not
a functional problem because:

  - Cloud-hypervisor provides ACPI tables (MCFG,
    DSDT) describing all segments directly to the
    guest via CloudHvAcpiPlatformDxe
  - The Linux kernel re-enumerates PCI from the
    MCFG table and assigns BARs independently
  - Boot devices (virtio-blk, virtio-net) reside
    on segment 0 which UEFI does enumerate

Switch both amd64 and aarch64 builds from DEBUG to
RELEASE.

Signed-off-by: Saravanan D <saravanand@crusoe.ai>