Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions src/vmm/src/devices/virtio/block/virtio/device.rs
Original file line number Diff line number Diff line change
Expand Up @@ -197,7 +197,7 @@ pub struct ConfigSpace {
pub max_write_zeroes_seg: u32, // offset 52
pub write_zeroes_may_unmap: u8, // offset 56
pub(crate) _unused1: [u8; 3], // offset 57 (spec field — virtio_blk_config.unused1)
pub(crate) _pad: [u8; 4], // offset 60 (Rust alignment padding to 64; spec ends at 60)
pub(crate) _pad: [u8; 4], // offset 60 (Rust alignment padding to 64; spec ends at 60)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like an accidental change

}
const _: () = assert!(std::mem::size_of::<ConfigSpace>() == 64);
// Compile-time guards against accidental layout drift. The byte offsets here
Expand Down Expand Up @@ -722,9 +722,10 @@ impl VirtioBlock {
}

fn drain_and_flush(&mut self, discard: bool) {
if let Err(err) = self.disk.file_engine.drain_and_flush(discard) {
error!("Failed to drain ops and flush block data: {:?}", err);
}
self.disk
.file_engine
.drain_and_flush(discard)
.expect("virtio-block: failed to drain ops and flush block data");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can add a comment on why we crash here

}

/// Prepare device for being snapshotted.
Expand Down
53 changes: 33 additions & 20 deletions src/vmm/src/io_uring/queue/submission.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
// SPDX-License-Identifier: Apache-2.0

use std::fmt::Debug;
use std::io::Error as IOError;
use std::io::{Error as IOError, ErrorKind};
use std::mem;
use std::num::Wrapping;
use std::os::unix::io::RawFd;
Expand Down Expand Up @@ -130,26 +130,39 @@ impl SubmissionQueue {
if min_complete > 0 {
flags |= generated::IORING_ENTER_GETEVENTS;
}
// SAFETY: Safe because values are valid and we check the return value.
let submitted = SyscallReturnCode(unsafe {
libc::syscall(
libc::SYS_io_uring_enter,
self.io_uring_fd,
self.to_submit,
min_complete,
flags,
std::ptr::null::<libc::sigset_t>(),
)
})
.into_result()?;
// It's safe to convert to u32 since the syscall didn't return an error.
let submitted = u32::try_from(submitted).unwrap();

// This is safe since submitted <= self.to_submit. However we use a saturating_sub
// for extra safety.
self.to_submit = self.to_submit.saturating_sub(submitted);

Ok(submitted)
// The number of retries is completely arbitrary here. I assume that this
// will happen rarely and that if it happens subsequent retry will immediately
// succeed. If we fall in a storm of interrupts something else is probably wrong
// so let the consumer know.
let mut eintr_retries = 3;
loop {
// SAFETY: Safe because values are valid and we check the return value.
let ret = SyscallReturnCode(unsafe {
libc::syscall(
libc::SYS_io_uring_enter,
self.io_uring_fd,
self.to_submit,
min_complete,
flags,
std::ptr::null::<libc::sigset_t>(),
)
})
.into_result();
match ret {
Ok(num) => {
// It's safe to convert to u32 since the syscall didn't return an error.
let submitted = u32::try_from(num).unwrap();
self.to_submit = self.to_submit.saturating_sub(submitted);
return Ok(submitted);
}
Err(err) if err.kind() == ErrorKind::Interrupted && eintr_retries > 0 => {
eintr_retries -= 1;
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude suggests we clear to_submit on retrials:

      // All SQEs were submitted before the wait; only retry the wait.
      self.to_submit = 0;
  When io_uring_enter is called with IORING_ENTER_GETEVENTS (min_complete > 0), the kernel follows a strict two-phase
  sequence:
  1. Flush the SQ ring — consumes exactly to_submit entries, advancing the SQ head.
  2. Wait for completions — blocks until min_complete CQEs are ready.
  
  EINTR can only fire in phase 2 (the wait), so all to_submit entries are already submitted and the SQ ring is empty (head == tail) by the time the syscall
  returns EINTR.

  On retry the code calls io_uring_enter(fd, self.to_submit, min_complete, flags) with the original to_submit value. The kernel finds 0 entries in the ring,
  submits 0, and waits again — returning 0. The update self.to_submit = self.to_submit.saturating_sub(0) leaves to_submit permanently inflated by the number of
   already-submitted-but-not-counted entries.
  
  Subsequent kick_submission_queue calls (which call submit(0) with min_complete=0) will pass a larger-than-correct nr to io_uring_enter. The kernel caps at
  ring occupancy so no duplicate submissions occur, but the to_submit accounting never converges back to zero until the inflated value happens to be "used up"
  by later real submissions — which it never will be if to_submit started at 5 and 5 real entries keep being added and drained.

I'm not sure how to verify that.

}
Err(err) => return Err(SQueueError::from(err)),
}
}
}

fn mmap(
Expand Down
Loading