Add hio_sendfile() API for zero-copy file transfer using sendfile(2) #821
Conversation
Implements:
- Platform-abstracted sendfile in nio.c (Linux sendfile, macOS/FreeBSD sendfile, generic pread+write fallback)
- hio_sendfile() public API in hloop.h with write queue integration
- Sendfile state tracking in hio_t (sendfile_fd, sendfile_offset, sendfile_remain)
- SSL fallback using pread + hio_write
- Windows fallback in overlapio.c
- Integration into HTTP large file handler for unlimited rate mode
- hio_write_bufsize accounts for sendfile remaining bytes

Co-authored-by: ithewei <26049660+ithewei@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ithewei/libhv/sessions/e59504ed-e745-410d-a8ab-e99946d80a3c
…mprove error messages:
- Add in_fd validation in hio_sendfile() for both nio.c and overlapio.c
- Fix generic sendfile fallback to handle partial writes correctly
- Check sendfile_remain under mutex before unlocking in nio_write
- Improve SSL fallback with better EOF/error distinction
- Add fileno() validation in HTTP handler
- Improve Windows fallback with separate seek/read error handling

Co-authored-by: ithewei <26049660+ithewei@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ithewei/libhv/sessions/e59504ed-e745-410d-a8ab-e99946d80a3c
Pull request overview
This PR introduces a new hio_sendfile() API to support zero-copy (or near zero-copy) file transfer at the event-loop I/O layer, and wires it into the HTTP server’s large-file response path to avoid userspace buffering for unlimited-rate transfers.
Changes:
- Add `hio_sendfile(hio_t*, int in_fd, off_t offset, size_t length)` public API and track sendfile state in `hio_t` (`sendfile_fd`/`offset`/`remain`).
- Integrate sendfile into the non-IOCP (`nio.c`) write path and adjust write-completion accounting via `hio_write_bufsize()`, which now includes `sendfile_remain`.
- Update `HttpHandler::defaultLargeFileHandler` to use `hio_sendfile()` when `limit_rate < 0`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| http/server/HttpHandler.cpp | Uses hio_sendfile() for unlimited large-file responses and ends the response on write completion. |
| event/nio.c | Implements sendfile state machine for non-IOCP backends and integrates it into HV_WRITE handling. |
| event/overlapio.c | Adds a Windows (IOCP) hio_sendfile() fallback implemented via read + hio_write(). |
| event/hloop.h | Declares the new hio_sendfile() API and documents expected behavior. |
| event/hevent.h | Extends hio_t with sendfile-related state fields. |
| event/hevent.c | Initializes sendfile state and updates hio_write_bufsize() to include pending sendfile bytes. |
```c
// @param offset: starting offset in the file
// @param length: number of bytes to send
// @return 0 on success (async operation started), -1 on error
// hwrite_cb is called as data is sent. When complete, write_queue is empty.
```
The hio_sendfile doc comment is misleading/incomplete for consumers of hwrite_cb: sendfile bytes are not part of the write queue, so completion is not just “write_queue is empty”; it also requires sendfile_remain == 0. Also, the buf pointer passed to hwrite_cb can be NULL for sendfile writes (__write_cb(io, NULL, nsent)), so callbacks must not assume buf != NULL. Please clarify these semantics in the API comment.
Suggested change:
```c
// hwrite_cb is called as data is sent. For sendfile transfers, the bytes being
// sent are not queued in write_queue; completion is when write_queue is empty
// AND the internal sendfile_remain counter has reached 0.
// NOTE: For sendfile writes, hwrite_cb may be invoked with buf == NULL
// (e.g. __write_cb(io, NULL, nsent)); callbacks must not assume buf != NULL
// and should rely on the length argument (e.g. nwrite) rather than buf contents.
```
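The NULL-buf caveat is easy to get wrong in application callbacks. The sketch below shows a defensively written write callback; the struct and function names are illustrative mocks, not libhv's actual `hwrite_cb` signature:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal mock of a connection's write-progress state. */
typedef struct { size_t total_sent; } conn_state_t;

/* A callback written defensively: it never dereferences buf and relies
 * solely on the length argument, as the suggested doc comment requires.
 * For sendfile progress notifications, buf may legitimately be NULL. */
static void on_write(conn_state_t* state, const void* buf, int nwrite) {
    (void)buf; /* may be NULL for sendfile writes; do not touch it */
    if (nwrite > 0) state->total_sent += (size_t)nwrite;
}

static size_t simulate_sendfile_progress(void) {
    conn_state_t state = { 0 };
    on_write(&state, NULL, 65536);  /* sendfile chunk: buf == NULL */
    on_write(&state, "tail", 4);    /* ordinary queued write */
    return state.total_sent;
}
```

A callback that unconditionally read `buf` would crash on the first sendfile notification, which is why the doc comment needs to state the contract explicitly.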
```c
hrecursive_mutex_unlock(&io->write_mutex);
__write_cb(io, NULL, nsent);
if (complete) {
    io->sendfile_fd = -1;
```
io->sendfile_fd = -1; is written after releasing write_mutex (both in nio_write completion handling and in the immediate-send path here). Since hio_sendfile/write code is otherwise synchronized via write_mutex, updating sendfile state outside the lock introduces a data race with other threads calling hio_sendfile/hio_write_bufsize/event logic. Consider setting/clearing sendfile_fd (and any related fields) while holding write_mutex, or making these fields atomic if they must be touched lock-free.
Suggested change:
```c
if (complete) {
    io->sendfile_fd = -1;
}
hrecursive_mutex_unlock(&io->write_mutex);
__write_cb(io, NULL, nsent);
if (complete) {
```
```cpp
        closeFile();
    }
};
hio_sendfile(io, filefd, 0, length);
```
hio_sendfile(io, ...) return value is ignored. If it fails (e.g., io already closed / invalid fd), this handler still returns HTTP_STATUS_UNFINISHED, leaves the file open, and relies on onwrite that may never fire to call closeFile()/writer->End(). Handle hio_sendfile errors (set an appropriate HTTP error, closeFile(), and end/close the writer) before returning.
Suggested change:
```cpp
int rv = hio_sendfile(io, filefd, 0, length);
if (rv != 0) {
    // sendfile failed synchronously: clean up and report error
    resp->status_code = HTTP_STATUS_INTERNAL_SERVER_ERROR;
    resp->content_length = 0;
    closeFile();
    writer->End();
    return resp->status_code;
}
```
```c
// SSL: fall back to read + hio_write (sendfile cannot bypass SSL encryption)
// NOTE: hio_write is non-blocking and queues data internally,
// so this won't block the event loop even for large files.
if (io->io_type == HIO_TYPE_SSL) {
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    while (remaining > 0) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        ssize_t nread = pread(in_fd, buf, to_read, cur_offset);
        if (nread < 0) {
            hloge("hio_sendfile pread error: %s", strerror(errno));
            return -1;
        }
        if (nread == 0) {
            hlogw("hio_sendfile: unexpected EOF at offset %lld", (long long)cur_offset);
            break;
        }
        int nwrite = hio_write(io, buf, nread);
        if (nwrite < 0) return nwrite;
        cur_offset += nread;
        remaining -= nread;
    }
```
The SSL fallback reads the entire [offset, offset+length) synchronously in a tight loop and calls hio_write repeatedly. This can block the event-loop thread on disk I/O and can enqueue up to length bytes into the write queue (triggering max_write_bufsize overflow / unexpected connection close) which defeats the large-file streaming goal. Consider implementing the SSL path as incremental chunking driven by HV_WRITE (similar to the non-SSL sendfile state machine), or at least stop reading once the write queue reaches a threshold and resume on write-complete events.
Suggested change:
```c
// SSL: fall back to read + hio_write (sendfile cannot bypass SSL encryption).
// To avoid blocking the event-loop thread and overfilling the write queue,
// limit how much data we read and enqueue in a single call.
if (io->io_type == HIO_TYPE_SSL) {
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    /* Cap the amount of data we enqueue in one call to hio_sendfile over SSL.
     * This bounds both disk I/O time and queued bytes, even if 'length' is large. */
    const size_t MAX_SSL_SENDFILE_ENQUEUE = 1024 * 1024; /* 1 MiB per call */
    size_t total_enqueued = 0;
    while (remaining > 0 && total_enqueued < MAX_SSL_SENDFILE_ENQUEUE) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        if (to_read > (MAX_SSL_SENDFILE_ENQUEUE - total_enqueued)) {
            to_read = MAX_SSL_SENDFILE_ENQUEUE - total_enqueued;
        }
        ssize_t nread;
        do {
            nread = pread(in_fd, buf, to_read, cur_offset);
        } while (nread < 0 && errno == EINTR);
        if (nread < 0) {
            if (errno == EAGAIN) {
                /* Caller can retry later; don't treat as fatal here. */
                break;
            }
            hloge("hio_sendfile pread error: %s", strerror(errno));
            return -1;
        }
        if (nread == 0) {
            /* Reached EOF before sending the requested length. */
            hlogw("hio_sendfile: unexpected EOF at offset %lld", (long long)cur_offset);
            break;
        }
        int nwrite = hio_write(io, buf, (size_t)nread);
        if (nwrite < 0) {
            return nwrite;
        }
        cur_offset += nread;
        remaining -= (size_t)nread;
        total_enqueued += (size_t)nread;
    }
    /* If 'remaining' is still > 0, the caller may invoke hio_sendfile again
     * with an updated offset/length to continue sending. */
```
```c
hrecursive_mutex_lock(&io->write_mutex);

io->sendfile_fd = in_fd;
io->sendfile_offset = offset;
io->sendfile_remain = length;
```
hio_sendfile overwrites io->sendfile_fd/offset/remain unconditionally. If a caller invokes hio_sendfile again while a previous sendfile is still active, the in-flight transfer state will be corrupted. Please add a guard that returns an error (e.g., busy) when sendfile_fd >= 0 && sendfile_remain > 0, or explicitly cancel/finish the previous sendfile before starting a new one.
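The guard the review asks for can be sketched as below. The struct is a hypothetical subset of `hio_t` (field names follow the PR), and the `SENDFILE_EBUSY` code is an assumed value, not libhv's actual error convention:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative subset of hio_t's sendfile state. */
typedef struct {
    int    sendfile_fd;
    long   sendfile_offset;
    size_t sendfile_remain;
} sendfile_state_t;

#define SENDFILE_EBUSY (-2) /* assumed error code for "transfer in flight" */

static int sendfile_start(sendfile_state_t* io, int in_fd, long offset, size_t length) {
    /* Refuse to overwrite in-flight state instead of silently corrupting it. */
    if (io->sendfile_fd >= 0 && io->sendfile_remain > 0) {
        return SENDFILE_EBUSY;
    }
    io->sendfile_fd = in_fd;
    io->sendfile_offset = offset;
    io->sendfile_remain = length;
    return 0;
}
```

Under this scheme a second call while a transfer is active fails fast, and the caller retries after write completion instead of clobbering the in-flight offset and remainder.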
```c
if (nsent == 0 && (io->io_type & HIO_TYPE_SOCK_STREAM)) {
    goto disconnect;
```
In the sendfile path, treating nsent == 0 as a stream disconnect is not necessarily correct: sendfile(2) (and the pread+write fallback) can return 0 on EOF, even when the socket is still healthy. This currently jumps to disconnect and closes the connection, and it also prevents any completion notification when EOF occurs. Consider handling nsent == 0 as completion (set sendfile_remain = 0 / clear sendfile_fd) or as an explicit sendfile error (set io->error) rather than a socket disconnect.
Suggested change:
```c
/* Treat nsent == 0 as sendfile completion (EOF on file), not as a socket disconnect. */
if (nsent == 0) {
    io->sendfile_remain = 0;
```
```c
int hio_sendfile(hio_t* io, int in_fd, off_t offset, size_t length) {
    if (io->closed) return -1;
    if (in_fd < 0) return -1;
    if (length == 0) return 0;
    // NOTE: Windows fallback uses read + hio_write.
    // hio_write is non-blocking (queues via IOCP), so this won't block.
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    while (remaining > 0) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        if (_lseeki64(in_fd, cur_offset, SEEK_SET) < 0) return -1;
        ssize_t nread = _read(in_fd, buf, (unsigned int)to_read);
        if (nread < 0) return -1;
        if (nread == 0) break; // EOF
        int nwrite = hio_write(io, buf, nread);
        if (nwrite < 0) return nwrite;
        cur_offset += nread;
        remaining -= nread;
    }
    return 0;
```
The Windows implementation loops until length is fully read and calls hio_write for every 64KB chunk. On IOCP this allocates a new buffer per queued send (see hio_write4), so this can rapidly consume large amounts of memory for big files and can also block the loop thread on synchronous _read calls. This should be reworked to be incremental (one chunk per writable/completion event) with backpressure similar to the Unix implementation, instead of enqueueing the entire file in one call.
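The incremental rework the review suggests can be modeled as a bounded pump: one chunk per writable/completion event, with a high-water mark as backpressure. Everything here is an illustrative stand-in for IOCP state (the queue is a plain byte counter), not libhv code:

```c
#include <assert.h>
#include <stddef.h>

enum { CHUNK = 64 * 1024, QUEUE_HIGH_WATER = 256 * 1024 };

typedef struct {
    size_t file_remain;   /* bytes of the file still unsent */
    size_t queued;        /* bytes currently buffered for the socket */
} pump_t;

/* Called once per writable/completion event: enqueue at most one chunk,
 * and skip enqueueing entirely while the queue is above the threshold. */
static void on_writable(pump_t* p, size_t completed) {
    if (completed > p->queued) completed = p->queued;
    p->queued -= completed;                     /* a send completed */
    if (p->queued >= QUEUE_HIGH_WATER) return;  /* backpressure: wait */
    size_t n = p->file_remain < CHUNK ? p->file_remain : CHUNK;
    p->file_remain -= n;
    p->queued += n;
}

/* Drive the pump to completion, tracking peak queued bytes:
 * memory stays bounded regardless of file size. */
static size_t pump_file(size_t file_size, size_t* max_queued) {
    pump_t p = { file_size, 0 };
    size_t events = 0;
    *max_queued = 0;
    while (p.file_remain > 0 || p.queued > 0) {
        on_writable(&p, CHUNK); /* pretend one chunk finished sending */
        if (p.queued > *max_queued) *max_queued = p.queued;
        events++;
        if (events > 1000000) break; /* safety bound for the sketch */
    }
    return events;
}
```

Pumping a 10 MiB "file" this way never queues more than one chunk at a time, in contrast to the current loop, which allocates a fresh buffer for every 64KB of the file up front.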
Large file serving via HttpHandler::sendFile reads the entire file content through userspace buffers (40KB chunks), causing unnecessary memory pressure and CPU overhead from kernel↔user copies. This adds a hio_sendfile() API that uses OS-level zero-copy mechanisms.

New API
Platform backends

- Linux: sendfile(2) — true zero-copy, data stays in kernel
- macOS/FreeBSD: sendfile(2) with platform-specific signatures
- Generic POSIX: pread() + write(), one 64KB chunk per event loop tick
- SSL: pread() + hio_write() (must pass through encryption layer)
- Windows: _read() + hio_write() fallback
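The platform split above can be sketched with a single chunk-send helper that dispatches at compile time. The function name, chunk size, and the pipe-based round-trip harness are illustrative, not libhv's implementation; on Linux this exercises real sendfile(2) (which accepts a non-socket out_fd on modern kernels), elsewhere the pread()+write() fallback:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#if defined(__linux__)
#include <sys/sendfile.h>
#endif

/* Send at most one 64KB chunk from in_fd starting at *offset. */
static ssize_t send_one_chunk(int out_fd, int in_fd, off_t* offset, size_t remain) {
    size_t chunk = remain < 65536 ? remain : 65536;
#if defined(__linux__)
    /* True zero-copy: the kernel moves the data directly and updates *offset. */
    return sendfile(out_fd, in_fd, offset, chunk);
#else
    /* Generic fallback: one userspace copy per chunk. */
    char buf[65536];
    ssize_t nread = pread(in_fd, buf, chunk, *offset);
    if (nread <= 0) return nread;
    ssize_t nsent = write(out_fd, buf, (size_t)nread);
    if (nsent > 0) *offset += nsent;
    return nsent;
#endif
}

/* Round-trip helper: copy a small payload from a temp file through a pipe. */
static int sendfile_roundtrip(const char* payload, char* out, size_t len) {
    char path[] = "/tmp/hv_sendfile_XXXXXX";
    int in_fd = mkstemp(path);
    if (in_fd < 0) return -1;
    unlink(path);
    if (write(in_fd, payload, len) != (ssize_t)len) return -1;
    int pipefd[2];
    if (pipe(pipefd) != 0) return -1;
    off_t offset = 0;
    size_t remain = len;
    while (remain > 0) {
        ssize_t n = send_one_chunk(pipefd[1], in_fd, &offset, remain);
        if (n <= 0) return -1;
        remain -= (size_t)n;
    }
    close(pipefd[1]);
    ssize_t nread = read(pipefd[0], out, len);
    close(pipefd[0]);
    close(in_fd);
    return nread == (ssize_t)len ? 0 : -1;
}
```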
Event loop integration

- Adds sendfile_fd, sendfile_offset, sendfile_remain to hio_t
- HV_WRITE event handling in nio_write
- hio_write_bufsize() includes sendfile_remain so hio_write_is_complete() works correctly
- hio_handle_events keeps HV_WRITE registered while sendfile is active
- Cleanup in hio_close

HTTP handler integration
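The bufsize accounting can be illustrated with a small mock: pending bytes are the queued write buffers plus the unsent sendfile remainder, so a single completeness check covers both paths. The struct and function names below are hypothetical stand-ins for hio_t / hio_write_bufsize(), with field names following the PR:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t write_bufsize;    /* bytes queued in write_queue */
    size_t sendfile_remain;  /* bytes sendfile has yet to send */
} mock_io_t;

/* Total pending output: queued buffers plus unsent sendfile bytes. */
static size_t mock_write_bufsize(const mock_io_t* io) {
    return io->write_bufsize + io->sendfile_remain;
}

/* "Write complete" only when both the queue and sendfile are drained. */
static int mock_write_is_complete(const mock_io_t* io) {
    return mock_write_bufsize(io) == 0;
}
```

Without the sendfile_remain term, a consumer polling for completion would see an empty write queue and conclude the transfer finished while the kernel still had file bytes to push.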
defaultLargeFileHandler now uses hio_sendfile() when limit_rate < 0 (unlimited). Rate-limited transfers (limit_rate > 0) retain the existing timer-based chunked approach since sendfile has no built-in throttling.