Add hio_sendfile() API for zero-copy file transfer using sendfile(2) #821

Open
Copilot wants to merge 3 commits into master from copilot/implement-sendfile-zero-copy

Copilot AI commented Mar 23, 2026

Large file serving via HttpHandler::sendFile reads the entire file through userspace buffers (40KB chunks), which causes unnecessary memory pressure and CPU overhead from kernel↔user copies. This PR adds a hio_sendfile() API that uses OS-level zero-copy mechanisms.

New API

// event/hloop.h
HV_EXPORT int hio_sendfile(hio_t* io, int in_fd, off_t offset, size_t length);

Platform backends

  • Linux: sendfile(2) — true zero-copy, data stays in kernel
  • macOS/FreeBSD: sendfile(2) with platform-specific signatures
  • Generic Unix fallback: pread() + write(), one 64KB chunk per event loop tick
  • SSL: pread() + hio_write() (must pass through encryption layer)
  • Windows IOCP: _read() + hio_write() fallback
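
To make the Linux and generic-Unix rows concrete, here is a minimal standalone sketch of a single "send one chunk" step. It is not the code in nio.c: the helper name `send_chunk` is hypothetical, and the macOS/FreeBSD sendfile signatures are deliberately left out, so those platforms take the pread() + write() branch here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/sendfile.h>
#endif

/* Hypothetical helper, not libhv code: send up to `count` bytes of in_fd,
 * starting at *offset, to out_fd. Advances *offset by the bytes sent.
 * Returns bytes sent, 0 at end of file, or -1 with errno set. */
static ssize_t send_chunk(int out_fd, int in_fd, off_t *offset, size_t count) {
#ifdef __linux__
    /* Linux: the kernel moves the data itself and updates *offset. */
    return sendfile(out_fd, in_fd, offset, count);
#else
    /* Generic fallback: one bounded pread() + write() round trip. */
    char buf[65536];
    if (count > sizeof(buf)) count = sizeof(buf);
    ssize_t nread = pread(in_fd, buf, count, *offset);
    if (nread <= 0) return nread;
    ssize_t nwrite = write(out_fd, buf, (size_t)nread);
    if (nwrite > 0) *offset += nwrite; /* partial write: the rest is re-read next call */
    return nwrite;
#endif
}
```

Because the offset only advances by what was actually written, a caller can invoke the helper again on the next writable event without tracking a separate buffer.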

Event loop integration

  • Added sendfile_fd, sendfile_offset, sendfile_remain to hio_t
  • Sendfile waits for write queue to drain, then takes over HV_WRITE event handling in nio_write
  • hio_write_bufsize() includes sendfile_remain so hio_write_is_complete() works correctly
  • hio_handle_events keeps HV_WRITE registered while sendfile is active
  • State cleaned up on hio_close
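
The accounting described in this list can be sketched with a simplified stand-in for hio_t. The struct and function names below are assumptions for illustration only; just the field names sendfile_fd and sendfile_remain come from this PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the relevant hio_t fields (not the real struct). */
typedef struct {
    size_t write_queue_bytes; /* bytes still queued by ordinary writes */
    int    sendfile_fd;       /* -1 when no sendfile is active */
    size_t sendfile_remain;   /* file bytes not yet handed to the socket */
} io_state_t;

/* Pending output counts both queued buffers and unsent file bytes. */
static size_t pending_bytes(const io_state_t *io) {
    return io->write_queue_bytes + io->sendfile_remain;
}

/* "Write complete" therefore also requires the sendfile to be finished. */
static bool write_is_complete(const io_state_t *io) {
    return pending_bytes(io) == 0;
}

/* Sendfile only takes over once the write queue has drained. */
static bool sendfile_may_run(const io_state_t *io) {
    return io->sendfile_fd >= 0 && io->sendfile_remain > 0 &&
           io->write_queue_bytes == 0;
}
```

Folding sendfile_remain into the pending count is what lets existing callers of a write-complete check keep working without knowing that a sendfile is in flight.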

HTTP handler integration

defaultLargeFileHandler now uses hio_sendfile() when limit_rate < 0 (unlimited):

} else if (service->limit_rate < 0 && file->fp && fileno(file->fp) >= 0) {
    int filefd = fileno(file->fp);
    size_t length = resp->content_length;
    writer->EndHeaders();
    writer->onwrite = [this](HBuf* buf) {
        if (writer->isWriteComplete()) {
            resp->content_length = 0;
            writer->End();
            closeFile();
        }
    };
    hio_sendfile(io, filefd, 0, length);
    return HTTP_STATUS_UNFINISHED;
}

Rate-limited transfers (limit_rate > 0) retain the existing timer-based chunked approach since sendfile has no built-in throttling.

Original prompt

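This section details the original issue you should resolve

<issue_title>[Question] Large file transfer</issue_title>
<issue_description>Hello, are there any plans to implement zero-copy transfer with sendfile?
HttpHandler::sendFile (which uses read -> WriteBody) still reads the whole file in user space, so memory consumption is fairly high.
I currently implement large file transfer with chunked streaming. Chunked transfer also copies through user space, but it does solve the memory consumption problem.

The following content was generated by AI and is for reference only:

1. Expected behavior:
Use sendfile(2) (Linux) / TransmitFile (Windows) so that data moves in kernel space directly from the file page cache to the socket buffer.

2. Implementation suggestions:
For the defaultLargeFileHandler scenario:
- Send the HTTP response headers normally through hio_write() (headers are small, so the user-space copy does not matter)
- After the headers are flushed, transfer the body with sendfile(fd_socket, fd_file, &offset, count)
- sendfile may not send everything in one call, so register for EPOLLOUT and continue when the socket becomes writable
- Call writer->End() to clean up once the transfer completes

3. Issues to consider:
- Socket fd required: sendfile(2) needs the socket file descriptor. hio_fd() can provide it, but this bypasses the event loop's write buffer management
- Write buffer conflicts: hio_write() uses an internal write queue; sendfile must not start until the headers are fully flushed, to avoid interleaved data
- SSL/TLS incompatibility: sendfile(2) cannot be used with SSL connections (data must pass through the SSL encryption layer), so this optimization applies only to plaintext HTTP
- Rate limiting: the current limit_rate feature is based on timer-driven chunked reads; sendfile would need TCP_CORK / TCP_NODELAY or OS-level flow control instead
- Cross-platform: Linux sendfile(2), macOS sendfile(2) (different parameters), and Windows TransmitFile need a platform abstraction layer

4. Minimum viable approach:
- If a full implementation is too complex, consider exposing a hook that lets the user take over body transfer after the headers are sent
// After the headers are sent, let the user operate on the socket fd directly to transfer the file
writer->SendFileBody(filepath, offset, length);
- Or provide a way to get the underlying socket fd so that users can implement sendfile themselves.

5. References:
- nginx enables sendfile by default, improving static file throughput by 30-50%
- Go net/http automatically uses sendfile through io.Copy + *os.File
- In addition, io_uring IORING_OP_SPLICE / IORING_OP_SEND_ZC on Linux 5.6+ are also potential zero-copy options

</issue_description>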

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 2 commits March 23, 2026 03:26
Implements:
- Platform-abstracted sendfile in nio.c (Linux sendfile, macOS/FreeBSD sendfile, generic pread+write fallback)
- hio_sendfile() public API in hloop.h with write queue integration
- Sendfile state tracking in hio_t (sendfile_fd, sendfile_offset, sendfile_remain)
- SSL fallback using pread + hio_write
- Windows fallback in overlapio.c
- Integration into HTTP large file handler for unlimited rate mode
- hio_write_bufsize accounts for sendfile remaining bytes

Co-authored-by: ithewei <26049660+ithewei@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ithewei/libhv/sessions/e59504ed-e745-410d-a8ab-e99946d80a3c
…mprove error messages

- Add in_fd validation in hio_sendfile() for both nio.c and overlapio.c
- Fix generic sendfile fallback to handle partial writes correctly
- Check sendfile_remain under mutex before unlocking in nio_write
- Improve SSL fallback with better EOF/error distinction
- Add fileno() validation in HTTP handler
- Improve Windows fallback with separate seek/read error handling

Co-authored-by: ithewei <26049660+ithewei@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ithewei/libhv/sessions/e59504ed-e745-410d-a8ab-e99946d80a3c
Copilot AI changed the title [WIP] Add support for zero-copy file transfer using sendfile Add hio_sendfile() API for zero-copy file transfer using sendfile(2) Mar 23, 2026
Copilot AI requested a review from ithewei March 23, 2026 03:37
@ithewei ithewei marked this pull request as ready for review March 24, 2026 07:13
Copilot AI review requested due to automatic review settings March 24, 2026 07:13
Copilot AI left a comment

Pull request overview

This PR introduces a new hio_sendfile() API to support zero-copy (or near zero-copy) file transfer at the event-loop I/O layer, and wires it into the HTTP server’s large-file response path to avoid userspace buffering for unlimited-rate transfers.

Changes:

  • Add hio_sendfile(hio_t*, int in_fd, off_t offset, size_t length) public API and track sendfile state in hio_t (sendfile_fd/offset/remain).
  • Integrate sendfile into the non-IOCP (nio.c) write path and adjust write-completion accounting via hio_write_bufsize() including sendfile_remain.
  • Update HttpHandler::defaultLargeFileHandler to use hio_sendfile() when limit_rate < 0.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| http/server/HttpHandler.cpp | Uses hio_sendfile() for unlimited large-file responses and ends the response on write completion. |
| event/nio.c | Implements sendfile state machine for non-IOCP backends and integrates it into HV_WRITE handling. |
| event/overlapio.c | Adds a Windows (IOCP) hio_sendfile() fallback implemented via read + hio_write(). |
| event/hloop.h | Declares the new hio_sendfile() API and documents expected behavior. |
| event/hevent.h | Extends hio_t with sendfile-related state fields. |
| event/hevent.c | Initializes sendfile state and updates hio_write_bufsize() to include pending sendfile bytes. |

// @param offset: starting offset in the file
// @param length: number of bytes to send
// @return 0 on success (async operation started), -1 on error
// hwrite_cb is called as data is sent. When complete, write_queue is empty.

Copilot AI Mar 24, 2026

The hio_sendfile doc comment is misleading/incomplete for consumers of hwrite_cb: sendfile bytes are not part of the write queue, so completion is not just “write_queue is empty”; it also requires sendfile_remain == 0. Also, the buf pointer passed to hwrite_cb can be NULL for sendfile writes (__write_cb(io, NULL, nsent)), so callbacks must not assume buf != NULL. Please clarify these semantics in the API comment.

Suggested change
// hwrite_cb is called as data is sent. For sendfile transfers, the bytes being
// sent are not queued in write_queue; completion is when write_queue is empty
// AND the internal sendfile_remain counter has reached 0.
// NOTE: For sendfile writes, hwrite_cb may be invoked with buf == NULL
// (e.g. __write_cb(io, NULL, nsent)); callbacks must not assume buf != NULL
// and should rely on the length argument (e.g. nwrite) rather than buf contents.
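
As an illustration of the callback contract described in this comment, the sketch below shows a write callback that tolerates buf == NULL and relies only on the byte count. The stats struct and the context parameter are hypothetical stand-ins; a real libhv callback would receive an hio_t* instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-connection counters kept by the application. */
typedef struct {
    size_t bytes_sent;      /* total bytes reported by write callbacks */
    size_t sendfile_events; /* callbacks that arrived with buf == NULL */
} send_stats_t;

/* Shaped like a write callback: (context, buf, nwrite). The context type is a
 * stand-in for illustration, not the libhv signature. */
static void on_write(send_stats_t *stats, const void *buf, int nwrite) {
    if (nwrite <= 0) return;
    stats->bytes_sent += (size_t)nwrite;
    if (buf == NULL) {
        /* Sendfile progress: there is no buffer to inspect, only a length. */
        stats->sendfile_events++;
    }
}
```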

Comment on lines +746 to +749
hrecursive_mutex_unlock(&io->write_mutex);
__write_cb(io, NULL, nsent);
if (complete) {
    io->sendfile_fd = -1;

Copilot AI Mar 24, 2026

io->sendfile_fd = -1; is written after releasing write_mutex (both in nio_write completion handling and in the immediate-send path here). Since hio_sendfile/write code is otherwise synchronized via write_mutex, updating sendfile state outside the lock introduces a data race with other threads calling hio_sendfile/hio_write_bufsize/event logic. Consider setting/clearing sendfile_fd (and any related fields) while holding write_mutex, or making these fields atomic if they must be touched lock-free.

Suggested change
if (complete) {
    io->sendfile_fd = -1;
}
hrecursive_mutex_unlock(&io->write_mutex);
__write_cb(io, NULL, nsent);
if (complete) {
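
The ordering this comment asks for (update shared state under the lock, then call user code after unlocking) can be sketched in isolation. The example uses a plain pthread mutex rather than libhv's hrecursive_mutex, and the names `io_state_t`, `progress_cb`, and `finish_step` are illustrative assumptions, not the PR's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Simplified stand-in for the sendfile fields guarded by write_mutex. */
typedef struct {
    pthread_mutex_t write_mutex;
    int    sendfile_fd;
    size_t sendfile_remain;
} io_state_t;

typedef void (*progress_cb)(io_state_t *io, size_t nsent);

/* Record that nsent bytes went out. Returns 1 when the transfer finished. */
static int finish_step(io_state_t *io, size_t nsent, progress_cb cb) {
    pthread_mutex_lock(&io->write_mutex);
    if (nsent > io->sendfile_remain) nsent = io->sendfile_remain;
    io->sendfile_remain -= nsent;
    int complete = (io->sendfile_remain == 0);
    if (complete) io->sendfile_fd = -1; /* cleared while still locked */
    pthread_mutex_unlock(&io->write_mutex);

    if (cb) cb(io, nsent);              /* user code runs without the lock */
    return complete;
}
```

Computing `complete` into a local before unlocking means the code after the unlock never has to re-read shared fields.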

            closeFile();
        }
    };
    hio_sendfile(io, filefd, 0, length);

Copilot AI Mar 24, 2026

hio_sendfile(io, ...) return value is ignored. If it fails (e.g., io already closed / invalid fd), this handler still returns HTTP_STATUS_UNFINISHED, leaves the file open, and relies on onwrite that may never fire to call closeFile()/writer->End(). Handle hio_sendfile errors (set an appropriate HTTP error, closeFile(), and end/close the writer) before returning.

Suggested change
int rv = hio_sendfile(io, filefd, 0, length);
if (rv != 0) {
    // sendfile failed synchronously: clean up and report error
    resp->status_code = HTTP_STATUS_INTERNAL_SERVER_ERROR;
    resp->content_length = 0;
    closeFile();
    writer->End();
    return resp->status_code;
}

Comment on lines +699 to +721
// SSL: fall back to read + hio_write (sendfile cannot bypass SSL encryption)
// NOTE: hio_write is non-blocking and queues data internally,
// so this won't block the event loop even for large files.
if (io->io_type == HIO_TYPE_SSL) {
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    while (remaining > 0) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        ssize_t nread = pread(in_fd, buf, to_read, cur_offset);
        if (nread < 0) {
            hloge("hio_sendfile pread error: %s", strerror(errno));
            return -1;
        }
        if (nread == 0) {
            hlogw("hio_sendfile: unexpected EOF at offset %lld", (long long)cur_offset);
            break;
        }
        int nwrite = hio_write(io, buf, nread);
        if (nwrite < 0) return nwrite;
        cur_offset += nread;
        remaining -= nread;
    }

Copilot AI Mar 24, 2026

The SSL fallback reads the entire [offset, offset+length) synchronously in a tight loop and calls hio_write repeatedly. This can block the event-loop thread on disk I/O and can enqueue up to length bytes into the write queue (triggering max_write_bufsize overflow / unexpected connection close) which defeats the large-file streaming goal. Consider implementing the SSL path as incremental chunking driven by HV_WRITE (similar to the non-SSL sendfile state machine), or at least stop reading once the write queue reaches a threshold and resume on write-complete events.

Suggested change
// SSL: fall back to read + hio_write (sendfile cannot bypass SSL encryption).
// To avoid blocking the event-loop thread and overfilling the write queue,
// limit how much data we read and enqueue in a single call.
if (io->io_type == HIO_TYPE_SSL) {
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    /* Cap the amount of data we enqueue in one call to hio_sendfile over SSL.
     * This bounds both disk I/O time and queued bytes, even if 'length' is large. */
    const size_t MAX_SSL_SENDFILE_ENQUEUE = 1024 * 1024; /* 1 MiB per call */
    size_t total_enqueued = 0;
    while (remaining > 0 && total_enqueued < MAX_SSL_SENDFILE_ENQUEUE) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        if (to_read > (MAX_SSL_SENDFILE_ENQUEUE - total_enqueued)) {
            to_read = MAX_SSL_SENDFILE_ENQUEUE - total_enqueued;
        }
        ssize_t nread;
        do {
            nread = pread(in_fd, buf, to_read, cur_offset);
        } while (nread < 0 && errno == EINTR);
        if (nread < 0) {
            if (errno == EAGAIN) {
                /* Caller can retry later; don't treat as fatal here. */
                break;
            }
            hloge("hio_sendfile pread error: %s", strerror(errno));
            return -1;
        }
        if (nread == 0) {
            /* Reached EOF before sending the requested length. */
            hlogw("hio_sendfile: unexpected EOF at offset %lld", (long long)cur_offset);
            break;
        }
        int nwrite = hio_write(io, buf, (size_t)nread);
        if (nwrite < 0) {
            return nwrite;
        }
        cur_offset += nread;
        remaining -= (size_t)nread;
        total_enqueued += (size_t)nread;
    }
    /* If 'remaining' is still > 0, the caller may invoke hio_sendfile again
     * with an updated offset/length to continue sending. */
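
The incremental alternative mentioned in the comment (one chunk per writable event, paused while the queue is full) could look roughly like the following. This is a standalone sketch under stated assumptions: `file_source_t`, `next_chunk`, and the `high_water` threshold are hypothetical names, and the caller is assumed to pass in the current queued byte count and to enqueue the returned bytes itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical read cursor over the file region still to be sent. */
typedef struct {
    int    fd;
    off_t  offset;
    size_t remain;
} file_source_t;

/* Read at most one chunk, and only while `queued` is below `high_water`.
 * Returns bytes placed in buf, 0 when paused or finished, -1 on read error. */
static ssize_t next_chunk(file_source_t *src, size_t queued, size_t high_water,
                          char *buf, size_t cap) {
    if (src->remain == 0 || queued >= high_water) return 0;
    size_t want = src->remain < cap ? src->remain : cap;
    ssize_t nread = pread(src->fd, buf, want, src->offset);
    if (nread < 0) return -1;
    if (nread == 0) {          /* file is shorter than the requested length */
        src->remain = 0;
        return 0;
    }
    src->offset += nread;
    src->remain -= (size_t)nread;
    return nread;
}
```

Calling this once per write-complete event keeps at most roughly `high_water` plus one chunk in the queue, regardless of file size.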

Comment on lines +725 to +730
hrecursive_mutex_lock(&io->write_mutex);

io->sendfile_fd = in_fd;
io->sendfile_offset = offset;
io->sendfile_remain = length;

Copilot AI Mar 24, 2026

hio_sendfile overwrites io->sendfile_fd/offset/remain unconditionally. If a caller invokes hio_sendfile again while a previous sendfile is still active, the in-flight transfer state will be corrupted. Please add a guard that returns an error (e.g., busy) when sendfile_fd >= 0 && sendfile_remain > 0, or explicitly cancel/finish the previous sendfile before starting a new one.
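
A guard of the kind suggested here might look like the sketch below. It is written against a simplified stand-in struct rather than hio_t, the function name `begin_sendfile` is hypothetical, and the choice of EBUSY as the error code is an assumption; the PR does not specify one.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Simplified stand-in for the sendfile fields added to hio_t. */
typedef struct {
    int    sendfile_fd;     /* -1 when idle */
    off_t  sendfile_offset;
    size_t sendfile_remain;
} io_state_t;

/* Start a transfer only if none is in flight. Returns 0 on success, -1 on error. */
static int begin_sendfile(io_state_t *io, int in_fd, off_t offset, size_t length) {
    if (in_fd < 0) { errno = EBADF; return -1; }
    if (length == 0) return 0; /* nothing to send, state untouched */
    if (io->sendfile_fd >= 0 && io->sendfile_remain > 0) {
        errno = EBUSY;         /* refuse to clobber the active transfer */
        return -1;
    }
    io->sendfile_fd = in_fd;
    io->sendfile_offset = offset;
    io->sendfile_remain = length;
    return 0;
}
```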

Comment on lines +490 to +491
if (nsent == 0 && (io->io_type & HIO_TYPE_SOCK_STREAM)) {
    goto disconnect;

Copilot AI Mar 24, 2026

In the sendfile path, treating nsent == 0 as a stream disconnect is not necessarily correct: sendfile(2) (and the pread+write fallback) can return 0 on EOF, even when the socket is still healthy. This currently jumps to disconnect and closes the connection, and it also prevents any completion notification when EOF occurs. Consider handling nsent == 0 as completion (set sendfile_remain = 0 / clear sendfile_fd) or as an explicit sendfile error (set io->error) rather than a socket disconnect.

Suggested change
/* Treat nsent == 0 as sendfile completion (EOF on file), not as a socket disconnect. */
if (nsent == 0) {
    io->sendfile_remain = 0;
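
One way to keep the four outcomes of a sendfile step apart is a small classifier like the one below. The enum and function names are hypothetical; the sketch only encodes the distinction argued for in this comment, namely that a zero return means the file side ran out while a negative return needs errno to tell "try again" from a real failure.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

typedef enum {
    STEP_PROGRESS, /* some bytes were sent */
    STEP_FILE_EOF, /* nothing left to read from the file */
    STEP_RETRY,    /* socket not writable yet, or interrupted */
    STEP_ERROR     /* genuine failure, e.g. peer went away */
} step_result_t;

/* Classify the return value of one send step together with its errno. */
static step_result_t classify_step(ssize_t nsent, int err) {
    if (nsent > 0) return STEP_PROGRESS;
    if (nsent == 0) return STEP_FILE_EOF;
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) return STEP_RETRY;
    return STEP_ERROR;
}
```

With this split, only STEP_ERROR would lead to the disconnect path, while STEP_FILE_EOF can be routed to completion or to an explicit sendfile error, whichever the API decides to promise.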

Comment on lines +390 to +410
int hio_sendfile (hio_t* io, int in_fd, off_t offset, size_t length) {
    if (io->closed) return -1;
    if (in_fd < 0) return -1;
    if (length == 0) return 0;
    // NOTE: Windows fallback uses read + hio_write.
    // hio_write is non-blocking (queues via IOCP), so this won't block.
    char buf[65536];
    off_t cur_offset = offset;
    size_t remaining = length;
    while (remaining > 0) {
        size_t to_read = remaining < sizeof(buf) ? remaining : sizeof(buf);
        if (_lseeki64(in_fd, cur_offset, SEEK_SET) < 0) return -1;
        ssize_t nread = _read(in_fd, buf, (unsigned int)to_read);
        if (nread < 0) return -1;
        if (nread == 0) break; // EOF
        int nwrite = hio_write(io, buf, nread);
        if (nwrite < 0) return nwrite;
        cur_offset += nread;
        remaining -= nread;
    }
    return 0;

Copilot AI Mar 24, 2026

The Windows implementation loops until length is fully read and calls hio_write for every 64KB chunk. On IOCP this allocates a new buffer per queued send (see hio_write4), so this can rapidly consume large amounts of memory for big files and can also block the loop thread on synchronous _read calls. This should be reworked to be incremental (one chunk per writable/completion event) with backpressure similar to the Unix implementation, instead of enqueueing the entire file in one call.

Development

Successfully merging this pull request may close these issues.

[Question] Large file transfer

3 participants