Add bounds checking for output tensor buffer in wasi-nn llama.cpp#4847

Open
sumleo wants to merge 2 commits into bytecodealliance:main from sumleo:fix/wasi-nn-output-bounds-check

@sumleo (Contributor) commented Feb 25, 2026

Summary

The get_output function in the wasi-nn llama.cpp backend copies data into output_tensor->buf without checking against output_tensor->size. This can lead to out-of-bounds writes when the model generates output longer than the caller-provided buffer.

Two vulnerable paths were identified:

  1. Metadata path (index == 1): memcpy(output_tensor->buf, output_metadata, strlen(output_metadata)) copies up to 127 bytes of metadata JSON with no size check against the destination buffer.

  2. Token output loop (index == 0): memcpy(output_tensor->buf + end_pos, buf, strlen(buf)) accumulates token pieces in a loop with no bounds checking, allowing unbounded writes past the buffer.

Fix

  • For the metadata path: clamp the copy length to output_tensor->size before calling memcpy.
  • For the token output loop: check whether appending the next token piece would exceed output_tensor->size, and break out of the loop if so.

Both fixes use the existing output_tensor->size field from the tensor_data struct defined in wasi_nn_types.h.
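
The two checks described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `tensor_data_sketch`, `copy_metadata_clamped`, and `append_pieces_bounded` are hypothetical names, the field types are assumed, and the token pieces are passed in as plain strings instead of coming from the model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for tensor_data: only the two fields named in the
 * PR description are modeled, and their types are an assumption. */
typedef struct {
    uint8_t *buf;
    uint32_t size;
} tensor_data_sketch;

/* Metadata path: clamp the copy length to the destination capacity.
 * Returns the number of bytes actually copied. */
static uint32_t
copy_metadata_clamped(tensor_data_sketch *out, const char *metadata)
{
    assert(out != NULL && metadata != NULL);
    size_t len = strlen(metadata);
    if (len > out->size)
        len = out->size;
    memcpy(out->buf, metadata, len);
    return (uint32_t)len;
}

/* Token loop: stop before the first piece that would not fit.
 * Returns the number of bytes written. */
static uint32_t
append_pieces_bounded(tensor_data_sketch *out, const char *const *pieces,
                      size_t n_pieces)
{
    assert(out != NULL && pieces != NULL);
    uint32_t end_pos = 0;
    for (size_t i = 0; i < n_pieces; i++) {
        size_t piece_len = strlen(pieces[i]);
        /* Compare against the remaining space so the check itself
         * cannot overflow. */
        if (piece_len > (size_t)(out->size - end_pos))
            break;
        memcpy(out->buf + end_pos, pieces[i], piece_len);
        end_pos += (uint32_t)piece_len;
    }
    return end_pos;
}
```

Comparing the piece length against the remaining space, instead of computing `end_pos + piece_len` first, keeps the bounds check safe even when the sum would wrap around.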

Test plan

  • Verify that callers providing a sufficiently large buffer see no behavior change.
  • Verify that callers providing a small buffer receive truncated output without memory corruption.
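
One possible way to check the "without memory corruption" part of the plan is to place guard bytes directly after the caller-visible capacity and confirm they are unchanged after the call. The helpers below are a hypothetical sketch (`guard_arm`, `guard_intact`, `GUARD_LEN`, and `GUARD_BYTE` are not part of the repository), and running the real test under a sanitizer would be an alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUARD_LEN 16   /* number of sentinel bytes after the buffer */
#define GUARD_BYTE 0xA5 /* arbitrary, easily recognizable pattern */

/* Fill the region right after the usable capacity with the sentinel.
 * The storage must be at least capacity + GUARD_LEN bytes long. */
static void
guard_arm(uint8_t *storage, size_t capacity)
{
    assert(storage != NULL);
    memset(storage + capacity, GUARD_BYTE, GUARD_LEN);
}

/* Return true if no sentinel byte was overwritten. */
static bool
guard_intact(const uint8_t *storage, size_t capacity)
{
    assert(storage != NULL);
    for (size_t i = 0; i < GUARD_LEN; i++) {
        if (storage[capacity + i] != GUARD_BYTE)
            return false;
    }
    return true;
}
```

A test would allocate `capacity + GUARD_LEN` bytes, call `guard_arm`, pass only `capacity` as the tensor size to the function under test, and then assert `guard_intact`.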

The get_output function copies LLM output into output_tensor->buf
without checking against output_tensor->size, allowing writes
past the buffer when the model generates output longer than the
caller-provided buffer. Add size checks for both the metadata
path and the token output loop.

@lum1n0us (Contributor) left a comment

For me, this introduces an inconsistency: callers cannot distinguish successful completion from truncated output.

It might be better to follow the OpenVINO backend's handling and return an error loudly.

Instead of silently truncating output when the buffer is too small,
return the too_large error with a diagnostic message. This makes the
behavior consistent with the OpenVINO backend's get_output and allows
callers to distinguish between successful completion and insufficient
buffer size.
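
The error-returning variant described in this commit message could look roughly like the sketch below. It is an assumption-laden illustration, not the committed code: `sketch_error`, `tensor_data_sketch`, and `get_output_strict` are hypothetical names, the real backend uses its own error enum and logging, and the token pieces are passed in as plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical error codes standing in for the backend's error enum. */
typedef enum { sketch_success = 0, sketch_too_large } sketch_error;

/* Hypothetical stand-in for tensor_data; field types are assumed. */
typedef struct {
    uint8_t *buf;
    uint32_t size;
} tensor_data_sketch;

/* Copy all pieces or fail: on an undersized buffer, report the problem
 * and return an error instead of silently truncating. */
static sketch_error
get_output_strict(tensor_data_sketch *out, const char *const *pieces,
                  size_t n_pieces, uint32_t *out_size)
{
    assert(out != NULL && pieces != NULL && out_size != NULL);
    uint32_t end_pos = 0;
    for (size_t i = 0; i < n_pieces; i++) {
        size_t piece_len = strlen(pieces[i]);
        if (piece_len > (size_t)(out->size - end_pos)) {
            fprintf(stderr, "output buffer too small: capacity %u bytes\n",
                    (unsigned)out->size);
            return sketch_too_large; /* *out_size is left untouched */
        }
        memcpy(out->buf + end_pos, pieces[i], piece_len);
        end_pos += (uint32_t)piece_len;
    }
    *out_size = end_pos;
    return sketch_success;
}
```

With this shape, a caller that receives the error knows the buffer contents are incomplete and can retry with a larger buffer, which is the distinction the review comment asks for.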