
Bug: ndarray_from_cframe(copy=True) crashes with munmap_chunk(): invalid pointer when cframe contains a SPECIAL chunk before a NORMAL chunk #611

@Karol-G

Environment

OS: Linux 5.15.0
python-blosc2: 4.1.2
C-Blosc2: 2.23.1 (2026-03-03)
Python: 3.x (conda)
NumPy: 2.x

Summary

Calling blosc2.ndarray_from_cframe(cframe, copy=True) aborts the process with a native heap corruption error whenever the cframe contains a SPECIAL (zero-run-length) chunk that is followed by at least one NORMAL chunk.

munmap_chunk(): invalid pointer
Aborted (core dumped)

Minimal Reproduction

```python
import blosc2
import numpy as np

# 14-element array: first 7 elements are zero (-> SPECIAL chunk),
# last 7 elements are nonzero (-> NORMAL chunk).
arr = np.zeros(14, dtype=np.float32)
arr[7:] = 1.0

b2arr = blosc2.asarray(arr, chunks=(7,), blocks=(3,))
cframe = b2arr.to_cframe()

# Crashes: munmap_chunk(): invalid pointer / Aborted (core dumped)
blosc2.ndarray_from_cframe(cframe, copy=True)
```
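Because the failure is a native abort rather than a Python exception, a test driver that sweeps many cases would be killed along with it. A minimal sketch, assuming a POSIX system, runs each snippet in a fresh interpreter and reports the exit code instead. The helper name `run_isolated` is hypothetical and not part of blosc2.

```python
import subprocess
import sys


def run_isolated(snippet: str, timeout: float = 60.0) -> int:
    """Run a Python snippet in a fresh interpreter and return its exit code.

    On POSIX, a child killed by a signal yields a negative return code
    (e.g. -6 for SIGABRT), so an abort such as glibc's
    "munmap_chunk(): invalid pointer" is reported instead of killing
    the calling process.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,  # keep the child's stderr out of our output
        timeout=timeout,
    )
    return result.returncode
```

Passing the reproduction above as a string should return a negative code (SIGABRT, matching "Aborted (core dumped)") on affected versions, and 0 once the bug is fixed.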

Observed vs. Expected Behaviour

Expected: ndarray_from_cframe returns a valid NDArray containing the original data.
Observed: the process aborts immediately with munmap_chunk(): invalid pointer.

Trigger Condition

The crash depends solely on the ordering of SPECIAL and NORMAL chunks in the cframe. A chunk is SPECIAL (zero-run-length encoded) when all its elements are zero.
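The chunk classification described above can be mirrored in pure Python to predict which chunks of a given array will be SPECIAL. This is only a sketch of the stated rule (a chunk is SPECIAL when all its elements are zero); the real decision is made inside C-Blosc2 and this helper does not inspect actual chunk headers. The function name `classify_chunks` is hypothetical.

```python
from typing import List, Sequence


def classify_chunks(values: Sequence[float], chunk_len: int) -> List[str]:
    """Label each chunk of a flat array as SPECIAL (all zeros) or NORMAL.

    A trailing partial chunk is classified on the elements it contains.
    """
    if chunk_len <= 0:
        raise ValueError("chunk_len must be positive")
    kinds = []
    for start in range(0, len(values), chunk_len):
        chunk = values[start:start + chunk_len]
        kinds.append("SPECIAL" if all(v == 0 for v in chunk) else "NORMAL")
    return kinds


# The 14-element array from the reproduction, with chunks=(7,)
arr = [0.0] * 7 + [1.0] * 7
print(classify_chunks(arr, 7))  # ['SPECIAL', 'NORMAL']
```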

| Chunk sequence | Result |
|---|---|
| [NORMAL, NORMAL] | OK |
| [SPECIAL, SPECIAL] | OK |
| [NORMAL, NORMAL, SPECIAL] | OK (SPECIAL chunks only at the tail) |
| [SPECIAL, NORMAL] | CRASH |
| [NORMAL, SPECIAL, NORMAL] | CRASH |
| [SPECIAL, SPECIAL, NORMAL] | CRASH |

Rule: crash occurs when any SPECIAL chunk has at least one NORMAL chunk after it.
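The rule can be written as a small predicate and checked against every row of the table above. This is a sketch of the empirical rule only; it says nothing about the underlying C bug. The names `triggers_crash` and `OBSERVED` are hypothetical.

```python
from typing import Sequence


def triggers_crash(kinds: Sequence[str]) -> bool:
    """Return True if any SPECIAL chunk is followed, anywhere later, by a NORMAL chunk."""
    seen_special = False
    for kind in kinds:
        if kind == "SPECIAL":
            seen_special = True
        elif kind == "NORMAL" and seen_special:
            return True
    return False


# Rows of the table: (chunk sequence, crash reported?)
OBSERVED = [
    (["NORMAL", "NORMAL"], False),
    (["SPECIAL", "SPECIAL"], False),
    (["NORMAL", "NORMAL", "SPECIAL"], False),
    (["SPECIAL", "NORMAL"], True),
    (["NORMAL", "SPECIAL", "NORMAL"], True),
    (["SPECIAL", "SPECIAL", "NORMAL"], True),
]

for kinds, crashed in OBSERVED:
    assert triggers_crash(kinds) == crashed, kinds
print("rule matches all rows")
```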

Additional Observations

  • The bug is codec- and filter-independent: reproduces with BLOSCLZ, LZ4, LZ4HC, ZLIB, ZSTD and with NOFILTER, SHUFFLE, BITSHUFFLE.
  • ndarray_from_cframe(cframe, copy=False) does not crash (but the cframe bytes object must outlive the returned NDArray).
  • blosc2.open() on a file-backed NDArray does not crash.
  • The crash is in native code (blosc2_schunk_from_buffer), not in Python.
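Since copy=False does not crash but borrows the cframe bytes, one way to use it safely is to keep the bytes object and the array together in one holder. A minimal sketch, assuming that holding a reference to the Python `bytes` object is enough to keep the borrowed memory valid (CPython does not move or mutate `bytes` objects). `PinnedArray` and `load_pinned` are hypothetical names; the loader is passed in (normally `blosc2.ndarray_from_cframe`) so the sketch runs without blosc2 installed.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PinnedArray:
    """Pairs an array with the buffer it borrows from so both live together."""
    array: Any
    buffer: bytes


def load_pinned(cframe: bytes, loader: Callable[..., Any]) -> PinnedArray:
    # loader is expected to be blosc2.ndarray_from_cframe; copy=False avoids
    # the crashing copy=True path but makes the result borrow `cframe`.
    return PinnedArray(array=loader(cframe, copy=False), buffer=cframe)
```

Usage would be `pinned = load_pinned(cframe, blosc2.ndarray_from_cframe)` followed by reads through `pinned.array`. The holder only helps if it is kept alive for as long as the array is used; extracting `pinned.array` and dropping `pinned` reintroduces the lifetime hazard.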

Suspected Root Cause

In blosc2_schunk_from_buffer(copy=True), the C code appears to store a raw pointer into the source cframe buffer for SPECIAL chunks rather than malloc()-ing a private copy. When the schunk and its chunks are later freed, free() is called on that non-malloc()'d pointer, corrupting the heap.

The copy=False path is unaffected because no ownership transfer is attempted for SPECIAL chunks: the pointer into the caller-owned buffer stays valid for as long as the buffer lives.

Workaround

Avoid the cframe round-trip when loading a file-backed NDArray. blosc2.open() returns a fully functional NDArray without going through ndarray_from_cframe:

```python
import blosc2

filepath = "array.b2nd"  # placeholder: path of the file-backed NDArray

# Safe alternative for file-backed arrays
store = blosc2.open(filepath, mode="r")
```

If an in-memory copy is required:

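```python
# filepath: path of the file-backed NDArray, as above
store = blosc2.open(filepath, mode="r")
# Decompress to NumPy, then recompress into a new in-memory NDArray
in_memory = blosc2.asarray(store[:])
```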
