Avoid heap materialization for CRC32 page checksums in ColumnChunkPageWriteStore #3484

@arouel

Describe the enhancement requested

ColumnChunkPageWriter.writePage() and ColumnChunkPageWriter.writePageV2() call BytesInput.toByteArray() to feed compressed page data into CRC32.update(byte[]). When the writer uses a direct ByteBufferAllocator, this forces a full heap copy of every compressed page solely for checksumming.

CRC32.update(ByteBuffer) has been available since Java 8 (the Checksum interface gained the same method in Java 9) and, for a direct buffer, reads the buffer's memory without copying it to the heap. Replacing crc.update(compressedBytes.toByteArray()) with crc.update(compressedBytes.toByteBuffer(releaser)) eliminates one heap byte[] allocation per page. The releaser is already a field on ColumnChunkPageWriter.
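To illustrate the difference between the two checksum paths, here is a minimal JDK-only sketch. It does not use any Parquet classes: the class name CrcDirectBufferSketch and its two helper methods are hypothetical, and a plain direct ByteBuffer stands in for the compressed page data.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical stand-alone sketch; not part of parquet-java.
class CrcDirectBufferSketch {

  // Current-style path: materialize the page as a heap byte[] just to checksum it.
  static long crcViaHeapCopy(ByteBuffer page) {
    byte[] copy = new byte[page.remaining()];
    page.duplicate().get(copy); // duplicate() keeps the caller's position unchanged
    CRC32 crc = new CRC32();
    crc.update(copy);
    return crc.getValue();
  }

  // Proposed-style path: hand the buffer to CRC32 directly, with no byte[] allocation.
  static long crcViaBuffer(ByteBuffer page) {
    CRC32 crc = new CRC32();
    // update(ByteBuffer) moves the buffer's position to its limit,
    // so a duplicate is passed to leave the caller's view intact.
    crc.update(page.duplicate());
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] payload = "stand-in for compressed page bytes".getBytes(StandardCharsets.UTF_8);
    ByteBuffer direct = ByteBuffer.allocateDirect(payload.length);
    direct.put(payload);
    direct.flip();

    long viaCopy = crcViaHeapCopy(direct);
    long viaBuffer = crcViaBuffer(direct);
    System.out.println("checksums match: " + (viaCopy == viaBuffer));
  }
}
```

Both paths produce the same CRC value; only the second avoids the intermediate array. One detail to watch in a real change: CRC32.update(ByteBuffer) advances the buffer's position to its limit, so if the same buffer instance is written out afterwards, it needs a duplicate or a position reset.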

In a local benchmark (~350M records, SNAPPY compression, DirectCodecFactory with off-heap ByteBufferAllocator), this change reduced sampled byte[] allocation by 23% (8,565 MB to 6,604 MB) and GC collections by 13% (103 to 90) compared to DirectCodecFactory alone.

Component(s)

Core
