[Python] Scanning batch size is limited to 65536 (2**16). #20164

@asfimport

Description

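The documentation states that scanning defaults to a batch size of 1,000,000. In practice, the batch size defaults to 65536 (2**16) and is also capped at that value: requesting a larger `batch_size` still yields batches of 65536 rows, while a smaller one is honored.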

In []: dataset.count_rows()
Out[]: 538038292

In []: next(dataset.to_batches()).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**6)).num_rows
Out[]: 65536

In []: next(dataset.to_batches(batch_size=10**4)).num_rows
Out[]: 10000
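
The session above inspects only the first batch of each scan. To check whether the cap holds across an entire scan, a small helper can tally batch sizes. This is a minimal sketch, not part of the original report: `summarize_batch_sizes` and `BatchSummary` are hypothetical names, and the helper assumes only that each batch exposes `num_rows`, as the batches above do.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class BatchSummary(NamedTuple):
    batches: int      # number of batches seen
    total_rows: int   # sum of num_rows over all batches
    largest: int      # largest single batch (0 if no batches)
    sizes: Counter    # batch size -> number of batches with that size


def summarize_batch_sizes(batches: Iterable) -> BatchSummary:
    """Tally `num_rows` over an iterable of record batches."""
    sizes = Counter(batch.num_rows for batch in batches)
    count = sum(sizes.values())
    total = sum(size * n for size, n in sizes.items())
    return BatchSummary(count, total, max(sizes, default=0), sizes)


# Hypothetical usage against the dataset from this report:
# summary = summarize_batch_sizes(dataset.to_batches(batch_size=10**6))
# print(summary.largest, summary.sizes.most_common(3))
```

On a dataset as large as the one here, wrapping the iterator in `itertools.islice` would keep the check cheap by sampling only the first few batches.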

Environment: macOS
Reporter: A. Coady / @coady

Note: This issue was originally created as ARROW-16015. Please see the migration documentation for further details.
