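Scanning batches is documented to default to a batch size of 1,000,000 rows. The observed behavior is that the batch size defaults to 65536 and is also capped at that value: requesting a larger batch_size still yields batches of 65536 rows, while a smaller value such as 10**4 is honored.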
In []: dataset.count_rows()
Out[]: 538038292
In []: next(dataset.to_batches()).num_rows
Out[]: 65536
In []: next(dataset.to_batches(batch_size=10**6)).num_rows
Out[]: 65536
In []: next(dataset.to_batches(batch_size=10**4)).num_rows
Out[]: 10000
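The session above can be condensed into a small helper for checking other datasets or pyarrow versions. This is only a sketch: the names first_batch_rows, report, and make_sample_dataset are hypothetical and not part of pyarrow, and the sample dataset (a single Parquet file holding one integer column) is an assumption, not the dataset from this report. Observed sizes may also depend on the pyarrow version and on how the files are laid out.

```python
import os
import tempfile


def first_batch_rows(dataset, batch_size=None):
    """Return the row count of the first batch from dataset.to_batches()."""
    # Pass batch_size only when requested, so None exercises the default.
    kwargs = {} if batch_size is None else {"batch_size": batch_size}
    return next(iter(dataset.to_batches(**kwargs))).num_rows


def report(dataset, sizes=(None, 10**6, 10**4)):
    """Map each requested batch size (None = default) to the observed size."""
    return {size: first_batch_rows(dataset, size) for size in sizes}


def make_sample_dataset(num_rows=2_000_000):
    """Write a throwaway Parquet file and open it as a dataset (assumed setup)."""
    # Imported lazily so the helpers above work with any object that
    # exposes to_batches(), not only pyarrow datasets.
    import pyarrow as pa
    import pyarrow.dataset as ds
    import pyarrow.parquet as pq

    path = os.path.join(tempfile.mkdtemp(), "sample.parquet")
    pq.write_table(pa.table({"x": pa.array(range(num_rows))}), path)
    return ds.dataset(path, format="parquet")
```

Calling report(make_sample_dataset()) returns a dict keyed by the requested batch size. If the cap described above is in effect, the entries for None and 10**6 would be expected to match each other rather than the requested values.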
Environment: macOS
Reporter: A. Coady / @coady
Note: This issue was originally created as ARROW-16015. Please see the migration documentation for further details.