
Add streaming ItemReader for legacy .xls files #216

Open
kebbaman wants to merge 1 commit into spring-projects:main from kebbaman:gh-215-streaming-xls

kebbaman commented Jun 28, 2026

Fixes #215

Adds a low-memory streaming reader for legacy binary .xls (BIFF8/HSSF) files - the
HSSF counterpart to the existing StreamingXlsxItemReader.

What this adds

  • StreamingXlsItemReader<T> and a package-private XlsSheet in the streaming
    package, both built on the existing Sheet (Iterable<String[]>) abstraction and
    AbstractExcelItemReader, so all current configuration, mapping, blank-row handling
    and restart behaviour are inherited unchanged.

Approach

  • Reads HSSF records in pull mode via RecordFactoryInputStream#nextRecord() - the
    HSSF analogue of the StAX loop in StreamingSheet - assembling rows one at a time,
    so memory stays bounded regardless of file size.
  • The shared strings table is read once from the workbook globals and shared across
    sheets; each sheet is exposed as its own stream positioned at its BoundSheetRecord
    offset, mirroring how the xlsx reader gives each sheet an independent stream.
  • Formulas return the cached value, consistent with the streaming xlsx reader.
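
For reviewers, a rough sketch of the pull-mode row assembly described above. It is not the code in this PR: `CellRec` and `RowIterator` are hypothetical stand-ins that use no POI types, and the sketch assumes records arrive ordered by row and then by ascending column, and that every cell is a reference into the shared strings table. The point is only that a row is materialised when the caller asks for it, with one record of read-ahead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class PullModeRowSketch {

    /** Hypothetical stand-in for a cell record (not a POI class). */
    static final class CellRec {
        final int row;
        final int col;
        final int sstIndex; // index into the shared strings table

        CellRec(int row, int col, int sstIndex) {
            this.row = row;
            this.col = col;
            this.sstIndex = sstIndex;
        }
    }

    /** Builds one String[] per row, pulling records from the source only as needed. */
    static final class RowIterator implements Iterator<String[]> {
        private final Iterator<CellRec> records;
        private final List<String> sharedStrings;
        private CellRec pending; // single read-ahead record: first cell of the next row

        RowIterator(Iterator<CellRec> records, List<String> sharedStrings) {
            this.records = records;
            this.sharedStrings = sharedStrings;
            this.pending = pull();
        }

        private CellRec pull() {
            return records.hasNext() ? records.next() : null;
        }

        @Override
        public boolean hasNext() {
            return pending != null;
        }

        @Override
        public String[] next() {
            if (pending == null) {
                throw new NoSuchElementException();
            }
            int row = pending.row;
            List<String> cells = new ArrayList<>();
            // Consume records until the row number changes or the stream ends.
            while (pending != null && pending.row == row) {
                // Pad skipped columns with empty strings so indexes line up.
                while (cells.size() < pending.col) {
                    cells.add("");
                }
                cells.add(sharedStrings.get(pending.sstIndex));
                pending = pull();
            }
            return cells.toArray(new String[0]);
        }
    }
}
```

Only the current row and one pending record are held at any time, which is where the bounded-memory property comes from. The shared strings list is the one structure that is loaded up front.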

Notes

  • Additive only: no changes to existing readers, abstractions, or dependencies.
  • ./mvnw -B package is green (compile + io.spring.javaformat checkstyle + tests).
  • Commits are DCO signed-off.

Signed-off-by: Kebba Manneh <kebbaman@gmail.com>

Development

Successfully merging this pull request may close these issues.

Add streaming ItemReader for large legacy binary .xls (HSSF) files

2 participants