Add CsvStreamResponse for streaming large CSV exports #150
Draft
dereuromark wants to merge 2 commits into
Memory-efficient streaming sibling of CsvView built on the new Cake\Http\Response\AbstractStreamResponse base. Emits rows to the wire as they are produced rather than building the entire CSV in memory first, so memory use stays constant regardless of dataset size and the client sees the first row after one round trip instead of after the full export is generated.

Reuses CsvView's row-formatting logic (delimiter, enclosure, escape, eol, BOM, setSeparator, excel preset, iconv/mbstring transcoding with strict/ignore/transliterate modes) so the streaming and non-streaming paths produce byte-identical output for the same configuration. A future cleanup could extract the shared row formatter; this PR keeps the duplication explicit and contained.

Tears cleanly on mid-stream encoding failures: logs via Log::error() and stops emitting further rows so the client receives a valid but truncated CSV rather than a corrupt one.

Tests (23) cover the wire format (header/footer, extract by path / format / callable, custom delimiter+EOL, BOM-on-first-row-only, excel preset, setSeparator), input shapes (array, generator, empty), error paths (tear on unrenderable value, tear on strict transcoding failure, footer-omitted on tear), encoding (UTF-8 to ISO-8859-1 with strict and ignore modes), and the flushEvery validation inherited from the base.

The cakephp/cakephp constraint temporarily points at the feature-abstract-stream-response branch on the dereuromark fork while the upstream PR is in review. Will be flipped to dev-5.next (or ^5.4 once 5.4 releases) before merge.
Add a "Streaming large exports" section covering: when to reach for the streaming response over CsvView, the controller-only usage pattern, the full option list (row formatting reused from CsvView plus the inherited flushEvery), the excel shorthand, custom filename via withDownload(), and the tear-cleanly mid-stream error contract.
Summary

Proof-of-concept CsvStreamResponse companion to CsvView for memory-efficient streaming of large CSV exports. Built on the new Cake\Http\Response\AbstractStreamResponse base proposed upstream in cakephp/cakephp#19431.

Implements the design pivot agreed in issue #89: drop the originally sketched CsvStreamView (a View subclass that would have had to disable most of the View machinery to stream) in favor of a Response subclass that matches core's JsonStreamResponse shape.

Usage
One line in the controller, no viewBuilder() gymnastics. Symmetric with JsonStreamResponse from core.

What it inherits from AbstractStreamResponse

- CallbackStream wiring + body callback
- output() / outputAndFlush() / flushOutputBuffers() primitives
- flushEvery threshold counter
- X-Accel-Buffering: no header
- Content-Type with App.encoding-derived charset
- Log::error() shim (gated on cakephp/log being installed)
- withStreamOptions() / getStreamOptions()

What it owns
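As a rough illustration of the wire-format rules this section covers, the plain-PHP sketch below frames rows the way the description suggests: BOM on the first chunk only, an optional sep= line, then one fputcsv()-formatted line per row. It is not the PR's code. The function names, the option defaults, the lowercase `bom` key, and the BOM-before-sep ordering are all assumptions made for the example, and transcoding and extract handling are left out.

```php
<?php
declare(strict_types=1);

/**
 * Format a single row into a CSV line via fputcsv().
 * Hypothetical helper for illustration only.
 */
function sketchFormatRow(array $row, string $delimiter, string $enclosure, string $eol): string
{
    $handle = fopen('php://temp', 'r+');
    // Escape character passed explicitly; the $eol argument needs PHP 8.1+.
    fputcsv($handle, $row, $delimiter, $enclosure, '\\', $eol);
    rewind($handle);
    $line = (string)stream_get_contents($handle);
    fclose($handle);

    return $line;
}

/**
 * Yield one chunk per row so a caller can write and flush incrementally.
 * Option names follow the ones mentioned in the description; defaults are assumed.
 */
function sketchCsvChunks(iterable $rows, array $options = []): Generator
{
    $options += [
        'delimiter' => ',',
        'enclosure' => '"',
        'eol' => "\n",
        'bom' => false,
        'setSeparator' => false,
    ];
    $first = true;
    foreach ($rows as $row) {
        $chunk = '';
        if ($first) {
            $first = false;
            if ($options['bom']) {
                // UTF-8 byte order mark, emitted once at the start of the stream.
                $chunk .= "\xEF\xBB\xBF";
            }
            if ($options['setSeparator']) {
                // Spreadsheet hint line, e.g. "sep=;".
                $chunk .= 'sep=' . $options['delimiter'] . $options['eol'];
            }
        }
        $chunk .= sketchFormatRow($row, $options['delimiter'], $options['enclosure'], $options['eol']);
        yield $chunk;
    }
}

// Usage: chunks can be echoed (or handed to a stream callback) one at a time.
foreach (sketchCsvChunks([['id', 'name'], [1, 'Ada']], ['bom' => true]) as $chunk) {
    echo $chunk;
}
```

Because each row becomes its own chunk, only one formatted line is held in memory at a time, which is the property the streaming response relies on.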
CSV wire format: header / footer rows, extract (Hash paths, sprintf formats, callables, EntityInterface::toArray()), BOM-on-first-row, optional sep={delimiter} line, delimiter / enclosure / escape, in-field newline replacement, eol between rows, dual dataEncoding → csvEncoding transcoding via iconv (with strict / ignore / transliterate modes) or mbstring fallback, and the excel shorthand that forces BOM + CRLF + UTF-8 in one toggle.

The row-formatting logic mirrors CsvView::_generateRow() and _transcode() so the streaming and non-streaming paths emit byte-identical output for the same config. The duplication is explicit on purpose for this PoC; extracting a shared row formatter is a clean follow-up once both PRs land.

Mid-stream errors: tear cleanly
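To make the contract in this section concrete, here is a minimal plain-PHP sketch of a loop that stops at the first row it cannot encode, logs once, and skips the footer. It is not the PR's implementation: `sketchStreamWithTear()` and its callable parameters are hypothetical, the `$logError` callable merely stands in for Log::error(), and the row encoder is a naive comma join rather than real CSV formatting.

```php
<?php
declare(strict_types=1);

/**
 * Emit rows until one fails to encode, then stop without writing the footer.
 *
 * @param iterable $rows Rows to stream.
 * @param callable $encodeRow Turns a row into a line; throws when it cannot.
 * @param callable $emit Writes a chunk to the client.
 * @param callable $logError Stand-in for a server-side error log call.
 * @param string|null $footer Optional footer line, written only on success.
 * @return bool True when the whole stream was written, false when it tore.
 */
function sketchStreamWithTear(
    iterable $rows,
    callable $encodeRow,
    callable $emit,
    callable $logError,
    ?string $footer = null
): bool {
    foreach ($rows as $index => $row) {
        try {
            $line = $encodeRow($row);
        } catch (Throwable $e) {
            // Log once and stop: rows already sent stay valid, nothing more is written.
            $logError(sprintf('CSV stream torn at row %s: %s', (string)$index, $e->getMessage()));

            return false;
        }
        $emit($line);
    }
    if ($footer !== null) {
        $emit($footer);
    }

    return true;
}

// Naive encoder for the sketch: rejects values that cannot be rendered as text.
$encodeRow = static function (array $row): string {
    foreach ($row as $value) {
        if (!is_scalar($value)) {
            throw new InvalidArgumentException('unrenderable value');
        }
    }

    return implode(',', array_map('strval', $row)) . "\n";
};

$sent = '';
$logged = [];
$complete = sketchStreamWithTear(
    [[1, 'a'], [2, ['nested']], [3, 'c']],
    $encodeRow,
    static function (string $chunk) use (&$sent): void {
        $sent .= $chunk;
    },
    static function (string $message) use (&$logged): void {
        $logged[] = $message;
    },
    "total,3\n"
);
```

In this run the second row contains an array, so only the first row reaches `$sent`, one message is logged, and the footer is never written.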
A row that cannot be encoded (unrenderable extract path, strict-mode transcoding failure, fputcsv refusal) is logged via Log::error() and the stream tears: no further rows, no footer, truncated CSV on the wire. The client gets a partial response, but the file is valid up to the error and the server-side log entry surfaces in Sentry or an equivalent, which is exactly what jose suggested in the issue thread.

Composer change (temporary)
The cakephp/cakephp constraint points at the feature-abstract-stream-response branch on the dereuromark/cakephp fork via a repositories entry. This is a draft PR scaffold: once cakephp/cakephp#19431 merges to 5.next, the constraint flips to dev-5.next (or ^5.4 after 5.4 tags), the repositories entry is removed, and minimum-stability reverts.

To prevent lock-in and still allow users on older Cake versions to use the Packagist package in general, I recommend though:
Open decisions
- CsvView\Http\Response\CsvStreamResponse. Plugin namespace is CsvView\ and HTTP responses elsewhere in Cake live under Http\Response\, so this fits both. Alternative: drop into src/Response/ to skip the deeper path.
- _generateRow() / _transcode() from CsvView. Follow-up could extract to a trait or a CsvRowFormatter class once the shape is reviewed. Out of scope here to keep the diff focused.
- 1 (flush every row, lowest latency). Easy to change to 50 / 100 if reviewers prefer fewer flush calls at the cost of slightly delayed first-byte.

Opening as draft: blocked on cakephp/cakephp#19431 review/merge, and on the three decisions above.
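The flush-frequency trade-off in the last decision (flush every row versus every 50 or 100) can be pictured with a small threshold counter. This is a sketch under assumptions, not the base class's code: `sketchEmitWithFlushEvery()` is a hypothetical name, the "must be at least 1" check is only a guess at what the inherited validation enforces, and `$write` / `$flush` stand in for whatever output and flush primitives the response actually uses.

```php
<?php
declare(strict_types=1);

/**
 * Write every chunk, flushing after each group of $flushEvery chunks.
 *
 * @param iterable $chunks Already formatted CSV lines.
 * @param int $flushEvery Number of chunks to write between flushes.
 * @param callable $write Writes one chunk to the output.
 * @param callable $flush Pushes buffered output to the client.
 */
function sketchEmitWithFlushEvery(iterable $chunks, int $flushEvery, callable $write, callable $flush): void
{
    if ($flushEvery < 1) {
        // Assumed validation rule for the sketch.
        throw new InvalidArgumentException('flushEvery must be at least 1');
    }
    $pending = 0;
    foreach ($chunks as $chunk) {
        $write($chunk);
        $pending++;
        if ($pending >= $flushEvery) {
            $flush();
            $pending = 0;
        }
    }
    if ($pending > 0) {
        // Flush whatever is left once the input is exhausted.
        $flush();
    }
}

// Usage: count how many flushes five rows cost with a threshold of two.
$written = '';
$flushes = 0;
sketchEmitWithFlushEvery(
    ["a\n", "b\n", "c\n", "d\n", "e\n"],
    2,
    static function (string $chunk) use (&$written): void {
        $written .= $chunk;
    },
    static function () use (&$flushes): void {
        $flushes++;
    }
);
```

With a threshold of 1 every row is flushed as soon as it is written, which gives the lowest latency; larger values cut the number of flush calls while delaying when the client sees the first bytes.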