
Add CsvStreamResponse for streaming large CSV exports #150

Draft
dereuromark wants to merge 2 commits into 5.x from feature-csv-stream-response

dereuromark (Member) commented May 12, 2026

Summary

Proof-of-concept CsvStreamResponse companion to CsvView for memory-efficient streaming of large CSV exports. Built on the new Cake\Http\Response\AbstractStreamResponse base proposed upstream in cakephp/cakephp#19431.

Implements the design pivot agreed in issue #89: drop the originally sketched CsvStreamView (a View subclass that would have had to disable most of the View machinery to stream) in favor of a Response subclass that matches core's JsonStreamResponse shape.

Usage

public function export()
{
    $rows = $this->Articles->find()->disableBufferedResults();

    return new CsvStreamResponse($rows, [
        'header' => ['id', 'title', 'created'],
        'extract' => ['id', 'title', ['created', '%s']],
    ]);
}

One line in the controller, no viewBuilder() gymnastics. Symmetric with JsonStreamResponse from core.

What it inherits from AbstractStreamResponse

  • CallbackStream wiring + body callback
  • output() / outputAndFlush() / flushOutputBuffers() primitives
  • flushEvery threshold counter
  • X-Accel-Buffering: no header
  • Content-Type with App.encoding-derived charset
  • Log::error() shim (gated on cakephp/log being installed)
  • PSR-7 immutability (withStreamOptions() / getStreamOptions())

What it owns

CSV wire format: header / footer rows, extract (Hash paths, sprintf formats, callables, EntityInterface::toArray()), BOM-on-first-row, optional sep={delimiter} line, delimiter/enclosure/escape, in-field newline replacement, eol between rows, dual dataEncoding → csvEncoding transcoding via iconv (with strict / ignore / transliterate modes) or mbstring fallback, and the excel shorthand that forces BOM + CRLF + UTF-8 in one toggle.
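As a rough illustration of the per-row part of that wire format, here is a minimal sketch in plain PHP. The function name `formatCsvRow()` and its option keys are hypothetical stand-ins chosen for this example, not the PR's actual code. It only covers delimiter/enclosure/escape, in-field newline replacement, eol, and the first-row BOM.

```php
<?php
// Hypothetical helper for illustration only; not the code in this PR.
function formatCsvRow(array $row, array $options = [], bool $isFirstRow = false): string
{
    // Assumed defaults for this sketch; the plugin's real defaults may differ.
    $options += [
        'delimiter' => ',',
        'enclosure' => '"',
        'escape' => '\\',
        'eol' => "\n",
        'newline' => ' ',  // replacement for line breaks inside a field
        'bom' => false,
    ];

    // Replace in-field line breaks so every record stays on one physical line.
    $row = array_map(
        fn ($value) => str_replace(["\r\n", "\r", "\n"], $options['newline'], (string)$value),
        $row
    );

    // fputcsv() writes to a stream, so format the row through php://temp.
    $handle = fopen('php://temp', 'r+');
    $written = fputcsv($handle, $row, $options['delimiter'], $options['enclosure'], $options['escape']);
    if ($written === false) {
        fclose($handle);
        throw new RuntimeException('fputcsv() refused the row');
    }
    rewind($handle);
    // fputcsv() always appends "\n"; swap it for the configured eol.
    $line = rtrim(stream_get_contents($handle), "\n") . $options['eol'];
    fclose($handle);

    // The UTF-8 byte order mark goes in front of the first row only.
    return ($isFirstRow && $options['bom'] ? "\xEF\xBB\xBF" : '') . $line;
}
```

Calling it with `['eol' => "\r\n", 'bom' => true]` and `$isFirstRow = true` approximates what the excel shorthand is described as forcing, assuming the data is already UTF-8.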

The row-formatting logic mirrors CsvView::_generateRow() and _transcode() so the streaming and non-streaming paths emit byte-identical output for the same config. The duplication is explicit on purpose for this PoC; extracting a shared row formatter is a clean follow-up once both PRs land.
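For readers unfamiliar with the transcoding step, the following is a small sketch of how the three modes could map onto iconv. `transcodeField()` is a hypothetical name, and the mode-to-suffix mapping is an assumption made for illustration rather than a copy of `_transcode()`. How `//IGNORE` and `//TRANSLIT` behave depends on the platform's iconv implementation, so only the strict path is predictable across systems.

```php
<?php
// Hypothetical helper for illustration only; not CsvView::_transcode().
function transcodeField(string $value, string $from, string $to, string $mode = 'strict'): string
{
    if ($from === $to) {
        return $value;
    }

    if (function_exists('iconv')) {
        // Assumed mapping of the mode names onto iconv target suffixes.
        $suffixes = ['strict' => '', 'ignore' => '//IGNORE', 'transliterate' => '//TRANSLIT'];
        $suffix = $suffixes[$mode] ?? '';

        // Suppress the notice and turn a failed conversion into an exception.
        $converted = @iconv($from, $to . $suffix, $value);
        if ($converted === false) {
            throw new RuntimeException(sprintf('Cannot convert value from %s to %s', $from, $to));
        }

        return $converted;
    }

    // Fallback when iconv is unavailable; mbstring substitutes instead of failing.
    return mb_convert_encoding($value, $to, $from);
}
```

In a streaming response, an exception thrown here in strict mode would be one of the failures that ends the stream.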

Mid-stream errors — tear cleanly

A row that cannot be encoded (unrenderable extract path, strict-mode transcoding failure, fputcsv refusal) is logged via Log::error() and the stream tears: no further rows, no footer, truncated CSV on the wire. The client receives a partial file that is valid up to the error, and the failure is recorded server-side as a log entry that surfaces in Sentry or an equivalent, which is what jose suggested in the issue thread.
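The tear-cleanly contract can be sketched as a generator that stops at the first failing row. This is a simplified model in plain PHP, not the PR's implementation: `streamRows()`, `$encodeRow`, and `$logError` are hypothetical names, and the logger callable stands in for Log::error().

```php
<?php
// Hypothetical generator for illustration: yields encoded rows, stops on the first failure.
function streamRows(iterable $rows, callable $encodeRow, callable $logError, ?string $footer = null): Generator
{
    foreach ($rows as $index => $row) {
        try {
            $line = $encodeRow($row);
        } catch (Throwable $e) {
            // Log server-side, then tear: no further rows and no footer.
            $logError(sprintf('CSV stream aborted at row %s: %s', $index, $e->getMessage()));

            return;
        }

        yield $line;
    }

    // The footer is only reached when every row was encoded successfully.
    if ($footer !== null) {
        yield $footer;
    }
}
```

Because the failing row is encoded before anything is yielded for it, everything already sent remains a sequence of complete rows.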

Composer change — temporary

The cakephp/cakephp constraint points at the feature-abstract-stream-response branch on the dereuromark/cakephp fork via a repositories entry. This is a draft PR scaffold — once cakephp/cakephp#19431 merges to 5.next the constraint flips to dev-5.next (or ^5.4 after 5.4 tags), the repositories entry is removed, and minimum-stability reverts.

To avoid lock-in and keep the Packagist package usable with older CakePHP versions, I would still recommend:

  • Document CsvStreamResponse as a 5.4+ feature and fail early when it is used on an older version.
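One possible shape for the fail-early check is a guard that tests for the base class before the response is used. This is only a sketch: `assertStreamResponseSupported()` is a hypothetical function, and the "5.4+" wording in the message assumes the upstream PR ships in 5.4 as anticipated above.

```php
<?php
// Hypothetical guard for illustration; the default class name is the base proposed in cakephp/cakephp#19431.
function assertStreamResponseSupported(string $baseClass = 'Cake\Http\Response\AbstractStreamResponse'): void
{
    // class_exists() triggers autoloading, so this works with Composer's autoloader.
    if (!class_exists($baseClass)) {
        throw new RuntimeException(sprintf(
            'CsvStreamResponse requires CakePHP 5.4+ (missing %s).',
            $baseClass
        ));
    }
}
```

A clear exception at this point is easier to diagnose than a "class not found" fatal error raised while loading the subclass.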

Open decisions

  1. Class location: CsvView\Http\Response\CsvStreamResponse. Plugin namespace is CsvView\ and HTTP responses elsewhere in Cake live under Http\Response\, so this fits both. Alternative: drop into src/Response/ to skip the deeper path.
  2. Shared row formatter: the current PR duplicates _generateRow() / _transcode() from CsvView. A follow-up could extract them to a trait or a CsvRowFormatter class once the shape is reviewed. Out of scope here to keep the diff focused.
  3. Flush threshold default: left at the base's 1 (flush every row, lowest latency). Easy to change to 50 / 100 if reviewers prefer fewer flush calls at the cost of a slightly delayed first byte.
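To make the trade-off in the third decision concrete, here is a minimal model of a flush-threshold counter in plain PHP. It is not the base class's code: `emitWithFlush()` and its callables are hypothetical, and the "at least 1" validation rule is an assumption about what the inherited flushEvery check enforces.

```php
<?php
// Hypothetical emitter for illustration: writes every line, flushes after each $flushEvery lines.
function emitWithFlush(iterable $lines, int $flushEvery, callable $write, callable $flush): int
{
    // Assumed validation rule; the real base class may check differently.
    if ($flushEvery < 1) {
        throw new InvalidArgumentException('flushEvery must be at least 1');
    }

    $pending = 0;
    $flushes = 0;
    foreach ($lines as $line) {
        $write($line);
        if (++$pending >= $flushEvery) {
            $flush();
            $flushes++;
            $pending = 0;
        }
    }

    // Flush whatever is left so the tail of the export is not held back.
    if ($pending > 0) {
        $flush();
        $flushes++;
    }

    return $flushes;
}
```

In this model, 1,000 rows cost 1,000 flush calls with a threshold of 1 and 10 with a threshold of 100, while the first byte waits for up to 100 rows in the latter case.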

Opening as draft — blocked on cakephp/cakephp#19431 review/merge, and on the three decisions above.

Memory-efficient streaming sibling of CsvView built on the new
Cake\Http\Response\AbstractStreamResponse base. Emits rows to the wire
as they are produced rather than building the entire CSV in memory
first, so memory use stays constant regardless of dataset size and the
client sees the first row after one round trip instead of after the
full export is generated.

Reuses CsvView's row-formatting logic (delimiter, enclosure, escape,
eol, BOM, setSeparator, excel preset, iconv/mbstring transcoding with
strict/ignore/transliterate modes) so the streaming and non-streaming
paths produce byte-identical output for the same configuration. A
future cleanup could extract the shared row formatter; this PR keeps
the duplication explicit and contained.

Tears cleanly on mid-stream encoding failures: logs via Log::error()
and stops emitting further rows so the client receives a valid but
truncated CSV rather than a corrupt one.

Tests (23) cover the wire format (header/footer, extract by path /
format / callable, custom delimiter+EOL, BOM-on-first-row-only, excel
preset, setSeparator), input shapes (array, generator, empty), error
paths (tear on unrenderable value, tear on strict transcoding failure,
footer-omitted on tear), encoding (UTF-8 to ISO-8859-1 with strict and
ignore modes), and the flushEvery validation inherited from the base.

The cakephp/cakephp constraint temporarily points at the
feature-abstract-stream-response branch on the dereuromark fork while
the upstream PR is in review. Will be flipped to dev-5.next (or ^5.4
once 5.4 releases) before merge.

Add a "Streaming large exports" section covering: when to reach for the
streaming response over CsvView, the controller-only usage pattern, the
full option list (row formatting reused from CsvView plus the inherited
flushEvery), the excel shorthand, custom filename via withDownload(),
and the tear-cleanly mid-stream error contract.