Skip to content

Throw on iconv() failure and default fputcsv escape to empty string#149

Merged
dereuromark merged 1 commit into
5.xfrom
feature-iconv-strict
May 12, 2026
Merged

Throw on iconv() failure and default fputcsv escape to empty string#149
dereuromark merged 1 commit into
5.xfrom
feature-iconv-strict

Conversation

@dereuromark
Copy link
Copy Markdown
Member

@dereuromark dereuromark commented May 11, 2026

Summary

Two small correctness fixes around the row writer. Two commits, one per fix — revert independently if needed.

1. iconv() failure throws instead of silently dropping a row

When dataEncoding !== csvEncoding and the iconv path is selected, _generateRow() previously assigned iconv(...) straight into $csv. iconv() can return false (unsupported target encoding, malformed source bytes); that false was concatenated into the accumulator where (string)false === '', producing a row of nothing with no exception, no log, no signal to the user. Their export looked successful but quietly contained empty rows.

Now we check the return value and throw a CakeException naming the source and target encodings so the failure is impossible to miss.

Regression test: testIconvFailureThrows configures a bogus target encoding to force iconv() to return false and asserts the exception fires (wrapped by SerializedView's SerializationFailureException).

2. Default escape is now '' (RFC 4180, PHP 8.4 compliant)

PHP 8.4 deprecates passing a non-empty $escape to fputcsv(). The shipped default was the legacy '\\' value introduced by #145 specifically to make the parameter explicit — but on PHP 8.4 this triggers E_DEPRECATED for every rendered CSV. PHP plans to make it a hard error in a future release.

Switching the default to '':

  • Silences the deprecation today.
  • Produces RFC 4180-compliant output: quotes inside a field are escaped by doubling them ("") instead of by a preceding backslash.
  • Users who want the legacy backslash behavior can still set escape => '\\' explicitly (and will see the matching PHP deprecation; documented in the docblock).

Behavior change: when a field contains a literal backslash immediately before the enclosure character, the output now uses doubled-quote escaping instead of backslash-quote. In practice every mainstream CSV consumer (Excel, LibreOffice, Google Sheets, PHP's own fgetcsv) handles both correctly. Worth a CHANGELOG note at the next release.

Regression test: testDefaultEscapeIsRfc4180 renders a field with embedded double-quotes and asserts (a) the output matches the doubled-quote form and (b) PHP did not emit E_DEPRECATED during render.

=> new minor release

CI

The cs-stan job is expected to fail on this PR for the same pre-existing reason #147 is fixing (squizlabs/php_codesniffer v4 vs slevomat 8.29 incompatibility). Nothing in this PR affects code style; once #147 lands, rebasing makes CI green.

Relationship to other open PRs

Independent of:

Any of the three can land first; all three are clean low-risk additions on top of pristine 5.x.

@dereuromark dereuromark requested review from ADmad May 11, 2026 15:32
Comment thread src/View/CsvView.php Outdated
@dereuromark
Copy link
Copy Markdown
Member Author

Addressed in 54ab09b. Added a transcodingMode config option with three values exposed as class constants:

  • TRANSCODING_MODE_STRICT (default): throws on iconv failure (the original behavior of this PR).
  • TRANSCODING_MODE_IGNORE: passes //IGNORE to iconv (or mb_substitute_character('none') for mbstring) so unconvertible bytes are dropped and the rest of the row continues.
  • TRANSCODING_MODE_TRANSLITERATE: passes //TRANSLIT//IGNORE so accented Latin transliterates to ASCII where possible.

Tests for both ignore and transliterate paths added. Moved the conversion into a new _transcode() hook for cleanliness and so apps can override per-view.

@dereuromark dereuromark force-pushed the feature-iconv-strict branch from 54ab09b to c1c75fb Compare May 11, 2026 18:31
Three related changes to the row writer in `_generateRow()` /
`_transcode()`:

1. iconv() returning false (e.g. unsupported target encoding,
   malformed source bytes) previously got concatenated into the
   accumulator where `(string)false === ''` silently produced an
   empty row. Now throws a CakeException in `strict` mode.

2. PHP 8.4 deprecates passing a non-empty `$escape` to fputcsv().
   The shipped default was the legacy backslash, which triggers
   E_DEPRECATED on every render under 8.4+. Default is now `''`
   (RFC 4180-compliant: quotes inside a field are escaped by
   doubling them rather than by a preceding backslash). Users on
   legacy PHP-style escaping can still set `escape => '\\'`.

3. Per review: add a configurable `transcodingMode` option so
   apps that cannot fix bad input can opt out of the throw. Three
   class-constant values:

   - `TRANSCODING_MODE_STRICT` (default): throws on iconv failure.
   - `TRANSCODING_MODE_IGNORE`: passes `//IGNORE` to iconv so
     unconvertible characters are dropped and the rest of the row
     is preserved. For mbstring, `mb_substitute_character('none')`
     is set for the duration of the call.
   - `TRANSCODING_MODE_TRANSLITERATE`: passes `//TRANSLIT//IGNORE`
     so accented Latin etc. transliterate to ASCII; mbstring
     falls back to ignore (no transliteration support).

The transcoding logic moves into a new protected `_transcode()`
hook so apps can override the policy per-view.

The iconv warning emitted right before a strict-mode failure is
consumed by a scoped `set_error_handler()` so the user sees the
clear CakeException rather than two near-duplicate signals; this
also keeps PHPUnit's failOnWarning bar happy.

Tests: iconv strict throw, ignore drops unconvertible chars,
transliterate converts accented Latin, and fputcsv default emits
RFC 4180 escaping with no E_DEPRECATED.
@dereuromark dereuromark force-pushed the feature-iconv-strict branch from 4414975 to c28e4aa Compare May 11, 2026 19:33
@dereuromark dereuromark merged commit 03202e3 into 5.x May 12, 2026
7 checks passed
@dereuromark dereuromark deleted the feature-iconv-strict branch May 12, 2026 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants