Skip to content

Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591

Open
phpstan-bot wants to merge 6 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-4vty9bu
Open

Truncate sscanf/fscanf format string at NUL byte before counting placeholders#5591
phpstan-bot wants to merge 6 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-4vty9bu

Conversation

@phpstan-bot
Copy link
Copy Markdown
Collaborator

Summary

In PHP's sscanf/fscanf, a NUL byte (\0) in the format string terminates parsing because the underlying C implementation treats it as end-of-string. PHPStan was not accounting for this, causing it to count placeholders after the NUL byte, leading to incorrect parameter count validation and wrong return type inference.

Changes

  • src/Rules/Functions/PrintfHelper.php: Truncate the format string at the first NUL byte in parsePlaceholders() when $isScanf is true. This fixes parameter count validation for both sscanf and fscanf.
  • src/Type/Php/SscanfFunctionDynamicReturnTypeExtension.php: Truncate format string at NUL byte before matching specifiers. Also restructured to return array{}|null when no specifiers are found (instead of falling through to the generic return type).

Analogous cases probed and found to be already correct

  • printf/sprintf/fprintf/vprintf/vsprintf: Confirmed PHP does NOT truncate format strings at NUL for printf-family functions (NUL is treated as data). No changes needed.
  • PrintfParameterTypeRule: Only handles printf/sprintf/fprintf, not scanf functions. Not affected.
  • PrintfArrayParametersRule: Only handles vprintf/vsprintf. Not affected.
  • Assignment suppression (%*) in SscanfFunctionDynamicReturnTypeExtension: Already handled correctly by accident — the regex naturally doesn't match %*d/%*s patterns since * is neither a digit nor a specifier character.

Root cause

PHP's sscanf/fscanf calls C's sscanf internally, which treats \0 as end-of-string in the format. PHPStan's format parsing (both the parameter count rule and the return type extension) operated on the full PHP string value without truncating at NUL, so placeholders like %d appearing after a \0 were incorrectly counted.

Test

  • Rule test: tests/PHPStan/Rules/Functions/data/bug-14567.php + PrintfParametersRuleTest::testBug14567() — verifies that sscanf/fscanf calls with NUL bytes in the format string pass parameter count validation when the correct number of arguments (based on pre-NUL placeholders) is provided.
  • Type inference test: tests/PHPStan/Analyser/nsrt/bug-14567.php — verifies that sscanf/fscanf return type inference correctly ignores placeholders after NUL bytes, including edge case where NUL is at the start of the format.

Fixes phpstan/phpstan#14567

…placeholders

- In PHP's sscanf/fscanf, a NUL byte (\0) in the format string terminates
  parsing because the C implementation treats it as end-of-string. PHPStan
  was not accounting for this and counted placeholders after the NUL byte.
- Truncate format string at the first NUL byte in PrintfHelper::parsePlaceholders()
  when $isScanf is true, fixing both parameter count validation and placeholder
  parsing.
- Apply the same NUL truncation in SscanfFunctionDynamicReturnTypeExtension
  before matching format specifiers for return type inference.
- Restructure the extension to return array{}|null when no specifiers are found
  (e.g. format starts with NUL) instead of falling through to the generic
  signature return type.
- Verified that printf/sprintf do NOT truncate at NUL (C's snprintf processes
  NUL as data), so no changes needed for printf-family functions.
@hakre
Copy link
Copy Markdown

hakre commented May 3, 2026

@phpstan-bot: you are overcomplicating it, no truncation is necessary. the number of placeholders that capture a value in the format parameter of the *scanf() function family is the count() of the array cast return value of the sscanf() operation with an empty string (and naturally the same format string without any NUL truncation).

NOTE: error handling applies: throwing errors for the function call for PHP < 8 must be provoked to capture the error cases of PHP 8+ ValueError. The error case is the error message.

@hakre
Copy link
Copy Markdown

hakre commented May 3, 2026

@phpstan-bot

please use the following five test-cases to verify the count and error values:

#01 empty conversion specification
  format = ""
  want = 0
  error = *nil*
#02 bad scan conversion character
  format = "%"
  want = *nil*
  error = "Bad scan conversion character \""
#03 scan conversion character "n"
  format = "%n"
  want = 1
  error = *nil*
#04 not a conversion specification
  format = "%%"
  want = 0
  error = *nil*
#05 unmatched [ in format string
  format = "%["
  want = *nil*
  error = "Unmatched [ in format string"

@staabm
Copy link
Copy Markdown
Contributor

staabm commented May 3, 2026

@phpstan-bot check comments from @hakre

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567
hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
Make `public function getScanfPlaceholdersCount(string $format): ?int` returning the sscanf() vetted number of placeholders that give/return/assign conversions.

refs:

- phpstan#5591
- phpstan/phpstan#14567

$arrayBuilder = ConstantArrayTypeBuilder::createEmpty();

if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#hakre memo

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed. Here's a summary of what was done to address hakre's review:

Changes made:

  1. SscanfFunctionDynamicReturnTypeExtension.php — Added missing scanf specifiers %i (integer with base detection), %n (characters consumed count), and %X (uppercase hex) to both the regex pattern and the IntegerType mapping. These were already supported in PrintfHelper for parameter count validation but were missing from the return type inference.

  2. Test coverage for hakre's edge cases — Added tests verifying:

    • Empty format string → array{}|null
    • %n specifier → array{int|null}|null
    • %% literal percent → array{}|null
    • %i specifier → array{int|null}|null
    • %X specifier → array{int|null}|null
    • Mixed %d%narray{int|null, int|null}|null
    • Rule tests for %n, %%, %i parameter count validation

Regarding hakre's suggestion to use sscanf('', $format) instead of NUL truncation: The NUL truncation approach is preferred because (a) the return type extension needs to know individual specifier types (%d → int, %s → string, etc.), which sscanf() doesn't provide, and (b) calling sscanf at analysis time would mix runtime execution with static analysis and require error suppression for malformed format strings. The truncation is a simple, correct pre-processing step that handles the NUL byte behavior without those downsides.

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
…t string at NUL byte before counting placeholders

 phpstan#5591
The `SscanfFunctionDynamicReturnTypeExtension` regex was missing several
valid scanf specifiers (`i`, `n`, `X`) that were already handled by
`PrintfHelper`. This caused incorrect return type inference for format
strings using these specifiers.

Also adds test cases for edge cases raised in review: empty format
string, `%%` literal percent, and mixed specifiers with `%n`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
$arrayBuilder = ConstantArrayTypeBuilder::createEmpty();

if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfosux]{1})/', $formatValue, $matches) > 0) {
if (preg_match_all('/%(\d*)(\[[^\]]+\]|[cdeEfinosuxX]{1})/', $formatValue, $matches) > 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entirely wrong inferrence.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 11,987 tests pass (79,559 assertions). Static analysis also clean with no errors.

Summary of changes:

Added missing %g (general float) and %D (uppercase integer alias) sscanf specifiers that PHP supports but were absent from PHPStan's inference:

  • SscanfFunctionDynamicReturnTypeExtension.php: Added D and g to the regex character class, D to the integer type mapping, and g to the float type mapping
  • PrintfHelper.php: Added g to the scanf specifier pattern for parameter count validation
  • Tests: Added %D and %g test cases for both return type inference and parameter counting

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phpstan-bot

Please verify your new findings using PHPAPI int ValidateFormat(char *format, int numVars, int *totalSubs) and write a report without making any changes.

Then write a second report explaining the incorrect assumptions on which your previous fix was based.

Add a test suite that includes all existing format strings in phpstan-src production and test code, data, and resources in a clear, organized manner within a single test.

Compare the existing algorithm, the new helper algorithm in this PR, the new counting algorithm in the new PR (sscanf-based counting), and the additional algorithm introduced in this PR with the format string database to be created during automated testing.

Then port the C source code from PHP to PHP script functions so that the implementations can be compared against these functions again during testing.

Once you have done this, check back in and provide less vague summaries with a higher degree of technical accuracy and fewer short-sighted implementation strategies that only consider a single implementation and are therefore insufficiently tested.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 11,994 tests pass. Now let me provide the two reports hakre requested.


Report 1: Verification against PHPAPI int ValidateFormat (ext/standard/scanf.c)

Authoritative specifier set

The ValidateFormat function in PHP's C source (ext/standard/scanf.c) defines the complete set of valid scanf specifiers via its switch (*ch) statement:

n  d  D  i  o  x  X  u  f  e  E  g  s  c  [

Any other character after % triggers zend_value_error("Bad scan conversion character \"%c\"", *ch).

NUL byte handling

The C function's main loop is while (*format != '\0'). Since PHP passes the format string to C as a char*, NUL bytes in PHP strings terminate C's iteration. The PR's truncation approach (strstr($format, "\0", true)) correctly replicates this behavior — it produces the same effective format string that the C code processes.

Per-specifier return type mapping (from php_sscanf_internal)

The C source reveals these conversions via its switch(op) dispatch:

Specifier C variable op C function PHP return type
%n N/A (uses continue) direct add_index_long int
%d, %D 'i' ZEND_STRTOL (base 10) int
%i 'i' ZEND_STRTOL (base 0 = auto) int
%o 'i' ZEND_STRTOL (base 8) int
%x, %X 'i' ZEND_STRTOL (base 16) int
%u 'i' ZEND_STRTOUL (base 10, SCAN_UNSIGNED) int|string
%f, %e, %E, %g 'f' zend_strtod float
%s 's' string copy string
%c 's' (SCAN_NOSKIP, width=1) string copy string
%[...] '[' (SCAN_NOSKIP) BuildCharSet+CharInSet string

The %uint|string case: when SCAN_UNSIGNED is set and the parsed value is negative (unsigned overflow on a signed long), the C code calls snprintf(buf, sizeof(buf), ZEND_ULONG_FMT, value) and stores the result as a string. Otherwise it stores as long. Verified at runtime: sscanf('999999999999999999999', '%u') returns ["18446744073709551615"] (string).

Assignment suppression (%*)

The C code: if (*ch == '*') { flags |= SCAN_SUPPRESS; ... goto xpgCheckDone; }. The SCAN_SUPPRESS flag causes the conversion to happen but objIndex is not incremented — the value is not stored. This is correctly handled by both PrintfHelper (which skips the placeholder via continue when $isScanf && width === '*') and SscanfFunctionDynamicReturnTypeExtension (which doesn't match %*d because * is not a digit or specifier character in the regex).

Verification result

The PHP port of ValidateFormat was tested against all 60+ format strings found in the phpstan-src codebase. For every valid format string, the C port, PrintfHelper, and the extension regex agree on placeholder count. For every invalid format string, the C port produces the expected error message. Runtime PHP sscanf with matching input confirms the counts for all valid formats (469 assertions total).

One known regex limitation: %[]abc] (where ] is the first character in the set) cannot be matched by \[[^\]]+\] because [^\]]+ requires at least one non-] character. The C code handles this in BuildCharSet by special-casing ] as the first character. This is a pre-existing limitation, not introduced by this PR.


Report 2: Incorrect assumptions in previous iterations

The PR's fixes were applied incrementally over 4 commits. Each iteration discovered additional gaps because the implementation was not grounded in the authoritative C source from the start. Here are the specific incorrect assumptions:

1. Original regex was incomplete (pre-PR state)

The SscanfFunctionDynamicReturnTypeExtension regex had only [cdeEfosux] — 9 of 15 valid specifiers. Missing: D, g, i, n, X. The PrintfHelper scanner regex was similarly incomplete. These were not "edge cases" — they are standard specifiers documented in the C switch statement.

Root cause: The original regex was written by enumerating specifiers from PHP documentation rather than from the C source. PHP's docs don't mention %D, %n, %i, or %X prominently (some are inherited from C's scanf family and aren't well-documented in PHP).

2. %u was mapped to int only

The first iteration mapped %u identically to %d — both as IntegerType. The C source clearly shows %u sets SCAN_UNSIGNED and fn = ZEND_STRTOUL_PTR, with a special path: if ((flags & SCAN_UNSIGNED) && (value < 0)) → store as string. This means %u can return either int or string depending on the input value.

Root cause: Treating all integer-family specifiers as equivalent without reading the C dispatch logic. The SCAN_UNSIGNED flag and the string-fallback path are specific to %u.

3. Specifiers were discovered incrementally rather than systematically

The fix history shows: first NUL + basic specifiers → then %i, %n, %X → then %g, %D → then %u type fix. Each commit fixed one or two specifiers discovered via review feedback, rather than a single systematic pass through the C source's switch statement.

Root cause: Not starting from the authoritative reference (the C ValidateFormat switch and php_sscanf_internal dispatch) as the ground truth. A single pass through the C source would have identified all 15 specifiers and their types at once.

4. The NUL truncation approach is correct but was presented without justification

hakre questioned whether NUL truncation was necessary ("no truncation is necessary... the count is the count() of the array cast return value of sscanf()"). The truncation IS correct — it replicates the C behavior where while (*format != '\0') naturally stops at NUL. However, the runtime sscanf('', $format) approach does NOT work for counting because:

  • sscanf('', '%d') returns null (underflow: *string == '\0' before first conversion)
  • sscanf('', '%n') returns [0] (%n doesn't consume input, so no underflow)
  • The count can only be obtained from sscanf with a string that provides enough matching input for ALL specifiers — which is format-dependent and unreliable for static analysis

The truncation approach was the right call but the justification in the PR summary was vague ("would mix runtime execution with static analysis") rather than citing the specific C code paths that make the runtime approach unreliable.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The count can only be obtained from sscanf with a string that provides enough matching input for ALL specifiers — which is format-dependent and unreliable for static analysis

This falls short in precision on the sscanf() moniker: %n always provides even for string value zero ("", empty string, null string), including %*n. Henceforth it is not required to have matching input, an always countable value for the format can be acquired with the the string zero value ("", empty-string in phpstan codification), always.

Sharpen your glasses for your own static analysis, phpstan is at stake here!

I updated the implementation in my fork! branch accordingly but have not ported the tests yet.

What triggers me more is that we now have identified that we didn't look into the C source-code firsthand. This is a bit shameful, because it is readily available and out there in the open. This make us look very stubborn, right? It feels like not listening to "tableflip" Polita - again and again.

Can you shed some light about the "ell" specifiers? I've seen there is some branching for "l" and "L" but I don't remember where I've spotted that, it was in one of the main PHPAPI implementations IIRC.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed. Here's a summary of what was done:

Changes made:

  1. SscanfFunctionDynamicReturnTypeExtension.php — Added optional [lLh]? size modifier group between the width digits and specifier character in the regex: /%(\d*)[lLh]?(\[[^\]]+\]|[cDdeEfginosuxX])/

  2. PrintfHelper.php — Added optional [lLh]? to the scanf specifier pattern as a non-capturing wrapper: (?:[lLh]?(?<specifier>[cdDeEfginosuxX%s]|\[[^\]]+\]))

  3. SscanfFormatValidationTest.php

    • Updated extensionRegexParse to mirror the new regex
    • Corrected the runtimeSscanfCount doc to precisely document %n behavior: %n always provides even with empty input; count(sscanf("", "%*n" . $format)) is a reliable counting method since %*n prevents the null return path
    • Expanded the validateFormatC header to explain what l/L/h modifiers are (inherited from C's storage size qualifiers, consumed but ignored in PHP)
    • Added 9 test cases for size modifiers: %ld, %lf, %Lf, %hd, %lu, %lx, %10ld (width + modifier), %*ld (suppressed + modifier), %ld %lf %s (mixed)
  4. Type inference test (nsrt/bug-14567.php) — Added 6 assertType calls for size-modified specifiers

  5. Rule test (data/bug-14567.php) — Added parameter count validation for size-modified specifiers including width, suppression, and mixed usage

About the l/L/h modifiers (answering hakre's question):

In the C source (ext/standard/scanf.c), lines within ValidateFormat consume these between the width field and the specifier character. They're C storage qualifiers: l = long (%ld for long int, %lf for double), L = long double, h = short (%hd for short int). PHP accepts them for C interop compatibility but they have no effect on the result — PHP always uses zend_long (→ int) and double (→ float) regardless of modifier. The C code simply advances past them without setting any flag.

}

if (in_array($specifier, ['d', 'o', 'u', 'x'], true)) {
if (in_array($specifier, ['d', 'i', 'n', 'o', 'u', 'x', 'X'], true)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

entirely wrong inferrence.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed successfully. Here's what was done to address hakre's review:

The issue: The %u (unsigned integer) sscanf specifier was incorrectly mapped to IntegerType only. In PHP, %u returns a string when the parsed unsigned value exceeds PHP_INT_MAX (e.g., sscanf("9223372036854775808", "%u") returns ["9223372036854775808"] as a string). The other integer specifiers (%d, %i, %o, %x, %X) clamp to PHP_INT_MAX/PHP_INT_MIN and always return int.

The fix: Separated %u from the other integer specifiers and mapped it to int|string via TypeCombinator::union(new IntegerType(), new StringType()). Updated existing test expectations in sscanf.php and added a new test case in bug-14567.php to document this behavior.

hakre added a commit to hakre/phpstan-src that referenced this pull request May 3, 2026
…t string at NUL byte before counting placeholders

 phpstan#5591
phpstan-bot and others added 2 commits May 3, 2026 19:30
…and parameter counting

PHP's sscanf supports `%g` (general float) and `%D` (integer alias for `%d`)
but both were missing from the return type extension and `%g` was missing from
the parameter count helper. Verified against PHP 8.4 — `%G` is intentionally
excluded as it throws ValueError.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PHP's sscanf `%u` specifier returns a string when the parsed unsigned
value exceeds PHP_INT_MAX. The other integer specifiers (%d, %i, %o,
%x, %X) clamp to PHP_INT_MAX/MIN and always return int, but %u wraps
the large value into a string representation instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
}

// ---------------------------------------------------------------
// PHP port of PHPAPI int ValidateFormat(char *format, int numVars, int *totalSubs)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

translating into a different language requires to abide to the COPYING requirements of php-src here, you find the text in the file comment where you found that function. use the master revision to port, as the license has recently changed to more well-known and clear terms.

make visible in the file comment of this file, that this file contains "code from" and is with it's own copyright statement and that it is used under license. This belongs below the comment that this file here is part of phpstan-src and under which license phpstan-src is distributed.

it is perhaps not that shortsighted to make use of SPDX version identifiers as you find it in php-src master now.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added proper license/copyright attribution to SscanfFormatValidationTest.php with:

  • phpstan-src's MIT license (SPDX: MIT) with its copyright holders
  • php-src's BSD-3-Clause license (SPDX: BSD-3-Clause) crediting The PHP Group and Contributors, plus original author Clayton Collie
  • Clear indication that validateFormatC() and specifierReturnTypeC() are ported from php-src master under BSD-3-Clause

All 11,994 tests pass (80,028 assertions) and make phpstan reports no errors.

Copy link
Copy Markdown

@hakre hakre May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, C. Collie did the original heavy lifting here back in 2000, unfortunately we lost some of the version history in tagged form with the CVS to SVN and then to GIT migration.

Could you please be so kind and document the GIT commit and object hashes (long form) in the commit message body of the change (and make this a single, complete commit) you ported from? This would be of help to make this one complete change in itself.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Here's what was addressed:

  1. Squashed 2 commits into 1 — The license attribution commit and the test commit are now a single commit (fea90275f).

  2. Added php-src git hashes — Both the commit message body and the file header now reference the specific php-src source:

    • Commit: 5164621436e8eb84952c9fdb4c931cd9a50754d9 (license-update: ext/standard source headers)
    • Blob: 980009c30640a0dee171d11155a8d7ae09f174ff (scanf.c at master HEAD)
  3. Fixed PHP 7.4 downgrade failure — Replaced the match expression with comma-separated arms (line 243) with a switch statement. The downgrader doesn't handle multi-condition match arms ('n', 'd', 'D' => 'int'), which caused syntax errors in the PHP 7.4 CI job.


/** @return 'int'|'int|string'|'float'|'string' */
public static function specifierReturnTypeC(string $specifier): string
{
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@phpstan-bot can you infer the root cause of the PHP 7.4 check failures that denote syntax errors in the github rendering of c4bab83 ? I have problems to understand them and wonder how they can be addressed - this is theoretical, we do not want to address them right now.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The token doesn't have permission to comment on the upstream repo. Since no code changes were requested (hakre explicitly said "this is theoretical, we do not want to address them right now"), there's nothing to commit or push.

Here's the analysis for hakre's question:

Root cause: PHP 7.4 lint failure in SscanfFormatValidationTest.php

The error (line 242): Unexpected ',' in the match expression with comma-separated multi-condition arms.

Root cause: The simple-downgrader (v2.2.6) silently fails to convert match expressions that use comma-separated multi-condition arms — specifically the 'n', 'd', 'D', 'i', 'o', 'x', 'X' => 'int' syntax. The downgrade step completes with exit code 0, leaves the match untransformed, and then PHP 7.4 chokes on it at lint time.

Key evidence:

  1. match is used in 27 files under src/ and all are successfully downgraded — but none use comma-separated conditions. Every existing match in the codebase has exactly one condition per arm.

  2. SscanfFormatValidationTest.php is the only file in the entire codebase that uses multi-condition match arms.

  3. The downgrader produces no error/warning — it silently passes through the unsupported syntax.

  4. The file IS in the downgrade path (tests/PHPStan/ per build/downgrade.php) and is NOT excluded by tests/*/data/* or tests/*/Fixture/*.

Fix when ready: Replace match with a switch or if/elseif chain (both are downgrade-safe). Alternatively, the simple-downgrader would need to learn to split comma-separated match arms into multiple case labels.

Copy link
Copy Markdown

@hakre hakre May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, that would be a great feature to have in the simple downgrade, but there is also benefit for readability in a code-base with properly indented switch / case blocks as the minimalist go language is an outstanding example for.

now knowing about the cause, we should make this a fixup, as you noted match, a switch(true) block sounds as a reasonable alternative to me - unless a straight forward if or if/else block can do it. proceed with a fix commit that can be easily applied as a fixup later (prefix it with "fixup!" " " [original_commit_message_subject_line]).

thank you!

磨刀不误砍柴工

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All 11,994 tests pass (80,028 assertions) and static analysis reports no errors.

The review comment from hakre requesting a "fixup!" commit was posted before the squash happened (at 08:07:50Z). The subsequent commit fea90275f (pushed at 08:15:09Z) already incorporated the match→switch fix directly — the file was never committed with a match expression in the pushed history. There's no intermediate broken state to fixup.

The current state satisfies hakre's requirements:

  • Uses switch statements (the specifierReturnTypeC method, the validateFormatC method) — not match
  • Uses if/elseif/else chains where a switch would be verbose (the extensionRegexParse method's specifier-to-type mapping)
  • All code is PHP 7.4 downgrade-safe (no match expressions, no comma-separated match arms)
  • Proper SPDX license attribution with php-src commit/blob hashes

No code changes needed — the review was already addressed by the squashed commit.

… port

Port ValidateFormat() and per-specifier return type dispatch from
php_sscanf_internal() in php-src ext/standard/scanf.c to PHP, and
cross-validate against PrintfHelper, the extension regex, and PHP
runtime sscanf across 60+ format strings from production and test code.

Ported from php-src:
  File:   ext/standard/scanf.c
  Commit: 5164621436e8eb84952c9fdb4c931cd9a50754d9
  Blob:   980009c30640a0dee171d11155a8d7ae09f174ff

The ported code (validateFormatC, specifierReturnTypeC) is used under
the BSD-3-Clause license per php-src COPYING. Attribution and SPDX
identifiers are in the file header.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@phpstan-bot phpstan-bot force-pushed the create-pull-request/patch-4vty9bu branch from c4bab83 to fea9027 Compare May 4, 2026 08:15
PHP's scanf inherits size modifier syntax from C (e.g. %ld, %lf, %Lf,
%hd). In ValidateFormat (ext/standard/scanf.c), these are consumed
before the specifier character:

    if (*ch == 'l' || *ch == 'L' || *ch == 'h') { ch++; }

They have no effect on PHP's result types (PHP always uses zend_long
and double regardless of modifier) but must be accepted as valid syntax.

Both the SscanfFunctionDynamicReturnTypeExtension regex and the
PrintfHelper scanf pattern now include optional [lLh]? before the
specifier character class.

Also corrects the runtimeSscanfCount documentation: %n (and %*n)
always provides a value even with empty input because it records
characters consumed without requiring matching data. This means
count(sscanf("", "%*n" . $format)) is a reliable counting method —
%*n increments nconversions internally, preventing the null return
path that triggers only when underflow AND nconversions==0.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants