MDEV-39933 JSON_NORMALIZE returns wrong result for invalid JSON with …#5242
MDEV-39933 JSON_NORMALIZE returns wrong result for invalid JSON with …#5242akshatnehra wants to merge 1 commit into
Conversation
…embedded NUL json_normalize() uses strlen() to determine the length of the charset- converted string, but strlen() stops at the first NUL byte. When the input contains embedded NUL bytes after charset conversion (e.g., from latin1 to utf8mb4), the converted string is silently truncated to the valid JSON prefix, causing JSON_NORMALIZE to return a normalized result instead of NULL. Fix: use the return value of my_convert() as the actual converted length instead of strlen(). my_convert() returns the number of bytes written, which correctly accounts for embedded NUL bytes. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
There was a problem hiding this comment.
Code Review
This pull request fixes an issue in json_normalize where invalid JSON data containing embedded NUL bytes or trailing junk was incorrectly handled. By using the return value of my_convert instead of strlen to determine the size of the converted UTF-8 string, the function now correctly identifies and rejects invalid JSON. Test cases have been added to verify this behavior. I have no feedback to provide as there are no review comments.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution! This is a preliminary review.
This is a bug fix. Please re-base on 10.11.
I also have some further suggestions below.
| @@ -1030,15 +1030,14 @@ json_normalize(DYNAMIC_STRING *result, | |||
| if (!s_utf8) | |||
| return 1; | |||
| memset(s_utf8, 0x00, in_size); | |||
There was a problem hiding this comment.
while you're at it, can you please remove this memset and replace it with a s_utf8[in_size] = 0; after the my_convert call please?
| # MDEV-39933: Incorrect result of JSON_NORMALIZE on invalid json data | ||
| # | ||
| # JSON with trailing literal junk (no NUL) - should return NULL | ||
| SELECT JSON_NORMALIZE('{"a":1}0junk'); |
There was a problem hiding this comment.
is this really reproducing anything? I suppose not.
In my unpatched server:
MariaDB [(none)]> SELECT JSON_NORMALIZE('{"a":1}0junk');
+--------------------------------+
| JSON_NORMALIZE('{"a":1}0junk') |
+--------------------------------+
| NULL |
+--------------------------------+
1 row in set (0.002 sec)
| JSON_NORMALIZE('{"a":1}0junk') | ||
| NULL | ||
| # JSON_VALID correctly rejects JSON with embedded NUL byte + trailing junk | ||
| SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)); |
There was a problem hiding this comment.
this one too:
MariaDB [(none)]> SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));
+----------------------------------------------------------------------+
| JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) |
+----------------------------------------------------------------------+
| 0 |
+----------------------------------------------------------------------+
1 row in set (0.003 sec)
| JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0)) USING latin1)) | ||
| NULL | ||
| # Valid JSON through conversion should still work | ||
| SELECT JSON_NORMALIZE(CONVERT('{"a":1}' USING latin1)); |
| JSON_NORMALIZE(CONVERT('{"a":1}' USING latin1)) | ||
| {"a":1.0E0} | ||
| # Multi-byte source charset (utf16) | ||
| SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'x') USING utf16)); |
There was a problem hiding this comment.
this is a bit too much and is not touching on any of the changed lines.
| JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'x') USING utf16)) | ||
| NULL | ||
| # NUL embedded inside a JSON string value | ||
| SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":"b', CHAR(0), 'c"}') USING latin1)); |
| JSON_NORMALIZE(CONVERT(CONCAT('[1,2,3]', CHAR(0), 'x') USING latin1)) | ||
| NULL | ||
| # Nested object - valid through conversion | ||
| SELECT JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1)); |
| JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1)) | ||
| {"a":{"b":1.0E0}} | ||
| # Empty string | ||
| SELECT JSON_NORMALIZE(CONVERT('' USING latin1)); |
| SELECT JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1)); | ||
|
|
||
| --echo # Empty string | ||
| SELECT JSON_NORMALIZE(CONVERT('' USING latin1)); |
There was a problem hiding this comment.
we typically end files with :
--echo End of 10.11 tests.
Description
This PR fixes MDEV-39933 where
JSON_NORMALIZE()incorrectly returns a normalized result instead of NULL for invalid JSON containing embedded NUL bytes after charset conversion.json_normalize()usesstrlen()to determine the converted string length, butstrlen()stops at the first NUL byte. This silently truncates the string to the valid JSON prefix, soJSON_NORMALIZEsees only{"a":1}instead of{"a":1}\0junk— returning a normalized result for whatJSON_VALIDcorrectly rejects.The fix involves:
my_convert()'s return value (actual bytes written) instead ofstrlen(), which correctly accounts for embedded NUL bytes.How can this PR be tested?
Run the MTR test:
Or manually:
Results from my testing
JSON_VALIDsays it's invalid butJSON_NORMALIZEreturns a result — inconsistent.Both functions now agree: invalid JSON returns NULL/0.
Affected versions
Reproduced on 10.11, 11.4, 11.8, and 12.3 (as confirmed in Jira comments). Lowest checked version: 10.11 with commit b89cb5d.
Reproducer:
Basing the PR against the correct MariaDB version
PR quality check
I have checked the
CODING_STANDARDS.mdfile and my PR conforms to this where appropriate.For any trivial modifications to the PR, I am fine with the reviewer making the changes themselves.