Skip to content

MDEV-39933 JSON_NORMALIZE returns wrong result for invalid JSON with …#5242

Open
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39933
Open

MDEV-39933 JSON_NORMALIZE returns wrong result for invalid JSON with …#5242
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39933

Conversation

@akshatnehra

Copy link
Copy Markdown
Contributor

Description

This PR fixes MDEV-39933 where JSON_NORMALIZE() incorrectly returns a normalized result instead of NULL for invalid JSON containing embedded NUL bytes after charset conversion.

json_normalize() uses strlen() to determine the converted string length, but strlen() stops at the first NUL byte. This silently truncates the string to the valid JSON prefix, so JSON_NORMALIZE sees only {"a":1} instead of {"a":1}\0junk — returning a normalized result for what JSON_VALID correctly rejects.

The fix involves:

  1. Using my_convert()'s return value (actual bytes written) instead of strlen(), which correctly accounts for embedded NUL bytes.

How can this PR be tested?

Run the MTR test:

build/mysql-test/mtr main.mdev_39933

Or manually:

-- JSON_VALID correctly says this is invalid
SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));
-- Returns: 0

-- JSON_NORMALIZE should agree (return NULL for invalid JSON)
SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));

Results from my testing

  1. Before changes
MariaDB> SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) AS result;
+------------+
| result     |
+------------+
| {"a":1.0E0}|
+------------+

MariaDB> SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) AS valid;
+-------+
| valid |
+-------+
|     0 |
+-------+

JSON_VALID says it's invalid but JSON_NORMALIZE returns a result — inconsistent.

  1. After changes
MariaDB> SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) AS result;
+--------+
| result |
+--------+
| NULL   |
+--------+

MariaDB> SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) AS valid;
+-------+
| valid |
+-------+
|     0 |
+-------+

Both functions now agree: invalid JSON returns NULL/0.

Affected versions

Reproduced on 10.11, 11.4, 11.8, and 12.3 (as confirmed in Jira comments). Lowest checked version: 10.11 with commit b89cb5d.

Reproducer:

SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));

Basing the PR against the correct MariaDB version

  • This is a bug fix and the PR is based against the latest MariaDB development branch.

PR quality check

  • I have checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.

  • For any trivial modifications to the PR, I am fine with the reviewer making the changes themselves.

…embedded NUL

json_normalize() uses strlen() to determine the length of the charset-
converted string, but strlen() stops at the first NUL byte. When the
input contains embedded NUL bytes after charset conversion (e.g., from
latin1 to utf8mb4), the converted string is silently truncated to the
valid JSON prefix, causing JSON_NORMALIZE to return a normalized result
instead of NULL.

Fix: use the return value of my_convert() as the actual converted
length instead of strlen(). my_convert() returns the number of bytes
written, which correctly accounts for embedded NUL bytes.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue in json_normalize where invalid JSON data containing embedded NUL bytes or trailing junk was incorrectly handled. By using the return value of my_convert instead of strlen to determine the size of the converted UTF-8 string, the function now correctly identifies and rejects invalid JSON. Test cases have been added to verify this behavior. I have no feedback to provide as there are no review comments.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 16, 2026
@gkodinov gkodinov self-assigned this Jun 16, 2026

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

This is a bug fix. Please re-base on 10.11.
I also have some further suggestions below.

Comment thread strings/json_normalize.c
@@ -1030,15 +1030,14 @@ json_normalize(DYNAMIC_STRING *result,
if (!s_utf8)
return 1;
memset(s_utf8, 0x00, in_size);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while you're at it, can you please remove this memset and replace it with a s_utf8[in_size] = 0; after the my_convert call please?

# MDEV-39933: Incorrect result of JSON_NORMALIZE on invalid json data
#
# JSON with trailing literal junk (no NUL) - should return NULL
SELECT JSON_NORMALIZE('{"a":1}0junk');

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really reproducing anything? I suppose not.
In my unpatched server:

MariaDB [(none)]> SELECT JSON_NORMALIZE('{"a":1}0junk');
+--------------------------------+
| JSON_NORMALIZE('{"a":1}0junk') |
+--------------------------------+
| NULL                           |
+--------------------------------+
1 row in set (0.002 sec)

JSON_NORMALIZE('{"a":1}0junk')
NULL
# JSON_VALID correctly rejects JSON with embedded NUL byte + trailing junk
SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this one too:

MariaDB [(none)]> SELECT JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1));
+----------------------------------------------------------------------+
| JSON_VALID(CONVERT(CONCAT('{"a":1}', CHAR(0), 'junk') USING latin1)) |
+----------------------------------------------------------------------+
|                                                                    0 |
+----------------------------------------------------------------------+
1 row in set (0.003 sec)

JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0)) USING latin1))
NULL
# Valid JSON through conversion should still work
SELECT JSON_NORMALIZE(CONVERT('{"a":1}' USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to re-test that IMHO

JSON_NORMALIZE(CONVERT('{"a":1}' USING latin1))
{"a":1.0E0}
# Multi-byte source charset (utf16)
SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'x') USING utf16));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit too much and is not touching on any of the changed lines.

JSON_NORMALIZE(CONVERT(CONCAT('{"a":1}', CHAR(0), 'x') USING utf16))
NULL
# NUL embedded inside a JSON string value
SELECT JSON_NORMALIZE(CONVERT(CONCAT('{"a":"b', CHAR(0), 'c"}') USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also this. too much IMHO

JSON_NORMALIZE(CONVERT(CONCAT('[1,2,3]', CHAR(0), 'x') USING latin1))
NULL
# Nested object - valid through conversion
SELECT JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this testing?

JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1))
{"a":{"b":1.0E0}}
# Empty string
SELECT JSON_NORMALIZE(CONVERT('' USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this testing?

SELECT JSON_NORMALIZE(CONVERT('{"a":{"b":1}}' USING latin1));

--echo # Empty string
SELECT JSON_NORMALIZE(CONVERT('' USING latin1));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we typically end files with :

--echo End of 10.11 tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

2 participants