Skip to content

MDEV-39970 Fix JSON_KEYS failing to deduplicate keys with escaped cha…#5245

Open
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39970
Open

MDEV-39970 Fix JSON_KEYS failing to deduplicate keys with escaped cha…#5245
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39970

Conversation

@akshatnehra

Copy link
Copy Markdown
Contributor

Description

This PR fixes MDEV-39970 where JSON_KEYS() fails to deduplicate keys that contain escaped characters (like \" or \\).

check_key_in_list() compares keys byte-by-byte using " as the end-of-key delimiter. When keys contain escape sequences (e.g., a\"b), the \" inside the key is misidentified as the closing delimiter, causing the comparison to mismatch and identical keys to appear as duplicates in the output.

The fix involves:

  1. Handling escape sequences in the comparison loop — when \ is encountered, the next character is consumed as part of the escape sequence rather than checked as a delimiter
  2. Applying the same escape-awareness to the skip-to-end loop that advances past non-matching keys

How can this PR be tested?

Run the MTR test:

build/mysql-test/mtr main.mdev_39970

Or manually:

-- Input: {"a\"b":1,"a\"b":2} — same key twice with escaped quote
SET @j=CONVERT(X'7B22615C2262223A312C22615C2262223A327D' USING utf8mb4);
SELECT JSON_KEYS(@j);
-- Expected: ["a\"b"]  (deduplicated to one key)

Results from my testing

  1. Before changes
MariaDB> SET @j=CONVERT(X'7B22615C2262223A312C22615C2262223A327D' USING utf8mb4);
MariaDB> SELECT @j AS input;
+-------------------------+
| input                   |
+-------------------------+
| {"a\"b":1,"a\"b":2}    |
+-------------------------+

MariaDB> SELECT JSON_KEYS(@j) AS result;
+--------------------+
| result             |
+--------------------+
| ["a\"b", "a\"b"]   |
+--------------------+

Duplicate key a"b appears twice — deduplication failed.

  1. After changes
MariaDB> SELECT JSON_KEYS(@j) AS result;
+-----------+
| result    |
+-----------+
| ["a\"b"]  |
+-----------+

Duplicate key correctly deduplicated to one entry.

Basing the PR against the correct MariaDB version

  • This is a bug fix and the PR is based against the latest MariaDB development branch.

PR quality check

  • I have checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am fine with the reviewer making the changes themselves.

…racters

check_key_in_list() compares keys byte-by-byte looking for "
as the key delimiter. When keys contain escaped characters (like
\" or \\), the unescaped " inside the key is misidentified as
the end-of-key delimiter, causing false mismatches and failing to
deduplicate identical keys.

Fix: handle escape sequences in both the comparison loop and the
skip-to-end loop. When a backslash is encountered, consume it along
with the following character as a single escaped unit, preventing
the escaped character from being mistaken for a delimiter.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses MDEV-39970, fixing an issue where JSON_KEYS fails to deduplicate keys containing escaped characters. It updates the key comparison logic in check_key_in_list within sql/item_jsonfunc.cc to correctly handle backslashes and escaped quotes, and adds comprehensive test cases. The review feedback highlights critical safety issues in the new logic, specifically pointing out potential out-of-bounds reads and false-positive matches during key comparison, as well as a potential infinite loop or out-of-bounds read when skipping non-matching keys in malformed JSON strings. Both comments suggest adding robust bounds checks to prevent these issues.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/item_jsonfunc.cc
Comment on lines +4129 to 4144
for (n_char=0; n_char < key_len; n_char++)
{
if (c[n_char] == '\\')
{
if (key[n_char] != '\\')
break;
n_char++;
if (n_char >= key_len || c[n_char] != key[n_char])
break;
continue;
}
if (c[n_char] == '"')
break;
if (c[n_char] != key[n_char])
break;
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There are two issues here:

  1. Out-of-bounds read: If key_len is larger than the remaining characters in c (i.e., c + n_char > end), accessing c[n_char] can lead to an out-of-bounds read.
  2. False positive match: If key ends with a backslash (e.g., a\) and c has an escaped character at that position (e.g., a\"), n_char is incremented to key_len and we break. However, since c[n_char] is " (the escaped quote), the post-loop check n_char == key_len && c[n_char] == '"' succeeds, leading to a false positive match.

We should add bounds checks and decrement n_char on premature break to prevent false matches.

    for (n_char=0; n_char < key_len; n_char++)
    {
      if (c + n_char >= end)
        break;
      if (c[n_char] == '\\')
      {
        if (key[n_char] != '\\')
          break;
        n_char++;
        if (n_char >= key_len || c + n_char >= end || c[n_char] != key[n_char])
        {
          n_char--;
          break;
        }
        continue;
      }
      if (c[n_char] == '"')
        break;
      if (c[n_char] != key[n_char])
        break;
    }

Comment thread sql/item_jsonfunc.cc
Comment on lines 4150 to 4158
else
{
while (c[n_char] != '"')
{
if (c[n_char] == '\\')
n_char++;
n_char++;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If the JSON string in res is somehow malformed or truncated, the loop while (c[n_char] != '"') can increment n_char past end, leading to an out-of-bounds read or infinite loop. We should add a bounds check c + n_char < end to ensure safety.

    else
    {
      while (c + n_char < end && c[n_char] != '"')
      {
        if (c[n_char] == '\\')
        {
          n_char++;
          if (c + n_char >= end)
            break;
        }
        n_char++;
      }
    }

@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 16, 2026
@gkodinov gkodinov self-assigned this Jun 16, 2026

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

This needs a LOT more test cases covering various combinations. One such is mentioned below, but there's more to be sure.
Unless you're willing to adjust your ambition level, as suggested below.

Comment thread sql/item_jsonfunc.cc
while (c < end)
{
int n_char;
for (n_char=0; c[n_char] != '"' && n_char < key_len; n_char++)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this feels like a hack: it's not using the official json_read_keyname_chr() call to read a char from a string. And it's incomplete: see json_handle_esc for what is being missed: e.g. \u vs normal char etc.

I believe that a proper solution should be that a normalized set of search strings must be built. And that the incoming strings need to be normalized before being searched. If we want to preserve de-duplication that is. Read more on that below please.

The deduplication is not an official documented feature of JSON_KEYS. It's not mentioned neither in the MariaDB docs nor in the MySQL docs.

Theoretically, duplicate keys in a JSON object, i.e. {"a": 1, "a":2} is not valid JSON. So I honestly do not know why is the attempt made of processing that here.

I'd just extract the keys and be done with it: no duplicate checks. Why would one function check for invalid JSON in their arguments and the others wont?

This probably is something for the final reviewers to worry about though. The above is just my honest opinion.

Comment thread sql/item_jsonfunc.cc
{
if (c[n_char] == '\\')
{
if (key[n_char] != '\\')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is a double quote char a "break"? It usually is considered a literal quote.
Please try {"a\\\\b": 1, "a\\\\c": 2}. I'd be surprised if it worked correctly.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forgot to mention: this needs to be fixed in all affected versions (it's a bug). So please rebase to 10.11

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

2 participants