MDEV-39999 Fix JSON path quoting for keys with special characters#5240
MDEV-39999 Fix JSON path quoting for keys with special characters#5240akshatnehra wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-39999 by quoting keys containing special characters in JSON paths generated by JSON_SEARCH. The reviewer feedback points out that the blacklist-based approach used to detect keys requiring quoting is prone to missing edge cases, such as empty keys, keys starting with a digit, or other special characters like hyphens. It is recommended to adopt a whitelist-based approach to ensure only safe alphanumeric identifiers starting with a letter or underscore are left unquoted.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| bool needs_quote= false; | ||
| for (size_t i= 0; i < k_len; i++) | ||
| { | ||
| if (k[i] == '.' || k[i] == '[' || k[i] == ']' || | ||
| k[i] == '"' || k[i] == ' ' || k[i] == '*' || k[i] == '\\') | ||
| { | ||
| needs_quote= true; | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
The current blacklist-based approach for detecting keys that need quoting is prone to missing several edge cases:
- Empty keys (
k_len == 0): Currently, an empty key is not quoted, resulting in the invalid JSON path$.(as seen in the test results:"$."). It should be quoted as$."". - Keys starting with a digit: Keys starting with a digit (e.g.,
1abc) are not quoted, which is invalid in standard JSON path syntax. - Other special characters: Characters like hyphens (
-), slashes (/), or@are not in the blacklist and will be left unquoted, producing invalid paths (e.g.,$.a-b).
Using a whitelist approach (only allowing safe alphanumeric/underscore identifiers starting with a letter or underscore) is much more robust and correctly handles all these cases.
bool needs_quote= (k_len == 0);
if (!needs_quote)
{
char first= k[0];
if (!((first >= 'a' && first <= 'z') ||
(first >= 'A' && first <= 'Z') ||
first == '_'))
{
needs_quote= true;
}
else
{
for (size_t i= 1; i < k_len; i++)
{
char ch= k[i];
if (!((ch >= 'a' && ch <= 'z') ||
(ch >= 'A' && ch <= 'Z') ||
(ch >= '0' && ch <= '9') ||
ch == '_'))
{
needs_quote= true;
break;
}
}
}
}append_json_path() emits keys containing path-ambiguous characters (dot, brackets, quotes, space, asterisk, backslash) without quoting, producing paths that cannot be parsed back correctly. For example, a key "a.b" is emitted as $.a.b which is interpreted as key "a" then key "b". Fix: detect keys containing special characters and emit them as quoted identifiers (e.g. $."a.b"). Backslash and double-quote inside keys are escaped with a preceding backslash. All new code of the whole pull request, including one or several files that are either new files or modified ones, are contributed under the BSD-new license. I am contributing on behalf of my employer Amazon Web Services, Inc.
gkodinov
left a comment
There was a problem hiding this comment.
Thank you for your contribution! This is a preliminary review.
Please consider fixing the following (in addition to the notes below):
- rebase on 10.11: this is a bug fix. It needs to go to the lowest affected GA version
- fix the buildbot failures. I can currently see just one:
main.func_json w14 [ fail ]
Test ended at 2026-06-15 19:52:27
CURRENT_TEST: main.func_json
--- /home/buildbot/aarch64-debian-11/build/mysql-test/main/func_json.result 2026-06-15 19:33:34.000000000 +0000
+++ /home/buildbot/aarch64-debian-11/build/mysql-test/main/func_json.reject 2026-06-15 19:52:27.534594656 +0000
@@ -596,7 +596,7 @@
[]
SELECT JSON_search( '{"": "a"}', "one", 'a');
JSON_search( '{"": "a"}', "one", 'a')
-"$."
+"$.\"\""
select json_merge('{"a":"b"}', '{"a":"c"}') as ex ;
ex
{"a": ["b", "c"]}
Result length mismatch
- saving '/home/buildbot/aarch64-debian-11/build/mysql-test/var/14/log/main.func_json/' to '/home/buildbot/aarch64-debian-11/build/mysql-test/var/log/main.func_json/'
Retrying test main.func_json, attempt(2/3)...
| return TRUE; | ||
| if (needs_quote) | ||
| { | ||
| if (str->append('\\') || str->append('"')) |
There was a problem hiding this comment.
please consider using st_append_escaped() instead. See append_json_value_from_field(). And produce a "bracketed-selection" (see above).
| bool needs_quote= (k_len == 0); | ||
| for (size_t i= 0; i < k_len; i++) | ||
| { | ||
| if (k[i] == '.' || k[i] == '[' || k[i] == ']' || |
There was a problem hiding this comment.
it's more complex than this I'm afraid. https://www.rfc-editor.org/info/rfc9535/#segments-details defines what is a valid "shorthand" notation as follows:
child-segment = bracketed-selection /
("."
(wildcard-selector /
member-name-shorthand))
bracketed-selection = "[" S selector *(S "," S selector) S "]"
member-name-shorthand = name-first *name-char
name-first = ALPHA /
"_" /
%x80-D7FF /
; skip surrogate code points
%xE000-10FFFF
name-char = name-first / DIGIT
DIGIT = %x30-39 ; 0-9
ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
This is implemented by json_escape() in the MariaDB code I believe.
So here's what I would suggest: either always produce bracketed-selection (see above) or call st_append_escaped (or json_escape directly), see if the result is different from the source string and if it is, go with the bracketed expression that you have, otherwise do the unescaped source string.
This will have the disadvantage that you will always produce a bracketed escaped string. But I believe this is not such a big deal performance wise.
You could of course consider implementing a non-outputting version of json_escape() and call that json_needs_escaping() for example. But I believe it needs to be done along the same lines at least.
Description
This PR fixes MDEV-39999 where
JSON_SEARCH()and other functions usingappend_json_path()return unquoted paths for keys containing special characters, producing paths that cannot be parsed back.For example, a key
"a.b"is emitted as$.a.bwhich the JSON path parser interprets as key"a"followed by key"b"— making it impossible to round-tripJSON_SEARCH→JSON_VALUE.The fix involves:
.,[,],", space,*,\) inappend_json_path()$."a.b") with proper escaping of embedded double-quotes and backslashesHow can this PR be tested?
Run the MTR test:
Or verify the round-trip works:
Results from my testing
These paths are unparseable —
$.a.bmeans keyathen keyb, not keya.b.Paths are now properly quoted and can be used with
JSON_VALUE/JSON_EXTRACTto access the original key.Basing the PR against the correct MariaDB version
PR quality check
CODING_STANDARDS.mdfile and my PR conforms to this where appropriate.