Conversation
Yep. That's why I toyed with this idea. Your implementation is extremely close to what I had in mind. Looks pretty good at first glance, but I'll need to find some block of time to review this carefully. Thanks a lot! |
|
Great! Could you share benchmark results with/without this? We can use https://github.com/ruby/json/tree/master/benchmark . |
There was a problem hiding this comment.
Pull request overview
This PR refactors the C extension JSON parser from a recursive-descent implementation to an iterative state machine driven by an explicit “frame stack” of container parse states. This is foundational work toward streaming-style parsing and also avoids C call-stack exhaustion when parsing very deeply nested JSON with max_nesting disabled.
Changes:
- Introduces a new
json_frame_stack(with spill-to-heap and Ruby TypedData lifecycle management) to track container parsing state without recursion. - Replaces the recursive
json_parse_anyimplementation with an iterative loop using per-frame “phases” (VALUE/KEY/COLON/COMMA/DONE) to drive parsing. - Wires the frame stack into
JSON_ParserStateand initializes/cleans it up incParser_parse.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // resume purely from the frame stack. A JSON_FRAME_ROOT frame sits at the | ||
| // bottom of the stack, so the stack is never empty mid-parse and the document | ||
| // itself is just another frame whose value, once parsed, leaves its phase DONE. | ||
| static VALUE json_parse_any(JSON_ParserState *state, JSON_ParserConfig *config) |
|
At least on my machine, there is a 5-15% regression on most benchmarks. But we might be able to reclaim some of that. |
|
By introducing just a couple "computed gotos," the one benchmark that got hit the most can be made 15% faster: 3f15d4d And it's now on par with master: And that's just a quick hack, I couldn't add a computed goto for the main issue in that benchmark which is I think if we split the This also makes |
|
After the last change, most benchmarks are now on par if not a little bit faster. A few are still a little bit slower but I'll see what I can do. Also There's always a bit of variance, so 1.04x might not be very significative: |
|
I just realized I made a mistake when benchmarking the initial version, the regression was 5-15% across the board. I updated my previous comment. |
|
Well that's the last time I try to benchmark things on my machine, lol. I genuinely thought they were about equal, sorry about that. |
|
😂 no worries. To be fair I never took the time to cleanup and commit the benchmark harness I'm using. |
|
Ok, I can no longer measure a substantial difference. Event he benchmark that showed a few % slower don't do so reliably. I'm gonna read the diff a few more time but overall I think this is good to go. |
As opposed to a recursive loop. We do this by keeping a stack of frames (very similar to how the stack of values was already stored). Each frame represents the state of a container. Since there are only 2 in JSON, it doesn't have to get too complex.
Each frame in the iterative parser now holds an enum describing its "phase", in order to support suspending parsing.
JSON_PHASE_ARRAY_COMMA and JSON_PHASE_OBJECT_COMMA Allows to remove one conditional.
Saves having to go through the dispatch loop again.
Take less arguments so it's easier to read.
Switch the recursive descent to an iterative algorithm by keeping a stack of "frames". Each frame represents parsing a container (the points at which you would recurse). Within frames there are "phases"; each phase maps to effectively a token that its looking for next.
This does not get all of the way to streaming parsing (that would require partial-token support), but it gets most of the way there, and is all required pre-work.
On my machine there's no noticeable slow-down. (There's also no noticeable speed-up.) The only really noticeable side-effect is that if you pass
max_nesting: false, this will not crash anymore from running out of stack space.