Extract reusable functions for non-recursive external `$ref` schemas by jagould2012 · Pull Request #836 · fastify/fast-json-stringify

jagould2012 · 2026-04-09T11:24:35Z

Extract reusable functions for non-recursive external `$ref` schemas

Problem

When a non-recursive schema registered via $ref (with a $id) is referenced multiple times, fast-json-stringify inlines the full serialization code at every reference point. The functionsNamesBySchema cache — which correctly extracts reusable functions for recursive schemas — never activates for non-recursive schemas, regardless of how many times they appear.

This causes exponential code generation in real-world schemas where:

- Entities reference shared sub-entities via $ref (e.g. a Contact schema referenced by 10+ other schemas)
- anyOf/oneOf wraps $ref to express polymorphic fields (populated object OR string ID OR null)
- Sub-entities themselves contain oneOf variants with further $ref relationships

In production schemas, a single route serializer can generate 48+ MB of JavaScript code, with total output across all routes exceeding 800 MB.

Root Cause

In buildObject() (index.js), the function extraction path at line 571 only triggers when:

- recursivePaths.has(fullPath): the schema references itself (circular)
- buildingSet.has(schema): the schema is currently being built (re-entrant)

For non-recursive external schemas, neither condition is true. The code falls through to the inline path (line 593), which:

1. Adds the schema to buildingSet
2. Generates inline code
3. Removes the schema from buildingSet

The next time the same $ref target is encountered, buildingSet no longer contains it, so it's inlined again. The functionsNamesBySchema Map is never populated, so the cache check at line 560 always misses.

Before (current behavior)

```text
buildObject() called for schema with $id "contact.json"
  → functionsNamesBySchema.has(schema)?  NO (never populated for non-recursive)
  → recursivePaths.has(fullPath)?        NO (not recursive)
  → buildingSet.has(schema)?             NO (cleaned up after previous inline)
  → Falls through to INLINE path
  → Generates ~500 lines of serialization code
  → Removes from buildingSet

buildObject() called AGAIN for same "contact.json"
  → Same result: inlines again
  → And again, and again... (728 times in our test case)
```

Compounding with `anyOf`/`oneOf`

The problem is amplified by anyOf/oneOf processing in buildOneOf():

- Each anyOf option is merged with the parent schema via mergeLocations() (line 494).
- mergeLocations() clones the referenced schema and deletes its $id.
- The cloned schema is a new JavaScript object, so even if the original were somehow cached, the clone would miss.
- The mergedSchemasIds cache (line 1143) is keyed by the optionSchema object reference, but each { "$ref": "..." } at a different JSON position is a distinct object, so identical anyOf patterns at different properties never share cached merges.
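Why an identity-keyed cache misses here can be shown in isolation. The sketch below is not fast-json-stringify code: `expensiveMerge`, `mergeByIdentity`, and `mergeByContent` are hypothetical names. It assumes, as described above, that each `{ "$ref": "..." }` at a different JSON position is a distinct object, and contrasts a `Map` keyed by object reference with one keyed by a serialized (structural) key.

```javascript
'use strict'

// Hypothetical stand-in for an expensive merge step (e.g. merging an
// anyOf option with its parent schema). It only counts invocations.
let mergeCalls = 0
function expensiveMerge (option) {
  mergeCalls++
  return { merged: true, from: option.$ref }
}

// Cache keyed by object identity: structurally identical
// { $ref: 'contact.json' } literals are still different Map keys.
const byIdentity = new Map()
function mergeByIdentity (option) {
  if (!byIdentity.has(option)) byIdentity.set(option, expensiveMerge(option))
  return byIdentity.get(option)
}

// Cache keyed by content: identical $ref objects share one entry.
// JSON.stringify is used as a simple structural key for flat objects.
const byContent = new Map()
function mergeByContent (option) {
  const key = JSON.stringify(option)
  if (!byContent.has(key)) byContent.set(key, expensiveMerge(option))
  return byContent.get(key)
}

// Three properties, each holding its own (identical) $ref object.
const options = [
  { $ref: 'contact.json' },
  { $ref: 'contact.json' },
  { $ref: 'contact.json' }
]

mergeCalls = 0
options.forEach((option) => mergeByIdentity(option))
const identityCalls = mergeCalls

mergeCalls = 0
options.forEach((option) => mergeByContent(option))
const contentCalls = mergeCalls

console.log({ identityCalls, contentCalls }) // { identityCalls: 3, contentCalls: 1 }
```

A content-based key only illustrates the trade-off: `JSON.stringify` is sensitive to property order, and this PR takes a different route (extracting functions) rather than changing the merge cache.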

A schema with oneOf containing 3 object variants (each with nested anyOf refs) causes a 19x code size multiplier at every reference point. Combined with inlining, a single entity referenced 125 times across all schemas produces catastrophic output.
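The multiplicative growth can be made concrete with a small back-of-the-envelope model. Everything in it is hypothetical: the schema graph, the per-schema body sizes (in arbitrary "lines" of generated code), and a call-site cost of one line. It compares re-emitting every referenced body at each reference point against emitting each body once and calling it. It assumes an acyclic graph, since recursive schemas already take the extraction path (and `inlinedSize` would not terminate on a cycle).

```javascript
'use strict'

// Hypothetical schema graph: own body size plus the $ref targets used.
// Repeated targets mean multiple reference points to the same schema.
const graph = {
  'trip.json': { own: 50, refs: ['leader.json', 'member.json', 'contact.json'] },
  'leader.json': { own: 20, refs: ['contact.json', 'contact.json'] },
  'member.json': { own: 20, refs: ['contact.json', 'contact.json', 'contact.json'] },
  'contact.json': { own: 500, refs: [] }
}

// Inlining: every reference point re-emits the full (recursively inlined)
// body of its target, so sizes multiply along each reference path.
function inlinedSize (id) {
  const node = graph[id]
  return node.own + node.refs.reduce((sum, ref) => sum + inlinedSize(ref), 0)
}

// Extraction: each schema body is emitted once; each reference point
// costs only a call site (assumed to be one line).
const CALL_SITE_COST = 1
function extractedSize (rootId) {
  const seen = new Set()
  let total = 0
  const visit = (id) => {
    if (seen.has(id)) return
    seen.add(id)
    const node = graph[id]
    total += node.own + node.refs.length * CALL_SITE_COST
    node.refs.forEach(visit)
  }
  visit(rootId)
  return total
}

console.log(inlinedSize('trip.json'), extractedSize('trip.json')) // 3090 598
```

In this toy graph the contact body is emitted 6 times when inlined and once when extracted. Adding one more level that references `trip.json` several times would multiply the inlined figure again, while adding only a few call sites to the extracted one.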

Fix

Add a third condition to the function extraction check: extract a function when the schema has a schemaId (meaning it was resolved from an external $ref with a $id).

```diff
- if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema)) {
+ if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
```

This causes any externally registered schema (one added via the `schema` option or Fastify's `addSchema()`) to be extracted into a named function on first encounter and then reused via functionsNamesBySchema on subsequent encounters. This is the same mechanism that already works correctly for recursive schemas.
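The before/after behaviour can be reproduced with a toy model of the decision. This is a sketch under stated assumptions, not the fast-json-stringify source: the `refs` array, `buildChildren`, `emittedBodies`, and the `extractExternal` flag are hypothetical, and "emitting" a body just records its `schemaId`. Only the three-way condition and the roles of `functionsNamesBySchema`, `recursivePaths`, and `buildingSet` follow the description above.

```javascript
'use strict'

// Toy context mirroring the caches named in this PR.
function makeContext () {
  return {
    functionsNamesBySchema: new Map(), // schema -> extracted function name
    recursivePaths: new Set(), // fullPath strings known to be recursive
    buildingSet: new Set(), // schemas currently being built inline
    functionCount: 0,
    emittedBodies: [] // one entry (schemaId) per emitted schema body
  }
}

function buildObject (context, schema, schemaId, jsonPointer, extractExternal) {
  // Cache hit: reuse the previously extracted function.
  if (context.functionsNamesBySchema.has(schema)) {
    return `${context.functionsNamesBySchema.get(schema)}(value)`
  }
  const fullPath = `${schemaId}#${jsonPointer}`
  const extract =
    context.recursivePaths.has(fullPath) ||
    context.buildingSet.has(schema) ||
    (extractExternal && schemaId !== '') // the condition added by the fix

  if (extract) {
    // Register the name before building children so later hits reuse it.
    const name = `anonymous${context.functionCount++}`
    context.functionsNamesBySchema.set(schema, name)
    context.emittedBodies.push(schemaId)
    buildChildren(context, schema, extractExternal)
    return `${name}(value)`
  }

  // Inline path: mark as building, emit the body, then unmark.
  context.buildingSet.add(schema)
  context.emittedBodies.push(schemaId)
  buildChildren(context, schema, extractExternal)
  context.buildingSet.delete(schema)
  return '/* inlined */'
}

function buildChildren (context, schema, extractExternal) {
  for (const ref of schema.refs || []) {
    buildObject(context, ref.schema, ref.schemaId, '', extractExternal)
  }
}

// A parent (anonymous, schemaId '') referencing contact.json three times.
const contact = { refs: [] }
const parent = {
  refs: [
    { schemaId: 'contact.json', schema: contact },
    { schemaId: 'contact.json', schema: contact },
    { schemaId: 'contact.json', schema: contact }
  ]
}

function countContactBodies (extractExternal) {
  const context = makeContext()
  buildObject(context, parent, '', '', extractExternal)
  return context.emittedBodies.filter((id) => id === 'contact.json').length
}

console.log('before:', countContactBodies(false)) // before: 3
console.log('after:', countContactBodies(true)) // after: 1
```

Without the extra condition, `buildingSet` is emptied after each inline build, so every reference re-enters the inline path; with it, the first encounter populates `functionsNamesBySchema` and the remaining references hit the cache.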

Why `schemaId` is the right signal

- Schemas resolved from an external $ref always have a schemaId (their $id value).
- Inline anonymous schemas (defined directly in properties) have a schemaId of ''; these continue to be inlined as before.
- Merged schemas from anyOf/oneOf get synthetic __fjs_merged_* IDs, which also trigger extraction. This is beneficial because merged schemas are the most expensive to regenerate.

Why this is safe

The extracted function path (lines 572-590) is already battle-tested — it's the same code path used for every recursive schema today. The only change is that more schemas enter this path. The generated function has identical semantics to the inline code:

```javascript
// Extracted function (already existing pattern)
function anonymous0 (input) {
    // Honour toJSON() the same way the inline code does
    const obj = (input && typeof input.toJSON === 'function') ? input.toJSON() : input
    // JSON_STR_EMPTY_OBJECT is a constant defined in the generated code's scope
    if (obj === null) return JSON_STR_EMPTY_OBJECT
    let json = ''
    // ... buildInnerObject() output, identical to the inline version
    return json
}

// Call site: json += anonymous0(value)
// vs inline: const obj_N = ...; if (obj_N === null) { ... } else { ... }
```

Test Results

Tested against a production Fastify service with 90 JSON schemas (35 unique $ref targets, 252 total $ref usages, 250 anyOf, 6 oneOf).

Single complex entity serializer (8-10 relationship fields, 4-5 levels deep)

| Metric | Before | After | Change |
|---|---|---|---|
| Generated code size | 48.2 MB | 36 KB | 99.9% reduction |
| Compile time | 373 ms | 71 ms | 81% faster |
| `firstName` inlined copies | 728 | 0 | eliminated |
| Extracted functions | 0 | 1+ per shared schema | reuse working |

All 45 read schemas (simulating GET /:id routes)

| Metric | Before | After |
|---|---|---|
| Total generated code | ~868 MB | 700 KB |
| Failures | 0 | 0 |
| Total compile time | n/a | 700 ms |

List wrapper schemas (simulating GET / routes with pagination)

| Metric | After |
|---|---|
| Total generated code (45 schemas) | 87 KB |
| Failures | 0 |

Functional correctness

All serialization tests pass with the patch, including:

- Direct $ref: populated objects serialize correctly
- anyOf with $ref: polymorphic fields (populated object, string ID, null) all serialize correctly
- Nested oneOf: discriminated union variants serialize correctly
- Deeply nested references: entity → sub-entity → sub-sub-entity chains work
- Mixed arrays: arrays containing both populated objects and string IDs
- Null handling: null values in optional anyOf fields

Functional test used

```javascript
const assert = require('node:assert');

// `serializer` is the compiled fast-json-stringify function for the Trip
// schema, which is not shown here.
const populatedTrip = {
    _id: "trip-1",
    name: "Test Trip",
    type: "resort",
    status: "open",
    leaders: [
        {
            _id: "leader-1",
            contact: {
                _id: "contact-1",
                firstName: "John",
                lastName: "Smith",
                type: { key: "Staff", employeeId: "EMP-001" }
            },
            role: "trip_leader"
        },
        "leader-id-2",   // string ID ref
        null              // null ref
    ],
    members: [
        {
            contact: {
                _id: "contact-2",
                firstName: "Jane",
                lastName: "Doe",
                type: { key: "Customer", totalDives: 50 }
            },
            roommate: "contact-id-3",  // string ID ref
            status: "confirmed"
        }
    ],
    accommodations: [
        {
            _id: "acc-1",
            name: "Reef Resort",
            rooms: [{
                roomNumber: "101",
                occupants: [
                    { _id: "c-2", firstName: "Jane", lastName: "Doe" },
                    "contact-id-3"  // mixed array: object + string
                ]
            }]
        }
    ],
    schedule: null,
    notes: ["note-id-1"],
    payments: [],
    companyId: "company-1"
};

const result = serializer(populatedTrip);
const parsed = JSON.parse(result);

// All assertions pass:
assert.strictEqual(parsed.leaders[0].contact.firstName, "John");
assert.strictEqual(parsed.leaders[1], "leader-id-2");
assert.strictEqual(parsed.leaders[2], null);
assert.strictEqual(parsed.members[0].contact.firstName, "Jane");
assert.strictEqual(parsed.accommodations[0].rooms[0].occupants[0].firstName, "Jane");
assert.strictEqual(parsed.accommodations[0].rooms[0].occupants[1], "contact-id-3");
assert.strictEqual(parsed.schedule, null);
```

Reproduction

Minimal script to demonstrate the issue and the fix:

```javascript
const fjs = require('fast-json-stringify')

const contactSchema = {
    $id: 'contact.json',
    type: 'object',
    properties: {
        firstName: { type: 'string' },
        lastName: { type: 'string' },
        email: { type: 'string' }
    }
}

// Schema referencing contact 3 times via anyOf (common Mongoose populate pattern)
const parentSchema = {
    type: 'object',
    properties: {
        owner: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        },
        assignee: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        },
        reporter: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        }
    }
}

// External schemas are passed via the `schema` option, keyed by the id used in $ref
const serializer = fjs(parentSchema, { schema: { 'contact.json': contactSchema } })
// `code` is the generated source of the main serializer function
const code = serializer.toString()

console.log('Code length:', code.length)
console.log('firstName occurrences:', (code.match(/firstName/g) || []).length)
// BEFORE: firstName appears 3 times (fully inlined at each anyOf)
// AFTER:  firstName appears 0 times in main (extracted to reusable function)
```

To see the catastrophic scaling, add oneOf variants to the contact schema and increase the reference count to 10+. Code size grows with the product of the number of oneOf variants and the number of references.
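Scaling the reproduction up by hand is tedious, so a pair of helpers can generate the inputs. `makeParentSchema` and `withOneOfVariants` are hypothetical names, not part of fast-json-stringify; the oneOf variants are placed under a `type` property, loosely mirroring the Contact `type` field in the functional test. The generated schemas would then be passed to `fjs()` as in the reproduction.

```javascript
'use strict'

// Hypothetical helper: a parent schema with `count` properties that all
// use the same anyOf: [string | null, $ref] populate pattern.
function makeParentSchema (count, refId) {
  const properties = {}
  for (let i = 0; i < count; i++) {
    properties[`field${i}`] = {
      anyOf: [{ type: ['string', 'null'] }, { $ref: refId }]
    }
  }
  return { type: 'object', properties }
}

// Hypothetical helper: returns a copy of `schema` whose `type` property is
// a oneOf over `variants` object shapes. The original $id is preserved.
function withOneOfVariants (schema, variants) {
  const oneOf = []
  for (let i = 0; i < variants; i++) {
    oneOf.push({
      type: 'object',
      properties: { [`variant${i}`]: { type: 'string' } }
    })
  }
  return { ...schema, properties: { ...schema.properties, type: { oneOf } } }
}

// Usage with fast-json-stringify (not executed here):
// const fjs = require('fast-json-stringify')
// const contact = withOneOfVariants(contactSchema, 3)
// const serializer = fjs(makeParentSchema(10, 'contact.json'), {
//   schema: { 'contact.json': contact }
// })
// console.log(serializer.toString().length)
```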

Test Suite

All 468 existing tests pass with zero failures. TypeScript definitions also pass.

```text
npm run test

> fast-json-stringify@6.3.0 test
> npm run test:unit && npm run test:typescript

ℹ tests 468
ℹ suites 0
ℹ pass 468
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 3476.610542

> fast-json-stringify@6.3.0 test:typescript
> tsd
```

Coverage

```text
----------------------------|---------|----------|---------|---------|
File                        | % Stmts | % Branch | % Funcs | % Lines |
----------------------------|---------|----------|---------|---------|
All files                   |   81.21 |    70.11 |    31.8 |   81.21 |
 index.js                   |   92.91 |       95 |     100 |   92.91 |
 location.js                |     100 |      100 |     100 |     100 |
 merge-schemas.js           |     100 |      100 |     100 |     100 |
 serializer.js              |   99.29 |    98.61 |     100 |   99.29 |
 standalone.js              |     100 |      100 |     100 |     100 |
 validator.js               |   97.91 |     91.3 |     100 |   97.91 |
----------------------------|---------|----------|---------|---------|
```

Benchmarks

`npm run benchmark` shows no performance regression; results are within the noise margin across all categories.

Serialization throughput (ops/sec)

| Benchmark | Baseline | Patched | Delta |
|---|---|---|---|
| creation | 23,328 | 23,966 | +2.7% |
| array [default] | 15,889 | 16,330 | +2.8% |
| array [json-stringify] | 15,900 | 16,246 | +2.2% |
| large array [default] | 705 | 692 | -1.8% |
| large array [json-stringify] | 1,185 | 1,212 | +2.3% |
| long string | 40,223 | 42,184 | +4.9% |
| short string | 36,563,974 | 36,144,633 | -1.1% |
| obj | 16,151,523 | 15,984,577 | -1.0% |
| date | 2,450,388 | 2,428,318 | -0.9% |

All deltas are within the ±5% noise margin of the benchmarking harness. The patch adds no measurable runtime overhead — the function call indirection is negligible compared to the inlined code it replaces.
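The Delta column can be rechecked mechanically. This sketch recomputes the relative change of patched versus baseline ops/sec from the table, rounds to one decimal place, and tests each value against the ±5% margin cited above; `NOISE_MARGIN_PCT` is simply that figure, not a setting of the benchmark harness.

```javascript
'use strict'

// [benchmark, baseline ops/sec, patched ops/sec], copied from the table.
const results = [
  ['creation', 23328, 23966],
  ['array [default]', 15889, 16330],
  ['array [json-stringify]', 15900, 16246],
  ['large array [default]', 705, 692],
  ['large array [json-stringify]', 1185, 1212],
  ['long string', 40223, 42184],
  ['short string', 36563974, 36144633],
  ['obj', 16151523, 15984577],
  ['date', 2450388, 2428318]
]

const NOISE_MARGIN_PCT = 5 // the ±5% margin quoted in the text

// Relative change in percent, rounded to one decimal place.
const deltas = results.map(([name, baseline, patched]) => {
  const pct = ((patched - baseline) / baseline) * 100
  return { name, pct: Math.round(pct * 10) / 10 }
})

const allWithinNoise = deltas.every(({ pct }) => Math.abs(pct) <= NOISE_MARGIN_PCT)

console.log(deltas)
console.log('all within ±5%:', allWithinNoise) // all within ±5%: true
```

This only confirms the arithmetic; whether ±5% is the right threshold depends on the per-benchmark ± figures in the full output.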

Full benchmark output (patched)

```text
--- creation ---
🏆 AJV                                      x      35,949,980 ops/sec ±0.05%
   json-accelerator                         x       1,464,802 ops/sec ±5.33%
   compile-json-stringify                   x         862,181 ops/sec ±0.17%
   fast-json-stringify                      x          23,966 ops/sec ±0.30%

--- array ---
🏆 JSON.stringify                           x          29,670 ops/sec ±3.27%
   fast-json-stringify [default]            x          16,330 ops/sec ±0.56%
   fast-json-stringify [json-stringify]     x          16,246 ops/sec ±0.42%
   AJV                                      x          14,999 ops/sec ±0.49%
   json-accelerator                         x          14,019 ops/sec ±0.74%
   compile-json-stringify                   x          13,817 ops/sec ±0.77%

--- large array ---
🏆 fast-json-stringify [json-stringify]     x           1,212 ops/sec ±3.25%
   JSON.stringify                           x           1,204 ops/sec ±3.14%
   fast-json-stringify [default]            x             692 ops/sec ±4.36%
   AJV                                      x             689 ops/sec ±1.72%
   compile-json-stringify                   x             632 ops/sec ±2.23%

--- long string ---
🏆 fast-json-stringify                      x          42,184 ops/sec ±0.26%
   compile-json-stringify                   x          42,151 ops/sec ±0.26%
   json-accelerator                         x          41,951 ops/sec ±0.38%
   JSON.stringify                           x          41,579 ops/sec ±0.40%
   AJV                                      x          28,678 ops/sec ±0.40%

--- short string ---
🏆 fast-json-stringify                      x      36,144,633 ops/sec ±0.37%
   json-accelerator                         x      35,465,879 ops/sec ±0.05%
   compile-json-stringify                   x      25,225,952 ops/sec ±0.04%
   AJV                                      x      24,618,484 ops/sec ±0.18%
   JSON.stringify                           x      24,586,766 ops/sec ±7.33%

--- obj ---
🏆 compile-json-stringify                   x      24,321,656 ops/sec ±4.74%
   AJV                                      x      23,083,249 ops/sec ±0.05%
   JSON.stringify                           x      17,845,006 ops/sec ±0.06%
   json-accelerator                         x      16,171,382 ops/sec ±0.30%
   fast-json-stringify                      x      15,984,577 ops/sec ±0.15%

--- date ---
🏆 fast-json-stringify                      x       2,428,318 ops/sec ±0.26%
   json-accelerate                          x       2,385,754 ops/sec ±0.05%
   JSON.stringify                           x       1,519,450 ops/sec ±0.04%
   compile-json-stringify                   x       1,508,135 ops/sec ±0.04%
```

Impact

This change benefits any Fastify application that:

- Uses addSchema() to register shared schemas with $id
- References those schemas via $ref from multiple locations
- Uses anyOf/oneOf to express polymorphic fields (very common with ORMs like Mongoose, Prisma, etc.)

The performance improvement scales with schema complexity — applications with deeply nested cross-referencing schemas see the most dramatic improvement, but even simple schemas with repeated $ref targets benefit from reduced code duplication.

Eomm

I think the PR description could be transformed into a test

But this PR LGTM

Eomm · 2026-04-10T12:42:39Z

index.js

```diff
   const fullPath = `${schemaId}#${jsonPointer}`

-  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema)) {
+  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
```

Suggested change:

```diff
-  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
+  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || schemaId !== '') {
```
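Assuming `schemaId` is always a string (the description states that anonymous inline schemas use `''`), the suggested simplification is equivalent. A quick truth-table check, where `original` and `suggested` are hypothetical wrappers around the two expressions and `'__fjs_merged_0'` is a made-up example of a merged ID:

```javascript
'use strict'

// The two forms of the extra condition discussed in the review.
const original = (schemaId) => Boolean(schemaId && schemaId !== '')
const suggested = (schemaId) => schemaId !== ''

// For strings the forms agree: '' is the only falsy string and also the
// only string that fails `!== ''`.
for (const id of ['', 'contact.json', '__fjs_merged_0']) {
  console.log(JSON.stringify(id), original(id), suggested(id))
}

// They differ only for non-string values such as undefined or null.
console.log(original(undefined), suggested(undefined)) // false true
```

If `schemaId` could ever be `undefined`, the suggested form would send such schemas down the extraction path, so the simplification relies on `schemaId` being normalized to a string.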

Eomm · 2026-04-10T12:42:49Z

index.js

```diff
   const fullPath = `${schemaId}#${jsonPointer}`

-  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema)) {
+  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
```

Suggested change:

```diff
-  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
+  if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || schemaId !== '') {
```

Commit 3948226: Force use of cache for $ref objects

jagould2012 mentioned this pull request on Apr 9, 2026 in fastify/fast-json-stringify-compiler#84, "Cache compiled serializers across encapsulated contexts" (open)

jagould2012 wants to merge 1 commit into fastify:main from jagould2012:main
