
Extract reusable functions for non-recursive external $ref schemas #836

Open
jagould2012 wants to merge 1 commit into fastify:main from jagould2012:main



@jagould2012

Extract reusable functions for non-recursive external $ref schemas

Problem

When a non-recursive schema registered via $ref (with a $id) is referenced multiple times, fast-json-stringify inlines the full serialization code at every reference point. The functionsNamesBySchema cache — which correctly extracts reusable functions for recursive schemas — never activates for non-recursive schemas, regardless of how many times they appear.

This causes exponential code generation in real-world schemas where:

  • Entities reference shared sub-entities via $ref (e.g. a Contact schema referenced by 10+ other schemas)
  • anyOf/oneOf wraps $ref to express polymorphic fields (populated object OR string ID OR null)
  • Sub-entities themselves contain oneOf variants with further $ref relationships

In production schemas, a single route serializer can generate 48+ MB of JavaScript code, with total output across all routes exceeding 800 MB.

Root Cause

In buildObject() (index.js), the function extraction path at line 571 is taken only when at least one of these conditions holds:

  1. recursivePaths.has(fullPath) — schema references itself (circular)
  2. buildingSet.has(schema) — schema is currently being built (re-entrant)

For non-recursive external schemas, neither condition is true. The code falls through to the inline path (line 593), which:

  1. Adds the schema to buildingSet
  2. Generates inline code
  3. Removes the schema from buildingSet

The next time the same $ref target is encountered, buildingSet no longer contains it, so it's inlined again. The functionsNamesBySchema Map is never populated, so the cache check at line 560 always misses.

Before (current behavior)

buildObject() called for schema with $id "contact.json"
  → functionsNamesBySchema.has(schema)?  NO (never populated for non-recursive)
  → recursivePaths.has(fullPath)?        NO (not recursive)
  → buildingSet.has(schema)?             NO (cleaned up after previous inline)
  → Falls through to INLINE path
  → Generates ~500 lines of serialization code
  → Removes from buildingSet

buildObject() called AGAIN for same "contact.json"
  → Same result — inlines again
  → And again, and again... (728 times in our test case)

Compounding with anyOf/oneOf

The problem is amplified by anyOf/oneOf processing in buildOneOf():

  • Each anyOf option is merged with the parent schema via mergeLocations() (line 494)
  • mergeLocations() clones the referenced schema and deletes its $id
  • The cloned schema is a new JavaScript object, so a cache keyed on the original object would still miss for the clone
  • The mergedSchemasIds cache (line 1143) keys by optionSchema object reference, but each { "$ref": "..." } at a different JSON position is a distinct object — so identical anyOf patterns at different properties never share cached merges

A schema with oneOf containing 3 object variants (each with nested anyOf refs) causes a 19x code size multiplier at every reference point. Combined with inlining, a single entity referenced 125 times across all schemas produces catastrophic output.
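The object-identity miss can be reproduced with a plain Map. This is a sketch of the keying behavior described above, not fast-json-stringify's code: the variable names are hypothetical and '__fjs_merged_0' is only a placeholder value in the __fjs_merged_* format.

```javascript
// Illustrative only: two structurally identical anyOf options, as they
// appear under two different properties (e.g. owner and assignee).
const ownerOption = { $ref: 'contact.json' }
const assigneeOption = { $ref: 'contact.json' }

// A cache keyed by object reference, like the mergedSchemasIds pattern
// described above. The stored value is a placeholder merged-schema id.
const mergedIdsByOption = new Map()
mergedIdsByOption.set(ownerOption, '__fjs_merged_0')

// Same object: hit. Equal-looking but distinct object: miss.
const sameObjectHit = mergedIdsByOption.has(ownerOption)
const distinctObjectHit = mergedIdsByOption.has(assigneeOption)

console.log(sameObjectHit, distinctObjectHit) // true false
// The two options are structurally equal, but Map keys compare by identity.
console.log(JSON.stringify(ownerOption) === JSON.stringify(assigneeOption)) // true
```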

Fix

Add a third condition to the function extraction check: extract a function when the schema has a schemaId (meaning it was resolved from an external $ref with a $id).

- if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema)) {
+ if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {

This causes any externally registered schema (one added via the schema option or addSchema()) to be extracted into a named function on first encounter and then reused via functionsNamesBySchema on subsequent encounters. This is the same mechanism that already works correctly for recursive schemas.
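A toy model of the decision makes the before/after difference concrete. The sketch below is hypothetical: buildObjectModel, innerObjectCode and compile are invented for illustration, only handle flat objects whose string properties are all present, and wrap the inline path in an IIFE where fast-json-stringify emits inline statements. It reuses the names recursivePaths, buildingSet and functionsNamesBySchema from the description so the three-way condition reads the same.

```javascript
// Emits statements that append a flat object (all properties present) to `json`.
function innerObjectCode (schema, obj) {
  const lines = Object.keys(schema.properties).map((key, i) => {
    const prefix = JSON.stringify((i > 0 ? ',' : '') + JSON.stringify(key) + ':')
    return `  json += ${prefix} + JSON.stringify(${obj}[${JSON.stringify(key)}])`
  })
  return ["  json += '{'", ...lines, "  json += '}'"].join('\n')
}

// Hypothetical stand-in for buildObject(): returns a JS expression producing the JSON.
function buildObjectModel (context, schema, schemaId, input) {
  // Cache hit: reuse the function already emitted for this schema object.
  if (context.functionsNamesBySchema.has(schema)) {
    return `${context.functionsNamesBySchema.get(schema)}(${input})`
  }
  const fullPath = `${schemaId}#` // recursivePaths stays empty in this model
  const extract = context.recursivePaths.has(fullPath) ||
    context.buildingSet.has(schema) ||
    (context.extractById && schemaId !== '') // the condition added by this PR
  if (extract) {
    const name = `anonymous${context.functions.length}`
    context.functionsNamesBySchema.set(schema, name) // populate the cache
    const body = innerObjectCode(schema, 'obj')
    context.functions.push(`function ${name} (obj) {\n  let json = ''\n${body}\n  return json\n}`)
    return `${name}(${input})`
  }
  // Inline path: buildingSet is only a re-entrancy guard and is cleared
  // afterwards, so the next reference to the same schema inlines it again.
  context.buildingSet.add(schema)
  const body = innerObjectCode(schema, 'obj')
  context.buildingSet.delete(schema)
  return `(function (obj) {\n  let json = ''\n${body}\n  return json\n})(${input})`
}

// Compiles a parent whose properties are all external $refs into `registry`.
function compile (parentSchema, registry, extractById) {
  const context = {
    recursivePaths: new Set(),
    buildingSet: new Set(),
    functionsNamesBySchema: new Map(),
    functions: [],
    extractById
  }
  const lines = Object.entries(parentSchema.properties).map(([key, prop], i) => {
    const target = registry[prop.$ref] // resolve the external $ref
    const value = buildObjectModel(context, target, target.$id, `input[${JSON.stringify(key)}]`)
    const prefix = JSON.stringify((i > 0 ? ',' : '') + JSON.stringify(key) + ':')
    return `  json += ${prefix} + ${value}`
  })
  const main = `function main (input) {\n  let json = '{'\n${lines.join('\n')}\n  json += '}'\n  return json\n}`
  const serialize = new Function(`${context.functions.join('\n')}\n${main}\nreturn main`)()
  return { main, functions: context.functions, serialize }
}

const contactSchema = {
  $id: 'contact.json',
  type: 'object',
  properties: { firstName: { type: 'string' }, lastName: { type: 'string' } }
}
const parentSchema = {
  type: 'object',
  properties: {
    owner: { $ref: 'contact.json' },
    assignee: { $ref: 'contact.json' },
    reporter: { $ref: 'contact.json' }
  }
}
const registry = { 'contact.json': contactSchema }

const before = compile(parentSchema, registry, false)
const after = compile(parentSchema, registry, true)
console.log(before.functions.length, before.main.includes('firstName')) // 0 true
console.log(after.functions.length, after.main.includes('firstName')) // 1 false
```

With extractById set to false, every reference site carries its own copy of the contact serializer; with it set to true, the first site emits anonymous0 and the other two only call it. Both variants produce the same JSON.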

Why schemaId is the right signal

  • Schemas resolved from external $ref always have a schemaId (their $id value)
  • Inline anonymous schemas (defined directly in properties) have schemaId as '' — these continue to be inlined as before
  • Merged schemas from anyOf/oneOf get synthetic __fjs_merged_* IDs, which also triggers extraction — this is beneficial because merged schemas are the most expensive to regenerate

Why this is safe

The extracted function path (lines 572-590) is already battle-tested — it's the same code path used for every recursive schema today. The only change is that more schemas enter this path. The generated function has identical semantics to the inline code:

// Extracted function (already existing pattern)
function anonymous0 (input) {
    const obj = (input && typeof input.toJSON === 'function') ? input.toJSON() : input
    if (obj === null) return JSON_STR_EMPTY_OBJECT
    let json = ''
    // ... buildInnerObject() output — identical to inline version
    return json
}

// Call site: json += anonymous0(value)
// vs inline: const obj_N = ...; if (obj_N === null) { ... } else { ... }

Test Results

Tested against a production Fastify service with 90 JSON schemas (35 unique $ref targets, 252 total $ref usages, 250 anyOf, 6 oneOf).

Single complex entity serializer (8-10 relationship fields, 4-5 levels deep)

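| Metric                   | Before  | After                | Change          |
|--------------------------|---------|----------------------|-----------------|
| Generated code size      | 48.2 MB | 36 KB                | 99.9% reduction |
| Compile time             | 373 ms  | 71 ms                | 81% faster      |
| firstName inlined copies | 728     | 0                    | eliminated      |
| Extracted functions      | 0       | 1+ per shared schema | reuse working   |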

All 45 read schemas (simulating GET /:id routes)

| Metric               | Before  | After  |
|----------------------|---------|--------|
| Total generated code | ~868 MB | 700 KB |
| Failures             | 0       | 0      |
| Total compile time   |         | 700 ms |

List wrapper schemas (simulating GET / routes with pagination)

| Metric                            | After |
|-----------------------------------|-------|
| Total generated code (45 schemas) | 87 KB |
| Failures                          | 0     |

Functional correctness

All serialization tests pass with the patch, including:

  • Direct $ref — populated objects serialize correctly
  • anyOf with $ref — polymorphic fields (populated object, string ID, null) all serialize correctly
  • Nested oneOf — discriminated union variants serialize correctly
  • Deeply nested references — entity → sub-entity → sub-sub-entity chains work
  • Mixed arrays — arrays containing both populated objects and string IDs
  • Null handling — null values in optional anyOf fields

Functional test used

const populatedTrip = {
    _id: "trip-1",
    name: "Test Trip",
    type: "resort",
    status: "open",
    leaders: [
        {
            _id: "leader-1",
            contact: {
                _id: "contact-1",
                firstName: "John",
                lastName: "Smith",
                type: { key: "Staff", employeeId: "EMP-001" }
            },
            role: "trip_leader"
        },
        "leader-id-2",   // string ID ref
        null              // null ref
    ],
    members: [
        {
            contact: {
                _id: "contact-2",
                firstName: "Jane",
                lastName: "Doe",
                type: { key: "Customer", totalDives: 50 }
            },
            roommate: "contact-id-3",  // string ID ref
            status: "confirmed"
        }
    ],
    accommodations: [
        {
            _id: "acc-1",
            name: "Reef Resort",
            rooms: [{
                roomNumber: "101",
                occupants: [
                    { _id: "c-2", firstName: "Jane", lastName: "Doe" },
                    "contact-id-3"  // mixed array: object + string
                ]
            }]
        }
    ],
    schedule: null,
    notes: ["note-id-1"],
    payments: [],
    companyId: "company-1"
};

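// serializer: the fast-json-stringify function compiled from the production trip schema (not included in this description)
const result = serializer(populatedTrip);
const parsed = JSON.parse(result);

// All assertions pass:
const assert = require('node:assert');
assert.strictEqual(parsed.leaders[0].contact.firstName, "John");
assert.strictEqual(parsed.leaders[1], "leader-id-2");
assert.strictEqual(parsed.leaders[2], null);
assert.strictEqual(parsed.members[0].contact.firstName, "Jane");
assert.strictEqual(parsed.accommodations[0].rooms[0].occupants[0].firstName, "Jane");
assert.strictEqual(parsed.accommodations[0].rooms[0].occupants[1], "contact-id-3");
assert.strictEqual(parsed.schedule, null);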

Reproduction

Minimal script to demonstrate the issue and the fix:

const fjs = require('fast-json-stringify')

const contactSchema = {
    $id: 'contact.json',
    type: 'object',
    properties: {
        firstName: { type: 'string' },
        lastName: { type: 'string' },
        email: { type: 'string' }
    }
}

// Schema referencing contact 3 times via anyOf (common Mongoose populate pattern)
const parentSchema = {
    type: 'object',
    properties: {
        owner: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        },
        assignee: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        },
        reporter: {
            anyOf: [{ type: ['string', 'null'] }, { $ref: 'contact.json' }]
        }
    }
}

const serializer = fjs(parentSchema, { schema: { 'contact.json': contactSchema } })
const code = serializer.toString()

console.log('Code length:', code.length)
console.log('firstName occurrences:', (code.match(/firstName/g) || []).length)
// BEFORE: firstName appears 3 times (fully inlined at each anyOf)
// AFTER:  firstName appears 0 times in main (extracted to reusable function)

To see the catastrophic scaling, add oneOf variants to the contact schema and increase the reference count to 10 or more. Code size then grows with the product of the number of oneOf variants and the number of references, because every reference inlines every variant.
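One rough way to see the multiplicative growth is to count generated copies over a reference graph. This is a simplified, hypothetical model: the graph is invented, each array entry stands for one reference site (a plain $ref or one anyOf/oneOf branch that points at the target), and the graph must be acyclic, since recursive schemas are already extracted today.

```javascript
// Hypothetical acyclic reference graph: each schema id lists one entry per
// reference site, so duplicates mean the same $ref target is used repeatedly.
const graph = {
  root: ['trip', 'trip'],
  trip: ['leader', 'member', 'member'],
  leader: ['contact', 'contact'],
  member: ['contact', 'contact', 'contact'],
  contact: []
}

// Inlining: each reference site copies the whole subtree, so a schema's copy
// count equals the number of distinct paths from the root to it.
function countInlineCopies (refs, id, counts = {}) {
  counts[id] = (counts[id] || 0) + 1
  for (const child of refs[id]) countInlineCopies(refs, child, counts)
  return counts
}

// Extraction: every reachable schema is generated once and then called.
function collectExtracted (refs, id, seen = new Set()) {
  if (seen.has(id)) return seen
  seen.add(id)
  for (const child of refs[id]) collectExtracted(refs, child, seen)
  return seen
}

const copies = countInlineCopies(graph, 'root')
const extracted = collectExtracted(graph, 'root')
console.log(copies.contact) // 16 = 2 trips x (1 leader x 2 + 2 members x 3)
console.log(extracted.size) // 5 schemas, each generated once
```

Adding one more contact reference to member raises the contact count from 16 to 20, while the extracted count stays at 5.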

Test Suite

All 468 existing tests pass with zero failures. TypeScript definitions also pass.

npm run test

> fast-json-stringify@6.3.0 test
> npm run test:unit && npm run test:typescript

ℹ tests 468
ℹ suites 0
ℹ pass 468
ℹ fail 0
ℹ cancelled 0
ℹ skipped 0
ℹ todo 0
ℹ duration_ms 3476.610542

> fast-json-stringify@6.3.0 test:typescript
> tsd

Coverage

----------------------------|---------|----------|---------|---------|
File                        | % Stmts | % Branch | % Funcs | % Lines |
----------------------------|---------|----------|---------|---------|
All files                   |   81.21 |    70.11 |    31.8 |   81.21 |
 index.js                   |   92.91 |       95 |     100 |   92.91 |
 location.js                |     100 |      100 |     100 |     100 |
 merge-schemas.js           |     100 |      100 |     100 |     100 |
 serializer.js              |   99.29 |    98.61 |     100 |   99.29 |
 standalone.js              |     100 |      100 |     100 |     100 |
 validator.js               |   97.91 |     91.3 |     100 |   97.91 |
----------------------------|---------|----------|---------|---------|

Benchmarks

npm run benchmark shows no performance regression. Results are within the noise margin across all categories.

Serialization throughput (ops/sec)

| Benchmark                    | Baseline   | Patched    | Delta |
|------------------------------|------------|------------|-------|
| creation                     | 23,328     | 23,966     | +2.7% |
| array [default]              | 15,889     | 16,330     | +2.8% |
| array [json-stringify]       | 15,900     | 16,246     | +2.2% |
| large array [default]        | 705        | 692        | -1.8% |
| large array [json-stringify] | 1,185      | 1,212      | +2.3% |
| long string                  | 40,223     | 42,184     | +4.9% |
| short string                 | 36,563,974 | 36,144,633 | -1.1% |
| obj                          | 16,151,523 | 15,984,577 | -1.0% |
| date                         | 2,450,388  | 2,428,318  | -0.9% |

All deltas are within the ±5% noise margin of the benchmarking harness. The patch adds no measurable runtime overhead — the function call indirection is negligible compared to the inlined code it replaces.
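The Delta column is the relative change of patched over baseline throughput. A small helper reproduces a few rows of the table; the ±5% threshold is the noise margin quoted above, and the choice of rows is arbitrary.

```javascript
// Relative change of patched vs. baseline throughput, in percent.
function deltaPercent (baseline, patched) {
  return ((patched - baseline) / baseline) * 100
}

// [baseline, patched] ops/sec taken from the benchmark table above.
const rows = {
  creation: [23328, 23966],
  'large array [default]': [705, 692],
  'long string': [40223, 42184],
  'short string': [36563974, 36144633]
}

for (const [name, [baseline, patched]] of Object.entries(rows)) {
  const delta = deltaPercent(baseline, patched)
  const sign = delta >= 0 ? '+' : ''
  // Anything inside ±5% is treated as benchmarking noise.
  console.log(`${name}: ${sign}${delta.toFixed(1)}% (within noise: ${Math.abs(delta) <= 5})`)
}
```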

Full benchmark output (patched)

--- creation ---
🏆 AJV                                      x      35,949,980 ops/sec ±0.05%
   json-accelerator                         x       1,464,802 ops/sec ±5.33%
   compile-json-stringify                   x         862,181 ops/sec ±0.17%
   fast-json-stringify                      x          23,966 ops/sec ±0.30%

--- array ---
🏆 JSON.stringify                           x          29,670 ops/sec ±3.27%
   fast-json-stringify [default]            x          16,330 ops/sec ±0.56%
   fast-json-stringify [json-stringify]     x          16,246 ops/sec ±0.42%
   AJV                                      x          14,999 ops/sec ±0.49%
   json-accelerator                         x          14,019 ops/sec ±0.74%
   compile-json-stringify                   x          13,817 ops/sec ±0.77%

--- large array ---
🏆 fast-json-stringify [json-stringify]     x           1,212 ops/sec ±3.25%
   JSON.stringify                           x           1,204 ops/sec ±3.14%
   fast-json-stringify [default]            x             692 ops/sec ±4.36%
   AJV                                      x             689 ops/sec ±1.72%
   compile-json-stringify                   x             632 ops/sec ±2.23%

--- long string ---
🏆 fast-json-stringify                      x          42,184 ops/sec ±0.26%
   compile-json-stringify                   x          42,151 ops/sec ±0.26%
   json-accelerator                         x          41,951 ops/sec ±0.38%
   JSON.stringify                           x          41,579 ops/sec ±0.40%
   AJV                                      x          28,678 ops/sec ±0.40%

--- short string ---
🏆 fast-json-stringify                      x      36,144,633 ops/sec ±0.37%
   json-accelerator                         x      35,465,879 ops/sec ±0.05%
   compile-json-stringify                   x      25,225,952 ops/sec ±0.04%
   AJV                                      x      24,618,484 ops/sec ±0.18%
   JSON.stringify                           x      24,586,766 ops/sec ±7.33%

--- obj ---
🏆 compile-json-stringify                   x      24,321,656 ops/sec ±4.74%
   AJV                                      x      23,083,249 ops/sec ±0.05%
   JSON.stringify                           x      17,845,006 ops/sec ±0.06%
   json-accelerator                         x      16,171,382 ops/sec ±0.30%
   fast-json-stringify                      x      15,984,577 ops/sec ±0.15%

--- date ---
🏆 fast-json-stringify                      x       2,428,318 ops/sec ±0.26%
   json-accelerate                          x       2,385,754 ops/sec ±0.05%
   JSON.stringify                           x       1,519,450 ops/sec ±0.04%
   compile-json-stringify                   x       1,508,135 ops/sec ±0.04%

Impact

This change benefits any Fastify application that:

  1. Uses addSchema() to register shared schemas with $id
  2. References those schemas via $ref from multiple locations
  3. Uses anyOf/oneOf to express polymorphic fields (very common with ORMs like Mongoose, Prisma, etc.)

The performance improvement scales with schema complexity — applications with deeply nested cross-referencing schemas see the most dramatic improvement, but even simple schemas with repeated $ref targets benefit from reduced code duplication.

@Eomm (Member) left a comment


I think the PR description could be transformed into a test.

But this PR LGTM.

const fullPath = `${schemaId}#${jsonPointer}`

- if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema)) {
+ if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {

Suggested change
- if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || (schemaId && schemaId !== '')) {
+ if (context.recursivePaths.has(fullPath) || context.buildingSet.has(schema) || schemaId !== '') {
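The suggested simplification can be checked by evaluating both versions of the third condition over a few possible schemaId values. This sketch only exercises JavaScript truthiness: '__fjs_merged_0' is a placeholder in the __fjs_merged_* format mentioned in the description, and whether schemaId can ever be non-string inside buildObject() is not established here.

```javascript
// Third condition as originally written in the PR, coerced to a boolean
// the same way the if statement treats it.
const prCondition = (schemaId) => Boolean(schemaId && schemaId !== '')
// Third condition as suggested in the review.
const suggestedCondition = (schemaId) => schemaId !== ''

for (const schemaId of ['', 'contact.json', '__fjs_merged_0', undefined]) {
  console.log(String(schemaId), prCondition(schemaId), suggestedCondition(schemaId))
}
// ''               -> false false
// 'contact.json'   -> true  true
// '__fjs_merged_0' -> true  true
// undefined        -> false true
```

For any string the two forms agree, because a non-empty string is already truthy. They differ only for non-string values such as undefined or null, so the shorter form is equivalent as long as schemaId is always a string, as the description's "schemaId as ''" for inline schemas implies.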

