Skip to content

fix(producer): inline base64 frames in injector to unblock video-heavy renders#1630

Open
jrusso1020 wants to merge 3 commits into
mainfrom
fix/render-base64-inline-frames-for-video-heavy-comps
Open

fix(producer): inline base64 frames in injector to unblock video-heavy renders#1630
jrusso1020 wants to merge 3 commits into
mainfrom
fix/render-base64-inline-frames-for-video-heavy-comps

Conversation

@jrusso1020

Copy link
Copy Markdown
Collaborator

What

Drop the frameSrcResolver arg from the createVideoFrameInjector call site in renderOrchestrator.ts. That stops the injector from pointing <img> srcs at fileServer URLs and forces it back to base64-inline data URIs.

One-line code change. The createCompiledFrameSrcResolver builder + the frameSrcResolver option stay in the codebase, just unused — re-enabling them behind a proper gate is a follow-up.

Why

PR #596 added URL-served frames as a speedup: hand the injector a fileServer URL instead of base64-inlining each ~1 MB frame through page.evaluate. That holds when the fileServer is otherwise idle.

But on video-heavy compositions the same fileServer also serves every <video>.src. The runtime's drift-recovery branch (runtime/media.ts:294-302) issues el.load() on the underlying <video> during seeks, kicking off full-file downloads that occupy the fileServer's single Node event loop (it uses readFileSync and offers no Accept-Ranges). The injector's <img>.decode() then queues behind those video fetches and is never serviced before puppeteer's protocol timeout fires, surfacing as Runtime.callFunctionOn timed out in capture_streaming.

This is the failure shape OSS user @daybreakdeath hit on a 24 GB WSL host, and the failure Rahino sees on heavy real-world comps.

How

Per-phase instrumentation localised the hang. Frame 0 succeeds (~290 ms total, beforeCapture 199 ms). Frame 1 hangs in session.onBeforeCapture for the full 300 s protocol timeout. Splitting the injector hook's four page.evaluate calls (syncVideoFrameVisibility, injectVideoFramesBatch, __hfReseekGpu, redrawRuntimeColorGrading) showed injectVideoFramesBatch was the one that never returned. That function ends with await Promise.all(pendingDecodes) where each pendingDecode is img.decode() on an <img> whose new src is a fileServer URL. With 30 concurrent <video>.src downloads already in flight, the PNG fetch never returns.

Disabling the resolver (force base64 inline mode) — dataUri becomes data:image/png;base64,... of 137-284 KB — collapses the hang to a 5-50 ms injectVideoFramesBatch call.

Test plan

Repro corpus: synth 30 × 32 MB videos / 90 s comp on an 8-core / 30 GB host.

Setup Wall Outcome
baseline, broken corpus (no window.__timelines + no <video id>) 537 s render fails (Runtime.callFunctionOn timed out)
baseline, corpus-fixed 428 s render fails (same cascade, faster init)
this fix (drop frameSrcResolver) 121 s render succeeds, 69 MB MP4, calibration succeeds in 11 s

Control corpus: synth 30 × 1.6 MB / 60 s — 137 s with this change vs ~135 s on main, no regression on the no-bottleneck case.

ffprobe on the heavy-comp output confirms 90.000 s duration, 1920×1080, 30 fps, h264 High profile, mid-clip frames extract to legitimate 880 KB PNGs (not black/empty).

  • Manual testing performed (synth-30 heavy + synth-30 small, both confirmed above)
  • Real-world video-heavy comps regression test (pending — alpha release to be cut after this lands)
  • Follow-up PR: gate frameSrcResolver re-enable behind "no fileServer-bound <video>.src traffic" check, so the PR fix: speed up video frame injection renders #596 optimization still pays on the comps it was designed for

Notes

  • Draft until James + Miguel weigh in on whether to land as-is (default off) or wait for the gated version.
  • createCompiledFrameSrcResolver and the option are untouched in the codebase — this change is the single arg deletion at the producer call site.
  • Branch-based alpha release proposed before merge to main, per James's standing request for OSS-heavy changes.

Authored by Jerrai (Rames team).

…y renders

The URL-served frame path (PR #596) hands each injected `<img>` a fileServer URL
instead of a base64 data URI, on the theory that shipping a short URL through
`page.evaluate` beats shipping a multi-MB base64 string per frame. That holds
when the fileServer is otherwise idle.

But on video-heavy compositions, the same fileServer also serves every
`<video>.src`. The runtime's drift-recovery branch (`runtime/media.ts:294-302`)
issues `el.load()` on the underlying `<video>` during seeks, kicking off
full-file downloads that occupy the fileServer's single Node event loop (it
uses `readFileSync` and offers no `Accept-Ranges`). The injector's
`<img>.decode()` then queues behind those video fetches and is never serviced
before puppeteer's protocol timeout fires, surfacing as
`Runtime.callFunctionOn timed out` in `capture_streaming`.

Reproducer (30 × 32 MB videos / 90 s comp / 8-core / 30 GB host):

  baseline (broken corpus)            537 s   render fails
  baseline (corpus-fixed)             428 s   render fails
  this fix (drop frameSrcResolver)    121 s   render succeeds, 69 MB MP4

Control corpus (30 × 1.6 MB / 60 s) shows no regression: 137 s with this
change vs ~135 s on \`main\`. The \`createCompiledFrameSrcResolver\` builder and
the \`frameSrcResolver\` option stay in the codebase, just unused for now —
re-enabling them behind a proper gate ("only use URL-served frames when the
page has zero fileServer-bound \`<video>.src\` traffic") is a follow-up. The
cache memory ceiling (\`frameDataUriCacheBytesLimitMb\`, default 1500 MB above
8 GB hosts) already bounds the cost of base64 inlining.

— Jerrai

@jrusso1020 jrusso1020 left a comment

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/code-review verdict — leaning approve with two small cleanups + three follow-up notes

Recall pass at xhigh effort: 10 finder angles + cross-file tracer + sweep. No correctness defects survived verification. One initially-promising security finding (path-traversal guard lost) was refuted on code-read — the resolver's null return falls through to fs.readFile() at videoFrameInjector.ts:122, so it was never a guard; just a URL/base64 selector for valid paths.

Findings (8, ranked)

1. Dead code: drop the void call + the orphaned import — THIS PR

renderOrchestrator.ts:1239:

void createCompiledFrameSrcResolver(compiledDir);

createCompiledFrameSrcResolver is a pure factory (shared.ts:268-286) with no side effects, and the imported symbol at renderOrchestrator.ts:83 is now only referenced by this no-op call. Either delete both, or — if the intent is to keep the builder reachable for the future-gate follow-up — leave a // eslint-disable / TODO comment so a future reader doesn't strip the void and accidentally drop the call.

Recommendation: delete the void line and remove the import. The builder lives on in shared.ts and the future-gate PR can re-import it then.

2. Untested edge case: 4K frames + the per-worker 1500 MB cache — FOLLOW-UP

Base64-inline puts every active frame's PNG bytes through the LRU cache. For 4K @ ~5 MB/frame and the default frameDataUriCacheBytesLimitMb = 1500 (config.ts:236), the cache fits ~300 frames. A 90 s @ 30 fps 4K comp = 2700 frames → roughly 9× cache turnover with disk re-reads on every backward seek past frame 300. Wall-clock impact not measured.

Recommendation: doesn't block this PR — the workaround restores correctness on the broken shape — but file a follow-up to measure 4K + long-duration before claiming the gate-less default is universally safe.

3. Untested edge case: ≤8 GB hosts get the 256 MB cache cliff — FOLLOW-UP

systemMemory.ts isLowMemorySystem (≤8192 MB) → memoryAdaptiveCacheBytesMb returns 256 MB (config.ts:248-260). At 4K (~5 MB/frame) that fits 38–64 frames — heavy thrashing on anything beyond a short comp. 1080p is fine (~150 KB/frame, ~1700 frames fit, validated by my 30 × 1.6 MB control corpus at 137 s vs 135 s baseline).

Recommendation: also follow-up. Same shape as #2 — the workaround is correct, but the cache budget on small hosts now matters more.

4. Per-worker cache duplication (raised in sweep) — WORTH A LOOK

videoFrameInjector.ts:187 allocates a fresh frameCache per createVideoFrameInjector call. With workerCount=4 and the 1500 MB ceiling each, that's a 6 GB potential frame cache across workers. The existing 8 GB cliff in memoryAdaptiveCacheBytesMb was tuned for a less cache-hungry path; with base64-inline now the only path on the in-process renderer, the per-worker × cache-bytes product is the real budget.

Recommendation: leave for the follow-up that re-introduces the gate; until then, the contention cap at parallelCoordinator.ts:176-180 (≈3 workers on 8-core) keeps the worst-case product in check.

5. Observability gap: frameCache.stats() exists, isn't emitted

videoFrameInjector.ts:138 exposes entries / bytes / evictions / oversizedRejections for telemetry, but no producer codepath logs it. If cache thrashing surfaces in prod on 4K/low-mem hosts, operators have no signal.

Recommendation: drop a one-liner emit at end-of-render. Not blocking.

6. Regression-lock: no test asserts the resolver isn't passed

renderOrchestrator.test.ts has no assertion on the absence of frameSrcResolver in the createVideoFrameInjector opts. Distributed renderChunk.ts:467 independently happens to use inline mode (never passed the resolver), so both call sites are now consistent — but a future contributor reading just the comment and 're-enabling the optimization' would silently re-introduce the bug. fallow audit won't catch it (syntactically valid).

Recommendation: tiny test in renderOrchestrator.test.ts that mocks createVideoFrameInjector and asserts the opts object doesn't carry frameSrcResolver. Optional but cheap.

7. Altitude: shallower fixes weren't explored

The actual trigger is runtime/media.ts:294-302 calling el.load() on <video> elements during seek-drift recovery, which kicks 30 simultaneous full-file fetches at the local fileServer. Two shallower fixes the comment doesn't acknowledge:

  • Skip the drift-recovery el.load() retry for videos that have a __render_frame_<id>__ injection sibling (visual comes from the <img>, no audio sync needed on muted videos).
  • Stream-based fileServer instead of readFileSync + no Accept-Ranges. (Range support alone was tried by Home and didn't help on the broken-corpus baseline — that may have been a confound, worth re-running on the corpus-fixed baseline.)

The PR's comment names the gating follow-up but doesn't name these. Worth either trying one of them in a follow-up, OR adding a sentence to the in-source comment noting they were considered.

8. Comment is 27 lines — repro numbers can move to PR body

The in-source comment captures the why well (load-bearing for future-you), but lines 1227-1232 (repro numbers + benchmark setup) duplicate what's in the PR body. Trim to ~12 lines, keep the architectural explanation, link to PR #1630 for the numbers.

Validation summary

  • Local CLI: synth 30 × 32 MB / 90 s comp: 537 s → 121 s (4.4×), 30 × 1.6 MB control: 135 s → 137 s parity. ✓
  • Dev cluster on the 0.6.122-alpha.0 image (@hyperframes/engine@alpha):
    • Distributed chunked path (render b0e52263, 8 chunks): 19 s wall, clean MP4. ✓
    • In-process streaming path via min-frames gate (render 06b3770e, 750 frames < 900): 41 s wall, clean MP4. ✓ — this is the path the fix actually changes.
  • Lint/format/typecheck/fallow/commitlint: all green via lefthook pre-commit.

Reviewers — Miguel + Magi: I think #1 is worth folding into this PR (5-line change). #2-#7 are good follow-up issues but shouldn't block merge. WDYT?

— Jerrai

@jrusso1020 jrusso1020 marked this pull request as ready for review June 22, 2026 07:22

@miga-heygen miga-heygen left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Independent review — confirming Rames' analysis + one additional observation

Reviewed the diff, the PR description, the createCompiledFrameSrcResolver implementation in shared.ts:268-286, the createFrameSourceCache + createVideoFrameInjector in packages/engine/src/services/videoFrameInjector.ts, and the distributed renderChunk.ts:467 call site.

1. Is the fix correct?

Yes. The contention path is clear and well-documented in both the PR body and the new in-source comment:

  • createFrameSourceCache.get() at videoFrameInjector.ts:119 calls frameSrcResolver?.(framePath) first. When the resolver is present, it returns a fileServer URL and skips the fs.readFile + base64 path entirely.
  • The browser-side <img> then fetches that URL from the same Hono fileServer that's simultaneously serving <video>.src downloads triggered by el.load() from drift-recovery.
  • fileServer uses readFileSync with no Accept-Ranges — a single blocking Node event loop.
  • Dropping frameSrcResolver means frameSrcResolver?.() returns undefined, the cache falls through to fs.readFile (Node-side, not competing with the browser's HTTP queue), base64-encodes, and passes the data URI directly through page.evaluate. No HTTP round-trip through the congested fileServer.

The fix eliminates the contention by moving frame data delivery from HTTP (browser → fileServer → disk) to IPC (Node fs.readFile → base64 → CDP page.evaluate). This is the correct approach for the in-process path.

2. Is it truly minimal?

Mostly. The actual behavior change is a single-line deletion (frameSrcResolver removed from the options object passed to createVideoFrameInjector). The other changes are:

  • 12 package.json + 3 plugin manifest version bumps (0.6.1210.6.122-alpha.0) — standard release hygiene, not behavioral.
  • A 27-line explanatory comment — thorough, perhaps a bit long (Rames' finding #8 is fair: the repro numbers could be trimmed and linked to the PR instead), but architecturally valuable for future readers.
  • The void createCompiledFrameSrcResolver(compiledDir) line — see #4 below.

No collateral behavioral changes. The diff is clean.

3. Risks from base64-inlining?

Bounded but real for edge cases. The frameDataUriCacheBytesLimitMb config (default 1500 MB on >8 GB hosts, 256 MB on ≤8 GB) caps the LRU cache. However:

  • 1080p frames (~150 KB base64): negligible. The 30×1.6 MB control corpus shows no regression (137 s vs 135 s). This is the common case and it's fine.
  • 4K frames (~5 MB base64): the cache fits ~300 frames at the 1500 MB ceiling. A 90 s @ 30 fps comp = 2700 frames → 9× cache turnover. Rames flagged this as finding #2. I agree it's follow-up, not blocking.
  • CDP message size: Puppeteer's CDP transport has no hard limit on page.evaluate argument size (it's JSON over WebSocket, no protocol cap), but a 5 MB base64 string per frame per page.evaluate call adds serialization overhead. For 1080p this is irrelevant (~150 KB). For 4K it's measurable but still sub-second — and the alternative (hanging forever) is worse.
  • Per-worker cache duplication: Rames' finding #4 is valid. With workerCount=4, the theoretical ceiling is 4 × 1500 MB = 6 GB of frame cache. The parallelCoordinator caps workers at ~3 on 8-core, so ~4.5 GB worst case. Not great on 8 GB hosts, but the ≤8 GB path already drops to 256 MB/worker. Acceptable for now.

No blocking risk for merge. The base64 path was already the production path for the distributed renderer (renderChunk.ts:467 never passed frameSrcResolver), so this change makes the in-process path consistent with what's already running in production at scale.

4. Dead code cleanup

Should be cleaned up in this PR. I agree with Rames' finding #1:

void createCompiledFrameSrcResolver(compiledDir);  // line 1239

createCompiledFrameSrcResolver is a pure factory function — resolve(compiledDir) is its only operation, and the returned closure is discarded by void. No side effects. The import at line 83 is now dead. Both should be removed.

The void call reads as "I'm keeping this around intentionally" but without an // eslint-disable or // TODO(gate) annotation, it looks like an oversight to anyone who doesn't read the comment. The builder still lives in shared.ts — the future gating PR can re-import it. Removing 2 lines is cleaner than leaving dead code that needs a comment to justify its existence.

5. Additional observations beyond Rames' review

5a. The comment is accurate but could be tighter. Lines 1228-1232 (repro numbers) duplicate the PR body's test plan table. I'd trim to ~15 lines: keep the architectural explanation (fileServer contention, readFileSync, no Accept-Ranges, the el.load() trigger), drop the benchmark numbers (link to this PR instead), and keep the "until properly gated" note. This is style, not blocking.

5b. The distributed path was already safe — worth calling out. renderChunk.ts:467 already omits frameSrcResolver, meaning every distributed/chunked render was already using base64-inline. This PR brings the in-process executeRenderJob path into alignment. The PR body mentions this but it's worth emphasizing: this isn't introducing a new pattern, it's making the in-process path consistent with what's already proven in production.

5c. No test regression risk from removing the option. frameSrcResolver is typed as optional (frameSrcResolver?: ...) in VideoFrameInjectorOptions. The createFrameSourceCache function handles undefined gracefully via optional chaining (frameSrcResolver?.(framePath)). No type errors, no runtime risk.

Verdict

Leaning approve with one small cleanup (#4: remove the void call + dead import). The fix is correct, minimal, well-documented, and aligns the in-process renderer with the already-production distributed path. The follow-up items (cache budget on 4K/low-mem hosts, proper gating, regression test) are valid but don't block this.

Review by Miga

…nder orchestrator

Followup to the previous commit. The void-call and the
`createCompiledFrameSrcResolver` import in `renderOrchestrator.ts` were left
behind as a no-op breadcrumb for the future gating PR. Code review (PR #1630)
correctly flagged this as dead code — the builder is a pure factory with no
side effects, so calling it and discarding the result is just wasted CPU.
Remove both and explain in the in-source comment where the builder still
lives, so the gating PR knows where to re-import from.

— Jerrai
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants