When a hot loop calls functions generated via exec(), the loop body gets traced but the callees don't. Each callee is individually cold (called ~60 times), but the call site is hot (5,000+ calls total across different callees).
Concrete case: tinygrad generates ~5,000 pattern-matching functions at startup via exec(). They are called from a tight loop:
for _, match, _ in pats:
    if (ret := match(uop, ctx)) is not None: return ret
PYTHON_JIT=1 on 3.14.4 produces no improvement. The specializer works (LOAD_ATTR_SLOT fires correctly), but the JIT never compiles the callees.
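For anyone who wants to poke at this without installing tinygrad, here is a minimal sketch of the same call pattern. It is not tinygrad's actual code: the names `make_matcher`, `rewrite`, `NUM_PATTERNS`, and `CALLS_PER_PATTERN` are made up for illustration, and the generated functions are far simpler than real pattern matchers. It only reproduces the shape of the workload (many exec()-generated callees, each called about 60 times from one loop).

```python
import time

NUM_PATTERNS = 500       # hypothetical; the report describes ~5,000
CALLS_PER_PATTERN = 60   # matches the "~60 calls per callee" figure above


def make_matcher(i):
    # Build a small function from source text via exec(). The body is a
    # stand-in for a generated pattern matcher, not real tinygrad output.
    src = (
        f"def match_{i}(uop, ctx):\n"
        f"    if uop == {i}:\n"
        f"        return ('hit', {i})\n"
        f"    return None\n"
    )
    namespace = {}
    exec(src, namespace)
    return namespace[f"match_{i}"]


# Same tuple shape as the loop in the report: (_, match, _)
pats = [(None, make_matcher(i), None) for i in range(NUM_PATTERNS)]


def rewrite(uop, ctx):
    # One hot call site dispatching to many individually cold callees.
    for _, match, _ in pats:
        if (ret := match(uop, ctx)) is not None:
            return ret
    return None


def run():
    hits = 0
    for _ in range(CALLS_PER_PATTERN):
        # Only the last matcher accepts this uop, so every callee runs
        # once per outer iteration.
        if rewrite(NUM_PATTERNS - 1, None) is not None:
            hits += 1
    return hits


start = time.perf_counter()
hits = run()
elapsed = time.perf_counter() - start
print(f"{hits} hits in {elapsed:.4f}s")
```

Running this with and without PYTHON_JIT=1 on a JIT-enabled build should give a rough comparison, though timings from a toy like this may not reflect the real workload.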
Related: faster-cpython/ideas#738, #118093 (tier 2 entry at function calls).
It would be useful to either trace through exec()-generated callees at hot call sites, or detect aggregate hotness across callees sharing a call site.
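To make the "aggregate hotness" idea concrete, here is a toy model in plain Python. It is not how CPython's JIT counts anything: `HOT_THRESHOLD` and `simulate` are hypothetical names and the threshold value is arbitrary. It only shows why a per-callee counter never fires for this workload while a counter attached to the call site would.

```python
from collections import Counter

HOT_THRESHOLD = 1000  # hypothetical warm-up threshold, not a CPython constant


def simulate(num_callees, calls_per_callee):
    per_callee = Counter()   # one counter per generated function
    call_site_total = 0      # one counter for the shared call site
    for _ in range(calls_per_callee):
        for callee_id in range(num_callees):
            per_callee[callee_id] += 1
            call_site_total += 1
    # Count how many callees would be considered hot on their own.
    hot_callees = sum(1 for n in per_callee.values() if n >= HOT_THRESHOLD)
    return hot_callees, call_site_total >= HOT_THRESHOLD


hot_callees, site_is_hot = simulate(num_callees=5000, calls_per_callee=60)
print(hot_callees, site_is_hot)
```

With 5,000 callees at 60 calls each, no individual callee reaches the threshold, but the call site crosses it almost immediately.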