@@ -456,6 +456,108 @@ Tests should cover four buckets:
456456 - nested dict/list payload
457457 - code object payload
458458
459+ ## Benchmark Summary
460+
461+ This section records the cleaner performance rerun taken after the earlier
462+ noisy measurements.
463+
464+ Baseline commit:
465+
466+ - ` 7c214ea52efbcf12261128b458db8fe025cbc61b `
467+
468+ Current commit:
469+
470+ - ` 38351499d915a61a71e9eeefe7b7af571c3a4e21 `
471+
472+ ### Method
473+
474+ Two benchmark layers were rerun:
475+
476+ 1. Targeted ` pyperformance ` comparison:
477+    - baseline and current interpreters built locally
478+    - ` pyperformance run --affinity 0 `
479+    - current run used ` --same-loops ` from the baseline JSON
480+    - benchmark slice:
481+      `python_startup, python_startup_no_site, pickle, pickle_dict,
482+      pickle_list, pickle_pure_python, unpickle, unpickle_list,
483+      unpickle_pure_python, unpack_sequence`
484+ 2. Direct marshal microbenches:
485+    - both interpreters pinned with ` taskset -c 0 `
486+    - 11 repeats per benchmark
487+    - loop counts increased by 10x over the earlier quick pass
488+    - report medians as the primary statistic, with mins retained in the raw
489+      JSON artifacts
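
The direct microbench layer described above could be sketched roughly as follows. This is an illustrative harness, not the script that produced the artifacts: the payload contents, the `LOOPS` value, and the `bench` / `run_all` helper names are assumptions. Only the 11 repeats, the median-as-primary statistic, and the retained minimum come from the method description.

```python
import json
import marshal
import statistics
import timeit

# Hypothetical payloads standing in for the small_tuple, nested_dict and
# code_obj cases; the real benchmark payloads are not shown in this document.
PAYLOADS = {
    "small_tuple": (1, 2.0, "three", b"four", None),
    "nested_dict": {"a": [1, 2, {"b": (3, 4)}], "c": {"d": [5.0, "six"]}},
    "code_obj": compile("def f(x):\n    return x + 1\n", "<bench>", "exec"),
}

REPEATS = 11      # from the method description
LOOPS = 100_000   # assumed value; the document only says "10x the quick pass"


def bench(func, arg, repeats=REPEATS, loops=LOOPS):
    """Time func(arg) and return the median, the min, and the raw samples."""
    timer = timeit.Timer(lambda: func(arg))
    samples = timer.repeat(repeat=repeats, number=loops)
    return {
        "median": statistics.median(samples),  # primary statistic
        "min": min(samples),                   # kept alongside for reference
        "samples": samples,
    }


def run_all(repeats=REPEATS, loops=LOOPS):
    """Benchmark marshal.loads and marshal.dumps for every payload."""
    results = {}
    for name, obj in PAYLOADS.items():
        blob = marshal.dumps(obj)  # serialize once, outside the timed region
        results[name] = {
            "loads": bench(marshal.loads, blob, repeats, loops),
            "dumps": bench(marshal.dumps, obj, repeats, loops),
        }
    return results


if __name__ == "__main__":
    print(json.dumps(run_all(), indent=2))
```

Pinning is left to the caller, for example by launching each interpreter build under `taskset -c 0` and redirecting the JSON output to a file per build.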
490+
491+ Artifacts:
492+
493+ - ` /tmp/pyperf-baseline-targeted-rerun.json `
494+ - ` /tmp/pyperf-current-targeted-rerun.json `
495+ - ` /tmp/marshal-baseline-stable.json `
496+ - ` /tmp/marshal-current-stable.json `
497+
498+ ### Targeted pyperformance
499+
500+ The official targeted ` pyperformance ` slice is effectively flat. The earlier
501+ ` python_startup ` regression did not reproduce under the cleaner rerun.
502+
503+ | Benchmark | Baseline | Current | Delta | Significance |
504+ | --- | ---: | ---: | ---: | --- |
505+ | ` python_startup ` | ` 6.64 ms +- 0.28 ms ` | ` 6.61 ms +- 0.09 ms ` | ` 1.00x faster ` | not significant |
506+ | ` python_startup_no_site ` | ` 4.24 ms +- 0.09 ms ` | ` 4.14 ms +- 0.05 ms ` | ` 1.02x faster ` | significant |
507+ | ` pickle ` | ` 6.95 us +- 0.07 us ` | ` 6.98 us +- 0.07 us ` | ` 1.01x slower ` | not significant |
508+ | ` pickle_dict ` | ` 16.4 us +- 0.3 us ` | ` 16.3 us +- 0.4 us ` | ` 1.00x faster ` | not significant |
509+ | ` pickle_list ` | ` 2.64 us +- 0.04 us ` | ` 2.63 us +- 0.06 us ` | ` 1.01x faster ` | not significant |
510+ | ` pickle_pure_python ` | ` 182 us +- 3 us ` | ` 183 us +- 2 us ` | ` 1.00x slower ` | not significant |
511+ | ` unpickle ` | ` 8.66 us +- 0.26 us ` | ` 8.51 us +- 0.20 us ` | ` 1.02x faster ` | not significant |
512+ | ` unpickle_list ` | ` 2.64 us +- 0.03 us ` | ` 2.69 us +- 0.07 us ` | ` 1.02x slower ` | not significant |
513+ | ` unpickle_pure_python ` | ` 122 us +- 2 us ` | ` 120 us +- 1 us ` | ` 1.01x faster ` | not significant |
514+ | ` unpack_sequence ` | ` 20.5 ns +- 0.4 ns ` | ` 20.0 ns +- 0.2 ns ` | ` 1.02x faster ` | significant |
515+
516+ Interpretation:
517+
518+ - No marshal-adjacent benchmark in this official slice shows a statistically
519+ significant regression.
520+ - The earlier ` python_startup ` slowdown was not stable.
521+ - The two significant wins are small and likely unrelated to marshal itself.
522+
523+ ### Direct marshal microbenches
524+
525+ The direct marshal-focused microbenches are more sensitive to this change than
526+ the broader ` pyperformance ` slice. Here the load path remains consistently
527+ slower, while dumps stay close to flat.
528+
529+ Median results from the pinned stable rerun:
530+
531+ | Benchmark | Operation | Baseline median | Current median | Delta |
532+ | --- | --- | ---: | ---: | ---: |
533+ | ` small_tuple ` | ` loads ` | ` 0.028996168 s ` | ` 0.032467121 s ` | ` +12.0% ` |
534+ | ` small_tuple ` | ` dumps ` | ` 0.015875994 s ` | ` 0.015498953 s ` | ` -2.4% ` |
535+ | ` nested_dict ` | ` loads ` | ` 0.077889564 s ` | ` 0.085107413 s ` | ` +9.3% ` |
536+ | ` nested_dict ` | ` dumps ` | ` 0.072245140 s ` | ` 0.073205785 s ` | ` +1.3% ` |
537+ | ` code_obj ` | ` loads ` | ` 0.090201660 s ` | ` 0.097551488 s ` | ` +8.1% ` |
538+ | ` code_obj ` | ` dumps ` | ` 0.039133891 s ` | ` 0.039431035 s ` | ` +0.8% ` |
539+
540+ Load-path deltas from the same rerun, computed from the best observed
541+ (minimum) sample instead of the median, were similar:
542+
543+ - ` small_tuple ` loads: ` +12.5% `
544+ - ` nested_dict ` loads: ` +9.3% `
545+ - ` code_obj ` loads: ` +6.9% `
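
As a cross-check, the Delta column of the median table can be reproduced from the reported medians. The sketch below assumes the delta is the relative change of the current median against the baseline median, in percent (positive means slower); `percent_delta` and `MEDIANS` are illustrative names, not part of the benchmark scripts.

```python
def percent_delta(baseline, current):
    """Relative change of current vs. baseline in percent (positive = slower)."""
    return (current - baseline) / baseline * 100.0


# Medians in seconds, copied from the table above: (baseline, current).
MEDIANS = {
    ("small_tuple", "loads"): (0.028996168, 0.032467121),
    ("small_tuple", "dumps"): (0.015875994, 0.015498953),
    ("nested_dict", "loads"): (0.077889564, 0.085107413),
    ("nested_dict", "dumps"): (0.072245140, 0.073205785),
    ("code_obj", "loads"): (0.090201660, 0.097551488),
    ("code_obj", "dumps"): (0.039133891, 0.039431035),
}


def format_deltas(medians=MEDIANS):
    """Return one formatted delta string per (benchmark, operation) pair."""
    return {
        key: f"{percent_delta(base, cur):+.1f}%"
        for key, (base, cur) in medians.items()
    }


if __name__ == "__main__":
    for (name, op), delta in format_deltas().items():
        print(f"{name:12s} {op:6s} {delta}")
```

Under that assumption the computed values match the table row for row.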
546+
547+ ### Conclusion
548+
549+ The benchmark story is mixed but clear:
550+
551+ - The broader targeted ` pyperformance ` slice does not show a stable
552+ user-visible regression.
553+ - The marshal-specific hot path does show a repeatable load slowdown of roughly
554+ ` 8% ` to ` 12% ` by median (` 7% ` to ` 12.5% ` by best sample) on the synthetic microbenches above.
555+ - Dump performance is approximately flat.
556+
557+ So this design currently looks behaviorally correct and broadly acceptable at
558+ the application level, but it would not be accurate to call it
559+ performance-neutral for marshal's load fast path.
560+
459561## Non-Goals
460562
461563- Do not add a whole-graph pre-scan.