Commit eb1c4b7

marshal: add benchmark summary note
1 parent 3835149 commit eb1c4b7

1 file changed

Lines changed: 102 additions & 0 deletions

Misc/marshal-recursive-ref-design.md

@@ -456,6 +456,108 @@ Tests should cover four buckets:
- nested dict/list payload
- code object payload

## Benchmark Summary

This section records the cleaner performance rerun taken after the earlier
noisy measurements.

Baseline commit:

- `7c214ea52efbcf12261128b458db8fe025cbc61b`

Current commit:

- `38351499d915a61a71e9eeefe7b7af571c3a4e21`

### Method

Two benchmark layers were rerun:

1. Targeted `pyperformance` comparison:
   - baseline and current interpreters built locally
   - `pyperformance run --affinity 0`
   - current run reused the baseline's loop counts via `--same-loops`
     pointing at the baseline JSON
   - benchmark slice:
     `python_startup, python_startup_no_site, pickle, pickle_dict,
     pickle_list, pickle_pure_python, unpickle, unpickle_list,
     unpickle_pure_python, unpack_sequence`
2. Direct marshal microbenches:
   - both interpreters pinned with `taskset -c 0`
   - 11 repeats per benchmark
   - loop counts increased by 10x over the earlier quick pass
   - medians reported as the primary statistic, with minimums retained in
     the raw JSON artifacts

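The two `pyperformance` invocations described above could be assembled roughly as follows. This is a sketch, not the exact commands that were run: the interpreter paths are hypothetical placeholders, and the `--python`, `--benchmarks`, and `--output` options are assumed from the standard `pyperformance run` command line rather than recorded in this note.

```python
import shlex

# Benchmark slice listed in the method above.
BENCHMARKS = [
    "python_startup", "python_startup_no_site", "pickle", "pickle_dict",
    "pickle_list", "pickle_pure_python", "unpickle", "unpickle_list",
    "unpickle_pure_python", "unpack_sequence",
]


def pyperformance_argv(python, output, same_loops=None):
    """Build the argv for one pinned `pyperformance run` over the slice."""
    argv = [
        "pyperformance", "run",
        "--python", python,          # interpreter under test
        "--affinity", "0",           # pin worker processes to CPU 0
        "--benchmarks", ",".join(BENCHMARKS),
        "--output", output,
    ]
    if same_loops is not None:
        # Reuse loop counts from an earlier run instead of recalibrating.
        argv += ["--same-loops", same_loops]
    return argv


# Hypothetical build locations; the real paths are not recorded in this note.
baseline_cmd = pyperformance_argv(
    "./build-baseline/python",
    "/tmp/pyperf-baseline-targeted-rerun.json",
)
current_cmd = pyperformance_argv(
    "./build-current/python",
    "/tmp/pyperf-current-targeted-rerun.json",
    same_loops="/tmp/pyperf-baseline-targeted-rerun.json",
)

print(shlex.join(baseline_cmd))
print(shlex.join(current_cmd))
```

Running the baseline first matters here, because the current run reads its loop counts from the baseline JSON.
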
Artifacts:

- `/tmp/pyperf-baseline-targeted-rerun.json`
- `/tmp/pyperf-current-targeted-rerun.json`
- `/tmp/marshal-baseline-stable.json`
- `/tmp/marshal-current-stable.json`

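For orientation, a direct marshal microbench of the kind described in the method could look like the sketch below. It is not the harness that produced the `marshal-*-stable.json` artifacts: the payloads, the default loop count, and the JSON layout are all assumptions made for illustration. Only the 11 repeats and the median/min reporting follow the method above. CPU pinning is left to the caller, for example by launching the script under `taskset -c 0`.

```python
import json
import marshal
import statistics
import time


def bench(func, arg, loops, repeats=11):
    """Time `loops` calls of func(arg), `repeats` times; return samples in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            func(arg)
        samples.append(time.perf_counter() - start)
    return samples


# Hypothetical payloads, one per bucket named in the results table.
PAYLOADS = {
    "small_tuple": (1, 2.0, "three", b"four", None),
    "nested_dict": {"k%d" % i: [i, {"inner": (i, str(i))}] for i in range(50)},
    "code_obj": compile(
        "def f(a, b):\n    return [a + i for i in range(b)]\n", "<bench>", "exec"
    ),
}


def run(loops=10_000):
    """Benchmark loads and dumps for every payload; keep raw samples too."""
    results = {}
    for name, obj in PAYLOADS.items():
        data = marshal.dumps(obj)
        for op, func, arg in (
            ("loads", marshal.loads, data),
            ("dumps", marshal.dumps, obj),
        ):
            samples = bench(func, arg, loops)
            results[f"{name}.{op}"] = {
                "median": statistics.median(samples),  # primary statistic
                "min": min(samples),                   # best observed sample
                "samples": samples,
            }
    return results


if __name__ == "__main__":
    print(json.dumps(run(), indent=2))
```

Running the same script under each interpreter and saving the output gives two JSON files that can be compared benchmark by benchmark.
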
### Targeted pyperformance

The official targeted `pyperformance` slice is effectively flat. The earlier
`python_startup` regression did not reproduce under the cleaner rerun.

| Benchmark | Baseline | Current | Delta | Significance |
| --- | ---: | ---: | ---: | --- |
| `python_startup` | `6.64 ms +- 0.28 ms` | `6.61 ms +- 0.09 ms` | `1.00x faster` | not significant |
| `python_startup_no_site` | `4.24 ms +- 0.09 ms` | `4.14 ms +- 0.05 ms` | `1.02x faster` | significant |
| `pickle` | `6.95 us +- 0.07 us` | `6.98 us +- 0.07 us` | `1.01x slower` | not significant |
| `pickle_dict` | `16.4 us +- 0.3 us` | `16.3 us +- 0.4 us` | `1.00x faster` | not significant |
| `pickle_list` | `2.64 us +- 0.04 us` | `2.63 us +- 0.06 us` | `1.01x faster` | not significant |
| `pickle_pure_python` | `182 us +- 3 us` | `183 us +- 2 us` | `1.00x slower` | not significant |
| `unpickle` | `8.66 us +- 0.26 us` | `8.51 us +- 0.20 us` | `1.02x faster` | not significant |
| `unpickle_list` | `2.64 us +- 0.03 us` | `2.69 us +- 0.07 us` | `1.02x slower` | not significant |
| `unpickle_pure_python` | `122 us +- 2 us` | `120 us +- 1 us` | `1.01x faster` | not significant |
| `unpack_sequence` | `20.5 ns +- 0.4 ns` | `20.0 ns +- 0.2 ns` | `1.02x faster` | significant |

Interpretation:

- No marshal-adjacent benchmark in this official slice shows a statistically
  significant regression.
- The earlier `python_startup` slowdown was not stable.
- The two significant wins are small and likely unrelated to marshal itself.

### Direct marshal microbenches

The direct marshal-focused microbenches are more sensitive to this change than
the broader `pyperformance` slice. Here the load path remains consistently
slower, while dumps stay close to flat.

Median results from the pinned stable rerun (deltas are relative to the
baseline; positive means the current commit is slower):

| Benchmark | Operation | Baseline median | Current median | Delta |
| --- | --- | ---: | ---: | ---: |
| `small_tuple` | `loads` | `0.028996168 s` | `0.032467121 s` | `+12.0%` |
| `small_tuple` | `dumps` | `0.015875994 s` | `0.015498953 s` | `-2.4%` |
| `nested_dict` | `loads` | `0.077889564 s` | `0.085107413 s` | `+9.3%` |
| `nested_dict` | `dumps` | `0.072245140 s` | `0.073205785 s` | `+1.3%` |
| `code_obj` | `loads` | `0.090201660 s` | `0.097551488 s` | `+8.1%` |
| `code_obj` | `dumps` | `0.039133891 s` | `0.039431035 s` | `+0.8%` |

Load-path deltas from the same rerun, computed from the best observed sample
instead of the median, were similar:

- `small_tuple` loads: `+12.5%`
- `nested_dict` loads: `+9.3%`
- `code_obj` loads: `+6.9%`

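The Delta column in the median table is consistent with a plain relative change against the baseline, `(current - baseline) / baseline`. The short check below recomputes it from the medians quoted in the table. The helper name `percent_delta` is introduced here for illustration only.

```python
# Medians copied from the table above: (baseline seconds, current seconds).
MEDIANS = {
    ("small_tuple", "loads"): (0.028996168, 0.032467121),
    ("small_tuple", "dumps"): (0.015875994, 0.015498953),
    ("nested_dict", "loads"): (0.077889564, 0.085107413),
    ("nested_dict", "dumps"): (0.072245140, 0.073205785),
    ("code_obj", "loads"): (0.090201660, 0.097551488),
    ("code_obj", "dumps"): (0.039133891, 0.039431035),
}


def percent_delta(baseline, current):
    """Relative change of current versus baseline, in percent (positive = slower)."""
    return (current - baseline) / baseline * 100.0


for (name, op), (baseline, current) in MEDIANS.items():
    print(f"{name:12s} {op:5s} {percent_delta(baseline, current):+.1f}%")
```

Rounded to one decimal place, the printed values match the table: loads are between `+8.1%` and `+12.0%`, while dumps stay within about `2.4%` of the baseline in either direction.
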
### Conclusion

The benchmark story is mixed but clear:

- The broader targeted `pyperformance` slice does not show a stable
  user-visible regression.
- The marshal-specific hot path does show a repeatable load slowdown of roughly
  `8%` to `12%` by median (`7%` to `12.5%` by best sample) on the synthetic
  microbenches above.
- Dump performance is approximately flat.

So this design currently looks behaviorally correct and broadly acceptable at
the application level, but it would not yet be accurate to call it
performance-neutral for marshal's load fast path.

## Non-Goals

- Do not add a whole-graph pre-scan.
