Proposal: ~2.5–3× faster parsing, conformance-preserving

## TL;DR

I have a set of parser-only optimizations that make `tomlkit.parse()` roughly
**2.4–2.9× faster** (content-dependent) with **no behavioural change**: the full
test suite passes, including **680/680 of the `toml-test` conformance corpus**,
and `ruff` + `mypy --strict` are clean.

Before opening a large PR, I'd like to check your appetite and preferred shape —
the work is naturally splittable into independent, separately-reviewable pieces.

## Why this matters / why I'm being careful

tomlkit is a hot dependency of Poetry (and poetry-core), so I treated
*correctness as non-negotiable*: every change is gated on the conformance suite,
not just round-trip of valid files. (In fact, while validating I caught a subtle
EOF-sentinel collision a naive char-interning optimization can introduce — the
fix is included and is itself covered by `toml-test invalid/control/*-null`.)

## Reproducing the numbers

A dependency-free benchmark is attached (`bench_standalone.py`): run it on
`master`, then on the branch, and compare medians. Measured here (macOS,
CPython 3.11):

```
# master (8694e4d)
$ python bench_standalone.py
  median :  791.6 ms   (1583 us/parse)

# branch
$ python bench_standalone.py
  median :  316.8 ms   ( 634 us/parse)   -> ~2.5x faster
```

Cross-checked with a drift-immune A/B (both versions timed, interleaved, in one
process) on other workloads:

| Workload | master | branch | factor |
|---|---|---|---|
| embedded sample (`bench_standalone.py`) | 792 ms | 317 ms | **×2.50** |
| number-heavy ~3 KB doc | ~2970 ms | ~1045 ms | **×2.85** |
| 262 distinct `toml-test/valid` files | 1747 ms | 717 ms | **×2.44** |

## What the changes do (grouped, so you can pick)

The core idea: the parser advanced **one character per call** (`inc()` ~2.2 M
times on the benchmark). The big wins replace those with **bulk scans** over the
underlying string. Grouped by theme, independently reviewable:

| # | Theme | Idea | Risk |
|---|---|---|---|
| A | **Bulk char scanning** | index-based `Source`; `advance_while`/`advance_until` replace per-char `inc()` loops in bare-keys, whitespace, numbers, comments, single-line strings | medium — hot path, but pure scan-equivalence |
| B | **Dispatch micro-wins** | `frozenset` membership, precomputed enum values, `is` vs set membership, hoisting loop invariants, binding `Source` delegates | low — mechanical |
| C | **Object interning** | module-level `TOMLChar` cache (+ the NUL/EOF fix) | medium — includes the correctness fix above |
| D | **`__deepcopy__`** | hand-rolled `Container`/`Trivia` deepcopy to skip the reflective machinery (super-table merge path) | higher — touches copy/alias semantics, wants careful review |

Themes A + B alone deliver most of the gain at the lowest risk; C and D can be
deferred or dropped.

## What I've verified — and what I haven't (yet)

- ✅ Full `pytest tests` green incl. `toml-test` (680/680), matching `master`.
- ✅ `ruff check`, `ruff format`, `mypy --strict` clean.
- ✅ Drift-immune A/B benchmark (numbers above).
- ⏳ **Only CPython 3.11 / macOS so far** — not the full CI matrix (3.9–3.14 ×
  3 OS) nor the Poetry/poetry-core integration jobs. I'll run those before any PR
  if you're interested.

## Question for the maintainers

1. Is a parser speedup of this size something you'd consider merging?
2. If so, do you prefer **one PR per theme** (A, then B, …) or a single curated
   PR of A+B with C/D as follow-ups?
3. Any themes you'd rather not take (e.g. the `__deepcopy__` rewrite) on
   maintainability grounds?

Happy to adapt to whatever keeps the hot path readable for you.

---

<details>
<summary><code>bench_standalone.py</code> (zero-dependency, copy-paste & run)</summary>

```python
#!/usr/bin/env python3
"""Standalone, dependency-free parse benchmark for tomlkit.

Run it on `main`, then on the optimization branch, and compare the medians:

    python bench_standalone.py            # default: 500 iters x 15 runs
    python bench_standalone.py --iters 1000 --runs 25

Only stdlib + the locally-installed `tomlkit` are used, so the numbers are
trivially reproducible. The sample document below is embedded (no external
files) and exercises the common value kinds: strings, ints/floats, bools,
datetimes, arrays, inline tables, nested tables, AoT and comments.

Methodology: warmup, then `runs` timed batches of `iters` parses each;
we report the median batch time (robust to OS scheduling noise) and stdev.
"""
from __future__ import annotations

import argparse
import statistics
import time

import tomlkit

# Representative ~3 KB TOML covering the value kinds the parser dispatches on.
SAMPLE = """\
# A representative TOML document
title = "tomlkit benchmark"
version = 2
ratio = 3.14159
enabled = true
created = 2024-01-15T08:30:00Z
tags = ["alpha", "beta", "gamma", "delta"]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inline = { x = 1, y = 2, z = 3 }

[server]
host = "localhost"
port = 8080
timeout = 30.5
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

[server.tls]
enabled = true
ciphers = ["TLS_AES_256", "TLS_CHACHA20"]

[database]
url = "postgres://localhost:5432/app"  # inline comment
pool_size = 20
read_only = false

[[products]]
name = "Widget"
sku = 738594937
price = 19.99

[[products]]
name = "Gadget"
sku = 284758393
price = 49.95
in_stock = false

[logging]
level = "info"
format = "%(asctime)s %(message)s"
rotate_mb = 100
"""


def measure(content: str, iters: int) -> float:
    t0 = time.perf_counter()
    for _ in range(iters):
        tomlkit.parse(content)
    return (time.perf_counter() - t0) * 1000.0  # ms


def main() -> None:
    ap = argparse.ArgumentParser(description=__doc__)
    ap.add_argument("--iters", type=int, default=500)
    ap.add_argument("--runs", type=int, default=15)
    ap.add_argument("--warmup", type=int, default=3)
    args = ap.parse_args()

    for _ in range(args.warmup):
        measure(SAMPLE, args.iters)

    samples = sorted(measure(SAMPLE, args.iters) for _ in range(args.runs))
    median = samples[len(samples) // 2]
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0

    print(f"tomlkit {tomlkit.__version__ if hasattr(tomlkit, '__version__') else '?'}")
    print(f"{args.iters} parses x {args.runs} runs (warmup x{args.warmup})")
    print(f"  median : {median:8.1f} ms   ({median / args.iters * 1000:.1f} us/parse)")
    print(f"  stdev  : {stdev:8.1f} ms")
    print(f"  min/max: {samples[0]:.1f} / {samples[-1]:.1f} ms")


if __name__ == "__main__":
    main()
```

</details>


#	Theme	Idea	Risk
A	Bulk char scanning	index-based `Source`; `advance_while`/`advance_until` replace per-char `inc()` loops in bare-keys, whitespace, numbers, comments, single-line strings	medium — hot path, but pure scan-equivalence
B	Dispatch micro-wins	`frozenset` membership, precomputed enum values, `is` vs set membership, hoisting loop invariants, binding `Source` delegates	low — mechanical
C	Object interning	module-level `TOMLChar` cache (+ the NUL/EOF fix)	medium — includes the correctness fix above
D	`__deepcopy__`	hand-rolled `Container`/`Trivia` deepcopy to skip the reflective machinery (super-table merge path)	higher — touches copy/alias semantics, wants careful review

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Proposal: ~2.5–3× faster parsing, conformance-preserving #483

TL;DR

Why this matters / why I'm being careful

Reproducing the numbers

What the changes do (grouped, so you can pick)

What I've verified — and what I haven't (yet)

Question for the maintainers

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Workload	master	branch	factor
embedded sample (`bench_standalone.py`)	792 ms	317 ms	×2.50
number-heavy ~3 KB doc	~2970 ms	~1045 ms	×2.85
262 distinct `toml-test/valid` files	1747 ms	717 ms	×2.44

Uh oh!

Proposal: ~2.5–3× faster parsing, conformance-preserving #483

Description

TL;DR

Why this matters / why I'm being careful

Reproducing the numbers

What the changes do (grouped, so you can pick)

What I've verified — and what I haven't (yet)

Question for the maintainers

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions