
Proposal: ~2.5–3× faster parsing, conformance-preserving #483

@tfoutrein

TL;DR

I have a set of parser-only optimizations that make tomlkit.parse() roughly
2.4–2.9× faster (content-dependent) with no behavioural change: on CPython 3.11
/ macOS the full test suite passes, including 680/680 of the toml-test
conformance corpus, and ruff + mypy --strict are clean.

Before opening a large PR, I'd like to check your appetite and preferred shape:
the work splits naturally into independent, separately reviewable pieces.

Why this matters / why I'm being careful

tomlkit is a core dependency of Poetry (and poetry-core), so I treated
correctness as non-negotiable: every change is gated on the conformance suite,
including its invalid-input cases, not just on round-tripping valid files. (In
fact, while validating I caught a subtle collision between a literal NUL
character and the EOF sentinel, which a naive char-interning optimization can
introduce; the fix is included and is itself covered by toml-test
invalid/control/*-null.)
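To make that hazard concrete, here is a minimal sketch of how interning can produce such a collision. All names (`Char`, `intern_char`, `NAIVE_EOF`, `SAFE_EOF`) are hypothetical stand-ins rather than tomlkit's classes, and the sketch assumes end-of-input is detected by identity (`is`) against a NUL-valued sentinel; the branch's actual fix may be shaped differently. With the naive sentinel, a document containing a (forbidden) NUL could look like it ends early instead of being rejected.

```python
from __future__ import annotations


class Char(str):
    """Hypothetical stand-in for a one-character str subclass like TOMLChar."""

    __slots__ = ()


# Module-level cache: one shared Char instance per distinct character.
_CACHE: dict[str, Char] = {}


def intern_char(c: str) -> Char:
    ch = _CACHE.get(c)
    if ch is None:
        ch = _CACHE[c] = Char(c)
    return ch


# Naive: the EOF sentinel goes through the same cache as input characters,
# so it is the *same object* as any literal NUL read from the document.
NAIVE_EOF = intern_char("\0")

# Safer: build the sentinel directly, bypassing the cache. Constructing a str
# subclass always allocates a new object, so no input character can alias it.
SAFE_EOF = Char("\0")


def tokenize(text: str) -> list[Char]:
    """Turn the input into interned chars, as an interning source might."""
    return [intern_char(c) for c in text]


def first_eof_index(chars: list[Char], eof: Char) -> int:
    """Index at which an identity-based end-of-input check first fires."""
    for i, ch in enumerate(chars):
        if ch is eof:
            return i
    return len(chars)


if __name__ == "__main__":
    chars = tokenize('a = "x\0y"')
    print(first_eof_index(chars, NAIVE_EOF))  # -> 6 (stops at the NUL)
    print(first_eof_index(chars, SAFE_EOF))   # -> 9 (reaches the real end)
```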

Reproducing the numbers

A dependency-free benchmark (bench_standalone.py) is included at the end of
this issue: run it on master, then on the branch, and compare the medians.
Measured here (macOS, CPython 3.11):

```
# master (8694e4d)
$ python bench_standalone.py
  median :  791.6 ms   (1583 us/parse)

# branch
$ python bench_standalone.py
  median :  316.8 ms   ( 634 us/parse)   -> ~2.5x faster
```

Cross-checked with a drift-immune A/B run (both versions timed in alternation
within one process, so slow machine drift affects both equally) on the
embedded sample and two other workloads:

| Workload | master | branch | Factor |
|---|---:|---:|---:|
| Embedded sample (bench_standalone.py) | 792 ms | 317 ms | ×2.50 |
| Number-heavy ~3 KB doc | ~2970 ms | ~1045 ms | ×2.85 |
| 262 distinct toml-test/valid files | 1747 ms | 717 ms | ×2.44 |
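The interleaving idea can be sketched as follows. This is a minimal, hypothetical harness, not the one that produced the numbers above: it assumes both versions are exposed as parse callables in the same process, alternates which one runs first in each round, and compares per-version medians.

```python
from __future__ import annotations

import statistics
import time
from typing import Callable


def time_batch(fn: Callable[[str], object], doc: str, iters: int) -> float:
    """Wall-clock seconds for `iters` consecutive calls of fn(doc)."""
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(doc)
    return time.perf_counter() - t0


def interleaved_ab(
    parse_a: Callable[[str], object],
    parse_b: Callable[[str], object],
    doc: str,
    iters: int = 100,
    rounds: int = 21,
) -> tuple[float, float, float]:
    """Return (median_a, median_b, factor a/b), timing A and B alternately.

    Timing both versions round after round in one process means slow drift
    (thermal throttling, background load) hits both sides roughly equally;
    swapping the order on every other round cancels first-mover effects.
    """
    times_a: list[float] = []
    times_b: list[float] = []
    for r in range(rounds):
        pairs = [(parse_a, times_a), (parse_b, times_b)]
        if r % 2:
            pairs.reverse()
        for fn, sink in pairs:
            sink.append(time_batch(fn, doc, iters))
    med_a = statistics.median(times_a)
    med_b = statistics.median(times_b)
    return med_a, med_b, med_a / med_b
```

With master and the branch importable side by side (say, under hypothetical module names `tomlkit_master` and `tomlkit_branch`), `interleaved_ab(tomlkit_master.parse, tomlkit_branch.parse, SAMPLE)` would return both medians and the master/branch factor.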

What the changes do (grouped, so you can pick)

The core idea: the parser advances one character per call (inc() runs ~2.2 M
times on the benchmark). The biggest wins replace those per-character calls
with bulk scans over the underlying string. Grouped by theme, each
independently reviewable:
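The difference between the two styles can be sketched on a plain string. The names below (`Cursor`, `advance_while`, `advance_until`, `bare_key_*`) are illustrative and only borrow the naming from theme A; the branch's real `Source` API and scanning mechanism may differ. The per-character version makes one Python-level call per character, while the bulk versions hand each scan to C through `re.Pattern.match` and `str.find`.

```python
from __future__ import annotations

import re
import string

# TOML bare keys may contain only ASCII letters, digits, "_" and "-".
BARE_KEY_CHARS = frozenset(string.ascii_letters + string.digits + "_-")
BARE_KEY_RE = re.compile(r"[A-Za-z0-9_-]*")


class Cursor:
    """Hypothetical index-based cursor over an immutable source string."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.idx = 0

    # Per-character style: one Python-level call per character.
    def current(self) -> str:
        return self.text[self.idx] if self.idx < len(self.text) else ""

    def inc(self) -> None:
        if self.idx < len(self.text):
            self.idx += 1

    # Bulk style: each call performs a single scan implemented in C.
    def advance_while(self, pattern: re.Pattern[str]) -> str:
        """Consume the longest run matching `pattern` at idx and return it."""
        m = pattern.match(self.text, self.idx)
        if m is None:  # cannot happen for a `*` pattern, but stay safe
            return ""
        self.idx = m.end()
        return m.group()

    def advance_until(self, stop: str) -> str:
        """Consume up to (not including) `stop`, or to end of input."""
        end = self.text.find(stop, self.idx)
        if end == -1:
            end = len(self.text)
        start, self.idx = self.idx, end
        return self.text[start:end]


def bare_key_per_char(cur: Cursor) -> str:
    """Old style: test and advance one character at a time."""
    start = cur.idx
    while cur.current() in BARE_KEY_CHARS:
        cur.inc()
    return cur.text[start:cur.idx]


def bare_key_bulk(cur: Cursor) -> str:
    """New style: one regex match consumes the whole bare key."""
    return cur.advance_while(BARE_KEY_RE)


if __name__ == "__main__":
    doc = 'server-name = "x" # comment\n'
    print(bare_key_bulk(Cursor(doc)))  # -> server-name
```

Both styles must stop at exactly the same index; that scan-equivalence is what makes theme A safe to review.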

| # | Theme | Idea | Risk |
|---|---|---|---|
| A | Bulk char scanning | Index-based `Source`; `advance_while`/`advance_until` replace per-char `inc()` loops for bare keys, whitespace, numbers, comments and single-line strings | medium: hot path, but pure scan-equivalence |
| B | Dispatch micro-wins | `frozenset` membership, precomputed enum values, `is` vs set membership, hoisting loop invariants, binding `Source` delegates | low: mechanical |
| C | Object interning | Module-level `TOMLChar` cache (+ the NUL/EOF fix) | medium: includes the correctness fix above |
| D | `__deepcopy__` | Hand-rolled `Container`/`Trivia` deepcopy that skips the reflective machinery (super-table merge path) | higher: touches copy/alias semantics, wants careful review |
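As a rough illustration of the kind of mechanical rewrite theme B covers, here is a hypothetical whitespace-skipping loop before and after: membership moves from a list to a `frozenset` built once at import time, and the values that do not change inside the loop are hoisted into locals. Neither function is code from tomlkit or the branch, and the real gains depend on the code being rewritten.

```python
from __future__ import annotations

# Before: a list, so each membership test is a linear scan.
WS_LIST = [" ", "\t"]
# After: a frozenset built once at import time; membership is a hash lookup.
WS_SET = frozenset(" \t")


class Src:
    """Hypothetical minimal source: a string plus a read index."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.idx = 0


def skip_ws_before(src: Src) -> int:
    # Re-reads src.text, len(src.text) and the global WS_LIST on every step,
    # and stores src.idx once per consumed character.
    while src.idx < len(src.text) and src.text[src.idx] in WS_LIST:
        src.idx += 1
    return src.idx


def skip_ws_after(src: Src) -> int:
    # Loop invariants hoisted into locals before the loop starts.
    text, n, ws = src.text, len(src.text), WS_SET
    i = src.idx
    while i < n and text[i] in ws:
        i += 1
    src.idx = i  # single attribute store after the loop
    return i
```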

Themes A + B alone deliver most of the gain at the lowest risk; C and D can be
deferred or dropped.
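For theme D, the main review point is that a hand-rolled `__deepcopy__` has to preserve `copy.deepcopy`'s alias semantics: register the new object in `memo` before copying children, and route children through `copy.deepcopy(child, memo)` so shared and cyclic references stay shared in the copy. A minimal sketch on a hypothetical `Node` class (not tomlkit's `Container` or `Trivia`):

```python
from __future__ import annotations

import copy
from typing import Any


class Node:
    """Hypothetical container with a list of children and flat trivia."""

    def __init__(self, name: str, children: list[Any] | None = None) -> None:
        self.name = name
        self.children: list[Any] = children if children is not None else []
        self.trivia = {"comment": ""}

    def __deepcopy__(self, memo: dict[int, Any]) -> Node:
        # Allocate without __init__ and without the default
        # __reduce_ex__/_reconstruct path, then register the copy *before*
        # copying children so cycles back to self resolve to the copy.
        new = Node.__new__(Node)
        memo[id(self)] = new
        new.name = self.name  # str is immutable: sharing is safe
        new.children = copy.deepcopy(self.children, memo)
        new.trivia = dict(self.trivia)  # assumes values are immutable
        return new


if __name__ == "__main__":
    leaf = Node("leaf")
    root = Node("root", [leaf, leaf])  # same child twice (aliasing)
    root.children.append(root)         # cycle back to root
    clone = copy.deepcopy(root)
    print(clone.children[0] is clone.children[1])  # -> True
    print(clone.children[2] is clone)              # -> True
```

Forgetting the early `memo` registration is exactly the kind of subtle regression that makes D the riskiest theme.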

What I've verified — and what I haven't (yet)

  • ✅ Full pytest suite green, including toml-test (680/680), matching master.
  • ✅ ruff check, ruff format and mypy --strict clean.
  • ✅ Drift-immune A/B benchmark (numbers above).
  • Not yet verified: only CPython 3.11 / macOS so far, not the full CI matrix
    (3.9–3.14 × 3 OSes) or the Poetry/poetry-core integration jobs. I'll run
    those before opening any PR if you're interested.

Questions for the maintainers

  1. Is a parser speedup of this size something you'd consider merging?
  2. If so, do you prefer one PR per theme (A, then B, …) or a single curated
    PR of A+B with C/D as follow-ups?
  3. Any themes you'd rather not take (e.g. the __deepcopy__ rewrite) on
    maintainability grounds?

Happy to adapt to whatever keeps the hot path readable for you.


bench_standalone.py (zero-dependency, copy-paste & run)
```python
#!/usr/bin/env python3
"""Standalone, dependency-free parse benchmark for tomlkit.

Run it on `master`, then on the optimization branch, and compare the medians:

    python bench_standalone.py            # default: 500 iters x 15 runs
    python bench_standalone.py --iters 1000 --runs 25

Only the stdlib and the locally installed `tomlkit` are used, so the numbers
are easy to reproduce. The sample document below is embedded (no external
files) and exercises the common value kinds: strings, ints/floats, bools,
datetimes, arrays, inline tables, nested tables, arrays of tables (AoT) and
comments.

Methodology: warm up, then time `runs` batches of `iters` parses each;
report the median batch time (robust to OS scheduling noise) and the stdev.
"""
from __future__ import annotations

import argparse
import statistics
import time

import tomlkit

# Small (< 1 KB) representative TOML document covering the value kinds the
# parser dispatches on.
SAMPLE = """\
# A representative TOML document
title = "tomlkit benchmark"
version = 2
ratio = 3.14159
enabled = true
created = 2024-01-15T08:30:00Z
tags = ["alpha", "beta", "gamma", "delta"]
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
inline = { x = 1, y = 2, z = 3 }

[server]
host = "localhost"
port = 8080
timeout = 30.5
backends = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

[server.tls]
enabled = true
ciphers = ["TLS_AES_256", "TLS_CHACHA20"]

[database]
url = "postgres://localhost:5432/app"  # inline comment
pool_size = 20
read_only = false

[[products]]
name = "Widget"
sku = 738594937
price = 19.99

[[products]]
name = "Gadget"
sku = 284758393
price = 49.95
in_stock = false

[logging]
level = "info"
format = "%(asctime)s %(message)s"
rotate_mb = 100
"""


def measure(content: str, iters: int) -> float:
    """Return the wall-clock time in ms for `iters` consecutive parses."""
    t0 = time.perf_counter()
    for _ in range(iters):
        tomlkit.parse(content)
    return (time.perf_counter() - t0) * 1000.0  # ms


def main() -> None:
    ap = argparse.ArgumentParser(
        description=__doc__,
        # Keep the docstring's line breaks and indented examples in --help.
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    ap.add_argument("--iters", type=int, default=500)
    ap.add_argument("--runs", type=int, default=15)
    ap.add_argument("--warmup", type=int, default=3)
    args = ap.parse_args()
    if args.iters < 1 or args.runs < 1:
        ap.error("--iters and --runs must be >= 1")

    # Untimed warm-up batches reduce first-run effects.
    for _ in range(args.warmup):
        measure(SAMPLE, args.iters)

    samples = sorted(measure(SAMPLE, args.iters) for _ in range(args.runs))
    median = statistics.median(samples)
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0

    print(f"tomlkit {getattr(tomlkit, '__version__', '?')}")
    print(f"{args.iters} parses x {args.runs} runs (warmup x{args.warmup})")
    print(f"  median : {median:8.1f} ms   ({median / args.iters * 1000:.1f} us/parse)")
    print(f"  stdev  : {stdev:8.1f} ms")
    print(f"  min/max: {samples[0]:.1f} / {samples[-1]:.1f} ms")


if __name__ == "__main__":
    main()
```
