gh-94345: Fix dataclasses.asdict()/astuple() crash on circular references#151904
gh-94345: Fix dataclasses.asdict()/astuple() crash on circular references#151904iamsharduld wants to merge 2 commits into
Conversation
…references
asdict() and astuple() recursed into nested dataclasses and containers
without tracking which objects were already being processed, so a circular
reference recursed until a RecursionError (or crashed the interpreter on a
release build). Track the objects on the current recursion path and raise
ValueError("Circular reference detected") -- matching json.dumps() -- when
one is revisited.
Documentation build overview
|
| @@ -0,0 +1,4 @@ | |||
| :func:`dataclasses.asdict` and :func:`dataclasses.astuple` now raise | |||
| :exc:`ValueError` when the dataclass instance contains a circular reference, | |||
| instead of recursing until a :exc:`RecursionError` (or crashing on a release | |||
There was a problem hiding this comment.
Is there a crash on release builds (e.g. segmentation fault) or a python exception?
There was a problem hiding this comment.
It's a Python exception, not a crash. On a release build (current main, clang) a self-referential dataclass raises RecursionError and the process survives — the recursion limit stops the C stack from overflowing, so there's no segfault. The original report shows the same RecursionError.
I've reworded the NEWS entry to drop "(or crashing on a release build)"; it now just says "instead of recursing until a RecursionError".
| reference. | ||
|
|
||
| .. versionchanged:: next | ||
| A circular reference now raises :exc:`ValueError` instead of |
There was a problem hiding this comment.
The PR adds overhead to the conversion to dict or tuple. Can you benchmark this?
Adding overhead just to change the type of error message might be ok, but I would like to have some quantative numbers
There was a problem hiding this comment.
Benchmarked on a release build (best of 9, over a ~1093-node nested structure of dataclasses + lists + dicts):
asdict: 15.5 ms → 19.3 ms (+24%)astuple: 21.4 ms → 25.1 ms (+17%)
For typical shallow dataclasses the absolute cost is negligible; the percentage only shows up on deeply-nested, container-heavy data. I tried gating the cycle-tracking behind a recursion-depth threshold to make the common case cheaper, but it gave no measurable improvement — the cost is inherent to tracking object identities through the recursion, not the set operations specifically — so I kept the straightforward implementation.
On whether it's worth it beyond the error type: this also brings asdict/astuple in line with json.dumps, which detects circular references by default and raises the same ValueError("Circular reference detected"). Since asdict() output is so often passed straight to json.dumps, a cyclic structure currently dies with a deep RecursionError rather than the clear error json would give; copy.deepcopy likewise tracks identities on every call. If the default overhead is still a concern, I'd be happy to mirror json's check_circular= opt-out.
asdict()/astuple() on a circular reference raise RecursionError on a release build, not a segfault; drop the inaccurate crash claim.
asdict()andastuple()recurse into nested dataclasses and built-in containers without tracking which objects are already being processed, so a circular reference recurses until aRecursionError(or crashes the interpreter on a release build).This tracks the objects on the current recursion path and raises
ValueError("Circular reference detected")— matchingjson.dumps()— when one is revisited. Objects referenced more than once without forming a cycle (a DAG) are still converted normally.