Fast & bounded goto-instrument --generate-function-body#9058
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Improves performance and termination guarantees for goto-instrument --generate-function-body by eliminating an O(N²) symbol suffix scan and adding a hard cap on dynamic object allocations during nondet initialisation.
Changes:
- Add a per-prefix suffix “search start” cache to
symbol_table_baset::next_unused_suffix(prefix)to amortise repeated allocations under one prefix. - Add
max_dynamic_object_instancesparameter and enforce it insymbol_factoryt::gen_nondet_initto prevent runaway nondet-init expansion. - Add a new regression test to ensure deep/wide non-recursive struct hierarchies don’t hang/OOM during havoc body generation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/symbol_table_base.h | Adds per-prefix suffix hint cache to avoid O(N²) repeated scans. |
| src/util/object_factory_parameters.h | Introduces max_dynamic_object_instances parameter with detailed rationale. |
| src/util/object_factory_parameters.cpp | Parses/propagates the new CLI option into object factory params. |
| src/ansi-c/c_nondet_symbol_factory.h | Adds allocation counter to enforce the new hard cap. |
| src/ansi-c/c_nondet_symbol_factory.cpp | Enforces allocation cap by nulling pointers once limit is reached. |
| src/ansi-c/ansi_c_language.h | Exposes --max-dynamic-object-instances via frontend option/help text. |
| regression/goto-instrument/generate-function-body-deep-struct-cap/test.desc | Adds regression harness invoking the new cap option. |
| regression/goto-instrument/generate-function-body-deep-struct-cap/main.c | Defines a deep/wide non-recursive struct hierarchy reproducer. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Running count of dynamic objects this factory has allocated so | ||
| /// far. Compared against | ||
| /// `object_factory_params.max_dynamic_object_instances`; when the | ||
| /// cap is hit, pointers are initialised to NULL instead of | ||
| /// recursively expanded. See the parameter's docstring for the | ||
| /// kernel-struct motivation. | ||
| std::size_t dynamic_object_instance_count = 0; |
| protected: | ||
| /// Per-prefix "next search-start" hint consulted by | ||
| /// `next_unused_suffix(prefix)`. Mutable so the hint can be updated | ||
| /// through the const read API. See that function's docstring for | ||
| /// semantics. | ||
| mutable std::unordered_map<std::string, std::size_t> suffix_hint_cache; |
| LIM-008 regression. Without the per-root allocation cap introduced | ||
| in c_object_factory_parameterst, havoc body generation for a | ||
| function returning a pointer to a wide, deep, non-recursive struct |
| // This test wires up a moderate-depth, branching, non-recursive | ||
| // struct hierarchy and asserts that body generation completes | ||
| // quickly (no hang) and that the resulting goto binary is cbmc- | ||
| // verifiable in reasonable time. |
d162044 to
9d5b3c1
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #9058 +/- ##
========================================
Coverage 80.68% 80.68%
========================================
Files 1714 1714
Lines 189501 189518 +17
Branches 73 73
========================================
+ Hits 152902 152916 +14
- Misses 36599 36602 +3 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
We noticed `goto-instrument --generate-function-body` hanging on kernel
goto binaries. Investigation under bounded ulimit/timeout showed two
independent bugs:
1) O(N^2) scan in symbol_table_baset::next_unused_suffix.
Every allocation done by the C object factory calls
get_fresh_aux_symbol -> next_unused_suffix, which historically
restarted its linear scan from 0 each time. With N allocations under
a single prefix (e.g. 'af_alg_alloc_areq::malloc_site$'), total work
is O(N^2). On a real kernel binary this produces thousands of
allocations for a single havoc body, enough for the polynomial to
dominate.
Fix: add a per-prefix 'next search-start' hint cache to
symbol_table_baset. The one-arg next_unused_suffix(prefix) now
consults and updates the hint, making repeated allocation under the
same prefix amortised O(1). The two-arg form (caller-provided start)
is unchanged.
The derived symbol_table_buildert already had an equivalent hint
cache with the same semantic relaxation (next unused, not strictly
smallest). Moving the logic up to the base removes the performance
cliff for every symbol_table_baset subclass, including plain
symbol_tablet — which is the type goto-instrument's
--generate-function-body path uses. The builder's override is left
in place; its cache shadows the base cache and remains correct.
2) Runaway tree size in symbol_factoryt::gen_nondet_init.
The existing max_nondet_tree_depth cap only fires when the SAME
struct tag appears twice on the pointer chain. Kernel struct
hierarchies are typically wide and deep but rarely re-enter the same
type early (a single 'struct sock *' reaches hundreds of
pointer-to-struct fields transitively without cycling back), so the
depth cap never fires and the factory generates an exponentially
large init body.
Fix: add a new c_object_factory_parameterst field
max_dynamic_object_instances (default 1000) and a
dynamic_object_instance_count counter on symbol_factoryt. Before
each dynamic object allocation in the pointer branch, check if the
counter has reached the cap; if so, initialise the pointer to NULL
and return. This is a hard belt-and- braces termination guarantee
independent of the depth cap and of the struct-hierarchy topology.
Exposed as `--max-dynamic-object-instances N` via the
OPT_ANSI_C_LANGUAGE front-end option set (goto-instrument, cbmc,
goto-analyzer, etc. all pick it up). Parsed by
object_factory_parameterst::set() the same way as
max_nondet_tree_depth.
Measured impact on the reproducer (goto-instrument
--generate-function-body af_alg_alloc_areq
--generate-function-body-options havoc on Linux 5.10's
crypto/algif_aead.c goto binary):
before this commit: hangs indefinitely (still running after
4 hours of wall-clock without the ulimit).
after this commit: 2 seconds, 10 MB output binary.
Regression test: regression/goto-instrument/
generate-function-body-deep-struct-cap/ wires up a wide, deep,
non-recursive struct hierarchy whose pre-fix havoc-body generation
exhibited the exponential blowup, uses --max-dynamic-object-instances 50
to keep the test fast, and asserts EXIT=0 (i.e. goto-instrument did not
hang or abort).
Validated: all 15 ctest CORE labels under goto-harness / goto-instrument
/ goto-cc / contracts / symbol-table pass. All seven integration/linux
regressions remain green.
Co-authored-by: Kiro <kiro-agent@users.noreply.github.com>
9d5b3c1 to
ea8cb8a
Compare
LIM-008 in integration/linux/CBMC_LIMITATIONS.md documented
goto-instrument --generate-function-bodyhanging on kernel goto binaries. Investigation under bounded ulimit/timeout showed two independent bugs:O(N^2) scan in symbol_table_baset::next_unused_suffix.
Every allocation done by the C object factory calls
get_fresh_aux_symbol -> next_unused_suffix, which historically
restarted its linear scan from 0 each time. With N allocations
under a single prefix (e.g. 'af_alg_alloc_areq::malloc_site$'),
total work is O(N^2). On a real kernel binary this produces
thousands of allocations for a single havoc body, enough for
the polynomial to dominate.
Fix: add a per-prefix 'next search-start' hint cache to
symbol_table_baset. The one-arg next_unused_suffix(prefix)
now consults and updates the hint, making repeated allocation
under the same prefix amortised O(1). The two-arg form
(caller-provided start) is unchanged.
The derived symbol_table_buildert already had an equivalent
hint cache with the same semantic relaxation (next unused,
not strictly smallest). Moving the logic up to the base
removes the performance cliff for every symbol_table_baset
subclass, including plain symbol_tablet — which is the type
goto-instrument's --generate-function-body path uses. The
builder's override is left in place; its cache shadows the
base cache and remains correct.
Runaway tree size in symbol_factoryt::gen_nondet_init.
The existing max_nondet_tree_depth cap only fires when the
SAME struct tag appears twice on the pointer chain. Kernel
struct hierarchies are typically wide and deep but rarely
re-enter the same type early (a single 'struct sock *' reaches
hundreds of pointer-to-struct fields transitively without
cycling back), so the depth cap never fires and the factory
generates an exponentially large init body.
Fix: add a new c_object_factory_parameterst field
max_dynamic_object_instances (default 1000) and a
dynamic_object_instance_count counter on symbol_factoryt.
Before each dynamic object allocation in the pointer branch,
check if the counter has reached the cap; if so, initialise
the pointer to NULL and return. This is a hard belt-and-
braces termination guarantee independent of the depth cap
and of the struct-hierarchy topology.
Exposed as
--max-dynamic-object-instances Nvia theOPT_ANSI_C_LANGUAGE front-end option set (goto-instrument,
cbmc, goto-analyzer, etc. all pick it up). Parsed by
object_factory_parameterst::set() the same way as
max_nondet_tree_depth.
Measured impact on the LIM-008 reproducer
(goto-instrument --generate-function-body af_alg_alloc_areq --generate-function-body-options havoc on Linux 5.10's crypto/algif_aead.c goto binary):
before this commit: hangs indefinitely (still running after
4 hours of wall-clock without the ulimit).
after this commit: 2 seconds, 10 MB output binary.
Regression test: regression/goto-instrument/
generate-function-body-deep-struct-cap/ wires up a wide, deep, non-recursive struct hierarchy whose pre-fix havoc-body generation exhibited the exponential blowup, uses
--max-dynamic-object-instances 50 to keep the test fast, and asserts EXIT=0 (i.e. goto-instrument did not hang or abort).
Validated: all 15 ctest CORE labels under goto-harness / goto-instrument / goto-cc / contracts / symbol-table pass. All seven integration/linux regressions remain green.